January 10, 2024

Public Platform with 39,472 Exome Control Samples Enables Association Studies without Genotype Sharing

Mykyta Artomov, Alexander A. Loboda, Maxim N. Artyomov & Mark J. Daly

Genetic association analyses rely on the quality and size of case and control samples, yet assembling controls that share a genetic background with the cases is challenging. Public control databases are limited in size and accessibility. Solutions such as UNICORN are difficult to implement, and computational approaches such as TRAPD, CoCoRV, and Summix have limitations, particularly for ancestry matching. Established frameworks such as All of Us and UK Biobank provide secure environments but restrict external use. The SVD-based Control Repository (SCoRe) was developed to address these issues: it uses singular value decomposition to select background-matched controls without sharing individual-level data. With 39,472 controls, this online portal supports research worldwide, improving statistical power in association studies while complying with data-sharing regulations.


First, singular value decomposition (SVD) was applied to the genotype matrix of the control pool to establish a basis for case-control matching. This basis can be distributed to users, who project their case genotypes onto it locally. SVD was then applied to the case coordinates in the control basis, providing the information needed to compute a covariance matrix. Together, these steps form the foundation of SCoRe.
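The matching steps described above can be illustrated with a minimal NumPy sketch. It assumes genotypes coded as 0/1/2 alternate-allele counts, a control matrix centered on its column means, and a basis formed by the top k right singular vectors. The function names (`control_basis`, `project`, `case_covariance`, `select_controls`), the Mahalanobis-radius selection rule, and the simulated two-population data are hypothetical illustrations, not SCoRe's actual implementation.

```python
import numpy as np

# Hypothetical sketch of SVD-based control matching; not SCoRe's implementation.


def control_basis(controls, k):
    """Center the control genotype matrix (samples x variants, coded 0/1/2)
    and return the column means plus the top-k right singular vectors."""
    mean = controls.mean(axis=0)
    _, _, vt = np.linalg.svd(controls - mean, full_matrices=False)
    return mean, vt[:k]  # k x n_variants; contains no individual genotypes


def project(genotypes, mean, basis):
    """Coordinates of samples in the control basis (run locally on case data)."""
    return (genotypes - mean) @ basis.T


def case_covariance(case_coords):
    """Centroid and covariance of case coordinates via SVD:
    for centered X = U S V^T, cov = V diag(S^2 / (n - 1)) V^T."""
    n = case_coords.shape[0]
    centroid = case_coords.mean(axis=0)
    _, s, vt = np.linalg.svd(case_coords - centroid, full_matrices=False)
    return centroid, vt.T @ np.diag(s**2 / (n - 1)) @ vt


def select_controls(control_coords, centroid, cov, max_dist):
    """Assumed rule: keep controls within a Mahalanobis radius of the case centroid."""
    diff = control_coords - centroid
    dist2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
    return np.flatnonzero(dist2 <= max_dist**2)


# Simulated example: 300 controls from population A, 300 from population B,
# and 100 cases from population A.
rng = np.random.default_rng(0)
n_var = 200
p_a = rng.uniform(0.05, 0.5, n_var)
p_b = rng.uniform(0.05, 0.5, n_var)
controls = np.vstack([rng.binomial(2, p_a, (300, n_var)),
                      rng.binomial(2, p_b, (300, n_var))]).astype(float)
cases = rng.binomial(2, p_a, (100, n_var)).astype(float)

mean, basis = control_basis(controls, k=5)
centroid, cov = case_covariance(project(cases, mean, basis))
chosen = select_controls(project(controls, mean, basis), centroid, cov, max_dist=4.0)
print(len(chosen), np.mean(chosen < 300))  # number selected, share from population A
```

Because the covariance is derived from the singular values of the centered case coordinates (for X = U S V^T, X^T X = V S^2 V^T), only a k-dimensional summary of the cases, rather than their genotypes, would need to leave the user's site in this sketch.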


The approach was validated by cross-validation on randomly selected sets of cases, demonstrating robust control selection for individuals of Finnish ancestry. The study also examined the impact of exome sequencing platforms and found that control selection is insensitive to the platform. To obtain platform-matched controls, it therefore introduces a call rate-based matching method for samples sequenced on specific platforms, producing platform-matched control sets without sharing individual genotypes. The framework's applicability is illustrated through case studies, including analyses of early-onset breast cancer patients and of African-American individuals, demonstrating successful control selection for both common and rare-variant association studies.
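The call rate-based matching idea can be sketched in the same spirit. The sketch assumes that each exome capture platform leaves a characteristic per-variant call-rate profile (poorly covered targets have low call rates), that missing genotypes are encoded as NaN, and that the best platform is the one whose control call-rate profile correlates most strongly with the cases' profile. The functions, the correlation criterion, and the simulated platforms `kit_A` and `kit_B` are hypothetical; the study's actual matching procedure may differ.

```python
import numpy as np

# Hypothetical sketch of call rate-based platform matching; not SCoRe's code.


def call_rates(genotypes):
    """Per-variant call rate: fraction of samples with a non-missing (non-NaN) call."""
    return 1.0 - np.isnan(genotypes).mean(axis=0)


def best_platform(case_rates, platform_rates):
    """Return the platform whose control call-rate profile has the highest
    Pearson correlation with the cases' profile, plus all scores."""
    scores = {name: np.corrcoef(case_rates, rates)[0, 1]
              for name, rates in platform_rates.items()}
    return max(scores, key=scores.get), scores


# Simulated data: each platform covers a different random subset of variants.
rng = np.random.default_rng(1)
n_var = 500


def simulate(n_samples, covered):
    """Genotypes with 1% missingness on covered variants and 60% elsewhere."""
    g = rng.binomial(2, 0.2, (n_samples, n_var)).astype(float)
    miss_prob = np.where(covered, 0.01, 0.6)
    g[rng.random((n_samples, n_var)) < miss_prob] = np.nan
    return g


covered_a = rng.random(n_var) < 0.7
covered_b = rng.random(n_var) < 0.7
platform_rates = {"kit_A": call_rates(simulate(200, covered_a)),
                  "kit_B": call_rates(simulate(200, covered_b))}
case_rates = call_rates(simulate(50, covered_a))  # cases sequenced on kit_A
platform, scores = best_platform(case_rates, platform_rates)
print(platform)
```

Only aggregate call rates are compared here, which fits the goal of matching platforms without exchanging individual genotypes.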


In summary, SCoRe offers a pool of 39,472 exome sequences and enables rapid selection of matched control groups without genotype sharing, providing allele frequency statistics for association tests. Although it has limitations, such as its reliance on a prespecified set of variants, its compatibility with other methods that address technical differences enhances its utility. SCoRe thus advances genetic association analysis, especially where direct data sharing is prohibited or difficult, as is often the case in international collaborations.
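To show how allele frequency statistics from a matched control set could feed an association test, here is a minimal sketch of a per-variant allelic test using only allele counts. Fisher's exact test is a standard choice for such 2x2 count tables but is an assumption here, as are the function names `fisher_exact_two_sided` and `allelic_test`; the study's own tests may differ.

```python
from math import comb

# Hypothetical per-variant allelic test on allele counts; not SCoRe's code.


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def allelic_test(case_alt, n_cases, control_alt, n_controls):
    """Compare alternate-allele counts in cases and controls, assuming diploid
    samples (2N alleles per group)."""
    return fisher_exact_two_sided(case_alt, 2 * n_cases - case_alt,
                                  control_alt, 2 * n_controls - control_alt)


print(fisher_exact_two_sided(3, 1, 1, 3))
print(allelic_test(40, 100, 5, 100))
```

A real study would run such a test across many variants and correct for multiple testing; rare-variant methods such as TRAPD and CoCoRV instead aggregate counts across variants within a gene.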

