Russian Journal of Genetics, volume 60, issue 11, pages 1563-1569
Imputed Genotypes Versus Sequenced Genotypes for the Association Analysis of Rare Variants
I. V. Zorkoltseva¹, T. I. Axenovich¹, Y. A. Tsepilov¹
¹ National Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
Publication type: Journal Article
Publication date: 2024-11-25
Journal: Russian Journal of Genetics
SCImago: Q4
WoS: Q4
SJR: 0.185
CiteScore: 1.0
Impact factor: 0.6
ISSN: 1022-7954, 1608-3369
Abstract
Exome-sequenced genotypes provide the most informative material for the analysis of rare genetic variants. However, their widespread use is currently limited by the relatively small number of sequenced samples compared with imputed samples and by the lack of free access to individual-level genotypes. The latter limitation is not critical for imputed data, which combine genotypes collected on microarray platforms with missing genotypes reconstructed using reference haplotype panels: the results of genome-wide association studies (GWAS) of imputed genotypes are freely available for thousands of traits and millions of genetic variants. These data can be used for gene-based association analysis, the primary tool for studying rare variants. However, imputed genotypes have disadvantages compared with sequenced genotypes: both the number of imputed variants and the quality of their genotypes are lower. We aimed to test how these disadvantages affect the results of rare variant analysis. We considered 188,236 UK Biobank participants who had both imputed and sequenced genotypes. Single-variant association analysis showed high imputation quality: genomic inflation factors for 47 traits were close to 1, and p-values were very close to those obtained for sequenced genotypes (r² = 0.994). We then performed gene-based association analysis using imputed and sequenced genotypes. The number of association signals identified with imputed data was approximately half of that identified with sequenced data. We therefore expect that, for protein-coding variants, analysis of an imputed sample about twice as large as the sequenced sample would have power equivalent to that of the sequenced data.
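The abstract summarizes imputation quality with two numbers: genomic inflation factors close to 1 and an r² of 0.994 between p-values from imputed and sequenced genotypes. The abstract does not say exactly how these were computed, so the sketch below is only an illustration under stated assumptions: it uses the conventional definition of the genomic inflation factor (median observed 1-df chi-square divided by its expected median, about 0.4549) and assumes r² is the squared Pearson correlation of -log10 p-values for the same variants. All function names are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square equals (Phi^-1(0.75))^2, roughly 0.4549.
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_1df_from_p(p: float) -> float:
    """Convert a two-sided p-value into a 1-df chi-square statistic (z^2)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    # Use the lower tail p/2 so that tiny p-values do not round 1 - p/2 to 1.0.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(pvalues) -> float:
    """Hypothetical helper: median observed chi-square over its null median."""
    return median(chi2_1df_from_p(p) for p in pvalues) / _CHI2_1DF_MEDIAN


def r_squared(xs, ys) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy * sxy / (sxx * syy)


def compare_pvalues(p_imputed, p_sequenced) -> float:
    """Assumed metric: r^2 between -log10 p-values of the same variants."""
    log_imp = [-math.log10(p) for p in p_imputed]
    log_seq = [-math.log10(p) for p in p_sequenced]
    return r_squared(log_imp, log_seq)
```

Under the null hypothesis, p-values are uniform, so a set of evenly spaced p-values in (0, 1) gives an inflation factor very close to 1; values well above 1 would indicate inflated test statistics. Working on the -log10 scale weights strong signals more heavily than a correlation of raw p-values would, which is one common choice but not necessarily the one used in the study.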