TY - JOUR
T1 - Assessment of population structure and its effects on genome-wide association studies
AU - Xu, Hongyan
AU - George, Varghese
N1 - Funding Information: We thank the two anonymous reviewers for their constructive comments, which helped us improve the manuscript. This work was partially supported by grant NS057506 from the National Institutes of Health and by the Scientist Training Grant from the Medical College of Georgia to HX.
PY - 2009/1
Y1 - 2009/1
N2 - Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. However, population structure is a potential problem, and its effects on genetic association studies are controversial. Quantifying the effects of population structure on large-scale genetic association studies is needed for valid analysis of data and correct interpretation of results. In this study, we performed an extensive coalescent-based simulation study with varying levels of population structure to investigate its effects on large-scale genetic association studies. The effects of population structure are measured by the multiplicative change in the probability of Type I error, which is then correlated with the level of population structure. It is found that at each nominal level of the association tests, there is a positive relationship between the level of population structure and its effects, which can be summarized well with a regression function. It is also found that at a specific level of population structure, its effect on an association study increases drastically as the significance level of the test decreases. The Type I error is inflated by an amount approximately equal to Wright's FST, a measure used to quantify the magnitude of population structure. Therefore, in genome-wide association studies, the effects of population structure cannot be safely ignored and must be accounted for with proper methods. This study provides quantitative guidelines to account for the effects of population structure on genome-wide association studies in admixed populations.
AB - Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. However, population structure is a potential problem, and its effects on genetic association studies are controversial. Quantifying the effects of population structure on large-scale genetic association studies is needed for valid analysis of data and correct interpretation of results. In this study, we performed an extensive coalescent-based simulation study with varying levels of population structure to investigate its effects on large-scale genetic association studies. The effects of population structure are measured by the multiplicative change in the probability of Type I error, which is then correlated with the level of population structure. It is found that at each nominal level of the association tests, there is a positive relationship between the level of population structure and its effects, which can be summarized well with a regression function. It is also found that at a specific level of population structure, its effect on an association study increases drastically as the significance level of the test decreases. The Type I error is inflated by an amount approximately equal to Wright's FST, a measure used to quantify the magnitude of population structure. Therefore, in genome-wide association studies, the effects of population structure cannot be safely ignored and must be accounted for with proper methods. This study provides quantitative guidelines to account for the effects of population structure on genome-wide association studies in admixed populations.
KW - Complex diseases
KW - False positives
KW - Genetic variation
KW - Genome-wide association
KW - Heterozygosity
KW - Population structure
KW - SNP
UR - http://www.scopus.com/inward/record.url?scp=70249143972&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70249143972&partnerID=8YFLogxK
U2 - 10.1080/03610920902947188
DO - 10.1080/03610920902947188
M3 - Article
AN - SCOPUS:70249143972
SN - 0361-0926
VL - 38
SP - 2843
EP - 2855
JO - Communications in Statistics - Theory and Methods
JF - Communications in Statistics - Theory and Methods
IS - 16-17
ER -