Assessment of population structure and its effects on genome-wide association studies

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. However, population structure is a potential problem, and its effects on genetic association studies are controversial. Quantifying the effects of population structure on large-scale genetic association studies is needed for valid analysis of data and correct interpretation of results. In this study, we performed an extensive coalescent-based simulation study with varying levels of population structure to investigate its effects on large-scale genetic association studies. The effects of population structure are measured by the multiplicative change in the probability of Type I error, which is then correlated with the level of population structure. It is found that at each nominal level of the association tests, there is a positive relationship between the level of population structure and its effects, which can be summarized well with a regression function. It is also found that at a specific level of population structure, its effect on association studies increases drastically as the significance level of the test decreases. The Type I error is inflated by an amount approximately equal to Wright's FST, a measure that is used to quantify the magnitude of population structure. Therefore, in genome-wide association studies, the effects of population structure cannot be safely ignored and must be accounted for with proper methods. This study provides quantitative guidelines to account for the effects of population structure on genome-wide association studies in admixed populations.
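
The inflation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' coalescent-based procedure: as an assumption it substitutes the Balding-Nichols beta model for subpopulation allele frequencies and a plain 2x2 allelic chi-square test, neither of which is specified in the abstract. All function names and parameter values (sample sizes, ancestral frequency, sampling fractions) are hypothetical and chosen only for illustration.

```python
import math
import random


def wright_fst(p1, p2):
    """Wright's F_ST at one biallelic locus for two equally sized subpopulations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-subpopulation value
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def subpop_freq(p_anc, fst, rng):
    """Draw a subpopulation allele frequency (Balding-Nichols model, an assumption here)."""
    if fst == 0:
        return p_anc
    scale = (1 - fst) / fst
    return rng.betavariate(p_anc * scale, (1 - p_anc) * scale)


def count_alleles(n_alleles, p, rng):
    """Number of copies of the reference allele among n_alleles independent draws."""
    return sum(rng.random() < p for _ in range(n_alleles))


def allelic_pvalue(a, b, c, d):
    """P-value of the 1-df chi-square test on the 2x2 allele-count table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic sample: no evidence either way
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df


def type1_error(fst, alpha=0.05, n_per_group=200, case_frac_pop1=0.8,
                p_anc=0.3, n_rep=1000, seed=1):
    """Empirical rejection rate at a SNP that has no effect on disease status.

    Cases are drawn mostly from subpopulation 1 and controls mostly from
    subpopulation 2, so any allele-frequency difference between the
    subpopulations is confounded with case/control status.
    """
    rng = random.Random(seed)
    n_alleles = 2 * n_per_group                      # two alleles per individual
    case_pop1 = round(n_alleles * case_frac_pop1)    # case alleles from subpopulation 1
    ctrl_pop1 = n_alleles - case_pop1                # controls mirror the cases
    rejections = 0
    for _ in range(n_rep):
        p1 = subpop_freq(p_anc, fst, rng)
        p2 = subpop_freq(p_anc, fst, rng)
        case_a = (count_alleles(case_pop1, p1, rng)
                  + count_alleles(n_alleles - case_pop1, p2, rng))
        ctrl_a = (count_alleles(ctrl_pop1, p1, rng)
                  + count_alleles(n_alleles - ctrl_pop1, p2, rng))
        p_value = allelic_pvalue(case_a, n_alleles - case_a,
                                 ctrl_a, n_alleles - ctrl_a)
        rejections += p_value < alpha
    return rejections / n_rep


if __name__ == "__main__":
    for fst in (0.0, 0.01, 0.05):
        print(f"F_ST = {fst:.2f}: empirical Type I error = {type1_error(fst):.3f}")
```

With no structure (F_ST = 0) the rejection rate should stay close to the nominal level, while even modest structure combined with unbalanced sampling pushes it well above that level. The exact figures depend on the hypothetical settings above and should not be read as reproducing the paper's results.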

Original language: English (US)
Pages (from-to): 2843-2855
Number of pages: 13
Journal: Communications in Statistics - Theory and Methods
Volume: 38
Issue number: 16-17
DOI: 10.1080/03610920902947188
State: Published - Jan 1 2009

Keywords

  • Complex diseases
  • False positives
  • Genetic variation
  • Genome-wide association
  • Heterozygosity
  • Population structure
  • SNP

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

@article{59789c5f22e04f66b288725a53b16142,
title = "Assessment of population structure and its effects on genome-wide association studies",
abstract = "Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. However, population structure is a potential problem, and its effects on genetic association studies are controversial. Quantifying the effects of population structure on large-scale genetic association studies is needed for valid analysis of data and correct interpretation of results. In this study, we performed an extensive coalescent-based simulation study with varying levels of population structure to investigate its effects on large-scale genetic association studies. The effects of population structure are measured by the multiplicative change in the probability of Type I error, which is then correlated with the level of population structure. It is found that at each nominal level of the association tests, there is a positive relationship between the level of population structure and its effects, which can be summarized well with a regression function. It is also found that at a specific level of population structure, its effect on association studies increases drastically as the significance level of the test decreases. The Type I error is inflated by an amount approximately equal to Wright's FST, a measure that is used to quantify the magnitude of population structure. Therefore, in genome-wide association studies, the effects of population structure cannot be safely ignored and must be accounted for with proper methods. This study provides quantitative guidelines to account for the effects of population structure on genome-wide association studies in admixed populations.",
keywords = "Complex diseases, False positives, Genetic variation, Genome-wide association, Heterozygosity, Population structure, SNP",
author = "Hongyan Xu and Varghese George",
year = "2009",
month = "1",
day = "1",
doi = "10.1080/03610920902947188",
language = "English (US)",
volume = "38",
pages = "2843--2855",
journal = "Communications in Statistics - Theory and Methods",
issn = "0361-0926",
publisher = "Taylor and Francis Ltd.",
number = "16-17",

}

TY - JOUR

T1 - Assessment of population structure and its effects on genome-wide association studies

AU - Xu, Hongyan

AU - George, Varghese

PY - 2009/1/1

Y1 - 2009/1/1

AB - Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. However, population structure is a potential problem, and its effects on genetic association studies are controversial. Quantifying the effects of population structure on large-scale genetic association studies is needed for valid analysis of data and correct interpretation of results. In this study, we performed an extensive coalescent-based simulation study with varying levels of population structure to investigate its effects on large-scale genetic association studies. The effects of population structure are measured by the multiplicative change in the probability of Type I error, which is then correlated with the level of population structure. It is found that at each nominal level of the association tests, there is a positive relationship between the level of population structure and its effects, which can be summarized well with a regression function. It is also found that at a specific level of population structure, its effect on association studies increases drastically as the significance level of the test decreases. The Type I error is inflated by an amount approximately equal to Wright's FST, a measure that is used to quantify the magnitude of population structure. Therefore, in genome-wide association studies, the effects of population structure cannot be safely ignored and must be accounted for with proper methods. This study provides quantitative guidelines to account for the effects of population structure on genome-wide association studies in admixed populations.

KW - Complex diseases

KW - False positives

KW - Genetic variation

KW - Genome-wide association

KW - Heterozygosity

KW - Population structure

KW - SNP

UR - http://www.scopus.com/inward/record.url?scp=70249143972&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70249143972&partnerID=8YFLogxK

U2 - 10.1080/03610920902947188

DO - 10.1080/03610920902947188

M3 - Article

VL - 38

SP - 2843

EP - 2855

JO - Communications in Statistics - Theory and Methods

JF - Communications in Statistics - Theory and Methods

SN - 0361-0926

IS - 16-17

ER -