A general method for accurate estimation of false discovery rates in identification of differentially expressed genes

Yuan De Tan; Hongyan Xu

doi:10.1093/bioinformatics/btu124

A general method for accurate estimation of false discovery rates in identification of differentially expressed genes

Yuan De Tan, Hongyan Xu

Biostats & Data Science

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.

Original language	English (US)
Pages (from-to)	2018-2025
Number of pages	8
Journal	Bioinformatics
Volume	30
Issue number	14
DOIs	https://doi.org/10.1093/bioinformatics/btu124
State	Published - Jul 15 2014

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Access to Document

10.1093/bioinformatics/btu124

Cite this

@article{2354d2ee6e0846cdad961a5beca19c8c,

title = "A general method for accurate estimation of false discovery rates in identification of differentially expressed genes",

abstract = "The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.",

author = "Tan, {Yuan De} and Hongyan Xu",

year = "2014",

month = jul,

day = "15",

doi = "10.1093/bioinformatics/btu124",

language = "English (US)",

volume = "30",

pages = "2018--2025",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "14",

}

TY - JOUR

T1 - A general method for accurate estimation of false discovery rates in identification of differentially expressed genes

AU - Tan, Yuan De

AU - Xu, Hongyan

PY - 2014/7/15

Y1 - 2014/7/15

N2 - The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.

AB - The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.

UR - http://www.scopus.com/inward/record.url?scp=84904013348&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904013348&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu124

DO - 10.1093/bioinformatics/btu124

M3 - Article

C2 - 24632499

AN - SCOPUS:84904013348

SN - 1367-4803

VL - 30

SP - 2018

EP - 2025

JO - Bioinformatics

JF - Bioinformatics

IS - 14

ER -

A general method for accurate estimation of false discovery rates in identification of differentially expressed genes

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this