A general method for accurate estimation of false discovery rates in identification of differentially expressed genes

Yuan De Tan, Hongyan Xu

Research output: Contribution to journal (Article)

13 Citations (Scopus)

Abstract

The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.
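For reference, the two classical procedures named in the abstract can be sketched as follows. This is a minimal illustration of the standard Bonferroni and Benjamini-Hochberg step-up rules only. It is not the authors' C-parameterized family, whose formula the abstract does not give. The function names and the default alpha = 0.05 are assumptions made for this example.

```python
from typing import List


def bonferroni(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject test i when p_i <= alpha / G, where G is the number of tests.

    This controls the family-wise error rate.
    """
    g = len(pvalues)
    return [p <= alpha / g for p in pvalues]


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Standard BH step-up rule (illustrative, not the paper's method).

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / G,
    and reject the k smallest p-values.
    """
    g = len(pvalues)
    # Indices of the tests ordered from smallest to largest p-value.
    order = sorted(range(g), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / g:
            k_max = rank  # keep the largest rank that passes
    reject = [False] * g
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Made-up p-values, used only to show the difference between the rules.
    example = [0.01, 0.04, 0.03, 0.045]
    print("Bonferroni:", bonferroni(example))
    print("BH:        ", benjamini_hochberg(example))
```

On these made-up p-values Bonferroni rejects only the smallest one, while the step-up rule rejects all four because the largest p-value (0.045) falls below its rank threshold (4 × 0.05 / 4 = 0.05).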

Original language: English (US)
Pages (from-to): 2018-2025
Number of pages: 8
Journal: Bioinformatics
Volume: 30
Issue number: 14
DOI: 10.1093/bioinformatics/btu124
State: Published - Jul 15 2014

Fingerprint

Genes
Gene
Testing
Multiple Testing
Nucleotides
Polymorphism
Bonferroni
Throughput
False
Familywise Error Rate
Multiple Tests
Single nucleotide Polymorphism
Proteomics
High Throughput
Genomics
Extremes
Choose
Valid
Scenarios

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

A general method for accurate estimation of false discovery rates in identification of differentially expressed genes. / Tan, Yuan De; Xu, Hongyan.

In: Bioinformatics, Vol. 30, No. 14, 15.07.2014, p. 2018-2025.

@article{2354d2ee6e0846cdad961a5beca19c8c,
title = "A general method for accurate estimation of false discovery rates in identification of differentially expressed genes",
abstract = "The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.",
author = "Tan, {Yuan De} and Xu, Hongyan",
year = "2014",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/btu124",
language = "English (US)",
volume = "30",
pages = "2018--2025",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",
}

TY - JOUR

T1 - A general method for accurate estimation of false discovery rates in identification of differentially expressed genes

AU - Tan, Yuan De

AU - Xu, Hongyan

PY - 2014/7/15

Y1 - 2014/7/15

N2 - The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets.

UR - http://www.scopus.com/inward/record.url?scp=84904013348&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904013348&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu124

DO - 10.1093/bioinformatics/btu124

M3 - Article

C2 - 24632499

AN - SCOPUS:84904013348

VL - 30

SP - 2018

EP - 2025

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

ER -