Ranking analysis of F-statistics for microarray data

Yuan De Tan, Myriam Fornage, Hongyan Xu

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Background: Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identifying differentially expressed genes in microarray experiments is challenging because testing thousands of genes at once can produce a high type I error rate. Methods for large-scale statistical analysis have been developed, but most of them are applicable only to two-sample or two-condition data.

Results: We developed a large-scale, multiple-group, F-test-based method named ranking analysis of F-statistics (RAF), which extends ranking analysis of microarray data (RAM), a method built on the two-sample t-test. In RAF, we propose a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate (FDR). Simulation results suggested that RAF finds differentially expressed genes among multiple classes more efficiently, and at a lower false discovery rate, than some commonly used methods. Applying our method to experimental data, we found 107 genes that were significantly differentially expressed among 4 treatments at <0.7% FDR. Of these, 31 are expressed sequence tags (ESTs) and 76 are unique genes that have known functions in the brain or central nervous system and belong to six major functional groups.

Conclusion: Our method is suitable for identifying differentially expressed genes among multiple groups, in particular when the sample size is small.
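To make the idea of ranking genes by a multiple-group F-statistic concrete, the following is a minimal sketch in Python with NumPy. It assumes a genes-by-samples expression matrix and one group label per sample; the function names `one_way_f` and `rank_genes` are hypothetical and do not come from the paper. It computes only the classical one-way ANOVA F-statistic per gene and sorts genes by it. It does not reproduce RAF's random splitting null distribution or its two-simulation FDR estimate, whose details are not given in the abstract.

```python
import numpy as np


def one_way_f(expr, labels):
    """Classical one-way ANOVA F-statistic for each gene (row).

    expr   : array-like, shape (n_genes, n_samples)
    labels : array-like, length n_samples, group label of each sample
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n_genes, n_samples = expr.shape
    k = len(groups)

    grand_mean = expr.mean(axis=1)
    ss_between = np.zeros(n_genes)
    ss_within = np.zeros(n_genes)
    for g in groups:
        block = expr[:, labels == g]          # samples belonging to group g
        group_mean = block.mean(axis=1)
        ss_between += block.shape[1] * (group_mean - grand_mean) ** 2
        ss_within += ((block - group_mean[:, None]) ** 2).sum(axis=1)

    ms_between = ss_between / (k - 1)         # between-group mean square
    ms_within = ss_within / (n_samples - k)   # within-group mean square
    return ms_between / ms_within


def rank_genes(f_stats):
    """Gene indices ordered from largest to smallest F-statistic."""
    return np.argsort(-np.asarray(f_stats), kind="stable")


if __name__ == "__main__":
    # Illustrative simulated data (an assumption, not the paper's data):
    # 100 genes, 4 treatments with 3 replicates each.
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(4), 3)
    expr = rng.normal(size=(100, labels.size))
    expr[0, labels == 3] += 5.0               # make gene 0 differ in one group
    order = rank_genes(one_way_f(expr, labels))
    print("top-ranked gene indices:", order[:5])
```

A ranking like this is only the first step; deciding how many top-ranked genes to call significant requires a null distribution and an FDR estimate, which is where the method described above departs from this sketch.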

Original language: English (US)
Article number: 142
Journal: BMC Bioinformatics
Volume: 9
DOI: 10.1186/1471-2105-9-142
State: Published - Mar 6 2008

Fingerprint

F-statistics
Microarrays
Microarray Data
Ranking
Genes
Statistics
Gene
Microarray
Physiological Phenomena
Two-sample Test
F Test
Type I Error Rate
Differential Expression
t-test
Expressed Sequence Tags
Null Distribution
Neurology
Microarray Analysis
Sample Size
Functional groups

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Ranking analysis of F-statistics for microarray data. / Tan, Yuan De; Fornage, Myriam; Xu, Hongyan.

In: BMC Bioinformatics, Vol. 9, 142, 06.03.2008.

Tan, Yuan De ; Fornage, Myriam ; Xu, Hongyan. / Ranking analysis of F-statistics for microarray data. In: BMC Bioinformatics. 2008 ; Vol. 9.
@article{6b230ddd413141dda49685d803ca1133,
title = "Ranking analysis of F-statistics for microarray data",
author = "Tan, {Yuan De} and Myriam Fornage and Hongyan Xu",
year = "2008",
month = "3",
day = "6",
doi = "10.1186/1471-2105-9-142",
language = "English (US)",
volume = "9",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Ranking analysis of F-statistics for microarray data

AU - Tan, Yuan De

AU - Fornage, Myriam

AU - Xu, Hongyan

PY - 2008/3/6

Y1 - 2008/3/6

UR - http://www.scopus.com/inward/record.url?scp=42149166602&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=42149166602&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-142

DO - 10.1186/1471-2105-9-142

M3 - Article

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 142

ER -