Accurate and efficient estimation of small P-values with the cross-entropy method: Applications in genomic data analysis

Yang Shi, Mengqiao Wang, Weiping Shi, Ji Hyun Lee, Huining Kang, Hui Jiang

Research output: Contribution to journal › Article

Abstract

Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques.

Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.
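The cross-entropy principle named in the abstract can be illustrated with a toy problem. The sketch below is not the authors' algorithm (which, per the abstract, also relies on Markov chain Monte Carlo sampling and targets intractable test statistics). It assumes a one-dimensional standard normal null distribution, a Gaussian mean-shift proposal, and hypothetical names and defaults (`ce_tail_probability`, `rho`, `n_pilot`, `n_final`). The exact tail probability is known here, so the estimate can be checked. Naive Monte Carlo would need on the order of 1/p draws to observe even one exceedance, whereas the adaptively tilted proposal places most draws in the tail.

```python
import math
import random


def exact_upper_tail(gamma):
    """Exact P(X >= gamma) for X ~ N(0, 1); used only as a reference value."""
    return 0.5 * math.erfc(gamma / math.sqrt(2.0))


def likelihood_ratio(x, mu):
    """Density of N(0, 1) divided by the density of N(mu, 1), evaluated at x."""
    return math.exp(-x * mu + 0.5 * mu * mu)


def ce_tail_probability(gamma, n_pilot=2000, n_final=20000, rho=0.1,
                        seed=0, max_iter=50):
    """Toy cross-entropy estimate of p = P(X >= gamma) under N(0, 1).

    Assumption for illustration: the proposal family is N(mu, 1), so the
    cross-entropy update for mu is a likelihood-ratio-weighted mean of the
    elite samples.
    """
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(max_iter):
        # Draw from the current proposal and pick an intermediate level:
        # the (1 - rho) sample quantile, capped at the target threshold.
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n_pilot))
        level = min(gamma, xs[int((1.0 - rho) * n_pilot)])
        elite = [x for x in xs if x >= level]
        # Reweight the elite samples back to the null distribution.
        weights = [likelihood_ratio(x, mu) for x in elite]
        mu = sum(w * x for w, x in zip(weights, elite)) / sum(weights)
        if level >= gamma:
            break
    # Final importance-sampling estimate with the tuned proposal.
    total = 0.0
    for _ in range(n_final):
        x = rng.gauss(mu, 1.0)
        if x >= gamma:
            total += likelihood_ratio(x, mu)
    return total / n_final, mu


if __name__ == "__main__":
    estimate, tuned_mu = ce_tail_probability(6.0)
    print("estimate:", estimate, "exact:", exact_upper_tail(6.0), "mu:", tuned_mu)
```

The intermediate levels let the proposal mean climb toward the tail in a few iterations instead of requiring a rare exceedance under the null from the start. For a real test statistic the proposal family and the sampler would have to be chosen to suit the data-generating model.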

Original language: English (US)
Article number: bty1005
Pages (from-to): 2441-2448
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 14
DOI: 10.1093/bioinformatics/bty1005
State: Published - Jul 15 2019

Fingerprint

Cross-entropy Method
Efficient Estimation
Entropy
Genomics
Data analysis
Statistics
Markov processes
Test Statistic
Distribution functions
Sampling
Multiple Tests
Markov Chains
Monte Carlo Sampling
Evaluate
Cumulative distribution function
Hypothesis Test
Statistical Significance
Markov Chain Monte Carlo
Ranking
Adjustment

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Accurate and efficient estimation of small P-values with the cross-entropy method: Applications in genomic data analysis. / Shi, Yang; Wang, Mengqiao; Shi, Weiping; Lee, Ji Hyun; Kang, Huining; Jiang, Hui.

In: Bioinformatics, Vol. 35, No. 14, bty1005, 15.07.2019, p. 2441-2448.

@article{3cdbae753b2645bab1cfafe1d60bd432,
title = "Accurate and efficient estimation of small P-values with the cross-entropy method: Applications in genomic data analysis",
abstract = "Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. $10^{-6}$ to $10^{-100}$). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.",
author = "Yang Shi and Mengqiao Wang and Weiping Shi and Lee, {Ji Hyun} and Huining Kang and Hui Jiang",
year = "2019",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/bty1005",
language = "English (US)",
volume = "35",
pages = "2441--2448",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",

}

TY - JOUR

T1 - Accurate and efficient estimation of small P-values with the cross-entropy method

T2 - Applications in genomic data analysis

AU - Shi, Yang

AU - Wang, Mengqiao

AU - Shi, Weiping

AU - Lee, Ji Hyun

AU - Kang, Huining

AU - Jiang, Hui

PY - 2019/7/15

Y1 - 2019/7/15

AB - Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.

UR - http://www.scopus.com/inward/record.url?scp=85068936231&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068936231&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty1005

DO - 10.1093/bioinformatics/bty1005

M3 - Article

AN - SCOPUS:85068936231

VL - 35

SP - 2441

EP - 2448

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

M1 - bty1005

ER -