Accurate and efficient estimation of small P-values with the cross-entropy method: Applications in genomic data analysis

Yang Shi; Mengqiao Wang; Weiping Shi; Ji Hyun Lee; Huining Kang; Hui Jiang

doi:10.1093/bioinformatics/bty1005

Biostats & Data Science

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.
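As a rough illustration of the idea the abstract names, here is a sketch of the generic cross-entropy (CE) method for rare-event probabilities. It is applied to a toy statistic whose exact tail probability is known. It is not the authors' algorithm. The abstract indicates that their method combines CE with Markov chain Monte Carlo sampling for complicated test statistics. This sketch instead assumes a simple Gaussian null X ~ N(0, I_d), a standardized-sum statistic, and a mean-shifted Gaussian proposal family. The function names (`statistic`, `log_likelihood_ratio`, `ce_tail_probability`) and the tuning constants (elite fraction `rho`, sample sizes) are hypothetical choices made for illustration.

```python
import math

import numpy as np


def statistic(x):
    """Toy test statistic (assumption): standardized sum, so T ~ N(0, 1) under the null."""
    return x.sum(axis=1) / math.sqrt(x.shape[1])


def log_likelihood_ratio(x, mu):
    """log f(x; 0) - log f(x; mu) for the N(0, I) and N(mu, I) densities."""
    return -x @ mu + 0.5 * (mu @ mu)


def ce_tail_probability(t, d=5, n_ce=10_000, n_final=100_000, rho=0.1,
                        max_iter=50, seed=0):
    """Estimate P(T(X) >= t), X ~ N(0, I_d), by CE-tuned importance sampling.

    The proposal family is N(mu, I_d); only the mean mu is adapted.
    Returns the estimate and the tuned proposal mean.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)

    # Multilevel CE stage: raise the intermediate level gamma until it reaches t.
    for _ in range(max_iter):
        x = rng.standard_normal((n_ce, d)) + mu
        s = statistic(x)
        gamma = min(t, np.quantile(s, 1.0 - rho))
        elite = s >= gamma
        w = np.exp(log_likelihood_ratio(x[elite], mu))
        # CE update for a Gaussian mean: likelihood-ratio-weighted mean of the elite samples.
        mu = (w[:, None] * x[elite]).sum(axis=0) / w.sum()
        if gamma >= t:
            break

    # Final importance-sampling estimate under the tuned proposal N(mu, I_d).
    x = rng.standard_normal((n_final, d)) + mu
    hit = statistic(x) >= t
    p_hat = float(np.mean(hit * np.exp(log_likelihood_ratio(x, mu))))
    return p_hat, mu


if __name__ == "__main__":
    t = 5.0
    p_hat, _ = ce_tail_probability(t)
    p_exact = 0.5 * math.erfc(t / math.sqrt(2))  # exact upper tail of N(0, 1)
    print(f"CE estimate: {p_hat:.4e}   exact: {p_exact:.4e}")
```

Each CE iteration sets the level gamma to the (1 - rho) sample quantile of the statistic, capped at t. It then refits the proposal mean to the likelihood-ratio-weighted elite samples. Once gamma reaches t, a single importance-sampling run under the tuned proposal gives the estimate. For t = 5 the exact value is about 2.87e-7. Plain Monte Carlo with 10^5 draws would almost never observe a single exceedance at that level.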

Original language	English (US)
Article number	bty1005
Pages (from-to)	2441-2448
Number of pages	8
Journal	Bioinformatics
Volume	35
Issue number	14
DOIs	https://doi.org/10.1093/bioinformatics/bty1005
State	Published - Jul 15 2019

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{3cdbae753b2645bab1cfafe1d60bd432,

title = "Accurate and efficient estimation of small {P}-values with the cross-entropy method: Applications in genomic data analysis",

abstract = "Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. $10^{-6}$ to $10^{-100}$). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.",

author = "Yang Shi and Mengqiao Wang and Weiping Shi and Lee, {Ji Hyun} and Huining Kang and Hui Jiang",

year = "2019",

month = jul,

day = "15",

doi = "10.1093/bioinformatics/bty1005",

language = "English (US)",

volume = "35",

pages = "2441--2448",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "14",

}

TY  - JOUR
T1  - Accurate and efficient estimation of small P-values with the cross-entropy method: Applications in genomic data analysis
AU  - Shi, Yang
AU  - Wang, Mengqiao
AU  - Shi, Weiping
AU  - Lee, Ji Hyun
AU  - Kang, Huining
AU  - Jiang, Hui
PY  - 2019/7/15
Y1  - 2019/7/15
N2  - Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.
AB  - Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.
UR  - http://www.scopus.com/inward/record.url?scp=85068936231&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85068936231&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/bty1005
DO  - 10.1093/bioinformatics/bty1005
M3  - Article
C2  - 30521030
AN  - SCOPUS:85068936231
SN  - 1367-4803
VL  - 35
SP  - 2441
EP  - 2448
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 14
M1  - bty1005
ER  - 
