A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups

Yinglei Lai; Bao Ling Adam; Robert Podolsky; Jin Xiong She

doi:10.1093/bioinformatics/btm103

Research output: Contribution to journal (article, peer-reviewed)

19 Scopus citations

Abstract

Motivation: Due to advances in experimental technologies such as microarrays, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets in which measurements for a large number of features are collected simultaneously. However, the sample sizes of these data sets are usually small because of their relatively high costs, which raises the issue of concordance among different data sets collected for the same study: features should behave consistently across data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance.

Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameters are estimated with the expectation-maximization (E-M) algorithm. A normal-distribution-quantile-based method is used for data transformation.

Results: To evaluate the proposed tests, we conducted simulation studies, which suggested that the tests perform satisfactorily. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates: one data set has replicates from different platforms, and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.
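The Methods paragraph states that the parameters of a three-component normal mixture are estimated with the E-M algorithm. This record does not give the model's exact form (for instance, how the two data sets enter it, or how the transformed scores are defined), so the following is only a generic sketch under stated assumptions: one real-valued score per feature, already on a normal scale, modeled as a mixture of three normals that one might read as "low", "null" and "high" features. The function `em_three_component`, its starting values and the simulated data are hypothetical illustrations, not the authors' code, and the sketch does not implement their likelihood ratio tests.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_densities(xi, w, mu, sd):
    # Weighted component densities for one observation; the total is floored to avoid log(0).
    dens = [w[j] * normal_pdf(xi, mu[j], sd[j]) for j in range(len(w))]
    return dens, max(sum(dens), 1e-300)


def em_three_component(x, init_means=(-2.0, 0.0, 2.0), max_iter=300, tol=1e-6):
    """Fit a univariate three-component normal mixture by EM (hypothetical helper).

    Assumed starting values: means low / null / high, unit SDs, equal weights.
    These are illustrative choices, not the paper's settings.
    Returns (weights, means, sds, log-likelihood at the final parameters).
    """
    k = len(init_means)
    w, mu, sd = [1.0 / k] * k, list(init_means), [1.0] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each observation belongs to each component.
        resp, ll = [], 0.0
        for xi in x:
            dens, total = mixture_densities(xi, w, mu, sd)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        if ll - prev_ll < tol:  # EM never decreases the log-likelihood
            break
        prev_ll = ll
        # M-step: responsibility-weighted updates of weights, means and SDs.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / len(x)
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj
            sd[j] = math.sqrt(max(var, 1e-6))  # floor keeps a component from collapsing
    final_ll = sum(math.log(mixture_densities(xi, w, mu, sd)[1]) for xi in x)
    return w, mu, sd, final_ll


# Purely illustrative simulated scores: 10% low, 80% null, 10% high.
random.seed(1)
scores = ([random.gauss(-3.0, 1.0) for _ in range(100)]
          + [random.gauss(0.0, 1.0) for _ in range(800)]
          + [random.gauss(3.0, 1.0) for _ in range(100)])
w, mu, sd, ll = em_three_component(scores)
```

A likelihood ratio statistic of the general kind the abstract mentions compares maximized log-likelihoods of a constrained and an unconstrained model, 2(ℓ_unconstrained − ℓ_constrained); the returned `final_ll` is the sort of quantity such a statistic is built from, but the paper's specific hypotheses are not reproduced here.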

Original language	English (US)
Pages (from-to)	1243-1250
Number of pages	8
Journal	Bioinformatics
Volume	23
Issue number	10
DOI	https://doi.org/10.1093/bioinformatics/btm103
Status	Published, May 15, 2007

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{dd95151babb84c2aaeda9c2a735820f7,

title = "A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups",

abstract = "Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution-quantile-based method is used for data transformation. Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.",

author = "Lai, Yinglei and Adam, Bao Ling and Podolsky, Robert and She, Jin Xiong",

note = "Funding Information: We thank the associate editor and two anonymous reviewers for their valuable comments. This work was partially supported by NIH grants DK-75004 (Y.L.), HD-37800 (J-X.S.) and HD-50196 (J-X.S.).",

year = "2007",

month = may,

day = "15",

doi = "10.1093/bioinformatics/btm103",

language = "English (US)",

volume = "23",

pages = "1243--1250",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "10",

}

TY - JOUR

T1 - A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups

AU - Lai, Yinglei

AU - Adam, Bao Ling

AU - Podolsky, Robert

AU - She, Jin Xiong

N1 - Funding Information: We thank the associate editor and two anonymous reviewers for their valuable comments. This work was partially supported by NIH grants DK-75004 (Y.L.), HD-37800 (J-X.S.) and HD-50196 (J-X.S.).

PY - 2007/5/15

Y1 - 2007/5/15

N2 - Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution-quantile-based method is used for data transformation. Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.

AB - Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution-quantile-based method is used for data transformation. Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.

UR - http://www.scopus.com/inward/record.url?scp=34447345702&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34447345702&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btm103

DO - 10.1093/bioinformatics/btm103

M3 - Article

C2 - 17384018

AN - SCOPUS:34447345702

SN - 1367-4803

VL - 23

SP - 1243

EP - 1250

JO - Bioinformatics

JF - Bioinformatics

IS - 10

ER -
