A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups

Yinglei Lai, Bao Ling Adam, Robert Podolsky, Jin-Xiong She

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance.

Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution quantile-based method is used for data transformation.

Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.
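The abstract names the ingredients (a three-component normal mixture fitted by E-M) without giving the model details. As a rough illustration only, the sketch below fits a generic univariate three-component normal mixture by E-M. It is not the authors' concordance or discordance test: the function name `fit_three_component_mixture`, the quantile-based initialisation, and the stopping rule are all assumptions made for this example.

```python
import numpy as np


def fit_three_component_mixture(z, n_iter=500, tol=1e-8):
    """Fit a 1-D three-component normal mixture by E-M (illustrative sketch).

    Returns mixing proportions, means, standard deviations and the
    log-likelihood evaluated in the last E-step.
    """
    z = np.asarray(z, dtype=float)
    # Assumed initialisation: spread the means over the data quantiles.
    mu = np.quantile(z, [0.1, 0.5, 0.9])
    sigma = np.full(3, z.std() + 1e-6)
    pi = np.full(3, 1.0 / 3.0)
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: weighted component densities and posterior responsibilities.
        dens = pi * np.exp(-0.5 * ((z[:, None] - mu) / sigma) ** 2)
        dens /= sigma * np.sqrt(2.0 * np.pi)
        total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        resp = dens / total
        # Log-likelihood under the parameters used in this E-step.
        ll = np.log(total).sum()
        # M-step: update proportions, means and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / z.size
        mu = (resp * z[:, None]).sum(axis=0) / nk
        var = (resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.maximum(np.sqrt(var), 1e-6)
        # Stop once the log-likelihood no longer improves noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, sigma, ll
```

A likelihood ratio test would compare the maximized log-likelihood of such a model with that of a constrained version; which constraints define concordance and discordance is specified in the paper itself, not here.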

Original language: English (US)
Pages (from-to): 1243-1250
Number of pages: 8
Journal: Bioinformatics
Volume: 23
Issue number: 10
DOI: 10.1093/bioinformatics/btm103
State: Published - May 15 2007

Fingerprint

Concordance
Microarrays
Mixture Model
Parameter estimation
Mass spectrometry
Statistical methods
Nuclear magnetic resonance
Experiment
Costs
Experiments
Normal Mixture
Data Transformation
Datasets
Nuclear Magnetic Resonance
Expectation-maximization Algorithm
Sample Size
Mass Spectrometry
Likelihood Ratio Test
Microarray
Statistical method

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups. / Lai, Yinglei; Adam, Bao Ling; Podolsky, Robert; She, Jin-Xiong.

In: Bioinformatics, Vol. 23, No. 10, 15.05.2007, p. 1243-1250.

@article{dd95151babb84c2aaeda9c2a735820f7,
title = "A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups",
abstract = "Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution quantile-based method is used for data transformation. Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.",
author = "Lai, Yinglei and Adam, {Bao Ling} and Podolsky, Robert and She, Jin-Xiong",
year = "2007",
month = "5",
day = "15",
doi = "10.1093/bioinformatics/btm103",
language = "English (US)",
volume = "23",
pages = "1243--1250",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "10",
}

TY - JOUR

T1 - A mixture model approach to the tests of concordance and discordance between two large-scale experiments with two-sample groups

AU - Lai, Yinglei

AU - Adam, Bao Ling

AU - Podolsky, Robert

AU - She, Jin-Xiong

PY - 2007/5/15

Y1 - 2007/5/15

N2 - Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution quantile-based method is used for data transformation. Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.

AB - Motivation: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. Methods: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution quantile-based method is used for data transformation. Results: To evaluate the proposed tests, we conducted some simulation studies, which suggested their satisfactory performances. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms.

UR - http://www.scopus.com/inward/record.url?scp=34447345702&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34447345702&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btm103

DO - 10.1093/bioinformatics/btm103

M3 - Article

C2 - 17384018

AN - SCOPUS:34447345702

VL - 23

SP - 1243

EP - 1250

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 10

ER -