RNA-Seq Accurately Identifies Cancer Biomarker Signatures to Distinguish Tissue of Origin

Iris H. Wei, Yang Shi, Hui Jiang, Chandan Kumar-Sinha, Arul M. Chinnaiyan

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Metastatic cancer of unknown primary (CUP) accounts for up to 5% of all new cancer cases, with a 5-year survival rate of only 10%. Accurate identification of tissue of origin would allow for directed, personalized therapies to improve clinical outcomes. Our objective was to use transcriptome sequencing (RNA-Seq) to identify lineage-specific biomarker signatures for the cancer types that most commonly metastasize as CUP (colorectum, kidney, liver, lung, ovary, pancreas, prostate, and stomach). RNA-Seq data of 17,471 transcripts from a total of 3,244 cancer samples across 26 different tissue types were compiled from in-house sequencing data and publicly available International Cancer Genome Consortium and The Cancer Genome Atlas datasets. Robust cancer biomarker signatures were extracted using a 10-fold cross-validation method of log transformation, quantile normalization, transcript ranking by area under the receiver operating characteristic curve, and stepwise logistic regression. The entire algorithm was then repeated with a new set of randomly generated training and test sets, yielding highly concordant biomarker signatures. External validation of the cancer-specific signatures yielded high sensitivity (92.0% ± 3.15%; mean ± standard deviation) and specificity (97.7% ± 2.99%) for each cancer biomarker signature. The overall performance of this RNA-Seq biomarker-generating algorithm yielded an accuracy of 90.5%. In conclusion, we demonstrate a computational model for producing highly sensitive and specific cancer biomarker signatures from RNA-Seq data, generating signatures for the top eight cancer types responsible for CUP to accurately identify tumor origin.
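The abstract names the computational steps without showing them. The following minimal Python sketch illustrates generic, toy-scale versions of several of them: log transformation, quantile normalization, one-vs-rest ranking of transcripts by ROC AUC, dealing samples into 10 folds, and the sensitivity, specificity, and accuracy measures used to report performance. It is not the authors' code. The log2(x + 1) transform, the tie handling, the random fold assignment, and every function name here are assumptions made for illustration, and the stepwise logistic regression step is omitted.

```python
import math
import random


def log_transform(matrix, pseudocount=1.0):
    """Return log2(x + pseudocount) for every entry (rows = transcripts, columns = samples)."""
    # Assumption: log2 with a pseudocount of 1; the abstract only says "log transformation".
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Force every sample (column) onto the same empirical distribution.

    Each column is sorted, the mean value at each rank position is taken across
    columns, and each entry is replaced by the mean for its rank. Ties are broken
    by row order, a simplification of the usual average-rank treatment.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[c] for row in matrix] for c in range(n_cols)]
    # order[c] lists the row indices of column c from smallest to largest value
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    rank_means = [
        sum(columns[c][order[c][k]] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(order[c]):
            normalized[r][c] = rank_means[k]
    return normalized


def auc(scores, is_target):
    """ROC AUC as P(target score > non-target score), with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, is_target) if y]
    neg = [s for s, y in zip(scores, is_target) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_transcripts(matrix, is_target, names):
    """Sort transcripts by one-vs-rest AUC, highest first (higher in the target class)."""
    scored = [(auc(row, is_target), name) for row, name in zip(matrix, names)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sensitivity_specificity(true_labels, predicted_labels, target):
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for `target`.

    Assumes at least one target and one non-target sample.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == target:
            if p == target:
                tp += 1
            else:
                fn += 1
        elif p == target:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(true_labels, predicted_labels):
    """Fraction of samples assigned to the correct tissue of origin."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)
```

For example, `rank_transcripts([[1, 2, 8, 9], [9, 8, 1, 2]], [False, False, True, True], ["up", "down"])` places "up" (AUC 1.0) ahead of "down" (AUC 0.0). Ranking only by AUC favors transcripts that are higher in the target tissue; whether the original method also considered down-regulated markers is not stated in the abstract.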

Original language: English (US)
Pages (from-to): 918-927
Number of pages: 10
Journal: Neoplasia
Volume: 16
Issue number: 11
DOI: 10.1016/j.neo.2014.09.007
State: Published - Jan 1 2014
Externally published: Yes

Fingerprint

Tumor Biomarkers
RNA
Neoplasms
Biomarkers
Genome
RNA Sequence Analysis
Atlases
Transcriptome
ROC Curve
Prostate
Pancreas
Ovary
Stomach
Logistic Models
Kidney
Lung
Liver

ASJC Scopus subject areas

  • Cancer Research
  • Medicine(all)

Cite this

RNA-Seq Accurately Identifies Cancer Biomarker Signatures to Distinguish Tissue of Origin. / Wei, Iris H.; Shi, Yang; Jiang, Hui; Kumar-Sinha, Chandan; Chinnaiyan, Arul M.

In: Neoplasia, Vol. 16, No. 11, 01.01.2014, p. 918-927.

Wei, Iris H.; Shi, Yang; Jiang, Hui; Kumar-Sinha, Chandan; Chinnaiyan, Arul M. / RNA-Seq Accurately Identifies Cancer Biomarker Signatures to Distinguish Tissue of Origin. In: Neoplasia. 2014; Vol. 16, No. 11. pp. 918-927.

@article{f3c1462b7bde4c38ba2b3e663b719385,
title = "RNA-Seq Accurately Identifies Cancer Biomarker Signatures to Distinguish Tissue of Origin",
abstract = "Metastatic cancer of unknown primary (CUP) accounts for up to 5{\%} of all new cancer cases, with a 5-year survival rate of only 10{\%}. Accurate identification of tissue of origin would allow for directed, personalized therapies to improve clinical outcomes. Our objective was to use transcriptome sequencing (RNA-Seq) to identify lineage-specific biomarker signatures for the cancer types that most commonly metastasize as CUP (colorectum, kidney, liver, lung, ovary, pancreas, prostate, and stomach). RNA-Seq data of 17,471 transcripts from a total of 3,244 cancer samples across 26 different tissue types were compiled from in-house sequencing data and publicly available International Cancer Genome Consortium and The Cancer Genome Atlas datasets. Robust cancer biomarker signatures were extracted using a 10-fold cross-validation method of log transformation, quantile normalization, transcript ranking by area under the receiver operating characteristic curve, and stepwise logistic regression. The entire algorithm was then repeated with a new set of randomly generated training and test sets, yielding highly concordant biomarker signatures. External validation of the cancer-specific signatures yielded high sensitivity (92.0{\%} ± 3.15{\%}; mean ± standard deviation) and specificity (97.7{\%} ± 2.99{\%}) for each cancer biomarker signature. The overall performance of this RNA-Seq biomarker-generating algorithm yielded an accuracy of 90.5{\%}. In conclusion, we demonstrate a computational model for producing highly sensitive and specific cancer biomarker signatures from RNA-Seq data, generating signatures for the top eight cancer types responsible for CUP to accurately identify tumor origin.",
author = "Wei, {Iris H.} and Yang Shi and Hui Jiang and Chandan Kumar-Sinha and Chinnaiyan, {Arul M.}",
year = "2014",
month = "1",
day = "1",
doi = "10.1016/j.neo.2014.09.007",
language = "English (US)",
volume = "16",
pages = "918--927",
journal = "Neoplasia (United States)",
issn = "1522-8002",
publisher = "Elsevier Inc.",
number = "11",

}

TY  - JOUR
T1  - RNA-Seq Accurately Identifies Cancer Biomarker Signatures to Distinguish Tissue of Origin
AU  - Wei, Iris H.
AU  - Shi, Yang
AU  - Jiang, Hui
AU  - Kumar-Sinha, Chandan
AU  - Chinnaiyan, Arul M.
PY  - 2014/1/1
Y1  - 2014/1/1
AB  - Metastatic cancer of unknown primary (CUP) accounts for up to 5% of all new cancer cases, with a 5-year survival rate of only 10%. Accurate identification of tissue of origin would allow for directed, personalized therapies to improve clinical outcomes. Our objective was to use transcriptome sequencing (RNA-Seq) to identify lineage-specific biomarker signatures for the cancer types that most commonly metastasize as CUP (colorectum, kidney, liver, lung, ovary, pancreas, prostate, and stomach). RNA-Seq data of 17,471 transcripts from a total of 3,244 cancer samples across 26 different tissue types were compiled from in-house sequencing data and publicly available International Cancer Genome Consortium and The Cancer Genome Atlas datasets. Robust cancer biomarker signatures were extracted using a 10-fold cross-validation method of log transformation, quantile normalization, transcript ranking by area under the receiver operating characteristic curve, and stepwise logistic regression. The entire algorithm was then repeated with a new set of randomly generated training and test sets, yielding highly concordant biomarker signatures. External validation of the cancer-specific signatures yielded high sensitivity (92.0% ± 3.15%; mean ± standard deviation) and specificity (97.7% ± 2.99%) for each cancer biomarker signature. The overall performance of this RNA-Seq biomarker-generating algorithm yielded an accuracy of 90.5%. In conclusion, we demonstrate a computational model for producing highly sensitive and specific cancer biomarker signatures from RNA-Seq data, generating signatures for the top eight cancer types responsible for CUP to accurately identify tumor origin.
UR  - http://www.scopus.com/inward/record.url?scp=84937541141&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84937541141&partnerID=8YFLogxK
U2  - 10.1016/j.neo.2014.09.007
DO  - 10.1016/j.neo.2014.09.007
M3  - Article
C2  - 25425966
AN  - SCOPUS:84937541141
VL  - 16
SP  - 918
EP  - 927
JO  - Neoplasia (United States)
JF  - Neoplasia (United States)
SN  - 1522-8002
IS  - 11
ER  - 