Metastatic cancer of unknown primary (CUP) accounts for up to 5% of all new cancer cases, with a 5-year survival rate of only 10%. Accurate identification of tissue of origin would allow for directed, personalized therapies to improve clinical outcomes. Our objective was to use transcriptome sequencing (RNA-Seq) to identify lineage-specific biomarker signatures for the cancer types that most commonly metastasize as CUP (colorectum, kidney, liver, lung, ovary, pancreas, prostate, and stomach). RNA-Seq data of 17,471 transcripts from a total of 3,244 cancer samples across 26 different tissue types were compiled from in-house sequencing data and publically available International Cancer Genome Consortium and The Cancer Genome Atlas datasets. Robust cancer biomarker signatures were extracted using a 10-fold cross-validation method of log transformation, quantile normalization, transcript ranking by area under the receiver operating characteristic curve, and stepwise logistic regression. The entire algorithm was then repeated with a new set of randomly generated training and test sets, yielding highly concordant biomarker signatures. External validation of the cancer-specific signatures yielded high sensitivity (92.0% ± 3.15%; mean ± standard deviation) and specificity (97.7% ± 2.99%) for each cancer biomarker signature. The overall performance of this RNA-Seq biomarker-generating algorithm yielded an accuracy of 90.5%. In conclusion, we demonstrate a computational model for producing highly sensitive and specific cancer biomarker signatures from RNA-Seq data, generating signatures for the top eight cancer types responsible for CUP to accurately identify tumor origin.
ASJC Scopus subject areas
- Cancer Research