An integrated network representation of multiple cancer-specific data for graph-based machine learning

Limeng Pu; Manali Singha; Hsiao Chun Wu; Costas Busch; J. Ramanujam; Michal Brylinski

doi:10.1038/s41540-022-00226-9


Computer Science

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, accurately predicting the effect of pharmacotherapy on a specific cell line from genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. To construct compact yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by topological information but also by biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of drug efficacy is 0.68, notably higher than the accuracies measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning in predicting the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/.
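The abstract names a tissue-level cross-validation protocol but does not spell out how the folds are built. The Python sketch below shows one common reading of such a protocol, leave-one-tissue-out splitting. It assumes each cell line carries exactly one tissue label and that every fold holds out all cell lines of a single tissue. Both assumptions may differ from the authors' actual fold construction. The function name `tissue_level_folds` and the cell-line IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def tissue_level_folds(
    tissue_of: Dict[str, str],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_tissue, train_ids, test_ids), one fold per tissue.

    Assumption (not taken from the paper): every cell line has exactly one
    tissue label, and a fold holds out all cell lines of one tissue, so the
    model is always tested on a tissue it never saw during training.
    """
    # Group cell-line IDs by their tissue label.
    by_tissue: Dict[str, List[str]] = defaultdict(list)
    for cell_line, tissue in tissue_of.items():
        by_tissue[tissue].append(cell_line)

    # Iterate over tissues in sorted order so the folds are deterministic.
    for held_out in sorted(by_tissue):
        test_ids = sorted(by_tissue[held_out])
        train_ids = sorted(
            cl for t, cls in by_tissue.items() if t != held_out for cl in cls
        )
        yield held_out, train_ids, test_ids


if __name__ == "__main__":
    # Hypothetical cell-line IDs and tissue labels, for illustration only.
    example = {"CL1": "lung", "CL2": "lung", "CL3": "breast", "CL4": "skin"}
    for tissue, train, test in tissue_level_folds(example):
        print(tissue, train, test)
```

Run as a script, this prints one fold per tissue. For example, the lung fold trains on `['CL3', 'CL4']` and tests on `['CL1', 'CL2']`. A predictor would be fitted on each fold's training IDs and scored on its test IDs. Because the folds are grouped by tissue, no tissue appears on both sides of a split.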

Original language	English (US)
Article number	14
Journal	npj Systems Biology and Applications
Volume	8
Issue number	1
DOIs	https://doi.org/10.1038/s41540-022-00226-9
State	Published - Dec 2022

ASJC Scopus subject areas

Modeling and Simulation
General Biochemistry, Genetics and Molecular Biology
Drug Discovery
Computer Science Applications
Applied Mathematics


Cite this

@article{35808b21d8094b31a878436a6c40f9a1,

title = "An integrated network representation of multiple cancer-specific data for graph-based machine learning",

abstract = "Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, accurately predicting the effect of pharmacotherapy on a specific cell line from genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. To construct compact yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by topological information but also by biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of drug efficacy is 0.68, notably higher than the accuracies measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning in predicting the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/.",

author = "Limeng Pu and Manali Singha and Wu, {Hsiao Chun} and Costas Busch and J. Ramanujam and Michal Brylinski",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2022",

month = dec,

doi = "10.1038/s41540-022-00226-9",

language = "English (US)",

volume = "8",

journal = "npj Systems Biology and Applications",

issn = "2056-7189",

publisher = "Nature Publishing Group",

number = "1",

}

TY  - JOUR
T1  - An integrated network representation of multiple cancer-specific data for graph-based machine learning
AU  - Pu, Limeng
AU  - Singha, Manali
AU  - Wu, Hsiao Chun
AU  - Busch, Costas
AU  - Ramanujam, J.
AU  - Brylinski, Michal
PY  - 2022/12
Y1  - 2022/12
N2  - Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, accurately predicting the effect of pharmacotherapy on a specific cell line from genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. To construct compact yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by topological information but also by biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of drug efficacy is 0.68, notably higher than the accuracies measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning in predicting the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/.
AB  - Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, accurately predicting the effect of pharmacotherapy on a specific cell line from genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. To construct compact yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by topological information but also by biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of drug efficacy is 0.68, notably higher than the accuracies measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning in predicting the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/.
UR  - http://www.scopus.com/inward/record.url?scp=85129109919&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85129109919&partnerID=8YFLogxK
U2  - 10.1038/s41540-022-00226-9
DO  - 10.1038/s41540-022-00226-9
M3  - Article
C2  - 35487924
AN  - SCOPUS:85129109919
SN  - 2056-7189
VL  - 8
JO  - npj Systems Biology and Applications
JF  - npj Systems Biology and Applications
IS  - 1
M1  - 14
ER  - 
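The RIS export is a line-oriented format. Each line holds a two-character tag, a hyphen and a value, and `ER` closes the record. The Python sketch below is for readers who want to load such a record programmatically. `parse_ris_record` is a hypothetical helper, not part of any library or of this portal. It assumes one record per input string. It accepts one or two spaces before the hyphen, because copied exports often collapse the two spaces the format expects.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A tag line: two-character tag, whitespace, a hyphen, then an optional value.
# Assumption: one or two spaces before the hyphen are both accepted, since
# copied exports often collapse the standard two spaces into one.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-(?:\s(.*))?$")


def parse_ris_record(text: str) -> Dict[str, List[str]]:
    """Parse one RIS record into {tag: [values]}, keeping repeated tags such as AU."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.group(1), (match.group(2) or "").strip()
        if tag == "ER":
            break  # end of record: ignore anything after it
        fields[tag].append(value)
    return dict(fields)


if __name__ == "__main__":
    sample = "TY  - JOUR\nAU  - Pu, Limeng\nAU  - Singha, Manali\nVL  - 8\nER  - \n"
    print(parse_ris_record(sample))
```

Values are collected into lists because tags such as `AU` and `UR` repeat within one record. In this record, `N2` and `AB` both carry the abstract, so a consumer can read whichever tag it expects.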
