Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

Yiheng Wang, Tong Liu, Dong Xu, Huidong Shi, Chaoyang Zhang, Yin Yuan Mo, Zheng Wang

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have determined the methylation state of only a small portion of the human genome. We developed deep-learning-based software (using stacked denoising autoencoders, or SdAs) named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We tested various SdA architectures with different configurations of hidden layer(s) and amounts of pre-training data, and compared the performance of deep networks with that of support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
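
As a rough illustration of the stacked denoising autoencoder idea named in the abstract, the sketch below pre-trains layers greedily on corrupted inputs using only the Python standard library. It is not DeepMethyl's implementation: the class name DenoisingAutoencoder, the function pretrain_stack, the masking-noise corruption, the tied weights, and every hyperparameter are assumptions chosen for brevity. The supervised fine-tuning and classification stage is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingAutoencoder:
    """One autoencoder layer with tied weights (hypothetical, illustrative only)."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_hid = [0.0] * n_hidden
        self.b_vis = [0.0] * n_visible
        self.corruption = corruption

    def encode(self, x):
        return [sigmoid(sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b_hid[j])
                for j in range(len(self.b_hid))]

    def decode(self, h):
        return [sigmoid(sum(h[j] * self.W[i][j] for j in range(len(h))) + self.b_vis[i])
                for i in range(len(self.b_vis))]

    def loss(self, data):
        """Mean cross-entropy between each clean input and its reconstruction."""
        total = 0.0
        for x in data:
            for xi, zi in zip(x, self.decode(self.encode(x))):
                zi = min(max(zi, 1e-9), 1.0 - 1e-9)
                total -= xi * math.log(zi) + (1.0 - xi) * math.log(1.0 - zi)
        return total / len(data)

    def train_step(self, x, lr=0.1):
        """One SGD step: corrupt the input, then reconstruct the clean version."""
        # Masking noise: each input is zeroed with probability `corruption`.
        x_tilde = [0.0 if self.rng.random() < self.corruption else xi for xi in x]
        h = self.encode(x_tilde)
        z = self.decode(h)
        # Cross-entropy gradients with respect to the pre-activations.
        d_vis = [zi - xi for zi, xi in zip(z, x)]
        d_hid = [sum(d_vis[i] * self.W[i][j] for i in range(len(x))) * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                # Tied weights receive an encoder term and a decoder term.
                self.W[i][j] -= lr * (x_tilde[i] * d_hid[j] + d_vis[i] * h[j])
        for j in range(len(h)):
            self.b_hid[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.b_vis[i] -= lr * d_vis[i]


def pretrain_stack(data, hidden_sizes, epochs=20, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each layer learns from the codes of the previous one."""
    layers, inputs = [], data
    for k, n_hidden in enumerate(hidden_sizes):
        dae = DenoisingAutoencoder(len(inputs[0]), n_hidden, seed=seed + k)
        for _ in range(epochs):
            for x in inputs:
                dae.train_step(x, lr)
        layers.append(dae)
        inputs = [dae.encode(x) for x in inputs]  # uncorrupted codes feed the next layer
    return layers
```

In a full pipeline, the top-layer codes would feed a supervised output layer that predicts whether each CpG site is methylated, and the whole stack would then be fine-tuned on labeled sites. The feature vectors here are assumed to be scaled to the range [0, 1] so that the cross-entropy reconstruction loss applies.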

Original language: English (US)
Article number: 19598
Journal: Scientific Reports
Volume: 6
DOI: 10.1038/srep19598
State: Published - Jan 22 2016

Fingerprint

DNA Methylation
Methylation
Genome
Learning
Human Genome
Myeloid Leukemia
Epigenomics
Leukemia
Software
Cell Line

ASJC Scopus subject areas

  • General

Cite this

Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks. / Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin Yuan; Wang, Zheng.

In: Scientific Reports, Vol. 6, 19598, 22.01.2016.

@article{e4b00db1cf564998b326e64b3c4fc615,
title = "Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks",
author = "Yiheng Wang and Tong Liu and Dong Xu and Huidong Shi and Chaoyang Zhang and Mo, {Yin Yuan} and Zheng Wang",
year = "2016",
month = "1",
day = "22",
doi = "10.1038/srep19598",
language = "English (US)",
volume = "6",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

AU - Wang, Yiheng

AU - Liu, Tong

AU - Xu, Dong

AU - Shi, Huidong

AU - Zhang, Chaoyang

AU - Mo, Yin Yuan

AU - Wang, Zheng

PY - 2016/1/22

Y1 - 2016/1/22

UR - http://www.scopus.com/inward/record.url?scp=84955475516&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84955475516&partnerID=8YFLogxK

U2 - 10.1038/srep19598

DO - 10.1038/srep19598

M3 - Article

C2 - 26797014

AN - SCOPUS:84955475516

VL - 6

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 19598

ER -