De novo identification of LTR retrotransposons in eukaryotic genomes

Mina Rho, Jeong-Hyeon Choi, Sun Kim, Michael Lynch, Haixu Tang

Research output: Contribution to journalArticle

41 Citations (Scopus)

Abstract

Background: LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through the conventional homology searching approach. Hence, it is limited to annotating known elements. Results: In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can de novo identify all types of LTR retrotransposons. We tested this method on the two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Comparing with previous work, we identified new intact LTR retroelements and new putative families, which may imply that there may still be new retroelements that are left to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. In summary, both methods identified comparable number of intact LTR retroelements. But our method can identify nearly all known elements in C. elegans, while LTR_STRUCT missed about 1/3 of them. Our method also identified more known LTR retroelements than LTR_STRUCT in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed. Conclusion: We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes with favorable performance over the existing methods.

Original languageEnglish (US)
Article number90
JournalBMC Genomics
Volume8
DOIs
StatePublished - Apr 3 2007
Externally publishedYes

Fingerprint

Retroelements
Terminal Repeat Sequences
Genome
Interspersed Repetitive Sequences

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

De novo identification of LTR retrotransposons in eukaryotic genomes. / Rho, Mina; Choi, Jeong-Hyeon; Kim, Sun; Lynch, Michael; Tang, Haixu.

In: BMC Genomics, Vol. 8, 90, 03.04.2007.

Research output: Contribution to journalArticle

Rho, Mina ; Choi, Jeong-Hyeon ; Kim, Sun ; Lynch, Michael ; Tang, Haixu. / De novo identification of LTR retrotransposons in eukaryotic genomes. In: BMC Genomics. 2007 ; Vol. 8.
@article{d29deb7c820345259ace928e2b702b3f,
title = "De novo identification of LTR retrotransposons in eukaryotic genomes",
abstract = "Background: LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through the conventional homology searching approach. Hence, it is limited to annotating known elements. Results: In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can de novo identify all types of LTR retrotransposons. We tested this method on the two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Comparing with previous work, we identified new intact LTR retroelements and new putative families, which may imply that there may still be new retroelements that are left to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. In summary, both methods identified comparable number of intact LTR retroelements. But our method can identify nearly all known elements in C. elegans, while LTR_STRUCT missed about 1/3 of them. Our method also identified more known LTR retroelements than LTR_STRUCT in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed. Conclusion: We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes with favorable performance over the existing methods.",
author = "Mina Rho and Jeong-Hyeon Choi and Sun Kim and Michael Lynch and Haixu Tang",
year = "2007",
month = "4",
day = "3",
doi = "10.1186/1471-2164-8-90",
language = "English (US)",
volume = "8",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - De novo identification of LTR retrotransposons in eukaryotic genomes

AU - Rho, Mina

AU - Choi, Jeong-Hyeon

AU - Kim, Sun

AU - Lynch, Michael

AU - Tang, Haixu

PY - 2007/4/3

Y1 - 2007/4/3

N2 - Background: LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through the conventional homology searching approach. Hence, it is limited to annotating known elements. Results: In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can de novo identify all types of LTR retrotransposons. We tested this method on the two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Comparing with previous work, we identified new intact LTR retroelements and new putative families, which may imply that there may still be new retroelements that are left to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. In summary, both methods identified comparable number of intact LTR retroelements. But our method can identify nearly all known elements in C. elegans, while LTR_STRUCT missed about 1/3 of them. Our method also identified more known LTR retroelements than LTR_STRUCT in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed. Conclusion: We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes with favorable performance over the existing methods.

AB - Background: LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through the conventional homology searching approach. Hence, it is limited to annotating known elements. Results: In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can de novo identify all types of LTR retrotransposons. We tested this method on the two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Comparing with previous work, we identified new intact LTR retroelements and new putative families, which may imply that there may still be new retroelements that are left to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. In summary, both methods identified comparable number of intact LTR retroelements. But our method can identify nearly all known elements in C. elegans, while LTR_STRUCT missed about 1/3 of them. Our method also identified more known LTR retroelements than LTR_STRUCT in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed. Conclusion: We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes with favorable performance over the existing methods.

UR - http://www.scopus.com/inward/record.url?scp=34247877564&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247877564&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-8-90

DO - 10.1186/1471-2164-8-90

M3 - Article

C2 - 17407597

AN - SCOPUS:34247877564

VL - 8

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 90

ER -