Development and application of workbench for analysis and visualization of whole genome sequence

Jeong-Hyeon Choi, Hee Jeong Jin, Cheol Min Kim, Chul Hun L. Chang, Hwan Gue Cho

Research output: Contribution to journal › Article

Abstract

An increasing number of genome sequencing projects has led to explosive growth in whole genome sequence data, and the number of studies on the functions of individual genes has also increased rapidly. However, in-memory algorithms cannot be applied to the analysis of whole genome sequences, since an individual genome ranges from several million to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, an indexed data structure for external memory is necessary. In this paper, we introduce the development and application of a workbench for the analysis and visualization of whole genome sequences, built on the string B-tree, which is suitable for the analysis of huge data. The system consists of two main parts: an analysis query part and a visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists easily understand whole genome structure and specificity through various viewers: a whole genome sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, a k-mer viewer, an RWP (Random Walk Plot) viewer, and a map viewer. Using our workbench, we can find relationships among organisms, support gene prediction in a genome, and study the function of junk DNA. In this paper, we apply our workbench to investigating specific sequences such as avoided sequences, common sequences, and classifiable sequences.
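The abstract names k-mer analysis, avoided sequences, CGR, and RWP without spelling out what they compute. The Python below is a minimal in-memory sketch of common textbook versions of these computations, offered only for orientation: it is not the authors' implementation and does not use the external-memory string B-tree the workbench is built on. All function names are hypothetical, and three choices are assumptions: treating "avoided" k-mers simply as k-mers that never occur (the paper may use an expected-frequency criterion), the CGR corner assignment (conventions differ between tools), and the purine/pyrimidine mapping for the random walk (the workbench's RWP viewer may use a different mapping).

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer that consists only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in ALPHABET for c in kmer):  # skip windows containing N etc.
            counts[kmer] += 1
    return counts


def absent_kmers(seq: str, k: int) -> list[str]:
    """All 4**k k-mers that never occur in seq.

    Assumption: used here as a crude proxy for 'avoided' sequences.
    """
    seen = kmer_counts(seq, k)
    candidates = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [kmer for kmer in candidates if kmer not in seen]


# Assumed corner assignment on the unit square; other tools use other layouts.
CGR_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Chaos Game Representation: start at the centre of the square and,
    for each base, move halfway toward that base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CGR_CORNERS:
            continue  # ambiguity codes such as N are skipped
        cx, cy = CGR_CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def random_walk(seq: str) -> list[int]:
    """1-D DNA walk (assumed mapping): +1 for a purine (A/G),
    -1 for a pyrimidine (C/T); other symbols are skipped."""
    pos, walk = 0, [0]
    for base in seq.upper():
        if base in "AG":
            pos += 1
        elif base in "CT":
            pos -= 1
        else:
            continue
        walk.append(pos)
    return walk
```

For example, `absent_kmers(genome, 4)` would list the tetranucleotides that never appear in `genome`, and plotting `cgr_points(genome)` gives the fractal CGR image in which under-represented k-mers show up as empty sub-squares. On a whole genome, the workbench answers such queries from its disk-resident index, whereas this sketch holds the entire sequence in memory.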

Original language: English (US)
Pages (from-to): 205-217
Number of pages: 13
Journal: Korean Journal of Genetics
Volume: 24
Issue number: 2
State: Published - Jun 1 2002
Externally published: Yes

Keywords

  • Avoided sequence
  • Chaos game representation
  • Classifiable sequence
  • Common sequence
  • Genome
  • k-mer analysis
  • Random walk plot
  • Sequence analysis
  • Workbench

ASJC Scopus subject areas

  • Genetics

Cite this

Development and application of workbench for analysis and visualization of whole genome sequence. / Choi, Jeong-Hyeon; Jin, Hee Jeong; Kim, Cheol Min; Chang, Chul Hun L.; Cho, Hwan Gue.

In: Korean Journal of Genetics, Vol. 24, No. 2, 01.06.2002, p. 205-217.

Choi, Jeong-Hyeon ; Jin, Hee Jeong ; Kim, Cheol Min ; Chang, Chul Hun L. ; Cho, Hwan Gue. / Development and application of workbench for analysis and visualization of whole genome sequence. In: Korean Journal of Genetics. 2002 ; Vol. 24, No. 2. pp. 205-217.
@article{b9e298f83d8144e49d8f4290606c5add,
title = "Development and application of workbench for analysis and visualization of whole genome sequence",
abstract = "An increasing number of genome sequencing projects has led to explosive growth in whole genome sequence data, and the number of studies on the functions of individual genes has also increased rapidly. However, in-memory algorithms cannot be applied to the analysis of whole genome sequences, since an individual genome ranges from several million to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, an indexed data structure for external memory is necessary. In this paper, we introduce the development and application of a workbench for the analysis and visualization of whole genome sequences, built on the string B-tree, which is suitable for the analysis of huge data. The system consists of two main parts: an analysis query part and a visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists easily understand whole genome structure and specificity through various viewers: a whole genome sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, a k-mer viewer, an RWP (Random Walk Plot) viewer, and a map viewer. Using our workbench, we can find relationships among organisms, support gene prediction in a genome, and study the function of junk DNA. In this paper, we apply our workbench to investigating specific sequences such as avoided sequences, common sequences, and classifiable sequences.",
keywords = "Avoided sequence, Chaos game representation, Classifiable sequence, Common sequence, Genome, k-mer analysis, Random walk plot, Sequence analysis, Workbench",
author = "Choi, Jeong-Hyeon and Jin, {Hee Jeong} and Kim, {Cheol Min} and Chang, {Chul Hun L.} and Cho, {Hwan Gue}",
year = "2002",
month = "6",
day = "1",
language = "English (US)",
volume = "24",
pages = "205--217",
journal = "Korean Journal of Genetics",
issn = "1976-9571",
publisher = "Springer Verlag",
number = "2",

}

TY - JOUR

T1 - Development and application of workbench for analysis and visualization of whole genome sequence

AU - Choi, Jeong-Hyeon

AU - Jin, Hee Jeong

AU - Kim, Cheol Min

AU - Chang, Chul Hun L.

AU - Cho, Hwan Gue

PY - 2002/6/1

Y1 - 2002/6/1

AB - An increasing number of genome sequencing projects has led to explosive growth in whole genome sequence data, and the number of studies on the functions of individual genes has also increased rapidly. However, in-memory algorithms cannot be applied to the analysis of whole genome sequences, since an individual genome ranges from several million to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, an indexed data structure for external memory is necessary. In this paper, we introduce the development and application of a workbench for the analysis and visualization of whole genome sequences, built on the string B-tree, which is suitable for the analysis of huge data. The system consists of two main parts: an analysis query part and a visualization part. The query system supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization system helps biologists easily understand whole genome structure and specificity through various viewers: a whole genome sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, a k-mer viewer, an RWP (Random Walk Plot) viewer, and a map viewer. Using our workbench, we can find relationships among organisms, support gene prediction in a genome, and study the function of junk DNA. In this paper, we apply our workbench to investigating specific sequences such as avoided sequences, common sequences, and classifiable sequences.

KW - Avoided sequence

KW - Chaos game representation

KW - Classifiable sequence

KW - Common sequence

KW - Genome

KW - k-mer analysis

KW - Random walk plot

KW - Sequence analysis

KW - Workbench

UR - http://www.scopus.com/inward/record.url?scp=0038046367&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038046367&partnerID=8YFLogxK

M3 - Article

VL - 24

SP - 205

EP - 217

JO - Korean Journal of Genetics

JF - Korean Journal of Genetics

SN - 1976-9571

IS - 2

ER -