Real-time traversal in grammar-based compressed files

Leszek Gąsieniec; Roman Kolpakov; Igor Potapov; Paul Sant

Research output: Contribution to journal › Conference article › peer-review

Abstract

Historically, one of the main aims of text compression was to reduce the size of data stored for future analysis and processing, see, e.g., [2, 3]. Unfortunately, further processing of the compressed data is usually preceded by complete decompression. This standard approach may cause problems in certain applications, where the original (uncompressed) file is very large and we risk running out of the space available for the computation. Thus, it is important to be able to process compressed data without requiring (complete) decompression. Moreover, in some applications (e.g., pattern matching problems), the information represented by the compressed file can be processed in relatively small chunks (of the length of a pattern). In this context it is crucial to study compression methods that allow time- and space-efficient access to any fragment of a compressed file without being forced to perform complete decompression. We study here real-time recovery of consecutive symbols from compressed files, in the context of grammar-based compression, see, e.g., [1]. In this setting, a compressed text is represented as a small (a few kilobytes) dictionary D (containing a set of code words), and a very long (a few megabytes) string based on symbols drawn from the dictionary D. The space efficiency of this kind of compression is comparable with standard compression methods based on the Lempel-Ziv approach [3]. We show that one can visit consecutive symbols of the original text, moving from one symbol to another in constant time and extra O(|D|) space. This algorithm is an improvement of the on-line linear (amortised) time algorithm presented in [1].
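To make the setting concrete, the following is a minimal Python sketch of visiting consecutive symbols of a grammar-compressed text without decompressing it in full. It is not the constant-time algorithm announced in the abstract: it is a plain stack-based expansion whose cost per symbol can grow with the nesting depth of the dictionary, so at best it has an amortised bound. The function name `traverse`, the toy dictionary `D`, and the representation of code words as tuples of symbols are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def traverse(dictionary: Dict[str, Sequence[str]],
             compressed: Sequence[str]) -> Iterator[str]:
    """Yield the symbols of the original text one by one.

    `dictionary` maps each code word to its right-hand side, a sequence of
    code words and/or terminal symbols. Any symbol that is not a key of
    `dictionary` is treated as a terminal. The dictionary is assumed to be
    acyclic (no code word expands to itself).
    """
    for code in compressed:
        # Each frame is (right-hand side, index of the next symbol in it).
        stack: List[Tuple[Sequence[str], int]] = [((code,), 0)]
        while stack:
            rhs, pos = stack.pop()
            if pos == len(rhs):
                continue  # this right-hand side is fully expanded
            stack.append((rhs, pos + 1))  # resume here after the child
            symbol = rhs[pos]
            if symbol in dictionary:
                stack.append((dictionary[symbol], 0))  # descend into rule
            else:
                yield symbol  # terminal symbol of the original text


# Hypothetical toy input: A -> ab, B -> AA, C -> BcA
D = {"A": ("a", "b"), "B": ("A", "A"), "C": ("B", "c", "A")}
text = ["C", "B", "a"]
print("".join(traverse(D, text)))  # prints "ababcabababa"
```

Because the generator only keeps one stack of rule positions, its extra space is bounded by the nesting depth of the dictionary, which for an acyclic dictionary is at most the number of code words.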

Original language	English (US)
Pages (from-to)	458
Number of pages	1
Journal	Data Compression Conference Proceedings
State	Published - 2005
Externally published	Yes
Event	DCC 2005: Data Compression Conference, Snowbird, UT, United States, Mar 29 2005 to Mar 31 2005

ASJC Scopus subject areas

Computer Networks and Communications

Cite this

@article{1965b4a56f68455bb0e78a24427bb3eb,

title = "Real-time traversal in grammar-based compressed files",

abstract = "Historically, one of the main aims of text compression was to reduce the size of data stored for future analysis and processing, see, e.g., [2, 3]. Unfortunately, further processing of the compressed data is usually preceded by complete decompression. This standard approach may cause problems in certain applications, where the original (uncompressed) file is very large and we risk running out of the space available for the computation. Thus, it is important to be able to process compressed data without requiring (complete) decompression. Moreover, in some applications (e.g., pattern matching problems), the information represented by the compressed file can be processed in relatively small chunks (of the length of a pattern). In this context it is crucial to study compression methods that allow time- and space-efficient access to any fragment of a compressed file without being forced to perform complete decompression. We study here real-time recovery of consecutive symbols from compressed files, in the context of grammar-based compression, see, e.g., [1]. In this setting, a compressed text is represented as a small (a few kilobytes) dictionary D (containing a set of code words), and a very long (a few megabytes) string based on symbols drawn from the dictionary D. The space efficiency of this kind of compression is comparable with standard compression methods based on the Lempel-Ziv approach [3]. We show that one can visit consecutive symbols of the original text, moving from one symbol to another in constant time and extra O(|D|) space. This algorithm is an improvement of the on-line linear (amortised) time algorithm presented in [1].",

author = "Leszek G{\k a}sieniec and Roman Kolpakov and Igor Potapov and Paul Sant",

year = "2005",

language = "English (US)",

pages = "458",

journal = "Data Compression Conference Proceedings",

issn = "1068-0314",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

note = "DCC 2005: Data Compression Conference ; Conference date: 29-03-2005 Through 31-03-2005",

}

TY - JOUR

T1 - Real-time traversal in grammar-based compressed files

AU - Gąsieniec, Leszek

AU - Kolpakov, Roman

AU - Potapov, Igor

AU - Sant, Paul

PY - 2005

Y1 - 2005

AB - Historically, one of the main aims of text compression was to reduce the size of data stored for future analysis and processing, see, e.g., [2, 3]. Unfortunately, further processing of the compressed data is usually preceded by complete decompression. This standard approach may cause problems in certain applications, where the original (uncompressed) file is very large and we risk running out of the space available for the computation. Thus, it is important to be able to process compressed data without requiring (complete) decompression. Moreover, in some applications (e.g., pattern matching problems), the information represented by the compressed file can be processed in relatively small chunks (of the length of a pattern). In this context it is crucial to study compression methods that allow time- and space-efficient access to any fragment of a compressed file without being forced to perform complete decompression. We study here real-time recovery of consecutive symbols from compressed files, in the context of grammar-based compression, see, e.g., [1]. In this setting, a compressed text is represented as a small (a few kilobytes) dictionary D (containing a set of code words), and a very long (a few megabytes) string based on symbols drawn from the dictionary D. The space efficiency of this kind of compression is comparable with standard compression methods based on the Lempel-Ziv approach [3]. We show that one can visit consecutive symbols of the original text, moving from one symbol to another in constant time and extra O(|D|) space. This algorithm is an improvement of the on-line linear (amortised) time algorithm presented in [1].

UR - http://www.scopus.com/inward/record.url?scp=26944445815&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=26944445815&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:26944445815

SN - 1068-0314

SP - 458

JO - Data Compression Conference Proceedings

JF - Data Compression Conference Proceedings

T2 - DCC 2005: Data Compression Conference

Y2 - 29 March 2005 through 31 March 2005

ER -
