Efficiency of fast parallel pattern searching in highly compressed texts

Leszek Gąsieniec; Alan Gibbons; Wojciech Rytter

doi:10.1007/3-540-48340-3_5

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

We consider the efficiency of NC-algorithms for pattern-searching in highly compressed one- and two-dimensional texts. “Highly compressed” means that the text can be exponentially large with respect to its compressed version, and “fast” means “in polylogarithmic time”. Given an uncompressed pattern P and a compressed version of a text T, the compressed matching problem is to test whether P occurs in T. Two closely related types of compressed representation of 1-dimensional texts are considered: the Lempel-Ziv encodings (LZ, in short) and restricted LZ encodings (RLZ, in short). For highly compressed texts there is only a small difference between them: in extreme situations both compress the text exponentially, e.g. Fibonacci words of size N have compressed versions of size O(log N) for both LZ and RLZ encodings. Despite these similarities, we prove that LZ-compressed matching is P-complete while RLZ-compressed matching is rather trivially in NC. We show how to improve a naive, straightforward NC algorithm and obtain almost optimal parallel RLZ-compressed matching by applying tree-contraction techniques to directed acyclic graphs with polynomial tree-size. As a corollary we obtain an almost optimal parallel algorithm for LZW-compressed matching which is simpler than the (more general) algorithm in [11]. Highly compressed 2-dimensional texts are also considered.
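The claim that Fibonacci words compress exponentially can be checked on small instances. The sketch below is only an illustration and is not taken from the paper: it assumes a greedy LZ77-style factorization in which each factor is either a fresh letter or the longest prefix of the remaining text that also starts at an earlier position (overlap allowed), which may differ from the exact LZ and RLZ definitions the authors use. The function names `fibonacci_word`, `lz_factorize` and `lz_decode` are hypothetical.

```python
def fibonacci_word(n):
    """Return the n-th Fibonacci word: F0 = 'a', F1 = 'ab', Fn = F(n-1) + F(n-2)."""
    prev, cur = "a", "ab"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, cur + prev
    return cur


def lz_factorize(text):
    """Greedy LZ77-style factorization (assumed variant, quadratic-time, for small inputs).

    Each factor is either a single literal character or a pair (src, length)
    meaning "copy `length` characters starting at earlier position `src`".
    The copied region may overlap the factor being produced.
    """
    factors = []
    i = 0
    while i < len(text):
        best_len, best_src = 0, -1
        for src in range(i):
            length = 0
            while i + length < len(text) and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            factors.append(text[i])  # letter not seen before: emit a literal
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz_decode(factors):
    """Rebuild the text from the factors, copying one character at a time
    so that self-overlapping references are resolved correctly."""
    out = []
    for factor in factors:
        if isinstance(factor, str):
            out.append(factor)
        else:
            src, length = factor
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


if __name__ == "__main__":
    # Text length grows exponentially in n, the factor count only slowly.
    for n in range(2, 13):
        word = fibonacci_word(n)
        print(n, len(word), len(lz_factorize(word)))
```

Running the script prints, for each n, the length of the word next to the number of factors, so the gap between the uncompressed size N and the size of the encoding is visible directly. Searching inside such an encoding without decompressing it is the problem the paper addresses; this sketch only shows the compression side.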

Original language	English (US)
Title of host publication	Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings
Editors	Mirosław Kutyłowski, Leszek Pacholski, Tomasz Wierzbicki
Publisher	Springer Verlag
Pages	48-58
Number of pages	11
ISBN (Print)	3540664084, 9783540664086
DOIs	https://doi.org/10.1007/3-540-48340-3_5
State	Published - Jan 1 1999
Externally published	Yes
Event	24th International Symposium on Mathematical Foundations of Computer Science, MFCS 1999 - Szklarska Poreba, Poland. Duration: Sep 6 1999 → Sep 10 1999

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	1672
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	24th International Symposium on Mathematical Foundations of Computer Science, MFCS 1999
Country/Territory	Poland
City	Szklarska Poreba
Period	9/6/99 → 9/10/99

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to Document

10.1007/3-540-48340-3_5

Cite this

Gąsieniec, L., Gibbons, A., & Rytter, W. (1999). Efficiency of fast parallel pattern searching in highly compressed texts. In M. Kutyłowski, L. Pacholski, & T. Wierzbicki (Eds.), Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings (pp. 48-58). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1672). Springer Verlag. https://doi.org/10.1007/3-540-48340-3_5

Efficiency of fast parallel pattern searching in highly compressed texts. / Gąsieniec, Leszek; Gibbons, Alan; Rytter, Wojciech.
Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings. ed. / Mirosław Kutyłowski; Leszek Pacholski; Tomasz Wierzbicki. Springer Verlag, 1999. p. 48-58 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1672).

Gąsieniec, L, Gibbons, A & Rytter, W 1999, Efficiency of fast parallel pattern searching in highly compressed texts. in M Kutyłowski, L Pacholski & T Wierzbicki (eds), Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1672, Springer Verlag, pp. 48-58, 24th International Symposium on Mathematical Foundations of Computer Science, MFCS 1999, Szklarska Poreba, Poland, 9/6/99. https://doi.org/10.1007/3-540-48340-3_5

Gąsieniec L, Gibbons A, Rytter W. Efficiency of fast parallel pattern searching in highly compressed texts. In Kutyłowski M, Pacholski L, Wierzbicki T, editors, Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings. Springer Verlag. 1999. p. 48-58. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/3-540-48340-3_5

Gąsieniec, Leszek ; Gibbons, Alan ; Rytter, Wojciech. / Efficiency of fast parallel pattern searching in highly compressed texts. Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings. editor / Mirosław Kutyłowski ; Leszek Pacholski ; Tomasz Wierzbicki. Springer Verlag, 1999. pp. 48-58 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{48993a785e0e420e8ab04e4c5ffb7eab,

title = "Efficiency of fast parallel pattern searching in highly compressed texts",

abstract = "We consider the efficiency of NC-algorithms for pattern-searching in highly compressed one- and two-dimensional texts. “Highly compressed” means that the text can be exponentially large with respect to its compressed version, and “fast” means “in polylogarithmic time”. Given an uncompressed pattern P and a compressed version of a text T, the compressed matching problem is to test whether P occurs in T. Two closely related types of compressed representation of 1-dimensional texts are considered: the Lempel-Ziv encodings (LZ, in short) and restricted LZ encodings (RLZ, in short). For highly compressed texts there is only a small difference between them: in extreme situations both compress the text exponentially, e.g. Fibonacci words of size N have compressed versions of size O(log N) for both LZ and RLZ encodings. Despite these similarities, we prove that LZ-compressed matching is P-complete while RLZ-compressed matching is rather trivially in NC. We show how to improve a naive, straightforward NC algorithm and obtain almost optimal parallel RLZ-compressed matching by applying tree-contraction techniques to directed acyclic graphs with polynomial tree-size. As a corollary we obtain an almost optimal parallel algorithm for LZW-compressed matching which is simpler than the (more general) algorithm in [11]. Highly compressed 2-dimensional texts are also considered.",

author = "Leszek G{\c a}sieniec and Alan Gibbons and Wojciech Rytter",

year = "1999",

month = jan,

day = "1",

doi = "10.1007/3-540-48340-3_5",

language = "English (US)",

isbn = "3540664084",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "48--58",

editor = "Miros{\l}aw Kuty{\l}owski and Leszek Pacholski and Tomasz Wierzbicki",

booktitle = "Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings",

note = "24th International Symposium on Mathematical Foundations of Computer Science, MFCS 1999 ; Conference date: 06-09-1999 Through 10-09-1999",

}

TY - GEN

T1 - Efficiency of fast parallel pattern searching in highly compressed texts

AU - Gąsieniec, Leszek

AU - Gibbons, Alan

AU - Rytter, Wojciech

PY - 1999/1/1

Y1 - 1999/1/1

N2 - We consider the efficiency of NC-algorithms for pattern-searching in highly compressed one- and two-dimensional texts. “Highly compressed” means that the text can be exponentially large with respect to its compressed version, and “fast” means “in polylogarithmic time”. Given an uncompressed pattern P and a compressed version of a text T, the compressed matching problem is to test whether P occurs in T. Two closely related types of compressed representation of 1-dimensional texts are considered: the Lempel-Ziv encodings (LZ, in short) and restricted LZ encodings (RLZ, in short). For highly compressed texts there is only a small difference between them: in extreme situations both compress the text exponentially, e.g. Fibonacci words of size N have compressed versions of size O(log N) for both LZ and RLZ encodings. Despite these similarities, we prove that LZ-compressed matching is P-complete while RLZ-compressed matching is rather trivially in NC. We show how to improve a naive, straightforward NC algorithm and obtain almost optimal parallel RLZ-compressed matching by applying tree-contraction techniques to directed acyclic graphs with polynomial tree-size. As a corollary we obtain an almost optimal parallel algorithm for LZW-compressed matching which is simpler than the (more general) algorithm in [11]. Highly compressed 2-dimensional texts are also considered.

UR - http://www.scopus.com/inward/record.url?scp=84949211315&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949211315&partnerID=8YFLogxK

U2 - 10.1007/3-540-48340-3_5

DO - 10.1007/3-540-48340-3_5

M3 - Conference contribution

AN - SCOPUS:84949211315

SN - 3540664084

SN - 9783540664086

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 48

EP - 58

BT - Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings

A2 - Kutyłowski, Mirosław

A2 - Pacholski, Leszek

A2 - Wierzbicki, Tomasz

PB - Springer Verlag

T2 - 24th International Symposium on Mathematical Foundations of Computer Science, MFCS 1999

Y2 - 6 September 1999 through 10 September 1999

ER -