TY - GEN

T1 - Efficiency of fast parallel pattern searching in highly compressed texts

AU - Gąsieniec, Leszek

AU - Gibbons, Alan

AU - Rytter, Wojciech

PY - 1999/1/1

Y1 - 1999/1/1

N2 - We consider the efficiency of NC-algorithms for pattern searching in highly compressed one- and two-dimensional texts. “Highly compressed” means that the text can be exponentially large with respect to its compressed version, and “fast” means “in polylogarithmic time”. Given an uncompressed pattern P and a compressed version of a text T, the compressed matching problem is to test whether P occurs in T. Two closely related types of compressed representation of one-dimensional texts are considered: Lempel-Ziv encodings (LZ, for short) and restricted LZ encodings (RLZ, for short). For highly compressed texts there is little difference between them: in extreme situations both compress the text exponentially, e.g. Fibonacci words of size N have compressed versions of size O(log N) under both LZ and RLZ encodings. Despite these similarities, we prove that LZ-compressed matching is P-complete, while RLZ-compressed matching is rather trivially in NC. We show how to improve a naive, straightforward NC algorithm and obtain almost optimal parallel RLZ-compressed matching by applying tree-contraction techniques to directed acyclic graphs with polynomial tree-size. As a corollary, we obtain an almost optimal parallel algorithm for LZW-compressed matching that is simpler than the (more general) algorithm in [11]. Highly compressed two-dimensional texts are also considered.

UR - http://www.scopus.com/inward/record.url?scp=84949211315&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949211315&partnerID=8YFLogxK

U2 - 10.1007/3-540-48340-3_5

DO - 10.1007/3-540-48340-3_5

M3 - Conference contribution

AN - SCOPUS:84949211315

SN - 3540664084

SN - 9783540664086

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 48

EP - 58

BT - Mathematical Foundations of Computer Science 1999 - 24th International Symposium, MFCS 1999, Proceedings

A2 - Kutyłowski, Mirosław

A2 - Pacholski, Leszek

A2 - Wierzbicki, Tomasz

PB - Springer-Verlag

T2 - 24th International Symposium on Mathematical Foundations of Computer Science, MFCS 1999

Y2 - 6 September 1999 through 10 September 1999

ER -