Time/Space Efficient Compressed Pattern Matching

Leszek Ga̧sieniec, Igor Potapov

Research output: Contribution to journal › Article

7 Scopus citations

Abstract

The exact pattern matching problem is to find all occurrences of a pattern p in a text t. We say that a pattern matching algorithm is optimal if its running time is linear in the sizes of t and p, i.e., O(|t| + |p|). Perhaps one of the most interesting settings of the pattern matching problem is when one has to design an efficient algorithm with the help of only a small amount of extra space. In this paper we explore this setting to the extreme. We work under the assumption that the text t is available only in a compressed form, represented by a straight-line program. Compression methods based on efficient construction of straight-line programs are as competitive as the compression standards, including the Lempel-Ziv compression scheme and the recently intensively studied text compression via block sorting, due to Burrows and Wheeler. Our main result is an algorithm that solves the compressed string matching problem in optimal linear time, with the help of constant extra space. We also discuss an efficient implementation of a version of our algorithm, showing that the new concept may also have some interesting real applications. Our result is in contrast with many other compressed pattern matching algorithms, where the goal is to find all pattern occurrences in time related to the size of the compressed text. However, one must remember that all previous algorithms used at least linear extra memory (in the size of the compressed text, a dictionary, or the pattern), while our algorithm can be implemented in constant-size extra space. Also, from the practical point of view, when the compression ratio is constant (very rarely smaller than 25%), there is no dramatic difference between a running time based on the size of the compressed text and one based on the size of the original text, while extra space resources might be strictly limited.
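
To make the setting in the abstract concrete, the following is a minimal Python sketch of a text stored as a straight-line program and searched without ever materialising the full text. It is not the algorithm of the paper: the rule encoding (`Rule`), the helper names (`expand`, `failure_table`, `find_all`) and the sample grammar are assumptions made for illustration, and the matcher is the standard Knuth-Morris-Pratt scan. This sketch uses extra space proportional to the pattern length plus the depth of the grammar, whereas the paper claims constant extra space.

```python
from typing import Iterator, List, Tuple, Union

# Hypothetical encoding: a rule is either a single terminal character
# or a pair (i, j) meaning "rule i followed by rule j", with i, j earlier rules.
Rule = Union[str, Tuple[int, int]]


def expand(rules: List[Rule], start: int) -> Iterator[str]:
    """Yield the characters of the text derived from rules[start], left to right."""
    stack = [start]  # explicit stack; its size is bounded by the grammar depth
    while stack:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            yield rule
        else:
            left, right = rule
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)


def failure_table(p: str) -> List[int]:
    """Standard KMP failure function: longest proper border of each prefix of p."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def find_all(rules: List[Rule], start: int, p: str) -> List[int]:
    """Return start positions of all occurrences of p in the expanded text."""
    if not p:
        return []
    fail = failure_table(p)
    hits: List[int] = []
    k = 0  # number of pattern characters currently matched
    for pos, c in enumerate(expand(rules, start)):
        while k and c != p[k]:
            k = fail[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(pos - len(p) + 1)
            k = fail[k - 1]
    return hits


# Sample grammar (an assumption for illustration): Fibonacci-like words.
# X0 = b, X1 = a, X2 = X1 X0, X3 = X2 X1, X4 = X3 X2, X5 = X4 X3
RULES: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]

if __name__ == "__main__":
    print("".join(expand(RULES, 5)))  # abaababa
    print(find_all(RULES, 5, "aba"))  # [0, 3, 5]
```

The scan runs in time linear in the length of the expanded text plus the pattern length, which matches the O(|t| + |p|) bound mentioned in the abstract. Reducing the extra space to a constant is the contribution of the paper and is not attempted here.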

Original language: English (US)
Pages (from-to): 137-154
Number of pages: 18
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 56
Issue number: 1-2
State: Published - Jul 1 2003
Externally published: Yes
Event: 13th International Symposium on Fundamentals of Computation Theory, FCT 2001 - Riga, Latvia
Duration: Aug 22 2001 to Aug 24 2001

Keywords

  • Compressed pattern matching
  • Directed acyclic graph traversal
  • Small extra space
  • Straight-line program

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Algebra and Number Theory
  • Information Systems
  • Computational Theory and Mathematics
