TY - GEN

T1 - Episode matching

AU - Das, Gautam

AU - Fleischer, Rudolf

AU - Gąsieniec, Leszek

AU - Gunopulos, Dimitris

AU - Kärkkäinen, Juha

PY - 1997

Y1 - 1997

N2 - Given two words, text T of length n and episode P of length m, the episode matching problem is to find all minimal length substrings of text T that contain episode P as a subsequence. The respective optimization problem is to find the smallest number w, s.t. text T has a subword of length w which contains episode P. In this paper, we introduce a few efficient off-line as well as on-line algorithms for the entire problem, where by on-line algorithms we mean algorithms which search from left to right consecutive text symbols only once. We present two alphabet independent algorithms which work in time O(nm). The off-line algorithm operates in O(1) additional space while the on-line algorithm pays for its property with O(m) additional space. Two other on-line algorithms have subquadratic time complexity. One of them works in time O(nm/log m) and O(m) additional space. The other one gives a time/space trade-off, i.e., it works in time O(n + s +nm log log s/ log(s/m)) when additional space is limited to O(s). Finally, we present two approximation algorithms for the optimization problem. The off-line algorithm is alphabet independent, it has superlinear time complexity O(n/ε + n log log(n/m)) and it uses only constant space. The on-line algorithm works in time O(n/ε + n) and uses space O(m). Both approximation algorithms achieve 1 + ε approximation ratio, for any e > 0.

AB - Given two words, text T of length n and episode P of length m, the episode matching problem is to find all minimal length substrings of text T that contain episode P as a subsequence. The respective optimization problem is to find the smallest number w, s.t. text T has a subword of length w which contains episode P. In this paper, we introduce a few efficient off-line as well as on-line algorithms for the entire problem, where by on-line algorithms we mean algorithms which search from left to right consecutive text symbols only once. We present two alphabet independent algorithms which work in time O(nm). The off-line algorithm operates in O(1) additional space while the on-line algorithm pays for its property with O(m) additional space. Two other on-line algorithms have subquadratic time complexity. One of them works in time O(nm/log m) and O(m) additional space. The other one gives a time/space trade-off, i.e., it works in time O(n + s +nm log log s/ log(s/m)) when additional space is limited to O(s). Finally, we present two approximation algorithms for the optimization problem. The off-line algorithm is alphabet independent, it has superlinear time complexity O(n/ε + n log log(n/m)) and it uses only constant space. The on-line algorithm works in time O(n/ε + n) and uses space O(m). Both approximation algorithms achieve 1 + ε approximation ratio, for any e > 0.

UR - http://www.scopus.com/inward/record.url?scp=84879531103&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879531103&partnerID=8YFLogxK

U2 - 10.1007/3-540-63220-4_46

DO - 10.1007/3-540-63220-4_46

M3 - Conference contribution

AN - SCOPUS:84879531103

SN - 9783540632207

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 12

EP - 27

BT - Combinatorial Pattern Matching - 8th Annual Symposium, CPM 1997, Proceedings

A2 - Apostolico, Alberto

A2 - Apostolico, Alberto

A2 - Hein, Jotun

PB - Springer Verlag

T2 - 8th Annual Symposium on Combinatorial Pattern Matching, CPM 1997

Y2 - 30 June 1997 through 2 July 1997

ER -