Faster Approximate Pattern Matching: A Unified Approach
Approximate pattern matching is a natural and well-studied problem on
strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the
starting positions of) all substrings of $T$ that are at distance at most $k$
from $P$. We consider the two most fundamental string metrics: the Hamming
distance and the edit distance. Under the Hamming distance, we search for
substrings of $T$ that have at most $k$ mismatches with $P$, while under the
edit distance, we search for substrings of $T$ that can be transformed to
$P$ with at most $k$ edits.
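To make the two problem variants concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper: the function names `k_mismatch_occurrences` and `k_edit_occurrences` are hypothetical, the Hamming variant simply compares every window, and the edit-distance variant uses a textbook Sellers-style dynamic program run on the reversed strings so that it reports starting positions. It assumes the threshold is smaller than the pattern length.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Start positions i where text[i:i+m] has at most k mismatches with pattern."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        if mismatches <= k:
            result.append(i)
    return result


def k_edit_occurrences(text, pattern, k):
    """Start positions i where some substring text[i:j] is within k edits of pattern.

    Assumes k < len(pattern), so empty substrings never qualify.
    """
    # Work on reversed strings: an occurrence ending at position e of the
    # reversed text starts at position n - e of the original text.
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)
    # col[j] = minimum edit distance between p[:j] and any substring of t
    # ending at the current text position (initially position 0).
    col = list(range(m + 1))
    starts = []
    for e in range(1, n + 1):
        prev_diag = col[0]  # value for (j - 1, e - 1)
        col[0] = 0          # an occurrence may begin at any text position
        for j in range(1, m + 1):
            old = col[j]
            col[j] = min(
                prev_diag + (p[j - 1] != t[e - 1]),  # match or substitution
                old + 1,                             # skip a text character
                col[j - 1] + 1,                      # skip a pattern character
            )
            prev_diag = old
        if col[m] <= k:
            starts.append(n - e)
    return sorted(starts)


print(k_mismatch_occurrences("abcabdabc", "abc", 1))  # [0, 3, 6]
print(k_mismatch_occurrences("abxcd", "abcd", 1))     # []
print(k_edit_occurrences("abxcd", "abcd", 1))         # [0]
```

The last two calls illustrate the difference between the metrics: "abxcd" contains no window within one mismatch of "abcd", but deleting the single character "x" yields the pattern, so position 0 is a 1-edit occurrence. The sketch takes time proportional to the product of the text and pattern lengths, which is the baseline that faster algorithms improve on.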
Exact occurrences of $P$ in $T$ have a very simple structure: If we assume
for simplicity that $|T| \le 3|P|/2$ and trim $T$ so that $P$ occurs both as a
prefix and as a suffix of $T$, then both $P$ and $T$ are periodic with a common
period. However, an analogous characterization for the structure of occurrences
with up to $k$ mismatches was proved only recently by Bringmann et al.
[SODA'19]: Either there are $O(k^2)$ $k$-mismatch occurrences of $P$ in $T$, or
both $P$ and $T$ are at Hamming distance $O(k)$ from strings with a common
period $O(m/k)$, where $m = |P|$. We tighten this characterization by showing
that there are $O(k)$ $k$-mismatch occurrences in the case when the pattern is not
(approximately) periodic, and we lift it to the edit distance setting, where we
tightly bound the number of $k$-edit occurrences by $O(k^2)$ in the
non-periodic case. Our proofs are constructive and let us obtain a unified
framework for approximate pattern matching for both considered distances. We
showcase the generality of our framework with results for the fully-compressed
setting (where $T$ and $P$ are given as a straight-line program) and for the
dynamic setting (where we extend a data structure of Gawrychowski et al.
[SODA'18]).
Comment: 74 pages, 7 figures, FOCS'2