
    Faster Approximate Pattern Matching: A Unified Approach

    Approximate pattern matching is a natural and well-studied problem on strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the starting positions of) all substrings of $T$ that are at distance at most $k$ from $P$. We consider the two most fundamental string metrics: the Hamming distance and the edit distance. Under the Hamming distance, we search for substrings of $T$ that have at most $k$ mismatches with $P$, while under the edit distance, we search for substrings of $T$ that can be transformed into $P$ with at most $k$ edits. Exact occurrences of $P$ in $T$ have a very simple structure: If we assume for simplicity that $|T| \le 3|P|/2$ and trim $T$ so that $P$ occurs both as a prefix and as a suffix of $T$, then both $P$ and $T$ are periodic with a common period. However, an analogous characterization for the structure of occurrences with up to $k$ mismatches was proved only recently by Bringmann et al. [SODA'19]: Either there are $O(k^2)$ $k$-mismatch occurrences of $P$ in $T$, or both $P$ and $T$ are at Hamming distance $O(k)$ from strings with a common period $O(m/k)$, where $m = |P|$. We tighten this characterization by showing that there are $O(k)$ $k$-mismatch occurrences in the case when the pattern is not (approximately) periodic, and we lift it to the edit distance setting, where we tightly bound the number of $k$-edit occurrences by $O(k^2)$ in the non-periodic case. Our proofs are constructive and let us obtain a unified framework for approximate pattern matching for both considered distances. We showcase the generality of our framework with results for the fully-compressed setting (where $T$ and $P$ are given as a straight-line program) and for the dynamic setting (where we extend a data structure of Gawrychowski et al. [SODA'18]).

    Comment: 74 pages, 7 figures, FOCS'2
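    To make the two problem definitions concrete, here is a brute-force sketch in Python. It is not the framework from the paper, which is far faster. It checks every alignment for the Hamming case, and for the edit case it uses the classical Sellers dynamic program (a swapped-in textbook technique), run on the reversed strings so that it reports starting positions as in the problem statement. All function names are hypothetical.

```python
def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Starting positions i such that text[i:i+m] has at most k mismatches with pattern."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the threshold
        if mismatches <= k:
            result.append(i)
    return result


def _sellers_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """End positions j (0 <= j <= len(text)) such that some substring text[i:j]
    is within edit distance k of pattern (Sellers' dynamic program)."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring of text
    # ending at the current position; at j = 0 only the empty substring exists.
    col = list(range(m + 1))
    ends = [0] if col[m] <= k else []
    for j, c in enumerate(text, start=1):
        new = [0] * (m + 1)  # new[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            new[i] = min(
                col[i] + 1,                          # text[j-1] left unmatched
                new[i - 1] + 1,                      # pattern[i-1] left unmatched
                col[i - 1] + (pattern[i - 1] != c),  # match or substitution
            )
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


def k_edit_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Starting positions i such that some substring text[i:j] is within edit
    distance k of pattern. Edit distance is invariant under reversing both
    strings, so end positions in the reversed text map back to start positions."""
    n = len(text)
    ends_rev = _sellers_end_positions(text[::-1], pattern[::-1], k)
    return sorted(n - j for j in ends_rev)
```

    For example, with text "abcabdabc", pattern "abc", and k = 1, the Hamming version returns [0, 3, 6], while the edit version returns [0, 1, 3, 5, 6, 7], since it also accepts shorter or longer substrings such as "bc" and "dabc". Every k-mismatch occurrence is also a k-edit occurrence, because the edit distance never exceeds the Hamming distance.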