The k-mismatch problem revisited

Authors: Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, Tatiana Starikovskaya
Publication date: 27/08/2015

We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:

1) Our first result is a deterministic $O(n k^2 \log k / m + n \,\text{polylog}\, m)$ time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood Amir et al.

2) We then give a randomised and online algorithm which runs in the same time complexity but requires only $O(k^2 \,\text{polylog}\, m)$ space in total.

3) Next we give a randomised $(1+\epsilon)$-approximation algorithm for the streaming k-mismatch problem which uses $O(k^2 \,\text{polylog}\, m / \epsilon^2)$ space and runs in $O(\text{polylog}\, m / \epsilon^2)$ worst-case time per arriving symbol.

4) Finally we combine our new results to derive a randomised $O(k^2 \,\text{polylog}\, m)$ space algorithm for the streaming k-mismatch problem which runs in $O(\sqrt{k} \log k + \text{polylog}\, m)$ worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).
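To make the problem statement concrete, the following is a minimal brute-force sketch of the k-mismatch problem in both the offline and the streaming form. It is not any of the algorithms from the paper: it takes O(nm) time in the worst case, and the streaming variant buffers the last m symbols, so it uses O(m) space rather than the O(k^2 polylog m) bound stated above. The function names (`hamming_capped`, `k_mismatch_naive`, `k_mismatch_stream_naive`) are hypothetical and chosen only for illustration; `None` stands in for the output "No".

```python
from collections import deque
from typing import Iterable, Iterator, List, Optional, Sequence


def hamming_capped(a: Sequence, b: Iterable, k: int) -> Optional[int]:
    """Hamming distance of two equal-length sequences, or None if it exceeds k."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return None  # plays the role of the output "No"
    return mismatches


def k_mismatch_naive(pattern: str, text: str, k: int) -> List[Optional[int]]:
    """Offline baseline: one output per alignment of the pattern in the text."""
    m = len(pattern)
    return [
        hamming_capped(pattern, text[i:i + m], k)
        for i in range(len(text) - m + 1)
    ]


def k_mismatch_stream_naive(
    pattern: str, symbols: Iterable[str], k: int
) -> Iterator[Optional[int]]:
    """Streaming baseline: emit an output as soon as each symbol arrives.

    Assumption: keeps the last m symbols in a buffer (O(m) space), which is
    exactly the cost the streaming algorithms in the paper avoid.
    """
    m = len(pattern)
    window = deque(maxlen=m)  # oldest symbol is dropped automatically
    for s in symbols:
        window.append(s)
        if len(window) == m:
            yield hamming_capped(pattern, window, k)


if __name__ == "__main__":
    print(k_mismatch_naive("abc", "abcabdxbc", 1))
    # → [0, None, None, 1, None, None, 1]
    print(list(k_mismatch_stream_naive("abc", "abcabdxbc", 1)))
    # → [0, None, None, 1, None, None, 1]
```

Both functions produce the same sequence of outputs. The streaming version differs only in that each output is emitted before any later symbol of the text has been read.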

arXiv.org e-Print Archive