We revisit the $k$-mismatch problem in the streaming model on a pattern of
length $m$ and a streaming text of length $n$, both over a size-$\sigma$
alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch
problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde
O\big(\sqrt k\big)$ worst-case time per character. The space complexity is
known to be (unconditionally) optimal, and the worst-case time per character
matches a conditional lower bound. However, there is a gap between the total
time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest
known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt
m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements
over the $\tilde O(n\sqrt k)$ total time are possible when using more than
$O(k)$ space.
  We address these gaps by designing a randomized streaming algorithm for the
$k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses
$\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac
{nk^2}m,\frac{nk}{\sqrt s},\frac{\sigma nm}s\big)\big)$ total time. For $s=m$,
the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma
n\big)\big)$, which matches the time cost of the fastest offline algorithm.
Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt
k\big)$.

Comment: Extended abstract to appear in CPM 202
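
As a brief check of the $s=m$ case stated in the abstract (a sketch added here for clarity, not taken from the paper), substituting $s=m$ into the three terms of the minimum gives

$$\min\Big(\frac{nk^2}{m},\ \frac{nk}{\sqrt m},\ \frac{\sigma n m}{m}\Big) = \min\Big(\frac{nk^2}{m},\ \frac{nk}{\sqrt m},\ \sigma n\Big).$$

The first term can be dropped without changing the bound. If $k\le\sqrt m$, then both $\frac{nk^2}{m}\le n$ and $\frac{nk}{\sqrt m}\le n$, so either way the total is $\tilde O(n)$. If $k>\sqrt m$, then $\frac{k}{\sqrt m}>1$ and hence $\frac{nk^2}{m}=\frac{nk}{\sqrt m}\cdot\frac{k}{\sqrt m}>\frac{nk}{\sqrt m}$, so the first term never attains the minimum. In both cases the total is $\tilde O\big(n+\min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, in agreement with the offline bound quoted above.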

Golan, Shay

Kociumaka, Tomasz

Kopelowitz, Tsvi

Porat, Ely

English

arXiv

Dagstuhl Research Online Publication Server

The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

https://drops.dagstuhl.de/opus/volltexte/2020/12140/pdf/LIPIcs-CPM-2020-15.pdf
