The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

Abstract

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac{nk^2}{m},\frac{nk}{\sqrt s},\frac{\sigma nm}{s}\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.

Comment: Extended abstract to appear in CPM 202
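The claim that $s=m$ recovers the offline bound can be checked with a short case analysis. The following is a sketch of that check, not taken from the paper; it only substitutes $s=m$ into the stated total-time bound and compares the three terms.

```latex
% Substitute s = m into the stated bound:
\[
  n + \min\!\Big(\tfrac{nk^2}{m},\ \tfrac{nk}{\sqrt{s}},\ \tfrac{\sigma n m}{s}\Big)
  \;\xrightarrow{\ s = m\ }\;
  n + \min\!\Big(\tfrac{nk^2}{m},\ \tfrac{nk}{\sqrt{m}},\ \sigma n\Big).
\]
% Case 1: k <= sqrt(m). Then nk/sqrt(m) <= n, so both this expression and
% n + min(nk/sqrt(m), sigma n) lie between n and 2n, i.e. both are Theta(n).
\[
  k \le \sqrt{m}
  \;\Longrightarrow\;
  n \;\le\; n + \min\!\Big(\tfrac{nk^2}{m},\ \tfrac{nk}{\sqrt{m}},\ \sigma n\Big) \;\le\; 2n .
\]
% Case 2: k > sqrt(m). Then k^2/m = (k/sqrt(m))^2 > k/sqrt(m), so the first
% term is never the minimum and can be dropped.
\[
  k > \sqrt{m}
  \;\Longrightarrow\;
  \min\!\Big(\tfrac{nk^2}{m},\ \tfrac{nk}{\sqrt{m}},\ \sigma n\Big)
  = \min\!\Big(\tfrac{nk}{\sqrt{m}},\ \sigma n\Big).
\]
```

In both cases the expression is within a constant factor of $n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)$, which is consistent with the bound stated for $s=m$.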
