Improved Circular k-Mismatch Sketches

Abstract

The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability. This paper primarily focuses on the more general $k$-mismatch version of the problem, in which the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, with $k$ a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design a $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon the $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).
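To make the definitions concrete, the following is a brute-force reference computation of $\mathsf{sh}(S_1,S_2)$ and of the answer an exact circular $k$-mismatch decoder is expected to produce. It is only a sketch of the quantity being estimated, assuming plain Python strings; it is not a sketching scheme and not the method of this paper (it reads both strings in full and takes $O(n^2)$ time). The helper names `hamming`, `rotate`, `shift_distance`, and `k_mismatch_answer` are hypothetical, and returning `None` to signal a declared failure is an illustrative convention.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rotate(s: str, r: int) -> str:
    """Cyclic shift of s to the left by r positions (0 <= r < len(s))."""
    return s[r:] + s[:r]


def shift_distance(s1: str, s2: str) -> int:
    """sh(S1, S2): minimum Hamming distance between s1 and any rotation of s2."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    if not s1:
        return 0
    return min(hamming(s1, rotate(s2, r)) for r in range(len(s2)))


def k_mismatch_answer(s1: str, s2: str, k: int):
    """Hypothetical reference output of an exact circular k-mismatch decoder:
    sh(S1, S2) if it is at most k, otherwise None (a declared failure)."""
    d = shift_distance(s1, s2)
    return d if d <= k else None


print(hamming("0011", "1001"))               # plain Hamming distance: 2
print(shift_distance("0011", "1001"))        # a rotation of "1001" equals "0011": 0
print(k_mismatch_answer("0000", "1111", 2))  # sh = 4 > k = 2, so None
```

The example pair shows why the shift distance can be much smaller than the Hamming distance: rotating $S_2 = 1001$ left by one position yields $S_1 = 0011$ exactly.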
