An Algorithmic Bridge Between Hamming and Levenshtein Distances

Abstract

The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider a variant in which substitutions are cheaper than indels: each substitution costs $1/a$ for a parameter $a\ge 1$. This basic variant, denoted $ED_a$, bridges classical edit distance ($a=1$) with Hamming distance ($a\to\infty$), leading to interesting algorithmic challenges: Does the time complexity of computing $ED_a$ interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating $ED_a$? We first present a simple deterministic exact algorithm for $ED_a$ and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a $(1+\epsilon)$-approximation of $ED_a(X,Y)$, given strings $X,Y$ of total length $n$ and a bound $k\ge ED_a(X,Y)$. For simplicity, let us focus on $k\ge 1$ and a constant $\epsilon > 0$; then, our algorithm takes $\tilde{O}(n/a + ak^3)$ time. For small enough $k$, this running time is sublinear in $n$ unless $a=\tilde{O}(1)$. We also consider a very natural version that asks to find a $(k_I, k_S)$-alignment, that is, an alignment with at most $k_I$ indels and at most $k_S$ substitutions. In this setting, we give an exact algorithm and, more importantly, an $\tilde{O}(nk_I/k_S + k_S\cdot k_I^3)$-time $(1,1+\epsilon)$-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for $ED_a$ for $a=\Theta(k_S / k_I)$.
These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving a $(1+\epsilon)$-approximation in sublinear time, even for a favorable choice of $k$.

Comment: The full version of a paper accepted to ITCS 2023; abstract shortened to meet the arXiv requirement.
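The cost model behind $ED_a$ and $(k_I,k_S)$-alignments can be made concrete with a small brute-force Python sketch. It is not the authors' algorithm. It uses the textbook quadratic dynamic program for weighted edit distance (indels cost 1, substitutions cost $1/a$) and a helper table that records, for each exact number $d$ of indels, the fewest substitutions $S(d)$ any alignment can use. The function names `ed_a`, `min_substitutions`, and `has_alignment` are hypothetical. An alignment with $d$ indels and $s$ substitutions costs $d + s/a$, so one table answers both questions: $ED_a(X,Y) = \min_d \bigl(d + S(d)/a\bigr)$, and a $(k_I,k_S)$-alignment exists iff $\min_{d\le k_I} S(d) \le k_S$.

```python
import math
from fractions import Fraction


def ed_a(x: str, y: str, a: int) -> Fraction:
    """ED_a(x, y): each indel costs 1, each substitution costs 1/a (a >= 1).

    Textbook O(|x| * |y|) dynamic program over prefixes (illustrative sketch).
    Fraction keeps the 1/a costs exact, so there are no floating-point ties.
    """
    sub = Fraction(1, a)
    prev = [Fraction(j) for j in range(len(y) + 1)]  # x[:0] vs y[:j]: j insertions
    for i in range(1, len(x) + 1):
        cur = [Fraction(i)] + [Fraction(0)] * len(y)  # x[:i] vs y[:0]: i deletions
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else sub)
            cur[j] = min(diag, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return prev[-1]


def min_substitutions(x: str, y: str, max_indels: int) -> list:
    """best[d] = fewest substitutions over alignments of x and y that use
    exactly d indels, for d = 0..max_indels (math.inf if no such alignment).

    Brute-force O(|x| * |y| * max_indels) table (hypothetical helper).
    """
    K = max_indels
    # prev[j][d]: answer for the previous prefix of x vs y[:j] with exactly d indels
    prev = [[math.inf] * (K + 1) for _ in range(len(y) + 1)]
    for j in range(min(len(y), K) + 1):
        prev[j][j] = 0  # empty prefix of x: j insertions, no substitutions
    for i in range(1, len(x) + 1):
        cur = [[math.inf] * (K + 1) for _ in range(len(y) + 1)]
        if i <= K:
            cur[0][i] = 0  # x[:i] vs empty y: i deletions
        for j in range(1, len(y) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            for d in range(K + 1):
                best = prev[j - 1][d] + cost  # align x[i-1] with y[j-1]
                if d > 0:
                    # delete x[i-1], or insert y[j-1]; either uses one more indel
                    best = min(best, prev[j][d - 1], cur[j - 1][d - 1])
                cur[j][d] = best
        prev = cur
    return prev[-1]


def has_alignment(x: str, y: str, k_indels: int, k_subs: int) -> bool:
    """True iff some alignment uses at most k_indels indels and at most k_subs substitutions."""
    return min(min_substitutions(x, y, k_indels)) <= k_subs


if __name__ == "__main__":
    x, y = "kitten", "sitting"
    print(ed_a(x, y, 1))  # classical Levenshtein distance
    print(ed_a(x, y, 2))  # 1 insertion + 2 substitutions at cost 1/2 each
    print(has_alignment(x, y, 1, 2), has_alignment(x, y, 1, 1))
```

For "kitten" and "sitting", `ed_a` returns 3 for $a=1$ and 2 for $a=2$. Only $(1,k_S)$-alignments with $k_S\ge 2$ exist. The sketch runs in $O(|X|\cdot|Y|)$ and $O(|X|\cdot|Y|\cdot k_I)$ time, which is far from the bounds above. It is only meant as a reference for checking outputs on small inputs.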
