The edit distance between strings classically assigns unit cost to every
character insertion, deletion, and substitution, whereas the Hamming distance
only allows substitutions. In many real-life scenarios, insertions and
deletions (abbreviated indels) appear frequently but significantly less so than
substitutions. To model this, we consider substitutions to be cheaper than
indels, with cost $1/a$ for a parameter $a\ge 1$. This basic variant, denoted
$\mathsf{ED}_a$, bridges classical edit distance ($a=1$) with Hamming distance
($a\to\infty$), leading to interesting algorithmic challenges: Does the time
complexity of computing $\mathsf{ED}_a$ interpolate between that of Hamming
distance (linear time) and edit distance (quadratic time)? What about
approximating $\mathsf{ED}_a$?
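
To make the cost model concrete, the following is a minimal sketch of the textbook quadratic-time dynamic program adapted to it, with unit-cost indels and substitutions of cost 1/a. It is an illustration only and is not one of the algorithms contributed by this work; the function name `weighted_edit_distance` and the restriction to integer `a` are assumptions made for the example.

```python
from fractions import Fraction


def weighted_edit_distance(x: str, y: str, a: int) -> Fraction:
    """Illustrative O(|x|*|y|) DP: indels cost 1, substitutions cost 1/a.

    Hypothetical helper for exposition; `a` is assumed to be an integer >= 1
    so that costs can be represented exactly as fractions.
    """
    if a < 1:
        raise ValueError("a must be at least 1")
    sub_cost = Fraction(1, a)
    # Row 0: aligning the empty prefix of x with y[:j] takes j insertions.
    prev = [Fraction(j) for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        # Aligning x[:i] with the empty prefix of y takes i deletions.
        curr = [Fraction(i)]
        for j in range(1, len(y) + 1):
            # Match (free) or substitute (cost 1/a) the last characters.
            diag = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else sub_cost)
            # Otherwise delete x[i-1] or insert y[j-1], each at cost 1.
            curr.append(min(diag, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]
```

With a = 1 this is the classical edit distance; as a grows, substitutions become nearly free and, for equal-length strings, a times the value approaches the Hamming distance.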
We first present a simple deterministic exact algorithm for $\mathsf{ED}_a$ and
further prove that it is near-optimal assuming the Orthogonal Vectors
Conjecture. Our main result is a randomized algorithm computing a
$(1+\epsilon)$-approximation of $\mathsf{ED}_a(X,Y)$, given strings $X,Y$ of
total length $n$ and a bound $k\ge \mathsf{ED}_a(X,Y)$. For simplicity, let us
focus on $k\ge 1$ and a constant $\epsilon>0$; then, our algorithm takes
$\tilde{O}(n/a+ak^3)$ time. For small enough $k$, this running time is
sublinear in $n$ unless $a=\tilde{O}(1)$. We also consider a very natural
version that asks to find a $(k_I,k_S)$-alignment -- an alignment with at most
$k_I$ indels and $k_S$ substitutions. In this setting, we give an exact
algorithm and, more importantly, an
$\tilde{O}(nk_I/k_S+k_S\cdot k_I^3)$-time
$(1,1+\epsilon)$-bicriteria approximation algorithm. The latter solution is
based on the techniques we develop for $\mathsf{ED}_a$ for
$a=\Theta(k_S/k_I)$. These
bounds are in stark contrast to unit-cost edit distance, where state-of-the-art
algorithms are far from achieving $(1+\epsilon)$-approximation in sublinear
time, even for a favorable choice of $k$.

Comment: The full version of a paper accepted to ITCS 2023; abstract shortened
to meet arXiv requirement
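
As an illustration of the alignment variant with separate budgets for indels and substitutions, the sketch below decides by brute-force dynamic programming whether two strings admit an alignment within both budgets. It runs in time proportional to |x|·|y|·k_indel, so it is far slower than the bounds stated above and is not the exact or bicriteria algorithm of this work; the names `min_substitutions` and `has_alignment` are hypothetical.

```python
INF = float("inf")


def min_substitutions(x: str, y: str, k_indel: int) -> float:
    """Fewest substitutions over alignments of x and y with <= k_indel indels.

    Returns INF when no such alignment exists. Illustrative helper only.
    """
    m = len(y)
    # Row 0: aligning "" with y[:j] needs exactly j insertions, 0 substitutions.
    prev = [[0 if e >= j else INF for e in range(k_indel + 1)]
            for j in range(m + 1)]
    for i in range(1, len(x) + 1):
        # curr[j][e]: fewest substitutions aligning x[:i] with y[:j]
        # using at most e indels.
        curr = [[INF] * (k_indel + 1) for _ in range(m + 1)]
        for j in range(m + 1):
            for e in range(k_indel + 1):
                best = INF
                if j > 0:
                    # Match or substitute the last characters.
                    best = prev[j - 1][e] + (x[i - 1] != y[j - 1])
                if e > 0:
                    # Delete x[i-1], spending one indel.
                    best = min(best, prev[j][e - 1])
                    if j > 0:
                        # Insert y[j-1], spending one indel.
                        best = min(best, curr[j - 1][e - 1])
                curr[j][e] = best
        prev = curr
    return prev[m][k_indel]


def has_alignment(x: str, y: str, k_indel: int, k_sub: int) -> bool:
    """True if some alignment uses <= k_indel indels and <= k_sub substitutions."""
    return min_substitutions(x, y, k_indel) <= k_sub
```

For example, `has_alignment("kitten", "sitting", 1, 2)` holds (one insertion and two substitutions), whereas no alignment exists with zero indels because the lengths differ.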