Edit Distance in Near-Linear Time: it's a Constant Factor
We present an algorithm for approximating the edit distance between two
strings of length $n$ in time $n^{1+\epsilon}$, for any $\epsilon > 0$, up to a
constant factor. Our result completes the research direction set forth in the
recent breakthrough paper [Chakraborty-Das-Goldenberg-Koucky-Saks, FOCS'18],
which showed the first constant-factor approximation algorithm with a
(strongly) sub-quadratic running time. Several recent results have shown
near-linear complexity under different restrictions on the inputs (e.g., when the
edit distance is close to maximal, or when one of the inputs is pseudo-random).
In contrast, our algorithm obtains a constant-factor approximation in
near-linear running time for any input strings.
An Algorithmic Bridge Between Hamming and Levenshtein Distances
The edit distance between strings classically assigns unit cost to every
character insertion, deletion, and substitution, whereas the Hamming distance
only allows substitutions. In many real-life scenarios, insertions and
deletions (abbreviated indels) appear frequently but significantly less so than
substitutions. To model this, we consider substitutions being cheaper than
indels, with cost $1/a$ for a parameter $a \ge 1$. This basic variant, denoted
$ED_a$, bridges classical edit distance ($a = 1$) with Hamming distance
($a \to \infty$), leading to interesting algorithmic challenges: Does the time
complexity of computing $ED_a$ interpolate between that of Hamming distance
(linear time) and edit distance (quadratic time)? What about approximating
$ED_a$?
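As a point of reference for the quadratic-time end of this question, the cost model can be sketched with the textbook dynamic program. This is a minimal illustration, not the algorithm from the paper: it assumes indels cost 1 and substitutions cost 1/a for an integer parameter a >= 1, and the function name `weighted_edit_distance` is hypothetical.

```python
from fractions import Fraction


def weighted_edit_distance(x: str, y: str, a: int = 1) -> Fraction:
    """Textbook O(|x| * |y|) DP; indels cost 1, substitutions cost 1/a.

    Assumption for illustration: `a` is an integer >= 1, so that exact
    rational arithmetic can be used.
    """
    if a < 1:
        raise ValueError("a must be at least 1")
    sub = Fraction(1, a)
    # prev[j] = cost of aligning the processed prefix of x with y[:j]
    prev = [Fraction(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        cur = [Fraction(i)]  # aligning x[:i] with the empty prefix: i deletions
        for j, cy in enumerate(y, start=1):
            diagonal = prev[j - 1] + (0 if cx == cy else sub)  # match or substitute
            cur.append(min(diagonal, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # a = 1 is the classical unit-cost edit distance.
    print(weighted_edit_distance("kitten", "sitting", a=1))
    # Larger a makes substitutions cheaper relative to indels.
    print(weighted_edit_distance("kitten", "sitting", a=2))
```

With a = 1 this is the classical unit-cost edit distance. As a grows, substitutions become nearly free relative to indels, so for equal-length strings the optimal alignment eventually uses substitutions only.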
We first present a simple deterministic exact algorithm for $ED_a$ and
further prove that it is near-optimal assuming the Orthogonal Vectors
Conjecture. Our main result is a randomized algorithm computing a
$(1+\epsilon)$-approximation of $ED_a(X,Y)$, given strings $X,Y$ of total
length $n$ and a bound $k \ge ED_a(X,Y)$. For simplicity, let us focus on
$k \ge 1$ and a constant $\epsilon > 0$; then, our algorithm takes
$\tilde{O}(n/a + ak^3)$ time. Unless $a = \tilde{O}(1)$ and for small enough
$k$, this running time is sublinear in $n$. We also consider a very natural
version that asks to find a $(k_I, k_S)$-alignment -- an alignment with at
most $k_I$ indels and $k_S$ substitutions. In this setting, we give an exact
algorithm and, more importantly, an $\tilde{O}(nk_I/k_S + k_S \cdot k_I^3)$-time
$(1,1+\epsilon)$-bicriteria approximation algorithm. The latter solution is
based on the techniques we develop for $ED_a$ for $a = \Theta(k_S/k_I)$. These
bounds are in stark contrast to unit-cost edit distance, where state-of-the-art
algorithms are far from achieving $(1+\epsilon)$-approximation in sublinear
time, even for a favorable choice of $k$.
Comment: The full version of a paper accepted to ITCS 2023; abstract shortened
to meet arXiv requirement
Near-Linear Time Insertion-Deletion Codes and (1+ε)-Approximating Edit Distance via Indexing
We introduce fast-decodable indexing schemes for edit distance which can be
used to speed up edit distance computations to near-linear time if one of the
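strings is indexed by an indexing string $I$. In particular, for every length
$n$ and every $\varepsilon > 0$, one can in near linear time construct a string
$I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$, such that, indexing
any string $S \in \Sigma^n$, symbol-by-symbol, with $I$ results in a string
$S' \in \Sigma''^n$ where $\Sigma'' = \Sigma \times \Sigma'$ for which edit
distance computations are easy, i.e., one can compute a
$(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other
string in $O(n \,\mathrm{poly}(\log n))$ time.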
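The phrase "indexing symbol-by-symbol" can be read as pairing each symbol of the input string with the symbol at the same position of the indexing string, so that the result lives over the product alphabet. The sketch below only illustrates that pairing step. The function name `index_string` and the periodic placeholder index are assumptions for illustration and are not the construction from the paper.

```python
def index_string(s: str, index: str) -> list[tuple[str, str]]:
    """Pair s[i] with index[i]; the result is a string over the product alphabet."""
    if len(s) != len(index):
        raise ValueError("the string and the indexing string must have equal length")
    return list(zip(s, index))


if __name__ == "__main__":
    s = "abca"
    # Placeholder index with period 3; a real scheme would use a carefully
    # constructed indexing string instead.
    placeholder_index = "".join(str(i % 3) for i in range(len(s)))
    indexed = index_string(s, placeholder_index)
    print(indexed)
    # The indexed alphabet is at most |alphabet of s| * |alphabet of index|.
    print(len(set(indexed)), "distinct indexed symbols")
```

The indexed string has the same length as the input. Only its alphabet grows, by a factor equal to the size of the index alphabet.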
Our indexing schemes can be used to improve the decoding complexity of
state-of-the-art error correcting codes for insertions and deletions. In
particular, they lead to near-linear time decoding algorithms for the
insertion-deletion codes of [Haeupler, Shahrasbi; STOC `17] and faster decoding
algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi,
Sudan; ICALP `18]. Interestingly, the latter codes are a crucial ingredient in
the construction of fast-decodable indexing schemes.
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is
shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows
$x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have
Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
$x$.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only
$(1+o(1))$-factor lower bounds (under SETH) were known before.
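For orientation, the first problem in the list above has a straightforward quadratic-time exact algorithm: compare every vector in one set against every vector in the other. The sketch below shows that baseline only. The function name and input representation (tuples of 0/1 integers) are assumptions for illustration, and nothing here reflects the reduction in the paper.

```python
from typing import Sequence


def bichromatic_max_inner_product(
    A: Sequence[Sequence[int]], B: Sequence[Sequence[int]]
) -> int:
    """Brute force over all |A| * |B| pairs of {0,1}-vectors."""
    if not A or not B:
        raise ValueError("both sets must be non-empty")
    dim = len(A[0])
    if any(len(v) != dim for v in list(A) + list(B)):
        raise ValueError("all vectors must have the same dimension")
    best = 0
    for a in A:
        for b in B:
            # Inner product of 0/1 vectors = number of shared 1-coordinates.
            best = max(best, sum(x & y for x, y in zip(a, b)))
    return best


if __name__ == "__main__":
    A = [(1, 0, 1), (0, 1, 1)]
    B = [(1, 1, 0), (1, 0, 1)]
    print(bichromatic_max_inner_product(A, B))
```

The abstract's claim is that, under SETH, no truly-subquadratic algorithm can even approximate this quantity within the stated factors, so the double loop is close to the best one can hope for in the worst case.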
Constant-factor approximation of near-linear edit distance in near-linear time
We show that the edit distance between two strings of length $n$ can be
computed within a factor of $f(\epsilon)$ in $n^{1+\epsilon}$ time as long as
the edit distance is at least $n^{1-\delta}$ for some $\delta(\epsilon) > 0$.
Comment: 40 pages, 4 figures