Edit Distance in Near-Linear Time: it's a Constant Factor

Author: Andoni Alexandr
Nosatzki Negev Shekel
Publication date: 15/05/2020

We present an algorithm for approximating the edit distance between two strings of length $n$ in time $n^{1+\epsilon}$, for any $\epsilon>0$, up to a constant factor. Our result completes the research direction set forth in the recent breakthrough paper [Chakraborty-Das-Goldenberg-Koucky-Saks, FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time. Several recent results have shown near-linear complexity under different restrictions on the inputs (e.g., when the edit distance is close to maximal, or when one of the inputs is pseudo-random). In contrast, our algorithm obtains a constant-factor approximation in near-linear running time for any input strings.

arXiv.org e-Print Archive
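For orientation, the exact quantity that this line of work approximates can be computed with the textbook quadratic-time dynamic program, sketched below in Python. This is not the paper's near-linear algorithm; it is only a reference baseline. A constant-factor approximation algorithm with factor $c$ typically returns a value between $\mathrm{ED}(x,y)$ and $c\cdot\mathrm{ED}(x,y)$, and the point of the result is to do so in $n^{1+\epsilon}$ time instead of the $O(n^2)$ time used here. The function name `edit_distance` is illustrative.

```python
def edit_distance(x: str, y: str) -> int:
    """Exact unit-cost edit distance (Levenshtein) between x and y.

    Classic O(len(x) * len(y)) dynamic program that keeps only two rows:
    prev[j] holds the distance between the previous prefix of x and y[:j].
    """
    prev = list(range(len(y) + 1))  # "" vs y[:j] needs j insertions
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * len(y)   # x[:i] vs "" needs i deletions
        for j in range(1, len(y) + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete x[i-1]
                curr[j - 1] + 1,                       # insert y[j-1]
                prev[j - 1] + (x[i - 1] != y[j - 1]),  # match (0) or substitute (1)
            )
        prev = curr
    return prev[len(y)]


print(edit_distance("kitten", "sitting"))  # 3
```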

An Algorithmic Bridge Between Hamming and Levenshtein Distances

Author: Goldenberg Elazar
Kociumaka Tomasz
Krauthgamer Robert
Saha Barna
Publication date: 01/01/2022

The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost $1/a$ for a parameter $a\ge 1$. This basic variant, denoted $ED_a$, bridges classical edit distance ($a=1$) with Hamming distance ($a\to\infty$), leading to interesting algorithmic challenges: Does the time complexity of computing $ED_a$ interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating $ED_a$? We first present a simple deterministic exact algorithm for $ED_a$ and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a $(1+\epsilon)$-approximation of $ED_a(X,Y)$, given strings $X,Y$ of total length $n$ and a bound $k\ge ED_a(X,Y)$. For simplicity, let us focus on $k\ge 1$ and a constant $\epsilon > 0$; then, our algorithm takes $\tilde{O}(n/a + ak^3)$ time. For small enough $k$, this running time is sublinear in $n$ unless $a=\tilde{O}(1)$.

We also consider a very natural version that asks to find a $(k_I, k_S)$-alignment, i.e., an alignment with at most $k_I$ indels and $k_S$ substitutions. In this setting, we give an exact algorithm and, more importantly, an $\tilde{O}(nk_I/k_S + k_S\cdot k_I^3)$-time $(1,1+\epsilon)$-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for $ED_a$ for $a=\Theta(k_S / k_I)$. These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving $(1+\epsilon)$-approximation in sublinear time, even for a favorable choice of $k$.

Comment: The full version of a paper accepted to ITCS 2023; abstract shortened to meet arXiv requirements.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

MPG.PuRe
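The definitions of $ED_a$ and of a $(k_I, k_S)$-alignment can be made concrete with two plain dynamic programs, sketched below in Python. They are reference checks under stated assumptions, not the paper's exact, sublinear, or bicriteria algorithms. Assumptions: `a` is an integer so that arithmetic stays exact, and the function names `ed_a` and `has_alignment` are hypothetical. `ed_a` runs the DP with substitution cost 1 and indel cost `a` (so it computes $a\cdot ED_a$) and divides by `a` at the end; `has_alignment` records, for every number of indels used so far (up to $k_I$), the fewest substitutions needed.

```python
from fractions import Fraction


def ed_a(x: str, y: str, a: int) -> Fraction:
    """ED_a(x, y): substitutions cost 1/a, insertions and deletions cost 1.

    Assumes an integer a >= 1. The DP uses substitution cost 1 and indel
    cost a, i.e. it computes a * ED_a exactly, then divides by a.
    """
    if a < 1:
        raise ValueError("a must be >= 1")
    prev = [j * a for j in range(len(y) + 1)]  # x[:0] vs y[:j]: j insertions
    for i in range(1, len(x) + 1):
        curr = [i * a] + [0] * len(y)            # x[:i] vs y[:0]: i deletions
        for j in range(1, len(y) + 1):
            curr[j] = min(
                prev[j] + a,                           # delete x[i-1]
                curr[j - 1] + a,                       # insert y[j-1]
                prev[j - 1] + (x[i - 1] != y[j - 1]),  # match (0) or substitute (1)
            )
        prev = curr
    return Fraction(prev[len(y)], a)


def has_alignment(x: str, y: str, k_i: int, k_s: int) -> bool:
    """Exact check for an alignment with at most k_i indels and k_s substitutions.

    best[i][j][d] = fewest substitutions aligning x[:i] with y[:j] using
    exactly d indels (d <= k_i). Time and space O(len(x) * len(y) * k_i).
    """
    inf = float("inf")
    n, m = len(x), len(y)
    best = [[[inf] * (k_i + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    best[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            for d in range(k_i + 1):
                cur = best[i][j][d]
                if cur == inf:
                    continue
                if i < n and j < m:  # align x[i] with y[j]
                    sub = cur + (x[i] != y[j])
                    best[i + 1][j + 1][d] = min(best[i + 1][j + 1][d], sub)
                if d < k_i and i < n:  # delete x[i]
                    best[i + 1][j][d + 1] = min(best[i + 1][j][d + 1], cur)
                if d < k_i and j < m:  # insert y[j]
                    best[i][j + 1][d + 1] = min(best[i][j + 1][d + 1], cur)
    return any(s <= k_s for s in best[n][m])


print(ed_a("kitten", "sitting", 1))              # 3
print(ed_a("abcdef", "bcdefa", 4))               # 3/2
print(has_alignment("kitten", "sitting", 1, 2))  # True
print(has_alignment("kitten", "sitting", 1, 1))  # False
```

With `a = 1`, `ed_a` is ordinary edit distance. For two equal-length strings of length $n$ and $a > n/2$, any alignment that uses indels needs at least two of them and so costs at least $2a > n$ in scaled units, while the substitution-only alignment costs the Hamming distance, which is at most $n$; hence $a\cdot ED_a$ equals the Hamming distance in that regime.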

Near-Linear Time Insertion-Deletion Codes and $(1+\varepsilon)$-Approximating Edit Distance via Indexing

Author: Goldwasser Shafi
Haeupler Bernhard
Publication venue: Association for Computing Machinery (ACM)
Publication date: 09/04/2019

We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $I$. In particular, for every length $n$ and every $\varepsilon >0$, one can in near-linear time construct a string $I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$ such that indexing any string $S \in \Sigma^n$ symbol-by-symbol with $I$ results in a string $S' \in \Sigma''^n$, where $\Sigma'' = \Sigma \times \Sigma'$, for which edit distance computations are easy, i.e., one can compute a $(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other string in $O(n \operatorname{poly}(\log n))$ time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error-correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.

arXiv.org e-Print Archive

Crossref
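To make the indexing operation concrete: indexing pairs each symbol of $S$ with the symbol of $I$ at the same position, so $S'$ is simply a string of pair-symbols from $\Sigma'' = \Sigma \times \Sigma'$, and any edit distance routine that compares symbols for equality applies to it unchanged. The Python sketch below assumes only that much. The index string `"0120"` is a hypothetical placeholder, not the paper's construction of $I$, and `edit_distance` is a plain quadratic reference DP rather than the $O(n \operatorname{poly}(\log n))$-time $(1+\varepsilon)$-approximation; both function names are illustrative.

```python
from collections.abc import Hashable, Sequence


def index_string(s: Sequence[Hashable], index: Sequence[Hashable]) -> list[tuple[Hashable, Hashable]]:
    """Symbol-by-symbol indexing: S'[t] = (S[t], I[t]), a string over Sigma x Sigma'."""
    if len(s) != len(index):
        raise ValueError("S and I must have the same length n")
    return list(zip(s, index))


def edit_distance(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Exact unit-cost edit distance over any alphabet of hashable symbols (e.g. pairs)."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete x[i-1]
                curr[j - 1] + 1,                       # insert y[j-1]
                prev[j - 1] + (x[i - 1] != y[j - 1]),  # pairs must match in both coordinates
            )
        prev = curr
    return prev[len(y)]


I = "0120"  # hypothetical stand-in for a constructed index string
s_prime = index_string("abab", I)
received = s_prime[:1] + s_prime[2:]  # the channel deletes the second pair-symbol
print(s_prime)                           # [('a', '0'), ('b', '1'), ('a', '2'), ('b', '0')]
print(edit_distance(s_prime, received))  # 1
```

A deletion in $S'$ removes a whole pair, index symbol included; the speedup claimed in the abstract comes from how $I$ is constructed, which this sketch does not attempt.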

Distributed PCP Theorems for Hardness of Approximation in P

Author: Abboud Amir
Rubinstein Aviad
Williams Ryan
Publication date: 2017

We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of