
    Edit Distance in Near-Linear Time: it's a Constant Factor

    We present an algorithm for approximating the edit distance between two strings of length $n$ in time $n^{1+\epsilon}$, for any $\epsilon>0$, up to a constant factor. Our result completes the research direction set forth in the recent breakthrough paper [Chakraborty-Das-Goldenberg-Koucky-Saks, FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time. Several recent results have shown near-linear complexity under different restrictions on the inputs (e.g., when the edit distance is close to maximal, or when one of the inputs is pseudo-random). In contrast, our algorithm obtains a constant-factor approximation in near-linear running time for any input strings.

    An Algorithmic Bridge Between Hamming and Levenshtein Distances

    The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost $1/a$ for a parameter $a\ge 1$. This basic variant, denoted $ED_a$, bridges classical edit distance ($a=1$) with Hamming distance ($a\to\infty$), leading to interesting algorithmic challenges: Does the time complexity of computing $ED_a$ interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating $ED_a$? We first present a simple deterministic exact algorithm for $ED_a$ and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a $(1+\epsilon)$-approximation of $ED_a(X,Y)$, given strings $X,Y$ of total length $n$ and a bound $k\ge ED_a(X,Y)$. For simplicity, let us focus on $k\ge 1$ and a constant $\epsilon > 0$; then, our algorithm takes $\tilde{O}(n/a + ak^3)$ time. Unless $a=\tilde{O}(1)$ and for small enough $k$, this running time is sublinear in $n$. We also consider a very natural version that asks to find a $(k_I, k_S)$-alignment: an alignment with at most $k_I$ indels and $k_S$ substitutions. In this setting, we give an exact algorithm and, more importantly, an $\tilde{O}(nk_I/k_S + k_S\cdot k_I^3)$-time $(1,1+\epsilon)$-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for $ED_a$ for $a=\Theta(k_S / k_I)$. These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving $(1+\epsilon)$-approximation in sublinear time, even for a favorable choice of $k$.
    Comment: The full version of a paper accepted to ITCS 2023; abstract shortened to meet arXiv requirements
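To make the cost model concrete, here is a minimal sketch of the textbook quadratic-time dynamic program adapted to $ED_a$ (indels cost 1, substitutions cost $1/a$). It is not the algorithm from the paper, only a baseline illustration; the function name `ed_a` and the restriction to integer `a` are assumptions made for this example.

```python
from fractions import Fraction


def ed_a(x: str, y: str, a: int) -> Fraction:
    """Baseline O(|x|*|y|) DP for ED_a (illustrative, not the paper's method).

    Insertions and deletions cost 1; substitutions cost 1/a, with a >= 1.
    Exact rational arithmetic avoids floating-point rounding.
    """
    if a < 1:
        raise ValueError("a must be at least 1")
    sub_cost = Fraction(1, a)
    # prev[j] = ED_a between the empty prefix of x and y[:j] (j insertions).
    prev = [Fraction(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        cur = [Fraction(i)]  # x[:i] versus the empty string: i deletions
        for j, cy in enumerate(y, start=1):
            cur.append(min(
                prev[j] + 1,                                  # delete cx
                cur[j - 1] + 1,                               # insert cy
                prev[j - 1] + (0 if cx == cy else sub_cost),  # match or substitute
            ))
        prev = cur
    return prev[-1]
```

With `a = 1` this reduces to the classical unit-cost edit distance; for equal-length strings and large `a`, the result is the Hamming distance divided by `a`, because aligning position by position becomes cheaper than any indel.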

    Near-Linear Time Insertion-Deletion Codes and (1+$\varepsilon$)-Approximating Edit Distance via Indexing

    We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $I$. In particular, for every length $n$ and every $\varepsilon > 0$, one can in near-linear time construct a string $I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$, such that indexing any string $S \in \Sigma^n$, symbol-by-symbol, with $I$ results in a string $S' \in \Sigma''^n$, where $\Sigma'' = \Sigma \times \Sigma'$, for which edit distance computations are easy, i.e., one can compute a $(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other string in $O(n\,\text{poly}(\log n))$ time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.

    Distributed PCP Theorems for Hardness of Approximation in P

    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
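For readers unfamiliar with the first problem named above, the following is a minimal sketch of the naive quadratic-time exact solver for Bichromatic Maximum Inner Product over {0,1}-vectors. It only illustrates the problem statement and the quadratic baseline that the hardness result refers to; the function name `bichromatic_max_ip` and the tuple-based input format are assumptions made for this example.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]


def bichromatic_max_ip(A: Sequence[Vector], B: Sequence[Vector]) -> int:
    """Return the maximum inner product <a, b> over all a in A and b in B.

    Brute force over all |A| * |B| pairs, so the running time is quadratic
    in the number of vectors (times the dimension).
    """
    if not A or not B:
        raise ValueError("both vector sets must be non-empty")
    return max(
        sum(u * v for u, v in zip(a, b))  # inner product of one pair
        for a in A
        for b in B
    )
```

The "bichromatic" constraint is that each pair takes one vector from each set; pairs within `A` or within `B` are never compared.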

    Constant-factor approximation of near-linear edit distance in near-linear time

    We show that the edit distance between two strings of length $n$ can be computed within a factor of $f(\epsilon)$ in $n^{1+\epsilon}$ time as long as the edit distance is at least $n^{1-\delta}$ for some $\delta(\epsilon) > 0$.
    Comment: 40 pages, 4 figures