1,098 research outputs found

    Hardness of longest common subsequence for sequences with bounded run-lengths

    Get PDF
    International audienceThe longest common subsequence (LCS) problem is a classic and well-studied problem in computer science with extensive applications in diverse areas ranging from spelling error corrections to molecular biology. This paper focuses on LCS for fixed alphabet size and fixed run-lengths (i.e., maximum number of consecutive occurrences of the same symbol). We show that LCS is NP-complete even when restricted to (i) alphabets of size 3 and run-length at most 1, and (ii) alphabets of size 2 and run-length at most 2 (both results are tight). For the latter case, we show that the problem is approximable within ratio 3/5

    Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

    Get PDF
    We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds. As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower order factors, assuming the Strong Exponential Time Hypothesis

    Near-Linear Time Insertion-Deletion Codes and (1+ε\varepsilon)-Approximating Edit Distance via Indexing

    Full text link
    We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string II. In particular, for every length nn and every ε>0\varepsilon >0, one can in near linear time construct a string IΣnI \in \Sigma'^n with Σ=Oε(1)|\Sigma'| = O_{\varepsilon}(1), such that, indexing any string SΣnS \in \Sigma^n, symbol-by-symbol, with II results in a string SΣnS' \in \Sigma''^n where Σ=Σ×Σ\Sigma'' = \Sigma \times \Sigma' for which edit distance computations are easy, i.e., one can compute a (1+ε)(1+\varepsilon)-approximation of the edit distance between SS' and any other string in O(npoly(logn))O(n \text{poly}(\log n)) time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC `17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP `18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes

    Faster STR-IC-LCS Computation via RLE

    Get PDF
    The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz\u27s algorithm, and is faster when input strings are compressible via RLE
    corecore