202 research outputs found

    Hardness of Approximation of (Multi-)LCS over Small Alphabet

    Get PDF
    The problem of finding longest common subsequence (LCS) is one of the fundamental problems in computer science, which finds application in fields such as computational biology, text processing, information retrieval, data compression etc. It is well known that (decision version of) the problem of finding the length of a LCS of an arbitrary number of input sequences (which we refer to as Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP\u2795] showed that if Max-Clique is hard to approximate within a factor of s then Multi-LCS is also hard to approximate within a factor of ?(s). By the NP-hardness of the problem of approximating Max-Clique by Zuckerman [ToC\u2707], for any constant ? > 0, the length of a LCS of arbitrary number of input sequences of length n each, cannot be approximated within an n^{1-?}-factor in polynomial time unless {P}={NP}. However, the reduction of Jiang and Li assumes the alphabet size to be ?(n). So far no hardness result is known for the problem of approximating Multi-LCS over sub-linear sized alphabet. On the other hand, it is easy to get 1/|?|-factor approximation for strings of alphabet ?. In this paper, we make a significant progress towards proving hardness of approximation over small alphabet by showing a polynomial-time reduction from the well-studied densest k-subgraph problem with perfect completeness to approximating Multi-LCS over alphabet of size poly(n/k). As a consequence, from the known hardness result of densest k-subgraph problem (e.g. [Manurangsi, STOC\u2717]) we get that no polynomial-time algorithm can give an n^{-o(1)}-factor approximation of Multi-LCS over an alphabet of size n^{o(1)}, unless the Exponential Time Hypothesis is false

    Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit

    Get PDF
    International audienceThe problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we will be primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of various solutions whose time complexity depends both on the chosen parsing method as well as on the string-comparison metric used for the alignment

    Hardness of longest common subsequence for sequences with bounded run-lengths

    Get PDF
    International audienceThe longest common subsequence (LCS) problem is a classic and well-studied problem in computer science with extensive applications in diverse areas ranging from spelling error corrections to molecular biology. This paper focuses on LCS for fixed alphabet size and fixed run-lengths (i.e., maximum number of consecutive occurrences of the same symbol). We show that LCS is NP-complete even when restricted to (i) alphabets of size 3 and run-length at most 1, and (ii) alphabets of size 2 and run-length at most 2 (both results are tight). For the latter case, we show that the problem is approximable within ratio 3/5
    • …
    corecore