Search CORE

365 research outputs found

Hardness of longest common subsequence for sequences with bounded run-lengths

Author: Blin Guillaume
Bulteau Laurent
Jiang Minghui
Tejada Pedro J.
Vialette Stéphane
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

International audienceThe longest common subsequence (LCS) problem is a classic and well-studied problem in computer science with extensive applications in diverse areas ranging from spelling error corrections to molecular biology. This paper focuses on LCS for fixed alphabet size and fixed run-lengths (i.e., maximum number of consecutive occurrences of the same symbol). We show that LCS is NP-complete even when restricted to (i) alphabets of size 3 and run-length at most 1, and (ii) alphabets of size 2 and run-length at most 2 (both results are tight). For the latter case, we show that the problem is approximable within ratio 3/5

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Faster subsequence recognition in compressed strings

Author: A Tiskin
A Tiskin
A. Tiskin
BW Watson
CER Alves
G Myers
G Navarro
G Ziv
G Ziv
J Kärkkäinen
JL Bentley
M Crochemore
P Cégielski
TA Welch
W Rytter
WJ Masek
Publication venue
Publication date: 18/01/2008
Field of study

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel--Ziv compression. For an SLP-compressed text of length

\bar m

, and an uncompressed pattern of length

n

, C{\'e}gielski et al. gave an algorithm for local subsequence recognition running in time

O(\bar mn^2 \log n)

. We improve the running time to

O(\bar mn^{1.5})

. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time

O(\bar mn^{1.5})

; the same problem with a compressed pattern is known to be NP-hard

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Hardness of Approximation of (Multi-)LCS over Small Alphabet

Author: Bhangale Amey
Chakraborty Diptarka
Kumar Rajendra
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)
Publication date: 01/01/2020
Field of study

The problem of finding longest common subsequence (LCS) is one of the fundamental problems in computer science, which finds application in fields such as computational biology, text processing, information retrieval, data compression etc. It is well known that (decision version of) the problem of finding the length of a LCS of an arbitrary number of input sequences (which we refer to as Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP\u2795] showed that if Max-Clique is hard to approximate within a factor of s then Multi-LCS is also hard to approximate within a factor of ?(s). By the NP-hardness of the problem of approximating Max-Clique by Zuckerman [ToC\u2707], for any constant ? > 0, the length of a LCS of arbitrary number of input sequences of length n each, cannot be approximated within an n^{1-?}-factor in polynomial time unless {P}={NP}. However, the reduction of Jiang and Li assumes the alphabet size to be ?(n). So far no hardness result is known for the problem of approximating Multi-LCS over sub-linear sized alphabet. On the other hand, it is easy to get 1/|?|-factor approximation for strings of alphabet ?. In this paper, we make a significant progress towards proving hardness of approximation over small alphabet by showing a polynomial-time reduction from the well-studied densest k-subgraph problem with perfect completeness to approximating Multi-LCS over alphabet of size poly(n/k). As a consequence, from the known hardness result of densest k-subgraph problem (e.g. [Manurangsi, STOC\u2717]) we get that no polynomial-time algorithm can give an n^{-o(1)}-factor approximation of Multi-LCS over an alphabet of size n^{o(1)}, unless the Exponential Time Hypothesis is false

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

eScholarship - University of California

Multivariate Fine-Grained Complexity of Longest Common Subsequence

Author: Bringmann Karl
Künnemann Marvin
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2018
Field of study

We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings

x

and

y

of length

n

, a textbook algorithm solves LCS in time

O(n^2)

, but although much effort has been spent, no

O(n^{2-\varepsilon})

-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann, K\"unnemann FOCS'15]. Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known. To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size

n:=\max\{|x|,|y|\}

, the length of the shorter string

m:=\min\{|x|,|y|\}

, the length

L

of an LCS of

x

and

y

, the numbers of deletions

\delta := m-L

and

\Delta := n-L

, the alphabet size, as well as the numbers of matching pairs

M

and dominant pairs

d

. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as

(n+\min\{d, \delta \Delta, \delta m\})^{1\pm o(1)}

. [...]Comment: Presented at SODA'18. Full Version. 66 page

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

Author: Abboud A.
Backurs A.
Bringmann K.
Künnemann M.
Publication venue
Publication date: 01/01/2018
Field of study

Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size

n

of data that originally has size

N

, and we want to solve a problem with time complexity

T(\cdot)

. The naive strategy of "decompress-and-solve" gives time

T(N)

, whereas "the gold standard" is time

T(n)

: to analyze the compression as efficiently as if the original data was small. We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be unified under the elegant notion of Grammar Compressions. A vast literature, across many disciplines, established this as an influential notion for Algorithm design. We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are: - The

O(nN\sqrt{\log{N/n}})

bound for LCS and the

O(\min\{N \log N, nM\})

bound for Pattern Matching with Wildcards are optimal up to

N^{o(1)}

factors, under the Strong Exponential Time Hypothesis. (Here,

M

denotes the uncompressed length of the compressed pattern.) - Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the

k

-Clique conjecture. - We give an algorithm showing that decompress-and-solve is not optimal for Disjointness

MPG.PuRe

Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

Author: Abboud Amir
Backurs Arturs
Bringmann Karl
Künnemann Marvin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

n

of data that originally has size

N

, and we want to solve a problem with time complexity

T(\cdot)

. The naive strategy of "decompress-and-solve" gives time

T(N)

, whereas "the gold standard" is time

T(n)

O(nN\sqrt{\log{N/n}})

bound for LCS and the

O(\min\{N \log N, nM\})

bound for Pattern Matching with Wildcards are optimal up to

N^{o(1)}

factors, under the Strong Exponential Time Hypothesis. (Here,

M

denotes the uncompressed length of the compressed pattern.) - Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the

k

-Clique conjecture. - We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.Comment: Presented at FOCS'17. Full version. 63 page

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Hardness of MSA with Selective Column Scoring

Author: Caucchiolo A.
Cicalese F.
Publication venue
Publication date: 01/01/2021
Field of study

Multiple Sequence Alignment (MSA for short) is a well known problem in the field of computational biology. In order to evaluate the quality of a solution, many different scoring functions have been introduced, the most widely used being the Sum-of-pairs score (SP-score). It is known that computing the best MSA under the SP-score measure is NP-hard. In this paper, we introduce a variant of the Column score (defined in Thompson et al. 1999), which we refer to as Selective Column score: Given a symbol a ∈ Σ, the score of the i-th column is one if and only if all symbols of the same column are a, and otherwise zero. The acolumn score of an alignment is then the number of columns made of only character a. We show that finding the optimal MSA under the Selective Column Score is NP-hard for all alphabets of size |Σ| ≥ 2 by reducing from MIN-2-SAT

Catalogo dei prodotti della ricerca