    Exact algorithms for the repetition-bounded longest common subsequence problem

    In this paper, we study exact, exponential-time algorithms for a variant of the classic Longest Common Subsequence problem called the Repetition-Bounded Longest Common Subsequence problem (or RBLCS, for short): let an alphabet S be a finite set of symbols and an occurrence constraint C_occ be a function C_occ: S → ℕ assigning an upper bound on the number of occurrences of each symbol in S. Given two sequences X and Y over the alphabet S and an occurrence constraint C_occ, the goal of RBLCS is to find a longest common subsequence of X and Y such that each symbol s ∈ S appears at most C_occ(s) times in the obtained subsequence. The special case where C_occ(s) = 1 for every symbol s ∈ S is known as the Repetition-Free Longest Common Subsequence problem (RFLCS) and has been studied previously; e.g., in [1], Adi et al. presented a simple (exponential-time) exact algorithm for RFLCS. However, they did not analyze its time complexity in detail, and to the best of our knowledge, there are no previous results on the running times of exact algorithms for this problem. Without loss of generality, we assume that |X| ≤ |Y| and |X| = n. In this paper, we first propose a simpler algorithm for RFLCS based on the strategy used in [1] and show explicitly that its running time is O(1.44225^n). Next, we provide a dynamic programming (DP) based algorithm for RBLCS and prove that its running time is O(1.44225^n) for any occurrence constraint C_occ, and even smaller in certain special cases. In particular, for RFLCS, our DP-based algorithm runs in O(1.41422^n) time, which is faster than the previous one. Furthermore, we prove NP-hardness and APX-hardness results for RBLCS on restricted instances.
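    To make the RBLCS definition concrete, here is a minimal Python sketch that computes the repetition-bounded LCS length by memoized recursion, carrying the per-symbol usage counts in the state. This is only an illustration of the problem statement, not the paper's O(1.44225^n) algorithm; the function name rblcs and the example inputs are ours.

```python
from functools import lru_cache

def rblcs(X, Y, c_occ):
    """Length of a longest common subsequence of X and Y in which each
    symbol s occurs at most c_occ[s] times. Illustrative brute force
    over states (i, j, usage counts); not the paper's algorithm."""
    @lru_cache(maxsize=None)
    def rec(i, j, used):
        if i == len(X) or j == len(Y):
            return 0
        # Either skip X[i] or skip Y[j].
        best = max(rec(i + 1, j, used), rec(i, j + 1, used))
        if X[i] == Y[j]:
            counts = dict(used)
            if counts.get(X[i], 0) < c_occ[X[i]]:
                counts[X[i]] = counts.get(X[i], 0) + 1
                best = max(best,
                           1 + rec(i + 1, j + 1, tuple(sorted(counts.items()))))
        return best
    return rec(0, 0, ())

# With C_occ(s) = 1 for every s, this is exactly the repetition-free case (RFLCS).
print(rblcs("abcabc", "acbacb", {"a": 1, "b": 2, "c": 1}))
```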

    Repetition-free longest common subsequence of random sequences

    A repetition-free Longest Common Subsequence (LCS) of two sequences x and y is an LCS of x and y in which each symbol may appear at most once. Let R denote the length of a repetition-free LCS of two sequences of n symbols each, chosen randomly, uniformly, and independently over a k-ary alphabet. We study the asymptotic behavior of R, in n and k, and establish that there are three distinct regimes, depending on the relative speed of growth of n and k. For each regime we establish the limiting behavior of R. In fact, we do more, since we actually establish tail bounds for large deviations of R from its limiting behavior. Our study is motivated by the so-called exemplar model proposed by Sankoff (1999) and the related similarity measure introduced by Adi et al. (2007). A natural question that arises in this context, which, as we show, is related to long-standing open problems in the area of probabilistic combinatorics, is to understand the asymptotic behavior, in n and k, of the parameter R.
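    A quick way to build intuition for the behavior of R is to estimate it empirically. The sketch below is ours, not the paper's analysis: it computes the repetition-free LCS exactly by memoizing over (i, j, bitmask of used symbols), which is feasible only for small k, and averages over random trials; rflcs and estimate_R are hypothetical helper names.

```python
import random
from functools import lru_cache

def rflcs(x, y):
    """Repetition-free LCS length via memoized recursion over
    (i, j, bitmask of already-used symbols). O(|x||y|2^k) states,
    so this is practical only for small alphabets."""
    @lru_cache(maxsize=None)
    def rec(i, j, mask):
        if i == len(x) or j == len(y):
            return 0
        best = max(rec(i + 1, j, mask), rec(i, j + 1, mask))
        if x[i] == y[j] and not (mask >> x[i]) & 1:
            best = max(best, 1 + rec(i + 1, j + 1, mask | (1 << x[i])))
        return best
    return rec(0, 0, 0)

def estimate_R(n, k, trials=50, seed=0):
    """Monte Carlo estimate of E[R] for uniform random k-ary sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = tuple(rng.randrange(k) for _ in range(n))
        y = tuple(rng.randrange(k) for _ in range(n))
        total += rflcs(x, y)
    return total / trials

print(estimate_R(n=40, k=6))  # note R can never exceed k, here 6
```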

    Near-Linear Time Insertion-Deletion Codes and (1+ε)-Approximating Edit Distance via Indexing

    We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string I. In particular, for every length n and every ε > 0, one can in near-linear time construct a string I ∈ Σ'^n with |Σ'| = O_ε(1), such that indexing any string S ∈ Σ^n, symbol by symbol, with I results in a string S' ∈ Σ''^n, where Σ'' = Σ × Σ', for which edit distance computations are easy; i.e., one can compute a (1+ε)-approximation of the edit distance between S' and any other string in O(n poly(log n)) time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error-correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding algorithms for the list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.
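    The "indexing" operation itself is just symbol-wise pairing over the product alphabet; the sketch below shows only that step. Constructing a suitable I and the near-linear-time (1+ε)-approximation are the paper's contributions and are not reproduced here; index_string and the placeholder index string are our illustration.

```python
def index_string(S, I):
    """Pair each symbol of S with the corresponding symbol of the
    indexing string I, producing S' over the product alphabet Σ × Σ'."""
    assert len(S) == len(I)
    return list(zip(S, I))

S = "banana"
I = "012345"  # placeholder; the paper's I is a highly structured string
print(index_string(S, I))  # [('b', '0'), ('a', '1'), ('n', '2'), ...]
```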

    Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

    Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size n of data that originally has size N, and we want to solve a problem with time complexity T(·). The naive strategy of "decompress-and-solve" gives time T(N), whereas "the gold standard" is time T(n): to analyze the compression as efficiently as if the original data were small. We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (the Lempel-Ziv family, dictionary methods, and others) can be unified under the elegant notion of Grammar Compressions. A vast literature, across many disciplines, has established this as an influential notion for algorithm design. We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are:
    - The O(nN√(log(N/n))) bound for LCS and the O(min{N log N, nM}) bound for Pattern Matching with Wildcards are optimal up to N^{o(1)} factors, under the Strong Exponential Time Hypothesis. (Here, M denotes the uncompressed length of the compressed pattern.)
    - Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the k-Clique conjecture.
    - We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.
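    For intuition, a grammar compression (a straight-line program) represents a long string by a small set of rules, and naive decompress-and-solve must first expand the grammar, paying time proportional to the uncompressed length N rather than the grammar size n. A minimal sketch, with a hypothetical expand helper:

```python
def expand(grammar, symbol):
    """Naively expand a grammar-compressed string (straight-line
    program): each nonterminal maps to a sequence of symbols. This is
    the 'decompress' step, costing time proportional to the original
    length N rather than the grammar size n."""
    out = []
    def go(s):
        if s in grammar:            # nonterminal: expand its rule
            for t in grammar[s]:
                go(t)
        else:                       # terminal: emit it
            out.append(s)
    go(symbol)
    return "".join(out)

# Three rules produce a string of length 8; with r rules of this
# doubling shape, N = 2^r while the grammar size n stays linear in r.
g = {"S": ["A", "A"], "A": ["B", "B"], "B": ["a", "b"]}
print(expand(g, "S"))  # abababab
```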

    Multivariate Fine-Grained Complexity of Longest Common Subsequence

    We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings x and y of length n, a textbook algorithm solves LCS in time O(n^2), but although much effort has been spent, no O(n^{2-ε})-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams; Bringmann, Künnemann; FOCS '15]. Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known. To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size n := max{|x|, |y|}, the length of the shorter string m := min{|x|, |y|}, the length L of an LCS of x and y, the numbers of deletions δ := m - L and Δ := n - L, the alphabet size, as well as the numbers of matching pairs M and dominant pairs d. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as (n + min{d, δΔ, δm})^{1±o(1)}. [...]
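    As a concrete reference for the parameter definitions, the sketch below computes n, m, L, δ, Δ, and the number of matching pairs M for two strings via the textbook quadratic DP (dominant pairs d are omitted for brevity); the helper name lcs_parameters is ours.

```python
def lcs_parameters(x, y):
    """Compute n, m, L, the deletion numbers delta and Delta, and the
    number of matching pairs M for strings x and y."""
    if len(x) > len(y):
        x, y = y, x                  # ensure |x| <= |y|
    m, n = len(x), len(y)
    # Textbook O(mn) dynamic program for the LCS length L.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x[i] == y[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    L = dp[m][n]
    # A matching pair is an index pair (i, j) with x[i] == y[j].
    M = sum(x[i] == y[j] for i in range(m) for j in range(n))
    return {"n": n, "m": m, "L": L,
            "delta": m - L, "Delta": n - L, "M": M}

print(lcs_parameters("differential", "file comparison"))
```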

    Chasing Ghosts: Competing with Stateful Policies

    We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting). If either the reference policies are stateless rather than stateful, or the feedback includes the rewards of all actions (the so-called "expert" setting), previous work shows that the optimal regret grows like Θ(√T) in terms of the number of decision rounds T. The difficulty in our setting is that the decision maker unavoidably loses track of the internal states of the reference policies, and thus cannot reliably attribute rewards observed in a certain round to any of the reference policies. In fact, in this setting it is impossible for the algorithm to estimate which policy gives the highest (or even approximately highest) total reward. Nevertheless, we design an algorithm that achieves expected regret that is sublinear in T, of the form O(T / log^{1/4} T). Our algorithm is based on a certain local repetition lemma that may be of independent interest. We also show that no algorithm can guarantee expected regret better than O(T / log^{3/2} T).
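    To see how much weaker the achievable rate is than in the stateless and expert settings, one can tabulate both bounds numerically; constants and the log base are suppressed by the O-notation, so the figures below are only indicative of relative growth.

```python
import math

# sqrt(T) regret (stateless/expert settings) versus T / log^{1/4} T
# (stateful bandit setting): the latter is sublinear, but only barely.
for T in (10**4, 10**6, 10**8):
    sqrt_rate = math.sqrt(T)
    stateful_rate = T / math.log(T) ** 0.25
    print(f"T={T:>9}:  sqrt(T)={sqrt_rate:>10.0f}  T/log^(1/4)T={stateful_rate:>12.0f}")
```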