Expected length of the longest common subsequence for large alphabets

Author: A. Frieze
A. Vershik
B. Bollobás
B. Logan
C. Schensted
D. Aldous
J. Baik
J. Gravner
J.F.C. Kingman
K. Johansson
M. Kiwi
P. Erdős
P. Pevzner
R. Baeza-Yates
R. Stanley
S. Janson
S. Ulam
V. Chvátal
Publication date: 01/01/2003

We consider the length $L$ of the longest common subsequence of two $n$-character words chosen uniformly and independently at random over a $k$-ary alphabet. Subadditivity arguments show that the expected value of $L$, normalized by $n$, converges to a constant $C_k$. We prove a conjecture of Sankoff and Mainville from the early 1980s, which claims that $C_k\sqrt{k}$ tends to 2 as $k$ tends to infinity. Comment: 14 pages, 1 figure, LaTe
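The convergence of the normalized expected LCS length can be explored numerically. The following is a minimal sketch, not the authors' method: it estimates the expected LCS length divided by $n$ by Monte Carlo sampling, using the textbook quadratic dynamic program. The function names `lcs_length` and `estimate_normalized_lcs`, the seed, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def lcs_length(x, y):
    """Length of a longest common subsequence (textbook O(|x||y|) DP, two rows)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def estimate_normalized_lcs(n, k, trials, seed=0):
    """Monte Carlo estimate of E[L]/n for two uniform random n-character words
    over a k-ary alphabet. For finite n this only approximates the limit C_k."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = [rng.randrange(k) for _ in range(n)]
        y = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (trials * n)


if __name__ == "__main__":
    # Print the finite-n estimate scaled by sqrt(k); the theorem concerns the
    # limit n -> infinity followed by k -> infinity, so expect a rough match only.
    for k in (4, 16, 64):
        est = estimate_normalized_lcs(n=300, k=k, trials=5)
        print(k, round(est * math.sqrt(k), 3))
```

For small $n$ the estimate is biased away from $C_k$, so any comparison with the limit value 2 is only indicative.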

arXiv.org e-Print Archive

CiteSeerX

Crossref

On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences

Author: J. SOTO
M. KIWI
Pevzner
Vershik
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2008

It is well known that, when normalized by $n$, the expected length of a longest common subsequence of $d$ sequences of length $n$ over an alphabet of size $\sigma$ converges to a constant $\gamma_{\sigma,d}$. We disprove a speculation by Steele regarding a possible relation between $\gamma_{2,d}$ and $\gamma_{2,2}$. To do so, we also obtain new lower bounds for $\gamma_{\sigma,d}$ when both $\sigma$ and $d$ are small integers. Comment: 13 pages. To appear in Combinatorics, Probability and Computin

arXiv.org e-Print Archive

CiteSeerX

Crossref

Repositorio Académico de la Universidad de Chile

Multivariate Fine-Grained Complexity of Longest Common Subsequence

Author: Bringmann Karl
Künnemann Marvin
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2018

We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a textbook algorithm solves LCS in time $O(n^2)$, but although much effort has been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann, Künnemann FOCS'15]. Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known. To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size $n:=\max\{|x|,|y|\}$, the length of the shorter string $m:=\min\{|x|,|y|\}$, the length $L$ of an LCS of $x$ and $y$, the numbers of deletions $\delta := m-L$ and $\Delta := n-L$, the alphabet size, as well as the numbers of matching pairs $M$ and dominant pairs $d$. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as $(n+\min\{d, \delta \Delta, \delta m\})^{1\pm o(1)}$. [...] Comment: Presented at SODA'18. Full Version. 66 page
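The parameters named in this abstract can all be read off the textbook $O(n^2)$ dynamic-programming table. The sketch below is an illustration under stated assumptions, not code from the paper: the function name `lcs_parameters` is hypothetical, a matching pair is taken to be a position pair $(i,j)$ with $x[i]=y[j]$, and a dominant pair is taken to be a pair $(i,j)$ whose table entry strictly exceeds both the entry above and the entry to the left (the usual definition in the LCS literature, which the abstract itself does not spell out).

```python
def lcs_parameters(x, y):
    """Compute the LCS parameters from the full quadratic DP table.

    table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    """
    rows, cols = len(x), len(y)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    matching = 0   # M: pairs (i, j) with x[i] == y[j]
    dominant = 0   # d: assumed definition, see the lead-in above
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if x[i - 1] == y[j - 1]:
                matching += 1
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
            if table[i][j] > table[i - 1][j] and table[i][j] > table[i][j - 1]:
                dominant += 1
    n, m = max(rows, cols), min(rows, cols)
    lcs = table[rows][cols]
    return {
        "n": n,
        "m": m,
        "L": lcs,
        "delta": m - lcs,   # deletions in the shorter string
        "Delta": n - lcs,   # deletions in the longer string
        "alphabet": len(set(x) | set(y)),
        "M": matching,
        "d": dominant,
    }


if __name__ == "__main__":
    print(lcs_parameters("aab", "ab"))
```

Computing the parameters this way costs quadratic time itself, so the sketch is useful for inspecting small instances rather than for choosing an algorithm in practice.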

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Near-Linear Time Insertion-Deletion Codes and $(1+\varepsilon)$-Approximating Edit Distance via Indexing

Author: Goldwasser Shafi
Haeupler Bernhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/04/2019

We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $I$. In particular, for every length $n$ and every $\varepsilon >0$, one can in near-linear time construct a string $I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$, such that indexing any string $S \in \Sigma^n$, symbol-by-symbol, with $I$ results in a string $S' \in \Sigma''^n$, where $\Sigma'' = \Sigma \times \Sigma'$, for which edit distance computations are easy, i.e., one can compute a $(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other string in $O(n \text{poly}(\log n))$ time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.
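The symbol-by-symbol indexing step described in this abstract amounts to pairing each symbol of $S$ with the symbol of $I$ at the same position. The sketch below shows only that pairing, together with a plain quadratic edit-distance baseline for comparison. It does not implement the paper's construction of $I$ or its near-linear approximation algorithm: the index string used here is an arbitrary placeholder, the names `index_string` and `edit_distance` are hypothetical, and unit-cost insertions, deletions and substitutions are an assumption that may differ from the paper's convention.

```python
def index_string(s, index):
    """Pair s with an index string of the same length, symbol by symbol,
    giving a sequence over the product alphabet Sigma x Sigma'."""
    if len(s) != len(index):
        raise ValueError("s and index must have the same length")
    return list(zip(s, index))


def edit_distance(a, b):
    """Exact edit distance by the textbook O(|a||b|) DP (baseline only).

    Assumes unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    placeholder_index = "xyzxyz"  # NOT the paper's construction of I
    s_indexed = index_string("banana", placeholder_index)
    t_indexed = index_string("bandna", placeholder_index)
    print(edit_distance(s_indexed, t_indexed))
```

Because the baseline works on any sequences of comparable items, it accepts the paired symbols directly, which makes it a convenient reference point when experimenting with different index strings.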

arXiv.org e-Print Archive

Crossref

A Central Limit Theorem for the Length of the Longest Common Subsequences in Random Words

Author: Houdré Christian
Işlak Ümit
Publication date: 22/01/2017

Let $(X_i)_{i \geq 1}$ and $(Y_i)_{i\geq1}$ be two independent sequences of independent identically distributed random variables taking their values in a common finite alphabet and having the same law. Let $LC_n$ be the length of the longest common subsequences of the two random words $X_1\cdots X_n$ and $Y_1\cdots Y_n$. Under a lower bound assumption on the order of its variance, $LC_n$ is shown to satisfy a central limit theorem. This is in contrast to the limiting distribution of the length of the longest common subsequences in two independent uniform random permutations of $\{1, \dots, n\}$, which is shown to be the Tracy-Widom distribution. Comment: Some corrections, typos corrected and improvement
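For the permutation case mentioned at the end of this abstract, the LCS length can be computed without a quadratic table: relabel the second permutation by each symbol's position in the first, and the LCS becomes a longest strictly increasing subsequence. The sketch below illustrates that standard reduction. It is not taken from the paper, and the names `lis_length` and `lcs_of_permutations` are hypothetical.

```python
import random
from bisect import bisect_left


def lis_length(seq):
    """Length of a longest strictly increasing subsequence in O(n log n).

    tails[i] is the smallest possible last element of an increasing
    subsequence of length i + 1 seen so far."""
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def lcs_of_permutations(p, q):
    """LCS length of two permutations of the same set of symbols.

    A common subsequence of p and q corresponds to a subsequence of q whose
    positions in p are increasing, so the problem reduces to LIS."""
    position_in_p = {symbol: i for i, symbol in enumerate(p)}
    return lis_length([position_in_p[symbol] for symbol in q])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
    n = 10_000
    p = list(range(1, n + 1))
    q = p[:]
    rng.shuffle(p)
    rng.shuffle(q)
    print(lcs_of_permutations(p, q))
```

Sampling many such pairs and centering and scaling the resulting lengths is one way to look at the fluctuations empirically, though seeing the limiting distribution clearly requires large $n$ and many samples.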

arXiv.org e-Print Archive