3,245 research outputs found

Twins in words and long common subsequences in permutations

Authors: Bukh Boris; Zhou Lidong
Publication date: 02/03/2015

A large family of words must contain two words that are similar. We investigate several problems where the measure of similarity is the length of a common subsequence. We construct a family of n^{1/3} permutations on n letters, such that the LCS of any two of them is only cn^{1/3}, improving a construction of Beame, Blais, and Huynh-Ngoc. We relate the problem of constructing many permutations with small LCS to the twin word problem of Axenovich, Person and Puzynina. In particular, we show that every word of length n over a k-letter alphabet contains two disjoint equal subsequences of length cnk^{-2/3}. Many problems are left open.

Comment: 18+epsilon pages

arXiv.org e-Print Archive

CiteSeerX
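
The similarity measure in the entry above can be made concrete for permutations. For two permutations of the same letters, the LCS length equals the length of a longest increasing subsequence once one permutation is rewritten as positions in the other. The following is a minimal sketch of that standard reduction, not the construction from the paper; the function name `lcs_of_permutations` is a hypothetical helper introduced for illustration.

```python
from bisect import bisect_left


def lcs_of_permutations(p, q):
    """LCS length of two permutations p and q of the same set of letters.

    Assumption: p and q contain exactly the same distinct elements.
    Hypothetical helper, not taken from the paper.
    """
    # Position of each letter in p.
    pos = {letter: i for i, letter in enumerate(p)}
    # tails[j] = smallest possible last value of an increasing
    # subsequence of length j + 1 seen so far (patience sorting).
    tails = []
    for letter in q:
        i = pos[letter]
        j = bisect_left(tails, i)
        if j == len(tails):
            tails.append(i)
        else:
            tails[j] = i
    return len(tails)


if __name__ == "__main__":
    print(lcs_of_permutations([1, 2, 3, 4], [2, 1, 4, 3]))
```

This runs in O(n log n) time, compared with the quadratic dynamic program needed for general words, because a common subsequence of two permutations is exactly an increasing run of positions.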

Knowledge Discovery in Documents by Extracting Frequent Word Sequences

Author: Ahonen Helena
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/1999

Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

A Central Limit Theorem for the Length of the Longest Common Subsequences in Random Words

Authors: Houdré Christian; Işlak Ümit
Publication date: 22/01/2017

Let (X_i)_{i \geq 1} and (Y_i)_{i \geq 1} be two independent sequences of independent identically distributed random variables taking their values in a common finite alphabet and having the same law. Let LC_n be the length of the longest common subsequences of the two random words X_1 \cdots X_n and Y_1 \cdots Y_n. Under a lower bound assumption on the order of its variance, LC_n is shown to satisfy a central limit theorem. This is in contrast to the limiting distribution of the length of the longest common subsequences in two independent uniform random permutations of \{1, \dots, n\}, which is shown to be the Tracy-Widom distribution.

Comment: Some corrections, typos corrected and improvement

arXiv.org e-Print Archive

Expected length of the longest common subsequence for large alphabets

Authors: A. Frieze; A. Vershik; B. Bollobás; B. Logan; C. Schensted; D. Aldous; J. Baik; J. Gravner; J.F.C. Kingman; K. Johannson; M. Kiwi; P. Erdös; P. Pevzner; R. Baeza-Yates; R. Stanley; S. Janson; S. Ulam; V. Chvátal
Publication date: 01/01/2003

We consider the length L of the longest common subsequence of two n-character words chosen uniformly and independently at random over a k-ary alphabet. Subadditivity arguments yield that the expected value of L, when normalized by n, converges to a constant C_k. We prove a conjecture of Sankoff and Mainville from the early 80's claiming that C_k\sqrt{k} goes to 2 as k goes to infinity.

Comment: 14 pages, 1 figure, LaTeX

arXiv.org e-Print Archive

CiteSeerX

Crossref
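
The normalized quantity E[L]/n described in the entry above can be explored numerically. The following is a rough Monte Carlo sketch using the textbook quadratic dynamic program for LCS length; it is an illustration only, not the method of the paper, and the names `lcs_length` and `estimate_normalized_lcs`, the trial count, and the seed are all assumptions chosen for the example.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of sequences a and b.

    Classic O(len(a) * len(b)) dynamic program, keeping one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_normalized_lcs(n, k, trials=20, seed=0):
    """Monte Carlo estimate of E[L] / n for two uniform random words.

    Hypothetical helper: n is the word length, k the alphabet size.
    For finite n this only approximates the limiting constant C_k.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = [rng.randrange(k) for _ in range(n)]
        y = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (trials * n)


if __name__ == "__main__":
    for k in (2, 4, 16):
        ratio = estimate_normalized_lcs(n=200, k=k)
        print(k, ratio, ratio * k ** 0.5)
```

For small n the estimates sit below the limit, since E[L]/n approaches C_k from below by superadditivity, so the printed values of the ratio times \sqrt{k} should be read as a loose indication of the trend rather than as evidence for the limit 2.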

Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare

Authors: Duchene Florence; Garbay Catherine; Rialle Vincent
Publication date: 25/11/2004

In recent years, time-series mining has become a challenging issue for researchers. An important application lies in monitoring, which often requires analyzing large sets of time-series to learn usual patterns. Any deviation from this learned profile is then considered an unexpected situation. Moreover, complex applications may involve the temporal study of several heterogeneous parameters. In this paper, we propose a method for mining heterogeneous multivariate time-series for learning meaningful patterns. The proposed approach allows for mixed time-series, containing both pattern and non-pattern data, as well as for imprecise matches, outliers, and stretching and global translation of pattern instances in time. We present the early results of our approach in the context of monitoring the health status of a person at home. The purpose is to build a behavioral profile of a person by analyzing the time variations of several quantitative or qualitative parameters recorded through a set of sensors installed in the home.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences

Authors: J. SOTO; M. KIWI; Pevzner; Vershik
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2008

It is well known that, when normalized by n, the expected length of a longest common subsequence of d sequences of length n over an alphabet of size sigma converges to a constant gamma_{sigma,d}. We disprove a speculation by Steele regarding a possible relation between gamma_{2,d} and gamma_{2,2}. To do so, we also obtain new lower bounds for gamma_{sigma,d} when both sigma and d are small integers.

Comment: 13 pages. To appear in Combinatorics, Probability and Computing

arXiv.org e-Print Archive

CiteSeerX

Crossref

Repositorio Académico de la Universidad de Chile