Twins in words and long common subsequences in permutations
A large family of words must contain two words that are similar. We
investigate several problems where the measure of similarity is the length of a
common subsequence.
We construct a family of n^{1/3} permutations on n letters, such that the LCS of
any two of them has length only cn^{1/3}, improving a construction of Beame, Blais, and
Huynh-Ngoc. We relate the problem of constructing many permutations with small
LCS to the twin word problem of Axenovich, Person and Puzynina. In particular,
we show that every word of length n over a k-letter alphabet contains two
disjoint equal subsequences of length cnk^{-2/3}.
Many problems are left open. Comment: 18+epsilon pages
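As an illustration of the quantity discussed in this abstract (not taken from the paper): for two permutations of the same letters, the LCS length equals the length of a longest increasing subsequence of one permutation after relabeling each letter by its position in the other. A minimal sketch, assuming both inputs are permutations of the same set; the function name `lcs_of_permutations` is hypothetical.

```python
from bisect import bisect_left

def lcs_of_permutations(p, q):
    """LCS length of two permutations of the same letters.

    Relabel each letter of p by its position in q; a common subsequence
    then corresponds to an increasing subsequence of the relabeled p.
    """
    pos = {v: i for i, v in enumerate(q)}  # position of each letter in q
    tails = []  # tails[j] = smallest last value of an increasing run of length j+1
    for v in p:
        i = pos[v]
        j = bisect_left(tails, i)
        if j == len(tails):
            tails.append(i)   # extends the longest run found so far
        else:
            tails[j] = i      # keeps a smaller tail for runs of length j+1
    return len(tails)
```

This runs in O(n log n) time, which makes it practical to check the pairwise LCS of a candidate family of permutations for moderate n.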
Knowledge Discovery in Documents by Extracting Frequent Word Sequences
published or submitted for publication
A Central Limit Theorem for the Length of the Longest Common Subsequences in Random Words
Let (X_k)_{k \geq 1} and (Y_k)_{k \geq 1} be two independent sequences of
independent identically distributed random variables taking their values in a
common finite alphabet and having the same law. Let LC_n be the length of the
longest common subsequences of the two random words X_1 \cdots X_n and
Y_1 \cdots Y_n. Under a lower bound assumption on the order of its variance,
LC_n is shown to satisfy a central limit theorem. This is in contrast to the
limiting distribution of the length of the longest common subsequences in two
independent uniform random permutations of \{1, \dots, n\}, which is shown to
be the Tracy-Widom distribution. Comment: Some corrections, typos corrected and improvement
Expected length of the longest common subsequence for large alphabets
We consider the length L of the longest common subsequence of two n-character
words chosen uniformly and independently at random over a k-ary alphabet.
Subadditivity arguments yield that the expected value of L, when normalized by
n, converges to a constant C_k. We prove a conjecture of Sankoff and Mainville
from the early 80's claiming that C_k\sqrt{k} goes to 2 as k goes to infinity. Comment: 14 pages, 1 figure, LaTeX
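The normalized expectation E[L]/n described in this abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's method: it uses the textbook dynamic program for LCS length and averages over random words. The function names, trial count, and seed are illustrative assumptions. Because E[L] is superadditive in n, the finite-n average underestimates the limit C_k.

```python
import random

def lcs_length(a, b):
    """Length of a longest common subsequence, by the standard
    O(len(a) * len(b)) dynamic program, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def estimate_normalized_lcs(n, k, trials=20, seed=0):
    """Monte Carlo estimate of E[L]/n for two uniform random words of
    length n over a k-letter alphabet (a finite-n proxy for C_k)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)
```

Multiplying `estimate_normalized_lcs(n, k)` by `k ** 0.5` for increasing k gives a rough view of the quantity C_k\sqrt{k}, although n must be large relative to k before the estimate is informative.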
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare
In recent years, time-series mining has become a challenging issue for
researchers. An important application is monitoring, which
requires analyzing large sets of time-series to learn usual patterns. Any
deviation from this learned profile is then considered an unexpected
situation. Moreover, complex applications may involve the temporal study of
several heterogeneous parameters. In this paper, we propose a method for mining
heterogeneous multivariate time-series for learning meaningful patterns. The
proposed approach allows for mixed time-series (containing both pattern and
non-pattern data) as well as for imprecise matches, outliers, stretching and
global translating of pattern instances in time. We present the early results
of our approach in the context of monitoring the health status of a person at
home. The purpose is to build a behavioral profile of a person by analyzing the
time variations of several quantitative or qualitative parameters recorded
through a set of sensors installed in the home.
On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences
It is well known that, when normalized by n, the expected length of a longest
common subsequence of d sequences of length n over an alphabet of size sigma
converges to a constant gamma_{sigma,d}. We disprove a speculation by Steele
regarding a possible relation between gamma_{2,d} and gamma_{2,2}. In order to
do that we also obtain new lower bounds for gamma_{sigma,d}, when both sigma
and d are small integers. Comment: 13 pages. To appear in Combinatorics, Probability and Computing