
    Twins in words and long common subsequences in permutations

    A large family of words must contain two words that are similar. We investigate several problems where the measure of similarity is the length of a common subsequence. We construct a family of n^{1/3} permutations on n letters such that the LCS of any two of them is only cn^{1/3}, improving a construction of Beame, Blais, and Huynh-Ngoc. We relate the problem of constructing many permutations with small LCS to the twin word problem of Axenovich, Person and Puzynina. In particular, we show that every word of length n over a k-letter alphabet contains two disjoint equal subsequences of length cnk^{-2/3}. Many problems are left open. Comment: 18+epsilon pages
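    For readers unfamiliar with the similarity measure used here, the sketch below is a minimal illustration (not taken from the paper; the function name and demo parameters are chosen for illustration only) of the standard quadratic dynamic program for the LCS length, applied to two random permutations.

```python
import random

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (standard O(len(a)*len(b)) DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

if __name__ == "__main__":
    n = 1000
    p = random.sample(range(n), n)  # a uniform random permutation of {0, ..., n-1}
    q = random.sample(range(n), n)
    # For two independent uniform random permutations the LCS is typically about 2*sqrt(n);
    # the construction in the abstract forces it down to order n^(1/3) within the family.
    print(lcs_length(p, q))
```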

    Knowledge Discovery in Documents by Extracting Frequent Word Sequences

    Published or submitted for publication.

    A Central Limit Theorem for the Length of the Longest Common Subsequences in Random Words

    Let (X_i)_{i \geq 1} and (Y_i)_{i \geq 1} be two independent sequences of independent, identically distributed random variables taking their values in a common finite alphabet and having the same law. Let LC_n be the length of the longest common subsequences of the two random words X_1 \cdots X_n and Y_1 \cdots Y_n. Under a lower bound assumption on the order of its variance, LC_n is shown to satisfy a central limit theorem. This is in contrast to the limiting distribution of the length of the longest common subsequences in two independent uniform random permutations of \{1, \dots, n\}, which is shown to be the Tracy-Widom distribution. Comment: Some corrections, typos corrected and improvements
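    The following Monte Carlo sketch (illustrative only, not from the paper; the sample sizes and alphabet are arbitrary choices) samples the quantity LC_n for uniform random words, the statistic whose fluctuations the central limit theorem above concerns.

```python
import random
import statistics

def lcs_length(a, b):
    """Standard O(n^2) dynamic program for the LCS length."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]

def sample_lc(n, k, trials=200):
    """Samples of LC_n for two independent uniform random words of length n over a k-letter alphabet."""
    return [
        lcs_length([random.randrange(k) for _ in range(n)],
                   [random.randrange(k) for _ in range(n)])
        for _ in range(trials)
    ]

if __name__ == "__main__":
    vals = sample_lc(n=300, k=4)
    # The CLT in the abstract concerns the fluctuations of LC_n around its mean,
    # under a lower bound on the order of Var(LC_n).
    print("mean:", statistics.mean(vals), "stdev:", statistics.stdev(vals))
```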

    Expected length of the longest common subsequence for large alphabets

    We consider the length L of the longest common subsequence of two n-character words chosen uniformly and independently at random over a k-ary alphabet. Subadditivity arguments yield that the expected value of L, when normalized by n, converges to a constant C_k. We prove a conjecture of Sankoff and Mainville from the early 80's claiming that C_k\sqrt{k} goes to 2 as k goes to infinity. Comment: 14 pages, 1 figure, LaTeX
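    As a rough numerical illustration of the statement (a hedged sketch, not the paper's method: it assumes a moderate word length already approximates the limit, which it only does crudely), one can estimate C_k by the normalized expected LCS length and compare C_k\sqrt{k} with 2.

```python
import math
import random

def lcs_length(a, b):
    """Standard O(n^2) dynamic program for the LCS length."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]

def estimate_ck(k, n=400, trials=20):
    """Crude Monte Carlo proxy for C_k = lim E[L]/n; a finite n only approximates the limit."""
    total = sum(
        lcs_length([random.randrange(k) for _ in range(n)],
                   [random.randrange(k) for _ in range(n)])
        for _ in range(trials)
    )
    return total / (trials * n)

if __name__ == "__main__":
    for k in (2, 4, 16, 64):
        ck = estimate_ck(k)
        # The Sankoff-Mainville prediction proved in the abstract: C_k*sqrt(k) -> 2 as k -> infinity.
        print(f"k={k:3d}  C_k ~ {ck:.3f}  C_k*sqrt(k) ~ {ck * math.sqrt(k):.3f}")
```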

    Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare

    In recent years, time-series mining has become a challenging issue for researchers. An important application lies in monitoring tasks, which require analyzing large sets of time-series to learn usual patterns. Any deviation from this learned profile is then considered an unexpected situation. Moreover, complex applications may involve the temporal study of several heterogeneous parameters. In this paper, we propose a method for mining heterogeneous multivariate time-series to learn meaningful patterns. The proposed approach handles mixed time-series -- containing both pattern and non-pattern data -- as well as imprecise matches, outliers, and stretching and global translation of pattern instances in time. We present the early results of our approach in the context of monitoring the health status of a person at home. The purpose is to build a behavioral profile of a person by analyzing the time variations of several quantitative or qualitative parameters recorded through a set of sensors installed in the home.

    On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences

    It is well known that, when normalized by n, the expected length of a longest common subsequence of d sequences of length n over an alphabet of size sigma converges to a constant gamma_{sigma,d}. We disprove a speculation by Steele regarding a possible relation between gamma_{2,d} and gamma_{2,2}. In order to do that we also obtain new lower bounds for gamma_{sigma,d}, when both sigma and d are small integers. Comment: 13 pages. To appear in Combinatorics, Probability and Computing
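    A hedged sketch of the d = 3 case of the quantity gamma_{sigma,d} above (the straightforward cubic dynamic program for the LCS of three sequences; this is an illustration, not the method used in the paper, and the names and sizes are chosen arbitrarily):

```python
import random

def lcs3_length(a, b, c):
    """Length of the longest common subsequence of three sequences (O(n^3) DP)."""
    la, lb, lc = len(a), len(b), len(c)
    dp = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    dp[i][j][k] = dp[i - 1][j - 1][k - 1] + 1
                else:
                    dp[i][j][k] = max(dp[i - 1][j][k],
                                      dp[i][j - 1][k],
                                      dp[i][j][k - 1])
    return dp[la][lb][lc]

if __name__ == "__main__":
    sigma, n = 2, 60
    seqs = [[random.randrange(sigma) for _ in range(n)] for _ in range(3)]
    # Normalizing by n gives a crude finite-n proxy for gamma_{2,3}.
    print(lcs3_length(*seqs) / n)
```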