    Efficient Seeds Computation Revisited

    The notion of a cover is a generalization of a period of a string, and there are linear-time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, and no linear-time algorithm is known. In this paper we give linear-time algorithms for some of its versions: computing the shortest left-seed array, computing the longest left-seed array, and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in $O(n^2)$ time. We also describe a simpler alternative algorithm that efficiently computes the shortest seeds. As a by-product we obtain an $O(n \log(n/m))$ time algorithm that checks whether the shortest seed has length at least $m$ and finds the corresponding seed. We also correct some important details missing in the previously known shortest-seed algorithm (Iliopoulos et al., 1996). Comment: 14 pages, accepted to CPM 201
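To make the "linear-time shortest cover" remark concrete, here is a minimal Python sketch of a border-array method for the shortest cover of every prefix, in the style usually attributed to Breslauer's on-line algorithm. It is not taken from the paper above, and it handles covers only, not seeds. The function name `shortest_cover_array` and the helper arrays `border` and `reach` are illustrative choices.

```python
def shortest_cover_array(s):
    """Return cover[0..n], where cover[i] is the length of the
    shortest cover of the prefix s[:i] (cover[0] is 0 by convention)."""
    n = len(s)

    # border[i] = length of the longest proper border of s[:i]
    border = [0] * (n + 1)
    k = 0
    for i in range(2, n + 1):
        while k > 0 and s[i - 1] != s[k]:
            k = border[k]
        if s[i - 1] == s[k]:
            k += 1
        border[i] = k

    cover = [0] * (n + 1)
    # reach[c] = end of the longest prefix known to be covered by s[:c]
    reach = [0] * (n + 1)
    for i in range(1, n + 1):
        c = cover[border[i]]  # shortest cover of the longest border
        if c > 0 and reach[c] >= i - c:
            # the occurrence of s[:c] ending at i touches or overlaps
            # the part already covered, so s[:c] also covers s[:i]
            cover[i] = c
            reach[c] = i
        else:
            # no shorter cover exists; the prefix covers only itself
            cover[i] = i
            reach[i] = i
    return cover


print(shortest_cover_array("abaababa"))
```

For `"abaababa"` the last entry is 3, meaning `"aba"` is the shortest cover of the whole string. Each index is processed with a constant number of array lookups after the border array is built, which is where the linear running time comes from.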

    Compressed Spaced Suffix Arrays

    Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data structure, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to normal suffix arrays (SAs) and still support fast random access to them. We first prove a theoretical upper bound on the space needed to store an SSA when we already have the SA. We then present experiments indicating that our approach works even better in practice.
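As a rough illustration of what a spaced suffix array holds, the Python sketch below sorts text positions by only the characters that a seed mask marks with `1`, next to an ordinary suffix array for comparison. Several details are assumptions made for the example and may differ from the paper's definition: the mask is applied once rather than repeated, suffixes shorter than the mask are compared on the characters they do have, and ties are broken by text position. The construction is a naive sort and involves no compression.

```python
def suffix_array(text):
    """Plain suffix array by direct sorting (fine for short texts)."""
    return sorted(range(len(text)), key=lambda pos: text[pos:])


def spaced_suffix_array(text, mask):
    """Sort positions by the characters under the mask's '1' positions.

    mask is a string over {'0', '1'}, e.g. "101": compare the 1st and
    3rd characters of each suffix and ignore the 2nd.
    """
    care = [off for off, bit in enumerate(mask) if bit == "1"]

    def key(pos):
        # characters at the cared-about offsets that fall inside the text
        sampled = tuple(text[pos + off] for off in care
                        if pos + off < len(text))
        return (sampled, pos)  # assumption: ties broken by position

    return sorted(range(len(text)), key=key)


print(suffix_array("banana"))                # [5, 3, 1, 0, 4, 2]
print(spaced_suffix_array("banana", "101"))  # [5, 1, 3, 0, 4, 2]
```

In this small case the two arrays differ only in the order of positions 1 and 3, which gives some intuition for why an SSA might be stored compactly relative to the SA.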

    Quasiperiodicities in Fibonacci strings

    We consider the problem of finding quasiperiodicities in a Fibonacci string. A factor u of a string y is a cover of y if every letter of y falls within some occurrence of u in y. A string v is a seed of y if it is a cover of a superstring of y. A left seed of a string y is a prefix of y that is a cover of a superstring of y. Similarly, a right seed of a string y is a suffix of y that is a cover of a superstring of y. In this paper, we present some interesting results regarding quasiperiodicities in Fibonacci strings: we identify all covers, left/right seeds and seeds of a Fibonacci string, and all covers of a circular Fibonacci string. Comment: In Local Proceedings of "The 38th International Conference on Current Trends in Theory and Practice of Computer Science" (SOFSEM 2012)
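The definitions of cover and seed above can be checked directly on small inputs. The following Python sketch is a brute-force illustration, not the paper's method. It assumes the convention F0 = "b", F1 = "a", Fn = Fn-1 Fn-2 for Fibonacci strings, and the names `is_cover`, `is_seed`, `covers` and `fibonacci_string` are hypothetical. The seed test relies on the observation that, if v covers some superstring of y, it also covers one of the form p + y + q, where p is a proper prefix of v and q is a proper suffix of v.

```python
def fibonacci_string(n):
    """Assumed convention: F0 = 'b', F1 = 'a', Fn = F(n-1) + F(n-2)."""
    if n == 0:
        return "b"
    prev, cur = "b", "a"
    for _ in range(n - 1):
        prev, cur = cur, cur + prev
    return cur


def is_cover(u, y):
    """True if every letter of y lies inside some occurrence of u in y."""
    if not u or len(u) > len(y):
        return False
    covered_upto = 0          # y[:covered_upto] is covered so far
    start = y.find(u)
    while start != -1:
        if start > covered_upto:
            return False      # gap between consecutive occurrences
        covered_upto = start + len(u)
        start = y.find(u, start + 1)
    return covered_upto == len(y)


def is_seed(v, y):
    """Brute force: try every superstring p + y + q with p a proper
    prefix of v and q a proper suffix of v."""
    m = len(v)
    return any(is_cover(v, v[:j] + y + v[m - k:])
               for j in range(m) for k in range(m))


def covers(y):
    """All covers of y; a cover is necessarily a prefix of y."""
    return [y[:k] for k in range(1, len(y) + 1) if is_cover(y[:k], y)]


f5 = fibonacci_string(5)
print(f5)                      # abaababa
print(covers(f5))              # ['aba', 'abaababa']
print(is_seed("abaab", f5))    # True: 'abaab' covers 'abaababaab'
```

Here `"abaab"` is a prefix of F5 and covers the superstring `"abaababaab"`, so it is a left seed of F5 even though it is not a cover of F5 itself.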