6,213 research outputs found

    On Suffix Tree Breadth

    Get PDF
    The suffix tree—the compacted trie of all the suffixes of a string—is the most important and widely-used data structure in string processing. We consider a natural combinatorial question about suffix trees: for a string S of length n, how many nodes νS(d) can there be at (string) depth d in its suffix tree? We prove ν(n,d)=maxS∈ΣnνS(d) is O((n/d)logn) , and show that this bound is almost tight, describing strings for which νS(d)=d is Ω((n/d)log(n/d)

    Tight Upper and Lower Bounds on Suffix Tree Breadth

    Get PDF
    The suffix tree - the compacted trie of all the suffixes of a string - is the most important and widely-used data structure in string processing. We consider a natural combinatorial question about suffix trees: for a string S of length n, how many nodes nu(S)(d) can there be at (string) depth d in its suffix tree? We prove nu(n, d) = max(S) (is an element of Sigma n) nu(S)(d) is O ((n/d) log(n/d)), and show that this bound is asymptotically tight, describing strings for which nu(S)(d) is Omega((n/d)log(n/d)). (C) 2020 Elsevier B.V. All rights reserved.Peer reviewe

    De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

    Get PDF
    Sequencing technologies allow for an in-depth analysis of biological species but the size of the generated datasets introduce a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations for the alignment of short reads to a reference genome. Here, we expand out approach for de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal suffi- ciently to accurately assemble reads to big contigs or complete genomes

    Alternating normal forms for braids and locally Garside monoids

    Get PDF
    We describe new types of normal forms for braid monoids, Artin-Tits monoids, and, more generally, for all monoids in which divisibility has some convenient lattice properties (``locally Garside monoids''). We show that, in the case of braids, one of these normal forms coincides with the normal form introduced by Burckel and deduce that the latter can be computed easily. This approach leads to a new, simple description for the standard ordering (``Dehornoy order'') of Bn in terms of that of B(n-1), and to a quadratic upper bound for the complexity of this ordering

    Linear pattern matching on sparse suffix trees

    Get PDF
    Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a {\em sparse suffix tree} \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to logσn\log_{\sigma}n characters (σ\sigma the alphabet size), our index takes O(n/logσn)O(n/\log_{\sigma}n) space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time O(m+r2+rocc)O(m+r^2+r\cdot occ), where mm is the length of the pattern, rr is the actual number of characters stored in a word and occocc is the number of pattern occurrences

    An approach to computing downward closures

    Full text link
    The downward closure of a word language is the set of all (not necessarily contiguous) subwords of its members. It is well-known that the downward closure of any language is regular. While the downward closure appears to be a powerful abstraction, algorithms for computing a finite automaton for the downward closure of a given language have been established only for few language classes. This work presents a simple general method for computing downward closures. For language classes that are closed under rational transductions, it is shown that the computation of downward closures can be reduced to checking a certain unboundedness property. This result is used to prove that downward closures are computable for (i) every language class with effectively semilinear Parikh images that are closed under rational transductions, (ii) matrix languages, and (iii) indexed languages (equivalently, languages accepted by higher-order pushdown automata of order 2).Comment: Full version of contribution to ICALP 2015. Comments welcom
    corecore