On Suffix Tree Breadth
The suffix tree, the compacted trie of all the suffixes of a string, is the most important and widely used data structure in string processing. We consider a natural combinatorial question about suffix trees: for a string $S$ of length $n$, how many nodes $\nu_S(d)$ can there be at (string) depth $d$ in its suffix tree? We prove $\nu(n, d) = \max_{S \in \Sigma^n} \nu_S(d)$ is $O((n/d) \log n)$, and show that this bound is almost tight, describing strings for which $\nu_S(d)$ is $\Omega((n/d) \log(n/d))$.
Tight Upper and Lower Bounds on Suffix Tree Breadth
The suffix tree, the compacted trie of all the suffixes of a string, is the most important and widely used data structure in string processing. We consider a natural combinatorial question about suffix trees: for a string $S$ of length $n$, how many nodes $\nu_S(d)$ can there be at (string) depth $d$ in its suffix tree? We prove $\nu(n, d) = \max_{S \in \Sigma^n} \nu_S(d)$ is $O((n/d) \log(n/d))$, and show that this bound is asymptotically tight, describing strings for which $\nu_S(d)$ is $\Omega((n/d) \log(n/d))$. (C) 2020 Elsevier B.V. All rights reserved. Peer reviewed
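To make the quantity counted in the abstract concrete, here is a minimal brute-force sketch that counts the suffix tree nodes at a given string depth without building the tree. It rests on assumptions not stated in the abstract: a unique terminator `$` is appended to the string, both internal nodes and leaves are counted, and the function name `nodes_at_depth` is hypothetical. The papers' exact conventions may differ.

```python
from collections import defaultdict


def nodes_at_depth(s: str, d: int, terminator: str = "$") -> int:
    """Count suffix tree nodes of s + terminator at string depth d.

    Assumption: an internal node corresponds to a substring that is
    followed by at least two distinct characters; a leaf corresponds to
    a suffix of the terminated string.
    """
    if d < 0:
        raise ValueError("depth must be non-negative")
    if terminator in s:
        raise ValueError("terminator must not occur in the string")
    t = s + terminator

    # For every length-d substring that has a following character,
    # record the set of characters that follow it.
    followers = defaultdict(set)
    for i in range(len(t) - d):
        followers[t[i:i + d]].add(t[i + d])

    # Branching substrings of length d are the internal nodes at depth d
    # (the empty string at d = 0 is the root).
    internal = sum(1 for chars in followers.values() if len(chars) >= 2)

    # Exactly one suffix of the terminated string has length d
    # whenever 1 <= d <= len(t); it is a leaf at depth d.
    leaf = 1 if 1 <= d <= len(t) else 0
    return internal + leaf


if __name__ == "__main__":
    for depth in range(7):
        print(depth, nodes_at_depth("abab", depth))
```

The sketch takes time roughly proportional to n times d per query, so it is only suitable for checking small examples by hand, not for exploring the asymptotic bounds.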
De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space
Sequencing technologies allow for an in-depth analysis
of biological species, but the size of the generated datasets
introduces a number of analytical challenges. Recently, we
demonstrated the application of numerical sequence representations
and data transformations for the alignment of short
reads to a reference genome. Here, we expand our approach
to de novo assembly of short reads. Our results demonstrate
that highly compressed data can encapsulate the signal sufficiently
to accurately assemble reads into large contigs or complete
genomes.
Alternating normal forms for braids and locally Garside monoids
We describe new types of normal forms for braid monoids, Artin-Tits monoids,
and, more generally, for all monoids in which divisibility has some convenient
lattice properties ("locally Garside monoids"). We show that, in the case of
braids, one of these normal forms coincides with the normal form introduced by
Burckel and deduce that the latter can be computed easily. This approach leads
to a new, simple description for the standard ordering ("Dehornoy order") of
$B_n$ in terms of that of $B_{n-1}$, and to a quadratic upper bound for the
complexity of this ordering.
Linear pattern matching on sparse suffix trees
Packing several characters into one computer word is a simple and natural way
to compress the representation of a string and to speed up its processing.
Exploiting this idea, we propose an index for a packed string, based on a
sparse suffix tree \cite{KU-96} with appropriately defined suffix links.
Assuming, under the standard unit-cost RAM model, that a word can store up to
characters ( the alphabet size), our index takes
space, i.e. the same space as the packed string itself.
The resulting pattern matching algorithm runs in time ,
where is the length of the pattern, is the actual number of characters
stored in a word and is the number of pattern occurrences
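As an illustration of the packing idea in the first sentence of this abstract, the following minimal sketch stores several characters of a small alphabet in one fixed-width integer word. It is not the index proposed in the paper. The function names `pack` and `unpack`, the `word_bits` parameter, and the four-letter DNA alphabet in the usage lines are all assumptions chosen for the example.

```python
def pack(text: str, alphabet: str, word_bits: int = 64):
    """Pack text into integers holding word_bits // bits characters each."""
    code = {c: i for i, c in enumerate(alphabet)}
    # Bits needed per character (at least 1, even for a unary alphabet).
    bits = max(1, (len(alphabet) - 1).bit_length())
    per_word = word_bits // bits
    words = []
    for start in range(0, len(text), per_word):
        w = 0
        # Earlier characters end up in the higher-order bits.
        for c in text[start:start + per_word]:
            w = (w << bits) | code[c]
        words.append(w)
    return words, bits, per_word


def unpack(words, n: int, alphabet: str, bits: int, per_word: int) -> str:
    """Recover the original text of length n from its packed words."""
    mask = (1 << bits) - 1
    out = []
    for k, w in enumerate(words):
        # The last word may hold fewer than per_word characters.
        count = min(per_word, n - k * per_word)
        for j in range(count - 1, -1, -1):
            out.append(alphabet[(w >> (j * bits)) & mask])
    return "".join(out)


if __name__ == "__main__":
    text = "ACGTACGTA"
    words, bits, per_word = pack(text, "ACGT", word_bits=8)
    print(words, bits, per_word)
    print(unpack(words, len(text), "ACGT", bits, per_word))
```

With a 2-bit code and 8-bit words, four characters fit in each word, so a whole-word comparison checks four characters at once. That is the kind of speed-up the abstract alludes to.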
An approach to computing downward closures
The downward closure of a word language is the set of all (not necessarily
contiguous) subwords of its members. It is well-known that the downward closure
of any language is regular. While the downward closure appears to be a powerful
abstraction, algorithms for computing a finite automaton for the downward
closure of a given language have been established only for few language
classes.
This work presents a simple general method for computing downward closures.
For language classes that are closed under rational transductions, it is shown
that the computation of downward closures can be reduced to checking a certain
unboundedness property.
This result is used to prove that downward closures are computable for (i)
every language class with effectively semilinear Parikh images that is closed
under rational transductions, (ii) matrix languages, and (iii) indexed
languages (equivalently, languages accepted by higher-order pushdown automata
of order 2).
Comment: Full version of contribution to ICALP 2015. Comments welcome.
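The definition of the downward closure given at the start of this abstract can be made concrete for a finite language by brute-force enumeration of subwords. This sketch is only an illustration of the definition: it is not the method of the paper, which targets infinite languages and reduces the problem to an unboundedness check. The function name `downward_closure` is hypothetical.

```python
from itertools import combinations


def downward_closure(language):
    """Return all (not necessarily contiguous) subwords of a finite language."""
    closure = set()
    for word in language:
        # Choose every subset of positions; combinations() yields index
        # tuples in increasing order, so character order is preserved.
        for k in range(len(word) + 1):
            for positions in combinations(range(len(word)), k):
                closure.add("".join(word[i] for i in positions))
    return closure


if __name__ == "__main__":
    print(sorted(downward_closure({"abc"})))
```

The enumeration is exponential in the word length, so it only works for tiny finite examples.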