Search CORE


On Suffix Tree Breadth

Author: Badkobeh Golnaz
Kärkkäinen Juha
Puglisi Simon
Zhukova Bella
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/09/2017

The suffix tree (the compacted trie of all the suffixes of a string) is the most important and widely used data structure in string processing. We consider a natural combinatorial question about suffix trees: for a string $S$ of length $n$, how many nodes $\nu_S(d)$ can there be at (string) depth $d$ in its suffix tree? We prove that $\nu(n,d) = \max_{S \in \Sigma^n} \nu_S(d)$ is $O((n/d)\log n)$, and show that this bound is almost tight, describing strings for which $\nu_S(d)$ is $\Omega((n/d)\log(n/d))$.

Goldsmiths Research Online

Crossref

Tight Upper and Lower Bounds on Suffix Tree Breadth

Author: Badkobeh Golnaz
Gawrychowski Pawel
Kärkkäinen Juha
Puglisi Simon
Zhukova Bella
Publication date: 01/01/2021

The suffix tree (the compacted trie of all the suffixes of a string) is the most important and widely used data structure in string processing. We consider a natural combinatorial question about suffix trees: for a string $S$ of length $n$, how many nodes $\nu_S(d)$ can there be at (string) depth $d$ in its suffix tree? We prove that $\nu(n,d) = \max_{S \in \Sigma^n} \nu_S(d)$ is $O((n/d)\log(n/d))$, and show that this bound is asymptotically tight, describing strings for which $\nu_S(d)$ is $\Omega((n/d)\log(n/d))$. © 2020 Elsevier B.V. All rights reserved. Peer reviewed.

Goldsmiths Research Online

Helsingin yliopiston digitaalinen arkisto
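To make the quantity $\nu_S(d)$ from the two suffix-tree abstracts above concrete, here is a brute-force Python sketch. It assumes (our assumption, not necessarily the papers' exact counting convention) that only internal, branching nodes are counted and that a unique terminator `$`, absent from the input, is appended. Under that assumption an internal node at string depth $d$ corresponds to a length-$d$ substring whose occurrences are followed by at least two distinct symbols. The helper names are hypothetical, the exhaustive maximization is feasible only for very small $n$, and none of this is the construction used in the papers.

```python
from collections import defaultdict
from itertools import product


def branching_nodes_at_depth(s: str, d: int, terminator: str = "$") -> int:
    """Count internal (branching) nodes at string depth d in the suffix tree of s + terminator.

    Assumes the terminator does not occur in s. A length-d substring w is an
    internal node exactly when its occurrences are followed by >= 2 distinct
    symbols (the terminator counting as one). Naive O(n * d) time.
    """
    if d < 1:
        raise ValueError("depth d must be at least 1")
    if terminator in s:
        raise ValueError("terminator must not occur in s")
    n = len(s)
    followers = defaultdict(set)  # substring -> set of symbols that follow it
    for i in range(n - d + 1):
        w = s[i:i + d]
        followers[w].add(s[i + d] if i + d < n else terminator)
    return sum(1 for f in followers.values() if len(f) >= 2)


def max_breadth(n: int, d: int, alphabet: str = "ab") -> int:
    """Brute-force nu(n, d): maximize over all strings of length n (tiny n only)."""
    return max(
        branching_nodes_at_depth("".join(t), d)
        for t in product(alphabet, repeat=n)
    )
```

For example, `branching_nodes_at_depth("abab", 1)` returns 1: only "b" is followed by two distinct symbols ("a" and the terminator), while "a" is always followed by "b" and so has no node of its own at depth 1.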

De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

Author: Robertson David L.
Tapinos Avraam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2017

Sequencing technologies allow for an in-depth analysis of biological species, but the size of the generated datasets introduces a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations to the alignment of short reads to a reference genome. Here, we extend our approach to the de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal sufficiently to accurately assemble reads into large contigs or complete genomes.

Crossref

Enlighten

Alternating normal forms for braids and locally Garside monoids

Author: Dehornoy Patrick
Publication date: 01/01/2007

We describe new types of normal forms for braid monoids, Artin-Tits monoids, and, more generally, for all monoids in which divisibility has some convenient lattice properties ("locally Garside monoids"). We show that, in the case of braids, one of these normal forms coincides with the normal form introduced by Burckel, and deduce that the latter can be computed easily. This approach leads to a new, simple description of the standard ordering ("Dehornoy order") of $B_n$ in terms of that of $B_{n-1}$, and to a quadratic upper bound for the complexity of this ordering.

arXiv.org e-Print Archive

HAL - Normandie Université

CiteSeerX

Elsevier - Publisher Connector

Linear pattern matching on sparse suffix trees

Author: Kolpakov Roman
Kucherov Gregory
Starikovskaya Tatiana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/03/2011

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree [KU-96] with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_{\sigma} n$ characters ($\sigma$ the alphabet size), our index takes $O(n/\log_{\sigma} n)$ space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m + r^2 + r \cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word, and $occ$ is the number of pattern occurrences.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
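The packing idea in the sparse-suffix-tree abstract above can be illustrated with a minimal Python sketch. It assumes fixed-width codes of $\lceil \log_2 \sigma \rceil$ bits per character and a simulated 64-bit word, so that $r = \lfloor 64 / \lceil \log_2 \sigma \rceil \rfloor$ characters fit in each word. The function names `pack` and `char_at` are hypothetical, and the sketch shows only the packed representation, not the paper's index or its suffix links.

```python
import math


def pack(s: str, alphabet: str, word_bits: int = 64) -> tuple[list[int], int, int]:
    """Pack s into word_bits-wide integers using fixed-width character codes.

    Returns (words, r, bits): each word holds r characters of `bits` bits each.
    """
    bits = max(1, math.ceil(math.log2(len(alphabet))))
    r = word_bits // bits  # characters stored per word
    code = {c: i for i, c in enumerate(alphabet)}
    words = []
    for start in range(0, len(s), r):
        w = 0
        for j, c in enumerate(s[start:start + r]):
            # character j of this block occupies bits [j*bits, (j+1)*bits)
            w |= code[c] << (j * bits)
        words.append(w)
    return words, r, bits


def char_at(words: list[int], i: int, r: int, bits: int, alphabet: str) -> str:
    """Recover the i-th character of the original string from the packed words."""
    word = words[i // r]
    return alphabet[(word >> ((i % r) * bits)) & ((1 << bits) - 1)]
```

Packing "acgtacgt" over the alphabet "acgt" uses 2 bits per character, so $r = 32$ and the whole string fits in a single word. In general $n$ characters need $\lceil n/r \rceil$ words, which is $O(n/\log_{\sigma} n)$ when the word size is $\Theta(\log n)$ bits.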

An approach to computing downward closures

Author: A Ehrenfeucht
AN Maslov
AV Aho
B Courcelle
H Gruber
H Seki
J Dassow
J Leeuwen van
JE Hopcroft
LH Haines
M Jantzen
P Habermehl
PA Abdulla
R Mayr
RH Gilman
T Hayashi
T Smith
Publication date: 01/06/2015

The downward closure of a word language is the set of all (not necessarily contiguous) subwords of its members. It is well known that the downward closure of any language is regular. While the downward closure appears to be a powerful abstraction, algorithms for computing a finite automaton for the downward closure of a given language have been established for only a few language classes. This work presents a simple general method for computing downward closures. For language classes that are closed under rational transductions, it is shown that the computation of downward closures can be reduced to checking a certain unboundedness property. This result is used to prove that downward closures are computable for (i) every language class with effectively semilinear Parikh images that is closed under rational transductions, (ii) matrix languages, and (iii) indexed languages (equivalently, languages accepted by higher-order pushdown automata of order 2). Comment: Full version of contribution to ICALP 2015. Comments welcome.

arXiv.org e-Print Archive

Crossref
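The definition of the downward closure in the abstract above can be made concrete for a finite language, where the closure is itself finite. The following Python sketch, with hypothetical helper names, checks the (scattered) subword relation greedily and enumerates the closure by brute force. It illustrates only the definition; it is not the paper's method, which handles infinite language classes by reduction to an unboundedness check.

```python
from itertools import combinations


def is_subword(u: str, v: str) -> bool:
    """True if u is a (not necessarily contiguous) subword of v.

    Greedy matching: each `ch in it` advances the shared iterator past the
    first occurrence of ch, so later letters of u must match later in v.
    """
    it = iter(v)
    return all(ch in it for ch in u)


def downward_closure(language: set[str]) -> set[str]:
    """All subwords of all members of a finite language (exponential; tiny inputs only)."""
    closure = set()
    for w in language:
        for k in range(len(w) + 1):
            for positions in combinations(range(len(w)), k):
                closure.add("".join(w[i] for i in positions))
    return closure
```

For instance, the closure of {"aba"} contains "aa", even though "aa" is not a contiguous factor of "aba".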