
202 research outputs found

On the maximal number of cubic subwords in a string

Author: A. Apostolico
A. Thue
A.S. Fraenkel
C.S. Iliopoulos
D. Damanik
L. Ilie
L. Ilie
M. Crochemore
M. Crochemore
M. Crochemore
M. Crochemore
M. Crochemore
M. Crochemore
M. Giraud
M. Lothaire
M.G. Main
M.G. Main
N.J. Fine
P. Baturo
R.M. Kolpakov
S.J. Puglisi
W. Rytter
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor - 1$, and the maximum number of subwords of the form $x^k$, for $k\ge 3$, is exactly $n-2$. In particular, the maximum number of cubes in a word is not greater than $n-2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.) Comment: 14 page
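The quantities in this abstract can be checked by brute force on short words. The sketch below is only an illustration, not the paper's method: it assumes that subwords are counted as distinct strings, and the function names are made up for this example. On the word $a^n$ it reproduces the values $\lfloor n/2\rfloor - 1$ and $n-2$ quoted above.

```python
def is_primitive(x: str) -> bool:
    """A nonempty word is primitive if it is not a proper power u^j with j >= 2."""
    return len(x) > 0 and x not in (x + x)[1:-1]


def distinct_subwords(w: str) -> set:
    """All distinct nonempty subwords (factors) of w; quadratic in number."""
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def is_power(s: str, min_exp: int) -> bool:
    """True if s = x^k for some word x and some integer k >= min_exp."""
    n = len(s)
    for p in range(1, n // min_exp + 1):
        if n % p == 0 and s == s[:p] * (n // p):
            return True
    return False


def count_nonprimitive_squares(w: str) -> int:
    """Distinct subwords xx in which x is not primitive."""
    total = 0
    for s in distinct_subwords(w):
        h = len(s) // 2
        if len(s) % 2 == 0 and s[:h] == s[h:] and not is_primitive(s[:h]):
            total += 1
    return total


def count_highly_repetitive(w: str) -> int:
    """Distinct subwords of the form x^k with k >= 3."""
    return sum(1 for s in distinct_subwords(w) if is_power(s, 3))


def count_cubes(w: str) -> int:
    """Distinct subwords of the form www."""
    total = 0
    for s in distinct_subwords(w):
        t = len(s) // 3
        if len(s) % 3 == 0 and s == s[:t] * 3:
            total += 1
    return total
```

For example, `count_nonprimitive_squares("a" * 10)` gives 4 and `count_highly_repetitive("a" * 10)` gives 8, matching $\lfloor 10/2\rfloor - 1$ and $10 - 2$.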

arXiv.org e-Print Archive

CiteSeerX

Crossref

Fast Label Extraction in the CDAWG

Author: A Blumer
D Belazzougui
D Gusfield
J Sirén
L Gasieniec
LS Russo
M Crochemore
M Crochemore
M Crochemore
M Crochemore
M Raffinot
MA Bender
O Berkman
T Gagie
V Mäkinen
V Mäkinen
Publication date: 26/09/2017

The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from $O(m\log{\log{n}}+\mathtt{occ})$ to $O(m+\mathtt{occ})$ in the time needed to locate all the $\mathtt{occ}$ occurrences of the pattern. We also reduce from $O(k\log{\log{n}})$ to $O(k)$ the time needed to read the $k$ characters of the label of an edge of the suffix tree of $T$, and we reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to compute the matching statistics between a query of length $m$ and $T$, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG. Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864
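To make the notion of extracting a label from a straight-line program (SLP) concrete, here is a minimal generic sketch. It is not the data structure from the paper: the rule encoding (a dict whose values are either a single character or a pair of symbols), the names `expansion_length` and `read_prefix`, and the Fibonacci-word grammar are all assumptions chosen for illustration. This simple traversal costs time proportional to the grammar height plus $k$, not the $O(k)$ bound of the abstract.

```python
from typing import Dict, Tuple, Union

# A rule is either a terminal character or a pair (left symbol, right symbol).
Rule = Union[str, Tuple[str, str]]


def expansion_length(rules: Dict[str, Rule], symbol: str, memo=None) -> int:
    """Length of the string generated by `symbol`, computed with memoization."""
    if memo is None:
        memo = {}
    if symbol not in memo:
        rule = rules[symbol]
        if isinstance(rule, str):
            memo[symbol] = 1
        else:
            left, right = rule
            memo[symbol] = (expansion_length(rules, left, memo)
                            + expansion_length(rules, right, memo))
    return memo[symbol]


def read_prefix(rules: Dict[str, Rule], symbol: str, k: int) -> str:
    """First k characters of the expansion of `symbol`, without expanding it fully."""
    out = []
    stack = [symbol]
    while stack and len(out) < k:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            out.append(rule)      # terminal: emit one character
        else:
            left, right = rule
            stack.append(right)   # visit the right child after the left one
            stack.append(left)
    return "".join(out)


# Hypothetical example grammar generating Fibonacci words.
FIB_RULES: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}
```

With this grammar, `read_prefix(FIB_RULES, "X6", 5)` reads five characters of a string of length `expansion_length(FIB_RULES, "X6")` while visiting only part of the derivation.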

arXiv.org e-Print Archive

Crossref

A simple algorithm for computing the Lempel-Ziv factorization

Author: Crochemore M.
Ilie L.
Smyth William
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

We give a space-efficient simple algorithm for computing the Lempel-Ziv factorization of a string. For a string of length n over an integer alphabet, it runs in O(n) time independently of alphabet size and uses o(n) additional space.
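For readers unfamiliar with the factorization itself, the following naive sketch shows what is being computed. It is deliberately not the linear-time algorithm of the paper: it runs in quadratic time or worse, and it assumes the common variant in which each factor is the longest prefix of the remaining suffix that also occurs starting at an earlier position (overlaps allowed), or a single letter if there is none. The function name is made up for this example.

```python
def lz_factorization(w: str) -> list:
    """Naive Lempel-Ziv factorization (illustrative only, not linear time)."""
    factors = []
    n = len(w)
    i = 0
    while i < n:
        best = 0
        # Try every earlier starting position j and extend the match greedily.
        for j in range(i):
            k = 0
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a letter with no earlier occurrence is its own factor
        factors.append(w[i:i + length])
        i += length
    return factors
```

For instance, `lz_factorization("abababa")` returns `["a", "b", "ababa"]`; the last factor overlaps its own earlier occurrence.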

Crossref

Research Repository

King's Research Portal

espace@Curtin

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

A Minimal Periods Algorithm with Applications

Author: A. Apostolico
A.O. Slisenko
A.S. Fraenkel
B. Schieber
D. Beauquier
D. Gusfield
D. Gusfield
D. Harel
D. Knuth
E.M. McCreight
J. Duval
J. Stoye
L. Ilie
M. Crochemore
M. Crochemore
M. Crochemore
M. Main
M. Main
M.G. Main
R. Kolpakov
S.R. Kosaraju
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/11/2009

Kosaraju in ``Computation of squares in a string'' briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute in $O(k|w|)$ time the minimal $k$-th power, with period of length larger than $s$, starting at each position in a word $w$ for arbitrary exponent $k\geq2$ and integer $s\geq0$. We provide the complete proof of correctness of the algorithm, which is somehow not completely clear in Kosaraju's original paper. The algorithm can be used as a sub-routine to detect certain types of pseudo-patterns in words, which is our original intention to study the generalization. Comment: 14 page
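The output of such an algorithm can be specified by a brute-force reference implementation. The sketch below only fixes the problem statement; it makes no use of suffix trees and does not achieve the $O(k|w|)$ bound. The function name and the convention of returning `None` where no power exists are assumptions made for this example.

```python
from typing import List, Optional


def minimal_power_periods(w: str, k: int = 2, s: int = 0) -> List[Optional[int]]:
    """For each position i, the smallest period p > s such that w[i:i+k*p]
    is a k-th power, or None if no such power starts at i.
    Brute force for illustration only."""
    n = len(w)
    result: List[Optional[int]] = []
    for i in range(n):
        found = None
        p = s + 1
        while i + k * p <= n:
            if w[i:i + k * p] == w[i:i + p] * k:
                found = p
                break
            p += 1
        result.append(found)
    return result
```

For example, with `w = "aabab"` and the defaults `k=2, s=0`, position 0 starts the square `aa` (period 1) and position 1 starts the square `abab` (period 2).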

arXiv.org e-Print Archive

CiteSeerX

Crossref

Prospecting of rhizobium for soy cultivation in soils with deficient natural drainage in the pampa biome.

Author: CROCHEMORE A. G.
GALARZ L. A.
MATTOS M. L. T.
Publication date: 26/01/2016

Repository Open Access to Scientific Information from Embrapa

Efficient Enumeration of Non-Equivalent Squares in Partial Words with Few Holes

Author: A Deza
A Diaconu
AS Fraenkel
D Gusfield
F Blanchet-Sadri
F Blanchet-Sadri
F Blanchet-Sadri
F Blanchet-Sadri
F Blanchet-Sadri
F Blanchet-Sadri
F Manea
F Manea
L Ilie
M Crochemore
M Crochemore
M Crochemore
M Crochemore
V Halava
Publication venue: HAL CCSD
Publication date: 01/08/2017

A partial word is a word with holes (also called don't cares: special symbols which match any symbol). A p-square is a partial word matching at least one standard square without holes (called a full square). Two p-squares are called equivalent if they match the same sets of full squares. Denote by $\mathit{psquares}(T)$ the number of non-equivalent p-squares which are subwords of a partial word $T$. Let $\mathit{PSQUARES}_k(n)$ be the maximum value of $\mathit{psquares}(T)$ over all partial words of length $n$ with $k$ holes. We show asymptotically tight bounds: $c_1 \cdot \min(nk^2, n^2) \le \mathit{PSQUARES}_k(n) \le c_2 \cdot \min(nk^2, n^2)$ for some constants $c_1, c_2 > 0$. We also present an algorithm that computes $\mathit{psquares}(T)$ in $O(nk^3)$ time for a partial word $T$ of length $n$ with $k$ holes. In particular, our algorithm runs in linear time for $k = O(1)$ and its time complexity near-matches the maximum number of non-equivalent p-squares.
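The definitions of p-square and equivalence can be illustrated with a brute-force sketch. This is not the $O(nk^3)$ algorithm of the paper. It rests on two assumptions that are not in the abstract: holes are written as `?`, and the alphabet has at least two letters, so that two p-squares of the same length match the same full squares exactly when overlaying their two halves yields the same pattern. All names are hypothetical.

```python
from typing import Optional

HOLE = "?"  # assumed notation for a hole (don't care symbol)


def merge_halves(u: str) -> Optional[str]:
    """Overlay the two halves of u. Returns the merged pattern (letters where
    either half fixes one, holes elsewhere), or None if u is not a p-square."""
    if len(u) == 0 or len(u) % 2 != 0:
        return None
    h = len(u) // 2
    merged = []
    for a, b in zip(u[:h], u[h:]):
        if a == HOLE:
            merged.append(b)
        elif b == HOLE or a == b:
            merged.append(a)
        else:
            return None  # two different letters can never match a full square
    return "".join(merged)


def is_p_square(u: str) -> bool:
    """True if the partial word u matches at least one full square."""
    return merge_halves(u) is not None


def psquares(t: str) -> int:
    """Number of non-equivalent p-squares among the subwords of t (brute force),
    using the merged pattern as a canonical form under the stated assumptions."""
    classes = set()
    for i in range(len(t)):
        for j in range(i + 2, len(t) + 1, 2):
            merged = merge_halves(t[i:j])
            if merged is not None:
                classes.add(merged)
    return len(classes)
```

In `"ab?b"`, the subwords `b?` and `?b` both match only `bb` and so fall in one class, while the whole word matches only `abab`, giving two classes.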

Crossref

King's Research Portal

Hal-Diderot

HAL - UPEC / UPEM

Culture collection of multifunctional microorganisms at Embrapa Clima Temperado: implementation of good practices.

Author: ALMEIDA B. M.
CROCHEMORE A. G.
FACIO M. L. P.
GALARZ L. A.
MATTOS M. L. T.
Publication date: 01/01/2013

Repository Open Access to Scientific Information from Embrapa

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Palindromic complexity of trees

Author: A Glen
A Hof
A Luca de
AS Fraenkel
E Domenjoud
JP Allouche
L Balková
M Crochemore
M Lothaire
P Erdös
S Brlek
S Brlek
V Berthé
X Droubay
Publication date: 11/05/2015

We consider finite trees with edges labeled by letters on a finite alphabet $\varSigma$. Each pair of nodes defines a unique labeled path whose trace is a word of the free monoid $\varSigma^*$. The set of all such words defines the language of the tree. In this paper, we investigate the palindromic complexity of trees and provide hints for an upper bound on the number of distinct palindromes in the language of a tree. Comment: Submitted to the conference DLT201
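The language of a tree and its palindromes can be enumerated directly on small inputs. The sketch below is a brute-force illustration under assumptions of its own: the tree is given as a list of `(u, v, letter)` edges, only nonempty words are counted, and the function names are invented for this example.

```python
from collections import defaultdict


def tree_language(edges) -> set:
    """All nonempty words read along simple paths between pairs of nodes.
    `edges` is a list of (u, v, letter) triples describing an undirected tree."""
    adj = defaultdict(list)
    for u, v, letter in edges:
        adj[u].append((v, letter))
        adj[v].append((u, letter))
    words = set()
    for start in list(adj):
        # Depth-first walk away from `start`; in a tree, avoiding the parent
        # is enough to keep every path simple.
        stack = [(start, None, "")]
        while stack:
            node, parent, word = stack.pop()
            if word:
                words.add(word)
            for nxt, letter in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, word + letter))
    return words


def distinct_palindromes(edges) -> set:
    """Distinct nonempty palindromes in the language of the tree."""
    return {w for w in tree_language(edges) if w == w[::-1]}
```

For the path with edge labels `a`, `b`, `a`, the language is `{a, b, ab, ba, aba}` and the palindromes are `a`, `b` and `aba`.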

arXiv.org e-Print Archive

Crossref