The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over ordered alphabet. It is known that both these problems can be solved by RAM algorithms in $O(n\log\sigma)$ time, where $n$ is the length of the input string and $\sigma$ is the number of distinct letters in it. We prove an $\Omega(n\log\sigma)$ lower bound on the number of comparisons required to construct the Lempel-Ziv factorization and thereby conclude that a popular technique of computation of runs using the Lempel-Ziv factorization cannot achieve an $o(n\log\sigma)$ time bound. In contrast with this, we exhibit an $O(n)$ decision tree algorithm finding all runs in a string. Therefore, in the decision tree model the runs problem is easier than the Lempel-Ziv factorization. Thus we support the conjecture that there is a linear RAM algorithm finding all runs. Comment: 12 pages, 3 figures, submitte
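
To make the notion of a run concrete, here is a minimal brute-force sketch. It is not the $O(n)$ decision tree algorithm from the abstract; it is a quadratic enumeration written only for illustration, and the name `runs_naive` is hypothetical. It assumes the usual definition: a run is a maximal substring whose length is at least twice its smallest period.

```python
def runs_naive(s):
    """Enumerate all runs of s as (start, end, period), end inclusive.

    Quadratic-time illustration only, not the linear algorithm of the paper.
    """
    n = len(s)
    found = {}  # (start, end) -> smallest period seen for that interval
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch of positions k with s[k] == s[k + p].
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p - 1  # last index of the substring with period p
            if end - start + 1 >= 2 * p:
                # Periods are tried in increasing order, so the first
                # period recorded for an interval is its smallest one.
                found.setdefault((start, end), p)
    return sorted((i, j, p) for (i, j), p in found.items())


print(runs_naive("mississippi"))
```

For "mississippi" this reports the run "ississi" (positions 1 to 7, period 3) together with the three squares "ss", "ss" and "pp".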

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

On-line construction of position heaps

Author: A. Blumer
A. Ehrenfeucht
D. Gusfield
E. Coffman
E. Fredkin
E. Ukkonen
J.I. Munro
M. Crochemore
T. Cormen
Publication date: 01/01/2011

We propose a simple linear-time on-line algorithm for constructing a position heap for a string [Ehrenfeucht et al, 2011]. Our definition of position heap differs slightly from the one proposed in [Ehrenfeucht et al, 2011] in that it considers the suffixes ordered from left to right. Our construction is based on classic suffix pointers and resembles Ukkonen's algorithm for suffix trees [Ukkonen, 1995]. Using suffix pointers, the position heap can be extended into the augmented position heap that allows for a linear-time string matching algorithm [Ehrenfeucht et al, 2011]. Comment: to appear in Journal of Discrete Algorithm

arXiv.org e-Print Archive

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Author index volume 45 (1986)

Publication venue: Published by Elsevier B.V.

Elsevier - Publisher Connector

Minimal Forbidden Factors of Circular Words

Author: AJ Pinho
C Barton
D Belazzougui
F Mignosi
G Fici
M Béal
M Crochemore
S Chairungsee
Publication date: 01/01/2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We generalize this result to the case of a circular word. We discuss several combinatorial properties of the minimal forbidden factors of a circular word. As a byproduct, we obtain a formal definition of the factor automaton of a circular word. Finally, we investigate the case of minimal forbidden factors of the circular Fibonacci words. Comment: To appear in Theoretical Computer Scienc
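
As an illustration of the definition only, the following brute-force sketch lists the minimal forbidden factors of an ordinary (linear, not circular) word. It is not one of the algorithms the abstract refers to. The function name and the optional `alphabet` parameter are assumptions made for this example. A word u is reported when u is not a factor of w but both u without its first letter and u without its last letter are factors.

```python
def minimal_forbidden_factors(w, alphabet=None):
    """Brute-force minimal forbidden factors of the linear word w.

    Illustration only: it materialises all factors of w, which is quadratic.
    By default the alphabet is the set of letters occurring in w.
    """
    letters = sorted(set(w) if alphabet is None else set(alphabet))
    n = len(w)
    # All factors of w, including the empty word.
    factors = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    mff = set()
    for x in factors:            # x plays the role of u without its last letter
        for b in letters:
            u = x + b
            if u not in factors and u[1:] in factors:
                mff.add(u)
    return sorted(mff, key=lambda u: (len(u), u))


print(minimal_forbidden_factors("ab"))         # ['aa', 'ba', 'bb']
print(minimal_forbidden_factors("ab", "abc"))  # ['c', 'aa', 'ba', 'bb']
```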

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

A simple algorithm for computing the Lempel-Ziv factorization

Author: Crochemore M.
Ilie L.
Smyth William
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

We give a space-efficient simple algorithm for computing the Lempel-Ziv factorization of a string. For a string of length n over an integer alphabet, it runs in O(n) time independently of alphabet size and uses o(n) additional space
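
For readers unfamiliar with the factorization itself, here is a naive sketch. It is not the O(n) space-efficient algorithm of the abstract. It assumes the common definition in which each factor is either a letter that has not occurred before or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). The name `lz_factorize` is hypothetical.

```python
def lz_factorize(s):
    """Naive Lempel-Ziv factorization of s (illustration only, cubic worst case)."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        # Try every earlier starting position j and measure the match length.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh letter forms a factor of length 1
        factors.append(s[i:i + step])
        i += step
    return factors


print(lz_factorize("abababb"))  # ['a', 'b', 'abab', 'b']
```

Note that the third factor overlaps its own earlier occurrence, which the definition assumed here allows.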

Crossref

Research Repository

King's Research Portal

espace@Curtin

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Fast detection of specific fragments against a set of sequences

Author: Béal Marie-Pierre
Crochemore Maxime
Publication date: 06/04/2023

We design alignment-free techniques for comparing a sequence or word, called a target, against a set of words, called a reference. A target-specific factor of a target $T$ against a reference $R$ is a factor $w$ of a word in $T$ which is not a factor of a word of $R$ and such that any proper factor of $w$ is a factor of a word of $R$. We first address the computation of the set of target-specific factors of a target $T$ against a reference $R$, where $T$ and $R$ are finite sets of sequences. The result is the construction of an automaton accepting the set of all considered target-specific factors. The construction algorithm runs in linear time according to the size of $T\cup R$. The second result consists of the design of an algorithm to compute all the occurrences in a single sequence $T$ of its target-specific factors against a reference $R$. The algorithm runs in real-time on the target sequence, independently of the number of occurrences of target-specific factors
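
The definition of a target-specific factor can be checked directly on small inputs. The sketch below does so by brute force; it is not the linear-time automaton construction described in the abstract. The function names are hypothetical, and the empty word is assumed to count as a factor of the reference.

```python
def all_factors(words):
    """Set of all factors (substrings) of the given words, empty word included."""
    return {w[i:j]
            for w in words
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}


def target_specific_factors(target, reference):
    """Brute-force target-specific factors of `target` against `reference`.

    Both arguments are collections of strings. Illustration only.
    """
    ft = all_factors(target)
    fr = all_factors(reference) | {""}  # assumption: empty word always allowed
    # w occurs in the target, not in the reference, and its two maximal
    # proper factors (hence all proper factors) occur in the reference.
    return sorted(w for w in ft
                  if w not in fr and w[1:] in fr and w[:-1] in fr)


print(target_specific_factors(["abc"], ["ab", "bc"]))  # ['abc']
print(target_specific_factors(["abd"], ["ab"]))        # ['d']
```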

arXiv.org e-Print Archive

Factor oracle : a new structure for pattern matching

Author: Allauzen Cyril
Crochemore Maxime
Raffinot Mathieu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1999

We introduce a new automaton on a word p, a sequence of letters taken from an alphabet Σ, that we call the factor oracle. This automaton is acyclic, recognizes at least the factors of p, has m+1 states (m being the length of p) and a linear number of transitions. We give an on-line construction to build it. We use this new structure in string matching algorithms that we conjecture to be optimal according to the experimental results. These algorithms are as efficient as the existing ones while using less memory and being easier to implement.
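
The on-line construction can be sketched roughly as follows. This follows the commonly described version of the algorithm, which uses a supply (failure) link per state; the function names and the dictionary-based transition table are choices made for this example and are not taken from the paper.

```python
def build_factor_oracle(p):
    """On-line factor oracle construction for the word p (sketch).

    Returns (trans, supply): trans[q] maps a letter to the next state,
    supply[q] is the supply link of state q (-1 for the initial state).
    """
    m = len(p)
    trans = [dict() for _ in range(m + 1)]  # states 0..m
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        c = p[i - 1]
        trans[i - 1][c] = i                 # transition along the spine
        k = supply[i - 1]
        # Follow supply links, adding transitions to i where c is missing.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else trans[k][c]
    return trans, supply


def accepts(trans, word):
    """True if the oracle can read `word` from the initial state."""
    state = 0
    for c in word:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True


trans, _ = build_factor_oracle("abbbaab")
print(accepts(trans, "bbaa"))  # True, a genuine factor
print(accepts(trans, "abba"))  # True, although "abba" is not a factor
```

The second call shows why the abstract says "at least the factors": the oracle for "abbbaab" also reads "abba", which does not occur in the word.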

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Optimal Parallel Construction of Minimal Suffix and Factor Automata

Author: Breslauer Dany
Hariharan Ramesh
Publication venue: 'Aarhus University Library'
Publication date: 01/01/1995

This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well known relation between suffix and factor automata and suffix trees

CiteSeerX

Tidsskrift.dk (Det Kongelige Bibliotek)

MPG.PuRe