Computing Lempel-Ziv Factorization Online
We present an algorithm which computes the Lempel-Ziv factorization of a word W
of length n on an alphabet Σ of size σ online in the
following sense: it reads W starting from the left, and, after reading each
r = O(log_σ n) characters of W, updates the Lempel-Ziv
factorization. The algorithm requires O(n log σ) bits of space and
O(n log^2 n) time. The basis of the algorithm is a sparse suffix tree combined
with wavelet trees.
Lempel-Ziv Factorization May Be Harder Than Computing All Runs
The complexity of computing the Lempel-Ziv factorization and the set of all
runs (= maximal repetitions) is studied in the decision tree model of
computation over an ordered alphabet. It is known that both these problems can be
solved by RAM algorithms in O(n log σ) time, where n is the length of
the input string and σ is the number of distinct letters in it. We prove
an Ω(n log σ) lower bound on the number of comparisons required to
construct the Lempel-Ziv factorization and thereby conclude that a popular
technique of computation of runs using the Lempel-Ziv factorization cannot
achieve an o(n log σ) time bound. In contrast with this, we exhibit an
O(n) decision tree algorithm finding all runs in a string. Therefore, in the
decision tree model the runs problem is easier than the Lempel-Ziv
factorization. Thus we support the conjecture that there is a linear RAM
algorithm finding all runs.
Comment: 12 pages, 3 figures, submitte
On-line construction of position heaps
We propose a simple linear-time on-line algorithm for constructing a position
heap for a string [Ehrenfeucht et al., 2011]. Our definition of position heap
differs slightly from the one proposed in [Ehrenfeucht et al., 2011] in that it
considers the suffixes ordered from left to right. Our construction is based on
classic suffix pointers and resembles Ukkonen's algorithm for suffix trees
[Ukkonen, 1995]. Using suffix pointers, the position heap can be extended into
the augmented position heap that allows for a linear-time string matching
algorithm [Ehrenfeucht et al., 2011].
Comment: to appear in Journal of Discrete Algorithms
Minimal Forbidden Factors of Circular Words
Minimal forbidden factors are a useful tool for investigating properties of
words and languages. Two factorial languages are distinct if and only if they
have different (antifactorial) sets of minimal forbidden factors. There exist
algorithms for computing the minimal forbidden factors of a word, as well as of
a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an
algorithm that, given the trie recognizing a finite antifactorial language M,
computes a DFA recognizing the language whose set of minimal forbidden factors
is M. In the same paper, they showed that the obtained DFA is minimal if the
input trie recognizes the minimal forbidden factors of a single word. We
generalize this result to the case of a circular word. We discuss several
combinatorial properties of the minimal forbidden factors of a circular word.
As a byproduct, we obtain a formal definition of the factor automaton of a
circular word. Finally, we investigate the case of minimal forbidden factors of
the circular Fibonacci words.
Comment: To appear in Theoretical Computer Science
A simple algorithm for computing the Lempel-Ziv factorization
We give a space-efficient simple algorithm for computing the Lempel-Ziv factorization of a string. For a string of length n over an integer alphabet, it runs in O(n) time independently of alphabet size and uses o(n) additional space.
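For orientation, the object being computed can be illustrated with a naive sketch. This is not the algorithm of the paper above: it is a quadratic-time brute force, and it assumes one common variant of the definition, in which each factor is either a letter with no earlier occurrence or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). The function name `lz_factorize` is hypothetical.

```python
def lz_factorize(s: str) -> list[str]:
    """Naive Lempel-Ziv factorization (illustrative, quadratic time).

    Assumed variant: each factor is the longest prefix of s[i:] that also
    starts at some earlier position j < i (self-overlap allowed), or a
    single letter when no such prefix exists.
    """
    factors = []
    n = len(s)
    i = 0
    while i < n:
        longest = 0
        # Try every earlier starting position and extend the match greedily.
        for j in range(i):
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            longest = max(longest, k)
        step = max(longest, 1)  # a fresh letter forms a factor of length 1
        factors.append(s[i:i + step])
        i += step
    return factors


if __name__ == "__main__":
    print(lz_factorize("abababb"))
```

Under this variant, the sketch splits "abababb" into a, b, abab, b. The linear-time, small-space behaviour claimed in the abstract requires the techniques of the paper itself, which this sketch does not attempt.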
Fast detection of specific fragments against a set of sequences
We design alignment-free techniques for comparing a sequence or word, called
a target, against a set of words, called a reference. A target-specific factor
of a target T against a reference R is a factor w of a word in T which
is not a factor of a word of R and such that any proper factor of w is a
factor of a word of R. We first address the computation of the set of
target-specific factors of a target T against a reference R, where T and
R are finite sets of sequences. The result is the construction of an
automaton accepting the set of all considered target-specific factors. The
construction algorithm runs in linear time according to the size of T ∪ R.
The second result consists of the design of an algorithm to compute all the
occurrences in a single sequence T of its target-specific factors against a
reference R. The algorithm runs in real time on the target sequence,
independently of the number of occurrences of target-specific factors.
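The definition of a target-specific factor can be checked directly on small inputs. The sketch below is a brute-force enumeration, not the automaton-based linear-time construction described in the abstract; the function name and the example words are hypothetical. It relies on the observation that the set of factors of the reference is closed under taking factors, so it suffices to test the two longest proper factors of a candidate.

```python
def target_specific_factors(target: list[str], reference: list[str]) -> set[str]:
    """Brute-force enumeration of target-specific factors (illustrative only).

    A word w qualifies if it occurs in some target word, occurs in no
    reference word, and all of its proper factors occur in the reference.
    """
    def in_reference(w: str) -> bool:
        # True if w is a factor of at least one reference word.
        return any(w in r for r in reference)

    result = set()
    for t in target:
        for i in range(len(t)):
            for j in range(i + 1, len(t) + 1):
                w = t[i:j]
                if in_reference(w):
                    continue
                # Every proper factor of w is a factor of w[1:] or w[:-1],
                # so checking these two covers all proper factors.
                if in_reference(w[1:]) and in_reference(w[:-1]):
                    result.add(w)
    return result


if __name__ == "__main__":
    print(target_specific_factors(["abba"], ["abab"]))
```

For the hypothetical target word "abba" and reference word "abab", the only target-specific factor is "bb": it is absent from the reference, while its proper factor "b" is present.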
Factor oracle: a new structure for pattern matching
We introduce a new automaton on a word p, a sequence of letters taken from an alphabet Σ, that we call the factor oracle. This automaton is acyclic, recognizes at least the factors of p, and has m+1 states, where m is the length of p, and a linear number of transitions. We give an on-line construction to build it. We use this new structure in string matching algorithms that we conjecture to be optimal according to the experimental results. These algorithms are as efficient as existing ones while using less memory and being easier to implement.
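The on-line construction mentioned above can be sketched as follows. This follows the commonly cited formulation with one "supply" link per state; the function names and the dictionary-based representation are assumptions made for illustration, not taken from the abstract.

```python
def build_factor_oracle(p: str):
    """On-line factor oracle construction (sketch).

    Returns (trans, supply): trans[q] maps a letter to the target state of
    the transition leaving q, and supply[q] is the supply link of state q.
    State 0 is initial; reading p[:i] along the spine leads to state i.
    """
    trans = [{}]
    supply = [-1]
    for i, c in enumerate(p, start=1):
        trans.append({})
        trans[i - 1][c] = i          # spine transition i-1 -> i
        k = supply[i - 1]
        # Follow supply links, adding the missing transitions on c.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i
            k = supply[k]
        supply.append(0 if k == -1 else trans[k][c])
    return trans, supply


def oracle_accepts(trans, w: str) -> bool:
    """Every state is accepting, so w is accepted iff its path exists."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True


if __name__ == "__main__":
    trans, _ = build_factor_oracle("abbbaab")
    print(len(trans), oracle_accepts(trans, "bbaa"), oracle_accepts(trans, "aba"))
```

For the word "abbbaab", the sketch builds 8 states (m+1 with m = 7) and accepts every factor, but it also accepts "aba", which is not a factor. This is what "recognizes at least the factors of p" allows.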
Optimal Parallel Construction of Minimal Suffix and Factor Automata
This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well-known relation between suffix and factor automata and suffix trees.