187 research outputs found
On the maximal number of cubic subwords in a string
We investigate the problem of the maximum number of cubic subwords (of the
form ) in a given word. We also consider square subwords (of the form
). The problem of the maximum number of squares in a word is not well
understood. Several new results related to this problem are produced in the
paper. We consider two simple problems related to the maximum number of
subwords which are squares or which are highly repetitive; then we provide a
nontrivial estimation for the number of cubes. We show that the maximum number
of squares such that is not a primitive word (nonprimitive squares) in
a word of length is exactly , and the
maximum number of subwords of the form , for , is exactly .
In particular, the maximum number of cubes in a word is not greater than
either. Using very technical properties of occurrences of cubes, we improve
this bound significantly. We show that the maximum number of cubes in a word of
length is between and . (In particular, we improve the
lower bound from the conference version of the paper.)Comment: 14 page
Fast Label Extraction in the CDAWG
The compact directed acyclic word graph (CDAWG) of a string of length
takes space proportional just to the number of right extensions of the
maximal repeats of , and it is thus an appealing index for highly repetitive
datasets, like collections of genomes from similar species, in which grows
significantly more slowly than . We reduce from to
the time needed to count the number of occurrences of a pattern of
length , using an existing data structure that takes an amount of space
proportional to the size of the CDAWG. This implies a reduction from
to in the time needed to
locate all the occurrences of the pattern. We also reduce from
to the time needed to read the characters of the
label of an edge of the suffix tree of , and we reduce from
to the time needed to compute the matching
statistics between a query of length and , using an existing
representation of the suffix tree based on the CDAWG. All such improvements
derive from extracting the label of a vertex or of an arc of the CDAWG using a
straight-line program induced by the reversed CDAWG.Comment: 16 pages, 1 figure. In proceedings of the 24th International
Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv
admin note: text overlap with arXiv:1705.0864
A simple algorithm for computing the Lempel-Ziv factorization
We give a space-efficient simple algorithm for computing the Lempel?Ziv factorization ofa string. For a string of length n over an integer alphabet, it runs in O(n) time independentlyof alphabet size and uses o(n) additional space
A Minimal Periods Algorithm with Applications
Kosaraju in ``Computation of squares in a string'' briefly described a
linear-time algorithm for computing the minimal squares starting at each
position in a word. Using the same construction of suffix trees, we generalize
his result and describe in detail how to compute in O(k|w|)-time the minimal
k-th power, with period of length larger than s, starting at each position in a
word w for arbitrary exponent and integer . We provide the
complete proof of correctness of the algorithm, which is somehow not completely
clear in Kosaraju's original paper. The algorithm can be used as a sub-routine
to detect certain types of pseudo-patterns in words, which is our original
intention to study the generalization.Comment: 14 page
Efficient Enumeration of Non-Equivalent Squares in Partial Words with Few Holes
International audienceA partial word is a word with holes (also called don't cares: special symbols which match any symbol). A p-square is a partial word matching at least one standard square without holes (called a full square). Two p-squares are called equivalent if they match the same sets of full squares. Denote by psquares(T) the number of non-equivalent p-squares which are subwords of a partial word T. Let PSQUARES k (n) be the maximum value of psquares(T) over all partial words of length n with k holes. We show asympthotically tight bounds: c1 · min(nk 2 , n 2) ≤ PSQUARES k (n) ≤ c2 · min(nk 2 , n 2) for some constants c1, c2 > 0. We also present an algorithm that computes psquares(T) in O(nk 3) time for a partial word T of length n with k holes. In particular, our algorithm runs in linear time for k = O(1) and its time complexity near-matches the maximum number of non-equivalent p-squares
Palindromic complexity of trees
We consider finite trees with edges labeled by letters on a finite alphabet
. Each pair of nodes defines a unique labeled path whose trace is a
word of the free monoid . The set of all such words defines the
language of the tree. In this paper, we investigate the palindromic complexity
of trees and provide hints for an upper bound on the number of distinct
palindromes in the language of a tree.Comment: Submitted to the conference DLT201
- …