On the maximal sum of exponents of runs in a string
A run is an inclusion-maximal occurrence in a string (as a subinterval) of a
repetition v with a period p such that 2p ≤ |v|. The exponent of a run
is defined as |v|/p and is ≥ 2. We show new bounds on the maximal sum of
exponents of runs in a string of length n. Our upper bound of 4.1n is
better than the best previously known proven bound of 5.6n by Crochemore &
Ilie (2008). The lower bound of 2.035n, obtained using a family of binary
words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the
maximal sum of exponents of runs in a string of length n is smaller than 2n.
Comment: 7 pages, 1 figure
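The definition of a run and its exponent can be made concrete with a small brute-force sketch. This is not the method of the paper; it is a quadratic-time illustration that assumes 0-based positions and reports each run as `(start, end, period)` with `end` inclusive. The function names `runs` and `exponent_sum` are hypothetical.

```python
from fractions import Fraction


def runs(s):
    """Return all runs of s as sorted (start, end, period) triples, end inclusive."""
    n = len(s)
    found = {}  # (start, end) -> smallest period of that interval
    # Periods are tried in increasing order, so the first period recorded
    # for an interval is its smallest one.
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            i = k
            # Extend the stretch of positions k with s[k] == s[k + p].
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[i .. k+p-1] has period p and cannot be extended on either side.
            # It is a run only if its length (k - i) + p is at least 2p.
            if k - i >= p:
                found.setdefault((i, k + p - 1), p)
    return sorted((i, j, p) for (i, j), p in found.items())


def exponent_sum(s):
    """Sum of exponents |v|/p over all runs v of s, as an exact fraction."""
    return sum(Fraction(j - i + 1, p) for i, j, p in runs(s))
```

For example, `runs("aabaabaa")` yields three runs `aa` with period 1 and the whole string with period 3, so the sum of exponents is 2 + 2 + 2 + 8/3 = 26/3.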
Understanding maximal repetitions in strings
The cornerstone of any algorithm computing all repetitions in a string of
length n in O(n) time is the fact that the number of runs (or maximal
repetitions) is O(n). We give a simple proof of this result. As a consequence
of our approach, the stronger result concerning the linearity of the sum of
exponents of all runs follows easily.
Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in
algorithmic stringology. Linear-time algorithms computing runs and
preprocessing for constant-time LCE queries have been known for over a decade.
However, these algorithms assume a linearly-sortable integer alphabet. A recent
breakthrough paper by Bannai et al. (SODA 2015) showed a link between the
two notions: all the runs in a string can be computed via a linear number of
LCE queries. The first to consider these problems over a general ordered
alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an
O(n (log n)^{2/3})-time algorithm for answering O(n) LCE queries. This
result was improved by Gawrychowski et al. (accepted to CPM 2016) to
O(n log log n) time. In this work we note a special non-crossing property
of LCE queries asked in the runs computation. We show that any n such
non-crossing queries can be answered on-line in O(n α(n)) time, which
yields an O(n α(n))-time algorithm for computing runs.
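To illustrate what an LCE query returns and why it is useful for runs, here is a naive sketch that answers a query by direct character comparison. It uses none of the preprocessing or non-crossing machinery described in the abstract, and the helper `periodic_extent` is a hypothetical name introduced only for this illustration.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:] (naive scan)."""
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


def periodic_extent(s, i, p):
    """Inclusive end of the longest substring starting at i that has period p.

    One LCE query between positions i and i + p tells how far the
    period-p pattern continues to the right. Assumes i + p <= len(s).
    """
    return i + p + lce(s, i, i + p) - 1
```

With `s = "aabaabaa"`, `periodic_extent(s, 0, 3)` returns 7, so the period-3 repetition starting at position 0 covers the whole string.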
Lempel-Ziv Factorization May Be Harder Than Computing All Runs
The complexity of computing the Lempel-Ziv factorization and the set of all
runs (= maximal repetitions) is studied in the decision tree model of
computation over ordered alphabet. It is known that both these problems can be
solved by RAM algorithms in O(n log σ) time, where n is the length of
the input string and σ is the number of distinct letters in it. We prove
an Ω(n log σ) lower bound on the number of comparisons required to
construct the Lempel-Ziv factorization and thereby conclude that a popular
technique of computation of runs using the Lempel-Ziv factorization cannot
achieve an o(n log σ) time bound. In contrast with this, we exhibit an
O(n) decision tree algorithm finding all runs in a string. Therefore, in the
decision tree model the runs problem is easier than the Lempel-Ziv
factorization. Thus we support the conjecture that there is a linear RAM
algorithm finding all runs.
Comment: 12 pages, 3 figures, submitted
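For readers unfamiliar with the Lempel-Ziv factorization, a brute-force sketch may help. Several variants of the factorization exist; this one assumes that each factor is either a letter not seen before or the longest prefix of the remaining suffix that also starts at an earlier position (overlaps allowed). It only shows what is being computed, runs in roughly cubic time in the worst case, and says nothing about the comparison bounds discussed in the abstract. The name `lz_factorize` is hypothetical.

```python
def lz_factorize(s):
    """Greedy Lempel-Ziv factorization of s (assumed variant, overlaps allowed)."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        # Try every earlier starting position j and measure the match length.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best = max(best, length)
        # A letter with no earlier occurrence forms a factor of length 1.
        step = max(best, 1)
        factors.append(s[i:i + step])
        i += step
    return factors
```

For example, `lz_factorize("abababb")` gives `['a', 'b', 'abab', 'b']`: the third factor copies from position 0 and overlaps itself.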
On the maximal number of cubic subwords in a string
We investigate the problem of the maximum number of cubic subwords (of the
form www) in a given word. We also consider square subwords (of the form
ww). The problem of the maximum number of squares in a word is not well
understood. Several new results related to this problem are produced in the
paper. We consider two simple problems related to the maximum number of
subwords which are squares or which are highly repetitive; then we provide a
nontrivial estimation for the number of cubes. We show that the maximum number
of squares xx such that x is not a primitive word (nonprimitive squares) in
a word of length n is exactly ⌊n/2⌋ - 1, and the
maximum number of subwords of the form x^k, for k ≥ 3, is exactly n - 2.
In particular, the maximum number of cubes in a word is not greater than n - 2
either. Using very technical properties of occurrences of cubes, we improve
this bound significantly. We show that the maximum number of cubes in a word of
length n is between (1/2)n and (4/5)n. (In particular, we improve the
lower bound from the conference version of the paper.)
Comment: 14 pages
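The quantities counted in this abstract can be checked on small words by brute force. The sketch below assumes that distinct subwords are counted (each cube or square once, regardless of how often it occurs) and enumerates all substrings, so it is only practical for short inputs. All function names are hypothetical.

```python
def substrings(s):
    """Set of all distinct nonempty substrings of s."""
    n = len(s)
    return {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def root_length(w):
    """Length of the primitive root of w.

    Classic trick: the first offset >= 1 at which w reoccurs in w + w
    is the length of its primitive root (len(w) if w is primitive).
    """
    return (w + w).find(w, 1)


def count_cubes(s):
    """Number of distinct substrings of the form www."""
    return sum(1 for w in substrings(s)
               if len(w) % 3 == 0 and w == w[:len(w) // 3] * 3)


def count_nonprimitive_squares(s):
    """Number of distinct substrings xx where x is not primitive."""
    total = 0
    for w in substrings(s):
        half = len(w) // 2
        x = w[:half]
        if len(w) % 2 == 0 and x + x == w and root_length(x) < len(x):
            total += 1
    return total


def count_high_powers(s):
    """Number of distinct substrings of the form x^k with k >= 3."""
    return sum(1 for w in substrings(s) if len(w) // root_length(w) >= 3)
```

On the unary word `"a" * 7` the counts are 2 nonprimitive squares, 5 subwords of the form x^k with k ≥ 3, and 2 cubes, so the first two counts reach the stated values ⌊n/2⌋ - 1 and n - 2 for n = 7.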