21,021 research outputs found
Repetition Detection in a Dynamic String
A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^o(1) time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool - maintaining prefix-suffix matches of two dynamic strings.
We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting
Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in
algorithmic stringology. Linear-time algorithms computing runs and
preprocessing for constant-time LCE queries have been known for over a decade.
However, these algorithms assume a linearly-sortable integer alphabet. A recent
breakthrough paper by Bannai et.\ al.\ (SODA 2015) showed a link between the
two notions: all the runs in a string can be computed via a linear number of
LCE queries. The first to consider these problems over a general ordered
alphabet was Kosolobov (\emph{Inf.\ Process.\ Lett.}, 2016), who presented an
-time algorithm for answering LCE queries. This
result was improved by Gawrychowski et.\ al.\ (accepted to CPM 2016) to time. In this work we note a special \emph{non-crossing} property
of LCE queries asked in the runs computation. We show that any such
non-crossing queries can be answered on-line in time, which
yields an -time algorithm for computing runs
Fast Algorithm for Partial Covers in Words
A factor of a word is a cover of if every position in lies
within some occurrence of in . A word covered by thus
generalizes the idea of a repetition, that is, a word composed of exact
concatenations of . In this article we introduce a new notion of
-partial cover, which can be viewed as a relaxed variant of cover, that
is, a factor covering at least positions in . We develop a data
structure of size (where ) that can be constructed in time which we apply to compute all shortest -partial covers for a
given . We also employ it for an -time algorithm computing
a shortest -partial cover for each
Efficient Seeds Computation Revisited
The notion of the cover is a generalization of a period of a string, and
there are linear time algorithms for finding the shortest cover. The seed is a
more complicated generalization of periodicity, it is a cover of a superstring
of a given string, and the shortest seed problem is of much higher algorithmic
difficulty. The problem is not well understood, no linear time algorithm is
known. In the paper we give linear time algorithms for some of its versions ---
computing shortest left-seed array, longest left-seed array and checking for
seeds of a given length. The algorithm for the last problem is used to compute
the seed array of a string (i.e., the shortest seeds for all the prefixes of
the string) in time. We describe also a simpler alternative algorithm
computing efficiently the shortest seeds. As a by-product we obtain an
time algorithm checking if the shortest seed has length at
least and finding the corresponding seed. We also correct some important
details missing in the previously known shortest-seed algorithm (Iliopoulos et
al., 1996).Comment: 14 pages, accepted to CPM 201
Internal Pattern Matching Queries in a Text and Applications
We consider several types of internal queries: questions about subwords of a
text. As the main tool we develop an optimal data structure for the problem
called here internal pattern matching. This data structure provides
constant-time answers to queries about occurrences of one subword in
another subword of a given text, assuming that ,
which allows for a constant-space representation of all occurrences. This
problem can be viewed as a natural extension of the well-studied pattern
matching problem. The data structure has linear size and admits a linear-time
construction algorithm.
Using the solution to the internal pattern matching problem, we obtain very
efficient data structures answering queries about: primitivity of subwords,
periods of subwords, general substring compression, and cyclic equivalence of
two subwords. All these results improve upon the best previously known
counterparts. The linear construction time of our data structure also allows to
improve the algorithm for finding -subrepetitions in a text (a more
general version of maximal repetitions, also called runs). For any fixed
we obtain the first linear-time algorithm, which matches the linear
time complexity of the algorithm computing runs. Our data structure has already
been used as a part of the efficient solutions for subword suffix rank &
selection, as well as substring compression using Burrows-Wheeler transform
composed with run-length encoding.Comment: 31 pages, 9 figures; accepted to SODA 201
Searching of gapped repeats and subrepetitions in a word
A gapped repeat is a factor of the form where and are nonempty
words. The period of the gapped repeat is defined as . The gapped
repeat is maximal if it cannot be extended to the left or to the right by at
least one letter with preserving its period. The gapped repeat is called
-gapped if its period is not greater than . A
-subrepetition is a factor which exponent is less than 2 but is not
less than (the exponent of the factor is the quotient of the length
and the minimal period of the factor). The -subrepetition is maximal if
it cannot be extended to the left or to the right by at least one letter with
preserving its minimal period. We reveal a close relation between maximal
gapped repeats and maximal subrepetitions. Moreover, we show that in a word of
length the number of maximal -gapped repeats is bounded by
and the number of maximal -subrepetitions is bounded by
. Using the obtained upper bounds, we propose algorithms for
finding all maximal -gapped repeats and all maximal
-subrepetitions in a word of length . The algorithm for finding all
maximal -gapped repeats has time complexity for the case
of constant alphabet size and time complexity for the
general case. For finding all maximal -subrepetitions we propose two
algorithms. The first algorithm has time
complexity for the case of constant alphabet size and time complexity for the general case. The
second algorithm has
expected time complexity
- …