Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in
algorithmic stringology. Linear-time algorithms computing runs and
preprocessing for constant-time LCE queries have been known for over a decade.
However, these algorithms assume a linearly-sortable integer alphabet. A recent
breakthrough paper by Bannai et al. (SODA 2015) showed a link between the
two notions: all the runs in a string can be computed via a linear number of
LCE queries. The first to consider these problems over a general ordered
alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an
-time algorithm for answering LCE queries. This
result was improved by Gawrychowski et al. (accepted to CPM 2016) to time. In this work we note a special non-crossing property
of LCE queries asked in the runs computation. We show that any such
non-crossing queries can be answered on-line in time, which
yields an -time algorithm for computing runs
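For readers unfamiliar with the query this abstract is about, here is a minimal sketch of a longest common extension (LCE) query over a general ordered alphabet. It is the naive scan that uses only symbol comparisons, not the near-optimal method of the paper; the function name `lce` and its signature are illustrative assumptions.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:].

    Naive baseline (an assumption for illustration): it only compares
    symbols for equality, so it works over any alphabet, but a single
    query may take time linear in the length of s.
    """
    n = len(s)
    k = 0
    # Extend the match while both positions stay inside the string.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k
```

For example, `lce("abaababa", 0, 3)` compares `abaababa` with `ababa` and stops at the first mismatch after three matching symbols.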
Minimal Suffix and Rotation of a Substring in Optimal Time
For a text given in advance, the substring minimal suffix queries ask to
determine the lexicographically minimal non-empty suffix of a substring
specified by the location of its occurrence in the text. We develop a data
structure answering such queries optimally: in constant time after linear-time
preprocessing. This improves upon the results of Babenko et al. (CPM 2014),
whose trade-off solution is characterized by product of these
time complexities. Next, we extend our queries to support concatenations of
substrings, for which the construction and query times are preserved. We
apply these generalized queries to compute lexicographically minimal and
maximal rotations of a given substring in constant time after linear-time
preprocessing.
Our data structures mainly rely on properties of Lyndon words and Lyndon
factorizations. We combine them with further algorithmic and combinatorial
tools, such as fusion trees and the notion of order isomorphism of strings
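The connection between Lyndon factorizations and minimal suffixes can be sketched with Duval's classical algorithm: the last factor of the Lyndon factorization of a non-empty string is its lexicographically minimal non-empty suffix. This is a plain linear-time scan of one string, not the constant-time substring data structure described in the abstract; the function names below are illustrative assumptions.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    n = len(s)
    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        # Grow the current run of repeated Lyndon words as far as possible.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def minimal_suffix(s):
    """Minimal non-empty suffix of a non-empty string s (the last Lyndon factor)."""
    return lyndon_factorization(s)[-1]
```

On `"banana"` the factorization is `b`, `an`, `an`, `a`, so the minimal suffix is `a`.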
Lyndon Array Construction during Burrows-Wheeler Inversion
In this paper we present an algorithm to compute the Lyndon array of a string
of length as a byproduct of the inversion of the Burrows-Wheeler
transform of . Our algorithm runs in linear time using only a stack in
addition to the data structures used for Burrows-Wheeler inversion. We compare
our algorithm with two other linear-time algorithms for Lyndon array
construction and show that computing the Burrows-Wheeler transform and then
constructing the Lyndon array is competitive compared to the known approaches.
We also propose a new balanced parenthesis representation for the Lyndon array
that uses bits of space and supports constant time access. This
representation can be built in linear time using words of space, or in
time using asymptotically the same space as
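As a reference point for what a Lyndon array is, the sketch below computes it from suffix ranks with a stack, using the known characterization that the longest Lyndon word starting at position i ends just before the next position whose suffix is lexicographically smaller. This is not the Burrows-Wheeler inversion algorithm of the paper: the ranks are obtained here by naively sorting suffixes, which is far from linear time, and the name `lyndon_array` is an illustrative assumption.

```python
def lyndon_array(s):
    """result[i] = length of the longest Lyndon word starting at position i."""
    n = len(s)
    # Naive suffix ranks (assumption for brevity; not linear time).
    order = sorted(range(n), key=lambda i: s[i:])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    result = [0] * n
    stack = []  # positions to the right of i, with ranks increasing toward the top
    for i in range(n - 1, -1, -1):
        # Discard positions whose suffix is larger than the suffix at i.
        while stack and rank[stack[-1]] > rank[i]:
            stack.pop()
        # The top is now the next position with a smaller suffix, if any.
        nxt = stack[-1] if stack else n
        result[i] = nxt - i
        stack.append(i)
    return result
```

For `"banana"` the suffix ranks are 3, 2, 5, 1, 4, 0, which gives the Lyndon array 1, 2, 1, 2, 1, 1.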
Internal Pattern Matching Queries in a Text and Applications
We consider several types of internal queries: questions about subwords of a
text. As the main tool we develop an optimal data structure for the problem
called here internal pattern matching. This data structure provides
constant-time answers to queries about occurrences of one subword in
another subword of a given text, assuming that ,
which allows for a constant-space representation of all occurrences. This
problem can be viewed as a natural extension of the well-studied pattern
matching problem. The data structure has linear size and admits a linear-time
construction algorithm.
Using the solution to the internal pattern matching problem, we obtain very
efficient data structures answering queries about: primitivity of subwords,
periods of subwords, general substring compression, and cyclic equivalence of
two subwords. All these results improve upon the best previously known
counterparts. The linear construction time of our data structure also allows us to
improve the algorithm for finding -subrepetitions in a text (a more
general version of maximal repetitions, also called runs). For any fixed
we obtain the first linear-time algorithm, which matches the linear
time complexity of the algorithm computing runs. Our data structure has already
been used as a part of the efficient solutions for subword suffix rank &
selection, as well as substring compression using Burrows-Wheeler transform
composed with run-length encoding.
Comment: 31 pages, 9 figures; accepted to SODA 201
Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings
We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding.
Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2.
We also show an algorithm for computing all maximal repetitions in O(m alpha(m)) time and O(m) space, where alpha denotes the inverse Ackermann function
Fast Computation of Abelian Runs
Given a word and a Parikh vector , an abelian run of period
in is a maximal occurrence of a substring of having
abelian period . Our main result is an online algorithm that,
given a word of length over an alphabet of cardinality and a
Parikh vector , returns all the abelian runs of period
in in time and space , where is the
norm of , i.e., the sum of its components. We also present an
online algorithm that computes all the abelian runs with periods of norm in
in time , for any given norm . Finally, we give an -time
offline randomized algorithm for computing all the abelian runs of . Its
deterministic counterpart runs in time.
Comment: To appear in Theoretical Computer Science