15 research outputs found
Computing Runs on a General Alphabet
We describe a RAM algorithm computing all runs (maximal repetitions) of a
given string of length over a general ordered alphabet in
time and linear space. Our algorithm outperforms all
known solutions working in time provided , where is the alphabet size. We conjecture that there
exists a linear time RAM algorithm finding all runs.Comment: 4 pages, 2 figure
Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in
algorithmic stringology. Linear-time algorithms computing runs and
preprocessing for constant-time LCE queries have been known for over a decade.
However, these algorithms assume a linearly-sortable integer alphabet. A recent
breakthrough paper by Bannai et.\ al.\ (SODA 2015) showed a link between the
two notions: all the runs in a string can be computed via a linear number of
LCE queries. The first to consider these problems over a general ordered
alphabet was Kosolobov (\emph{Inf.\ Process.\ Lett.}, 2016), who presented an
-time algorithm for answering LCE queries. This
result was improved by Gawrychowski et.\ al.\ (accepted to CPM 2016) to time. In this work we note a special \emph{non-crossing} property
of LCE queries asked in the runs computation. We show that any such
non-crossing queries can be answered on-line in time, which
yields an -time algorithm for computing runs
Faster Longest Common Extension Queries in Strings over General Alphabets
Longest common extension queries (often called longest common prefix queries)
constitute a fundamental building block in multiple string algorithms, for
example computing runs and approximate pattern matching. We show that a
sequence of LCE queries for a string of size over a general ordered
alphabet can be realized in time making only
symbol comparisons. Consequently, all runs in a string over a general
ordered alphabet can be computed in time making
symbol comparisons. Our results improve upon a solution by Kosolobov
(Information Processing Letters, 2016), who gave an algorithm with running time and conjectured that time is possible. We
make a significant progress towards resolving this conjecture. Our techniques
extend to the case of general unordered alphabets, when the time increases to
. The main tools are difference covers and the
disjoint-sets data structure.Comment: Accepted to CPM 201
Finding the Leftmost Critical Factorization on Unordered Alphabet
We present a linear time and space algorithm computing the leftmost critical
factorization of a given string on an unordered alphabet.Comment: 13 pages, 13 figures (accepted to Theor. Comp. Sci.
Computing Runs on a Trie
A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n sqrt{log n}log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more
Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings
We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding.
Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2.
We also show an algorithm for computing all maximal repetitions in O(m alpha(m)) time and O(m) space, where alpha denotes the inverse Ackermann function
Linear Time Runs Over General Ordered Alphabets
A run in a string is a maximal periodic substring. For example, the string
contains the runs
and . There are less than runs in any
length- string, and computing all runs for a string over a linearly-sortable
alphabet takes time (Bannai et al., SODA 2015). Kosolobov
conjectured that there also exists a linear time runs algorithm for general
ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven
by Crochemore et al., who presented an time algorithm
(where is the extremely slowly growing inverse Ackermann function).
We show how to achieve time by exploiting combinatorial
properties of the Lyndon array, thus proving Kosolobov's conjecture.Comment: This work has been submitted to ICALP 202
Lyndon Arrays in Sublinear Time
?} with ? ? n. In this case, the string can be stored in O(n log ?) bits (or O(n / log_? n) words) of memory, and reading it takes only O(n / log_? n) time. We show that O(n / log_? n) time and words of space suffice to compute the succinct 2n-bit version of the Lyndon array. The time is optimal for w = O(log n). The algorithm uses precomputed lookup tables to perform significant parts of the computation in constant time. This is possible due to properties of periodic substrings, which we carefully analyze to achieve the desired result. We envision that the algorithm has applications in the computation of runs (maximal periodic substrings), where the Lyndon array plays a central role in both theoretically and practically fast algorithms