Binary Jumbled String Matching for Highly Run-Length Compressible Texts
The Binary Jumbled String Matching problem is defined as: Given a string
s over {a,b} of length n and a query (x,y), with x, y non-negative
integers, decide whether s has a substring with exactly x a's and
y b's. Previous solutions created an index of size O(n) in a pre-processing
step, which was then used to answer queries in constant time. The fastest
algorithms for construction of this index have running time O(n^2 / log n)
[Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or O(n^2 / log^2 n) in
the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index
constructed directly from the run-length encoding of s. The construction time
of our index is O(n + ρ^2 log ρ), where O(n) is the time for computing
the run-length encoding of s and ρ is the length of this encoding---this
is no worse than previous solutions if ρ = O(n / log n) and better if
ρ = o(n / log n). Our index can be queried in O(log ρ) time. While
ρ^2 log ρ = Θ(n^2 log n) in the worst case, preliminary investigations have
indicated that ρ may often be close to √n. Furthermore, the algorithm
for constructing the index is conceptually simple and easy to implement. In an
attempt to shed light on the structure and size of our index, we characterize
it in terms of the prefix normal forms of s introduced in [Fici and Lipták,
DLT 2011].
Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures
on the size of the Corner Index (we no longer conjecture it to be always
linear in the size of the RLE); removed experimental part on random strings
(these are valid but limited in their predictive power w.r.t. general
strings); v3 published in IP
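The constant-time-queryable indexes discussed above all rest on a well-known interval property of binary strings: for each substring length l, the achievable numbers of b's form a contiguous range. A minimal Python sketch of an index built on that property (brute-force O(n^2) construction for illustration, not the run-length-based method of the paper):

```python
def build_jumbled_index(s):
    """Index for binary jumbled matching over the alphabet {'a', 'b'}.

    Interval property: among all substrings of a fixed length l, the
    possible numbers of b's form a contiguous range [mn[l], mx[l]].
    This brute-force build takes O(n^2) time.
    """
    n = len(s)
    pre = [0] * (n + 1)              # pre[i] = number of b's in s[:i]
    for i, c in enumerate(s):
        pre[i + 1] = pre[i] + (c == 'b')
    mn = [0] * (n + 1)
    mx = [0] * (n + 1)
    for l in range(1, n + 1):
        counts = [pre[i + l] - pre[i] for i in range(n - l + 1)]
        mn[l], mx[l] = min(counts), max(counts)
    return mn, mx

def query(index, x, y):
    """Does the indexed string have a substring with x a's and y b's?"""
    mn, mx = index
    l = x + y
    if l == 0:
        return True                  # the empty substring always qualifies
    if l >= len(mn):
        return False                 # longer than the string itself
    return mn[l] <= y <= mx[l]
```

Each query is O(1) once the two arrays are built, which is exactly why the construction time, not the query time, is the bottleneck these papers compete on.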
Algorithms to Compute the Lyndon Array
We first describe three algorithms for computing the Lyndon array of a given string x that have
been suggested in the literature, but for which no structured exposition has
been given. Two of these algorithms execute in quadratic time in the worst
case, the third achieves linear time, but at the expense of prior computation
of both the suffix array and the inverse suffix array of x. We then go on to
describe two variants of a new algorithm that avoids prior computation of
global data structures and executes in O(n log n) worst-case time. Experimental
evidence suggests that all but one of these five algorithms require only linear
execution time in practice, with the two new algorithms faster by a small
factor. We conjecture that there exists a fast and worst-case linear-time
algorithm to compute the Lyndon array that is also elementary (making no use of
global data structures such as the suffix array).
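One of the quadratic-time approaches alluded to above can be sketched by running the first step of Duval's Lyndon-factorization algorithm from every position: the first Lyndon factor of x[i..] is exactly the longest Lyndon word starting at i, i.e. the Lyndon array entry. A short Python sketch (worst-case O(n^2), linear extra space; illustrative, not one of the paper's new algorithms):

```python
def lyndon_array(x):
    """lam[i] = length of the longest Lyndon word starting at x[i].

    Runs the first step of Duval's factorization from each position i,
    so worst-case quadratic time overall.
    """
    n = len(x)
    lam = [0] * n
    for i in range(n):
        j, k = i + 1, i
        # Extend while x[i:j+1] is a prefix of a power of a Lyndon word.
        while j < n and x[k] <= x[j]:
            k = i if x[k] < x[j] else k + 1
            j += 1
        lam[i] = j - k
    return lam
```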
On the maximal sum of exponents of runs in a string
A run is an inclusion maximal occurrence in a string (as a subinterval) of a
repetition v with a period p such that 2p ≤ |v|. The exponent of a run
is defined as |v|/p and is at least 2. We show new bounds on the maximal sum of
exponents of runs in a string of length n. Our upper bound of 4.1n is
better than the best previously known proven bound of 5.6n by Crochemore &
Ilie (2008). The lower bound of 2.035n, obtained using a family of binary
words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the
maximal sum of exponents of runs in a string of length n is smaller than 2n.
Comment: 7 pages, 1 figure
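The definitions can be made concrete with a brute-force enumerator: an interval is a run if its smallest period p satisfies 2p ≤ length and the periodicity cannot be extended by one position in either direction. This is an illustration only, far slower than the linear-time run-finding algorithms in the literature:

```python
from fractions import Fraction

def runs(s):
    """Enumerate all runs (i, j, p): maximal intervals s[i..j] whose
    smallest period p satisfies 2p <= j - i + 1.  Brute force."""
    n, out = len(s), []
    for i in range(n):
        for j in range(i + 1, n):
            w = s[i:j + 1]
            L = j - i + 1
            # smallest period of w
            p = next(q for q in range(1, L + 1)
                     if all(w[t] == w[t - q] for t in range(q, L)))
            if 2 * p > L:
                continue                   # not a repetition
            if i > 0 and s[i - 1] == s[i - 1 + p]:
                continue                   # extendable to the left
            if j < n - 1 and s[j + 1] == s[j + 1 - p]:
                continue                   # extendable to the right
            out.append((i, j, p))
    return out

def sum_of_exponents(s):
    """Sum of |run|/p over all runs, as an exact rational."""
    return sum(Fraction(j - i + 1, p) for i, j, p in runs(s))
```

For example, "aabaabaa" has four runs: three squares "aa" (exponent 2 each) and the whole string with period 3 (exponent 8/3).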
Algorithms for Longest Common Abelian Factors
In this paper we consider the problem of computing the longest common abelian
factor (LCAF) between two given strings. We present a simple
time algorithm, where is the length of the strings and is the
alphabet size, and a sub-quadratic running time solution for the binary string
case, both having linear space requirement. Furthermore, we present a modified
algorithm applying some interesting tricks and experimentally show that the
resulting algorithm runs faster.Comment: 13 pages, 4 figure
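The baseline idea is to compare Parikh vectors (per-symbol counts) of equal-length windows of the two strings. A plain brute-force sketch of that idea (simplified for clarity, not the paper's tuned O(σ n^2) algorithm):

```python
from collections import Counter

def lcaf(s, t):
    """Length of the longest common abelian factor: the largest l such
    that some length-l substring of s is a permutation of a length-l
    substring of t.  Brute force via Parikh-vector (Counter) equality."""
    def parikh_set(u, l):
        # frozenset of Counter items gives a hashable Parikh vector
        return {frozenset(Counter(u[i:i + l]).items())
                for i in range(len(u) - l + 1)}
    best = 0
    for l in range(1, min(len(s), len(t)) + 1):
        if parikh_set(s, l) & parikh_set(t, l):
            best = l
    return best
```

A faster variant would maintain each window's Counter incrementally while sliding, rather than rebuilding it per window as done here.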
Faster subsequence recognition in compressed strings
Computation on compressed strings is one of the key approaches to processing
massive data sets. We consider local subsequence recognition problems on
strings compressed by straight-line programs (SLP), a compression scheme
closely related to Lempel--Ziv compression. For an SLP-compressed text of
length m, and an uncompressed pattern of length n, Cégielski et al. gave an
algorithm for local subsequence recognition running in time O(m n^2 log n).
We improve the running time to O(m n^1.5). Our algorithm can also be used to
compute the longest common subsequence between a compressed text and an
uncompressed pattern in time O(m n^1.5); the same problem with a
compressed pattern is known to be NP-hard.
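A standard way to do (global, not the paper's local) subsequence recognition over an SLP is to compute, for every grammar symbol, the table "pattern position before scanning its expansion → pattern position after", and compose tables along binary rules. A hedged sketch with a toy grammar format (dict mapping a symbol to either a one-character terminal string or a pair of symbols; both the format and the function name are illustrative assumptions):

```python
def subsequence_in_slp(pattern, rules, root):
    """Is `pattern` a subsequence of the string generated by `root`?

    rules maps a symbol either to a 1-character terminal string or to a
    pair (Y, Z) meaning X -> Y Z.  For each symbol we build a table
    tab[i] = pattern position reached after greedily matching the
    symbol's expansion starting from position i; tables compose as
    tab_X = tab_Z o tab_Y.  Time O(|rules| * |pattern|).
    """
    m = len(pattern)
    tab = {}
    def table(sym):
        if sym in tab:
            return tab[sym]
        rhs = rules[sym]
        if isinstance(rhs, str):                 # terminal symbol
            t = [i + 1 if i < m and pattern[i] == rhs else i
                 for i in range(m + 1)]
        else:                                    # binary rule X -> Y Z
            ty, tz = table(rhs[0]), table(rhs[1])
            t = [tz[ty[i]] for i in range(m + 1)]
        tab[sym] = t
        return t
    return table(root)[0] == m
```

For instance, the grammar A→a, B→b, C→AB, D→CA, E→DC generates "abaab"; matching never decompresses the text, which is the whole point when the expansion is exponentially larger than the grammar.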