2 research outputs found
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length from an integer
alphabet of size so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
bits. We describe a new data
structure matching this lower bound when while supporting
both queries in optimal time. Furthermore, we show that the string can
be overwritten in-place with this structure. The redundancy of
bits and the constant query time break exponentially a lower bound that is
known to hold in the read-only model. Using our new string representation, we
obtain the first in-place subquadratic (indeed, even sublinear in some cases)
algorithms for several string-processing problems in the restore model: the
input string is rewritable and must be restored before the computation
terminates. In particular, we describe the first in-place subquadratic Monte
Carlo solutions to the sparse suffix sorting, sparse LCP array construction,
and suffix selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array in time and to the first Las Vegas in-place algorithms
solving the sparse suffix sorting and sparse LCP array construction problems in
time. Running times of these Las Vegas
algorithms hold in the worst case with high probability.Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast
Sparse suffix sorting is the problem of sorting suffixes of a string
of length . Efficient sparse suffix sorting algorithms have existed for more
than a decade. Despite the multitude of works and their justified claims for
applications in text indexing, the existing algorithms have not been employed
by practitioners. Arguably this is because there are no simple, direct, and
efficient algorithms for sparse suffix array construction. We provide two new
algorithms for constructing the sparse suffix and LCP arrays that are
simultaneously simple, direct, small, and fast. In particular, our algorithms
are: simple in the sense that they can be implemented using only basic data
structures; direct in the sense that the output arrays are not a byproduct of
constructing the sparse suffix tree or an LCE data structure; fast in the sense
that they run in time, in the worst case, or in
time, when the total number of suffixes with an LCP value
greater than is in
, matching the time of the optimal yet much more
complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et
al., SODA 2020]; and small in the sense that they can be implemented using only
machine words. Our algorithms are simplified, yet non-trivial,
space-efficient adaptations of the Monte Carlo algorithm by I et al. for
constructing the sparse suffix tree in time [STACS
2014]. We also provide proof-of-concept experiments to justify our claims on
simplicity and efficiency.Comment: 16 pages, 1 figur