195 research outputs found
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
This paper presents a general technique for optimally transforming any
dynamic data structure that operates on atomic and indivisible keys by
constant-time comparisons, into a data structure that handles unbounded-length
keys whose comparison cost is not a constant. Examples of these keys are
strings, multi-dimensional points, multiple-precision numbers, multi-key data
(e.g.~records), XML paths, URL addresses, etc. The technique is more general
than what has been done in previous work as no particular exploitation of the
underlying structure of is required. The only requirement is that the insertion
of a key must identify its predecessor or its successor.
Using the proposed technique, online suffix tree can be constructed in worst
case time per input symbol (as opposed to amortized
time per symbol, achieved by previously known algorithms). To our knowledge,
our algorithm is the first that achieves worst case time per input
symbol. Searching for a pattern of length in the resulting suffix tree
takes time, where is the
number of occurrences of the pattern. The paper also describes more
applications and show how to obtain alternative methods for dealing with suffix
sorting, dynamic lowest common ancestors and order maintenance
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length from an integer
alphabet of size so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
bits. We describe a new data
structure matching this lower bound when while supporting
both queries in optimal time. Furthermore, we show that the string can
be overwritten in-place with this structure. The redundancy of
bits and the constant query time break exponentially a lower bound that is
known to hold in the read-only model. Using our new string representation, we
obtain the first in-place subquadratic (indeed, even sublinear in some cases)
algorithms for several string-processing problems in the restore model: the
input string is rewritable and must be restored before the computation
terminates. In particular, we describe the first in-place subquadratic Monte
Carlo solutions to the sparse suffix sorting, sparse LCP array construction,
and suffix selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array in time and to the first Las Vegas in-place algorithms
solving the sparse suffix sorting and sparse LCP array construction problems in
time. Running times of these Las Vegas
algorithms hold in the worst case with high probability.Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
Deterministic sub-linear space LCE data structures with efficient construction
Given a string of symbols, a longest common extension query
asks for the length of the longest common prefix of the
th and th suffixes of . LCE queries have several important
applications in string processing, perhaps most notably to suffix sorting.
Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:
65-76) described several data structures for answering LCE queries that offers
a space-time trade-off between data structure size and query time. In
particular, for a parameter , their best deterministic
solution is a data structure of size which allows LCE queries to be
answered in time. However, the construction time for all
deterministic versions of their data structure is quadratic in . In this
paper, we propose a deterministic solution that achieves a similar space-time
trade-off of query time using
space, but significantly improve the construction time to
.Comment: updated titl
String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure
Burrows-Wheeler transform (BWT) is an invertible text transformation that,
given a text of length , permutes its symbols according to the
lexicographic order of suffixes of . BWT is one of the most heavily studied
algorithms in data compression with numerous applications in indexing, sequence
analysis, and bioinformatics. Its construction is a bottleneck in many
scenarios, and settling the complexity of this task is one of the most
important unsolved problems in sequence analysis that has remained open for 25
years. Given a binary string of length , occupying machine
words, the BWT construction algorithm due to Hon et al. (SIAM J. Comput., 2009)
runs in time and space. Recent advancements (Belazzougui,
STOC 2014, and Munro et al., SODA 2017) focus on removing the alphabet-size
dependency in the time complexity, but they still require time.
In this paper, we propose the first algorithm that breaks the -time
barrier for BWT construction. Given a binary string of length , our
procedure builds the Burrows-Wheeler transform in time and
space. We complement this result with a conditional lower bound
proving that any further progress in the time complexity of BWT construction
would yield faster algorithms for the very well studied problem of counting
inversions: it would improve the state-of-the-art -time
solution by Chan and P\v{a}tra\c{s}cu (SODA 2010). Our algorithm is based on a
novel concept of string synchronizing sets, which is of independent interest.
As one of the applications, we show that this technique lets us design a data
structure of the optimal size that answers Longest Common
Extension queries (LCE queries) in time and, furthermore, can be
deterministically constructed in the optimal time.Comment: Full version of a paper accepted to STOC 201
External Batched Range Minimum Queries and LCP Construction
This work deals with I/O-optimal suffix array (SA) and longest common prefix (LCP) array construction in external memory. For this purpose, the I/O-optimale DC3 algorithm is enhanced by LCP construction and adapted accordingly to the external memory model. In this context we present a method to construct the required range minimum queries (RMQs) efficiently in external memory. The core of this work is a description and implementation of the resulting external DC3-LCP algorithm using the Stxxl - the C++ Standard Template Library for Extra Large Data Sets. Experimental results based on realistic, real-world instances rounds off this work
- …