4,105 research outputs found
Wavelet Trees Meet Suffix Trees
We present an improved wavelet tree construction algorithm and discuss its
applications to a number of rank/select problems for integer keys and strings.
Given a string of length n over an alphabet of size , our
method builds the wavelet tree in time,
improving upon the state-of-the-art algorithm by a factor of .
As a consequence, given an array of n integers we can construct in time a data structure consisting of machine words and
capable of answering rank/select queries for the subranges of the array in
time. This is a -factor improvement in
query time compared to Chan and P\u{a}tra\c{s}cu and a -factor
improvement in construction time compared to Brodal et al.
Next, we switch to stringological context and propose a novel notion of
wavelet suffix trees. For a string w of length n, this data structure occupies
words, takes time to construct, and simultaneously
captures the combinatorial structure of substrings of w while enabling
efficient top-down traversal and binary search. In particular, with a wavelet
suffix tree we are able to answer in time the following two
natural analogues of rank/select queries for suffixes of substrings: for
substrings x and y of w count the number of suffixes of x that are
lexicographically smaller than y, and for a substring x of w and an integer k,
find the k-th lexicographically smallest suffix of x.
We further show that wavelet suffix trees allow to compute a
run-length-encoded Burrows-Wheeler transform of a substring x of w in time, where s denotes the length of the resulting run-length encoding.
This answers a question by Cormode and Muthukrishnan, who considered an
analogous problem for Lempel-Ziv compression.Comment: 33 pages, 5 figures; preliminary version published at SODA 201
Linear pattern matching on sparse suffix trees
Packing several characters into one computer word is a simple and natural way
to compress the representation of a string and to speed up its processing.
Exploiting this idea, we propose an index for a packed string, based on a {\em
sparse suffix tree} \cite{KU-96} with appropriately defined suffix links.
Assuming, under the standard unit-cost RAM model, that a word can store up to
characters ( the alphabet size), our index takes
space, i.e. the same space as the packed string itself.
The resulting pattern matching algorithm runs in time ,
where is the length of the pattern, is the actual number of characters
stored in a word and is the number of pattern occurrences
Computing Lempel-Ziv Factorization Online
We present an algorithm which computes the Lempel-Ziv factorization of a word
of length on an alphabet of size online in the
following sense: it reads starting from the left, and, after reading each
characters of , updates the Lempel-Ziv
factorization. The algorithm requires bits of space and O(n
\log^2 n) time. The basis of the algorithm is a sparse suffix tree combined
with wavelet trees
CiNCT: Compression and retrieval for massive vehicular trajectories via relative movement labeling
In this paper, we present a compressed data structure for moving object
trajectories in a road network, which are represented as sequences of road
edges. Unlike existing compression methods for trajectories in a network, our
method supports pattern matching and decompression from an arbitrary position
while retaining a high compressibility with theoretical guarantees.
Specifically, our method is based on FM-index, a fast and compact data
structure for pattern matching. To enhance the compression, we incorporate the
sparsity of road networks into the data structure. In particular, we present
the novel concepts of relative movement labeling and PseudoRank, each
contributing to significant reductions in data size and query processing time.
Our theoretical analysis and experimental studies reveal the advantages of our
proposed method as compared to existing trajectory compression methods and
FM-index variants
Linear-Space Data Structures for Range Mode Query in Arrays
A mode of a multiset is an element of maximum multiplicity;
that is, occurs at least as frequently as any other element in . Given a
list of items, we consider the problem of constructing a data
structure that efficiently answers range mode queries on . Each query
consists of an input pair of indices for which a mode of must
be returned. We present an -space static data structure
that supports range mode queries in time in the worst case, for
any fixed . When , this corresponds to
the first linear-space data structure to guarantee query time. We
then describe three additional linear-space data structures that provide
, , and query time, respectively, where denotes the
number of distinct elements in and denotes the frequency of the mode of
. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure
Deterministic sub-linear space LCE data structures with efficient construction
Given a string of symbols, a longest common extension query
asks for the length of the longest common prefix of the
th and th suffixes of . LCE queries have several important
applications in string processing, perhaps most notably to suffix sorting.
Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:
65-76) described several data structures for answering LCE queries that offers
a space-time trade-off between data structure size and query time. In
particular, for a parameter , their best deterministic
solution is a data structure of size which allows LCE queries to be
answered in time. However, the construction time for all
deterministic versions of their data structure is quadratic in . In this
paper, we propose a deterministic solution that achieves a similar space-time
trade-off of query time using
space, but significantly improve the construction time to
.Comment: updated titl
- …