2,993 research outputs found
Succinct Representations of Permutations and Functions
We investigate the problem of succinctly representing an arbitrary
permutation, \pi, on {0,...,n-1} so that \pi^k(i) can be computed quickly for
any i and any (positive or negative) integer power k. A representation taking
(1+\epsilon) n lg n + O(1) bits suffices to compute arbitrary powers in
constant time, for any positive constant \epsilon <= 1. A representation taking
the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers
in O(lg n / lg lg n) time.
We then consider the more general problem of succinctly representing an
arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed
quickly for any i and any integer power k. We give a representation that takes
(1+\epsilon) n lg n + O(1) bits, for any positive constant \epsilon <= 1, and
computes arbitrary positive powers in constant time. It can also be used to
compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time.
We place emphasis on the redundancy, or the space beyond the
information-theoretic lower bound that the data structure uses in order to
support operations efficiently. A number of lower bounds have recently been
shown on the redundancy of data structures. These lower bounds confirm the
space-time optimality of some of our solutions. Furthermore, the redundancy of
one of our structures "surpasses" a recent lower bound by Golynski [Golynski,
SODA 2009], thus demonstrating the limitations of this lower bound.Comment: Preliminary versions of these results have appeared in the
Proceedings of ICALP 2003 and 2004. However, all results in this version are
improved over the earlier conference versio
The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
An indexed sequence of strings is a data structure for storing a string
sequence that supports random access, searching, range counting and analytics
operations, both for exact matches and prefix search. String sequences lie at
the core of column-oriented databases, log processing, and other storage and
query tasks. In these applications each string can appear several times and the
order of the strings in the sequence is relevant. The prefix structure of the
strings is relevant as well: common prefixes are sought in strings to extract
interesting features from the sequence. Moreover, space-efficiency is highly
desirable as it translates directly into higher performance, since more data
can fit in fast memory.
We introduce and study the problem of compressed indexed sequence of strings,
representing indexed sequences of strings in nearly-optimal compressed space,
both in the static and dynamic settings, while preserving provably good
performance for the supported operations.
We present a new data structure for this problem, the Wavelet Trie, which
combines the classical Patricia Trie with the Wavelet Tree, a succinct data
structure for storing a compressed sequence. The resulting Wavelet Trie
smoothly adapts to a sequence of strings that changes over time. It improves on
the state-of-the-art compressed data structures by supporting a dynamic
alphabet (i.e. the set of distinct strings) and prefix queries, both crucial
requirements in the aforementioned applications, and on traditional indexes by
reducing space occupancy to close to the entropy of the sequence
- …