2,993 research outputs found

    Succinct Representations of Permutations and Functions

    Get PDF
    We investigate the problem of succinctly representing an arbitrary permutation, \pi, on {0,...,n-1} so that \pi^k(i) can be computed quickly for any i and any (positive or negative) integer power k. A representation taking (1+\epsilon) n lg n + O(1) bits suffices to compute arbitrary powers in constant time, for any positive constant \epsilon <= 1. A representation taking the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers in O(lg n / lg lg n) time. We then consider the more general problem of succinctly representing an arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed quickly for any i and any integer power k. We give a representation that takes (1+\epsilon) n lg n + O(1) bits, for any positive constant \epsilon <= 1, and computes arbitrary positive powers in constant time. It can also be used to compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time. We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions. Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound.Comment: Preliminary versions of these results have appeared in the Proceedings of ICALP 2003 and 2004. However, all results in this version are improved over the earlier conference versio

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    Full text link
    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence
    • …
    corecore