129 research outputs found

    Succinct Indexable Dictionaries with Applications to Encoding kk-ary Trees, Prefix Sums and Multisets

    Full text link
    We consider the {\it indexable dictionary} problem, which consists of storing a set S{0,...,m1}S \subseteq \{0,...,m-1\} for some integer mm, while supporting the operations of \Rank(x), which returns the number of elements in SS that are less than xx if xSx \in S, and -1 otherwise; and \Select(i) which returns the ii-th smallest element in SS. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m)+o(n)+O(lglgm){\cal B}(n,m) + o(n) + O(\lg \lg m) bits to store a set of size nn, where {\cal B}(n,m) = \ceil{\lg {m \choose n}} is the minimum number of bits required to store any nn-element subset from a universe of size mm. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lglgm)O(\lg \lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: An information-theoretically optimal representation of a kk-ary cardinal tree that supports standard operations in constant time, A representation of a multiset of size nn from {0,...,m1}\{0,...,m-1\} in B(n,m+n)+o(n){\cal B}(n,m+n) + o(n) bits that supports (appropriate generalizations of) \Rank and \Select operations in constant time, and A representation of a sequence of nn non-negative integers summing up to mm in B(n,m+n)+o(n){\cal B}(n,m+n) + o(n) bits that supports prefix sum queries in constant time.Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1

    More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

    Get PDF
    We consider the problem of representing, in a compressed format, a bit-vector SS of mm bits with nn 1s, supporting the following operations, where b{0,1}b \in \{0, 1 \}: rankb(S,i)rank_b(S,i) returns the number of occurrences of bit bb in the prefix S[1..i]S[1..i]; selectb(S,i)select_b(S,i) returns the position of the iith occurrence of bit bb in SS. Such a data structure is called \emph{fully indexable dictionary (FID)} [Raman et al.,2007], and is at least as powerful as predecessor data structures. Our focus is on space-efficient FIDs on the \textsc{ram} model with word size Θ(lgm)\Theta(\lg m) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring SS to be encoded, having length mm and containing nn ones, the minimal amount of information that needs to be stored is B(n,m)=log(mn)B(n,m) = \lceil \log {{m}\choose{n}} \rceil. The state of the art in building a FID for SS is given in [Patrascu,2008] using B(m,n)+O(m/((logm/t)t))+O(m3/4)B(m,n)+O(m / ((\log m/ t) ^t)) + O(m^{3/4}) bits, to support the operations in O(t)O(t) time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants 000 0, it uses B(n,m) + O(n^{1+\delta} + n (\frac{m}{n^s})^\eps) bits and performs all the operations in time O(s\delta^{-1} + \eps^{-1}). The improvement is twofold: our redundancy can be lowered parametrically and, fixing s=O(1)s = O(1), we get a constant-time FID whose space is B(n,m) + O(m^\eps/\poly{n}) bits, for sufficiently large mm. This is a significant improvement compared to the previous bounds for the general case

    Improved ESP-index: a practical self-index for highly repetitive texts

    Full text link
    While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoid the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better that other possible approaches.Comment: This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014

    Space-Efficient Graph Coarsening with Applications to Succinct Planar Encodings

    Get PDF
    We present a novel space-efficient graph coarsening technique for n-vertex planar graphs G, called cloud partition, which partitions the vertices V(G) into disjoint sets C of size O(log n) such that each C induces a connected subgraph of G. Using this partition ? we construct a so-called structure-maintaining minor F of G via specific contractions within the disjoint sets such that F has O(n/log n) vertices. The combination of (F, ?) is referred to as a cloud decomposition. For planar graphs we show that a cloud decomposition can be constructed in O(n) time and using O(n) bits. Given a cloud decomposition (F, ?) constructed for a planar graph G we are able to find a balanced separator of G in O(n/log n) time. Contrary to related publications, we do not make use of an embedding of the planar input graph. We generalize our cloud decomposition from planar graphs to H-minor-free graphs for any fixed graph H. This allows us to construct the succinct encoding scheme for H-minor-free graphs due to Blelloch and Farzan (CPM 2010) in O(n) time and O(n) bits improving both runtime and space by a factor of ?(log n). As an additional application of our cloud decomposition we show that, for H-minor-free graphs, a tree decomposition of width O(n^{1/2 + ?}) for any ? > 0 can be constructed in O(n) bits and a time linear in the size of the tree decomposition. A similar result by Izumi and Otachi (ICALP 2020) constructs a tree decomposition of width O(k ?n log n) for graphs of treewidth k ? ?n in sublinear space and polynomial time

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    Full text link
    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence

    R3D3: A doubly opportunistic data structure for compressing and indexing massive data

    Get PDF
    Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in informationtheoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed
    corecore