15 research outputs found

    Optimal Color Range Reporting in One Dimension

    Full text link
    Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a \emph{color}. While the answer to an orthogonal point reporting query contains all points in the query range QQ, the answer to a color reporting query contains only distinct colors of points in QQ. In this paper we describe an O(N)-space data structure that answers one-dimensional color reporting queries in optimal O(k+1)O(k+1) time, where kk is the number of colors in the answer and NN is the number of points in the data structure. Our result can be also dynamized and extended to the external memory model

    Dynamic Data Structures for Document Collections and Graphs

    Full text link
    In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations

    Substring Range Reporting

    Get PDF
    We revisit various string indexing problems with range reporting features, namely, position-restricted substring searching, indexing substrings with gaps, and indexing substrings with intervals. We obtain the following main results. {itemize} We give efficient reductions for each of the above problems to a new problem, which we call \emph{substring range reporting}. Hence, we unify the previous work by showing that we may restrict our attention to a single problem rather than studying each of the above problems individually. We show how to solve substring range reporting with optimal query time and little space. Combined with our reductions this leads to significantly improved time-space trade-offs for the above problems. In particular, for each problem we obtain the first solutions with optimal time query and O(nlogO(1)n)O(n\log^{O(1)} n) space, where nn is the length of the indexed string. We show that our techniques for substring range reporting generalize to \emph{substring range counting} and \emph{substring range emptiness} variants. We also obtain non-trivial time-space trade-offs for these problems. {itemize} Our bounds for substring range reporting are based on a novel combination of suffix trees and range reporting data structures. The reductions are simple and general and may apply to other combinations of string indexing with range reporting

    Succinct Data Structures for Retrieval and Approximate Membership

    Get PDF
    The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U ->{0,1}^r that has specified values on the elements of a given set S, a subset of U, |S|=n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k query time O(k) can be achieved using space that is within a factor 1+e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits whp. The time to construct the data structure is O(n), expected. A main technical ingredient is to utilize existing tight bounds on the probability of almost square random matrices with rows of low weight to have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods, and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM

    Practical dynamic de Bruijn graphs

    Get PDF
    International audienceAs datasets of DNA reads grow rapidly, it becomes more and more important to represent de Bruijn graphs compactly while still supporting fast assembly. Previous implementations have not supported edge deletion, however, which is important for pruning spurious edges from the graph. Belazzougui et al. Belazzougui et al. (2016b) recently proposed a compact and fully dynamic representation, which supports exact membership queries and insertions and deletions of both nodes and edges. In this paper we give a practical implementation of their data structure, supporting exact membership queries and insertions and deletions of edges only, and demonstrate experimentally that its performance is comparable to that of state-of-the-art implementations based on Bloom filters. Our source-code is publicly available at https://github.com/csirac/kbf under an open-source license

    Call Me By My Name: Simple, Practical Private Information Retrieval for Keyword Queries

    Get PDF
    We introduce ChalametPIR\mathsf{ChalametPIR}: a single-server Private Information Retrieval (PIR) scheme supporting fast, low-bandwidth keyword queries, with a conceptually very simple design. In particular, we develop a generic framework for converting PIR schemes for index queries over flat arrays (based on the Learning With Errors problem) into keyword PIR. This involves representing a key-value map using any probabilistic filter that permits reconstruction of elements from inclusion queries (e.g. Cuckoo filters). In particular, we make use of recently developed Binary Fuse filters to construct ChalametPIR\mathsf{ChalametPIR}, with minimal efficiency blow-up compared with state-of-the-art index-based schemes (all costs bounded by a factor of 1.08\leq 1.08). Furthermore, we show that ChalametPIR\mathsf{ChalametPIR} achieves runtimes and financial costs that are factors of between 66-11×11\times and 3.753.75-11.4×11.4\times more efficient, respectively, than state-of-the-art keyword PIR approaches, for varying database configurations. Bandwidth costs are additionally reduced or remain competitive, depending on the configuration. Finally, we believe that our application of Binary Fuse filters can have independent value towards developing efficient variants of related cryptographic primitives (e.g. private set intersection), that already benefit from using less efficient filter constructions
    corecore