Optimal Color Range Reporting in One Dimension
Color (or categorical) range reporting is a variant of the orthogonal range
reporting problem in which every point in the input is assigned a \emph{color}.
While the answer to an orthogonal point reporting query contains all points in
the query range Q, the answer to a color reporting query contains only the
distinct colors of points in Q. In this paper we describe an O(N)-space data
structure that answers one-dimensional color reporting queries in optimal
O(k+1) time, where k is the number of colors in the answer and N is the
number of points in the data structure. Our result can also be dynamized and
extended to the external memory model.
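To make the query semantics concrete, here is a minimal Python sketch of the classic predecessor-based reduction for one-dimensional color reporting (the function names and the linear scan are illustrative assumptions, not the paper's optimal structure): each point records the coordinate of the previous point of its color, and a query [a, b] reports exactly the points whose same-color predecessor lies left of a, so each color is reported once.

```python
from bisect import bisect_left, bisect_right

def build(points):
    """points: list of (coordinate, color).  Returns points sorted by
    coordinate plus, for each point, the coordinate of the previous
    point of the same color (or -inf if none)."""
    pts = sorted(points)
    last = {}            # color -> coordinate of its most recent occurrence
    pred = []
    for x, c in pts:
        pred.append(last.get(c, float("-inf")))
        last[c] = x
    return pts, pred

def color_report(pts, pred, a, b):
    """Distinct colors among points with coordinate in [a, b]: exactly
    those points whose same-color predecessor is strictly left of a."""
    xs = [x for x, _ in pts]
    lo, hi = bisect_left(xs, a), bisect_right(xs, b)
    return {pts[i][1] for i in range(lo, hi) if pred[i] < a}
```

An optimal structure replaces the final scan with a range-reporting query over the (coordinate, predecessor) pairs, but the transformation itself is the standard starting point.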
Dynamic Data Structures for Document Collections and Graphs
In the dynamic indexing problem, we must maintain a changing collection of
text documents so that we can efficiently support insertions, deletions, and
pattern matching queries. We are especially interested in developing efficient
data structures that store and query the documents in compressed form. All
previous compressed solutions to this problem rely on answering rank and select
queries on a dynamic sequence of symbols. Because of the lower bound in
[Fredman and Saks, 1989], answering rank queries presents a bottleneck in
compressed dynamic indexing. In this paper we show how this lower bound can be
circumvented using our new framework. We demonstrate that the gap between
static and dynamic variants of the indexing problem can be almost closed. Our
method is based on a novel framework for adding dynamism to static compressed
data structures. Our framework also applies more generally to dynamizing other
problems. We show, for example, how our framework can be applied to develop
compressed representations of dynamic graphs and binary relations.
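To make the rank/select bottleneck concrete, here is a deliberately naive dynamic sequence in Python (class and method names are my own). Each operation here costs linear time; compressed dynamic indexes implement this same interface in compressed space, and it is the per-operation cost of rank under updates that the Fredman-Saks lower bound constrains.

```python
class DynamicSequence:
    """Naive dynamic sequence over a small alphabet, exposing the
    operations that compressed dynamic indexing reduces to.  Every
    operation is O(n) here; the point is the interface, not the speed."""

    def __init__(self):
        self.seq = []

    def insert(self, i, symbol):
        self.seq.insert(i, symbol)

    def delete(self, i):
        self.seq.pop(i)

    def rank(self, symbol, i):
        # number of occurrences of symbol in seq[0:i]
        return self.seq[:i].count(symbol)

    def select(self, symbol, j):
        # position of the j-th (1-based) occurrence of symbol
        seen = 0
        for pos, s in enumerate(self.seq):
            if s == symbol:
                seen += 1
                if seen == j:
                    return pos
        raise ValueError("fewer than j occurrences")
```

A framework that circumvents the bottleneck avoids maintaining rank over the whole dynamic sequence on every update, rather than speeding up these operations directly.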
Substring Range Reporting
We revisit various string indexing problems with range reporting features,
namely, position-restricted substring searching, indexing substrings with gaps,
and indexing substrings with intervals. We obtain the following main results.
- We give efficient reductions for each of the above problems to a new
problem, which we call \emph{substring range reporting}. Hence, we unify the
previous work by showing that we may restrict our attention to a single problem
rather than studying each of the above problems individually.
- We show how to
solve substring range reporting with optimal query time and little space.
Combined with our reductions, this leads to significantly improved time-space
trade-offs for the above problems. In particular, for each problem we obtain
the first solutions with optimal query time and small space,
where n is the length of the indexed string.
- We show that our techniques for
substring range reporting generalize to \emph{substring range counting} and
\emph{substring range emptiness} variants, and we also obtain non-trivial
time-space trade-offs for these problems.

Our bounds for substring
range reporting are based on a novel combination of suffix trees and range
reporting data structures. The reductions are simple and general and may apply
to other combinations of string indexing with range reporting.
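The flavor of these reductions can be seen in a small Python sketch of position-restricted substring searching (names and the quadratic suffix-array builder are illustrative assumptions, not the paper's construction): find the suffix-array interval of the pattern, then report only the occurrences whose starting position falls in the restriction range.

```python
def suffix_array(s):
    # O(n^2 log n) toy construction; a real index uses a linear-time builder
    return sorted(range(len(s)), key=lambda i: s[i:])

def sa_interval(s, sa, p):
    """Suffix-array interval [lo, hi) of the suffixes starting with p."""
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix with prefix >= p
        m = (lo + hi) // 2
        if s[sa[m]:sa[m] + len(p)] < p:
            lo = m + 1
        else:
            hi = m
    left = lo
    hi = len(sa)
    while lo < hi:                      # first suffix with prefix > p
        m = (lo + hi) // 2
        if s[sa[m]:sa[m] + len(p)] <= p:
            lo = m + 1
        else:
            hi = m
    return left, lo

def position_restricted_search(s, sa, p, a, b):
    """Occurrences of p that start in [a, b].  The final linear filter is
    precisely the step a substring-range-reporting structure replaces."""
    lo, hi = sa_interval(s, sa, p)
    return sorted(i for i in sa[lo:hi] if a <= i <= b)
```

The reduction's key observation is that the filtering step is a one-dimensional range query over the positions stored in a suffix-array interval, which a range-reporting structure over the suffix tree can answer without scanning.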
Succinct Data Structures for Retrieval and Approximate Membership
The retrieval problem is the problem of associating data with keys in a set.
Formally, the data structure must store a function f: U ->{0,1}^r that has
specified values on the elements of a given set S, a subset of U, |S|=n, but
may have any value on elements outside S. Minimal perfect hashing makes it
possible to avoid storing the set S, but this induces a space overhead of
Theta(n) bits in addition to the nr bits needed for function values. In this
paper we show how to eliminate this overhead. Moreover, we show that for any k,
query time O(k) can be achieved using space that is within a factor 1+e^{-k} of
optimal, asymptotically for large n. If we allow logarithmic evaluation time,
the additive overhead can be reduced to O(log log n) bits with high
probability. The time to construct the data structure is O(n), expected. A main
technical ingredient is to utilize existing tight bounds on the probability
that almost-square random matrices with rows of low weight have full row
rank. In addition to direct
constructions, we point out a close connection between retrieval structures and
hash tables where keys are stored in an array and some kind of probing scheme
is used. Further, we propose a general reduction that transfers the results on
retrieval into analogous results on approximate membership, a problem
traditionally addressed using Bloom filters. Again, we show how to eliminate
the space overhead present in previously known methods, and get arbitrarily
close to the lower bound. The evaluation procedures of our data structures are
extremely simple (similar to a Bloom filter). For the results stated above we
assume free access to fully random hash functions. However, we show how to
justify this assumption using o(n) extra space to simulate full randomness on a
RAM.
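The "random matrix with rows of low weight" view can be sketched in Python with the standard peeling construction used by XOR-style filters (all names are my own, and the process-local hash() is a stand-in for a real hash family): each key is an equation saying that the XOR of three table cells equals its value, and the system is solved greedily.

```python
import random

def _cells(key, seed, m):
    # three distinct pseudo-random cells per key; a real structure uses a
    # proper hash family (hash() is only stable within one process)
    rnd = random.Random(hash((key, seed)))
    return rnd.sample(range(m), 3)

def build_retrieval(data, load=1.23, max_tries=100):
    """Store f: key -> int value so that a query XORs three table cells.
    Built by peeling: repeatedly find a cell touched by a single remaining
    key, then assign cells in reverse peel order.  Succeeds with high
    probability once the table has about 1.23n cells."""
    m = max(8, int(load * len(data)) + 1)
    for seed in range(max_tries):
        cells = {k: _cells(k, seed, m) for k in data}
        users = {}                          # cell -> keys touching it
        for k, cs in cells.items():
            for c in cs:
                users.setdefault(c, set()).add(k)
        order = []                          # (key, its unique cell)
        stack = [c for c, ks in users.items() if len(ks) == 1]
        while stack:
            c = stack.pop()
            if len(users[c]) != 1:
                continue
            (k,) = users[c]
            order.append((k, c))
            for q in cells[k]:
                users[q].discard(k)
                if len(users[q]) == 1:
                    stack.append(q)
        if len(order) == len(data):         # everything peeled: solvable
            table = [0] * m
            for k, c in reversed(order):
                v = data[k]
                for q in cells[k]:
                    if q != c:
                        v ^= table[q]
                table[c] = v                # XOR of k's three cells = data[k]
            return table, seed
    raise RuntimeError("peeling failed; raise load or max_tries")

def query(table, seed, key):
    v = 0
    for c in _cells(key, seed, len(table)):
        v ^= table[c]
    return v
```

For approximate membership in the style described above, one would store a short fingerprint g(key) as the retrieved value and compare it at query time; a non-member then passes only with probability about 2^-r.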
Practical dynamic de Bruijn graphs
As datasets of DNA reads grow rapidly, it becomes more and more important to represent de Bruijn graphs compactly while still supporting fast assembly. Previous implementations have not supported edge deletion, however, which is important for pruning spurious edges from the graph. Belazzougui et al. (2016b) recently proposed a compact and fully dynamic representation, which supports exact membership queries and insertions and deletions of both nodes and edges. In this paper we give a practical implementation of their data structure, supporting exact membership queries and insertions and deletions of edges only, and demonstrate experimentally that its performance is comparable to that of state-of-the-art implementations based on Bloom filters. Our source code is publicly available at https://github.com/csirac/kbf under an open-source license.
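The interface being implemented can be illustrated with a toy Python class (names are my own, and a plain hash set stands in for the compact representation): nodes are k-mers, edges are (k+1)-mers, and edge deletion is what enables pruning.

```python
class DynamicDeBruijn:
    """Toy dynamic de Bruijn graph over DNA.  A plain set of (k+1)-mers
    stands in for the compact dynamic structure; the supported operations
    (exact membership, edge insert/delete) match the interface above."""

    ALPHABET = "ACGT"

    def __init__(self, k):
        self.k = k
        self.edges = set()          # set of (k+1)-mers

    def add_read(self, read):
        for i in range(len(read) - self.k):
            self.edges.add(read[i:i + self.k + 1])

    def delete_edge(self, kmer1):
        # prune a spurious (k+1)-mer, e.g. one caused by a sequencing error
        self.edges.discard(kmer1)

    def has_edge(self, kmer1):
        return kmer1 in self.edges

    def successors(self, node):
        # k-mers reachable from node by one symbol
        return [node[1:] + c for c in self.ALPHABET
                if node + c in self.edges]
```

The compact representation trades this set's O(k) bytes per edge for a few bits per edge while keeping the same exact-membership semantics.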
Call Me By My Name: Simple, Practical Private Information Retrieval for Keyword Queries
We introduce ChalametPIR: a single-server Private Information Retrieval (PIR) scheme supporting fast, low-bandwidth keyword queries, with a conceptually very simple design. In particular, we develop a generic framework for converting PIR schemes for index queries over flat arrays (based on the Learning With Errors problem) into keyword PIR. This involves representing a key-value map using any probabilistic filter that permits reconstruction of elements from inclusion queries (e.g. Cuckoo filters). In particular, we make use of recently developed Binary Fuse filters to construct ChalametPIR, with minimal efficiency blow-up compared with state-of-the-art index-based schemes. Furthermore, we show that ChalametPIR achieves runtimes and financial costs that are significantly more efficient than state-of-the-art keyword PIR approaches, for varying database configurations. Bandwidth costs are additionally reduced or remain competitive, depending on the configuration. Finally, we believe that our application of Binary Fuse filters can have independent value towards developing efficient variants of related cryptographic primitives (e.g. private set intersection) that already benefit from using less efficient filter constructions.
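The keyword-to-index conversion can be sketched in Python (all names are my own; a single collision-free hash position stands in for the Binary Fuse filter, and index_pir_fetch is a plain array read standing in for an actual LWE-based index-PIR query): the server encodes the key-value map as a flat array, and the client turns a keyword into array indices it can query obliviously.

```python
import hashlib

def _h(key, seed, m):
    # deterministic stand-in hash into m cells
    d = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(d[:8], "big") % m

def encode(kv, m=None, max_seeds=1000):
    """Encode a key-value map as a flat array so a keyword lookup becomes
    index lookups.  For brevity we search for a collision-free seed; this
    replaces the Binary Fuse filter construction of the real scheme."""
    m = m or 2 * len(kv)
    for seed in range(max_seeds):
        if len({_h(k, seed, m) for k in kv}) == len(kv):
            table = [("", None)] * m
            for k, v in kv.items():
                table[_h(k, seed, m)] = (k, v)
            return table, seed
    raise RuntimeError("no collision-free seed found")

def index_pir_fetch(table, i):
    # placeholder: a real client retrieves table[i] via index PIR,
    # without revealing i to the server
    return table[i]

def keyword_query(table, seed, key):
    stored_key, value = index_pir_fetch(table, _h(key, seed, len(table)))
    # storing the key alongside the value lets the client detect absence,
    # mirroring reconstruction-from-inclusion in the filter-based design
    return value if stored_key == key else None
```

The point of the framework is that only encode and _h change when swapping in a better filter; the underlying index-PIR scheme is untouched.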