A Dynamic Space-Efficient Filter with Constant Time Operations
A dynamic dictionary is a data structure that maintains sets of cardinality at most n from a given universe and supports insertions, deletions, and membership queries. A filter approximates membership queries with a one-sided error that occurs with probability at most ε. The goal is to obtain dynamic filters that are space-efficient (the space is 1+o(1) times the information-theoretic lower bound) and support all operations in constant time with high probability. One approach to designing filters is to reduce to the retrieval problem. When the size of the universe is polynomial in n, this approach yields a space-efficient dynamic filter as long as the error parameter ε satisfies log(1/ε) = ω(log log n). For the case that log(1/ε) = O(log log n), we present the first space-efficient dynamic filter with constant time operations in the worst case (whp). In contrast, the space-efficient dynamic filter of Pagh et al. [Anna Pagh et al., 2005] supports insertions and deletions in amortized expected constant time. Our approach employs the classic reduction of Carter et al. [Carter et al., 1978] on a new type of dictionary construction that supports random multisets.
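The reduction of Carter et al. mentioned in this abstract can be illustrated with a minimal sketch: hash every key to a short fingerprint and keep the fingerprints in an exact dictionary that tolerates duplicates. The class name, the use of SHA-256 as a stand-in for a random hash function, and the Python dict used as the multiset dictionary are all assumptions made for illustration; this is not the paper's construction and it makes no attempt at space efficiency.

```python
import hashlib
import math


class FingerprintFilter:
    """Toy dynamic filter: an exact multiset of short key fingerprints."""

    def __init__(self, capacity, eps):
        # log2(capacity / eps) fingerprint bits: by a union bound over at most
        # `capacity` stored fingerprints, a non-member matches one of them with
        # probability <= eps (treating SHA-256 output as random, an assumption).
        self.bits = math.ceil(math.log2(capacity / eps))
        self.counts = {}  # fingerprint -> multiplicity

    def _fingerprint(self, key):
        digest = hashlib.sha256(str(key).encode()).digest()
        return int.from_bytes(digest, "big") & ((1 << self.bits) - 1)

    def insert(self, key):
        fp = self._fingerprint(key)
        self.counts[fp] = self.counts.get(fp, 0) + 1

    def delete(self, key):
        # Only valid for keys that were previously inserted.
        fp = self._fingerprint(key)
        if self.counts.get(fp, 0) <= 1:
            self.counts.pop(fp, None)
        else:
            self.counts[fp] -= 1

    def contains(self, key):
        # No false negatives; false positives only on fingerprint collisions.
        return self._fingerprint(key) in self.counts
```

Distinct keys may share a fingerprint, which is why the underlying dictionary has to store a multiset rather than a set.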
Dynamic "Succincter"
Augmented B-trees (aB-trees) are a broad class of data structures. The
seminal work "succincter" by Patrascu showed that any aB-tree can be stored
using only two bits of redundancy, while supporting queries to the tree in time
proportional to its depth. It has been a versatile building block for
constructing succinct data structures, including rank/select data structures,
dictionaries, locally decodable arithmetic coding, storing balanced
parentheses, etc.
In this paper, we show how to "dynamize" an aB-tree. Our main result is the
design of dynamic aB-trees (daB-trees) with branching factor two using only
three bits of redundancy (with the help of lookup tables that are of negligible
size in applications), while supporting updates and queries in time polynomial
in its depth. As an application, we present a dynamic rank/select data
structure for n-bit arrays, also known as a dynamic fully indexable
dictionary (FID). It supports updates and queries in O(log n / log log n)
time, and when the array has ones, the data structure occupies bits. Note that the update and
query times are optimal even without space constraints due to a lower bound by
Fredman and Saks. Prior to our work, no dynamic FID with near-optimal update
and query times and redundancy was known. We further show that a
dynamic sequence supporting insertions, deletions and rank/select queries can
be maintained in (optimal) O(log n / log log n) time and with bits of redundancy.
Comment: 33 pages, 1 figure; in FOCS 202
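For readers unfamiliar with the rank/select interface, the following is a baseline sketch using a Fenwick (binary indexed) tree. It is a swapped-in textbook technique, not the daB-tree of the paper: it takes O(log n) time per operation and uses far more than succinct space. The class and method names are hypothetical.

```python
class FenwickBitArray:
    """Bit array with updates, rank and select in O(log n) time each."""

    def __init__(self, n):
        self.n = n
        self.bits = [0] * n
        self.tree = [0] * (n + 1)  # 1-indexed Fenwick tree over the bits

    def set_bit(self, i, value):
        delta = value - self.bits[i]
        if delta == 0:
            return
        self.bits[i] = value
        j = i + 1
        while j <= self.n:
            self.tree[j] += delta
            j += j & -j

    def rank1(self, i):
        """Number of ones in positions [0, i)."""
        total = 0
        j = i
        while j > 0:
            total += self.tree[j]
            j -= j & -j
        return total

    def select1(self, k):
        """0-indexed position of the k-th one (k >= 1), or -1 if absent."""
        if k < 1 or k > self.rank1(self.n):
            return -1
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            # Descend: skip whole blocks that contain fewer than k ones.
            if nxt <= self.n and self.tree[nxt] < k:
                pos = nxt
                k -= self.tree[nxt]
            step >>= 1
        return pos
```

For example, after setting bits 2, 5 and 7 in a 10-bit array, `rank1(6)` counts the ones at positions 2 and 5, and `select1(2)` returns position 5.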
Succinct Data Structures for Retrieval and Approximate Membership
The retrieval problem is the problem of associating data with keys in a set.
Formally, the data structure must store a function f: U -> {0,1}^r that has
specified values on the elements of a given set S, a subset of U, |S|=n, but
may have any value on elements outside S. Minimal perfect hashing makes it
possible to avoid storing the set S, but this induces a space overhead of
Theta(n) bits in addition to the nr bits needed for function values. In this
paper we show how to eliminate this overhead. Moreover, we show that for any k,
query time O(k) can be achieved using space that is within a factor 1+e^{-k} of
optimal, asymptotically for large n. If we allow logarithmic evaluation time,
the additive overhead can be reduced to O(log log n) bits whp. The time to
construct the data structure is O(n), expected. A main technical ingredient is
to utilize existing tight bounds on the probability of almost square random
matrices with rows of low weight to have full row rank. In addition to direct
constructions, we point out a close connection between retrieval structures and
hash tables where keys are stored in an array and some kind of probing scheme
is used. Further, we propose a general reduction that transfers the results on
retrieval into analogous results on approximate membership, a problem
traditionally addressed using Bloom filters. Again, we show how to eliminate
the space overhead present in previously known methods, and get arbitrarily
close to the lower bound. The evaluation procedures of our data structures are
extremely simple (similar to a Bloom filter). For the results stated above we
assume free access to fully random hash functions. However, we show how to
justify this assumption using extra space o(n) to simulate full randomness on a
RAM.
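The reduction from approximate membership to retrieval can be sketched as follows: store an r-bit fingerprint of each key as its retrieval value, and accept a query when the retrieved value equals the query's fingerprint. The toy retrieval structure below solves a sparse linear system over GF(2) with three table positions per key. The function names, the SHA-256 stand-in for fully random hashing, and the 1.3 slack factor are assumptions for illustration, so this sketch uses roughly 1.3nr bits instead of the near-optimal space claimed in the abstract.

```python
import hashlib


def _hash(key, seed, salt):
    data = f"{seed}:{salt}:{key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def _positions(key, seed, m):
    h = _hash(key, seed, "pos")
    return [(h >> (32 * i)) % m for i in range(3)]


def build_retrieval(values, m):
    """Return (seed, table) so that the XOR of table[p] over a key's
    positions equals values[key] for every key in `values`."""
    for seed in range(100):
        pivots = {}  # leading column -> (row bitmask, right-hand side)
        consistent = True
        for key, rhs in values.items():
            mask = 0
            for p in _positions(key, seed, m):
                mask ^= 1 << p  # repeated positions cancel, as in retrieve()
            while mask:
                col = mask.bit_length() - 1
                if col not in pivots:
                    pivots[col] = (mask, rhs)
                    break
                pmask, prhs = pivots[col]
                mask ^= pmask
                rhs ^= prhs
            else:
                if rhs:  # row reduced to "0 = nonzero": retry with a new seed
                    consistent = False
                    break
        if not consistent:
            continue
        table = [0] * m  # free variables stay 0
        for col in sorted(pivots):  # back-substitute, lowest column first
            mask, rhs = pivots[col]
            rest = mask ^ (1 << col)
            while rest:
                low = rest & -rest
                rhs ^= table[low.bit_length() - 1]
                rest ^= low
            table[col] = rhs
        return seed, table
    raise RuntimeError("no solvable system found")


def retrieve(key, seed, table):
    out = 0
    for p in _positions(key, seed, len(table)):
        out ^= table[p]
    return out


def fingerprint(key, r):
    return _hash(key, 0, "fp") % (1 << r)


def build_filter(keys, r, slack=1.3):
    m = int(slack * len(keys)) + 3
    return build_retrieval({k: fingerprint(k, r) for k in keys}, m)


def maybe_contains(key, r, seed, table):
    # Always True for stored keys; True with probability about 2^-r otherwise.
    return retrieve(key, seed, table) == fingerprint(key, r)
```

The set itself is never stored: a query on a key outside the set retrieves an essentially arbitrary r-bit value, which matches the key's fingerprint with probability about 2^-r.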
Dynamic Dictionary with Subconstant Wasted Bits per Key
Dictionaries have been one of the central questions in data structures. A
dictionary data structure maintains a set of key-value pairs under insertions
and deletions such that given a query key, the data structure efficiently
returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton,
Kuszmaul, Kuszmaul, Liu 2022] store key-value pairs with only bits of redundancy, and support all operations in time,
for . It was recently shown to be optimal [Li, Liang, Yu, Zhou
2023b].
In this paper, we study the regime where the redundant bits is , and
show that when is at least , all operations can be
supported in time, matching the lower bound in this
regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures based on
which range is in. The data structure for utilizes a
generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein
2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for is based on recursively hashing into buckets with logarithmic
sizes.Comment: 46 pages; SODA 202
Succinct Filters for Sets of Unknown Sizes
The membership problem asks to maintain a set S ⊆ [u], supporting insertions and membership queries, i.e., testing if a given element is in the set. A data structure that computes exact answers is called a dictionary. When a (small) false positive rate ε is allowed, the data structure is called a filter.
The space usages of the standard dictionaries or filters usually depend on the upper bound on the size of S, while the actual set can be much smaller.
Pagh, Segev and Wieder [Pagh et al., 2013] were the first to study filters with varying space usage based on the current |S|. They showed that, in order to match the space with the current set size n = |S|, any filter data structure must use (1-o(1))n(log(1/ε)+(1-O(ε))log log n) bits, in contrast to the well-known lower bound of N log(1/ε) bits, where N is an upper bound on |S|. They also presented a data structure with almost optimal space of (1+o(1))n(log(1/ε)+O(log log n)) bits provided that n > u^0.001, with expected amortized constant insertion time and worst-case constant lookup time.
In this work, we present a filter data structure with improvements in two aspects:
- it has constant worst-case time for all insertions and lookups with high probability;
- it uses space (1+o(1))n(log(1/ε)+log log n) bits when n > u^0.001, achieving optimal leading constant for all ε = o(1). We also present a dictionary that uses (1+o(1))n log(u/n) bits of space, matching the optimal space in terms of the current size, and performs all operations in constant time with high probability.
- …
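To get a feel for the two space bounds quoted in this abstract, the leading terms can be compared numerically. This is only a back-of-the-envelope sketch: it assumes logarithms are base 2, drops all (1+o(1)) factors, and uses hypothetical function names.

```python
import math


def bits_per_key_sized(n, eps):
    """Leading term log(1/eps) + log log n, per key, when space tracks n."""
    return math.log2(1 / eps) + math.log2(math.log2(n))


def bits_per_key_fixed(n, N, eps):
    """N log(1/eps) bits provisioned for N keys, charged to the n present."""
    return N * math.log2(1 / eps) / n


# Example: n = 2^20 keys stored, capacity N = 2^30, eps = 2^-8.
sized = bits_per_key_sized(2**20, 2**-8)
fixed = bits_per_key_fixed(2**20, 2**30, 2**-8)
```

Under these assumptions the size-adaptive filter pays an extra log log n bits per key (log2(20) for n = 2^20), while a filter provisioned for N keys spends N/n times log(1/ε) bits per key actually stored.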