332 research outputs found
Bloom Filters in Adversarial Environments
Many efficient data structures use randomness, allowing them to improve upon
deterministic ones. Usually, their efficiency and correctness are analyzed
using probabilistic tools under the assumption that the inputs and queries are
independent of the internal randomness of the data structure. In this work, we
consider data structures in a more robust model, which we call the adversarial
model. Roughly speaking, this model allows an adversary to choose inputs and
queries adaptively according to previous responses. Specifically, we consider a
data structure known as "Bloom filter" and prove a tight connection between
Bloom filters in this model and cryptography.
A Bloom filter represents a set of elements approximately, by using fewer
bits than a precise representation. The price for succinctness is allowing some
errors: for any it should always answer `Yes', and for any it should answer `Yes' only with small probability.
In the adversarial model, we consider both efficient adversaries (that run in
polynomial time) and computationally unbounded adversaries that are only
bounded in the number of queries they can make. For computationally bounded
adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and
only if one-way functions exist. For unbounded adversaries we show that there
exists a Bloom filter for sets of size and error , that is
secure against queries and uses only
bits of memory. In comparison, is the best
possible under a non-adaptive adversary
Fast and Powerful Hashing using Tabulation
Randomized algorithms are often enjoyed for their simplicity, but the hash
functions employed to yield the desired probabilistic guarantees are often too
complicated to be practical. Here we survey recent results on how simple
hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as
consisting of characters and we have precomputed character tables
mapping characters to random hash values. A key
is hashed to . This schemes is
very fast with character tables in cache. While simple tabulation is not even
4-independent, it does provide many of the guarantees that are normally
obtained via higher independence, e.g., linear probing and Cuckoo hashing.
Next we consider twisted tabulation where one input character is "twisted" in
a simple way. The resulting hash function has powerful distributional
properties: Chernoff-Hoeffding type tail bounds and a very small bias for
min-wise hashing. This also yields an extremely fast pseudo-random number
generator that is provably good for many classic randomized algorithms and
data-structures.
Finally, we consider double tabulation where we compose two simple tabulation
functions, applying one to the output of the other, and show that this yields
very high independence in the classic framework of Carter and Wegman [1977]. In
fact, w.h.p., for a given set of size proportional to that of the space
consumed, double tabulation gives fully-random hashing. We also mention some
more elaborate tabulation schemes getting near-optimal independence for given
time and space.
While these tabulation schemes are all easy to implement and use, their
analysis is not
ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force
A minimal perfect hash function (MPHF) maps a set of keys to the
first integers without collisions. There is a lower bound of
bits of space needed to represent an MPHF. A matching
upper bound is obtained using the brute-force algorithm that tries random hash
functions until stumbling on an MPHF and stores that function's seed. In
expectation, seeds need to be tested. The most
space-efficient previous algorithms for constructing MPHFs all use such a
brute-force approach as a basic building block.
In this paper, we introduce ShockHash - Small, heavily overloaded cuckoo hash
tables. ShockHash uses two hash functions and , hoping for the
existence of a function such that is an MPHF on . In graph terminology, ShockHash generates
-edge random graphs until stumbling on a pseudoforest - a graph where each
component contains as many edges as nodes. Using cuckoo hashing, ShockHash then
derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval
data structure to store using bits.
By carefully analyzing the probability that a random graph is a pseudoforest,
we show that ShockHash needs to try only hash
function seeds in expectation, reducing the space for storing the seed by
roughly bits. This makes ShockHash almost a factor faster than
brute-force, while maintaining the asymptotically optimal space consumption. An
implementation within the RecSplit framework yields the currently most space
efficient MPHFs, i.e., competing approaches need about two orders of magnitude
more work to achieve the same space
- β¦