    Bloom Filters in Adversarial Environments

    Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set $S$ of elements approximately, by using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any $x \in S$ it should always answer `Yes', and for any $x \notin S$ it should answer `Yes' only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries we show that there exists a Bloom filter for sets of size $n$ and error $\varepsilon$, that is secure against $t$ queries and uses only $O(n \log{\frac{1}{\varepsilon}} + t)$ bits of memory. In comparison, $n \log{\frac{1}{\varepsilon}}$ is the best possible under a non-adaptive adversary.
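    To make the one-sided error concrete, here is a minimal sketch of a textbook Bloom filter in Python. It is not the adversarially robust construction from the paper: the class name `BloomFilter`, the sizing formulas, and the use of keyed BLAKE2b as a stand-in for secret hash functions are all assumptions made for illustration.

```python
import hashlib
import math
import secrets


class BloomFilter:
    """Textbook Bloom filter: k keyed hash functions over an m-bit array."""

    def __init__(self, n, eps, key=None):
        # Classic sizing (an assumption, not taken from the abstract):
        # m = n * log2(1/eps) / ln 2 bits and k = log2(1/eps) hash functions.
        self.m = max(1, math.ceil(n * math.log2(1 / eps) / math.log(2)))
        self.k = max(1, round(math.log2(1 / eps)))
        self.bits = bytearray((self.m + 7) // 8)
        # A secret key keeps the hash functions hidden from whoever issues queries.
        self.key = key if key is not None else secrets.token_bytes(32)

    def _positions(self, item):
        # Derive k bit positions from keyed BLAKE2b, one salt per hash function.
        for i in range(self.k):
            digest = hashlib.blake2b(
                item, key=self.key, salt=i.to_bytes(8, "little"), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # `Yes' only if every position is set: no false negatives are possible.
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(n=1000, eps=0.01)
    for i in range(1000):
        bf.add(f"member-{i}".encode())
    print(all(f"member-{i}".encode() in bf for i in range(1000)))
    print(sum(f"other-{i}".encode() in bf for i in range(10000)), "false positives")
```

    Every inserted element is always reported as present, while the number of false positives depends on the key and on which non-members are queried. An adaptive adversary may try to exploit exactly that dependence, which a sketch like this does nothing to prevent.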

    Fast and Powerful Hashing using Tabulation

    Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters and we have precomputed character tables $h_1,...,h_c$ mapping characters to random hash values. A key $x=(x_1,...,x_c)$ is hashed to $h_1[x_1] \oplus h_2[x_2] \oplus \cdots \oplus h_c[x_c]$. This scheme is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. While these tabulation schemes are all easy to implement and use, their analysis is not.
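    Simple tabulation as described above can be sketched in a few lines of Python. The function name `make_simple_tabulation`, its parameters, and the use of Python's `random` module to fill the tables are assumptions for illustration; a real deployment would fill the tables from a better randomness source and use a language with cache-friendly arrays.

```python
import random


def make_simple_tabulation(c=4, char_bits=8, out_bits=32, seed=None):
    """Return (h, tables) where h hashes keys made of c characters of char_bits bits."""
    rng = random.Random(seed)  # illustrative only; not a cryptographic source
    # One table per character position, each mapping a character to a random value.
    tables = [
        [rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
        for _ in range(c)
    ]
    mask = (1 << char_bits) - 1

    def h(x):
        if not 0 <= x < 1 << (c * char_bits):
            raise ValueError("key does not fit in c characters")
        value = 0
        for i in range(c):
            # Character i is the i-th lowest group of char_bits bits of x.
            value ^= tables[i][(x >> (i * char_bits)) & mask]
        return value

    return h, tables


if __name__ == "__main__":
    h, _ = make_simple_tabulation(c=2, char_bits=8, seed=1)
    # The four keys (a0,b0), (a0,b1), (a1,b0), (a1,b1) always XOR to zero,
    # which is why simple tabulation cannot be 4-independent.
    print(h(0x0000) ^ h(0x0001) ^ h(0x0100) ^ h(0x0101))
```

    The final line prints 0 for any choice of tables, because each table entry appears exactly twice in the XOR. This gives a concrete view of the remark that simple tabulation is not 4-independent.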

    ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

    A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n\log_2 e - O(\log n)$ bits of space needed to represent an MPHF. A matching upper bound is obtained using the brute-force algorithm that tries random hash functions until stumbling on an MPHF and stores that function's seed. In expectation, $e^n\textrm{poly}(n)$ seeds need to be tested. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash - Small, heavily overloaded cuckoo hash tables. ShockHash uses two hash functions $h_0$ and $h_1$, hoping for the existence of a function $f : S \rightarrow \{0,1\}$ such that $x \mapsto h_{f(x)}(x)$ is an MPHF on $S$. In graph terminology, ShockHash generates $n$-edge random graphs until stumbling on a pseudoforest - a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store $f$ using $n + o(n)$ bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only $(e/2)^n\textrm{poly}(n)$ hash function seeds in expectation, reducing the space for storing the seed by roughly $n$ bits. This makes ShockHash almost a factor $2^n$ faster than brute-force, while maintaining the asymptotically optimal space consumption. An implementation within the RecSplit framework yields the currently most space-efficient MPHFs, i.e., competing approaches need about two orders of magnitude more work to achieve the same space
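    The seed search can be illustrated with a toy Python sketch for very small key sets. This is not the authors' implementation: all function names are hypothetical, keyed BLAKE2b stands in for $h_0$ and $h_1$, a union-find test plus peeling replaces the cuckoo-hashing step, and a plain dictionary stands in for the 1-bit retrieval data structure.

```python
import hashlib


def _hash(key, seed, which, n):
    # Stand-in for h_0 / h_1: BLAKE2b salted with the seed and the function index.
    salt = seed.to_bytes(8, "little") + which.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % n


def is_pseudoforest(n, edges):
    """True if every component has at most one cycle (edges <= nodes)."""
    parent = list(range(n))
    has_cycle = [False] * n

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            if has_cycle[ru]:
                return False  # second cycle in the same component
            has_cycle[ru] = True
        else:
            if has_cycle[ru] and has_cycle[rv]:
                return False  # merging two cyclic components
            parent[ru] = rv
            has_cycle[rv] = has_cycle[ru] or has_cycle[rv]
    return True


def orient(n, edges):
    """Assign each edge of a pseudoforest to a distinct endpoint."""
    incident = [set() for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        incident[u].add(i)
        incident[v].add(i)
    slot = [None] * len(edges)
    # Peel nodes of degree 1: their only edge must be stored at that node.
    stack = [u for u in range(n) if len(incident[u]) == 1]
    while stack:
        u = stack.pop()
        if len(incident[u]) != 1:
            continue
        i = incident[u].pop()
        slot[i] = u
        a, b = edges[i]
        w = b if a == u else a
        incident[w].discard(i)
        if len(incident[w]) == 1:
            stack.append(w)
    # What remains are disjoint cycles; walk each one in a fixed direction.
    for start, (u, _) in enumerate(edges):
        i = start
        while slot[i] is None:
            slot[i] = u
            a, b = edges[i]
            w = b if a == u else a
            incident[u].discard(i)
            incident[w].discard(i)
            if not incident[w]:
                break
            i, u = next(iter(incident[w])), w
    return slot


def build_mphf(keys, max_seeds=1_000_000):
    """Try seeds until the n-edge graph on n nodes is a pseudoforest."""
    n = len(keys)
    for seed in range(max_seeds):
        edges = [(_hash(k, seed, 0, n), _hash(k, seed, 1, n)) for k in keys]
        if is_pseudoforest(n, edges):
            slots = orient(n, edges)
            # f(x) = 0 if x is stored at h_0(x), else 1.
            f = {k: int(s != e[0]) for k, e, s in zip(keys, edges, slots)}
            return seed, f
    raise RuntimeError("no suitable seed found")


def lookup(key, seed, f, n):
    return _hash(key, seed, f[key], n)
```

    Calling `build_mphf` on a handful of distinct byte-string keys returns a seed and the bit function `f`, and `lookup` then maps each key to a distinct value in the range 0 to n-1. The sketch is only practical for small n, since the number of seeds tried grows exponentially with n.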
