Bloom Filters in Adversarial Environments

Authors: Naor Moni, Yogev Eylon
Publication date: 29/01/2019

Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set $S$ of elements approximately, by using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any $x \in S$ it should always answer "Yes", and for any $x \notin S$ it should answer "Yes" only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries we show that there exists a Bloom filter for sets of size $n$ and error $\varepsilon$, that is secure against $t$ queries and uses only $O(n \log\frac{1}{\varepsilon} + t)$ bits of memory. In comparison, $n \log\frac{1}{\varepsilon}$ is the best possible under a non-adaptive adversary.
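To make the one-sided error guarantee concrete, here is a minimal sketch of a classic, non-adversarial Bloom filter in Python. It is not the secure construction from the paper: the class name `BloomFilter`, the use of BLAKE2b with double hashing, and the textbook sizing rules for the bit count `m` and hash count `k` are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter (illustrative; NOT adversarially robust)."""

    def __init__(self, n: int, eps: float):
        # Standard sizing: m = n * log2(1/eps) / ln 2 bits, k = (m/n) * ln 2 hashes.
        self.m = max(1, math.ceil(n * math.log2(1 / eps) / math.log(2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k positions from one digest via double hashing.
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # "Yes" only if every probed bit is set; never a false negative.
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(item))
```

Every inserted element answers "Yes"; a non-member answers "Yes" only when all of its probed bits happen to be set. Because the hash here is public and unkeyed, an adaptive adversary can search for such false positives, which is the gap the adversarial model in the abstract addresses.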

arXiv.org e-Print Archive

Cryptology ePrint Archive

Fast and Powerful Hashing using Tabulation

Author: Thorup Mikkel
Publication date: 01/01/2016

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters and we have precomputed character tables $h_1, \ldots, h_c$ mapping characters to random hash values. A key $x = (x_1, \ldots, x_c)$ is hashed to $h_1[x_1] \oplus h_2[x_2] \oplus \cdots \oplus h_c[x_c]$. This scheme is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. While these tabulation schemes are all easy to implement and use, their analysis is not.
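The simple tabulation scheme described above can be sketched in a few lines of Python. The class name `SimpleTabulation` and its parameters (`c` characters of `char_bits` bits each, `out_bits`-bit hash values, an optional `seed`) are hypothetical names chosen for this illustration; Python's `random` module stands in for a source of truly random table entries.

```python
import random


class SimpleTabulation:
    """Simple tabulation hashing: XOR of one random table lookup per character."""

    def __init__(self, c: int = 4, char_bits: int = 8, out_bits: int = 32, seed=None):
        rng = random.Random(seed)  # assumption: stand-in for truly random tables
        self.c = c
        self.char_bits = char_bits
        self.mask = (1 << char_bits) - 1
        # One table per character position, each with 2**char_bits random entries.
        self.tables = [
            [rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
            for _ in range(c)
        ]

    def __call__(self, x: int) -> int:
        # Split x into c characters (low bits first) and XOR the table entries.
        # Bits of x above c * char_bits are ignored.
        h = 0
        for i in range(self.c):
            h ^= self.tables[i][(x >> (i * self.char_bits)) & self.mask]
        return h
```

The sketch also shows why the scheme is not 4-independent. For two-character keys $(a_0,b_0)$, $(a_0,b_1)$, $(a_1,b_0)$, $(a_1,b_1)$, every table entry is used exactly twice, so the four hash values always XOR to zero and any three of them determine the fourth.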

arXiv.org e-Print Archive

Copenhagen University Research Information System

Dagstuhl Research Online Publication Server

ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

Authors: Lehmann Hans-Peter, Sanders Peter, Walzer Stefan
Publication date: 13/11/2023

A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n \log_2 e - O(\log n)$ bits of space needed to represent an MPHF. A matching upper bound is obtained using the brute-force algorithm that tries random hash functions until stumbling on an MPHF and stores that function's seed. In expectation, $e^n \mathrm{poly}(n)$ seeds need to be tested. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash - Small, heavily overloaded cuckoo hash tables. ShockHash uses two hash functions $h_0$ and $h_1$, hoping for the existence of a function $f : S \rightarrow \{0,1\}$ such that $x \mapsto h_{f(x)}(x)$ is an MPHF on $S$. In graph terminology, ShockHash generates $n$-edge random graphs until stumbling on a pseudoforest - a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store $f$ using $n + o(n)$ bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only $(e/2)^n \mathrm{poly}(n)$ hash function seeds in expectation, reducing the space for storing the seed by roughly $n$ bits. This makes ShockHash almost a factor $2^n$ faster than brute-force, while maintaining the asymptotically optimal space consumption. An implementation within the RecSplit framework yields the currently most space efficient MPHFs, i.e., competing approaches need about two orders of magnitude more work to achieve the same space.
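The seed search described in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' implementation: the helper names `_h`, `_assign`, `build` and `lookup` are hypothetical, BLAKE2b stands in for the seeded hash functions $h_0$ and $h_1$, a plain dict stands in for the 1-bit retrieval data structure, and the assignment step uses simple augmenting-path matching in place of the paper's linear-time cuckoo hashing. Keys are assumed to be distinct strings.

```python
import hashlib
from itertools import count


def _h(key: str, seed: int, which: int, n: int) -> int:
    # Assumption: seeded BLAKE2b stands in for the hash functions h_0 and h_1.
    data = f"{seed}:{which}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % n


def _assign(cands):
    # Augmenting-path matching of keys to slots (swapped in for cuckoo hashing).
    # Returns owner[slot] = key index, or None if no collision-free choice exists.
    owner = [None] * len(cands)

    def place(i, seen):
        for s in cands[i]:
            if s in seen:
                continue
            seen.add(s)
            if owner[s] is None or place(owner[s], seen):
                owner[s] = i
                return True
        return False

    for i in range(len(cands)):
        if not place(i, set()):
            return None
    return owner


def build(keys):
    # Try seeds until each key can pick h_0 or h_1 without any collision.
    n = len(keys)
    for seed in count():
        cands = [(_h(k, seed, 0, n), _h(k, seed, 1, n)) for k in keys]
        owner = _assign(cands)
        if owner is not None:
            # f(x) records which of the two hash functions key x uses.
            f = {keys[i]: (0 if cands[i][0] == slot else 1)
                 for slot, i in enumerate(owner)}
            return seed, f


def lookup(key: str, seed: int, f: dict, n: int) -> int:
    # Evaluate x -> h_{f(x)}(x).
    return _h(key, seed, f[key], n)
```

For a small key set, `seed, f = build(keys)` returns a seed and bit assignment such that `lookup` maps the keys bijectively onto `range(len(keys))`. The sketch only demonstrates the search-then-assign structure; it makes no attempt at the space or running-time guarantees stated in the abstract.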

arXiv.org e-Print Archive