727 research outputs found
Fast and Powerful Hashing using Tabulation
Randomized algorithms are often enjoyed for their simplicity, but the hash
functions employed to yield the desired probabilistic guarantees are often too
complicated to be practical. Here we survey recent results on how simple
hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as
consisting of characters and we have precomputed character tables
mapping characters to random hash values. A key
is hashed to . This schemes is
very fast with character tables in cache. While simple tabulation is not even
4-independent, it does provide many of the guarantees that are normally
obtained via higher independence, e.g., linear probing and Cuckoo hashing.
Next we consider twisted tabulation where one input character is "twisted" in
a simple way. The resulting hash function has powerful distributional
properties: Chernoff-Hoeffding type tail bounds and a very small bias for
min-wise hashing. This also yields an extremely fast pseudo-random number
generator that is provably good for many classic randomized algorithms and
data-structures.
Finally, we consider double tabulation where we compose two simple tabulation
functions, applying one to the output of the other, and show that this yields
very high independence in the classic framework of Carter and Wegman [1977]. In
fact, w.h.p., for a given set of size proportional to that of the space
consumed, double tabulation gives fully-random hashing. We also mention some
more elaborate tabulation schemes getting near-optimal independence for given
time and space.
While these tabulation schemes are all easy to implement and use, their
analysis is not
Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters
We show how to answer spatial multiple-set intersection queries in O(n(log
w)/w + kt) expected time, where n is the total size of the t sets involved in
the query, w is the number of bits in a memory word, k is the output size, and
c is any fixed constant. This improves the asymptotic performance over previous
solutions and is based on an interesting data structure, known as 2-3 cuckoo
hash-filters. Our results apply in the word-RAM model (or practical RAM model),
which allows for constant-time bit-parallel operations, such as bitwise AND,
OR, NOT, and MSB (most-significant 1-bit), as exist in modern CPUs and GPUs.
Our solutions apply to any multiple-set intersection queries in spatial data
sets that can be reduced to one-dimensional range queries, such as spatial join
queries for one-dimensional points or sets of points stored along space-filling
curves, which are used in GIS applications.Comment: Full version of paper from 2017 ACM SIGSPATIAL International
Conference on Advances in Geographic Information System
- …