Fast and Powerful Hashing using Tabulation

Author: Thorup Mikkel
Publication date: 01/01/2016

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters and we have precomputed character tables $h_1, \ldots, h_c$ mapping characters to random hash values. A key $x = (x_1, \ldots, x_c)$ is hashed to $h_1[x_1] \oplus h_2[x_2] \oplus \cdots \oplus h_c[x_c]$. This scheme is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., for linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. While these tabulation schemes are all easy to implement and use, their analysis is not

arXiv.org e-Print Archive

Copenhagen University Research Information System

Dagstuhl Research Online Publication Server
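
The simple tabulation scheme described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name `make_simple_tabulation` and the parameter choices (32-bit keys split into `c = 4` characters of 8 bits, 32-bit hash values, Python's `random` module as the source of table entries) are assumptions made for the example.

```python
import random

def make_simple_tabulation(c=4, char_bits=8, out_bits=32, seed=None):
    """Return a simple tabulation hash function (illustrative sketch).

    Assumed parameters: keys have c characters of char_bits bits each,
    and every table entry is a random out_bits-bit value.
    """
    rng = random.Random(seed)
    size = 1 << char_bits
    mask = size - 1
    # One table of random hash values per character position: h_1, ..., h_c.
    tables = [[rng.getrandbits(out_bits) for _ in range(size)]
              for _ in range(c)]

    def h(x):
        value = 0
        for i in range(c):
            # Extract character i of the key and XOR in its table entry.
            value ^= tables[i][(x >> (i * char_bits)) & mask]
        return value

    return h

h = make_simple_tabulation(seed=1)
print(hex(h(0xDEADBEEF)))

# Four keys formed from two values in each of two character positions:
# every table entry is used an even number of times, so the XOR is 0.
print(h(0x0000) ^ h(0x0001) ^ h(0x0100) ^ h(0x0101))  # prints 0
```

The last line shows concretely why simple tabulation cannot be 4-independent: the hash of the fourth key is fully determined by the other three.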

Simple Tabulation, Fast Expanders, Double Tabulation, and High Independence

Author: Thorup Mikkel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Simple tabulation dates back to Zobrist in 1970. Keys are viewed as c characters from some alphabet A. We initialize c tables h_0, ..., h_{c-1} mapping characters to random hash values. A key x=(x_0, ..., x_{c-1}) is hashed to h_0[x_0] xor ... xor h_{c-1}[x_{c-1}]. The scheme is extremely fast when the character hash tables h_i are in cache. Simple tabulation hashing is not 4-independent, but we show that if we apply it twice, then we get high independence. First we hash to intermediate keys that are 6 times longer than the original keys, and then we hash the intermediate keys to the final hash values. The intermediate keys have d=6c characters from A. We can view the hash function as a degree-d bipartite graph with keys on one side, each with edges to d output characters. We show that this graph has nice expansion properties, and from that we get that with another level of simple tabulation on the intermediate keys, the composition is a highly independent hash function. The independence we get is |A|^{Omega(1/c)}. Our space is O(c|A|) and the hash function is evaluated in O(c) time. Siegel [FOCS'89, SICOMP'04] proved that with this space, if the hash function is evaluated in o(c) time, then the independence can only be o(c), so our evaluation time is best possible for Omega(c) independence; our independence is much higher if c=|A|^{o(1)}. Siegel used O(c)^c evaluation time to get the same independence with similar space. Siegel's main focus was c=O(1), but we are exponentially faster when c=omega(1). Applying our scheme recursively, we can increase our independence to |A|^{Omega(1)} with o(c^{log c}) evaluation time. Compared with Siegel's scheme, this is both faster and gives higher independence. Our scheme is easy to implement, and it does provide realistic implementations of 100-independent hashing for, say, 32- and 64-bit keys

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
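
The two-level composition described in the abstract above (hash to an intermediate key of d = 6c characters, then hash that key again) might be sketched as follows. This is only an illustration under assumed parameters: the helper names `make_tabulation` and `make_double_tabulation`, the 8-bit characters, and the 32-bit output are hypothetical choices, and drawing the tables from Python's `random` module does not by itself certify the expansion property the paper analyses.

```python
import random

def make_tabulation(num_chars, char_bits, out_bits, rng):
    """Simple tabulation over num_chars characters (hypothetical helper)."""
    size = 1 << char_bits
    mask = size - 1
    tables = [[rng.getrandbits(out_bits) for _ in range(size)]
              for _ in range(num_chars)]

    def h(x):
        value = 0
        for i, table in enumerate(tables):
            value ^= table[(x >> (i * char_bits)) & mask]
        return value

    return h

def make_double_tabulation(c=4, char_bits=8, out_bits=32, seed=None):
    """Compose two simple tabulation functions (illustrative sketch)."""
    rng = random.Random(seed)
    d = 6 * c  # intermediate keys are 6 times longer, as in the abstract
    # First level: c input characters -> intermediate key of d characters.
    first = make_tabulation(c, char_bits, d * char_bits, rng)
    # Second level: d intermediate characters -> final hash value.
    second = make_tabulation(d, char_bits, out_bits, rng)
    return lambda x: second(first(x))

h = make_double_tabulation(seed=7)
print(hex(h(0xDEADBEEF)))
```

Each evaluation performs c + d = 7c table lookups, which matches the O(c) evaluation time stated in the abstract; the tables hold (c + d)|A| entries in total, in line with the O(c|A|) space bound.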

A fifth grade sociometric study leaders and non-leaders.

Author: Sweeney Helen Elizabeth
Publication venue: Boston University
Publication date: 01/01/1953

Thesis (Ed.M.)--Boston University

Boston University Institutional Repository (OpenBU)

Tabulation of cubic function fields via polynomial binary cubic forms

Author: Jacobson Jr. Michael
Rozenhart Pieter
Scheidler Renate
Publication date: 29/09/2010

We present a method for tabulating all cubic function fields over $\mathbb{F}_q(t)$ whose discriminant $D$ has either odd degree or even degree and the leading coefficient of $-3D$ is a non-square in $\mathbb{F}_q^*$, up to a given bound $B$ on the degree of $D$. Our method is based on a generalization of Belabas' method for tabulating cubic number fields. The main theoretical ingredient is a generalization of a theorem of Davenport and Heilbronn to cubic function fields, along with a reduction theory for binary cubic forms that provides an efficient way to compute equivalence classes of binary cubic forms. The algorithm requires $O(B^4 q^B)$ field operations as $B \rightarrow \infty$. The algorithm, examples and numerical data for $q = 5, 7, 11, 13$ are included. Comment: 30 pages, minor typos corrected, extra table entries added, revamped complexity analysis of the algorithm. To appear in Mathematics of Computation.

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Oskar Bordeaux

Fast hashing with Strong Concentration Bounds

Author: Aamand Anders
Bernstein Sergei Natanovich
Celis L. Elisa
Dahlgaard Søren
Dumey A. I.
Meka Raghu
Mitzenmacher Michael
Pătraşcu Mihai
Publication date: 01/01/2020

Previous work on tabulation hashing by Patrascu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of $c = O(1)$ characters, e.g., a 64-bit key as $c = 8$ characters of 8 bits. The character domain $\Sigma$ should be small enough that character tables of size $|\Sigma|$ fit in fast cache. The schemes then use $O(1)$ tables of this size, so the space of tabulation hashing is $O(|\Sigma|)$. However, the concentration bounds by Patrascu and Thorup only apply if the expected sums are $\ll |\Sigma|$. To see the problem, consider the very simple case where we use tabulation hashing to throw $n$ balls into $m$ bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if $n = m$, for then the expected value is $1$. However, if $m = 2$, as when tossing $n$ unbiased coins, the expected value $n/2$ is $\gg |\Sigma|$ for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing, which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds. Comment: 54 pages, 3 figures. An extended abstract appeared at the 52nd Annual ACM Symposium on Theory of Computing (STOC20)

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
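
The balls-into-bins arithmetic in the abstract above can be made concrete with a small experiment. This is a hedged sketch only: it uses plain simple tabulation, not the tabulation-permutation scheme the paper introduces, and the names `expected_load` and `bin_counts`, the 8-bit characters ($|\Sigma| = 256$), the 32-bit keys, and the key set $0, \ldots, n-1$ are all assumptions made for illustration.

```python
import random

SIGMA = 256  # |Sigma| for 8-bit characters (assumed for this example)

def expected_load(n, m):
    """Expected number of balls in one bin when n balls go into m bins."""
    return n / m

def bin_counts(n, m, seed=0):
    """Throw keys 0..n-1 into m bins using simple tabulation on 4 characters."""
    rng = random.Random(seed)
    tables = [[rng.getrandbits(32) for _ in range(SIGMA)] for _ in range(4)]
    counts = [0] * m
    for x in range(n):
        hv = 0
        for i in range(4):
            hv ^= tables[i][(x >> (8 * i)) & 0xFF]
        counts[hv % m] += 1
    return counts

n = 100_000
for m in (n, 2):
    mu = expected_load(n, m)
    # The earlier bounds need the expectation to be much smaller than |Sigma|.
    print(f"m={m}: expected load {mu}, exceeds |Sigma|: {mu > SIGMA}")

# Observed loads for m = 2, to compare against the expectation n/2.
print(bin_counts(n, 2))
```

With $n = m$ the expected load is 1, well below $|\Sigma|$; with $m = 2$ it is $n/2 = 50{,}000$, far above 256, which is the regime the abstract says the earlier bounds do not cover.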

Communication Efficient Checking of Big Data Operations

Author: Hübschle-Schneider Lorenz
Sanders Peter
Publication date: 01/01/2018

We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by theoretical analysis

arXiv.org e-Print Archive

Crossref

KITopen
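
To give a flavour of what a low-communication probabilistic checker for sum aggregation could look like, here is a small sketch. It is not the algorithm from the paper, whose details the abstract does not give: the idea used here (comparing per-bucket checksums of the input and of the claimed output under a randomly salted hash) and all names (`bucket_checksums`, `check_sum_aggregation`, `MODULUS`, `num_buckets`) are assumptions chosen for illustration.

```python
import random

MODULUS = (1 << 61) - 1  # assumed modulus for the checksums

def bucket_checksums(pairs, salt, num_buckets):
    """Sum the values of (key, value) pairs per hash bucket, modulo MODULUS."""
    sums = [0] * num_buckets
    for key, value in pairs:
        b = hash((salt, key)) % num_buckets
        sums[b] = (sums[b] + value) % MODULUS
    return sums

def check_sum_aggregation(inputs, outputs, num_buckets=16, seed=None):
    """Accept if input and claimed output have identical bucket checksums.

    A correct aggregation is always accepted. An incorrect one is rejected
    unless its errors happen to cancel within every bucket.
    """
    salt = random.Random(seed).getrandbits(64)
    return (bucket_checksums(inputs, salt, num_buckets)
            == bucket_checksums(outputs, salt, num_buckets))

inputs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
good = [("a", 4), ("b", 2), ("c", 4)]
bad = [("a", 5), ("b", 2), ("c", 4)]  # the sum for key "a" is off by one

print(check_sum_aggregation(inputs, good))  # True
print(check_sum_aggregation(inputs, bad))   # False
```

In a distributed setting, each worker would only need to exchange its `num_buckets` checksums rather than the data itself, which is the sense in which such a check is sublinear in the input size.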