More Analysis of Double Hashing for Balanced Allocations
With double hashing, for a key x, one generates two hash values f(x) and
g(x), and then uses combinations (f(x) + i g(x)) mod n for i = 0, 1, 2, ...
to generate multiple hash values in the range [0, n-1] from the initial two.
For balanced allocations, keys are hashed into a hash table where each bucket
can hold multiple keys, and each key is placed in the least loaded of d
choices. It has been shown previously that asymptotically the performance of
double hashing and fully random hashing is the same in the balanced allocation
paradigm using fluid limit methods. Here we extend a coupling argument used by
Lueker and Molodowitch to show that double hashing and ideal uniform hashing
are asymptotically equivalent in the setting of open address hash tables, to the
balanced allocation setting, providing further insight into this phenomenon. We
also discuss the potential for, and the bottlenecks limiting, the use of this approach
for other multiple-choice hashing schemes. Comment: 13 pages; current draft; will be submitted to a conference shortly
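For concreteness, here is a minimal Python sketch of the derived hash values described above; the use of blake2b with two different salts as the base functions f and g is an arbitrary illustration, not anything prescribed by the paper.

import hashlib

def base_hash(key: str, salt: bytes, n: int) -> int:
    """One of the two base hash functions, mapping a key into [0, n)."""
    digest = hashlib.blake2b(key.encode(), salt=salt).digest()
    return int.from_bytes(digest[:8], "big") % n

def double_hash_choices(key: str, d: int, n: int) -> list[int]:
    """Return the d derived bucket choices h_i(x) = (f(x) + i*g(x)) mod n."""
    f = base_hash(key, b"f-salt", n)
    g = base_hash(key, b"g-salt", n)
    # In practice one typically ensures g(x) != 0 (e.g. prime n and g(x) in
    # [1, n-1]) so that the d derived choices are distinct.
    return [(f + i * g) % n for i in range(d)]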
Balanced Allocations and Double Hashing
Double hashing has recently found more common usage in schemes that use
multiple hash functions. In double hashing, for an item x, one generates two
hash values f(x) and g(x), and then uses combinations (f(x) + i g(x)) mod n for i = 0, 1, 2, ... to generate multiple hash values from the initial two. We
first perform an empirical study showing that, surprisingly, the performance
difference between double hashing and fully random hashing appears negligible
in the standard balanced allocation paradigm, where each item is placed in the
least loaded of d choices, as well as several related variants. We then
provide theoretical results that explain the behavior of double hashing in this
context. Comment: Further updated, small improvements/typos fixed
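As a rough illustration of the kind of empirical comparison described above, the following Python sketch contrasts the maximum load under fully random choices and under double-hashing choices in the d-choice balanced allocation scheme; the parameters (n bins, n balls, d = 4) are arbitrary, not taken from the paper.

import random

def max_load(n: int, d: int, choices_fn) -> int:
    loads = [0] * n
    for _ in range(n):                      # throw n balls into n bins
        choices = choices_fn(n, d)
        target = min(choices, key=loads.__getitem__)
        loads[target] += 1                  # place in the least loaded choice
    return max(loads)

def fully_random(n, d):
    return [random.randrange(n) for _ in range(d)]

def double_hashing(n, d):
    f = random.randrange(n)
    g = random.randrange(1, n)              # nonzero step, as usual for double hashing
    return [(f + i * g) % n for i in range(d)]

if __name__ == "__main__":
    n, d = 100_000, 4
    print("fully random  :", max_load(n, d, fully_random))
    print("double hashing:", max_load(n, d, double_hashing))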
Notes on Cloud computing principles
This letter provides a review of fundamental distributed systems and economic
Cloud computing principles. These principles are frequently deployed in their
respective fields, but their inter-dependencies are often neglected. Given that
Cloud Computing first and foremost is a new business model, a new model to sell
computational resources, the understanding of these concepts is facilitated by
treating them in unison. Here, we review some of the most important concepts
and how they relate to each other.
Load thresholds for cuckoo hashing with double hashing
In k-ary cuckoo hashing, each of cn objects is associated with k random buckets in a hash table of size n. An l-orientation is an assignment of objects to associated buckets such that each bucket receives at most l objects. Several works have determined load thresholds c^* = c^*(k,l) for k-ary cuckoo hashing; that is, for c < c^* an l-orientation exists with high probability, and for c > c^* no l-orientation exists with high probability.
A natural variant of k-ary cuckoo hashing utilizes double hashing, where, when the buckets are numbered 0,1,...,n-1, the k choices of random buckets form an arithmetic progression modulo n. Double hashing simplifies implementation and requires less randomness, and it has been shown that double hashing has the same behavior as fully random hashing in several other data structures that similarly use multiple hashes for each object. Interestingly, previous work has come close to but has not fully shown that the load threshold for k-ary cuckoo hashing is the same when using double hashing as when using fully random hashing. Specifically, previous work has shown that the thresholds for both settings coincide, except that for double hashing it was possible that o(n) objects would have been left unplaced. Here we close this open question by showing that the thresholds are indeed the same, providing a combinatorial argument that reconciles this stubborn difference.
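A small Python sketch of the double-hashing choice rule follows: the k candidate buckets form an arithmetic progression modulo n. The random-walk insertion routine is a generic cuckoo-style heuristic for l = 1 (one object per bucket), added only for illustration; it is not the combinatorial argument used in the paper.

import random

def cuckoo_choices(n: int, k: int) -> list[int]:
    """k buckets in arithmetic progression mod n, as in double hashing."""
    start = random.randrange(n)
    step = random.randrange(1, n)        # nonzero step
    return [(start + i * step) % n for i in range(k)]

def insert(table: list, key, choices: dict, n: int, k: int, max_kicks: int = 500) -> bool:
    """Random-walk cuckoo insertion; `choices` caches each key's k buckets."""
    choices.setdefault(key, cuckoo_choices(n, k))
    for _ in range(max_kicks):
        for b in choices[key]:
            if table[b] is None:
                table[b] = key
                return True
        b = random.choice(choices[key])  # evict a random occupant and retry with it
        table[b], key = key, table[b]
    return False                         # give up: the table is likely too loaded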
Fast and Powerful Hashing using Tabulation
Randomized algorithms are often enjoyed for their simplicity, but the hash
functions employed to yield the desired probabilistic guarantees are often too
complicated to be practical. Here we survey recent results on how simple
hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as
consisting of c characters and we have precomputed character tables
h_1, ..., h_c mapping characters to random hash values. A key x = (x_1, ..., x_c)
is hashed to h_1[x_1] ⊕ h_2[x_2] ⊕ ... ⊕ h_c[x_c]. This scheme is
very fast with character tables in cache. While simple tabulation is not even
4-independent, it does provide many of the guarantees that are normally
obtained via higher independence, e.g., linear probing and Cuckoo hashing.
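The scheme described above can be sketched in a few lines of Python; the key width (four 8-bit characters) and the 64-bit output are illustrative choices, not parameters from the survey.

import random

C = 4                                            # number of 8-bit characters per key
TABLES = [[random.getrandbits(64) for _ in range(256)] for _ in range(C)]

def simple_tabulation(x: int) -> int:
    """Hash a C-byte key by XORing one random table entry per character."""
    h = 0
    for i in range(C):
        char = (x >> (8 * i)) & 0xFF             # extract the i-th byte
        h ^= TABLES[i][char]
    return h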
Next we consider twisted tabulation where one input character is "twisted" in
a simple way. The resulting hash function has powerful distributional
properties: Chernoff-Hoeffding type tail bounds and a very small bias for
min-wise hashing. This also yields an extremely fast pseudo-random number
generator that is provably good for many classic randomized algorithms and
data-structures.
Finally, we consider double tabulation where we compose two simple tabulation
functions, applying one to the output of the other, and show that this yields
very high independence in the classic framework of Carter and Wegman [1977]. In
fact, w.h.p., for a given set of size proportional to that of the space
consumed, double tabulation gives fully-random hashing. We also mention some
more elaborate tabulation schemes getting near-optimal independence for given
time and space.
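Following the description above, a sketch of double tabulation simply feeds the output of one simple tabulation function, viewed as a string of derived characters, into a second one; the character widths and table sizes below are arbitrary illustrative parameters, not those from the analysis.

import random

CHAR_BITS = 8
C_IN, C_MID = 4, 6      # input characters and derived (intermediate) characters

def make_tables(chars: int, out_bits: int):
    return [[random.getrandbits(out_bits) for _ in range(1 << CHAR_BITS)]
            for _ in range(chars)]

# The first function outputs C_MID * CHAR_BITS bits, viewed as C_MID new characters.
T1 = make_tables(C_IN, C_MID * CHAR_BITS)
T2 = make_tables(C_MID, 64)

def simple_tab(x: int, tables) -> int:
    h = 0
    for i, table in enumerate(tables):
        h ^= table[(x >> (CHAR_BITS * i)) & ((1 << CHAR_BITS) - 1)]
    return h

def double_tab(x: int) -> int:
    """Compose two simple tabulation functions: the second hashes the first's output."""
    return simple_tab(simple_tab(x, T1), T2)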
While these tabulation schemes are all easy to implement and use, their
analysis is not.
Load thresholds for cuckoo hashing with overlapping blocks
Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo
hashing where each of cn objects is assigned k = 2 intervals of size ℓ
in a linear (or cyclic) hash table of size n and both start points are chosen
independently and uniformly at random. Each object must be placed into a table
cell within its intervals, but each cell can only hold one object. Experiments
suggested that this scheme outperforms the variant with blocks in which
intervals are aligned at multiples of ℓ. In particular, the load threshold
is higher, i.e. the load c that can be achieved with high probability. For
instance, Lehman and Panigrahy [LP09] empirically observed the threshold for
ℓ = 2 to be around 96.5% as compared to roughly 89.7% using blocks.
They managed to pin down the asymptotics of the thresholds for large ℓ,
but the precise values resisted rigorous analysis.
We establish a method to determine these load thresholds for all ℓ ≥ 2, and, in fact, for general k ≥ 2. For instance, for k = ℓ = 2 we
get c^* ≈ 0.965. The key tool we employ is an insightful and general
theorem due to Leconte, Lelarge, and Massouli\'e [LLM13], which adapts methods
from statistical physics to the world of hypergraph orientability. In effect,
the orientability thresholds for our graph families are determined by belief
propagation equations for certain graph limits. As a side note we provide
experimental evidence suggesting that placements can be constructed in linear
time with loads close to the threshold using an adapted version of an algorithm
by Khosla [Kho13].
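To make the contrast between aligned blocks and overlapping windows concrete, here is a small Python sketch of the two choice structures; the names and parameters are illustrative, and the placement and threshold questions themselves are not addressed by this snippet.

import random

def aligned_block(n: int, ell: int) -> list[int]:
    """One block of ell cells aligned at a multiple of ell (assumes ell divides n)."""
    start = ell * random.randrange(n // ell)
    return [start + i for i in range(ell)]

def overlapping_window(n: int, ell: int) -> list[int]:
    """One window of ell consecutive cells starting at a uniformly random position."""
    start = random.randrange(n)
    return [(start + i) % n for i in range(ell)]

def candidate_cells(n: int, ell: int, aligned: bool = False) -> list[int]:
    """The k = 2 windows of an object; it must be placed in one free cell among them."""
    pick = aligned_block if aligned else overlapping_window
    return pick(n, ell) + pick(n, ell)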
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms. Comment: preliminary report
Power of Choices with Simple Tabulation
Suppose that we are to place m balls into n bins sequentially using the
d-choice paradigm: For each ball we are given a choice of d bins, according
to d hash functions h_1, ..., h_d, and we place the ball in the least loaded
of these bins, breaking ties arbitrarily. Our interest is in the number of balls
in the fullest bin after all m balls have been placed.
Azar et al. [STOC'94] proved that when m = O(n) and when the hash functions
are fully random the maximum load is at most lg lg n / lg d + O(1)
whp (i.e. with probability 1 - O(n^{-γ}) for any choice of γ).
In this paper we suppose that the h_1, ..., h_d are simple tabulation hash
functions. Generalising a result by Dahlgaard et al. [SODA'16] we show that for
an arbitrary constant d ≥ 2 the maximum load is O(lg lg n) whp, and
that the expected maximum load is at most lg lg n / lg d + O(1). We
further show that by using a simple tie-breaking algorithm introduced by
V\"ocking [J.ACM'03] the expected maximum load drops to lg lg n / (d lg φ_d) + O(1), where φ_d is the rate of growth of the d-ary
Fibonacci numbers. Both of these expected bounds match those of the fully
random setting.
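As a rough sketch of the tie-breaking scheme mentioned above, the following Python snippet implements an "always go left" style d-choice placement: the bins are split into d groups, each ball draws one candidate bin per group, and ties among least-loaded candidates go to the leftmost group. Uniformly random candidates stand in for the simple tabulation hash functions analysed in the paper, and the parameters are arbitrary.

import random

def go_left_max_load(n: int, m: int, d: int) -> int:
    group_size = n // d                       # assumes d divides n
    loads = [0] * n
    for _ in range(m):
        # one candidate bin from each of the d groups, listed left to right
        candidates = [g * group_size + random.randrange(group_size) for g in range(d)]
        # min() returns the first minimiser, so ties go to the leftmost group
        best = min(candidates, key=lambda b: loads[b])
        loads[best] += 1
    return max(loads)

print(go_left_max_load(n=90_000, m=90_000, d=3))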
The analysis by Dahlgaard et al. relies on a proof by P\u{a}tra\c{s}cu and
Thorup [J.ACM'11] concerning the use of simple tabulation for cuckoo hashing.
We need here a generalisation to d hash functions, but the original proof
is an 8-page tour de force of ad-hoc arguments that do not appear to
generalise. Our main technical contribution is a shorter, simpler and more
accessible proof of the result by P\u{a}tra\c{s}cu and Thorup, where the
relevant parts generalise nicely to the analysis of d choices. Comment: Accepted at ICALP 201