Search CORE

69 research outputs found

Tight Thresholds for Cuckoo Hashing via XORSAT

Author: Dietzfelbinger Martin
Goerdt Andreas
Mitzenmacher Michael
Montanari Andrea
Pagh Rasmus
Rink Michael
Publication venue
Publication date: 01/01/2010
Field of study

We settle the question of tight thresholds for offline cuckoo hashing. The problem can be stated as follows: we have n keys to be hashed into m buckets each capable of holding a single key. Each key has k >= 3 (distinct) associated buckets chosen uniformly at random and independently of the choices of other keys. A hash table can be constructed successfully if each key can be placed into one of its buckets. We seek thresholds alpha_k such that, as n goes to infinity, if n/m <= alpha for some alpha < alpha_k then a hash table can be constructed successfully with high probability, and if n/m >= alpha for some alpha > alpha_k a hash table cannot be constructed successfully with high probability. Here we are considering the offline version of the problem, where all keys and hash values are given, so the problem is equivalent to previous models of multiple-choice hashing. We find the thresholds for all values of k > 2 by showing that they are in fact the same as the previously known thresholds for the random k-XORSAT problem. We then extend these results to the setting where keys can have differing number of choices, and provide evidence in the form of an algorithm for a conjecture extending this result to cuckoo hash tables that store multiple keys in a bucket.Comment: Revision 3 contains missing details of proofs, as appendix

arXiv.org e-Print Archive

CiteSeerX

The IT University of Copenhagen's Repository

The Multiple-orientability Thresholds for Random Hypergraphs

Author: Fountoulakis Nikolaos
Khosla Megha
Panagiotou Konstantinos
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 26/09/2013
Field of study

k

-uniform hypergraph

H = (V, E)

is called

\ell

-orientable, if there is an assignment of each edge

e\in E

to one of its vertices

v\in e

such that no vertex is assigned more than

\ell

edges. Let

H_{n,m,k}

be a hypergraph, drawn uniformly at random from the set of all

k

-uniform hypergraphs with

n

vertices and

m

edges. In this paper we establish the threshold for the

\ell

-orientability of

H_{n,m,k}

for all

k\ge 3

and

\ell \ge 2

, i.e., we determine a critical quantity

c_{k, \ell}^*

such that with probability

1-o(1)

the graph

H_{n,cn,k}

has an

\ell

-orientation if

c c_{k, \ell}^*

. Our result has various applications including sharp load thresholds for cuckoo hashing, load balancing with guaranteed maximum load, and massive parallel access to hard disk arrays.Comment: An extended abstract appeared in the proceedings of SODA 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

On the insertion time of random walk cuckoo hashing

Author: Frieze Alan
Johansson Tony
Publication venue
Publication date: 08/01/2017
Field of study

Cuckoo Hashing is a hashing scheme invented by Pagh and Rodler. It uses

d\geq 2

distinct hash functions to insert items into the hash table. It has been an open question for some time as to the expected time for Random Walk Insertion to add items. We show that if the number of hash functions

d=O(1)

is sufficiently large, then the expected insertion time is

O(1)

per item.Comment: 9 page

arXiv.org e-Print Archive

Crossref

On the Insertion Time of Cuckoo Hashing

Author: Fountoulakis Nikolaos
Panagiotou Konstantinos
Steger Angelika
Publication venue
Publication date: 01/01/2013
Field of study

Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. There, each item can be placed in a location given by any one out of k different hash functions. In this paper we investigate further the random walk heuristic for inserting in an online fashion new items into the hash table. Provided that k > 2 and that the number of items in the table is below (but arbitrarily close) to the theoretically achievable load threshold, we show a polylogarithmic bound for the maximum insertion time that holds with high probability.Comment: 27 pages, final version accepted by the SIAM Journal on Computin

arXiv.org e-Print Archive

University of Birmingham Research Portal

MPG.PuRe

Load thresholds for cuckoo hashing with overlapping blocks

Author: Walzer Stefan
Publication venue
Publication date: 01/01/2018
Field of study

Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo hashing where each of

cn

objects is assigned

k = 2

intervals of size

\ell

in a linear (or cyclic) hash table of size

n

and both start points are chosen independently and uniformly at random. Each object must be placed into a table cell within its intervals, but each cell can only hold one object. Experiments suggested that this scheme outperforms the variant with blocks in which intervals are aligned at multiples of

\ell

. In particular, the load threshold is higher, i.e. the load

c

that can be achieved with high probability. For instance, Lehman and Panigrahy [LP09] empirically observed the threshold for

\ell = 2

to be around

96.5\%

as compared to roughly

89.7\%

using blocks. They managed to pin down the asymptotics of the thresholds for large

\ell

, but the precise values resisted rigorous analysis. We establish a method to determine these load thresholds for all

\ell \geq 2

, and, in fact, for general

k \geq 2

. For instance, for

k = \ell = 2

we get

\approx 96.4995\%

. The key tool we employ is an insightful and general theorem due to Leconte, Lelarge, and Massouli\'e [LLM13], which adapts methods from statistical physics to the world of hypergraph orientability. In effect, the orientability thresholds for our graph families are determined by belief propagation equations for certain graph limits. As a side note we provide experimental evidence suggesting that placements can be constructed in linear time with loads close to the threshold using an adapted version of an algorithm by Khosla [Kho13]

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Digitale Bibliothek Thüringen

Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold

Author: Walzer Stefan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022
Field of study

Dagstuhl Research Online Publication Server

Random hypergraphs for hashing-based data structures

Author: Walzer Stefan
Publication venue
Publication date: 01/01/2020
Field of study

This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1,…,n}. Each object x [ELEMENT OF] S is assigned a small set e(x) [SUBSET OF OR EQUAL TO] [n] of locations by a random hash function, independent of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c less than a certain load threshold c^* and close to 0 for loads greater than this load threshold. We mainly consider two types of data structures: • A cuckoo hash table is a dictionary data structure where each key x [ELEMENT OF] S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes, and determine their exact load thresholds. The schemes are unaligned blocks, double hashing and a scheme for dynamically growing key sets. • A retrieval data structure also stores a value f(x) for each x [ELEMENT OF] S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual. It cannot answer questions of the form “is y [ELEMENT OF] S?”. Given a key y it returns a value z. If y [ELEMENT OF] S, then z = f(y) is guaranteed, otherwise z may be an arbitrary value. We consider two new hashing schemes, where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency. An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x [ELEMENT OF] S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Therefore, large parts of this thesis examine how hyperedge distribution and load affects the probabilities with which these properties hold and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results by experiments.Diese Arbeit behandelt Wörterbücher und verwandte Datenstrukturen, die darauf aufbauen, mehrere zufällige Möglichkeiten zur Speicherung jedes Schlüssels vorzusehen. Man stelle sich vor, Information über eine Menge S von m = |S| Schlüsseln soll in n Speicherplätzen abgelegt werden, die durch [n] = {1,…,n} indiziert sind. Jeder Schlüssel x [ELEMENT OF] S bekommt eine kleine Menge e(x) [SUBSET OF OR EQUAL TO] [n] von Speicherplätzen durch eine zufällige Hashfunktion unabhängig von anderen Schlüsseln zugewiesen. Die Information über x darf nun ausschließlich in den Plätzen aus e(x) untergebracht werden. Es kann hierbei passieren, dass zu viele Schlüssel um dieselben Speicherplätze konkurrieren, insbesondere bei hoher Auslastung c = m/n. Eine erfolgreiche Speicherung der Gesamtinformation ist dann eventuell unmöglich. Für die meisten Verteilungen von e(x) lässt sich Erfolg oder Misserfolg allerdings sehr zuverlässig vorhersagen, da für Auslastung c unterhalb eines gewissen Auslastungsschwellwertes c* die Erfolgswahrscheinlichkeit nahezu 1 ist und für c jenseits dieses Auslastungsschwellwertes nahezu 0 ist. Hauptsächlich werden wir zwei Arten von Datenstrukturen betrachten: • Eine Kuckucks-Hashtabelle ist eine Wörterbuchdatenstruktur, bei der jeder Schlüssel x [ELEMENT OF] S zusammen mit einem assoziierten Wert f(x) in einem der Speicherplätze mit Index aus e(x) gespeichert wird. Die Verteilung von e(x) wird hierbei vom Hashing-Schema festgelegt. Wir analysieren drei bekannte Hashing-Schemata und bestimmen erstmals deren exakte Auslastungsschwellwerte im obigen Sinne. Die Schemata sind unausgerichtete Blöcke, Doppel-Hashing sowie ein Schema für dynamisch wachsenden Schlüsselmengen. • Auch eine Retrieval-Datenstruktur speichert einen Wert f(x) für alle x [ELEMENT OF] S. Diesmal sollen die Werte in den Speicherplätzen aus e(x) eine lineare Gleichung erfüllen, die den Wert f(x) charakterisiert. Die entstehende Datenstruktur ist extrem platzsparend, aber ungewöhnlich: Sie ist ungeeignet um Fragen der Form „ist y [ELEMENT OF] S?“ zu beantworten. Bei Anfrage eines Schlüssels y wird ein Ergebnis z zurückgegeben. Falls y [ELEMENT OF] S ist, so ist z = f(y) garantiert, andernfalls darf z ein beliebiger Wert sein. Wir betrachten zwei neue Hashing-Schemata, bei denen die Elemente von e(x) in einem oder in zwei zusammenhängenden Blöcken liegen. So werden gute Zugriffszeiten auf Word-RAMs und eine hohe Cache-Effizienz erzielt. Eine wichtige Frage ist, ob Datenstrukturen obiger Art in Linearzeit konstruiert werden können. Die Erfolgswahrscheinlichkeit eines naheliegenden Greedy-Algorithmus weist abermals ein Schwellwertverhalten in Bezug auf die Auslastung c auf. Wir identifizieren ein Hashing-Schema, das diesbezüglich einen besonders hohen Schwellwert mit sich bringt. In der mathematischen Modellierung werden die Speicherpositionen [n] als Knoten und die Mengen e(x) für x [ELEMENT OF] S als Hyperkanten aufgefasst. Drei Eigenschaften der entstehenden Hypergraphen stellen sich dann als zentral heraus: Schälbarkeit, Lösbarkeit und Orientierbarkeit. Weite Teile dieser Arbeit beschäftigen sich daher mit den Wahrscheinlichkeiten für das Vorliegen dieser Eigenschaften abhängig von Hashing Schema und Auslastung, sowie mit entsprechenden Schwellwerten. Eine Rückübersetzung der Ergebnisse liefert dann Datenstrukturen mit geringen Anfragezeiten, hoher Speichereffizienz und geringen Konstruktionszeiten. Die theoretischen Überlegungen werden dabei durch experimentelle Ergebnisse ergänzt und gestützt

Digitale Bibliothek Thüringen