On the Insertion Time of Cuckoo Hashing
Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. There, each item can be placed in a location given by any one out of k different hash functions. In this paper we investigate further the random walk heuristic for inserting new items into the hash table in an online fashion. Provided that k > 2 and that the number of items in the table is below (but arbitrarily close to) the theoretically achievable load threshold, we show a polylogarithmic bound for the maximum insertion time that holds with high probability.
Comment: 27 pages, final version accepted by the SIAM Journal on Computing
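The random walk insertion heuristic mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's analyzed construction: the class name `RandomWalkCuckoo`, the `max_steps` cutoff, and the use of salted built-in `hash()` calls as stand-ins for k independent hash functions are all assumptions made for illustration.

```python
import random


class RandomWalkCuckoo:
    """Toy cuckoo hash table with k candidate slots per key (illustrative only)."""

    def __init__(self, size, k=3, max_steps=500, seed=0):
        self.size = size
        self.max_steps = max_steps  # hypothetical cutoff, not from the paper
        self.rng = random.Random(seed)
        # One random salt per hash function; salted hash() is an assumed
        # stand-in for k independent hash functions.
        self.salts = [self.rng.getrandbits(64) for _ in range(k)]
        self.slots = [None] * size

    def _candidates(self, key):
        # The k locations in which this key is allowed to live.
        return [hash((salt, key)) % self.size for salt in self.salts]

    def lookup(self, key):
        # Constant time for fixed k: only the k candidate slots are probed.
        return any(self.slots[i] == key for i in self._candidates(key))

    def insert(self, key):
        """Insert key and return the number of evictions performed."""
        if self.lookup(key):
            return 0
        current = key
        for step in range(self.max_steps):
            cands = self._candidates(current)
            for i in cands:
                if self.slots[i] is None:
                    self.slots[i] = current
                    return step
            # All candidates are occupied: evict a uniformly random occupant
            # and continue the walk with the evicted key.
            victim = self.rng.choice(cands)
            current, self.slots[victim] = self.slots[victim], current
        # At this point `current` is a key that is no longer stored;
        # a real table would rehash or use a stash instead of raising.
        raise RuntimeError("random walk exceeded max_steps")
```

For example, inserting 500 integer keys into `RandomWalkCuckoo(size=1000, k=3)` keeps the load well below the threshold regime the paper studies, so the walks in this toy setting should stay short.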
A further analysis of Cuckoo Hashing with a Stash and Random Graphs of Excess r
Analysis of Algorithm
A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing
Hashing is a solved problem. It allows us to get constant time access for lookups. Hashing is also simple. It is safe to use an arbitrary method as a black box and expect good performance, and optimizations to hashing can only improve it by a negligible delta. Why are all of the previous statements plain wrong? That is what this paper is about. In this paper we thoroughly study hashing for integer keys and carefully analyze the most common hashing methods in a five-dimensional requirements space: (1) data distribution, (2) load factor, (3) dataset size, (4) read/write ratio, and (5) un/successful ratio. Each point in that design space may potentially suggest a different hashing scheme, and additionally also a different hash function. We show that a right or wrong decision in picking the hashing scheme and hash function combination may lead to a significant difference in performance. To substantiate this claim, we carefully analyze two additional dimensions: (6) five representative hashing schemes (which include an improved variant of Robin Hood hashing), and (7) four important classes of hash functions widely used today. That is, we consider 20 different combinations in total. Finally, we also provide a glimpse of the effect of table memory layout and the use of SIMD instructions. Our study clearly indicates that picking the right combination may have considerable impact on insert and lookup performance, as well as memory footprint. A major conclusion of our work is that hashing should be considered a white box before blindly using it in applications such as query processing. Finally, we also provide a strong guideline about when to use which hashing method.
On Risks of Using a High Performance Hashing Scheme with Common Universal Classes
The contribution of this thesis is a mathematical analysis of a high-performance hashing scheme called cuckoo hashing when combined with two very simple and efficient classes of functions that we refer to as the multiplicative class and the linear class, respectively. We prove that cuckoo hashing tends to work badly with these classes. In order to show this, we investigate how the inner structure of such functions influences the behavior of the cuckoo scheme when a set of keys is inserted into initially empty tables. Cuckoo hashing uses two tables of size each. It is known that the insertion of an arbitrary set of size for an arbitrary constant (which yields a load factor of up to ) fails with probability if the hash functions are chosen from an -wise independent class. This leads to the result of expected amortized constant time for a single insertion. In contrast to this we prove lower bounds of the following kind: If is a uniformly random chosen set of size (leading to a load factor of only (!)) then the insertion of fails with probability , or even with probability , if the hash functions are chosen from either the multiplicative or the linear class. This answers an open question that was already raised by the inventors of cuckoo hashing, Pagh and Rodler, who observed in experiments that cuckoo hashing exhibits bad behavior when combined with the multiplicative class. Our results implicitly show that the quality of pairwise independence is not sufficient for a hash class to work well with cuckoo hashing. Moreover, our work exemplifies that a recent result of Mitzenmacher and Vadhan, who prove that under certain constraints simple universal functions yield values that are highly independent and very close to uniform random, has to be applied with care: It may not hold if the constraints are not satisfied.
The contribution of this dissertation is the mathematical analysis of a hashing scheme called cuckoo hashing in combination with simple, efficiently evaluable functions from two hash classes, which we call the multiplicative class and the linear class. Cuckoo hashing has a clear tendency to work poorly with functions from these two classes. To prove this, we investigate the influence of the inner structure of such functions on the behavior of the cuckoo scheme when an attempt is made to insert a key set into initially empty tables. Cuckoo hashing uses two tables of size each. It is known that the insertion of an arbitrary set of size for an arbitrary constant (which yields a load factor of up to ) fails with probability if the hash functions are chosen from an -wise independent class. From this one can prove that the expected amortized cost of a single insertion is constant. In contrast, we prove lower bounds of the following kind: If is a uniformly random key set of size (load factor (!)), then the insertion of fails with high probability , or even , if functions from the multiplicative or the linear class are chosen. This answers a question that was already raised by Pagh and Rodler when they found experimentally that it is risky to combine cuckoo hashing with functions from the multiplicative class. Our results show that pairwise independence and uniform distribution of the hash values do not guarantee that cuckoo hashing works well. Moreover, our results exemplify that care is needed when applying a recently published result of Mitzenmacher and Vadhan, which states that even functions from simple, weakly universal classes can, under certain conditions, yield highly independent and almost uniformly random values: if its conditions are not met, it may not hold.
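The abstract does not spell out the two function classes. As a rough illustration, the sketch below uses the common textbook forms of a multiplicative class and a linear class; whether these match the thesis's exact definitions is an assumption, and the helper names `make_multiplicative` and `make_linear` and all parameter names are hypothetical.

```python
import random


def make_multiplicative(k, l, rng):
    """Assumed textbook form: h_a(x) = (a*x mod 2**k) >> (k - l), a odd.

    Maps k-bit integer keys to l-bit table indices.
    """
    a = rng.randrange(1, 1 << k, 2)  # random odd multiplier in [1, 2**k)
    return lambda x: ((a * x) % (1 << k)) >> (k - l)


def make_linear(p, m, rng):
    """Assumed textbook form: h_{a,b}(x) = ((a*x + b) mod p) mod m.

    p should be a prime larger than every key; m is the table size.
    """
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m


rng = random.Random(1)
h_mult = make_multiplicative(k=32, l=10, rng=rng)  # indices in [0, 1024)
h_lin = make_linear(p=2**31 - 1, m=1000, rng=rng)  # indices in [0, 1000)
```

Both functions are cheap to evaluate, which is why such classes are attractive in practice. The thesis's point is that this simplicity comes with structure that cuckoo hashing can react badly to.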
DORAM revisited: Maliciously secure RAM-MPC with logarithmic overhead
Distributed Oblivious Random Access Memory (DORAM) is a secure multiparty protocol that allows a group of participants holding a secret-shared array to read and write to secret-shared locations within the array. The efficiency of a DORAM protocol is measured by the amount of communication and computation required per read/write query into the array. DORAM protocols are a necessary ingredient for executing Secure Multiparty Computation (MPC) in the RAM model.
Although DORAM has been widely studied, all existing DORAM protocols have focused on the setting where the DORAM servers are semi-honest. Generic techniques for upgrading a semi-honest DORAM protocol to the malicious model typically increase the asymptotic communication complexity of the DORAM scheme.
In this work, we present a 3-party DORAM protocol which requires communication and computation per query, for a database of size with -bit values, where is the security parameter. The hidden constants in the big-O notation are small. We show that our protocol is UC-secure in the presence of a malicious, static adversary. This matches the communication and computation complexity of the best semi-honest DORAM protocol, and is the first malicious DORAM protocol with this complexity.