    On the Insertion Time of Cuckoo Hashing

    Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. In this scheme, each item can be placed in a location given by any one of k different hash functions. In this paper we further investigate the random-walk heuristic for inserting new items into the hash table in an online fashion. Provided that k > 2 and that the number of items in the table is below (but arbitrarily close to) the theoretically achievable load threshold, we show a polylogarithmic bound on the maximum insertion time that holds with high probability. Comment: 27 pages, final version accepted by the SIAM Journal on Computing.
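
    The random-walk insertion heuristic analyzed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the k hash functions are simulated as fully random by drawing each key's k candidate slots once from a seeded generator, and the class name `RandomWalkCuckoo`, the `max_steps` cap, and the rule of not re-evicting the slot just vacated are hypothetical details added for the example.

```python
import random


class RandomWalkCuckoo:
    """k-ary cuckoo hash table with random-walk insertion (illustrative sketch).

    The k hash functions are simulated as fully random: each key's k
    candidate slots are drawn once from a seeded RNG and memoized.
    """

    def __init__(self, num_slots, k=3, max_steps=10_000, seed=0):
        self.slots = [None] * num_slots
        self.k = k
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self._choices = {}  # key -> tuple of its k candidate slot indices

    def candidates(self, key):
        if key not in self._choices:
            n = len(self.slots)
            self._choices[key] = tuple(self.rng.randrange(n) for _ in range(self.k))
        return self._choices[key]

    def lookup(self, key):
        # At most k probes, so lookups take constant worst-case time.
        return any(self.slots[i] == key for i in self.candidates(key))

    def insert(self, key):
        """Insert key; return the number of evictions it took."""
        if self.lookup(key):
            return 0
        came_from = None  # slot the key in hand was just evicted from
        for step in range(self.max_steps):
            cands = self.candidates(key)
            for i in cands:
                if self.slots[i] is None:
                    self.slots[i] = key
                    return step
            # All candidates are occupied: evict the occupant of a randomly
            # chosen candidate slot (avoiding came_from when possible) and
            # continue the walk with the evicted key.
            options = [i for i in cands if i != came_from] or list(cands)
            i = self.rng.choice(options)
            key, self.slots[i] = self.slots[i], key
            came_from = i
        # The key in hand is now homeless; a real table would rehash.
        raise RuntimeError("insertion exceeded max_steps")
```

    For k = 3 the load threshold is about 0.918, so filling 75% of a 1,000-slot table (e.g. `t = RandomWalkCuckoo(1000, k=3, seed=1)` followed by 750 calls to `t.insert`) should normally finish with short eviction chains; the result above bounds the longest chain polylogarithmically with high probability.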

    A further analysis of Cuckoo Hashing with a Stash and Random Graphs of Excess r

    Analysis of Algorithms

    A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing

    Hashing is a solved problem. It allows us to get constant-time access for lookups. Hashing is also simple. It is safe to use an arbitrary method as a black box and expect good performance, and optimizations to hashing can only improve it by a negligible delta. Why are all of the previous statements plain wrong? That is what this paper is about. In this paper we thoroughly study hashing for integer keys and carefully analyze the most common hashing methods in a five-dimensional requirements space: (1) data distribution, (2) load factor, (3) dataset size, (4) read/write ratio, and (5) un/successful ratio. Each point in that design space may potentially suggest a different hashing scheme, and additionally also a different hash function. We show that a right or wrong decision in picking the hashing scheme and hash function combination may lead to significant differences in performance. To substantiate this claim, we carefully analyze two additional dimensions: (6) five representative hashing schemes (including an improved variant of Robin Hood hashing) and (7) four important classes of hash functions widely used today. That is, we consider 20 different combinations in total. We also provide a glimpse of the effect of table memory layout and the use of SIMD instructions. Our study clearly indicates that picking the right combination may have considerable impact on insert and lookup performance, as well as memory footprint. A major conclusion of our work is that hashing should be considered a white box before blindly using it in applications such as query processing. Finally, we provide strong guidelines on when to use which hashing method.
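
    The abstract names Robin Hood hashing but does not describe it. For orientation only, the textbook Robin Hood rule (linear probing in which an incoming entry that is farther from its home slot than the resident takes over that slot) can be sketched as below. This is not the paper's improved variant; the class name `RobinHoodTable` and the use of Python's built-in `hash` as the hash function are assumptions for illustration.

```python
_MISSING = object()  # sentinel distinguishing "absent" from a stored None


class RobinHoodTable:
    """Linear probing with the textbook Robin Hood rule (illustrative sketch).

    Each occupied slot holds [key, value, dist], where dist is the probe
    distance from the key's home slot.
    """

    def __init__(self, capacity=16):
        self.slots = [None] * capacity
        self.size = 0

    def _home(self, key):
        # Python's built-in hash stands in for a real hash function here.
        return hash(key) % len(self.slots)

    def get(self, key, default=None):
        i = self._home(key)
        for dist in range(len(self.slots)):
            slot = self.slots[i]
            # Early exit: an empty slot, or a resident closer to its home
            # than our current probe distance, means the key is absent.
            if slot is None or slot[2] < dist:
                return default
            if slot[0] == key:
                return slot[1]
            i = (i + 1) % len(self.slots)
        return default

    def insert(self, key, value):
        if self.size == len(self.slots) and self.get(key, _MISSING) is _MISSING:
            raise RuntimeError("table full; a real table would grow and rehash")
        entry = [key, value, 0]
        i = self._home(key)
        while True:
            slot = self.slots[i]
            if slot is None:
                self.slots[i] = entry
                self.size += 1
                return
            if slot[0] == entry[0]:
                slot[1] = entry[1]  # existing key: update in place
                return
            if slot[2] < entry[2]:
                # The incoming entry is farther from home, so it takes this
                # slot and the displaced resident keeps probing instead.
                self.slots[i], entry = entry, slot
            i = (i + 1) % len(self.slots)
            entry[2] += 1
```

    Balancing probe distances this way is what allows the early exit in `get`: once the search meets a resident closer to its home than the current probe distance, the key cannot lie further along the sequence.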

    On Risks of Using a High Performance Hashing Scheme with Common Universal Classes

    The contribution of this thesis is a mathematical analysis of a high-performance hashing scheme called cuckoo hashing when combined with two very simple and efficient classes of functions that we refer to as the multiplicative class and the linear class, respectively. We prove that cuckoo hashing tends to work badly with these classes. In order to show this, we investigate how the inner structure of such functions influences the behavior of the cuckoo scheme when a set $S$ of keys is inserted into initially empty tables. Cuckoo hashing uses two tables of size $m$ each. It is known that the insertion of an arbitrary set $S$ of size $n = (1 - \delta)m$ for an arbitrary constant $\delta \in (0,1)$ (which yields a load factor $n/(2m)$ of up to $1/2$) fails with probability $O(1/n)$ if the hash functions are chosen from an $\Omega(\log n)$-wise independent class. This leads to the result of expected amortized constant time for a single insertion. In contrast to this, we prove lower bounds of the following kind: if $S$ is a uniformly randomly chosen set of size $n = m/2$ (leading to a load factor of only $1/4$ (!)), then the insertion of $S$ fails with probability $\Omega(1)$, or even with probability $1 - o(1)$, if the hash functions are chosen from either the multiplicative or the linear class. This answers an open question that was already raised by the inventors of cuckoo hashing, Pagh and Rodler, who observed in experiments that cuckoo hashing exhibits bad behavior when combined with the multiplicative class. Our results implicitly show that pairwise independence, even together with uniformly distributed hash values, is not sufficient for a hash class to work well with cuckoo hashing.
Moreover, our work exemplifies that a recent result of Mitzenmacher and Vadhan, who prove that under certain constraints simple universal functions yield values that are highly independent and very close to uniformly random, has to be applied with care: it may not hold if the constraints are not satisfied.
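
    The abstract does not spell out the two hash classes. The sketch below assumes the multiplicative class is the common multiply-shift family $h_a(x) = ((a \cdot x) \bmod 2^w) \operatorname{div} 2^{w-l}$ with a random odd $a$, mapping $w$-bit keys into tables of size $m = 2^l$, and pairs it with the standard two-table cuckoo insertion in which an evicted key moves to its slot in the other table. The names `make_multiply_shift`, `CuckooTwoTables`, and `max_loop` are hypothetical; a failed insertion returns `False` where a real implementation would choose new functions and rehash.

```python
import random


def make_multiply_shift(w, l, rng):
    """Draw h(x) = ((a * x) mod 2**w) >> (w - l) with a random odd a.

    Maps w-bit non-negative integer keys to l-bit indices (table size 2**l).
    """
    a = rng.randrange(1, 2 ** w, 2)  # random odd multiplier in [1, 2**w)
    mask = (1 << w) - 1
    return lambda x: ((a * x) & mask) >> (w - l)


class CuckooTwoTables:
    """Two-table cuckoo hashing with multiply-shift functions (sketch)."""

    def __init__(self, l, w=32, max_loop=100, seed=0):
        rng = random.Random(seed)
        self.t1 = [None] * (1 << l)  # None marks an empty cell
        self.t2 = [None] * (1 << l)
        self.h1 = make_multiply_shift(w, l, rng)
        self.h2 = make_multiply_shift(w, l, rng)
        self.max_loop = max_loop

    def lookup(self, x):
        # Exactly two probes, one per table.
        return self.t1[self.h1(x)] == x or self.t2[self.h2(x)] == x

    def insert(self, x):
        """Return True on success; False means a rehash would be needed."""
        if self.lookup(x):
            return True
        for _ in range(self.max_loop):
            i = self.h1(x)
            x, self.t1[i] = self.t1[i], x  # put x in T1, pick up the occupant
            if x is None:
                return True
            j = self.h2(x)
            x, self.t2[j] = self.t2[j], x  # evicted key moves to T2
            if x is None:
                return True
        return False  # a nestless key remains; the caller must rehash
```

    Per the result above, inserting a uniformly random set of $m/2$ keys with such functions fails with probability $\Omega(1)$, so counting how often `insert` returns `False` across many seeds could be one way to explore the effect experimentally.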

    DORAM revisited: Maliciously secure RAM-MPC with logarithmic overhead

    Distributed Oblivious Random Access Memory (DORAM) is a secure multiparty protocol that allows a group of participants holding a secret-shared array to read and write to secret-shared locations within the array. The efficiency of a DORAM protocol is measured by the amount of communication and computation required per read/write query into the array. DORAM protocols are a necessary ingredient for executing Secure Multiparty Computation (MPC) in the RAM model. Although DORAM has been widely studied, all existing DORAM protocols have focused on the setting where the DORAM servers are semi-honest. Generic techniques for upgrading a semi-honest DORAM protocol to the malicious model typically increase the asymptotic communication complexity of the DORAM scheme. In this work, we present a 3-party DORAM protocol which requires $O((\kappa + D)\log N)$ communication and computation per query, for a database of size $N$ with $D$-bit values, where $\kappa$ is the security parameter. The constants hidden in our big-O notation are small. We show that our protocol is UC-secure in the presence of a malicious, static adversary. This matches the communication and computation complexity of the best semi-honest DORAM protocol, and is the first malicious DORAM protocol with this complexity.