
    Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold


    SicHash -- Small Irregular Cuckoo Tables for Perfect Hashing

    A Perfect Hash Function (PHF) is a hash function that has no collisions on a given input set. PHFs can be used for space-efficient storage of data in an array, or for determining a compact representative of each object in the set. In this paper, we present the PHF construction algorithm SicHash -- Small Irregular Cuckoo Tables for Perfect Hashing. At its core, SicHash uses a known technique: it places objects in a cuckoo hash table and then stores the final hash function choice of each object in a retrieval data structure. We combine this idea with irregular cuckoo hashing, where each object has a different number of hash functions. Additionally, we use many small tables that we overload beyond their asymptotic maximum load factor. The most space-efficient competitors often use brute-force methods to determine the PHFs. SicHash provides a more direct construction algorithm that only rarely needs to recompute parts. Our implementation improves the state of the art in terms of space usage versus construction time for a wide range of configurations. At the same time, it provides very fast queries.
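
A minimal sketch of the generic pattern this line of work builds on (illustrative only, not the SicHash algorithm: it uses one table at a conservative load, a fixed number of hash functions per key, and a plain Python dict in place of the compact retrieval data structure; all names and parameters are made up):

```python
import hashlib
import random

def h(i, key, n):
    """i-th hash function: maps `key` into [0, n) via seeded SHA-256 (a stand-in for a fast hash)."""
    d = hashlib.sha256(f"{i}|{key}".encode()).digest()
    return int.from_bytes(d[:8], "big") % n

def build_phf(keys, k=3, load=0.85, max_kicks=1000):
    """Place every key into a cuckoo table with k hash-function choices and remember
    which choice won; h(choice[x], x, n) is then collision-free on the input set."""
    n = max(1, int(len(keys) / load))
    table = [None] * n   # slot occupants, only needed during construction
    choice = {}          # key -> index of the hash function that finally placed it
    for key in keys:
        cur, kicks = key, 0
        while True:
            slots = [h(i, cur, n) for i in range(k)]
            free = [i for i, s in enumerate(slots) if table[s] is None]
            i = free[0] if free else random.randrange(k)   # random-walk eviction if all full
            victim, table[slots[i]], choice[cur] = table[slots[i]], cur, i
            if victim is None:
                break
            cur, kicks = victim, kicks + 1
            if kicks > max_kicks:
                raise RuntimeError("placement failed; rebuild with different hash functions")
    return n, choice

keys = [f"key{i}" for i in range(1000)]
n, choice = build_phf(keys)
phf = lambda x: h(choice[x], x, n)                 # the perfect hash function
assert len({phf(x) for x in keys}) == len(keys)    # injective on the input set
```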

    Double Hashing Thresholds via Local Weak Convergence

    A lot of interest has recently arisen in the analysis of multiple-choice "cuckoo hashing" schemes. In this context, a main performance criterion is the load threshold under which the hashing scheme is able to build a valid hashtable with high probability in the limit of large systems; various techniques have successfully been used to answer this question (differential equations, combinatorics, cavity method) for increasing levels of generality of the model. However, the hashing scheme analysed so far is quite utopian in that it requires generating many independent, fully random choices. Schemes with reduced randomness exist, such as "double hashing", which is expected to provide asymptotic results similar to those of the ideal scheme, yet they have been more resistant to analysis so far. In this paper, we point out that the approach via the cavity method extends quite naturally to the analysis of double hashing and allows us to compute the corresponding threshold. The path followed is to show that the graph induced by the double hashing scheme has the same local weak limit as the one obtained with full randomness.
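
For concreteness, a minimal sketch of double hashing in its standard form (names and the use of SHA-256 are illustrative): the k choices of a key are derived from only two hash values as an arithmetic progression modulo the table size, instead of k independent fully random values.

```python
import hashlib

def double_hashing_choices(key, k, n):
    """The i-th choice is (h1 + i*h2) mod n, for two hash values h1, h2 of `key`.
    If n is prime and 1 <= h2 < n, the k <= n choices are pairwise distinct."""
    d = hashlib.sha256(str(key).encode()).digest()
    h1 = int.from_bytes(d[:8], "big") % n
    h2 = int.from_bytes(d[8:16], "big") % (n - 1) + 1
    return [(h1 + i * h2) % n for i in range(k)]

print(double_hashing_choices("alice", k=4, n=101))   # four buckets in [0, 101)
```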

    PIR with compressed queries and amortized query processing

    Private information retrieval (PIR) is a key building block in many privacy-preserving systems. Unfortunately, existing constructions remain very expensive. This paper introduces two techniques that make the computational variant of PIR (CPIR) more efficient in practice. The first technique targets a recent class of CPU-efficient CPIR protocols where the query sent by the client contains a number of ciphertexts proportional to the size of the database. We show how to compress this query, achieving size reductions of up to 274×. The second technique is a new data encoding called probabilistic batch codes (PBCs). We use PBCs to build a multi-query PIR scheme that allows the server to amortize its computational cost when processing a batch of requests from the same client. This technique achieves up to 40× speedup over processing queries one at a time, and is significantly more efficient than related encodings. We apply our techniques to the Pung private communication system, which relies on a custom multi-query CPIR protocol for its privacy guarantees. By porting our techniques to Pung, we find that we can simultaneously reduce network costs by 36× and increase throughput by 3×.
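
A rough sketch of the batching idea behind such multi-query schemes (the bucket-expansion factor, number of hash functions, and names below are illustrative, not the paper's exact construction): the client hashes its batch of desired indices into buckets cuckoo-style so that each bucket receives at most one index, then issues one small PIR query per bucket, which is what allows the server to amortize its work over the batch.

```python
import hashlib, random

def bucket_of(idx, seed, num_buckets):
    d = hashlib.sha256(f"{seed}|{idx}".encode()).digest()
    return int.from_bytes(d[:8], "big") % num_buckets

def assign_batch(indices, k=3, expansion=1.5, max_kicks=500):
    """Cuckoo-hash a batch of desired database indices into buckets, at most one
    index per bucket; the client then sends one PIR query per bucket."""
    num_buckets = max(1, int(expansion * len(indices)))
    buckets = [None] * num_buckets
    for idx in indices:
        cur, kicks = idx, 0
        while True:
            cands = [bucket_of(cur, s, num_buckets) for s in range(k)]
            free = [b for b in cands if buckets[b] is None]
            if free:
                buckets[free[0]] = cur
                break
            b = random.choice(cands)                        # evict and retry the victim
            cur, buckets[b], kicks = buckets[b], cur, kicks + 1
            if kicks > max_kicks:
                raise RuntimeError("batch assignment failed; retry with new hash functions")
    return buckets

print(assign_batch([3, 17, 42, 256, 1024]))   # e.g. 7 buckets, 5 of them occupied
```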

    Cuckoo Commitments: Registration-Based Encryption and Key-Value Map Commitments for Large Spaces

    Registration-Based Encryption (RBE) [Garg et al. TCC'18] is a public-key encryption mechanism in which users generate their own public and secret keys, and register their public keys with a central authority called the key curator. Similarly to Identity-Based Encryption (IBE), in RBE users can encrypt by only knowing the public parameters and the public identity of the recipient. Unlike IBE, though, RBE does not suffer from the key escrow problem — one of the main obstacles to IBE's adoption in practice — since the key curator holds no secret. In this work, we put forward a new methodology to construct RBE schemes that support large user identities (i.e., arbitrary strings). Our main result is the first efficient pairing-based RBE for large identities. Prior to our work, the most efficient RBE was that of [Glaeser et al. ePrint'22], which only supports small identities. The only known RBE schemes with large identities are realized either through expensive non-black-box techniques (ciphertexts of 3.6 TB for 1000 users), or via a specialized lattice-based construction [Döttling et al. Eurocrypt'23] (ciphertexts of 2.4 GB), or through the more complex notion of Registered Attribute-Based Encryption [Hohenberger et al. Eurocrypt'23]. By unlocking the use of pairings for RBE with large identity space, we enable a further improvement of three orders of magnitude, as our ciphertexts for a system with 1000 users are 1.7 MB. The core technique of our approach is a novel use of cuckoo hashing in cryptography that can be of independent interest. We give two main applications. The first one is the aforementioned RBE methodology, where we use cuckoo hashing to compile an RBE with small identities into one for large identities. The second one is a way to convert any vector commitment scheme into a key-value map commitment. For instance, this leads to the first algebraic pairing-based key-value map commitments.

    Cuckoo Hashing in Cryptography: Optimal Parameters, Robustness and Applications

    Cuckoo hashing is a powerful primitive that enables storing items using small space with efficient querying. At a high level, cuckoo hashing maps n items into b entries storing at most ℓ items each, such that each item is placed into one of k randomly chosen entries. Additionally, there is an overflow stash that can store at most s items. Many cryptographic primitives rely upon cuckoo hashing to privately embed and query data, where it is integral to ensure small failure probability when constructing cuckoo hashing tables, as it directly relates to the privacy guarantees. As our main result, we present a more query-efficient cuckoo hashing construction using more hash functions. For construction failure probability Δ, the query overhead of our scheme is O(1 + √(log(1/Δ)/log n)). Our scheme has quadratically smaller query overhead than prior works for any target failure probability Δ. We also prove lower bounds matching our construction. Our improvements come from a new understanding of the locality of cuckoo hashing failures for small sets of items. We also initiate the study of robust cuckoo hashing, where the input set may be chosen with knowledge of the hash functions. We present a cuckoo hashing scheme using more hash functions with query overhead Õ(log λ) that is robust against poly(λ) adversaries. Furthermore, we present lower bounds showing that this construction is tight and that extending previous approaches of large stashes or entries cannot obtain robustness except with Ω(n) query overhead. As applications of our results, we obtain improved constructions for batch codes and PIR. In particular, we present the most efficient explicit batch code and blackbox reduction from single-query PIR to batch PIR.
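
A minimal sketch of the (k, ℓ, s) cuckoo hashing primitive described above, with random-walk insertion and an overflow stash (all names and parameter defaults are illustrative; this is not the paper's query-efficient construction):

```python
import hashlib, random

def entry_of(item, seed, b):
    d = hashlib.sha256(f"{seed}|{item}".encode()).digest()
    return int.from_bytes(d[:8], "big") % b

def build_table(items, b, k=2, ell=2, s=4, max_kicks=200):
    """Place each item into one of its k candidate entries (each entry holds at most
    ell items); items still unplaced after repeated evictions go to the stash, and
    the construction fails if the stash would exceed s items."""
    table = [[] for _ in range(b)]
    stash = []
    for item in items:
        cur, kicks = item, 0
        while True:
            cands = [entry_of(cur, seed, b) for seed in range(k)]
            target = next((e for e in cands if len(table[e]) < ell), None)
            if target is not None:
                table[target].append(cur)
                break
            if kicks >= max_kicks:
                stash.append(cur)
                if len(stash) > s:
                    raise RuntimeError("stash overflow: construction failure")
                break
            e = random.choice(cands)                 # evict a random occupant and continue
            victim = table[e].pop(random.randrange(len(table[e])))
            table[e].append(cur)
            cur, kicks = victim, kicks + 1
    return table, stash

def lookup(item, table, stash, k=2):
    """A query inspects the item's k candidate entries plus the stash."""
    b = len(table)
    return any(item in table[entry_of(item, seed, b)] for seed in range(k)) or item in stash

table, stash = build_table(range(1500), b=1000)      # 1500 items, 2000 slots, small stash
assert lookup(7, table, stash) and not lookup(99999, table, stash)
```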

    Multiple choice allocations with small maximum loads

    The idea of using multiple choices to improve allocation schemes is now well understood and is often illustrated by the following example. Suppose n balls are allocated to n bins, with each ball choosing a bin independently and uniformly at random. The maximum load, i.e., the number of balls in the most loaded bin, will then be approximately log n / log log n with high probability. Suppose now the balls are allocated sequentially by placing each ball in the least loaded bin among k ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this scenario the maximum load drops to log log n / log k + Θ(1) with high probability, which is an exponential improvement over the previous case. In this thesis we investigate multiple-choice allocations from a slightly different perspective. Instead of minimizing the maximum load, we fix the bin capacities and focus on maximizing the number of balls that can be allocated without overloading any bin. In the process that we consider, we have m = ⌊cn⌋ balls and n bins. Each ball chooses k bins independently and uniformly at random. Is it possible to assign each ball to one of its choices such that no bin receives more than ℓ balls? For all k ≥ 3 and ℓ ≥ 2 we give a critical value c*_{k,ℓ} such that when c < c*_{k,ℓ} such an assignment exists with high probability, and when c > c*_{k,ℓ} it does not. In case such an allocation exists, how quickly can we find it? Previous work on the total allocation time for the case k ≥ 3 and ℓ = 1 analyzed a breadth-first strategy, which is shown to be linear only in expectation. We give a simple and efficient algorithm, which we call local search allocation (LSA), to find an allocation for all k ≥ 3 and ℓ = 1. Provided the number of balls is below (but arbitrarily close to) the theoretically achievable load threshold, we give a linear bound for the total allocation time that holds with high probability. We demonstrate, through simulations, an order-of-magnitude improvement in total and maximum allocation times when compared to the state-of-the-art method. Our results find applications in many areas including hashing, load balancing, data management, orientability of random hypergraphs, and maximum matchings in a special class of bipartite graphs.
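
To make the allocation problem concrete, here is a small random-walk insertion sketch for the case ℓ = 1 (it only illustrates the setting; it is not the thesis's local search allocation algorithm, and the names and limits are made up):

```python
import random

def allocate(balls, n, k=3, max_steps=10_000, seed=0):
    """Assign each ball to one of its k randomly chosen bins, one ball per bin,
    by random-walk eviction.  Succeeds quickly when the load len(balls)/n is
    below the threshold c*_{k,1}; raises when it gives up."""
    rng = random.Random(seed)
    choices = {ball: rng.sample(range(n), k) for ball in balls}   # fixed per ball
    bins = [None] * n
    for ball in balls:
        cur, steps = ball, 0
        while True:
            cands = choices[cur]
            free = [c for c in cands if bins[c] is None]
            if free:
                bins[free[0]] = cur
                break
            c = rng.choice(cands)
            cur, bins[c] = bins[c], cur        # evict, then re-place the victim
            steps += 1
            if steps > max_steps:
                raise RuntimeError("allocation failed; load is likely above the threshold")
    return bins

bins = allocate(list(range(900)), n=1000)      # load 0.9 < c*_{3,1} ~ 0.917
assert sorted(b for b in bins if b is not None) == list(range(900))
```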

    Random hypergraphs for hashing-based data structures

    This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1, 
, n}. Each object x ∈ S is assigned a small set e(x) ⊆ [n] of locations by a random hash function, independently of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c below a certain load threshold c* and close to 0 for loads above this threshold. We mainly consider two types of data structures:
    • A cuckoo hash table is a dictionary data structure where each key x ∈ S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes and determine their exact load thresholds. The schemes are unaligned blocks, double hashing, and a scheme for dynamically growing key sets.
    • A retrieval data structure also stores a value f(x) for each x ∈ S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual: it cannot answer questions of the form “is y ∈ S?”. Given a key y, it returns a value z. If y ∈ S, then z = f(y) is guaranteed; otherwise z may be an arbitrary value. We consider two new hashing schemes where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency.
    An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear-time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x ∈ S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Large parts of this thesis therefore examine how the hyperedge distribution and the load affect the probabilities with which these properties hold, and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results with experiments.
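
The retrieval data structures discussed here can be illustrated with a small XOR-based sketch: each key x is mapped to a set e(x) of table positions, and the table is computed so that the XOR of the entries at e(x) equals f(x); a query just XORs those entries. The sketch below solves the resulting linear system over GF(2) by plain Gaussian elimination, whereas the schemes in the thesis use structured position sets and peeling or specialized solvers to reach linear construction time; the names, the choice of r = 3 positions, and the 1.3× table slack are illustrative.

```python
import hashlib

def positions(key, n, r=3):
    """The set e(x) of up to r table positions for a key, from a seeded hash."""
    d = hashlib.sha256(str(key).encode()).digest()
    return {int.from_bytes(d[8 * i:8 * i + 8], "big") % n for i in range(r)}

def build_retrieval(f, n=None, r=3):
    """Find a table T with XOR of T[i] over i in e(x) equal to f[x] for every key x,
    by incremental Gaussian elimination over GF(2) (rows are bitmasks of positions)."""
    if n is None:
        n = max(1, int(1.3 * len(f)))        # some slack so the system is solvable w.h.p.
    rows, pivots = [], {}                    # pivots: position -> index of its row
    for key, val in f.items():
        mask = 0
        for p in positions(key, n, r):
            mask |= 1 << p
        while mask:                          # reduce the new equation against known pivots
            p = mask.bit_length() - 1
            if p not in pivots:
                pivots[p] = len(rows)
                rows.append((mask, val))
                break
            pmask, pval = rows[pivots[p]]
            mask, val = mask ^ pmask, val ^ pval
        else:
            if val != 0:
                raise RuntimeError("system unsolvable; rebuild with a different hash")
    T = [0] * n                              # free positions stay 0
    for p in sorted(pivots):                 # back-substitute, smallest pivot first
        mask, val = rows[pivots[p]]
        q = mask & ~(1 << p)
        while q:
            j = q.bit_length() - 1
            val ^= T[j]
            q &= ~(1 << j)
        T[p] = val
    return T, n

def query(key, T, n, r=3):
    """Returns f(key) if key was in the input set; an arbitrary value otherwise."""
    acc = 0
    for p in positions(key, n, r):
        acc ^= T[p]
    return acc

f = {f"key{i}": i % 16 for i in range(200)}          # 4-bit values to retrieve
T, n = build_retrieval(f)
assert all(query(k, T, n) == v for k, v in f.items())
```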