65 research outputs found

    Thresholds for Extreme Orientability

    Full text link
    Multiple-choice load balancing has been a topic of intense study since the seminal paper of Azar, Broder, Karlin, and Upfal. Questions in this area can be phrased in terms of orientations of a graph, or more generally a k-uniform random hypergraph. A (d,b)-orientation is an assignment of each edge to d of its vertices, such that no vertex has more than b edges assigned to it. Conditions for the existence of such orientations have been completely documented except for the "extreme" case of (k-1,1)-orientations. We consider this remaining case, and establish: - The density threshold below which an orientation exists with high probability, and above which it does not exist with high probability. - An algorithm for finding an orientation that runs in linear time with high probability, with explicit polynomial bounds on the failure probability. Previously, the only known algorithms for constructing (k-1,1)-orientations worked for k<=3, and were only shown to have expected linear running time.Comment: Corrected description of relationship to the work of LeLarg

    Ramsey games with giants

    Get PDF
    The classical result in the theory of random graphs, proved by Erdos and Renyi in 1960, concerns the threshold for the appearance of the giant component in the random graph process. We consider a variant of this problem, with a Ramsey flavor. Now, each random edge that arrives in the sequence of rounds must be colored with one of R colors. The goal can be either to create a giant component in every color class, or alternatively, to avoid it in every color. One can analyze the offline or online setting for this problem. In this paper, we consider all these variants and provide nontrivial upper and lower bounds; in certain cases (like online avoidance) the obtained bounds are asymptotically tight.Comment: 29 pages; minor revision

    Random hypergraphs for hashing-based data structures

    Get PDF
    This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1,
,n}. Each object x [ELEMENT OF] S is assigned a small set e(x) [SUBSET OF OR EQUAL TO] [n] of locations by a random hash function, independent of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c less than a certain load threshold c^* and close to 0 for loads greater than this load threshold. We mainly consider two types of data structures: ‱ A cuckoo hash table is a dictionary data structure where each key x [ELEMENT OF] S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes, and determine their exact load thresholds. The schemes are unaligned blocks, double hashing and a scheme for dynamically growing key sets. ‱ A retrieval data structure also stores a value f(x) for each x [ELEMENT OF] S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual. It cannot answer questions of the form “is y [ELEMENT OF] S?”. Given a key y it returns a value z. If y [ELEMENT OF] S, then z = f(y) is guaranteed, otherwise z may be an arbitrary value. We consider two new hashing schemes, where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency. An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x [ELEMENT OF] S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Therefore, large parts of this thesis examine how hyperedge distribution and load affects the probabilities with which these properties hold and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results by experiments.Diese Arbeit behandelt WörterbĂŒcher und verwandte Datenstrukturen, die darauf aufbauen, mehrere zufĂ€llige Möglichkeiten zur Speicherung jedes SchlĂŒssels vorzusehen. Man stelle sich vor, Information ĂŒber eine Menge S von m = |S| SchlĂŒsseln soll in n SpeicherplĂ€tzen abgelegt werden, die durch [n] = {1,
,n} indiziert sind. Jeder SchlĂŒssel x [ELEMENT OF] S bekommt eine kleine Menge e(x) [SUBSET OF OR EQUAL TO] [n] von SpeicherplĂ€tzen durch eine zufĂ€llige Hashfunktion unabhĂ€ngig von anderen SchlĂŒsseln zugewiesen. Die Information ĂŒber x darf nun ausschließlich in den PlĂ€tzen aus e(x) untergebracht werden. Es kann hierbei passieren, dass zu viele SchlĂŒssel um dieselben SpeicherplĂ€tze konkurrieren, insbesondere bei hoher Auslastung c = m/n. Eine erfolgreiche Speicherung der Gesamtinformation ist dann eventuell unmöglich. FĂŒr die meisten Verteilungen von e(x) lĂ€sst sich Erfolg oder Misserfolg allerdings sehr zuverlĂ€ssig vorhersagen, da fĂŒr Auslastung c unterhalb eines gewissen Auslastungsschwellwertes c* die Erfolgswahrscheinlichkeit nahezu 1 ist und fĂŒr c jenseits dieses Auslastungsschwellwertes nahezu 0 ist. HauptsĂ€chlich werden wir zwei Arten von Datenstrukturen betrachten: ‱ Eine Kuckucks-Hashtabelle ist eine Wörterbuchdatenstruktur, bei der jeder SchlĂŒssel x [ELEMENT OF] S zusammen mit einem assoziierten Wert f(x) in einem der SpeicherplĂ€tze mit Index aus e(x) gespeichert wird. Die Verteilung von e(x) wird hierbei vom Hashing-Schema festgelegt. Wir analysieren drei bekannte Hashing-Schemata und bestimmen erstmals deren exakte Auslastungsschwellwerte im obigen Sinne. Die Schemata sind unausgerichtete Blöcke, Doppel-Hashing sowie ein Schema fĂŒr dynamisch wachsenden SchlĂŒsselmengen. ‱ Auch eine Retrieval-Datenstruktur speichert einen Wert f(x) fĂŒr alle x [ELEMENT OF] S. Diesmal sollen die Werte in den SpeicherplĂ€tzen aus e(x) eine lineare Gleichung erfĂŒllen, die den Wert f(x) charakterisiert. Die entstehende Datenstruktur ist extrem platzsparend, aber ungewöhnlich: Sie ist ungeeignet um Fragen der Form „ist y [ELEMENT OF] S?“ zu beantworten. Bei Anfrage eines SchlĂŒssels y wird ein Ergebnis z zurĂŒckgegeben. Falls y [ELEMENT OF] S ist, so ist z = f(y) garantiert, andernfalls darf z ein beliebiger Wert sein. Wir betrachten zwei neue Hashing-Schemata, bei denen die Elemente von e(x) in einem oder in zwei zusammenhĂ€ngenden Blöcken liegen. So werden gute Zugriffszeiten auf Word-RAMs und eine hohe Cache-Effizienz erzielt. Eine wichtige Frage ist, ob Datenstrukturen obiger Art in Linearzeit konstruiert werden können. Die Erfolgswahrscheinlichkeit eines naheliegenden Greedy-Algorithmus weist abermals ein Schwellwertverhalten in Bezug auf die Auslastung c auf. Wir identifizieren ein Hashing-Schema, das diesbezĂŒglich einen besonders hohen Schwellwert mit sich bringt. In der mathematischen Modellierung werden die Speicherpositionen [n] als Knoten und die Mengen e(x) fĂŒr x [ELEMENT OF] S als Hyperkanten aufgefasst. Drei Eigenschaften der entstehenden Hypergraphen stellen sich dann als zentral heraus: SchĂ€lbarkeit, Lösbarkeit und Orientierbarkeit. Weite Teile dieser Arbeit beschĂ€ftigen sich daher mit den Wahrscheinlichkeiten fĂŒr das Vorliegen dieser Eigenschaften abhĂ€ngig von Hashing Schema und Auslastung, sowie mit entsprechenden Schwellwerten. Eine RĂŒckĂŒbersetzung der Ergebnisse liefert dann Datenstrukturen mit geringen Anfragezeiten, hoher Speichereffizienz und geringen Konstruktionszeiten. Die theoretischen Überlegungen werden dabei durch experimentelle Ergebnisse ergĂ€nzt und gestĂŒtzt

    Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters

    Full text link
    We show how to answer spatial multiple-set intersection queries in O(n(log w)/w + kt) expected time, where n is the total size of the t sets involved in the query, w is the number of bits in a memory word, k is the output size, and c is any fixed constant. This improves the asymptotic performance over previous solutions and is based on an interesting data structure, known as 2-3 cuckoo hash-filters. Our results apply in the word-RAM model (or practical RAM model), which allows for constant-time bit-parallel operations, such as bitwise AND, OR, NOT, and MSB (most-significant 1-bit), as exist in modern CPUs and GPUs. Our solutions apply to any multiple-set intersection queries in spatial data sets that can be reduced to one-dimensional range queries, such as spatial join queries for one-dimensional points or sets of points stored along space-filling curves, which are used in GIS applications.Comment: Full version of paper from 2017 ACM SIGSPATIAL International Conference on Advances in Geographic Information System

    Generation and properties of random graphs and analysis of randomized algorithms

    Get PDF
    We study a new method of generating random dd-regular graphs by repeatedly applying an operation called pegging. The pegging algorithm, which applies the pegging operation in each step, is a method of generating large random regular graphs beginning with small ones. We prove that the limiting joint distribution of the numbers of short cycles in the resulting graph is independent Poisson. We use the coupling method to bound the total variation distance between the joint distribution of short cycle counts and its limit and thereby show that O(ϔ−1)O(\epsilon^{-1}) is an upper bound of the \eps-mixing time. The coupling involves two different, though quite similar, Markov chains that are not time-homogeneous. We also show that the Ï”\epsilon-mixing time is not o(ϔ−1)o(\epsilon^{-1}). This demonstrates that the upper bound is essentially tight. We study also the connectivity of random dd-regular graphs generated by the pegging algorithm. We show that these graphs are asymptotically almost surely dd-connected for any even constant d≄4d\ge 4. The problem of orientation of random hypergraphs is motivated by the classical load balancing problem. Let h>w>0h>w>0 be two fixed integers. Let \orH be a hypergraph whose hyperedges are uniformly of size hh. To {\em ww-orient} a hyperedge, we assign exactly ww of its vertices positive signs with respect to this hyperedge, and the rest negative. A (w,k)(w,k)-orientation of \orH consists of a ww-orientation of all hyperedges of \orH, such that each vertex receives at most kk positive signs from its incident hyperedges. When kk is large enough, we determine the threshold of the existence of a (w,k)(w,k)-orientation of a random hypergraph. The (w,k)(w,k)-orientation of hypergraphs is strongly related to a general version of the off-line load balancing problem. The other topic we discuss is computing the probability of induced subgraphs in a random regular graph. Let 0<s<n0<s<n and HH be a graph on ss vertices. For any S⊂[n]S\subset [n] with ∣S∣=s|S|=s, we compute the probability that the subgraph of Gn,d\mathcal{G}_{n,d} induced by SS is HH. The result holds for any d=o(n1/3)d=o(n^{1/3}) and is further extended to Gn,d\mathcal{G}_{n,{\bf d}}, the probability space of random graphs with given degree sequence d\bf d. This result provides a basic tool for studying properties, for instance the existence or the counts, of certain types of induced subgraphs

    Private Set Intersection with Linear Communication from General Assumptions

    Get PDF
    This work presents a hashing-based algorithm for Private Set Intersection (PSI) in the honest-but-curious setting. The protocol is generic, modular and provides both asymptotic and concrete efficiency improvements over existing PSI protocols. If each player has mm elements, our scheme requires only O(m \secpar) communication between the parties, where \secpar is a security parameter. Our protocol builds on the hashing-based PSI protocol of Pinkas et al. (USENIX 2014, USENIX 2015), but we replace one of the sub-protocols (handling the cuckoo ``stash\u27\u27) with a special-purpose PSI protocol that is optimized for comparing sets of unbalanced size. This brings the asymptotic communication complexity of the overall protocol down from \omega(m \secpar) to O(m\secpar), and provides concrete performance improvements (10-15\% reduction in communication costs) over Kolesnikov et al. (CCS 2016) under real-world parameter choices. Our protocol is simple, generic and benefits from the permutation-hashing optimizations of Pinkas et al. (USENIX 2015) and the Batched, Relaxed Oblivious Pseudo Random Functions of Kolesnikov et al. (CCS 2016)
    • 

    corecore