70 research outputs found
Load thresholds for cuckoo hashing with overlapping blocks
Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo
hashing where each of $m$ objects is assigned two intervals of size $\ell$
in a linear (or cyclic) hash table of size $n$, and both start points are chosen
independently and uniformly at random. Each object must be placed into a table
cell within its intervals, but each cell can hold only one object. Experiments
suggested that this scheme outperforms the variant with blocks, in which
intervals are aligned at multiples of $\ell$. In particular, the load threshold
is higher, i.e. the load $c = m/n$ that can be achieved with high probability. For
instance, Lehman and Panigrahy [LP09] empirically observed the threshold for
$\ell = 2$ to be around $96.5\%$ as compared to roughly $89.7\%$ using blocks.
They managed to pin down the asymptotics of the thresholds for large $\ell$,
but the precise values resisted rigorous analysis.
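To make the scheme concrete, here is a minimal sketch (the parameter choices are ours, not from the paper): it samples two random length-ℓ intervals per object in a cyclic table and uses a textbook augmenting-path matching to test whether every object can be assigned its own cell.

    import random
    import sys

    def sample_choices(m, n, ell, rng):
        """Each object gets two intervals of length ell with independently
        uniform start points in a cyclic table of size n (the DW07 scheme)."""
        choices = []
        for _ in range(m):
            cells = set()
            for _ in range(2):  # two intervals per object
                s = rng.randrange(n)
                cells.update((s + i) % n for i in range(ell))
            choices.append(sorted(cells))
        return choices

    def left_perfect_matching(choices, n):
        """Kuhn's augmenting-path algorithm: can every object claim a cell?"""
        owner = [None] * n  # owner[cell] = object currently occupying the cell

        def augment(x, visited):
            for c in choices[x]:
                if c not in visited:
                    visited.add(c)
                    if owner[c] is None or augment(owner[c], visited):
                        owner[c] = x
                        return True
            return False

        return all(augment(x, set()) for x in range(len(choices)))

    sys.setrecursionlimit(100_000)  # augmenting paths can get long near the threshold
    rng = random.Random(1)
    n, ell, load = 10_000, 2, 0.95  # 0.95 is below the ~0.965 threshold for ell = 2
    print(left_perfect_matching(sample_choices(int(load * n), n, ell, rng), n))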
We establish a method to determine these load thresholds for all $\ell \geq 2$, and, in fact, for general $k$-ary versions of the scheme. For instance, for $k = \ell = 2$ we
get $c^* \approx 96.5\%$. The key tool we employ is an insightful and general
theorem due to Leconte, Lelarge, and Massoulié [LLM13], which adapts methods
from statistical physics to the world of hypergraph orientability. In effect,
the orientability thresholds for our graph families are determined by belief
propagation equations for certain graph limits. As a side note we provide
experimental evidence suggesting that placements can be constructed in linear
time with loads close to the threshold using an adapted version of an algorithm
by Khosla [Kho13].
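For intuition on how such placements are built incrementally, the following sketch implements a generic random-walk insertion; this is deliberately not Khosla's label-guided algorithm, whose details we do not reproduce here.

    import random

    def random_walk_insert(table, choices_of, x, rng, max_steps=500):
        """Generic random-walk (cuckoo-style) insertion: put x into an empty
        candidate cell if possible, otherwise evict a random occupant and
        continue with the evicted object. Khosla's algorithm additionally
        maintains vertex labels that guide which occupant to evict."""
        for _ in range(max_steps):
            cells = choices_of(x)
            empty = [c for c in cells if table.get(c) is None]
            if empty:
                table[rng.choice(empty)] = x
                return True
            c = rng.choice(cells)
            table[c], x = x, table[c]  # swap: evict and retry with the old occupant
        return False  # give up; a practical implementation would rehash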
Dense peelable random uniform hypergraphs
We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. of admitting no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1.
In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ~ 0.918, f_4 ~ 0.977, f_5 ~ 0.992, ...) are well beyond the corresponding thresholds (c_3 ~ 0.818, c_4 ~ 0.772, c_5 ~ 0.702, ...) of standard k-uniform random hypergraphs.
To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^Z and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods.
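The peeling process itself is easy to simulate. The sketch below samples such a segmented hypergraph and runs the standard peeling loop; the segment layout and all parameters are simplifying assumptions, not the paper's exact construction.

    import random
    from collections import deque

    def sample_segmented(m, n, k, segments, rng):
        """Each edge picks a random window of k consecutive segments and one
        random vertex in each (linear boundaries; details are simplified)."""
        seg_len = n // segments
        edges = []
        for _ in range(m):
            first = rng.randrange(segments - k + 1)
            edges.append(tuple(rng.randrange((first + j) * seg_len,
                                             (first + j + 1) * seg_len)
                               for j in range(k)))
        return edges

    def peelable(edges, n):
        """Repeatedly remove edges that contain a vertex of degree 1."""
        incident = [[] for _ in range(n)]
        for i, e in enumerate(edges):
            for v in e:
                incident[v].append(i)
        deg = [len(l) for l in incident]  # degree among still-alive edges
        alive = [True] * len(edges)
        queue = deque(v for v in range(n) if deg[v] == 1)
        removed = 0
        while queue:
            v = queue.popleft()
            if deg[v] != 1:
                continue
            i = next(j for j in incident[v] if alive[j])
            alive[i] = False
            removed += 1
            for u in edges[i]:
                deg[u] -= 1
                if deg[u] == 1:
                    queue.append(u)
        return removed == len(edges)

    rng = random.Random(0)
    n, k, segments = 30_000, 3, 100
    m = int(0.9 * n)  # density 0.9: above c_3 ~ 0.818, still below f_3 ~ 0.918
    print(peelable(sample_segmented(m, n, k, segments, rng), n))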
Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph or peelability allows for simple linear time algorithms. Memory efficiency is closely tied to edge density while worst and average case query times are tied to maximum and average edge size.
To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. Using k > 3 yields further improvements in memory usage at a small sacrifice in running time.
Towards Optimal Degree-distributions for Left-perfect Matchings in Random Bipartite Graphs
Consider a random bipartite multigraph $G$ with $n$ left nodes and $m \geq n \geq 2$ right nodes. Each left node $x$ has $d_x \geq 1$ random right
neighbors. The average left degree $\Delta$ is fixed, $\Delta \geq 2$. We ask
whether for the probability that $G$ has a left-perfect matching it is
advantageous not to fix $d_x$ for each left node $x$ but rather choose it at
random according to some (cleverly chosen) distribution. We show the following,
provided that the degrees of the left nodes are independent: If $\Delta$ is an
integer then it is optimal to use a fixed degree of $\Delta$ for all left
nodes. If $\Delta$ is non-integral then an optimal degree-distribution has the
property that each left node $x$ has two possible degrees, $\lfloor\Delta\rfloor$ and
$\lceil\Delta\rceil$, with probability $p_x$ and $1 - p_x$, respectively, where
$p_x$ is from the closed interval $[0,1]$ and the average over all $p_x$ equals
$\lceil\Delta\rceil - \Delta$. Furthermore, if $n = cm$ and $\Delta$ is
constant, then each distribution of the left degrees that meets the conditions
above determines the same threshold $c^*(\Delta)$ that has the following
property as $m$ goes to infinity: If $c < c^*(\Delta)$ then there exists a
left-perfect matching with high probability. If $c > c^*(\Delta)$ then there
exists no left-perfect matching with high probability. The threshold
$c^*(\Delta)$ is the same as the known threshold for offline $k$-ary cuckoo
hashing for integral or non-integral $k = \Delta$.
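The two-point degree distribution is easy to try out empirically. Here is a simulation sketch using networkx; the parameters are ours, and we sample neighbors without replacement, a simplification of the multigraph model.

    import math
    import random
    import networkx as nx

    def sample_bipartite(n_left, m_right, delta, rng):
        """Left degrees: floor(delta) with probability ceil(delta) - delta,
        else ceil(delta); the mean is then exactly delta. For integral delta
        this degenerates to a fixed degree."""
        lo, hi = math.floor(delta), math.ceil(delta)
        G = nx.Graph()
        left = [("L", i) for i in range(n_left)]
        G.add_nodes_from(left)
        G.add_nodes_from(("R", j) for j in range(m_right))
        for x in left:
            d = lo if rng.random() < hi - delta else hi
            G.add_edges_from((x, ("R", j)) for j in rng.sample(range(m_right), d))
        return G, left

    rng = random.Random(7)
    # n/m = 0.7, a load plausibly below the threshold for delta = 2.5.
    G, left = sample_bipartite(700, 1000, 2.5, rng)
    matching = nx.bipartite.hopcroft_karp_matching(G, top_nodes=left)
    print(sum(1 for v in matching if v[0] == "L") == len(left))  # left-perfect?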
Random hypergraphs for hashing-based data structures
This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1,…,n}. Each object x ∈ S is assigned a small set e(x) ⊆ [n] of locations by a random hash function, independently of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c less than a certain load threshold c* and close to 0 for loads greater than this threshold. We mainly consider two types of data structures:
• A cuckoo hash table is a dictionary data structure where each key x ∈ S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes and determine their exact load thresholds. The schemes are unaligned blocks, double hashing and a scheme for dynamically growing key sets.
• A retrieval data structure also stores a value f(x) for each x ∈ S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual: it cannot answer questions of the form "is y ∈ S?". Given a key y, it returns a value z. If y ∈ S, then z = f(y) is guaranteed; otherwise z may be an arbitrary value. We consider two new hashing schemes where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency.
An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear-time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x ∈ S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Large parts of this thesis therefore examine how the hyperedge distribution and the load affect the probabilities with which these properties hold, and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results by experiments.
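To illustrate the retrieval idea, here is a minimal one-bit sketch: build solves one XOR equation per key by Gaussian elimination over GF(2). The hash function and all parameters are stand-ins; the thesis's schemes instead draw e(x) from one or two contiguous blocks.

    import random

    def positions(x, seed, n_slots, k):
        """Stand-in hash: k distinct slots per key (not the block schemes)."""
        return random.Random(hash((x, seed))).sample(range(n_slots), k)

    def build(keys, f, n_slots, k, seed=0):
        """Solve one GF(2) equation per key: XOR of table[i], i in e(x), == f(x).
        Rows are bitmasks (Python ints); single-bit values keep the sketch short."""
        pivots = {}  # pivot bit -> (row mask, right-hand side)
        for x in keys:
            mask, b = 0, f[x]
            for i in positions(x, seed, n_slots, k):
                mask |= 1 << i
            while mask:
                p = mask.bit_length() - 1
                if p not in pivots:
                    pivots[p] = (mask, b)
                    break
                pmask, pb = pivots[p]
                mask, b = mask ^ pmask, b ^ pb
            else:
                if b:
                    return None  # inconsistent system: retry with another seed
        table = [0] * n_slots
        for p in sorted(pivots):  # back-substitution, lowest pivot first
            mask, b = pivots[p]
            for i in range(p):
                if mask >> i & 1:
                    b ^= table[i]
            table[p] = b
        return table

    def query(table, x, k, seed=0):
        out = 0
        for i in positions(x, seed, len(table), k):
            out ^= table[i]
        return out  # equals f(x) for stored keys; arbitrary for others

    keys = [f"key{i}" for i in range(900)]
    f = {x: i & 1 for i, x in enumerate(keys)}
    table = build(keys, f, n_slots=1100, k=3)  # load 900/1100, below ~0.917
    print(table is not None and all(query(table, x, 3) == f[x] for x in keys))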
Explorative Graph Visualization
Network structures (graphs) have become a natural part of everyday life and their analysis helps to gain an understanding of their inherent structure and the real-world aspects thereby expressed. The exploration of graphs is largely supported and driven by visual means. The aim of this thesis is to give a comprehensive view on the problems associated with these visual means and to detail concrete solution approaches for them. Concrete visualization techniques are introduced to underline the value of this comprehensive discussion for supporting explorative graph visualization.
Generation and properties of random graphs and analysis of randomized algorithms
We study a new method of generating random $d$-regular graphs by
repeatedly applying an operation called pegging. The pegging
algorithm, which applies the pegging operation in each step, is a
method of generating large random regular graphs beginning with
small ones. We prove that the limiting joint distribution of the
numbers of short cycles in the resulting graph is independent
Poisson. We use the coupling method to bound the total variation
distance between the joint distribution of short cycle counts and
its limit, and thereby obtain an upper bound on the $\epsilon$-mixing
time. The coupling involves two different, though quite similar,
Markov chains that are not time-homogeneous. We also prove a lower
bound on the $\epsilon$-mixing time, which demonstrates that the upper
bound is essentially tight. We also study the connectivity of random
$d$-regular graphs generated by the pegging algorithm, and show that
these graphs are asymptotically almost surely $d$-connected for any
even constant $d$.
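For concreteness, here is a sketch of one pegging step as we read it for 4-regular graphs; the operation for general even degree is analogous, and the details should be treated as our assumptions rather than the thesis's exact definition.

    import random

    def peg_step(adj, rng):
        """One pegging step on a 4-regular graph (adjacency sets): delete two
        vertex-disjoint edges {u,v} and {x,y}, then join a new vertex to
        u, v, x, y. All degrees remain 4."""
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        while True:
            (u, v), (x, y) = rng.sample(edges, 2)
            if len({u, v, x, y}) == 4:  # the two edges must not share a vertex
                break
        adj[u].remove(v); adj[v].remove(u)
        adj[x].remove(y); adj[y].remove(x)
        w = max(adj) + 1
        adj[w] = {u, v, x, y}
        for z in (u, v, x, y):
            adj[z].add(w)

    adj = {i: set(range(5)) - {i} for i in range(5)}  # start from K5 (4-regular)
    rng = random.Random(3)
    for _ in range(200):
        peg_step(adj, rng)
    print(all(len(nbrs) == 4 for nbrs in adj.values()))  # True: still 4-regular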
The problem of orientation of random hypergraphs is motivated by the
classical load balancing problem. Let $h > w > 0$ be two fixed integers.
Let $H$ be a hypergraph whose hyperedges are uniformly of size $h$.
To {\em $w$-orient} a hyperedge, we assign exactly $w$ of its
vertices positive signs with respect to this hyperedge, and the rest
negative. A $(w, k)$-orientation of $H$ consists of a
$w$-orientation of all hyperedges of $H$, such that each vertex
receives at most $k$ positive signs from its incident hyperedges.
When $k$ is large enough, we determine the threshold of the
existence of a $(w, k)$-orientation of a random hypergraph. The
$(w, k)$-orientation of hypergraphs is strongly related to a general
version of the off-line load balancing problem.
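The definition can be checked directly on toy instances. Here is a brute-force sketch (the instance is hypothetical, and the search is exponential, so it only serves to illustrate the definition):

    from itertools import combinations, product

    def has_w_k_orientation(hyperedges, n, w, k):
        """Decide existence of a (w, k)-orientation by exhaustive search:
        each hyperedge marks exactly w of its vertices positive, and every
        vertex may receive at most k positive marks overall."""
        options = [list(combinations(e, w)) for e in hyperedges]
        for assignment in product(*options):
            load = [0] * n
            ok = True
            for marked in assignment:
                for v in marked:
                    load[v] += 1
                    if load[v] > k:
                        ok = False
            if ok:
                return True
        return False

    # Three 3-uniform hyperedges on 4 vertices, w = 2 positives per edge, k = 2.
    print(has_w_k_orientation([(0, 1, 2), (1, 2, 3), (0, 2, 3)], 4, w=2, k=2))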
The other topic we discuss is computing the probability of induced
subgraphs in a random regular graph. Let $0 < s < n$ and let $H$ be a graph
on $s$ vertices. For any $S \subseteq [n]$ with $|S| = s$, we compute the
probability that the subgraph of the random $d$-regular graph $\mathcal{G}_{n,d}$ induced by $S$
is $H$. The result holds for a wide range of degrees $d$ and is further
extended to the probability space of
random graphs with a given degree sequence. This result
provides a basic tool for studying properties, for instance the
existence or the counts, of certain types of induced subgraphs.
Co-occurrence simplicial complexes in mathematics: identifying the holes of knowledge
In recent years, complex network tools have provided insights into
the structure of research through the study of collaboration, citation and
co-occurrence networks. The network approach focuses on pairwise relationships,
often compressing multidimensional data structures and inevitably losing
information. In this paper we propose for the first time a simplicial complex
approach to word co-occurrences, providing a natural framework for the study of
higher-order relations in the space of scientific knowledge. Using topological
methods we explore the conceptual landscape of mathematical research, focusing
on homological holes, regions with low connectivity in the simplicial
structure. We find that homological holes are ubiquitous, which suggests that
they capture some essential feature of research practice in mathematics. Holes
die when a subset of their concepts appears in the same article; hence their
death may be a sign of the creation of new knowledge, as we show with some
examples. We find a positive relation between the dimension of a hole and the
time it takes to be closed: larger holes may represent potential for important
advances in the field because they separate conceptually distant areas. We also
show that authors' conceptual entropy is positively related to their
contribution to homological holes, suggesting that polymaths tend to be on the
frontier of research.
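As a toy illustration of the pipeline, the sketch below builds a co-occurrence simplicial complex from a made-up set of articles using the gudhi library and reads off Betti numbers; betti_numbers()[1] counts the one-dimensional homological holes discussed above. The corpus and all parameters are invented.

    import gudhi  # pip install gudhi

    # Hypothetical corpus: each article contributes the simplex spanned by
    # the concepts it mentions (sub-simplices are added automatically).
    articles = [
        {"group", "ring", "field"},
        {"field", "galois theory"},
        {"ring", "module"},
        {"module", "galois theory"},
    ]
    concepts = sorted(set.union(*articles))
    index = {c: i for i, c in enumerate(concepts)}

    st = gudhi.SimplexTree()
    for article in articles:
        st.insert([index[c] for c in article])

    st.compute_persistence()  # required before asking for Betti numbers
    # The 4-cycle field-galois-module-ring is hollow, so this prints [1, 1]:
    # one connected component and one homological hole.
    print(st.betti_numbers())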
- …