    Load thresholds for cuckoo hashing with overlapping blocks

    Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo hashing where each of cn objects is assigned k = 2 intervals of size ℓ in a linear (or cyclic) hash table of size n and both start points are chosen independently and uniformly at random. Each object must be placed into a table cell within its intervals, but each cell can only hold one object. Experiments suggested that this scheme outperforms the variant with blocks in which intervals are aligned at multiples of ℓ. In particular, the load threshold is higher, i.e. the load c that can be achieved with high probability. For instance, Lehman and Panigrahy [LP09] empirically observed the threshold for ℓ = 2 to be around 96.5% as compared to roughly 89.7% using blocks. They managed to pin down the asymptotics of the thresholds for large ℓ, but the precise values resisted rigorous analysis. We establish a method to determine these load thresholds for all ℓ ≄ 2, and, in fact, for general k ≄ 2. For instance, for k = ℓ = 2 we get ≈ 96.4995%. The key tool we employ is an insightful and general theorem due to Leconte, Lelarge, and Massoulié [LLM13], which adapts methods from statistical physics to the world of hypergraph orientability. In effect, the orientability thresholds for our graph families are determined by belief propagation equations for certain graph limits. As a side note we provide experimental evidence suggesting that placements can be constructed in linear time with loads close to the threshold using an adapted version of an algorithm by Khosla [Kho13].
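
    To make the unaligned-windows scheme concrete, the following sketch (Python; the function names, the hash construction and the default parameters are illustrative assumptions, not taken from [DW07]) computes the candidate cells of a key in a cyclic table of size n, once with k windows of ℓ consecutive cells starting at independent uniform positions, and once with blocks aligned at multiples of ℓ.

```python
import hashlib

def _hash(key, seed, modulus):
    """Pseudo-random value in [0, modulus) derived from key and seed (illustration only)."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus

def window_candidates(key, n, k=2, ell=2):
    """Unaligned scheme: k windows of ell consecutive cells, each starting at an
    independent, uniformly random position of a cyclic table of size n."""
    cells = set()
    for seed in range(k):
        start = _hash(key, seed, n)
        cells.update((start + i) % n for i in range(ell))
    return cells

def block_candidates(key, n, k=2, ell=2):
    """Aligned scheme: k blocks of ell cells whose start positions are multiples of ell."""
    assert n % ell == 0
    cells = set()
    for seed in range(k):
        block = _hash(key, seed + k, n // ell)  # pick one of the n/ell aligned blocks
        cells.update(block * ell + i for i in range(ell))
    return cells
```

    Each object must be stored in one of its candidate cells with at most one object per cell; the result above says that the unaligned variant admits a higher load before such a placement stops existing.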

    Dense peelable random uniform hypergraphs

    We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. of admitting no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ~ 0.918, f_4 ~ 0.977, f_5 ~ 0.992, ...) are well beyond the corresponding thresholds (c_3 ~ 0.818, c_4 ~ 0.772, c_5 ~ 0.702, ...) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^Z, and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods. Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph, or peelability allows for simple linear-time algorithms. Memory efficiency is closely tied to edge density, while worst- and average-case query times are tied to maximum and average edge size. To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. Using k > 3 attains, at small sacrifices in running time, further improvements to memory usage.
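
    A rough sketch of the construction and of generic peeling, under the structure stated above (vertices split into linearly arranged segments, each edge choosing one random vertex from each of k consecutive segments); the segment count, segment size and the uniform choice of the starting segment are illustrative assumptions.

```python
import random
from collections import defaultdict

def sample_segmented_edges(num_segments, segment_size, num_edges, k=3, rng=random):
    """Vertices are pairs (segment, offset). Each edge picks a uniform starting
    segment s and one random vertex in each of the k consecutive segments s..s+k-1."""
    edges = []
    for _ in range(num_edges):
        s = rng.randrange(num_segments - k + 1)
        edges.append(tuple((s + j, rng.randrange(segment_size)) for j in range(k)))
    return edges

def peel(edges):
    """Repeatedly remove edges that contain a vertex of degree 1.
    Returns the surviving edges; the hypergraph is peelable iff none survive."""
    incident = defaultdict(set)
    for i, e in enumerate(edges):
        for v in e:
            incident[v].add(i)
    alive = set(range(len(edges)))
    stack = [v for v, es in incident.items() if len(es) == 1]
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue
        (i,) = incident[v]
        alive.remove(i)
        for u in edges[i]:
            incident[u].discard(i)
            if len(incident[u]) == 1:
                stack.append(u)
    return [edges[i] for i in alive]
```

    With edge density below f_k, the returned list should be empty with high probability; the peelability thresholds quoted above refer to exactly this event.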

    Towards Optimal Degree-distributions for Left-perfect Matchings in Random Bipartite Graphs

    Consider a random bipartite multigraph G with n left nodes and m ≄ n ≄ 2 right nodes. Each left node x has d_x ≄ 1 random right neighbors. The average left degree Δ is fixed, Δ ≄ 2. We ask whether, for the probability that G has a left-perfect matching, it is advantageous not to fix d_x for each left node x but rather to choose it at random according to some (cleverly chosen) distribution. We show the following, provided that the degrees of the left nodes are independent: If Δ is an integer then it is optimal to use a fixed degree of Δ for all left nodes. If Δ is non-integral then an optimal degree distribution has the property that each left node x has two possible degrees, ⌊Δ⌋ and ⌈Δ⌉, with probability p_x and 1 − p_x, respectively, where p_x is from the closed interval [0,1] and the average over all p_x equals ⌈Δ⌉ − Δ. Furthermore, if n = c·m and Δ > 2 is constant, then each distribution of the left degrees that meets the conditions above determines the same threshold c*(Δ) that has the following property as n goes to infinity: If c < c*(Δ) then there exists a left-perfect matching with high probability. If c > c*(Δ) then there exists no left-perfect matching with high probability. The threshold c*(Δ) is the same as the known threshold for offline k-ary cuckoo hashing for integral or non-integral k = Δ.
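
    As an illustration of the optimal degree distribution described above, here is a minimal sketch (assuming the simplest admissible choice of a common p_x for all left nodes) that samples left degrees with a prescribed, possibly non-integral average Δ.

```python
import math
import random

def sample_left_degrees(n_left, avg_degree, rng=random):
    """Each left node gets degree floor(avg_degree) with probability p and
    ceil(avg_degree) with probability 1 - p, where p = ceil(avg_degree) - avg_degree,
    so the expected degree of every node is exactly avg_degree."""
    lo, hi = math.floor(avg_degree), math.ceil(avg_degree)
    if lo == hi:                  # integral average: a fixed degree is optimal
        return [lo] * n_left
    p = hi - avg_degree           # probability of the smaller degree
    return [lo if rng.random() < p else hi for _ in range(n_left)]
```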

    Random hypergraphs for hashing-based data structures

    This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1, 
, n}. Each object x ∈ S is assigned a small set e(x) ⊆ [n] of locations by a random hash function, independent of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c less than a certain load threshold c* and close to 0 for loads greater than this load threshold. We mainly consider two types of data structures:
    ‱ A cuckoo hash table is a dictionary data structure where each key x ∈ S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes, and determine their exact load thresholds. The schemes are unaligned blocks, double hashing and a scheme for dynamically growing key sets.
    ‱ A retrieval data structure also stores a value f(x) for each x ∈ S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual. It cannot answer questions of the form "is y ∈ S?". Given a key y it returns a value z. If y ∈ S, then z = f(y) is guaranteed, otherwise z may be an arbitrary value. We consider two new hashing schemes, where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency.
    An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear-time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x ∈ S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Therefore, large parts of this thesis examine how hyperedge distribution and load affect the probabilities with which these properties hold and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results by experiments.
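
    The retrieval data structure described above can be illustrated with the most common instantiation of the linear equation, namely XOR over bit strings (an assumption for illustration; the thesis considers the general linear-algebraic setting and specific schemes with contiguous blocks). Querying then amounts to XOR-ing the cells indexed by e(y).

```python
def retrieval_query(table, e_y):
    """XOR of the table cells with indices in e(y). If the table was built by
    solving XOR_{i in e(x)} table[i] = f(x) for every x in S, this returns f(y)
    for y in S and an arbitrary value for any other key."""
    result = 0
    for i in e_y:
        result ^= table[i]
    return result
```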

    Explorative Graph Visualization

    Network structures (graphs) have become a natural part of everyday life and their analysis helps to gain an understanding of their inherent structure and the real-world aspects thereby expressed. The exploration of graphs is largely supported and driven by visual means. The aim of this thesis is to give a comprehensive view on the problems associated with these visual means and to detail concrete solution approaches for them. Concrete visualization techniques are introduced to underline the value of this comprehensive discussion for supporting explorative graph visualization.

    Generation and properties of random graphs and analysis of randomized algorithms

    We study a new method of generating random d-regular graphs by repeatedly applying an operation called pegging. The pegging algorithm, which applies the pegging operation in each step, is a method of generating large random regular graphs beginning with small ones. We prove that the limiting joint distribution of the numbers of short cycles in the resulting graph is independent Poisson. We use the coupling method to bound the total variation distance between the joint distribution of short cycle counts and its limit and thereby show that O(Δ^{-1}) is an upper bound of the Δ-mixing time. The coupling involves two different, though quite similar, Markov chains that are not time-homogeneous. We also show that the Δ-mixing time is not o(Δ^{-1}). This demonstrates that the upper bound is essentially tight. We also study the connectivity of random d-regular graphs generated by the pegging algorithm. We show that these graphs are asymptotically almost surely d-connected for any even constant d ≄ 4. The problem of orientation of random hypergraphs is motivated by the classical load balancing problem. Let h > w > 0 be two fixed integers. Let H be a hypergraph whose hyperedges are uniformly of size h. To w-orient a hyperedge, we assign exactly w of its vertices positive signs with respect to this hyperedge, and the rest negative. A (w,k)-orientation of H consists of a w-orientation of all hyperedges of H, such that each vertex receives at most k positive signs from its incident hyperedges. When k is large enough, we determine the threshold of the existence of a (w,k)-orientation of a random hypergraph. The (w,k)-orientation of hypergraphs is strongly related to a general version of the off-line load balancing problem. The other topic we discuss is computing the probability of induced subgraphs in a random regular graph. Let 0 < s < n and let H be a graph on s vertices. For any S ⊂ [n] with |S| = s, we compute the probability that the subgraph of G_{n,d} induced by S is H. The result holds for any d = o(n^{1/3}) and is further extended to G_{n,𝐝}, the probability space of random graphs with given degree sequence 𝐝. This result provides a basic tool for studying properties, for instance the existence or the counts, of certain types of induced subgraphs.
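
    The (w,k)-orientation defined above can be verified directly from the definition; a minimal sketch (names are illustrative) checks that every hyperedge signs exactly w of its vertices positive and that no vertex collects more than k positive signs.

```python
from collections import Counter

def is_wk_orientation(hyperedges, positives, w, k):
    """hyperedges: list of vertex tuples, all of size h.
    positives: positives[i] is the set of vertices of hyperedges[i] signed positive.
    Returns True iff this is a (w, k)-orientation."""
    load = Counter()
    for edge, pos in zip(hyperedges, positives):
        if len(pos) != w or not set(pos) <= set(edge):
            return False          # exactly w positive signs, all on the edge's vertices
        load.update(pos)
    return all(count <= k for count in load.values())
```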

    Co-occurrence simplicial complexes in mathematics: identifying the holes of knowledge

    In recent years, complex network tools have contributed to providing insights on the structure of research, through the study of collaboration, citation and co-occurrence networks. The network approach focuses on pairwise relationships, often compressing multidimensional data structures and inevitably losing information. In this paper we propose for the first time a simplicial complex approach to word co-occurrences, providing a natural framework for the study of higher-order relations in the space of scientific knowledge. Using topological methods we explore the conceptual landscape of mathematical research, focusing on homological holes, regions with low connectivity in the simplicial structure. We find that homological holes are ubiquitous, which suggests that they capture some essential feature of research practice in mathematics. Holes die when a subset of their concepts appear in the same article, hence their death may be a sign of the creation of new knowledge, as we show with some examples. We find a positive relation between the dimension of a hole and the time it takes to be closed: larger holes may represent potential for important advances in the field because they separate conceptually distant areas. We also show that authors' conceptual entropy is positively related with their contribution to homological holes, suggesting that polymaths tend to be on the frontier of research.
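
    A minimal sketch of the co-occurrence construction outlined above, assuming that every article contributes the simplex spanned by the concepts it mentions and that the complex is closed under taking faces (the article data and concept names below are placeholders):

```python
from itertools import combinations

def cooccurrence_complex(articles):
    """articles: iterable of concept sets. Returns all simplices (as frozensets)
    obtained by closing each article's concept set under non-empty subsets."""
    simplices = set()
    for concepts in articles:
        concepts = tuple(set(concepts))
        for r in range(1, len(concepts) + 1):
            simplices.update(frozenset(face) for face in combinations(concepts, r))
    return simplices

# Two toy "articles" sharing one concept:
example = cooccurrence_complex([{"cuckoo hashing", "hypergraph"},
                                {"hypergraph", "peeling", "threshold"}])
```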
    • 

    corecore