The Multiple-orientability Thresholds for Random Hypergraphs
A k-uniform hypergraph H = (V, E) is called l-orientable if there is an
assignment of each edge e in E to one of its vertices v in e such that
no vertex is assigned more than l edges. Let H_{n,m,k} be a hypergraph,
drawn uniformly at random from the set of all k-uniform hypergraphs
with n vertices and m edges. In this paper we establish the threshold
for the l-orientability of H_{n,m,k} for all k >= 3 and l >= 2, i.e.,
we determine a critical quantity c*_{k,l} such that with probability
1-o(1) the graph H_{n,cn,k} has an l-orientation if c < c*_{k,l}, but
fails to have one if c > c*_{k,l}.
Our result has various applications including sharp load thresholds for
cuckoo hashing, load balancing with guaranteed maximum load, and massive
parallel access to hard disk arrays.
Comment: An extended abstract appeared in the proceedings of SODA 201
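Deciding whether a given hypergraph is l-orientable reduces to bipartite matching: treat each vertex as l unit-capacity slots, and each edge as a node that must be matched to a slot of one of its vertices. The following is an illustrative sketch (not the paper's proof technique) using Kuhn's augmenting-path algorithm:

```python
def max_assignment(edges, l):
    """Kuhn's augmenting-path matching: each hyperedge is assigned to one
    (vertex, slot) pair among its vertices; each vertex offers l slots."""
    match = {}  # (vertex, slot) -> index of the edge occupying it

    def try_assign(i, visited):
        for v in edges[i]:
            for s in range(l):
                if (v, s) in visited:
                    continue
                visited.add((v, s))
                # take a free slot, or displace its edge along an augmenting path
                if (v, s) not in match or try_assign(match[(v, s)], visited):
                    match[(v, s)] = i
                    return True
        return False

    return sum(try_assign(i, set()) for i in range(len(edges)))

def is_l_orientable(edges, l):
    # orientable iff every edge can be assigned without exceeding capacity l
    return max_assignment(edges, l) == len(edges)
```

For instance, four parallel edges on two vertices are 2-orientable (total capacity 4), but five are not.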
Multiple choice allocations with small maximum loads
The idea of using multiple choices to improve allocation schemes is now well understood and is often illustrated by the following example. Suppose n balls are allocated to n bins, with each ball choosing a bin independently and uniformly at random. The \emph{maximum load}, or the number of balls in the most loaded bin, will then be approximately log n / log log n with high probability. Suppose now the balls are allocated sequentially by placing each ball in the least loaded bin among k >= 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this scenario the maximum load drops to log log n / log k + O(1) with high probability, which is an exponential improvement over the previous case.
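The gap between the two schemes is easy to observe in simulation. A minimal sketch (function names are illustrative):

```python
import random

def one_choice_loads(n, rng):
    # each of n balls picks a single bin uniformly at random
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return loads

def k_choice_loads(n, k, rng):
    # each ball is placed in the least loaded of k uniformly random bins
    loads = [0] * n
    for _ in range(n):
        best = min((rng.randrange(n) for _ in range(k)), key=loads.__getitem__)
        loads[best] += 1
    return loads
```

For n around 10^5, the one-choice maximum load typically lands around 8-10, while two choices bring it down to about 4, in line with the log n / log log n versus log log n / log k behavior.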
In this thesis we investigate multiple choice allocations from a slightly different perspective. Instead of minimizing the maximum load, we fix the bin capacities and focus on maximizing the number of balls that can be allocated without overloading any bin. In the process that we consider we have m = ⌊cn⌋ balls and n bins. Each ball chooses k bins independently and uniformly at random. \emph{Is it possible to assign each ball to one of its choices such that no bin receives more than l balls?} For all k >= 3 and l >= 2 we give a critical value, c*_{k,l}, such that when c < c*_{k,l} such an assignment exists with high probability, and when c > c*_{k,l} this is not the case.
In case such an allocation exists, \emph{how quickly can we find it?} Previous work on total allocation time for the case k >= 3 and l = 1 has analyzed a \emph{breadth first strategy}, which is shown to be linear only in expectation. We give a simple and efficient algorithm, which we call \emph{local search allocation} (LSA), to find an allocation for all k >= 3 and l >= 1. Provided the number of balls is below (but arbitrarily close to) the theoretically achievable load threshold, we give a \emph{linear} bound for the total allocation time that holds with high probability.
We demonstrate, through simulations, an order of magnitude improvement for total and maximum allocation times when compared to the state of the art method.
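The breadth first strategy mentioned above can be sketched for the l = 1 case as follows: to place a ball, run a BFS over bins, where "the occupant of bin b may move to another of its choices" defines the edges, until a free bin is found; then shift balls back along the path. This is an illustrative sketch, not the thesis's LSA algorithm:

```python
from collections import deque

def bfs_insert(choices, occupant, ball):
    """Place `ball` by BFS over the eviction graph: find a shortest
    chain of moves ending at a free bin (capacity l = 1 per bin)."""
    parent = {}  # bin -> (previous bin on the path, ball that moves into it)
    queue = deque()
    for b in choices[ball]:
        if b not in parent:
            parent[b] = (None, ball)
            queue.append(b)
    while queue:
        b = queue.popleft()
        if occupant.get(b) is None:
            # free bin found: walk back, shifting balls along the path
            while b is not None:
                prev, moved = parent[b]
                occupant[b] = moved
                b = prev
            return True
        y = occupant[b]  # the ball in b may move to its other choices
        for b2 in choices[y]:
            if b2 not in parent:
                parent[b2] = (b, y)
                queue.append(b2)
    return False
```

Because the BFS finds a shortest eviction chain, each successful insertion moves as few balls as possible.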
Our results find applications in many areas including hashing, load balancing, data management, orientability of random hypergraphs, and maximum matchings in a special class of bipartite graphs.
On randomness in Hash functions
In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.) For the analysis of algorithms involving hash functions, it is often convenient to assume the hash functions used behave fully randomly; in some cases there is no analysis known that avoids this assumption. In practice, one needs to get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concerns realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) or Bloom filters and their variants. A particularly successful and useful means of classification is Carter and Wegman's universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and to substitute one of the known constructions of such classes. This invites research into the question of just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses that gave impossibility results constructed rather artificial classes that would not work; other results pointed out natural, widely used hash classes that would not work in a particular application. Only recently was it shown that, under certain assumptions about the entropy present in the set of keys, even 2-wise independent hash classes lead to strong randomness properties in the hash values. The negative results show that such positive results may not be taken as justification for using weak hash classes indiscriminately, in particular for key sets with structure. When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions.
Only in 2003 was it shown how full randomness can be simulated with only linear space overhead (which is optimal). The "split-and-share" approach can be used to justify the full randomness assumption in some situations in which full randomness is needed for the analysis to go through, like in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions). For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k greater than 3 has become practical only through new constructions involving tabulation. This pairs well with the relatively recent result that linear probing works with 5-independent hash functions. Recent developments suggest that the classification of hash function constructions by their degree of independence alone may not be adequate in some cases. Thus, one may want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results were recently achieved concerning hash functions that utilize tabulation. In particular, if the analysis of the application involves using randomness properties in graphs and hypergraphs (generalized cuckoo hashing, also in the version with a "stash", or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful.
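For concreteness, the classic Carter-Wegman style construction referred to above can be sketched as follows: a uniformly random polynomial of degree k-1 over a prime field GF(p) evaluates to k-wise independent values (the final fold into a table of size m introduces a small non-uniformity that is tolerated in practice):

```python
import random

def make_kwise_hash(k, p=(1 << 61) - 1, m=1024, seed=0):
    """A degree-(k-1) polynomial with uniformly random coefficients over
    GF(p) gives a k-wise independent family on keys x with 0 <= x < p."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(p) for _ in range(k)]

    def h(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial mod p
            acc = (acc * x + c) % p
        return acc % m              # fold into the table size
    return h
```

The prime p here is the Mersenne prime 2^61 - 1, a common choice because reduction mod p is cheap; both it and the parameter names are illustrative.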
Thresholds for Extreme Orientability
Multiple-choice load balancing has been a topic of intense study since the
seminal paper of Azar, Broder, Karlin, and Upfal. Questions in this area can be
phrased in terms of orientations of a graph, or more generally a k-uniform
random hypergraph. A (d,b)-orientation is an assignment of each edge to d of
its vertices, such that no vertex has more than b edges assigned to it.
Conditions for the existence of such orientations have been completely
documented except for the "extreme" case of (k-1,1)-orientations. We consider
this remaining case, and establish:
- The density threshold below which an orientation exists with high
probability, and above which it does not exist with high probability.
- An algorithm for finding an orientation that runs in linear time with high
probability, with explicit polynomial bounds on the failure probability.
Previously, the only known algorithms for constructing (k-1,1)-orientations
worked for k <= 3, and were only shown to have expected linear running time.
Comment: Corrected description of relationship to the work of Lelarge
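On tiny instances, the definition of a (d,b)-orientation can be verified by brute force, which makes the object of study concrete (exponential time; for illustration only, not the paper's linear-time algorithm):

```python
from itertools import combinations, product

def has_db_orientation(edges, n, d, b):
    """Exhaustively test whether each k-edge can be assigned to d of its
    vertices so that no vertex receives more than b assigned edges."""
    options = [list(combinations(e, d)) for e in edges]
    for pick in product(*options):      # one d-subset per edge
        load = [0] * n
        for chosen in pick:
            for v in chosen:
                load[v] += 1
        if max(load) <= b:
            return True
    return False
```

For example, one 3-edge admits a (2,1)-orientation, but two copies of the same 3-edge do not: four assignments cannot fit into three vertices of capacity one.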
Load thresholds for cuckoo hashing with overlapping blocks
Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo
hashing where each of ⌊cn⌋ objects is assigned two intervals of size l
in a linear (or cyclic) hash table of size n, and both start points are chosen
independently and uniformly at random. Each object must be placed into a table
cell within its intervals, but each cell can only hold one object. Experiments
suggested that this scheme outperforms the variant with blocks, in which
intervals are aligned at multiples of l. In particular, the load threshold
is higher, i.e., the load that can be achieved with high probability. For
instance, Lehman and Panigrahy [LP09] empirically observed the threshold for
l = 2 to be around 0.965, as compared to roughly 0.897 using blocks.
They managed to pin down the asymptotics of the thresholds for large l,
but the precise values resisted rigorous analysis.
We establish a method to determine these load thresholds for all l >= 2,
and, in fact, for general k. For instance, for l = 2 we
get c* ≈ 0.965. The key tool we employ is an insightful and general
theorem due to Leconte, Lelarge, and Massouli\'e [LLM13], which adapts methods
from statistical physics to the world of hypergraph orientability. In effect,
the orientability thresholds for our graph families are determined by belief
propagation equations for certain graph limits. As a side note we provide
experimental evidence suggesting that placements can be constructed in linear
time with loads close to the threshold using an adapted version of an algorithm
by Khosla [Kho13].
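The two addressing schemes compared above differ only in how an object's candidate cells are derived from a hash value. A minimal sketch, assuming a cyclic table of size n (function names are illustrative):

```python
def window_cells(start, l, n):
    # overlapping scheme: the interval may start anywhere and wraps cyclically
    return [(start + j) % n for j in range(l)]

def block_cells(start, l, n):
    # aligned scheme: the interval is snapped back to a multiple of l
    base = (start // l) * l
    return [(base + j) % n for j in range(l)]
```

Each object draws two independent start points and may be placed in any of the resulting 2l cells; the overlapping windows give smoother choice sets, which the experiments above suggest is what raises the threshold.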
Load thresholds for cuckoo hashing with double hashing
In k-ary cuckoo hashing, each of cn objects is associated with k random buckets in a hash table of size n. An l-orientation is an assignment of objects to associated buckets such that each bucket receives at most l objects. Several works have determined load thresholds c^* = c^*(k,l) for k-ary cuckoo hashing; that is, for c < c^* an l-orientation exists with high probability, and for c > c^* no l-orientation exists with high probability.
A natural variant of k-ary cuckoo hashing utilizes double hashing, where, when the buckets are numbered 0,1,...,n-1, the k choices of random buckets form an arithmetic progression modulo n. Double hashing simplifies implementation and requires less randomness, and it has been shown that double hashing has the same behavior as fully random hashing in several other data structures that similarly use multiple hashes for each object. Interestingly, previous work has come close to but has not fully shown that the load threshold for k-ary cuckoo hashing is the same when using double hashing as when using fully random hashing. Specifically, previous work has shown that the thresholds for both settings coincide, except that for double hashing it was possible that o(n) objects would have been left unplaced. Here we close this open question by showing the thresholds are indeed the same, providing a combinatorial argument that reconciles this stubborn difference.
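Concretely, double hashing derives the k bucket choices from just two hash values, h1 (the start) and h2 (the stride). A minimal sketch:

```python
def double_hash_buckets(h1, h2, k, n):
    """The k bucket choices form an arithmetic progression modulo n.
    If n is prime, 1 <= h2 < n, and k <= n, they are pairwise distinct."""
    return [(h1 + i * h2) % n for i in range(k)]
```

For example, double_hash_buckets(3, 5, 4, 11) yields the buckets 3, 8, 2, 7; only two random values per object are consumed, versus k under fully random hashing.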
The densest subgraph problem in sparse random graphs
We determine the asymptotic behavior of the maximum subgraph density of large
random graphs with a prescribed degree sequence. The result applies in
particular to the Erd\H{o}s-R\'{e}nyi model, where it settles a conjecture of
Hajek [IEEE Trans. Inform. Theory 36 (1990) 1398-1414]. Our proof consists in
extending the notion of balanced loads from finite graphs to their local weak
limits, using unimodularity. This is a new illustration of the objective method
described by Aldous and Steele [In Probability on Discrete Structures (2004)
1-72 Springer].
Comment: Published at http://dx.doi.org/10.1214/14-AAP1091 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
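The quantity in question, the maximum subgraph density rho(G) = max over nonempty S of |E(S)|/|S|, can be computed by brute force on tiny graphs, which makes the definition concrete (exponential time; for illustration only):

```python
from fractions import Fraction
from itertools import combinations

def max_subgraph_density(vertices, edges):
    # exhaustively maximize |E(S)| / |S| over all nonempty vertex subsets S
    best = Fraction(0)
    for r in range(1, len(vertices) + 1):
        for S in combinations(vertices, r):
            s = set(S)
            inner = sum(1 for u, v in edges if u in s and v in s)
            best = max(best, Fraction(inner, len(s)))
    return best
```

For a K4 with one pendant vertex attached, the densest subgraph is the K4 itself, with density 6/4 = 3/2.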
On the Insertion Time of Cuckoo Hashing
Cuckoo hashing is an efficient technique for creating large hash tables with
high space utilization and guaranteed constant access times. There, each item
can be placed in a location given by any one out of k different hash functions.
In this paper we further investigate the random walk heuristic for
inserting new items into the hash table in an online fashion. Provided
that k > 2 and that the number of items in the table is below (but
arbitrarily close to) the theoretically achievable load threshold, we
show a polylogarithmic bound for the maximum insertion time that holds
with high probability.
Comment: 27 pages, final version accepted by the SIAM Journal on Computing
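The random walk heuristic analyzed here admits a short description: if one of the item's k choices is free, use it; otherwise evict the occupant of a uniformly random choice and continue with the evicted item. A minimal sketch (parameter names and the step cap are illustrative):

```python
import random

def random_walk_insert(ball, choices, occupant, rng, max_steps=1000):
    """Online insertion by random walk: place in a free choice if any,
    otherwise evict a random occupant and reinsert it, up to max_steps."""
    cur = ball
    for _ in range(max_steps):
        free = [b for b in choices[cur] if b not in occupant]
        if free:
            occupant[rng.choice(free)] = cur
            return True
        b = rng.choice(choices[cur])
        cur, occupant[b] = occupant[b], cur  # evict and continue with the victim
    return False
```

The step cap guards against the (rare, below the threshold) event of a very long eviction walk; the paper's result bounds the maximum insertion time polylogarithmically with high probability.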