
    Tight Thresholds for Cuckoo Hashing via XORSAT

    We settle the question of tight thresholds for offline cuckoo hashing. The problem can be stated as follows: we have $n$ keys to be hashed into $m$ buckets, each capable of holding a single key. Each key has $k \geq 3$ (distinct) associated buckets chosen uniformly at random and independently of the choices of other keys. A hash table can be constructed successfully if each key can be placed into one of its buckets. We seek thresholds $\alpha_k$ such that, as $n$ goes to infinity, if $n/m \leq \alpha$ for some $\alpha < \alpha_k$ then a hash table can be constructed successfully with high probability, and if $n/m \geq \alpha$ for some $\alpha > \alpha_k$ then with high probability a hash table cannot be constructed successfully. Here we consider the offline version of the problem, where all keys and hash values are given, so the problem is equivalent to previous models of multiple-choice hashing. We find the thresholds for all values of $k > 2$ by showing that they are in fact the same as the previously known thresholds for the random $k$-XORSAT problem. We then extend these results to the setting where keys can have differing numbers of choices, and provide evidence, in the form of an algorithm, for a conjecture extending this result to cuckoo hash tables that store multiple keys in a bucket.
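    As a concrete illustration (our own sketch, not part of the paper), offline feasibility is exactly a bipartite matching problem: keys on one side, buckets on the other, with an edge from each key to each of its $k$ candidate buckets. A minimal Python sketch with hypothetical names (can_place, choices) that decides feasibility via simple augmenting paths:

        import random
        import sys

        sys.setrecursionlimit(100_000)  # the DFS below can recurse deeply

        def can_place(keys_to_buckets, m):
            """True iff every key can be assigned one of its candidate
            buckets, each bucket holding at most one key (offline cuckoo
            hashing). Kuhn-style augmenting paths; fine for small inputs."""
            owner = [-1] * m  # owner[b] = key currently holding bucket b

            def augment(key, seen):
                for b in keys_to_buckets[key]:
                    if b not in seen:
                        seen.add(b)
                        if owner[b] == -1 or augment(owner[b], seen):
                            owner[b] = key
                            return True
                return False

            return all(augment(k, set()) for k in range(len(keys_to_buckets)))

        # Toy experiment: k = 3 and load 0.90 < alpha_3 ~ 0.9179, so
        # construction should succeed with high probability.
        random.seed(0)
        m, k = 1_000, 3
        n = int(0.90 * m)
        choices = [random.sample(range(m), k) for _ in range(n)]
        print(can_place(choices, m))  # almost always True at this load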

    Towards Optimal Degree-distributions for Left-perfect Matchings in Random Bipartite Graphs

    Consider a random bipartite multigraph $G$ with $n$ left nodes and $m \geq n \geq 2$ right nodes. Each left node $x$ has $d_x \geq 1$ random right neighbors. The average left degree $\Delta$ is fixed, $\Delta \geq 2$. We ask whether, for the probability that $G$ has a left-perfect matching, it is advantageous not to fix $d_x$ for each left node $x$ but rather to choose it at random according to some (cleverly chosen) distribution. We show the following, provided that the degrees of the left nodes are independent: If $\Delta$ is an integer then it is optimal to use a fixed degree of $\Delta$ for all left nodes. If $\Delta$ is non-integral then an optimal degree distribution has the property that each left node $x$ has two possible degrees, $\lfloor\Delta\rfloor$ and $\lceil\Delta\rceil$, with probability $p_x$ and $1-p_x$, respectively, where $p_x$ is from the closed interval $[0,1]$ and the average over all $p_x$ equals $\lceil\Delta\rceil-\Delta$. Furthermore, if $n=c\cdot m$ and $\Delta>2$ is constant, then each distribution of the left degrees that meets the conditions above determines the same threshold $c^*(\Delta)$, which has the following property as $n$ goes to infinity: if $c<c^*(\Delta)$ then a left-perfect matching exists with high probability, and if $c>c^*(\Delta)$ then no left-perfect matching exists with high probability. The threshold $c^*(\Delta)$ is the same as the known threshold for offline $k$-ary cuckoo hashing for integral or non-integral $k=\Delta$.
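    To see where the condition on the average of the $p_x$ comes from (a one-line check of the stated property, not a new result): since $\lceil\Delta\rceil-\lfloor\Delta\rfloor=1$ for non-integral $\Delta$, a node of degree $\lfloor\Delta\rfloor$ with probability $p_x$ and $\lceil\Delta\rceil$ otherwise has

    $$\mathbb{E}[d_x] = p_x\lfloor\Delta\rfloor + (1-p_x)\lceil\Delta\rceil = \lceil\Delta\rceil - p_x,$$

    so fixing the average left degree at $\Delta$ forces

    $$\Delta = \frac{1}{n}\sum_x \mathbb{E}[d_x] = \lceil\Delta\rceil - \frac{1}{n}\sum_x p_x, \qquad\text{i.e.}\qquad \frac{1}{n}\sum_x p_x = \lceil\Delta\rceil - \Delta.$$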

    Fast Scalable Construction of (Minimal Perfect Hash) Functions

    Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstruction to any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although the systems can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques that speed up the solution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
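    The broadword idea is easy to convey (a generic sketch of the representation, not the paper's lazy elimination algorithm): store each GF(2) equation as one big bitmask, so that adding one equation to another is a single XOR. In Python, arbitrary-precision ints serve as the bit vectors; all names here are ours:

        def solve_gf2(rows, n):
            """Gaussian elimination over GF(2) with bit-parallel rows.
            Each row is an int: bits 0..n-1 are the coefficients of
            x_0..x_{n-1}, bit n is the right-hand side. Returns a
            solution as a bitmask, or None if unsatisfiable."""
            mask = (1 << n) - 1
            pivots = {}  # pivot column -> reduced row
            for row in rows:
                for col, prow in pivots.items():
                    if (row >> col) & 1:
                        row ^= prow  # one XOR adds a whole equation
                coeffs = row & mask
                if coeffs == 0:
                    if row >> n:
                        return None  # row reduced to 0 = 1
                    continue  # redundant equation
                pivots[coeffs.bit_length() - 1] = row
            x = 0  # back-substitute; free variables default to 0
            for col in sorted(pivots):  # each row involves only lower bits
                row = pivots[col]
                parity = bin(row & mask & x).count("1") & 1
                if ((row >> n) & 1) ^ parity:
                    x |= 1 << col
            return x

        # x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 has solution x = 0b110
        print(bin(solve_gf2([0b1011, 0b0110, 0b1101], 3)))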

    The satisfiability threshold for random linear equations

    Let $A$ be a random $m\times n$ matrix over the finite field $\mathbb{F}_q$ with precisely $k$ non-zero entries per row and let $y\in \mathbb{F}_q^m$ be a random vector chosen independently of $A$. We identify the threshold $m/n$ up to which the linear system $Ax=y$ has a solution with high probability and analyse the geometry of the set of solutions. In the special case $q=2$, known as the random $k$-XORSAT problem, the threshold was determined by [Dubois and Mandler 2002, Dietzfelbinger et al. 2010, Pittel and Sorkin 2016], and the proof technique was subsequently extended to the cases $q=3,4$ [Falke and Goerdt 2012]. But the argument depends on technically demanding second moment calculations that do not generalise to $q>3$. Here we approach the problem from the viewpoint of a decoding task, which leads to a transparent combinatorial proof.
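    A quick way to get a feel for the threshold phenomenon (our own toy experiment, with hypothetical names, restricted to prime $q$ so modular inverses exist): draw sparse systems at densities on either side of the transition and test solvability by elimination mod $q$.

        import random

        def solvable_mod_q(rows, rhs, n, q):
            """Does A x = y have a solution over F_q (q prime)?
            rows[i] maps column -> nonzero coefficient of equation i."""
            M = [[0] * n + [b % q] for b in rhs]
            for i, r in enumerate(rows):
                for c, v in r.items():
                    M[i][c] = v % q
            piv = 0
            for col in range(n):
                p = next((r for r in range(piv, len(M)) if M[r][col]), None)
                if p is None:
                    continue
                M[piv], M[p] = M[p], M[piv]
                inv = pow(M[piv][col], -1, q)  # Python 3.8+ modular inverse
                M[piv] = [v * inv % q for v in M[piv]]
                for r in range(len(M)):
                    if r != piv and M[r][col]:
                        f = M[r][col]
                        M[r] = [(a - f * b) % q for a, b in zip(M[r], M[piv])]
                piv += 1
            # inconsistent iff some row reduces to 0 = nonzero
            return not any(not any(row[:n]) and row[n] for row in M)

        random.seed(1)
        q, k, n = 5, 3, 100
        for ratio in (0.7, 1.1):  # well below vs. above the transition
            m, ok = int(ratio * n), 0
            for _ in range(10):
                rows = [{c: random.randrange(1, q)
                         for c in random.sample(range(n), k)}
                        for _ in range(m)]
                rhs = [random.randrange(q) for _ in range(m)]
                ok += solvable_mod_q(rows, rhs, n, q)
            print(f"m/n = {ratio}: {ok}/10 solvable")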

    The Satisfiability Threshold for k-XORSAT

    We consider "unconstrained" random kk-XORSAT, which is a uniformly random system of mm linear non-homogeneous equations in F2\mathbb{F}_2 over nn variables, each equation containing k≥3k \geq 3 variables, and also consider a "constrained" model where every variable appears in at least two equations. Dubois and Mandler proved that m/n=1m/n=1 is a sharp threshold for satisfiability of constrained 3-XORSAT, and analyzed the 2-core of a random 3-uniform hypergraph to extend this result to find the threshold for unconstrained 3-XORSAT. We show that m/n=1m/n=1 remains a sharp threshold for satisfiability of constrained kk-XORSAT for every k≥3k\ge 3, and we use standard results on the 2-core of a random kk-uniform hypergraph to extend this result to find the threshold for unconstrained kk-XORSAT. For constrained kk-XORSAT we narrow the phase transition window, showing that m−n→−∞m-n \to -\infty implies almost-sure satisfiability, while m−n→+∞m-n \to +\infty implies almost-sure unsatisfiability.Comment: Version 2 adds sharper phase transition result, new citation in literature survey, and improvements in presentation; removes Appendix treating k=

    Dense peelable random uniform hypergraphs

    We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e., to admit no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled "from the outside in". The density thresholds f_k for peelability of our hypergraphs (f_3 ~ 0.918, f_4 ~ 0.977, f_5 ~ 0.992, ...) are well beyond the corresponding thresholds (c_3 ~ 0.818, c_4 ~ 0.772, c_5 ~ 0.702, ...) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on $[0,1]^{\mathbb{Z}}$, and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods.

    Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph, or peelability allows for simple linear time algorithms. Memory efficiency is closely tied to edge density, while worst and average case query times are tied to maximum and average edge size.

    To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. Using k > 3 attains, at small sacrifices in running time, further improvements to memory usage.
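    A compact way to see the effect (our own experiment; segment sizes and loads are illustrative): generate edges over k consecutive segments as described, then run the standard peeling process on both graph families at a density between c_3 and f_3.

        import random

        def peels_to_empty(edges):
            """True iff repeatedly deleting vertices of degree <= 1 (with
            their incident edge) removes every edge (empty 2-core)."""
            inc = {}
            for i, e in enumerate(edges):
                for v in e:
                    inc.setdefault(v, set()).add(i)
            alive = set(range(len(edges)))
            stack = [v for v, s in inc.items() if len(s) <= 1]
            while stack:
                v = stack.pop()
                for i in list(inc[v]):
                    if i in alive:
                        alive.remove(i)
                        for u in edges[i]:
                            inc[u].discard(i)
                            if len(inc[u]) == 1:
                                stack.append(u)
            return not alive

        random.seed(3)
        k, seg_size, num_segs = 3, 2_000, 50
        n = seg_size * num_segs
        m = int(0.88 * n)  # between c_3 ~ 0.818 and f_3 ~ 0.918

        def segmented_edge():
            j = random.randrange(num_segs - k + 1)  # k consecutive segments
            return tuple((j + t) * seg_size + random.randrange(seg_size)
                         for t in range(k))

        standard = [tuple(random.sample(range(n), k)) for _ in range(m)]
        segmented = [segmented_edge() for _ in range(m)]
        print(peels_to_empty(standard))   # False w.h.p.: large 2-core
        print(peels_to_empty(segmented))  # True w.h.p.: peels outside-in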

    On randomness in hash functions

    In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.)

    For the analysis of algorithms involving hash functions, it is often convenient to assume that the hash functions used behave fully randomly; in some cases there is no known analysis that avoids this assumption. In practice, one needs to get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concerns realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) or Bloom filters and their variants.

    A particularly successful and useful means of classification is Carter and Wegman's notion of universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and to substitute one of the known constructions of such classes. This invites research into the question of just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses that gave impossibility results constructed rather artificial classes that do not work; other results pointed out natural, widely used hash classes that do not work in a particular application. Only recently was it shown that, under certain assumptions on the entropy present in the set of keys, even 2-wise independent hash classes lead to strong randomness properties in the hash values. The negative results show that such results may not be taken as justification for using weak hash classes indiscriminately, in particular for key sets with structure.

    When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions. Only in 2003 was it discovered how full randomness can be simulated using only linear space overhead (which is optimal). The "split-and-share" approach can be used to justify the full randomness assumption in some situations in which full randomness is needed for the analysis to go through, as in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions).

    For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k bigger than 3 has become feasible in practice only through new constructions involving tabulation. This fits well with the relatively recent result that linear probing works with 5-independent hash functions. Recent developments suggest that classifying hash function constructions by their degree of independence alone may not be adequate in some cases. Thus, one may want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results were recently achieved concerning hash functions that utilize tabulation. In particular, if the analysis of the application involves randomness properties of graphs and hypergraphs (generalized cuckoo hashing, also in the version with a "stash", or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful.
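    To make two of the constructions mentioned above concrete (textbook versions, not the specific classes analyzed in the talk): a 2-wise independent class à la Carter and Wegman, and simple tabulation hashing, which XORs one random table entry per key byte.

        import random

        P = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit key

        def make_2wise(m):
            """Carter-Wegman style class h(x) = ((a*x + b) mod p) mod m;
            pairwise independent over [0, p), nearly so after the mod m."""
            a, b = random.randrange(1, P), random.randrange(P)
            return lambda x: ((a * x + b) % P) % m

        def make_tabulation(bits, key_bytes=4):
            """Simple tabulation: split the key into bytes, XOR one random
            table entry per byte; only 3-wise independent, yet known to
            behave like far stronger classes in many applications."""
            tables = [[random.getrandbits(bits) for _ in range(256)]
                      for _ in range(key_bytes)]
            def h(x):
                out = 0
                for t in tables:
                    out ^= t[x & 0xFF]
                    x >>= 8
                return out
            return h

        random.seed(4)
        h1, h2 = make_2wise(1 << 10), make_tabulation(10)  # 1024 buckets
        print(h1(123456), h2(123456))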