
    A More Reliable Greedy Heuristic for Maximum Matchings in Sparse Random Graphs

    We propose a new greedy algorithm for the maximum cardinality matching problem. We give experimental evidence that this algorithm is likely to find a maximum matching in random graphs with constant expected degree c > 0, independent of the value of c. This contrasts with the behavior of commonly used greedy matching heuristics, which are known to have some range of c on which they probably fail to compute a maximum matching.
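    The abstract does not spell out the new algorithm, so as an illustration of the kind of greedy heuristic under discussion, here is a minimal Python sketch of the classic Karp-Sipser strategy: match a degree-1 vertex whenever one exists (a provably safe move), and otherwise fall back to a uniformly random edge. Names and the fallback rule are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

def karp_sipser_matching(edges):
    """Greedy matching: prefer edges at degree-1 vertices; otherwise
    pick a random edge. Returns a maximal, not necessarily maximum,
    matching as a list of vertex pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    matching = []
    while any(adj.values()):
        deg1 = [v for v, nbrs in adj.items() if len(nbrs) == 1]
        if deg1:
            u = deg1[0]                        # safe move: its only edge
            v = next(iter(adj[u]))
        else:
            u = random.choice([v for v, nbrs in adj.items() if nbrs])
            v = random.choice(list(adj[u]))    # risky move: random edge
        matching.append((u, v))
        for w in (u, v):                       # delete both matched endpoints
            for x in adj[w]:
                adj[x].discard(w)
            adj[w] = set()
    return matching
```

    The random fallback step is where such heuristics can go wrong on sparse random graphs; the range of expected degrees c on which they probably fail is exactly what the proposed algorithm is designed to avoid.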

    Load thresholds for cuckoo hashing with overlapping blocks

    Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo hashing where each of cn objects is assigned k = 2 intervals of size ℓ in a linear (or cyclic) hash table of size n, with both start points chosen independently and uniformly at random. Each object must be placed into a table cell within its intervals, but each cell can hold only one object. Experiments suggested that this scheme outperforms the variant with blocks, in which intervals are aligned at multiples of ℓ. In particular, the load threshold, i.e. the load c that can be achieved with high probability, is higher. For instance, Lehman and Panigrahy [LP09] empirically observed the threshold for ℓ = 2 to be around 96.5%, compared to roughly 89.7% using blocks. They managed to pin down the asymptotics of the thresholds for large ℓ, but the precise values resisted rigorous analysis. We establish a method to determine these load thresholds for all ℓ ≥ 2 and, in fact, for general k ≥ 2. For instance, for k = ℓ = 2 we get ≈ 96.4995%. The key tool we employ is an insightful and general theorem due to Leconte, Lelarge, and Massoulié [LLM13], which adapts methods from statistical physics to the world of hypergraph orientability. In effect, the orientability thresholds for our graph families are determined by belief propagation equations for certain graph limits. As a side note, we provide experimental evidence suggesting that placements can be constructed in linear time, with loads close to the threshold, using an adapted version of an algorithm by Khosla [Kho13].
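    To make the comparison concrete, here is a hedged Python sketch (helper names are illustrative) of the two candidate-cell rules: overlapping windows, where an interval may start at any cell, versus aligned blocks, where an object may only use the block of ℓ cells containing its start point. With k = 2 start points, an object's candidate set is the union over both start points.

```python
def window_cells(start, ell, n):
    # Overlapping windows: cells start, start+1, ..., start+ell-1,
    # cyclically; start points are arbitrary cells of the table.
    return [(start + j) % n for j in range(ell)]

def block_cells(start, ell, n):
    # Aligned blocks: the block of ell cells containing the start point,
    # i.e. intervals may begin only at multiples of ell.
    base = start - (start % ell)
    return [(base + j) % n for j in range(ell)]

def candidate_cells(starts, ell, n, aligned=False):
    # Union of cells over an object's k start points (k = 2 in [DW07]).
    rule = block_cells if aligned else window_cells
    return sorted({c for s in starts for c in rule(s, ell, n)})
```

    The extra freedom of unaligned start points is what lifts the achievable load for ℓ = 2 from roughly 89.7% to about 96.5%.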

    On randomness in Hash functions

    In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.) For the analysis of algorithms involving hash functions, it is often convenient to assume that the hash functions used behave fully randomly; in some cases no analysis is known that avoids this assumption. In practice, one needs to get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concerns realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) as well as Bloom filters and their variants.

    A particularly successful and useful means of classification is Carter and Wegman's notion of universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and then to substitute one of the known constructions of such classes. This invites research into just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses giving impossibility results constructed rather artificial classes that would not work; other results pointed out natural, widely used hash classes that would not work in a particular application. Only recently was it shown that, under certain assumptions on the entropy present in the set of keys, even 2-wise independent hash classes lead to strong randomness properties in the hash values. The negative results show that this may not be taken as justification for using weak hash classes indiscriminately, in particular for key sets with structure.

    When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions. Only in 2003 was it discovered how full randomness can be simulated using only linear space overhead (which is optimal). The "split-and-share" approach can be used to justify the full randomness assumption in some situations where it is needed for the analysis to go through, as in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions).

    For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k larger than 3 has become feasible in practice only through new constructions involving tabulation. This pairs well with the quite recent result that linear probing works with 5-independent hash functions.

    Recent developments suggest that classifying hash function constructions by their degree of independence alone may not be adequate in some cases. Thus, one may want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results were recently achieved concerning hash functions that utilize tabulation. In particular, if the analysis of the application involves randomness properties of graphs and hypergraphs (generalized cuckoo hashing, also in the version with a "stash", or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful.
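    As a concrete instance of the tabulation-based constructions mentioned above, here is a minimal Python sketch of simple tabulation hashing; all parameter choices are illustrative. The key is split into short characters, and one random table word per character position is XORed together.

```python
import random

class TabulationHash:
    """Simple tabulation: split an integer key into `chars` characters
    of `bits_per_char` bits each and XOR one random table word per
    character position."""

    def __init__(self, chars=4, bits_per_char=8, out_bits=32, seed=None):
        rng = random.Random(seed)
        self.chars = chars
        self.bits = bits_per_char
        self.mask = (1 << bits_per_char) - 1
        # One lookup table of 2^bits_per_char random words per position.
        self.tables = [
            [rng.getrandbits(out_bits) for _ in range(1 << bits_per_char)]
            for _ in range(chars)
        ]

    def __call__(self, key):
        h = 0
        for i in range(self.chars):
            h ^= self.tables[i][(key >> (i * self.bits)) & self.mask]
        return h
```

    Simple tabulation is only 3-wise independent, yet it is known to perform far better in applications such as linear probing and cuckoo hashing than its degree of independence alone would predict, which is precisely the phenomenon motivating analyses that circumvent k-wise independence.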

    Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold


    Orientability thresholds for random hypergraphs

    Let h > w > 0 be two fixed integers. Let H be a random hypergraph whose hyperedges all have cardinality h. To w-orient a hyperedge, we assign exactly w of its vertices positive signs with respect to that hyperedge, and the rest negative signs. A (w,k)-orientation of H consists of a w-orientation of all hyperedges of H such that each vertex receives at most k positive signs from its incident hyperedges. When k is large enough, we determine the threshold for the existence of a (w,k)-orientation of a random hypergraph. The (w,k)-orientation of hypergraphs is strongly related to a general version of the off-line load balancing problem. The graph case, where h = 2 and w = 1, was solved recently by Cain, Sanders and Wormald and independently by Fernholz and Ramachandran, settling a conjecture of Karp and Saks. Comment: 47 pages, 1 figure; the journal version of [16].
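    The definition is straightforward to operationalize; the following Python sketch (names are illustrative) checks whether a proposed assignment is a valid (w,k)-orientation.

```python
from collections import Counter

def is_wk_orientation(hyperedges, orientation, w, k):
    """hyperedges[i] is an iterable of vertices; orientation[i] is the
    set of vertices of hyperedge i receiving a positive sign."""
    load = Counter()
    for edge, positive in zip(hyperedges, orientation):
        positive = set(positive)
        # Each hyperedge must assign exactly w positive signs, and only
        # to its own vertices.
        if len(positive) != w or not positive <= set(edge):
            return False
        load.update(positive)
    # No vertex may collect more than k positive signs overall.
    return all(count <= k for count in load.values())
```

    For h = 2 and w = 1 this specializes to orienting the edges of a graph so that every vertex receives at most k edges, the case settled by Cain, Sanders and Wormald and by Fernholz and Ramachandran.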

    Succinct Data Structures for Retrieval and Approximate Membership

    The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U -> {0,1}^r that has specified values on the elements of a given set S, a subset of U, with |S| = n, but may have any value on elements outside S. Minimal perfect hashing makes it possible to avoid storing the set S, but this induces a space overhead of Theta(n) bits in addition to the nr bits needed for the function values. In this paper we show how to eliminate this overhead. Moreover, we show that for any k, query time O(k) can be achieved using space that is within a factor 1 + e^{-k} of optimal, asymptotically for large n. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits with high probability. The expected time to construct the data structure is O(n). A main technical ingredient is to utilize existing tight bounds on the probability that almost-square random matrices with rows of low weight have full row rank. In addition to direct constructions, we point out a close connection between retrieval structures and hash tables where keys are stored in an array and some kind of probing scheme is used. Further, we propose a general reduction that transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Again, we show how to eliminate the space overhead present in previously known methods and get arbitrarily close to the lower bound. The evaluation procedures of our data structures are extremely simple (similar to a Bloom filter). For the results stated above we assume free access to fully random hash functions. However, we show how to justify this assumption using extra space o(n) to simulate full randomness on a RAM.
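    The linear-algebraic core of such retrieval structures can be sketched compactly for r = 1: assign each key w table positions and solve, over GF(2), for a bit table z such that the XOR of z at a key's positions equals its function value. The Python sketch below is a stand-in, not the paper's construction: the position generator assumes integer keys and plays the role of a fully random hash function, and plain Gaussian elimination is used, succeeding exactly when the random row set has full rank, which is where the row-rank bounds mentioned above enter.

```python
import random

def _positions(key, m, w, seed):
    # Hash stand-in: w distinct positions per integer key, as a bitmask.
    r = random.Random((seed << 32) ^ key)
    mask = 0
    while bin(mask).count("1") < w:
        mask |= 1 << r.randrange(m)
    return mask

def build_retrieval(pairs, m, w=3, seed=0):
    # Gaussian elimination over GF(2) on bitmask rows. Returns the bit
    # table z, or None if the random system is singular.
    pivots = {}  # leading bit index -> reduced row (mask, rhs)
    for key, value in pairs:
        mask, b = _positions(key, m, w, seed), value
        while mask:
            j = mask.bit_length() - 1
            if j not in pivots:
                pivots[j] = (mask, b)
                break
            pmask, pb = pivots[j]
            mask, b = mask ^ pmask, b ^ pb
        else:
            if b:  # row reduced to the contradiction 0 == 1
                return None
    z = [0] * m
    for j in sorted(pivots):  # lower pivots first; free variables stay 0
        mask, b = pivots[j]
        z[j] = (b + sum(z[i] for i in range(j) if (mask >> i) & 1)) % 2
    return z

def query(z, key, w=3, seed=0):
    # XOR of w table bits -- evaluable for any key in U, member or not.
    mask = _positions(key, len(z), w, seed)
    return sum(z[i] for i in range(len(z)) if (mask >> i) & 1) % 2
```

    Note that query never consults S: every key simply XORs w table bits, which is what makes both the Bloom-filter-like simplicity and the reduction to approximate membership natural.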

    On the Insertion Time of Cuckoo Hashing

    Cuckoo hashing is an efficient technique for creating large hash tables with high space utilization and guaranteed constant access times. There, each item can be placed in a location given by any one of k different hash functions. In this paper we further investigate the random walk heuristic for inserting new items into the hash table in an online fashion. Provided that k > 2 and that the number of items in the table is below (but arbitrarily close to) the theoretically achievable load threshold, we show a polylogarithmic bound for the maximum insertion time that holds with high probability. Comment: 27 pages; final version accepted by the SIAM Journal on Computing.
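    For concreteness, a minimal Python sketch of the random walk insertion heuristic (table layout and names are illustrative; practical implementations add stashes or rehashing on failure):

```python
import random

def random_walk_insert(table, key, hash_fns, max_steps=10_000):
    """k-ary cuckoo hashing with one item per cell: place the item in a
    free candidate cell if possible; otherwise evict a uniformly random
    occupant and continue the walk with the evicted item. Returns the
    number of steps on success, or None if the budget is exhausted."""
    for step in range(1, max_steps + 1):
        cells = [h(key) for h in hash_fns]
        free = [c for c in cells if table[c] is None]
        if free:
            table[random.choice(free)] = key
            return step
        victim = random.choice(cells)
        table[victim], key = key, table[victim]
    return None
```

    The paper's guarantee is that for k > 2, at loads below but arbitrarily close to the threshold, the longest such walk is polylogarithmic with high probability, so a polylogarithmic step budget suffices.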