12,290 research outputs found

    More Analysis of Double Hashing for Balanced Allocations

    Full text link
    With double hashing, for a key xx, one generates two hash values f(x)f(x) and g(x)g(x), and then uses combinations (f(x)+ig(x)) mod n(f(x) +i g(x)) \bmod n for i=0,1,2,...i=0,1,2,... to generate multiple hash values in the range [0,n−1][0,n-1] from the initial two. For balanced allocations, keys are hashed into a hash table where each bucket can hold multiple keys, and each key is placed in the least loaded of dd choices. It has been shown previously that asymptotically the performance of double hashing and fully random hashing is the same in the balanced allocation paradigm using fluid limit methods. Here we extend a coupling argument used by Lueker and Molodowitch to show that double hashing and ideal uniform hashing are asymptotically equivalent in the setting of open address hash tables to the balanced allocation setting, providing further insight into this phenomenon. We also discuss the potential for and bottlenecks limiting the use this approach for other multiple choice hashing schemes.Comment: 13 pages ; current draft ; will be submitted to conference shortl

    Fast and Powerful Hashing using Tabulation

    Get PDF
    Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of cc characters and we have precomputed character tables h1,...,hch_1,...,h_c mapping characters to random hash values. A key x=(x1,...,xc)x=(x_1,...,x_c) is hashed to h1[x1]⊕h2[x2].....⊕hc[xc]h_1[x_1] \oplus h_2[x_2].....\oplus h_c[x_c]. This schemes is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. While these tabulation schemes are all easy to implement and use, their analysis is not

    The Stability and the Security of the Tangle

    Get PDF
    In this paper we study the stability and the security of the distributed data structure at the base of the IOTA protocol, called the Tangle. The contribution of this paper is twofold. First, we present a simple model to analyze the Tangle and give the first discrete time formal analyzes of the average number of unconfirmed transactions and the average confirmation time of a transaction. Then, we define the notion of assiduous honest majority that captures the fact that the honest nodes have more hashing power than the adversarial nodes and that all this hashing power is constantly used to create transactions. This notion is important because we prove that it is a necessary assumption to protect the Tangle against double-spending attacks, and this is true for any tip selection algorithm (which is a fundamental building block of the protocol) that verifies some reasonable assumptions. In particular, the same is true with the Markov Chain Monte Carlo selection tip algorithm currently used in the IOTA protocol. Our work shows that either all the honest nodes must constantly use all their hashing power to validate the main chain (similarly to the Bitcoin protocol) or some kind of authority must be provided to avoid this kind of attack (like in the current version of the IOTA where a coordinator is used). The work presented here constitute a theoretical analysis and cannot be used to attack the current IOTA implementation. The goal of this paper is to present a formalization of the protocol and, as a starting point, to prove that some assumptions are necessary in order to defend the system again double-spending attacks. We hope that it will be used to improve the current protocol with a more formal approach

    Double Hashing Thresholds via Local Weak Convergence

    Get PDF
    International audienceA lot of interest has recently arisen in the analysis of multiple-choice "cuckoo hashing" schemes. In this context, a main performance criterion is the load threshold under which the hashing scheme is able to build a valid hashtable with high probability in the limit of large systems; various techniques have successfully been used to answer this question (differential equations, combinatorics, cavity method) for increasing levels of generality of the model. However, the hashing scheme analysed so far is quite utopic in that it requires to generate a lot of independent, fully random choices. Schemes with reduced randomness exists, such as "double hashing", which is expected to provide similar asymptotic results as the ideal scheme, yet they have been more resistant to analysis so far. In this paper, we point out that the approach via the cavity method extends quite naturally to the analysis of double hashing and allows to compute the corresponding threshold. The path followed is to show that the graph induced by the double hashing scheme has the same local weak limit as the one obtained with full randomness

    Balanced Allocations and Double Hashing

    Full text link
    Double hashing has recently found more common usage in schemes that use multiple hash functions. In double hashing, for an item xx, one generates two hash values f(x)f(x) and g(x)g(x), and then uses combinations (f(x)+kg(x)) mod n(f(x) +k g(x)) \bmod n for k=0,1,2,...k=0,1,2,... to generate multiple hash values from the initial two. We first perform an empirical study showing that, surprisingly, the performance difference between double hashing and fully random hashing appears negligible in the standard balanced allocation paradigm, where each item is placed in the least loaded of dd choices, as well as several related variants. We then provide theoretical results that explain the behavior of double hashing in this context.Comment: Further updated, small improvements/typos fixe

    Fast hashing with Strong Concentration Bounds

    Full text link
    Previous work on tabulation hashing by Patrascu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of c=O(1)c=O(1) characters, e.g., a 64-bit key as c=8c=8 characters of 8-bits. The character domain Σ\Sigma should be small enough that character tables of size ∣Σ∣|\Sigma| fit in fast cache. The schemes then use O(1)O(1) tables of this size, so the space of tabulation hashing is O(∣Σ∣)O(|\Sigma|). However, the concentration bounds by Patrascu and Thorup only apply if the expected sums are ≪∣Σ∣\ll |\Sigma|. To see the problem, consider the very simple case where we use tabulation hashing to throw nn balls into mm bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if n=mn=m, for then the expected value is 11. However, if m=2m=2, as when tossing nn unbiased coins, the expected value n/2n/2 is ≫∣Σ∣\gg |\Sigma| for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call \emph{tabulation-permutation} hashing which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.Comment: 54 pages, 3 figures. An extended abstract appeared at the 52nd Annual ACM Symposium on Theory of Computing (STOC20
    • …
    corecore