    Faster 64-bit universal hashing using carry-less multiplications

    Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60% faster. We also compare CLHASH with a popular hash function designed for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and just as fast otherwise.
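    CLMUL multiplies two 64-bit operands as polynomials over GF(2): partial products are combined with XOR instead of addition, so no carries propagate and the result is a 128-bit polynomial product. The Python sketch below is a slow, bit-by-bit reference for what one such instruction computes; the function name `clmul64` is hypothetical, and this is not CLHASH itself, whose full construction the abstract does not give.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit words (polynomial multiplication over GF(2)).

    Returns the up-to-127-bit result that a single CLMUL instruction would
    produce for these operands. Software reference only: very slow.
    """
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must be 64-bit unsigned integers")
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            # XOR in the shifted copy of a: addition in GF(2) has no carries
            result ^= a << i
    return result
```

    For example, `clmul64(3, 3)` returns 5, because $(x+1)^2 = x^2 + 1$ over GF(2), whereas ordinary integer multiplication gives 9.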

    Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

    We present the first thorough practical study of computing the Lempel-Ziv-78 and Lempel-Ziv-Welch factorizations with trie data structures. With a careful selection of trie representations, we can beat well-tuned popular trie data structures like Judy, m-Bonsai or Cedar.
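    The abstract does not say which trie representations were evaluated, so the following is only a baseline sketch of the role a trie plays in LZ78: each factor is a pair (id of the longest previously seen phrase that prefixes the remaining input, next character), and every factor adds one trie node. The plain dict-per-node trie and the names `lz78_factorize`/`lz78_decode` are assumptions for illustration, not one of the tuned representations from the study. LZW differs in that its dictionary starts with all single characters and it emits only phrase ids.

```python
def lz78_factorize(text: str) -> list[tuple[int, str]]:
    """LZ78 factorization using a naive trie: trie[i] maps a character to a child id."""
    trie: list[dict[str, int]] = [{}]  # node 0 is the root (the empty phrase)
    factors: list[tuple[int, str]] = []
    node = 0
    for ch in text:
        child = trie[node].get(ch)
        if child is not None:
            node = child  # current phrase is still a known phrase; keep extending
        else:
            trie[node][ch] = len(trie)  # new phrase = phrase(node) + ch
            trie.append({})
            factors.append((node, ch))
            node = 0  # restart matching from the root
    if node != 0:
        factors.append((node, ""))  # input ended inside an already known phrase
    return factors


def lz78_decode(factors: list[tuple[int, str]]) -> str:
    """Rebuild the text; phrase j is phrase[ref] + ch, numbered like the trie nodes."""
    phrases = [""]
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])
```

    For instance, `lz78_factorize("abababa")` yields `[(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]`, i.e. the phrases a, b, ab, aba.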

    Regular and almost universal hashing: an efficient implementation

    Random hashing can provide guarantees regarding the performance of data structures such as hash tables, even in an adversarial setting. Many existing families of hash functions are universal: given two data objects, the probability that they have the same hash value is low when the hash function is picked at random from the family. However, universality fails to ensure that all hash functions are well behaved. We further require regularity: when data objects are picked at random, they should have a low probability of having the same hash value, for any fixed hash function. We present the efficient implementation of a family of non-cryptographic hash functions (PM+) offering good running times, good memory usage, and distinguishing theoretical guarantees: almost universality and component-wise regularity. On a variety of platforms, our implementations are comparable to the state of the art in performance. On recent Intel processors, PM+ achieves a speed of 4.7 bytes per cycle for 32-bit outputs and 3.3 bytes per cycle for 64-bit outputs. We review vectorization through SIMD instructions (e.g., AVX2) and optimizations for superscalar execution. Comment: accepted for publication in Software: Practice and Experience in September 201
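    PM+ itself is not defined in this abstract, so the sketch below swaps in a different, well-known almost-universal family, Dietzfelbinger's multiply-shift hashing $h_a(x) = (a x \bmod 2^{64}) \gg (64 - k)$ with a random odd $a$, purely to make the definition of universality concrete. For any fixed pair $x \neq y$, the probability over the random choice of $a$ that $h_a(x) = h_a(y)$ is at most $2/2^k$; the helper names are hypothetical.

```python
import random

W = 64  # input word size in bits


def random_multiply_shift(out_bits: int, rng: random.Random):
    """Draw h(x) = (a*x mod 2^W) >> (W - out_bits) with a random odd multiplier a."""
    a = rng.getrandbits(W) | 1
    mask = (1 << W) - 1
    return lambda x: ((a * x) & mask) >> (W - out_bits)


def collision_rate(x: int, y: int, out_bits: int, trials: int, seed: int = 1) -> float:
    """Fraction of randomly drawn functions h with h(x) == h(y), for a fixed pair x != y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = random_multiply_shift(out_bits, rng)
        if h(x) == h(y):
            hits += 1
    return hits / trials
```

    With 8-bit outputs, `collision_rate(1, 2, 8, 20000)` should land near or below the bound $2/256$. Regularity is a separate property: it asks that, for each fixed function, random inputs spread evenly over the outputs, which universality alone does not promise.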

    Adaptive cuckoo filters

    We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing, queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false-positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false-positive rate by presenting both a theoretical model for the false-positive rate and simulations using both synthetic data sets and real packet traces.
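    A minimal sketch of the adaptive idea, under stated assumptions: the full keys are kept alongside the filter (standing in for a slower memory that confirms matches), and each slot stores a small selector choosing which of several fingerprint functions produced its fingerprint. On a false positive, the colliding slot switches to the next function and recomputes its fingerprint from the stored key, so the same query is unlikely to collide again. The selector scheme, the class and method names, and all parameters are hypothetical; the paper's exact mechanism may differ. For simplicity, both candidate buckets are derived from the key rather than by partial-key cuckoo hashing.

```python
import hashlib
import random


def h64(tag: str, key: str) -> int:
    """64-bit hash of key; different tags act as different hash functions."""
    digest = hashlib.blake2b(f"{tag}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class AdaptiveCuckooFilter:
    """Simplified sketch: each slot holds (fingerprint, selector) plus the full key."""

    def __init__(self, num_buckets=64, slots=4, fp_bits=8, num_sel=4, max_kicks=500):
        self.m, self.b, self.f = num_buckets, slots, fp_bits
        self.num_sel, self.max_kicks = num_sel, max_kicks
        self.table = [[None] * slots for _ in range(num_buckets)]  # the filter itself
        self.keys = [[None] * slots for _ in range(num_buckets)]   # full keys ("slow memory")
        self.rng = random.Random(0)

    def _fp(self, key, sel):
        return h64(f"fp{sel}", key) % (1 << self.f)

    def _buckets(self, key):
        return h64("b0", key) % self.m, h64("b1", key) % self.m

    def _store(self, i, s, key, sel):
        self.keys[i][s] = key
        self.table[i][s] = (self._fp(key, sel), sel)

    def _place(self, i, key, sel):
        for s in range(self.b):
            if self.table[i][s] is None:
                self._store(i, s, key, sel)
                return True
        return False

    def insert(self, key):
        sel = 0
        for i in self._buckets(key):
            if self._place(i, key, sel):
                return True
        i = self.rng.choice(self._buckets(key))
        for _ in range(self.max_kicks):
            s = self.rng.randrange(self.b)
            victim, victim_sel = self.keys[i][s], self.table[i][s][1]
            self._store(i, s, key, sel)      # kick out a random occupant
            key, sel = victim, victim_sel
            b0, b1 = self._buckets(key)
            i = b1 if i == b0 else b0        # the victim moves to its other bucket
            if self._place(i, key, sel):
                return True
        return False  # table too full; in this sketch the last displaced key is dropped

    def contains(self, key):
        """Plain filter lookup: may return false positives."""
        return any(e is not None and e[0] == self._fp(key, e[1])
                   for i in self._buckets(key) for e in self.table[i])

    def query_and_adapt(self, key):
        """Exact answer; each false-positive slot switches to the next fingerprint function."""
        for i in self._buckets(key):
            for s, e in enumerate(self.table[i]):
                if e is None or e[0] != self._fp(key, e[1]):
                    continue
                if self.keys[i][s] == key:
                    return True  # true positive
                new_sel = (e[1] + 1) % self.num_sel
                self._store(i, s, self.keys[i][s], new_sel)
        return False
```

    Because stored keys are re-fingerprinted from the key itself, members are never lost by adaptation; only the non-members that just collided are steered away.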

    A Fast Single-Key Two-Level Universal Hash Function

    Universal hash functions based on univariate polynomials are well known, e.g., Poly1305 and GHASH. Using Horner's rule to evaluate such hash functions requires $l-1$ field multiplications for hashing a message consisting of $l$ blocks, where each block is one field element. A faster method is based on the class of Bernstein-Rabin-Winograd (BRW) polynomials, which require $\lfloor l/2 \rfloor$ multiplications and $\lfloor \lg l \rfloor$ squarings for $l \geq 3$ blocks. Though this is significantly fewer than Horner's rule based hashing requires, implementing BRW polynomials for variable-length messages presents significant difficulties. In this work, we propose a two-level hash function where BRW polynomial based hashing is done at the lower level and Horner's rule based hashing is done at the higher level. The BRW polynomial based hashing is applied to a fixed number of blocks, and hence the difficulties in handling variable-length messages are avoided. Even though the hash function has two levels, we show that it is sufficient to use a single field element as the hash key. The basic idea is instantiated to propose two new hash functions: one that hashes a single binary string and another that hashes a vector of binary strings. We describe two actual implementations, one over $\mathbb{F}_{2^{128}}$ and the other over $\mathbb{F}_{2^{256}}$, both using the pclmulqdq instruction available in modern Intel processors. On both the Haswell and Skylake processors, the implementation over $\mathbb{F}_{2^{128}}$ is faster than both an implementation of GHASH by Gueron and a highly optimised implementation, also by Gueron, of another polynomial based hash function called POLYVAL. We further show that the Fast Fourier Transform based field multiplication over $\mathbb{F}_{2^{256}}$ proposed by Bernstein and Chou can be used to evaluate the new hash function at a cost of at most about 46 bit operations per bit of digest, but, unlike the Bernstein-Chou analysis, there is no hidden cost of generating the hash key.
    More generally, the new idea of building a two-level hash function having a single field element as the hash key can be applied to other finite fields to build new hash functions.
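    To make the operation counts above concrete, the sketch below evaluates both Horner's rule and the BRW polynomial while tallying multiplications and squarings. It uses the recursive BRW definition as commonly stated in the literature: $B(m_1,m_2) = m_1\tau + m_2$, $B(m_1,m_2,m_3) = (\tau + m_1)(\tau^2 + m_2) + m_3$, and for $2^k \le t < 2^{k+1}$ with $k \ge 2$, $B(m_1,\dots,m_t) = B(m_1,\dots,m_{2^k-1})(\tau^{2^k} + m_{2^k}) + B(m_{2^k+1},\dots,m_t)$. As an assumption for readability, arithmetic is done in the prime field modulo $2^{127}-1$ rather than over $\mathbb{F}_{2^{128}}$ with pclmulqdq, and the sketch does not implement the paper's two-level construction; all names are hypothetical.

```python
P = (1 << 127) - 1  # Mersenne prime; a prime field is assumed here for simplicity


class OpCount:
    """Tallies field multiplications and squarings."""
    def __init__(self):
        self.mul = 0
        self.sq = 0


def horner(tau, blocks, c):
    """m_1*tau^(l-1) + ... + m_l via Horner's rule: l-1 multiplications."""
    acc = blocks[0] % P
    for m in blocks[1:]:
        acc = (acc * tau + m) % P
        c.mul += 1
    return acc


def _brw(blocks, pw, c):
    """BRW polynomial of the blocks; pw[j] holds tau^(2^j)."""
    t = len(blocks)
    if t == 0:
        return 0
    if t == 1:
        return blocks[0] % P
    if t == 2:
        c.mul += 1
        return (blocks[0] * pw[0] + blocks[1]) % P
    if t == 3:
        c.mul += 1
        return ((pw[0] + blocks[0]) * (pw[1] + blocks[1]) + blocks[2]) % P
    k = t.bit_length() - 1  # 2^k <= t < 2^(k+1), here k >= 2
    s = 1 << k
    left = _brw(blocks[:s - 1], pw, c)
    c.mul += 1
    right = _brw(blocks[s:], pw, c)
    return (left * (pw[k] + blocks[s - 1]) + right) % P


def brw_hash(tau, blocks, c):
    """Precompute tau^2, tau^4, ..., tau^(2^k) by squaring, then evaluate BRW."""
    t = len(blocks)
    k = t.bit_length() - 1 if t >= 3 else 0
    pw = [tau % P]
    for _ in range(k):  # floor(lg t) squarings
        pw.append(pw[-1] * pw[-1] % P)
        c.sq += 1
    return _brw(blocks, pw, c)
```

    For every block count $t \geq 3$, the counters should report $\lfloor t/2 \rfloor$ multiplications and $\lfloor \lg t \rfloor$ squarings for BRW, against $t-1$ multiplications for Horner's rule.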