5 research outputs found
Faster 64-bit universal hashing using carry-less multiplications
Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set
in their x64 processors. We use CLMUL to implement an almost universal 64-bit
hash family (CLHASH). We compare this new family with what might be the fastest
almost universal family on x64 processors (VHASH). We find that CLHASH is at
least 60% faster. We also compare CLHASH with a popular hash function designed
for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash
on inputs larger than 64 bytes and just as fast otherwise
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of the Lempel-Ziv-78 and the
Lempel-Ziv-Welch computation based on trie data structures. With a careful
selection of trie representations we can beat well-tuned popular trie data
structures like Judy, m-Bonsai or Cedar
Regular and almost universal hashing: an efficient implementation
Random hashing can provide guarantees regarding the performance of data
structures such as hash tables---even in an adversarial setting. Many existing
families of hash functions are universal: given two data objects, the
probability that they have the same hash value is low given that we pick hash
functions at random. However, universality fails to ensure that all hash
functions are well behaved. We further require regularity: when picking data
objects at random they should have a low probability of having the same hash
value, for any fixed hash function. We present the efficient implementation of
a family of non-cryptographic hash functions (PM+) offering good running times,
good memory usage as well as distinguishing theoretical guarantees: almost
universality and component-wise regularity. On a variety of platforms, our
implementations are comparable to the state of the art in performance. On
recent Intel processors, PM+ achieves a speed of 4.7 bytes per cycle for 32-bit
outputs and 3.3 bytes per cycle for 64-bit outputs. We review vectorization
through SIMD instructions (e.g., AVX2) and optimizations for superscalar
execution.Comment: accepted for publication in Software: Practice and Experience in
September 201
Adaptive cuckoo filters
We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends
cuckoo filters by reacting to false positives, removing them for future queries. As an example application,
in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be
followed by repeated searches for that element. Removing false positives can therefore significantly lower
the false-positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow
fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect
on the performance of the filter. We show that the ACF is able to significantly reduce the false-positive rate
by presenting both a theoretical model for the false-positive rate and simulations using both synthetic data
sets and real packet trace
A Fast Single-Key Two-Level Universal Hash Function
Universal hash functions based on univariate polynomials are well known, e.g. Poly1305 and GHASH. Using Horner’s rule to evaluate such hash functionsrequire l − 1 field multiplications for hashing a message consisting of l blocks where each block is one field element. A faster method is based on the class of Bernstein-Rabin-Winograd (BRW) polynomials which require ⌊l/2⌋ multiplications and ⌊lgl⌋ squarings for l≥3 blocks. Though this is significantly smaller than Horner’s rule based hashing, implementation of BRW polynomials for variable length messages present significant difficulties. In this work, we propose a two-level hash function where BRW polynomial based hashing is done at the lower level and Horner’s rule based hashing is done at the higher level. The BRW polynomial based hashing is applied to a fixed number of blocks and hence the difficulties in handling variable length messages is avoided. Even though the hash function has two levels, we show that it is sufficient to use a single field element as the hash key. The basic idea is instantiated to propose two new hash functions, one which hashes a single binary string and the other can hash a vector of binary strings. We describe two actual implementations, one over F2128 and the other over F2256 both using the pclmulqdq instruction available in modern Intel processors. On both the Haswell and Skylake processors, the implementation over F2128 is faster than both an implementation of GHASH by Gueron; and a highly optimised implementation, also by Gueron, of another polynomial based hash function called POLYVAL. We further show that the Fast Fourier Transform based field multiplication over F2256 proposed by Bernstein and Chou can be used to evaluate the new hash function at a cost of about at most 46 bit operations per bit of digest, but, unlike the Bernstein-Chou analysis, there is no hidden cost of generating the hash key. More generally, the new idea of building a two-level hash function having a single field element as the hash key can be applied to other finite fields to build new hash functions