Regular and almost universal hashing: an efficient implementation

Author: Ignatchenko Sergey
Ivanchykhin Dmytro
Lemire Daniel
Publication venue: 'Wiley'
Publication date: 18/10/2016

Random hashing can provide guarantees regarding the performance of data structures such as hash tables, even in an adversarial setting. Many existing families of hash functions are universal: given two data objects, the probability that they have the same hash value is low when we pick the hash function at random. However, universality fails to ensure that all hash functions are well behaved. We further require regularity: for any fixed hash function, data objects picked at random should have a low probability of having the same hash value. We present an efficient implementation of a family of non-cryptographic hash functions (PM+) offering good running times and good memory usage, as well as distinguishing theoretical guarantees: almost universality and component-wise regularity. On a variety of platforms, our implementations are comparable to the state of the art in performance. On recent Intel processors, PM+ achieves a speed of 4.7 bytes per cycle for 32-bit outputs and 3.3 bytes per cycle for 64-bit outputs. We review vectorization through SIMD instructions (e.g., AVX2) and optimizations for superscalar execution. Comment: accepted for publication in Software: Practice and Experience in September 201
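To make the two properties concrete, here is a small Python sketch. It does not implement PM+: as an assumption for illustration, it uses the classic Carter-Wegman-style family h(x) = ((a·x + b) mod p) mod m as a stand-in. It computes exactly the worst-case collision probability over pairs of distinct keys when h is drawn at random (universality) and, for one fixed h, the collision probability of two independently drawn random keys (regularity). The helper names and the toy parameters p = 17 and m = 4 are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def make_hash(a, b, p, m):
    """Stand-in family (not PM+): h(x) = ((a*x + b) mod p) mod m."""
    return lambda x: ((a * x + b) % p) % m


def worst_pair_collision(p, m):
    """Universality: max over keys x != y of Pr_h[h(x) == h(y)],
    with h drawn uniformly (a in 1..p-1, b in 0..p-1)."""
    family = [make_hash(a, b, p, m) for a in range(1, p) for b in range(p)]
    worst = Fraction(0)
    for x, y in combinations(range(p), 2):
        hits = sum(1 for h in family if h(x) == h(y))
        worst = max(worst, Fraction(hits, len(family)))
    return worst


def fixed_function_collision(h, domain):
    """Regularity: for ONE fixed h, the probability that two keys drawn
    independently and uniformly from `domain` share a hash value."""
    counts = Counter(h(x) for x in domain)
    n = len(domain)
    return sum(Fraction(c, n) ** 2 for c in counts.values())


p, m = 17, 4
print(worst_pair_collision(p, m))                                 # at most 1/m
print(fixed_function_collision(make_hash(3, 5, p, m), range(p)))  # close to 1/m
```

A perfectly regular function spreads its domain evenly over the m outputs, which makes the second value exactly 1/m. Here m does not divide the prime p, so one bucket holds one extra key and the value is only close to 1/m.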

arXiv.org e-Print Archive

R-libre

Simple, compact and robust approximate string dictionary

Author: Belazzougui Djamal
Chegrane Ibrahim
Publication date: 22/08/2014

This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary $D$ of $d$ strings of total length $n$ over an alphabet of size $\sigma$. Given a bound $k$ and a pattern $x$ of length $m$, a query has to return all the strings of the dictionary that are at edit distance at most $k$ from $x$, where the edit distance between two strings $x$ and $y$ is defined as the minimum cost of a sequence of edit operations that transforms $x$ into $y$. The cost of a sequence of operations is defined as the sum of the costs of the operations involved in the sequence. In this paper, we assume that each operation has unit cost and consider only three operations: deletion of one character, insertion of one character, and substitution of one character by another. We present a practical implementation of the data structure we recently proposed, which works only for one error, and we extend the scheme to $2 \leq k < m$. Our implementation has many desirable properties: it has a very fast and space-efficient building algorithm; the dictionary data structure is compact and has fast and robust query time; and the data structure is simple to implement, as it uses only basic techniques from the literature, mainly hashing (linear probing and hash signatures) and succinct data structures (bitvectors supporting rank queries). Comment: Accepted to a journal (19 pages, 2 figures)
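As a reference point for the query semantics, here is a brute-force Python sketch. It is not the authors' data structure, which relies on hashing and succinct bitvectors. It computes the unit-cost edit distance with the standard dynamic program and returns every dictionary string within distance k of the pattern x. Because it scans the whole dictionary, it only shows what a query must return, not how to answer one efficiently. The function names are hypothetical.

```python
def edit_distance(x, y):
    """Unit-cost edit distance (insertion, deletion, substitution)."""
    # prev[j] holds the distance between the current prefix of x and y[:j]
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]  # distance from x[:i] to the empty prefix of y
        for j, cy in enumerate(y, 1):
            cur.append(min(
                prev[j] + 1,               # delete cx
                cur[j - 1] + 1,            # insert cy
                prev[j - 1] + (cx != cy),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def query(dictionary, x, k):
    """Return all strings in `dictionary` at edit distance <= k from x."""
    # The length difference is a lower bound on edit distance, so skip early.
    return [s for s in dictionary
            if abs(len(s) - len(x)) <= k and edit_distance(s, x) <= k]


print(query(["cat", "cart", "cut", "dog", "coat"], "cat", 1))
# ['cat', 'cart', 'cut', 'coat']
```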

arXiv.org e-Print Archive

CiteSeerX

Strongly universal string hashing is fast

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/11/2014

We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycles per byte. Perhaps surprisingly, we find that these families, though they require a large buffer of random numbers, are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom holds that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors, including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families. Comment: Software is available at http://code.google.com/p/variablelengthstringhashing/ and https://github.com/lemire/StronglyUniversalStringHashin
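A minimal Python sketch of a multilinear string hash, in the spirit of these families, makes the "large buffer of random numbers" concrete. Random 2L-bit keys m_1, m_2, … are combined with L-bit characters s_1, s_2, … as (m_1 + Σ m_{i+1}·s_i) mod 2^(2L), and the sketch keeps the top L bits. The function name and the choice L = 32 are assumptions for illustration. The sketch also assumes fixed-length inputs: appending a zero character leaves the value unchanged, so variable-length strings need extra care, such as also hashing the length. It makes no attempt to match the optimized implementations the authors benchmark.

```python
import random


def multilinear_hash(keys, chars, L=32):
    """Hypothetical helper: top L bits of
    (keys[0] + sum(keys[i+1] * chars[i])) mod 2**(2L).

    keys  -- random integers in [0, 2**(2L)), at least len(chars) + 1 of them
    chars -- integers in [0, 2**L); assumed to be a fixed-length input
    """
    if len(keys) < len(chars) + 1:
        raise ValueError("need at least len(chars) + 1 random keys")
    mask = (1 << (2 * L)) - 1          # reduce modulo 2**(2L)
    acc = keys[0] & mask
    for m, c in zip(keys[1:], chars):
        if not 0 <= c < (1 << L):
            raise ValueError("each character must fit in L bits")
        acc = (acc + m * c) & mask     # one multiply-add per character
    return acc >> L                    # keep the most significant L bits


rng = random.Random(2014)              # fixed seed so the example is repeatable
keys = [rng.getrandbits(64) for _ in range(9)]   # buffer for up to 8 characters
words = [ord(ch) for ch in "strongly"]           # 8 characters, one per 32-bit word
print(multilinear_hash(keys, words))
```

The loop uses one multiplication per character and no modular reduction other than masking. That pattern is cheap on 64-bit hardware, which is part of why such families can compete with hash functions that have weaker guarantees.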

arXiv.org e-Print Archive

R-libre

Crossref


GPERF: a perfect hash function generator

Author: Schmidt Douglas C.
Suda Tatsuya
Publication venue: eScholarship, University of California
Publication date: 01/01/1992

gperf is a widely available perfect hash function generator written in C++. It automates a common system software operation: keyword recognition. gperf translates an n-element user-specified keyword list keyfile into source code containing a k-element lookup table and a pair of functions, phash and in_word_set. phash uniquely maps keywords in keyfile onto the range 0..k-1, where k ≥ n. If k = n, then phash is considered a minimal perfect hash function. in_word_set uses phash to determine whether a particular string of characters str occurs in the keyfile, using at most one string comparison. This paper describes the user interface, options, features, algorithm design, and implementation strategies incorporated in gperf. It also presents the results of an empirical comparison between gperf-generated recognizers and other popular techniques for reserved-word lookup.
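The phash/in_word_set pair can be sketched in Python. This is not gperf's search algorithm or its generated code. As an assumption for illustration, the hash is the keyword length plus random "associated values" for the keyword's first and last characters, taken mod k. The search simply retries random associated values until every keyword lands in a distinct slot of a k-element table. build_perfect_hash and the keyword list are hypothetical.

```python
import random


def build_perfect_hash(keywords, k, seed=0, max_tries=10_000):
    """Search for associated values that make phash injective on `keywords`.

    Returns (phash, table), where table has k slots and k >= len(keywords).
    """
    rng = random.Random(seed)
    chars = sorted({c for w in keywords for c in (w[0], w[-1])})
    for _ in range(max_tries):
        asso = {c: rng.randrange(k) for c in chars}

        def phash(s, asso=asso):
            # Characters that never start or end a keyword contribute 0.
            return (len(s) + asso.get(s[0], 0) + asso.get(s[-1], 0)) % k

        slots = [phash(w) for w in keywords]
        if len(set(slots)) == len(keywords):   # no two keywords share a slot
            table = [None] * k
            for w, slot in zip(keywords, slots):
                table[slot] = w
            return phash, table
    raise RuntimeError("no perfect hash found; try a larger k")


def in_word_set(s, phash, table):
    """Keyword test using at most one string comparison."""
    if not s:
        return False
    return table[phash(s)] == s


keywords = ["if", "else", "while", "for", "return", "int"]
phash, table = build_perfect_hash(keywords, k=12)
print([in_word_set(w, phash, table) for w in ["while", "elif", "fi"]])
# [True, False, False]
```

"fi" hashes to the same slot as "if" (same length, same pair of end characters), so the single comparison is what rejects it. Setting k equal to the number of keywords would request a minimal perfect hash, which this naive random search may fail to find.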

eScholarship - University of California

Postprocessing for quantum random number generators: entropy evaluation and randomness extraction

Author: B. Chor
B. Schneier
Bing Qi
C. H. Bennett
D. Zuckerman
Feihu Xu
H. Krawczyk
He Xu
Hoi-Kwong Lo
L. Trevisan
M. Ben-Or
M. Epstein
R. Canetti
R. Impagliazzo
R. Raz
R. Renner
Xiaoqing Tan
Xiongfeng Ma
Y. Mansour
Publication venue: 'American Physical Society (APS)'
Publication date: 21/06/2013

Quantum random-number generators (QRNGs) can, in principle, offer a means to generate information-theoretically provable random numbers. In practice, unfortunately, the quantum randomness is inevitably mixed with classical randomness due to classical noise. To distill this quantum randomness, one needs to quantify the randomness of the source and apply a randomness extractor. Here, we propose a generic framework for evaluating the quantum randomness of real-life QRNGs by min-entropy, and apply it to two different existing quantum random-number systems in the literature. Moreover, we provide a guideline for QRNG data postprocessing, for which we implement two information-theoretically provable randomness extractors: the Toeplitz-hashing extractor and Trevisan's extractor. Comment: 13 pages, 2 figures
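A Toeplitz-hashing extractor multiplies the n raw bits by a random m×n binary Toeplitz matrix over GF(2). Because the matrix is constant along each diagonal, n + m - 1 seed bits specify it completely. The Python sketch below shows only that matrix-vector product. The function name and toy sizes are hypothetical. Choosing the output length m from the evaluated min-entropy and a security parameter is left to the framework the paper describes.

```python
import random


def toeplitz_extract(raw_bits, seed_bits, m):
    """Return T @ raw_bits over GF(2), where T is the m x n Toeplitz matrix
    with T[i][j] = seed_bits[i - j + n - 1] (constant along diagonals)."""
    n = len(raw_bits)
    if len(seed_bits) != n + m - 1:
        raise ValueError("seed must have n + m - 1 bits")
    out = []
    for i in range(m):
        bit = 0
        for j in range(n):
            bit ^= seed_bits[i - j + n - 1] & raw_bits[j]   # GF(2) multiply-add
        out.append(bit)
    return out


rng = random.Random(7)
n, m = 16, 6     # toy sizes; in practice m is set below the min-entropy of the raw bits
raw = [rng.getrandbits(1) for _ in range(n)]
seed = [rng.getrandbits(1) for _ in range(n + m - 1)]
print(toeplitz_extract(raw, seed, m))
```

The extractor is linear over GF(2). Storing the seed instead of the full matrix cuts the random key from m·n bits to n + m - 1 bits.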

arXiv.org e-Print Archive

Crossref

The universality of iterated hashing over variable-length strings

Author: Byers
Carter
Cohen
Daniel Lemire
Knuth
Krawczyk
Krawczyk
Krovetz
Kukich
Lemire
Liskov
Pagh
Pearson
Piret
Preneel
Ramakrishna
Rogaway
Sarkar
Shoup
Stinson
Zobrist
Publication venue: 'Elsevier BV'
Publication date: 24/11/2011

Iterated hash functions process strings recursively, one character at a time. At each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values; we bound the maximal string length given the collision probability.
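The recursion h_i = F(h_(i-1), c_i) is easy to state in code. In the Python sketch below, F is a uniformly random lookup table over (hash value, character) pairs. This choice is an illustrative assumption, not a construction from the paper, and the helper names are hypothetical. The demo finds two distinct strings that collide. It then checks a structural property of any iterated hash: once two prefixes share a hash value, appending the same suffix keeps them colliding, because the rest of the computation sees only that shared state.

```python
import random
from functools import reduce
from itertools import product


def make_random_F(num_hashes, alphabet, seed=0):
    """Hypothetical F: a random table mapping (previous hash, char) -> new hash."""
    rng = random.Random(seed)
    table = {(h, c): rng.randrange(num_hashes)
             for h in range(num_hashes) for c in alphabet}
    return lambda h, c: table[(h, c)]


def iterated_hash(F, h0, s):
    """Process s one character at a time: h_i = F(h_{i-1}, c_i)."""
    return reduce(F, s, h0)


def find_collision(F, h0, alphabet, length):
    """Return two distinct strings of `length` with equal hashes, if any."""
    seen = {}
    for chars in product(alphabet, repeat=length):
        s = "".join(chars)
        h = iterated_hash(F, h0, s)
        if h in seen:
            return seen[h], s
        seen[h] = s
    return None


F = make_random_F(num_hashes=4, alphabet="ab")
x, y = find_collision(F, 0, "ab", length=3)   # 8 strings, 4 values: must collide
suffix = "ba"
print(x, y, iterated_hash(F, 0, x + suffix) == iterated_hash(F, 0, y + suffix))
```

With only 4 hash values, the pigeonhole principle guarantees a collision among the 8 strings of length 3. The final comparison is True for every choice of F, not just this random table.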

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

R-libre