6 research outputs found

    On randomness in Hash functions

    In the talk, we shall discuss quality measures for hash functions used in data structures and algorithms, and survey positive and negative results. (This talk is not about cryptographic hash functions.) For the analysis of algorithms involving hash functions, it is often convenient to assume that the hash functions used behave fully randomly; in some cases no analysis is known that avoids this assumption. In practice, one must get by with weaker hash functions that can be generated by randomized algorithms. A well-studied range of applications concerns realizations of dynamic dictionaries (linear probing, chained hashing, dynamic perfect hashing, cuckoo hashing and its generalizations) and of Bloom filters and their variants.

    A particularly successful and useful means of classification is Carter and Wegman's notion of universal or k-wise independent classes, introduced in 1977. A natural and widely used approach to analyzing an algorithm involving hash functions is to show that it works if a sufficiently strong universal class of hash functions is used, and then to substitute one of the known constructions of such classes. This invites research into just how much independence in the hash functions is necessary for an algorithm to work. Some recent analyses giving impossibility results constructed rather artificial classes that fail; other results pointed out natural, widely used hash classes that fail in a particular application. Only recently was it shown that, under certain assumptions on the entropy present in the set of keys, even 2-wise independent hash classes lead to strong randomness properties in the hash values. The negative results show that this may not be taken as a justification for using weak hash classes indiscriminately, in particular for key sets with structure.

    When stronger independence properties are needed for a theoretical analysis, one may resort to classic constructions. Only in 2003 was it shown how full randomness can be simulated using only linear space overhead (which is optimal). The resulting "split-and-share" approach can be used to justify the full-randomness assumption in situations where it is needed for the analysis to go through, as in many applications involving multiple hash functions (e.g., generalized versions of cuckoo hashing with multiple hash functions or larger bucket sizes, load balancing, Bloom filters and variants, or minimal perfect hash function constructions).

    For practice, efficiency considerations beyond constant factors are important. It is not hard to construct very efficient 2-wise independent classes. Using k-wise independent classes for constant k larger than 3 has become feasible in practice only through new constructions involving tabulation. This fits well with the relatively recent result that linear probing works with 5-independent hash functions. Recent developments suggest, however, that classifying hash function constructions by their degree of independence alone may not be adequate in some cases. One may therefore want to analyze the behavior of specific hash classes in specific applications, circumventing the concept of k-wise independence. Several such results have recently been achieved for hash functions that utilize tabulation. In particular, when the analysis of the application involves randomness properties of graphs and hypergraphs (generalized cuckoo hashing, also in the version with a "stash", or load balancing), a hash class combining k-wise independence with tabulation has turned out to be very powerful.
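
    As a concrete illustration of two of the constructions the talk touches on, here is a minimal Python sketch of a Carter-Wegman style k-wise independent polynomial class and of simple tabulation hashing. This is illustrative code, not material from the talk; function names and parameter choices are illustrative.

import random

P = (1 << 61) - 1  # a Mersenne prime; a standard modulus choice for fast hashing

def make_k_independent(k: int, m: int):
    """Carter-Wegman polynomial class: a degree-(k-1) polynomial over Z_P
    with uniformly random coefficients is k-wise independent on keys < P.
    The final reduction mod m introduces a small, well-understood bias."""
    coeffs = [random.randrange(P) for _ in range(k)]
    def h(x: int) -> int:
        acc = 0
        for a in coeffs:          # Horner's rule
            acc = (acc * x + a) % P
        return acc % m
    return h

def make_simple_tabulation(r: int, key_bytes: int = 4):
    """Simple tabulation: split the key into bytes, look each byte up in its
    own table of random r-bit values, and XOR the results. This class is only
    3-wise independent, yet it is known to support applications such as linear
    probing and cuckoo hashing -- an example of guarantees that go beyond the
    degree of independence alone."""
    tables = [[random.getrandbits(r) for _ in range(256)]
              for _ in range(key_bytes)]
    def h(x: int) -> int:
        out = 0
        for i in range(key_bytes):
            out ^= tables[i][(x >> (8 * i)) & 0xFF]
        return out
    return h

# Example: a 5-independent class (as in the linear probing result) and a
# tabulation-based class, both hashing 32-bit keys into a range of size 2^20.
h5 = make_k_independent(k=5, m=1 << 20)
ht = make_simple_tabulation(r=20)
print(h5(123456), ht(123456))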

    Worst-case efficient single and multiple string matching on packed texts in the word-RAM model

    In this paper, we explore worst-case solutions for the problems of single and multiple string matching in the word-RAM model with word length w. In the first problem, we must build a data structure, based on a pattern p of length m over an alphabet of size σ, that can answer the following query: given a text T of length n, where each character is encoded using log σ bits, return the positions of all occurrences of p in T (in the following, occ denotes the number of reported occurrences). In the multi-pattern matching problem, we have a set S of d patterns of total length m, and a query on a text T consists of finding all positions of all occurrences in T of the patterns in S. As each character of the text is encoded using log σ bits and we can read w bits in constant time in the RAM model, we assume that we can read up to Θ(w/log σ) consecutive characters of the text in one time step. This implies that the fastest possible query time for both problems is O(n log σ / w + occ). In this paper we present several results for both problems which come close to this best possible query time. We first present two linear-space data structures, one for each problem: the first answers single pattern matching queries in time O(n(1/m + log σ / w) + occ), while the second answers multiple pattern matching queries in time O(n((log d + log y + log log m)/y + log σ / w) + occ), where y is the length of the shortest pattern. We then show how a simple application of the Four Russians technique yields data structures whose query times are independent of the length of the shortest pattern (the length of the only pattern, in the case of single string matching), at the expense of using more space.
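
    To make the packing assumption concrete, here is a small Python sketch (not the paper's data structure): each character over an alphabet of size sigma takes ceil(log2 sigma) bits, so a w-bit word holds w // ceil(log2 sigma) of them, and any character can be fetched with one word access plus a shift and a mask. Function names are illustrative.

from math import ceil, log2

def pack(text, sigma, w=64):
    """Pack a sequence of symbols in [0, sigma) into w-bit words."""
    b = max(1, ceil(log2(sigma)))   # bits per character
    per_word = w // b               # characters stored per word
    words = []
    for i in range(0, len(text), per_word):
        word = 0
        for j, c in enumerate(text[i:i + per_word]):
            word |= c << (j * b)
        words.append(word)
    return words, b, per_word

def read_char(words, b, per_word, i):
    """Fetch character i with one word access plus a shift and a mask;
    per_word consecutive characters cost O(1) word accesses to read."""
    return (words[i // per_word] >> ((i % per_word) * b)) & ((1 << b) - 1)

packed, b, per_word = pack([3, 1, 4, 1, 5], sigma=8)
assert read_char(packed, b, per_word, 2) == 4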

    Deterministic load balancing and dictionaries in the parallel disk model


    SoK: Security Models for Pseudo-Random Number Generators

    Randomness plays an important role in many cryptographic applications. It is required for fundamental tasks such as key generation, masking and hiding values, and the generation of nonces and initialization vectors. Pseudo-random number generators have been studied by numerous authors, either to propose clear security notions and associated constructions or to point out potential vulnerabilities. In this systematization-of-knowledge paper, we present the three notions of generators that have been successively formalized: standard generators, stateful generators, and generators with input. For each notion, we present the expected security properties, in which adversaries have increasing capabilities (including access to partial information on the internal variables), and we propose secure and efficient constructions, all based on the block cipher AES. In our description of generators with input, we revisit the notions of accumulator and extractor, and we point out that security crucially relies on the independence between the randomness source and the seeds of the accumulator and the extractor. To illustrate this requirement, we identify a potential vulnerability in the NIST standard CTR_DRBG.
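
    As a rough illustration of the stateful, AES-based generators the paper discusses, the following Python sketch mimics the structure of a CTR-mode generator in the spirit of CTR_DRBG. It assumes the third-party cryptography package, and it is a toy: the real NIST standard additionally specifies a derivation function, reseed counters, and additional-input handling. TinyCtrDrbg is an illustrative name, not the paper's construction.

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 16          # AES block size in bytes
KEYLEN = 32         # AES-256 key length in bytes

def _aes_block(key: bytes, block: bytes) -> bytes:
    """Encrypt a single block with AES in ECB mode (one raw block-cipher call)."""
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

class TinyCtrDrbg:
    """Toy stateful generator: the state is (key, counter V); output blocks
    are AES_key(V+1), AES_key(V+2), ...; the state is refreshed after each
    call so that past outputs remain safe if the state later leaks."""

    def __init__(self, seed: bytes):
        # The seed must come from a randomness source that is independent of
        # anything the adversary can influence -- the requirement the paper
        # highlights for the seeds of accumulators and extractors.
        assert len(seed) == KEYLEN + BLOCK
        self.key, self.v = seed[:KEYLEN], seed[KEYLEN:]

    def _next_v(self) -> None:
        n = (int.from_bytes(self.v, "big") + 1) % (1 << (8 * BLOCK))
        self.v = n.to_bytes(BLOCK, "big")

    def generate(self, nbytes: int) -> bytes:
        out = b""
        while len(out) < nbytes:
            self._next_v()
            out += _aes_block(self.key, self.v)
        self._refresh()              # forward security: derive a fresh state
        return out[:nbytes]

    def _refresh(self) -> None:
        material = b""
        while len(material) < KEYLEN + BLOCK:
            self._next_v()
            material += _aes_block(self.key, self.v)
        self.key = material[:KEYLEN]
        self.v = material[KEYLEN:KEYLEN + BLOCK]

g = TinyCtrDrbg(os.urandom(KEYLEN + BLOCK))
print(g.generate(32).hex())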