
    The universality of iterated hashing over variable-length strings

    Iterated hash functions process strings recursively, one character at a time: at each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values, and we bound the maximum string length as a function of the collision probability.
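
    As a point of reference, here is a minimal sketch of the iterated construction the abstract describes; the specific mixing step (multiply by 31, add the character) and all constants are illustrative placeholders, not the family analyzed in the paper.

```python
# Sketch of an iterated (recursive) string hash: each step computes a new
# hash value from the preceding hash value and the next character.
# The seed, multiplier, and modulus here are illustrative only.
def iterated_hash(s: str, seed: int = 0x9E3779B1, mod: int = 2**32) -> int:
    h = seed
    for c in s:
        # new hash value = f(previous hash value, next character)
        h = (h * 31 + ord(c)) % mod
    return h

print(iterated_hash("hello"))
```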

    Regular and almost universal hashing: an efficient implementation

    Random hashing can provide guarantees regarding the performance of data structures such as hash tables, even in an adversarial setting. Many existing families of hash functions are universal: given two distinct data objects, the probability that they have the same hash value is low when the hash function is picked at random. However, universality fails to ensure that all hash functions are well behaved. We further require regularity: for any fixed hash function, data objects picked at random should have a low probability of having the same hash value. We present an efficient implementation of a family of non-cryptographic hash functions (PM+) offering good running times and memory usage as well as distinctive theoretical guarantees: almost universality and component-wise regularity. On a variety of platforms, our implementations are comparable to the state of the art in performance. On recent Intel processors, PM+ achieves a speed of 4.7 bytes per cycle for 32-bit outputs and 3.3 bytes per cycle for 64-bit outputs. We review vectorization through SIMD instructions (e.g., AVX2) and optimizations for superscalar execution.
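
    PM+ itself is not reproduced here; as a stand-in, the following sketch shows a different, classic almost-universal family (multiply-shift) that illustrates the guarantee in question: for any fixed pair of distinct inputs, a randomly drawn odd multiplier makes a collision unlikely. All sizes and constants are illustrative assumptions.

```python
import random

# Multiply-shift: a classic almost-universal family for w-bit integer keys.
# Universality: for fixed x != y, Pr over random odd a of a collision is low.
# Regularity is the converse view: for a fixed a, random inputs rarely collide.
W, L = 32, 16  # input bits, output bits (illustrative sizes)

def multiply_shift(a: int, x: int) -> int:
    # Keep the low W bits of a*x, then take the top L of those.
    return ((a * x) % 2**W) >> (W - L)

a = random.randrange(1, 2**W, 2)   # random odd multiplier
x, y = 0xDEADBEEF, 0xCAFEBABE
print(multiply_shift(a, x), multiply_shift(a, y))
```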

    On the Security of Keyed Hashing Based on Public Permutations

    Doubly-extendable cryptographic keyed functions (deck functions) generalize message authentication codes (MACs) and stream ciphers in that they take variable-length strings as input and return variable-length strings as output. A prominent way to build deck functions is Farfalle, which consists of a set of public permutations and rolling functions used in its compression and expansion layers. By generalizing the compression layer of Farfalle, we prove its universality in terms of the probability of differentials over the public permutation used in it. As the compression layer of Farfalle is inherently parallel, we compare it to a generalization of a serial compression function inspired by Pelican-MAC. The same public permutation may yield different universality bounds depending on whether the compression is done in parallel or serially. The parallel construction consistently performs better than the serial one, sometimes by a large factor. We demonstrate this effect using Xoodoo[3], a round-reduced variant of the public permutation used in the deck function Xoofff.
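
    A toy contrast between the two compression shapes compared above. The permutation P below is an illustrative 64-bit mixer standing in for a real public permutation such as Xoodoo[3]; the key handling is likewise simplified for the sketch.

```python
# Contrast of parallel (Farfalle-style) vs. serial (Pelican-MAC-style)
# compression. P is a placeholder public permutation on 64-bit words.
MASK = 2**64 - 1

def P(x: int) -> int:
    # Invertible 64-bit mixing (xorshift + odd-constant multiply): a toy
    # permutation, not a cryptographic one.
    x = ((x ^ (x >> 33)) * 0xFF51AFD7ED558CCD) & MASK
    return (x ^ (x >> 33)) & MASK

def parallel_compress(key_stream, blocks):
    # Parallel: mask each block with a rolled key, permute, accumulate.
    acc = 0
    for k, m in zip(key_stream, blocks):
        acc ^= P((m + k) & MASK)
    return acc

def serial_compress(k0, blocks):
    # Serial: absorb blocks one at a time into a chained state.
    state = k0
    for m in blocks:
        state = P(state ^ m)
    return state
```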

    Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

    We present the first thorough practical study of Lempel-Ziv-78 and Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations, we can beat well-tuned popular trie data structures such as Judy, m-Bonsai, and Cedar.
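
    For context, a minimal LZ78 parser over a plain dictionary trie; this is an illustrative baseline, not one of the tuned trie representations evaluated in the paper.

```python
# Minimal LZ78 parsing with a flat dictionary trie: keys are
# (node_id, char) edges, node 0 is the root.
def lz78(text: str):
    trie = {}          # maps (node_id, char) -> node_id
    phrases = []       # output: (prefix phrase index, extending char)
    next_id, node = 1, 0
    for ch in text:
        if (node, ch) in trie:          # extend the current phrase
            node = trie[(node, ch)]
        else:                           # emit phrase, grow the trie
            trie[(node, ch)] = next_id
            phrases.append((node, ch))
            next_id, node = next_id + 1, 0
    # NOTE: a trailing, incomplete phrase is dropped in this sketch.
    return phrases

print(lz78("abababab"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```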

    Multimixer-128: Universal Keyed Hashing Based on Integer Multiplication

    In this paper we introduce a new keyed hash function based on 32-bit integer multiplication that we call Multimixer-128. We follow the key-then-hash parallel paradigm: we first add a variable-length input message to a secret key and split the result into blocks; a fixed-length public function based on integer multiplication is then applied to each block, and the results are added to form the digest. We prove an upper bound of 2^-127 on the universality of Multimixer-128 by means of the differential probability and image probability of the underlying public function. Many CPUs offer vector instructions for fast 32-bit integer multiplication, and on such platforms Multimixer-128 is very efficient. We compare our implementation of Multimixer-128 with the NH hash function family, which offers similar levels of security, and with the two fastest NIST LWC candidates. To the best of our knowledge, NH is the fastest keyed hash function in software, and Multimixer-128 outperforms NH while providing the same level of security.
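
    A minimal sketch of the key-then-hash parallel paradigm described above. The fixed-length function F below is a single 32x32-bit multiplication chosen for illustration; it is not Multimixer-128's actual public function.

```python
# Key-then-hash, parallel: add the key to the message, split into blocks,
# apply a fixed-length public function F to each block, sum the results.
M64 = 2**64 - 1

def F(block64: int) -> int:
    # Illustrative fixed-length public function: one 32x32 -> 64-bit multiply.
    hi, lo = block64 >> 32, block64 & 0xFFFFFFFF
    return (hi * lo) & M64

def key_then_hash(key_blocks, msg_blocks):
    digest = 0
    for k, m in zip(key_blocks, msg_blocks):
        digest = (digest + F((m + k) & M64)) & M64
    return digest

print(key_then_hash([0x0123456789ABCDEF], [0xFEDCBA9876543210]))
```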

    Achievable secrecy enhancement through joint encryption and privacy amplification

    In this dissertation we seek to enhance the secrecy of communications by resorting to both cryptographic and information-theoretic tools and metrics. Our objective is to unify tools and measures from the cryptography community with techniques and metrics from the information theory community that are used to provide privacy and confidentiality in communication systems. To this end, we adopt encryption techniques accompanied by privacy amplification tools in order to achieve secrecy goals defined in terms of both information-theoretic and cryptographic metrics. Every secrecy scheme relies on a certain advantage for legitimate users over adversaries, viewed as an asymmetry in the system, to deliver the required security for data transmission. In all of the schemes proposed in this dissertation, we resort to either an inherently existing asymmetry in the system or a proactively created advantage for legitimate users over a passive eavesdropper to further enhance the secrecy of the communications. This advantage is manipulated by means of privacy amplification and encryption tools to achieve secrecy goals evaluated with information-theoretic and cryptographic metrics.

    In the first work, discussed in Chapter 2, and the third work, explained in Chapter 4, we rely on a proactively established advantage for legitimate users based on the eavesdropper's lack of knowledge about a shared source of data. Unlike these works, which assume an error-free physical channel, the second work, discussed in Chapter 3, considers a correlated erasure wiretap channel model. This work relies on a passive, internally existing advantage for legitimate users built upon the statistical and partial independence of the eavesdropper's channel errors from the errors in the main channel. We arrive at this secrecy advantage by exploiting an authenticated but insecure feedback channel.

    From the perspective of the tools used, the first work (Chapter 2) considers a specific scenario: enhancing the secrecy of a particular block cipher, the Data Encryption Standard (DES), operating in cipher feedback (CFB) mode. This enhancement is achieved by deliberate noise injection and wiretap channel encoding as a privacy amplification technique against a resource-constrained eavesdropper. The third work considers a more general framework in terms of both metrics and secrecy tools: it studies the secrecy enhancement of a general cipher based on universal hashing as a privacy amplification technique against an unbounded adversary. Here we achieve exponential secrecy, where the information leaked to the adversary, assessed via mutual information as an information-theoretic measure and Eve's distinguishability as a cryptographic metric, decays at an exponential rate. In the second work, encrypted data frames are transmitted through an Automatic Repeat reQuest (ARQ) protocol to generate a common random source between legitimate users, which is later transformed into information-theoretically secure encryption keys by means of privacy amplification based on universal hashing.

    Finally, future work extending the research in this dissertation is outlined. Proofs of the major theorems and lemmas are presented in the Appendix.
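
    The privacy amplification step mentioned throughout can be illustrated with universal hashing via a random Toeplitz matrix: a partially leaked shared string is compressed into a shorter key about which the eavesdropper knows almost nothing. The sizes below are arbitrary, and this sketch is not the dissertation's exact construction.

```python
import secrets

# Privacy amplification by universal hashing (illustrative): multiply the
# shared bit string by a random binary Toeplitz matrix over GF(2).
def toeplitz_hash(bits: list[int], n_out: int, diagonals: list[int]) -> list[int]:
    # diagonals has length n_out + len(bits) - 1; entry (i, j) of the
    # Toeplitz matrix is diagonals[i - j + len(bits) - 1].
    n_in = len(bits)
    return [sum(diagonals[i - j + n_in - 1] * bits[j] for j in range(n_in)) % 2
            for i in range(n_out)]

shared = [secrets.randbelow(2) for _ in range(64)]          # common randomness
diag = [secrets.randbelow(2) for _ in range(16 + 64 - 1)]   # public random seed
key = toeplitz_hash(shared, 16, diag)                       # distilled 16-bit key
print(key)
```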

    Algorithms for sparse convolution and sublinear edit distance

    In this PhD thesis on fine-grained algorithm design and complexity, we investigate output-sensitive and sublinear-time algorithms for two important problems. (1) Sparse convolution: computing the convolution of two vectors is a basic algorithmic primitive with applications across all of computer science and engineering. In the sparse convolution problem we assume that the input and output vectors have at most t nonzero entries, and the goal is to design algorithms whose running times depend on t. For the special case where all entries are nonnegative, which is particularly important for algorithm design, it has been known for twenty years that sparse convolutions can be computed in near-linear randomized time O(t log^2 n). In this thesis we develop a randomized algorithm with running time O(t log t), which is optimal under some mild assumptions, as well as the first near-linear deterministic algorithm for sparse nonnegative convolution. We also present an application of these results, leading to seemingly unrelated fine-grained lower bounds against distance oracles in graphs. (2) Sublinear edit distance: the edit distance of two strings is a well-studied similarity measure with numerous applications in computational biology. While computing the edit distance exactly provably requires quadratic time, a long line of research has led to a constant-factor approximation algorithm running in almost-linear time. Perhaps surprisingly, it is also possible to approximate the edit distance k within a large factor O(k) in sublinear time Õ(n/k + poly(k)). We drastically improve the approximation factor of the known sublinear algorithms from O(k) to k^{o(1)} while preserving the Õ(n/k + poly(k)) running time.
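
    For orientation, here is a simple output-sensitive baseline for sparse convolution that touches only the nonzero entries. It runs in O(t^2) time, far from the thesis's O(t log t) algorithm, and serves only to fix the problem setup.

```python
from collections import defaultdict

# Output-sensitive baseline: vectors are stored sparsely as index -> value
# maps, and we enumerate all O(t^2) pairs of nonzero entries.
def sparse_convolve(a: dict[int, int], b: dict[int, int]) -> dict[int, int]:
    c = defaultdict(int)
    for i, av in a.items():
        for j, bv in b.items():
            c[i + j] += av * bv
    return dict(c)

print(sparse_convolve({0: 1, 5: 2}, {1: 3}))  # {1: 3, 6: 6}
```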

    New Proofs for NMAC and HMAC: Security Without Collision-Resistance

    HMAC was proved by Bellare, Canetti and Krawczyk [2] to be a PRF assuming that (1) the underlying compression function is a PRF, and (2) the iterated hash function is weakly collision-resistant. However, recent attacks show that assumption (2) is false for MD5 and SHA-1, removing the proof-based support for HMAC in these cases. This paper proves that HMAC is a PRF under the sole assumption that the compression function is a PRF. This recovers a proof-based guarantee, since no known attacks compromise the pseudorandomness of the compression function, and it also helps explain the resistance to attack that HMAC has shown even when implemented with hash functions whose (weak) collision resistance is compromised. We also show that an even weaker-than-PRF condition on the compression function, namely that it is a privacy-preserving MAC, suffices to establish that HMAC is a MAC, as long as the hash function meets the very weak requirement of being computationally almost universal; again the value lies in the fact that known attacks do not invalidate the assumptions made.
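
    For reference, the standard HMAC construction under discussion, instantiated here with SHA-256 and cross-checked against Python's standard library.

```python
import hashlib
import hmac  # standard library, used only to cross-check the result

# HMAC(K, m) = H((K' xor opad) || H((K' xor ipad) || m)), where K' is the
# key hashed (if too long) and zero-padded to the hash's block size.
def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    block = 64                                   # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block, b"\x00")
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

assert hmac_sha256(b"k", b"m") == hmac.new(b"k", b"m", hashlib.sha256).digest()
```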

    Random hypergraphs for hashing-based data structures

    This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1, 
, n}. Each object x ∈ S is assigned a small set e(x) ⊆ [n] of locations by a random hash function, independently of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high; successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c below a certain load threshold c* and close to 0 for loads above it. We mainly consider two types of data structures:

    ‱ A cuckoo hash table is a dictionary data structure where each key x ∈ S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes and determine their exact load thresholds: unaligned blocks, double hashing, and a scheme for dynamically growing key sets.

    ‱ A retrieval data structure also stores a value f(x) for each x ∈ S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual: it cannot answer questions of the form "is y ∈ S?". Given a key y it returns a value z; if y ∈ S, then z = f(y) is guaranteed, otherwise z may be an arbitrary value. We consider two new hashing schemes, where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency.

    An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear-time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x ∈ S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Large parts of this thesis therefore examine how the hyperedge distribution and the load affect the probabilities with which these properties hold, and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results with experiments.
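
    A minimal cuckoo hash table with two choices per key gives the flavor of the first data structure above; the thesis's schemes (unaligned blocks, double hashing, dynamically growing key sets) are more elaborate, and the hash functions below are illustrative stand-ins.

```python
import random

# Cuckoo hashing sketch: each key has two candidate slots e(x); on a
# collision, the resident entry is evicted ("kicked") to its other slot.
class CuckooTable:
    def __init__(self, n: int):
        self.n = n
        self.slots = [None] * n
        self.seeds = (random.random(), random.random())  # two hash functions

    def _choices(self, key):
        return [hash((s, key)) % self.n for s in self.seeds]

    def insert(self, key, value, max_kicks: int = 50) -> bool:
        entry = (key, value)
        pos = self._choices(key)[0]
        for _ in range(max_kicks):
            if self.slots[pos] is None:
                self.slots[pos] = entry
                return True
            entry, self.slots[pos] = self.slots[pos], entry  # evict resident
            a, b = self._choices(entry[0])
            pos = b if pos == a else a                       # its other choice
        return False  # eviction chain too long: a rehash would be needed
```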
    • 
