25 research outputs found

    A Fast Single-Key Two-Level Universal Hash Function

    Get PDF
    Universal hash functions based on univariate polynomials are well known, e.g. Poly1305 and GHASH. Using Horner's rule to evaluate such hash functions requires $\ell-1$ field multiplications for hashing a message consisting of $\ell$ blocks, where each block is one field element. A faster method is based on the class of Bernstein-Rabin-Winograd (BRW) polynomials, which require $\lfloor \ell/2 \rfloor$ multiplications and $\lfloor \lg \ell \rfloor$ squarings for $\ell \ge 3$ blocks. Though this is significantly smaller than Horner's rule based hashing, the implementation of BRW polynomials for variable length messages presents significant difficulties. In this work, we propose a two-level hash function where BRW polynomial based hashing is done at the lower level and Horner's rule based hashing is done at the higher level. The BRW polynomial based hashing is applied to a fixed number of blocks, and hence the difficulties in handling variable length messages are avoided. Even though the hash function has two levels, we show that it is sufficient to use a single field element as the hash key. The basic idea is instantiated to propose two new hash functions, one which hashes a single binary string and the other which hashes a vector of binary strings. We describe two actual implementations, one over $\mathbb{F}_{2^{128}}$ and the other over $\mathbb{F}_{2^{256}}$, both using the pclmulqdq instruction available in modern Intel processors. On both the Haswell and Skylake processors, the implementation over $\mathbb{F}_{2^{128}}$ is faster than both an implementation of GHASH by Gueron, and a highly optimised implementation, also by Gueron, of another polynomial based hash function called POLYVAL. We further show that the Fast Fourier Transform based field multiplication over $\mathbb{F}_{2^{256}}$ proposed by Bernstein and Chou can be used to evaluate the new hash function at a cost of at most about 46 bit operations per bit of digest, but, unlike the Bernstein-Chou analysis, there is no hidden cost of generating the hash key. More generally, the new idea of building a two-level hash function having a single field element as the hash key can be applied to other finite fields to build new hash functions.
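
    To make the two-level structure concrete, here is a minimal Python sketch. It works over the prime field with $p=2^{127}-1$ purely for readability (the paper's implementations are over the binary fields $\mathbb{F}_{2^{128}}$ and $\mathbb{F}_{2^{256}}$), and it takes the higher-level Horner key to be a fixed power of the BRW key $k$ only for illustration; the paper's analysis fixes the exact single-key derivation.

```python
# Toy sketch of the two-level BRW/Horner hash over GF(p), p = 2^127 - 1.
# The prime field keeps the arithmetic readable; the paper's actual
# implementations work over GF(2^128) and GF(2^256). Taking the Horner
# key to be k^(2g) below is an illustrative assumption, not the paper's
# exact keying.

P = 2**127 - 1

def brw(m, k):
    """Bernstein-Rabin-Winograd polynomial of the block list m at key k."""
    n = len(m)
    if n == 0:
        return 0
    if n == 1:
        return m[0] % P
    if n == 2:
        return (m[0] * k + m[1]) % P
    if n == 3:
        return ((k + m[0]) * (k * k + m[1]) + m[2]) % P
    t = 4
    while 2 * t <= n:      # largest t in {4, 8, 16, ...} with t <= n < 2t
        t *= 2
    # BRW(m_1..m_n) = BRW(m_1..m_{t-1}) * (k^t + m_t) + BRW(m_{t+1}..m_n)
    left = brw(m[:t - 1], k)
    right = brw(m[t:], k)
    return (left * (pow(k, t, P) + m[t - 1]) + right) % P

def two_level_hash(blocks, k, g=8):
    """BRW on fixed g-block chunks at the lower level, Horner's rule
    across the chunk digests at the higher level."""
    chunks = [blocks[i:i + g] for i in range(0, len(blocks), g)]
    digests = [brw(c, k) for c in chunks]
    kh = pow(k, 2 * g, P)  # assumed Horner key, a power of k
    h = 0
    for d in digests:      # h = (h * kh + d) mod p
        h = (h * kh + d) % P
    return h
```

    Fixing the chunk size is what sidesteps BRW's variable-length difficulties: only the final chunk can be shorter, and each chunk is hashed at the cheap BRW rate of one multiplication per two blocks.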

    Ubiquitous Weak-key Classes of BRW-polynomial Function

    Get PDF
    The BRW-polynomial function has been suggested as a preferred alternative to the usual polynomial function, owing to its high efficiency and seemingly non-existent weak keys. In this paper we investigate the weak-key issue of the BRW-polynomial function as well as of BRW-instantiated cryptographic schemes. Although in BRW-polynomial evaluation the relationship between coefficients and input blocks is indistinct, we give a recursive algorithm which, for any given $(2^{v+1}-1)$-block message, computes another $(2^{v+1}-1)$-block message such that their output differential through BRW-polynomial evaluation equals any given polynomial of degree $s$, where $v\ge\lfloor\log_2(s+1)\rfloor$. With this algorithm, we show that any non-empty key subset is a weak-key class in the BRW-polynomial function. Moreover, any key subset of the BRW-polynomial function consisting of at least 2 keys is a weak-key class in BRW-instantiated cryptographic schemes such as the Wegman-Carter scheme, the UHF-then-PRF scheme, DCT, etc. In particular, in the AE scheme DCT, both confidentiality and integrity collapse totally when such weak keys of the BRW-polynomial function, which are ubiquitous, are used.
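
    The smallest case can be worked by hand. For $v=1$ (3-block messages), $\mathrm{BRW}(m_1,m_2,m_3)=(k+m_1)(k^2+m_2)+m_3=k^3+m_1k^2+m_2k+(m_1m_2+m_3)$, so the coefficients can be steered block by block. The Python sketch below constructs, for any 3-block message and any target differential of degree at most 2, a second 3-block message realising it for every key; it is an illustrative reconstruction of this idea over the prime field $p=2^{127}-1$, not the paper's general recursive algorithm.

```python
# Toy instance (v = 1, 3-block messages) of steering the BRW output
# differential to a chosen polynomial. Worked over GF(p), p = 2^127 - 1,
# for readability; the paper's algorithm handles general (2^(v+1)-1)-block
# messages and is not reproduced here.

P = 2**127 - 1

def brw3(m1, m2, m3, k):
    # BRW(m1, m2, m3) = (k + m1)(k^2 + m2) + m3
    #                 = k^3 + m1*k^2 + m2*k + (m1*m2 + m3)
    return ((k + m1) * (k * k + m2) + m3) % P

def sibling(m, a):
    """Second 3-block message whose BRW differential with m equals
    a[2]*k^2 + a[1]*k + a[0] for every key k."""
    m1, m2, m3 = m
    n1 = (m1 - a[2]) % P                      # fixes the k^2 coefficient
    n2 = (m2 - a[1]) % P                      # fixes the k coefficient
    n3 = (m3 + m1 * m2 - n1 * n2 - a[0]) % P  # fixes the constant term
    return (n1, n2, n3)

# The differential equals a(k) for ANY key, so every key set is "weak".
import random
m = (11, 22, 33)
a = (5, 7, 9)   # a[0] + a[1]*k + a[2]*k^2, indexed by degree
n = sibling(m, a)
for _ in range(4):
    k = random.randrange(P)
    diff = (brw3(*m, k) - brw3(*n, k)) % P
    assert diff == (a[2] * k * k + a[1] * k + a[0]) % P
```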

    Efficient Hardware Implementations of BRW Polynomials and Tweakable Enciphering Schemes

    Get PDF
    A new class of polynomials was introduced by Bernstein (Bernstein 2007), which were later named by Sarkar as Bernstein-Rabin-Winograd (BRW) polynomials (Sarkar 2009). For the purpose of authentication, BRW polynomials offer a considerable computational advantage over usual polynomials: $(m-1)$ multiplications for usual polynomial hashing versus $\lfloor m/2 \rfloor$ multiplications and $\lceil\log_2 m\rceil$ squarings for BRW hashing, where $m$ is the number of message blocks to be authenticated. In this paper, we develop an efficient pipelined hardware architecture for computing BRW polynomials. BRW polynomials have a nice recursive structure which is amenable to parallelization, and while exploring efficient ways to exploit this inherent parallelism we discover some interesting combinatorial structural properties of such polynomials. These are used to design an algorithm that decides the order of the multiplications so as to minimize pipeline delays. Using these structural properties, we present a hardware architecture for efficient computation of BRW polynomials. Finally, we provide implementations of the tweakable enciphering schemes proposed in Sarkar 2009 which use BRW polynomials. This leads to the fastest known implementation of disk encryption systems.
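
    To see where the pipeline parallelism comes from, the sketch below walks the BRW recursion tree and lists its field multiplications grouped by dependency depth: multiplications at the same depth are mutually independent and can be issued back to back into a pipelined multiplier. The depth grouping is a simplification for illustration and is not the paper's scheduling algorithm; the squarings $k^2, k^4, \dots$ are assumed precomputed and are not listed.

```python
# Sketch: enumerate the field multiplications in a BRW evaluation of m
# blocks, grouped by dependency depth. Same-depth multiplications are
# independent, which is the parallelism a pipelined multiplier exploits.
# This is a simplified illustration, not the paper's scheduler.

def brw_mults(lo, hi):
    """Yield (depth, description) for each multiplication in
    BRW(m_lo, ..., m_hi); blocks are 1-indexed, hi < lo means empty."""
    n = hi - lo + 1
    if n <= 1:
        return
    if n == 2:
        yield (0, f"m{lo} * k")
        return
    if n == 3:
        yield (0, f"(k + m{lo}) * (k^2 + m{lo + 1})")
        return
    t = 4
    while 2 * t <= n:          # largest t in {4, 8, 16, ...} with t <= n
        t *= 2
    left = list(brw_mults(lo, lo + t - 2))   # BRW of the first t-1 blocks
    right = list(brw_mults(lo + t, hi))      # BRW of the tail
    yield from left
    yield from right
    d = 1 + max([dep for dep, _ in left + right], default=-1)
    yield (d, f"BRW(m{lo}..m{lo + t - 2}) * (k^{t} + m{lo + t - 1})")

m = 11
sched = sorted(brw_mults(1, m))
assert len(sched) == m // 2    # floor(m/2) multiplications in total
for depth, op in sched:
    print(depth, op)
```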

    A New Multi-Linear Universal Hash Family

    Get PDF
    A new universal hash family is described. Messages are sequences over a finite field $\mathbb{F}_q$ while keys are sequences over an extension field $\mathbb{F}_{q^n}$. A linear map $\psi$ from $\mathbb{F}_{q^n}$ to itself is used to compute the output digest. Of special interest is the case $q=2$. For this case, we show that there is an efficient way to implement $\psi$ using a tower field representation of $\mathbb{F}_{q^n}$. From a practical point of view, the focus of our constructions is small hardware and other resource constrained applications. For such platforms, our constructions compare favourably to previous work.
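
    The multi-linear shape underlying such families is simple to state for $q=2$: message bits select key elements of $\mathbb{F}_{2^n}$, whose sum (XOR) is then passed through the linear map $\psi$. The Python sketch below shows this shape with $\psi$ left as a placeholder identity map; the paper's contribution, an efficient tower-field implementation of a specific $\psi$, is not reproduced.

```python
# Toy multi-linear hash for q = 2: elements of F_{2^n} are n-bit ints,
# addition is XOR, and a linear map psi is applied to the digest. The
# identity psi below is only a placeholder for the paper's tower-field
# construction.

import secrets

N = 64  # n, the extension degree; F_{2^64} elements as 64-bit ints

def psi(x):
    # Placeholder for the paper's F_2-linear map on F_{2^n}.
    return x

def multilinear_hash(msg_bits, key):
    """Digest psi(sum_i m_i * K_i); over F_2 each product m_i * K_i is
    either 0 or K_i, so the sum is an XOR of selected key elements."""
    acc = 0
    for m_i, k_i in zip(msg_bits, key):
        if m_i:
            acc ^= k_i
    return psi(acc)

key = [secrets.randbits(N) for _ in range(128)]  # one key element per bit
digest = multilinear_hash([1, 0, 1, 1], key)
```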

    Polynomial Hashing over Prime Order Fields

    Get PDF
    This paper makes a comprehensive study of two important strategies for polynomial hashing over a prime order field $\mathbb{F}_p$, namely usual polynomial based hashing and hashing based on Bernstein-Rabin-Winograd (BRW) polynomials, and the various ways to combine them. Several hash functions are proposed and upper bounds on their differential probabilities are derived. Concrete instantiations are provided for the primes $p=2^{127}-1$ and $p=2^{130}-5$. A major contribution of the paper is an extensive 64-bit implementation of all the proposed hash functions in assembly targeted at modern Intel processors. The timing results suggest that using the prime $2^{127}-1$ is significantly faster than using the prime $2^{130}-5$. Further, a judicious mix of usual polynomial based hashing and BRW-polynomial based hashing can provide a significantly faster alternative to only usual polynomial based hashing. In particular, the timing results of our implementations show that our final hash function proposal for the prime $2^{127}-1$ is much faster than the well known Poly1305 hash function defined over the prime $2^{130}-5$, achieving speed improvements of up to 40%.
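
    The speed advantage of $2^{127}-1$ comes largely from its Mersenne shape: reduction is a shift, a mask and an add, with no multi-word carry chains. Below is a minimal Python sketch of usual polynomial (Horner) hashing over this prime; the 15-byte block encoding and the absence of key clamping are simplifications and do not follow the paper's concrete instantiations.

```python
# Sketch of Horner hashing over p = 2^127 - 1. The Mersenne prime gives
# a cheap reduction via 2^127 ≡ 1 (mod p). Block encoding and padding
# are simplified relative to the paper's instantiations.

P127 = 2**127 - 1

def red127(x):
    """Reduce x (x < 2^254) mod p = 2^127 - 1 by folding the high bits
    twice: x = hi*2^127 + lo ≡ hi + lo (mod p)."""
    x = (x >> 127) + (x & P127)
    x = (x >> 127) + (x & P127)
    return x - P127 if x >= P127 else x

def horner_hash(message, k):
    """Hash 15-byte blocks (each < 2^120 < p) with Horner's rule:
    h = ((m1*k + m2)*k + m3)*k + ..., all mod p."""
    h = 0
    for i in range(0, len(message), 15):
        block = int.from_bytes(message[i:i + 15], "little")
        h = red127(red127(h * k) + block)
    return h
```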

    Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms

    Get PDF
    This thesis presents design and implementation approaches for parallel algorithms of computer algebra. We use algorithmic skeletons as well as further approaches, such as data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and for some special parallel loops that we call 'repeated computation with a possibility of premature termination'. We also introduce a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms; for these, our arithmetic provides a generic parallelisation approach. The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language, and it allows us to refrain from using two different languages, one for the implementation and one for the interface, in our implementation of computer algebra algorithms.
    Further, this thesis presents methods for the evaluation and estimation of parallel execution times. We partition the parallel execution time into two components: one accounts for the quality of the parallelisation, which we call the 'parallel penalty'; the other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations while using drastically fewer measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis, and we have also used existing estimation methods.
    We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen's matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution, which leads to a further fast multiplication algorithm. Specifically for our implementation of Strassen's algorithm, we designed and implemented a divide and conquer skeleton based on actors. For the parallel fast Fourier transform, we not only used new divide and conquer skeletons but also developed a map-and-transpose skeleton, which enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction, an approach known from the literature. We also performed execution time estimations of our divide and conquer programs.
    This thesis further presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, like parMap, farm and workpool, with a premature termination property. We use this to implement the so-called 'parallel repeated computation', a special form of a speculative parallel loop. We have implemented two probabilistic primality tests, the Rabin–Miller test and the Jacobi sum test, and parallelised both with our approach. We analysed the task distribution and determined fitting configurations for the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel; subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements.
    The parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation are our original contributions. The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner; this is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised determinant computation using Gauß elimination. As always, we have performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables the parallelisation of entire classes of computer algebra algorithms. Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms.
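
    The divide and conquer skeleton pattern at the heart of these implementations can be illustrated compactly. The Python sketch below mirrors the structure only: a higher-order skeleton parameterised by trivial, solve, divide and combine, instantiated here with Karatsuba integer multiplication. In the thesis the skeletons are written in Eden, where the recursive calls become parallel processes; the sequential sketch and its parameter names are illustrative, not the thesis's API.

```python
# Divide-and-conquer skeleton with Karatsuba multiplication as the
# instance. Sequential Python standing in for the thesis's Eden
# skeletons, where the recursive calls run as parallel processes.

def dc(trivial, solve, divide, combine, problem):
    """Generic divide-and-conquer skeleton."""
    if trivial(problem):
        return solve(problem)
    subresults = [dc(trivial, solve, divide, combine, p)
                  for p in divide(problem)]
    return combine(problem, subresults)

def karatsuba(x, y):
    """3-multiplication Karatsuba integer product via the skeleton."""
    def trivial(p):
        a, b = p
        return a < 2**32 or b < 2**32

    def solve(p):
        a, b = p
        return a * b

    def divide(p):
        a, b = p
        h = max(a.bit_length(), b.bit_length()) // 2
        a1, a0 = a >> h, a & ((1 << h) - 1)
        b1, b0 = b >> h, b & ((1 << h) - 1)
        # three subproducts instead of four
        return [(a1, b1), (a0, b0), (a1 + a0, b1 + b0)]

    def combine(p, rs):
        a, b = p
        h = max(a.bit_length(), b.bit_length()) // 2
        hi, lo, mid = rs  # a1*b1, a0*b0, (a1+a0)*(b1+b0)
        return (hi << (2 * h)) + ((mid - hi - lo) << h) + lo

    return dc(trivial, solve, divide, combine, (x, y))

assert karatsuba(12345678901234567890, 98765432109876543210) == \
       12345678901234567890 * 98765432109876543210
```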

    Small Stretch Problem of the DCT Scheme and How to Fix it

    Get PDF
    DCT is a beyond-birthday-bound (BBB) deterministic authenticated encryption (DAE) mode proposed by Forler et al. at ACISP 2016, ensuring integrity by redundancy. The instantiation of DCT employs the BRW polynomial, which is more efficient than the usual polynomial function in GCM, reducing the number of multiplication operations by half. However, we show that DCT suffers from a small stretch problem similar to that of GCM. When the stretch length $\tau$ is small, choosing a special $m$-block message allows us to reduce the number of queries required for a successful forgery to $\mathcal{O}(2^{\tau}/m)$. We emphasize that this attack efficiently balances space and time complexity, but does not contradict the security bounds of DCT. Finally, we propose an improved scheme named Robust DCT (RDCT) with a minor change to DCT, which improves the security when $\tau$ is small and resists the above attack.
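
    One plausible reading of the query count, in the style of Ferguson's small-tag analysis of truncated GCM tags and offered here only as a rough reconstruction rather than the paper's exact attack: a crafted $m$-block message differential induces a hash differential that is a polynomial of degree about $m$ in the hash key, so a single forgery attempt succeeds whenever the key hits one of up to $m$ roots surviving the $\tau$-bit truncation, giving

$$\Pr[\text{one attempt succeeds}] \approx \frac{m}{2^{\tau}} \quad\Longrightarrow\quad \mathbb{E}[\text{queries to forge}] \approx \frac{2^{\tau}}{m}.$$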

    Part I:

    Get PDF