
    Faster binary-field multiplication and faster binary-field MACs

    This paper shows how to securely authenticate messages using just 29 bit operations per authenticated bit, plus a constant overhead per message. The authenticator is a standard type of "universal" hash function providing information-theoretic security; what is new is computing this type of hash function at very high speed. At a lower level, this paper shows how to multiply two elements of a field of size 2^128 using just 9062 ≈ 71 × 128 bit operations, and how to multiply two elements of a field of size 2^256 using just 22164 ≈ 87 × 256 bit operations. This performance relies on a new representation of field elements and new FFT-based multiplication techniques. This paper's constant-time software uses just 1.89 Core 2 cycles per byte to authenticate very long messages. On a Sandy Bridge it takes 1.43 cycles per byte, without using Intel's PCLMULQDQ polynomial-multiplication hardware. This is much faster than the speed records for constant-time implementations of GHASH without PCLMULQDQ (over 10 cycles/byte), even faster than Intel's best Sandy Bridge implementation of GHASH with PCLMULQDQ (1.79 cycles/byte), and almost as fast as state-of-the-art 128-bit prime-field MACs using Intel's integer-multiplication hardware (around 1 cycle/byte). Keywords: Performance, FFTs, Polynomial multiplication, Universal hashing, Message authentication
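
    As a point of reference for the operation counts above (9062/128 ≈ 70.8, hence the ≈ 71 bit operations per bit), the Python sketch below shows plain schoolbook carry-less multiplication in a field of size 2^128, followed by a GHASH-style polynomial-evaluation hash built on top of it. It is only a baseline illustration under assumptions not taken from the paper: the reduction polynomial x^128 + x^7 + x^2 + x + 1 is assumed, and nothing here uses the paper's new field representation or FFT-based techniques.

    # Baseline sketch only: schoolbook carry-less multiplication in GF(2^128),
    # with the GHASH reduction polynomial assumed; not the paper's FFT method.

    GHASH_POLY = (1 << 128) | (1 << 7) | (1 << 2) | (1 << 1) | 1

    def clmul(a: int, b: int) -> int:
        """Carry-less (XOR) product of two polynomials over GF(2)."""
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            b >>= 1
        return result

    def gf2_128_mul(a: int, b: int) -> int:
        """Multiply two field elements, folding high bits back via GHASH_POLY."""
        product = clmul(a, b)                      # up to 255 bits wide
        for i in range(254, 127, -1):              # clear bit i using the polynomial
            if (product >> i) & 1:
                product ^= GHASH_POLY << (i - 128)
        return product

    def poly_hash(key: int, blocks: list[int]) -> int:
        """Horner evaluation of a polynomial-evaluation universal hash at `key`."""
        acc = 0
        for m in blocks:
            acc = gf2_128_mul(acc ^ m, key)
        return acc

    A complete MAC of the kind discussed above would additionally mask the hash output with a per-message pad; that step is omitted here.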

    Comparison of the tally numbering system to traditional arithmetic systems in field programmable gate arrays

    This research explores the use of heterogeneous computing platforms for machine learning, as well as different neural network architectures. These platforms and architectures can be used to accelerate the complex operations required for machine learning, more specifically neural networks. The use of different architectures implementing different numbering and arithmetic systems is explored in hopes of accelerating mathematical functions. The heterogeneous computing platform explored in this thesis is a Field Programmable Gate Array (FPGA), specifically an SoC/FPGA, which combines an ARM CPU and an FPGA on the same chip. FPGAs are unique because they are low-power, highly customizable hardware with bit-level control. A new numbering system, called Tally, can be simulated in software but can be implemented in hardware only with an FPGA, without designing a time-consuming and expensive ASIC (application-specific integrated circuit). This thesis explores two different neural networks: the first is a simple XOR (exclusive OR) gate neural network tested with the Tally system, 16-bit fixed point, and 32-bit floating point; the second uses the MNIST (handwritten digits) dataset with a pre-trained multi-layer perceptron in both 16-bit fixed point and 32-bit floating point. This study acts as a preliminary exploration of the Tally system and an exercise in implementing neural networks in hardware.
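
    Since the Tally system itself is not described in the abstract, the sketch below only illustrates the conventional side of the comparison: evaluating a tiny pre-trained 2-2-1 XOR perceptron with 16-bit fixed-point (Q8.8) arithmetic of the kind the thesis compares against 32-bit floating point. The Q8.8 format and all weights are illustrative assumptions, not values from the thesis.

    # Hedged sketch, not the thesis's Tally implementation: a tiny pre-trained
    # 2-2-1 XOR perceptron evaluated with 16-bit fixed-point (Q8.8) arithmetic.
    # The Q8.8 format and all weights below are illustrative assumptions.

    FRAC_BITS = 8                  # Q8.8: 8 integer bits, 8 fractional bits
    SCALE = 1 << FRAC_BITS

    def to_fix(x: float) -> int:
        """Quantize a float to a signed Q8.8 value (fits in 16 bits)."""
        return int(round(x * SCALE))

    def fix_mul(a: int, b: int) -> int:
        """Fixed-point multiply: full-width product, then rescale to Q8.8."""
        return (a * b) >> FRAC_BITS

    def step(x: int) -> int:
        """Hard-threshold activation: 1.0 (in Q8.8) if x > 0, else 0.0."""
        return SCALE if x > 0 else 0

    # Illustrative weights solving XOR: one OR-like and one NAND-like hidden
    # unit, combined by an AND-like output unit.
    W_HIDDEN = [[to_fix(1.0), to_fix(1.0)],
                [to_fix(-1.0), to_fix(-1.0)]]
    B_HIDDEN = [to_fix(-0.5), to_fix(1.5)]
    W_OUT = [to_fix(1.0), to_fix(1.0)]
    B_OUT = to_fix(-1.5)

    def xor_net(x0: float, x1: float) -> int:
        """Run the fixed-point XOR network; returns 0 or 1."""
        xs = [to_fix(x0), to_fix(x1)]
        hidden = [step(sum(fix_mul(w, x) for w, x in zip(row, xs)) + b)
                  for row, b in zip(W_HIDDEN, B_HIDDEN)]
        out = step(sum(fix_mul(w, h) for w, h in zip(W_OUT, hidden)) + B_OUT)
        return out // SCALE

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "->", xor_net(a, b))       # prints 0, 1, 1, 0

    Swapping to_fix and fix_mul for 32-bit floating-point operations, or for a Tally-style representation on an FPGA, changes only the arithmetic layer; that is the kind of comparison the thesis describes.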