860 research outputs found
Synthesis Optimization on Galois-Field Based Arithmetic Operators for Rijndael Cipher
A series of experiments has been conducted to show that FPGA synthesis of Galois-Field (GF) based arithmetic operators can be optimized automatically to improve Rijndael Cipher throughput. Moreover, it has been demonstrated that efficiency improvement in GF operators does not directly correspond to the system performance at application level. The experiments were motivated by so many research works that focused on improving performance of GF operators. Each of the variants has the most efficient form in either time (fastest) or space (smallest occupied area) when implemented in FPGA chips. In fact, GF operators are not utilized individually, but rather integrated one to the others to implement algorithms. Contribution of this paper is to raise issue on GF-based application performance and suggest alternative aspects that potentially affect it. Instead of focusing on GF operator efficiency, system characteristics are worth considered in optimizing application performance
Large substitution boxes with efficient combinational implementations
At a fundamental level, the security of symmetric key cryptosystems ties back to Claude Shannon\u27s properties of confusion and diffusion. Confusion can be defined as the complexity of the relationship between the secret key and ciphertext, and diffusion can be defined as the degree to which the influence of a single input plaintext bit is spread throughout the resulting ciphertext. In constructions of symmetric key cryptographic primitives, confusion and diffusion are commonly realized with the application of nonlinear and linear operations, respectively. The Substitution-Permutation Network design is one such popular construction adopted by the Advanced Encryption Standard, among other block ciphers, which employs substitution boxes, or S-boxes, for nonlinear behavior. As a result, much research has been devoted to improving the cryptographic strength and implementation efficiency of S-boxes so as to prohibit cryptanalysis attacks that exploit weak constructions and enable fast and area-efficient hardware implementations on a variety of platforms. To date, most published and standardized S-boxes are bijective functions on elements of 4 or 8 bits. In this work, we explore the cryptographic properties and implementations of 8 and 16 bit S-boxes. We study the strength of these S-boxes in the context of Boolean functions and investigate area-optimized combinational hardware implementations. We then present a variety of new 8 and 16 bit S-boxes that have ideal cryptographic properties and enable low-area combinational implementations
Analysis of Parallel Montgomery Multiplication in CUDA
For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared
High Speed and Low Latency ECC Implementation over GF(2m) on FPGA
In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency resulting in a drastic reduction in the number of clock cycles (CCs) required. The proposed ECC architecture has been implemented on Xilinx FPGAs' Virtex4, Virtex5, and Virtex7 families. To the best of our knowledge, our single- and three-multiplier-based designs show the fastest performance to date when compared with reported works individually. Our one-multiplier-based ECC processor also achieves the highest reported speed together with the best reported area-time performance on Virtex4 (5.32 μs at 210 MHz), on Virtex5 (4.91 μs at 228 MHz), and on the more advanced Virtex7 (3.18 μs at 352 MHz). Finally, the proposed three-multiplier-based ECC implementation is the first work reporting the lowest number of CCs and the fastest ECC processor design on FPGA (450 CCs to get 2.83 μs on Virtex7)
Affine-Power S-Boxes over Galois Fields with Area-Optimized Logic Implementations
Cryptographic S-boxes are fundamental in key-iterated sub- stitution permutation network (SPN) designs for block ciphers. As a natural way for realizing Shannon’s confusion and diffusion properties in cryptographic primitives through nonlinear and linear behavior, re- spectively, SPN designs served as the basis for the Advanced Encryption Standard and a variety of other block ciphers. In this work we present a methodology for minimizing the logic resources for n-bit affine-power S- boxes over Galois fields based on measurable security properties and find- ing corresponding area-efficient combinational implementations in hard- ware. Motivated by the potential need for new and larger S-boxes, we use our methodology to find area-optimized circuits for 8- and 16-bit S-boxes. Our methodology is capable of finding good upper bounds on the number of XOR and AND gate equivalents needed for these circuits, which can be further optimized using modern CAD tools
Decoding Generalized Reed-Solomon Codes and Its Application to RLCE Encryption Schemes
This paper compares the efficiency of various algorithms for implementing
quantum resistant public key encryption scheme RLCE on 64-bit CPUs. By
optimizing various algorithms for polynomial and matrix operations over finite
fields, we obtained several interesting (or even surprising) results. For
example, it is well known (e.g., Moenck 1976 \cite{moenck1976practical}) that
Karatsuba's algorithm outperforms classical polynomial multiplication algorithm
from the degree 15 and above (practically, Karatsuba's algorithm only
outperforms classical polynomial multiplication algorithm from the degree 35
and above ). Our experiments show that 64-bit optimized Karatsuba's algorithm
will only outperform 64-bit optimized classical polynomial multiplication
algorithm for polynomials of degree 115 and above over finite field
. The second interesting (surprising) result shows that 64-bit
optimized Chien's search algorithm ourperforms all other 64-bit optimized
polynomial root finding algorithms such as BTA and FFT for polynomials of all
degrees over finite field . The third interesting (surprising)
result shows that 64-bit optimized Strassen matrix multiplication algorithm
only outperforms 64-bit optimized classical matrix multiplication algorithm for
matrices of dimension 750 and above over finite field . It should
be noted that existing literatures and practices recommend Strassen matrix
multiplication algorithm for matrices of dimension 40 and above. All our
experiments are done on a 64-bit MacBook Pro with i7 CPU and single thread C
codes. It should be noted that the reported results should be appliable to 64
or larger bits CPU architectures. For 32 or smaller bits CPUs, these results
may not be applicable. The source code and library for the algorithms covered
in this paper are available at http://quantumca.org/
- …