1,145 research outputs found

    Efficient Algorithms for Elliptic Curve Cryptosystems

    Get PDF
    Elliptic curves are the basis for a relative new class of public-key schemes. It is predicted that elliptic curves will replace many existing schemes in the near future. It is thus of great interest to develop algorithms which allow efficient implementations of elliptic curve crypto systems. This thesis deals with such algorithms. Efficient algorithms for elliptic curves can be classified into low-level algorithms, which deal with arithmetic in the underlying finite field and high-level algorithms, which operate with the group operation. This thesis describes three new algorithms for efficient implementations of elliptic curve cryptosystems. The first algorithm describes the application of the Karatsuba-Ofman Algorithm to multiplication in composite fields GF((2n)m). The second algorithm deals with efficient inversion in composite Galois fields of the form GF((2n)m). The third algorithm is an entirely new approach which accelerates the multiplication of points which is the core operation in elliptic curve public-key systems. The algorithm explores computational advantages by computing repeated point doublings directly through closed formulae rather than from individual point doublings. Finally we apply all three algorithms to an implementation of an elliptic curve system over GF((216)11). We provide ablolute performance measures for the field operations and for an entire point multiplication. We also show the improvements gained by the new point multiplication algorithm in conjunction with the k-ary and improved k-ary methods for exponentiation

    Private and Public-Key Side-Channel Threats Against Hardware Accelerated Cryptosystems

    Get PDF
    Modern side-channel attacks (SCA) have the ability to reveal sensitive data from non-protected hardware implementations of cryptographic accelerators whether they be private or public-key systems. These protocols include but are not limited to symmetric, private-key encryption using AES-128, 192, 256, or public-key cryptosystems using elliptic curve cryptography (ECC). Traditionally, scalar point (SP) operations are compelled to be high-speed at any cost to reduce point multiplication latency. The majority of high-speed architectures of contemporary elliptic curve protocols rely on non-secure SP algorithms. This thesis delivers a novel design, analysis, and successful results from a custom differential power analysis attack on AES-128. The resulting SCA can break any 16-byte master key the sophisticated cipher uses and it\u27s direct applications towards public-key cryptosystems will become clear. Further, the architecture of a SCA resistant scalar point algorithm accompanied by an implementation of an optimized serial multiplier will be constructed. The optimized hardware design of the multiplier is highly modular and can use either NIST approved 233 & 283-bit Kobliz curves utilizing a polynomial basis. The proposed architecture will be implemented on Kintex-7 FPGA to later be integrated with the ARM Cortex-A9 processor on the Zynq-7000 AP SoC (XC7Z045) for seamless data transfer and analysis of the vulnerabilities SCAs can exploit

    Throughput/Area-efficient ECC Processor Using Montgomery Point Multiplication on FPGA

    Get PDF
    High throughput while maintaining low resource is a key issue for elliptic curve cryptography (ECC) hardware implementations in many applications. In this brief, an ECC processor architecture over Galois fields is presented, which achieves the best reported throughput/area performance on field-programmable gate array (FPGA) to date. A novel segmented pipelining digit serial multiplier is developed to speed up ECC point multiplication. To achieve low latency, a new combined algorithm is developed for point addition and point doubling with careful scheduling. A compact and flexible distributed-RAM-based memory unit design is developed to increase speed while keeping area low. Further optimizations were made via timing constraints and logic level modifications at the implementation level. The proposed architecture is implemented on Virtex4 (V4), Virtex5 (V5), and Virtex7 (V7) FPGA technologies and, respectively, achieved throughout/slice figures of 19.65, 65.30, and 64.48 (106/(Seconds × Slices))

    High Speed and Low Latency ECC Implementation over GF(2m) on FPGA

    Get PDF
    In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency resulting in a drastic reduction in the number of clock cycles (CCs) required. The proposed ECC architecture has been implemented on Xilinx FPGAs' Virtex4, Virtex5, and Virtex7 families. To the best of our knowledge, our single- and three-multiplier-based designs show the fastest performance to date when compared with reported works individually. Our one-multiplier-based ECC processor also achieves the highest reported speed together with the best reported area-time performance on Virtex4 (5.32 μs at 210 MHz), on Virtex5 (4.91 μs at 228 MHz), and on the more advanced Virtex7 (3.18 μs at 352 MHz). Finally, the proposed three-multiplier-based ECC implementation is the first work reporting the lowest number of CCs and the fastest ECC processor design on FPGA (450 CCs to get 2.83 μs on Virtex7)

    Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2<sup>m</sup>)

    Get PDF
    Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on EEA (Extended Euclidean Algorithm) are particularly suitable for systolization. Implementations based on EEA present a high degree of parallelism and pipelinability at bit level which can be easily optimized to achieve local data flow and to eliminate the global interconnects which represent most important bottleneck in todays sub-micron design process. The net result is to have high clock rate and performance based on efficient systolic architectures. This thesis examines high performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2m) in the specific case of cryptographic applications where field dimension m may be very large (greater than 400) and either m or defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representation are studied and most importantly variants of EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants are discussed. As a result a generalized and optimized variant of EEA is proposed which can compute division, and multiplicative inversion as its subset, with divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion is provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of EEA is implemented. Its complexity measures are defined and these are compared against the best known architectures. It is shown that assuming the requirements specified above, this proposed architecture may achieve a higher clock rate performance w. r. t. other designs while being more flexible, reliable and with minimum number of inter-cell interconnects. The main contribution at system level architecture is the substitution of all counter or adder/subtractor elements with a simpler distributed and free of carry propagation delays structure. Further a novel restoring mechanism for result sequences of EEA is proposed using a double delay element implementation. Finally, using this systolic architecture a CMD (Combined Multiplier Divider) datapath is designed which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication which results in having a very small control unit and negligible with respect to the datapath for all practical values of m. The throughput of this EC based on this bit serial systolic architecture is comparable with designs many times larger than itself reported previously

    Large substitution boxes with efficient combinational implementations

    Get PDF
    At a fundamental level, the security of symmetric key cryptosystems ties back to Claude Shannon\u27s properties of confusion and diffusion. Confusion can be defined as the complexity of the relationship between the secret key and ciphertext, and diffusion can be defined as the degree to which the influence of a single input plaintext bit is spread throughout the resulting ciphertext. In constructions of symmetric key cryptographic primitives, confusion and diffusion are commonly realized with the application of nonlinear and linear operations, respectively. The Substitution-Permutation Network design is one such popular construction adopted by the Advanced Encryption Standard, among other block ciphers, which employs substitution boxes, or S-boxes, for nonlinear behavior. As a result, much research has been devoted to improving the cryptographic strength and implementation efficiency of S-boxes so as to prohibit cryptanalysis attacks that exploit weak constructions and enable fast and area-efficient hardware implementations on a variety of platforms. To date, most published and standardized S-boxes are bijective functions on elements of 4 or 8 bits. In this work, we explore the cryptographic properties and implementations of 8 and 16 bit S-boxes. We study the strength of these S-boxes in the context of Boolean functions and investigate area-optimized combinational hardware implementations. We then present a variety of new 8 and 16 bit S-boxes that have ideal cryptographic properties and enable low-area combinational implementations

    Efficient algorithms for pairing-based cryptosystems

    Get PDF
    We describe fast new algorithms to implement recent cryptosystems based on the Tate pairing. In particular, our techniques improve pairing evaluation speed by a factor of about 55 compared to previously known methods in characteristic 3, and attain performance comparable to that of RSA in larger characteristics.We also propose faster algorithms for scalar multiplication in characteristic 3 and square root extraction over Fpm, the latter technique being also useful in contexts other than that of pairing-based cryptography
    corecore