292 research outputs found

    Parallelizing GF(P) Elliptic Curve Cryptography Computations for Security and Speed

    Elliptic curve cryptography can be viewed as two levels of computation: an upper scalar multiplication level and a lower point operations level. We combine the inherent parallelism of both levels to reduce delay and improve security against simple power analysis attacks. The best security and speed performance is achieved when the computation is parallelized into eight parallel multiplication operations. This strategy is worth considering given its attractive performance results.
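
The two levels described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's parallel architecture: the toy curve y^2 = x^3 + 2x + 2 over GF(17), the function names, and the double-and-add-always schedule are all assumptions chosen for illustration. Performing both a doubling and an addition for every key bit is one common way to make the operation sequence independent of the key, which is the property simple power analysis tries to exploit.

```python
# Toy parameters (assumed for illustration): y^2 = x^3 + 2x + 2 over GF(17).
P_MOD, A = 17, 2
INF = None  # point at infinity


def point_add(p1, p2):
    """Lower level: affine point addition (or doubling) over GF(P_MOD)."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult_always(k, point, bits):
    """Upper level: double-and-add-always over a fixed number of key bits."""
    acc = INF
    for i in reversed(range(bits)):
        doubled = point_add(acc, acc)
        added = point_add(doubled, point)  # computed for every bit, used or not
        # Note: this Python selection is not constant-time; hardware would use a mux.
        acc = added if (k >> i) & 1 else doubled
    return acc
```

For example, `scalar_mult_always(2, (5, 1), 5)` doubles the point (5, 1) on this toy curve. Requires Python 3.8 or later for the modular inverse via `pow(x, -1, p)`.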

    High Speed Low Power GF(2^k) Elliptic Curve Cryptography Processor Architecture

    This paper proposes a new elliptic curve cryptographic processor architecture. It offers a choice of operating point depending on the relative importance of speed and power consumption. This flexibility is achieved by exploiting the natural parallelism in the elliptic curve point operations. Scalable multipliers, rather than conventional parallel multipliers, are adopted to offset the extra hardware that parallelism requires. The paper shows that this parallelism can be exploited either to increase the speed of operation or to reduce power consumption by lowering the operating frequency.

    A low-complexity Edward-Curve point multiplication architecture

    Binary Edwards Curves (BEC) are becoming increasingly important compared to other forms of elliptic curves, thanks to their faster operations and resistance against side-channel attacks. This work provides a low-complexity architecture for point multiplication using BEC over GF(2^233). The article makes three major contributions. The first is the reduction of instruction-level complexity for the unified point addition and point doubling laws by eliminating multiple operations in a single instruction format. The second is the optimization of hardware resources by minimizing the number of required storage elements. The third is the reduction of the number of required clock cycles by incorporating a 32-bit finite field digit-parallel multiplier in the datapath. As a result, the throughput-over-area ratios achieved over GF(2^233) on Virtex-4, Virtex-5, Virtex-6 and Virtex-7 Xilinx FPGA (Field Programmable Gate Array) devices are 2.29, 19.49, 21.5 and 20.82, respectively. Furthermore, on the Virtex-7 device, one point multiplication takes 18 µs, while the power consumption is 266 mW. This shows that the proposed architecture is best suited to applications that require area and throughput to be optimized at the same time.

    Bit-parallel word-serial polynomial basis finite field multiplier in GF(2^233)

    Smart cards are widely used as cryptographic hardware in everyday security applications. The characteristics of a smart card require that its cryptographic hardware trade off area against speed. There are two main public-key cryptosystems: the RSA cryptosystem and the elliptic curve (EC) cryptosystem. EC has many advantages over RSA, such as a shorter key length and better suitability for VLSI implementation. These advantages make EC an ideal candidate for smart cards. The finite field multiplier is the key component in EC hardware. In this thesis, bit-parallel word-serial (BPWS) polynomial basis (PB) finite field multipliers are designed. Such architectures trade off area against speed and are very useful for smart cards. An ASIC chip that performs finite field multiplication and finite field squaring using the BPWS PB finite field multiplier is designed in this thesis. The proposed circuit has been implemented using TSMC 0.18 µm CMOS technology. A novel 8 x 233 bit-parallel partial product generator is also designed. This new partial product generator has low circuit complexity. The design algorithm can be easily extended to a w x m bit-parallel partial product generator for GF(2^m). Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .T36. Source: Masters Abstracts International, Volume: 43-01, page: 0286. Advisers: H. Wu; M. Ahmadi. Thesis (M.A.Sc.)--University of Windsor (Canada), 2004.
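
The word-serial idea in this abstract can be sketched in software as follows. This is a hedged illustration rather than the thesis's circuit: the function name, the integer bit-vector encoding, and the most-significant-word-first schedule are assumptions. Each loop iteration consumes one w-bit word of the second operand, forms a w x m partial product (the part a bit-parallel generator would compute in one step), and reduces modulo the field polynomial.

```python
def gf2m_mul_word_serial(a, b, m, poly, w):
    """Multiply a*b in GF(2^m), consuming w bits of b per step (MSB word first).

    Field elements are Python ints used as bit vectors in polynomial basis;
    poly is the irreducible polynomial including its x^m term.
    """
    def reduce(v):
        # Clear every bit at position >= m by XOR-ing in a shifted copy of poly.
        for i in range(v.bit_length() - 1, m - 1, -1):
            if (v >> i) & 1:
                v ^= poly << (i - m)
        return v

    n_words = -(-m // w)  # ceil(m / w)
    acc = 0
    for j in reversed(range(n_words)):
        word = (b >> (j * w)) & ((1 << w) - 1)
        # Partial product of a with one w-bit word of b.
        partial = 0
        for t in range(w):
            if (word >> t) & 1:
                partial ^= a << t
        # Shift the accumulator by one word, add the partial product, reduce.
        acc = reduce((acc << w) ^ partial)
    return acc
```

With m = 8, poly = 0x11B and any word size, multiplying 0x57 by 0x83 gives 0xC1, the standard AES field example. For the 8 x 233 case one could call it with w = 8, m = 233 and, as an assumption, the NIST polynomial x^233 + x^74 + 1; the abstract does not state which polynomial the thesis uses.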

    Hardware processors for pairing-based cryptography

    Bilinear pairings can be used to construct cryptographic systems with very desirable properties. A pairing performs a mapping on members of groups on elliptic and genus 2 hyperelliptic curves to an extension of the finite field on which the curves are defined. The finite fields must, however, be large to ensure adequate security. The complicated group structure of the curves and the expensive field operations result in time-consuming computations that are an impediment to the practicality of pairing-based systems. The Tate pairing can be computed efficiently using the η_T method. Hardware architectures can be used to accelerate the required operations by exploiting the parallelism inherent to the algorithmic and finite field calculations. The Tate pairing can be performed on elliptic curves of characteristic 2 and 3 and on genus 2 hyperelliptic curves of characteristic 2. Curve selection is dependent on several factors including desired computational speed, the area constraints of the target device and the required security level. In this thesis, custom hardware processors for the acceleration of the Tate pairing are presented and implemented on an FPGA. The underlying hardware architectures are designed with care to exploit available parallelism while ensuring resource efficiency. The characteristic 2 elliptic curve processor contains novel units that return a pairing result in a very low number of clock cycles. Despite the more complicated computational algorithm, the speed of the genus 2 processor is comparable. Pairing computation on each of these curves can be appealing in applications with various attributes. A flexible processor that can perform pairing computation on elliptic curves of characteristic 2 and 3 has also been designed. An integrated hardware/software design and verification environment has been developed. This system automates the procedures required for robust processor creation and enables the rapid provision of solutions for a wide range of cryptographic applications.

    A high performance pseudo-multi-core elliptic curve cryptographic processor over GF(2^163)

    The elliptic curve cryptosystem is one type of public-key system, and it can guarantee the same security level as Rivest-Shamir-Adleman (RSA) with a smaller key size. The key in elliptic curve cryptography (ECC) can therefore be more compact, which brings advantages in circuit area, memory requirement, power consumption, performance and bandwidth. However, compared to a private-key system such as the Advanced Encryption Standard (AES), ECC is still much more complicated and computationally intensive. In real applications, a private-key system is usually combined with a public-key system to achieve high performance. The ultimate goal of this research is to architect a high-performance ECC processor for demanding applications such as network servers and cellular sites. In this thesis, a high-performance processor for ECC over the Galois field GF(2^163) using polynomial representation is proposed. It has three finite field (FF) reduced instruction set computer (RISC) cores and a main controller to achieve instruction-level parallelism (ILP) with pipelining, so that the highly parallelized algorithm for elliptic curve point multiplication (PM) maps well onto this platform. Instructions for combined FF operations are proposed to decrease the number of clock cycles. The interconnection among the three FF cores and the main controller is obtained by analyzing the data dependencies in the parallelized algorithm. A five-stage pipeline is employed in this architecture. Finally, the µ-code executed on the three FF cores is manually optimized to save clock cycles. The proposed design can reach 185 MHz with 20,807 slices when implemented on a Xilinx XC4VLX80 FPGA device and 263 MHz with 217,904 gates when synthesized with TSMC 0.18 µm CMOS technology. The implementation of the proposed architecture can complete one ECC PM in 1428 cycles, and is 1.3 times faster than the fastest implementation over GF(2^163) reported in the literature while occupying 14.6% less area on the same FPGA device.

    Design of elliptic curve cryptoprocessors over GF(2^163) using Gaussian normal bases

    This paper presents an efficient hardware implementation of cryptoprocessors that carry out the scalar multiplication kP over the finite field GF(2^163) using two digit-level multipliers. The finite field arithmetic operations were implemented using the Gaussian normal basis (GNB) representation, and the scalar multiplication kP was implemented using the López-Dahab algorithm, the 2-NAF halve-and-add algorithm and the w-τNAF method for Koblitz curves. The processors were designed in VHDL, synthesized on a Stratix-IV FPGA using Quartus II 12.0 and verified using SignalTap II and Matlab. The simulation results show that the cryptoprocessors perform the scalar multiplication kP very well. The computation times of the multiplication kP using López-Dahab, 2-NAF halve-and-add and 16-τNAF for Koblitz curves were 13.37 µs, 16.90 µs and 5.05 µs, respectively.
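
As background for the NAF-based methods named in this abstract, the sketch below recodes a non-negative integer into its ordinary (width-2) non-adjacent form. It is an illustrative assumption, not the paper's recoding unit, and it does not cover the τ-adic variant used for Koblitz curves, which expands the scalar in powers of the Frobenius map rather than powers of 2.

```python
def naf(k):
    """Return the non-adjacent form of k >= 0, least significant digit first.

    Every digit is -1, 0 or 1, and no two consecutive digits are nonzero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d           # k is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For instance, 7 is recoded as 8 - 1, so `naf(7)` yields the digits -1, 0, 0, 1 from least to most significant. Fewer nonzero digits means fewer point additions during scalar multiplication.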

    Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2^m)

    Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on the extended Euclidean algorithm (EEA) is particularly suitable for systolization. Implementations based on the EEA present a high degree of parallelism and pipelinability at the bit level, which can be easily optimized to achieve local data flow and to eliminate the global interconnects that represent the most important bottleneck in today's sub-micron design process. The net result is a high clock rate and performance based on efficient systolic architectures. This thesis examines high-performance yet scalable implementations of multiplicative inversion or field division over Galois fields GF(2^m) in the specific case of cryptographic applications, where the field dimension m may be very large (greater than 400) and either m or the defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representations are studied and, most importantly, variants of the EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants is discussed. As a result, a generalized and optimized variant of the EEA is proposed which can compute division, and multiplicative inversion as its subset, with the divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion are provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit-serial systolic array implementation of this proposed variant of the EEA is presented. Its complexity measures are defined and compared against the best known architectures. It is shown that, assuming the requirements specified above, the proposed architecture may achieve a higher clock rate than other designs while being more flexible and reliable and having a minimum number of inter-cell interconnects. The main contribution at the system architecture level is the substitution of all counter or adder/subtractor elements with a simpler distributed structure free of carry propagation delays. Further, a novel restoring mechanism for result sequences of the EEA is proposed using a double delay element implementation. Finally, using this systolic architecture, a CMD (Combined Multiplier Divider) datapath is designed, which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication, which results in a very small control unit that is negligible with respect to the datapath for all practical values of m. The throughput of this EC processor based on the bit-serial systolic architecture is comparable with that of previously reported designs many times larger.
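
To make the EEA-based inversion mentioned in this abstract concrete, here is a minimal software sketch of a shift-and-XOR extended Euclidean inversion in polynomial basis. It is a textbook-style variant written under assumptions (integer bit-vector encoding, hypothetical function name), not the generalized variant or the systolic array proposed in the thesis. Each iteration only shifts and XORs, which hints at why the algorithm maps naturally onto regular bit-level hardware.

```python
def gf2m_inverse(a, poly):
    """Invert a nonzero, reduced element a of GF(2^m) in polynomial basis.

    a and poly are ints used as bit vectors; poly is the irreducible field
    polynomial including its x^m term. Invariants: g1*a = u and g2*a = v
    (mod poly) hold throughout the loop.
    """
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    u, v = a, poly
    g1, g2 = 1, 0
    while u != 1:
        j = u.bit_length() - v.bit_length()  # deg(u) - deg(v)
        if j < 0:
            # Keep u as the polynomial of larger (or equal) degree.
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j    # cancel the leading term of u
        g1 ^= g2 << j  # apply the same step to the cofactor
    return g1
```

With the AES polynomial 0x11B as an example field, `gf2m_inverse(0x53, 0x11B)` returns 0xCA. Field division b/a can then be obtained by multiplying b by this inverse.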