78 research outputs found

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    Hardware implementation of elliptic curve Diffie-Hellman key agreement scheme in GF(p)

    Get PDF
    With the advent of technology there are many applications that require secure communication. Elliptic Curve Public-key Cryptosystems are increasingly becoming popular due to their small key size and efficient algorithm. Elliptic curves are widely used in various key exchange techniques including Diffie-Hellman Key Agreement scheme. Modular multiplication and modular division are one of the basic operations in elliptic curve cryptography. Much effort has been made in developing efficient modular multiplication designs, however few works has been proposed for the modular division. Nevertheless, these operations are needed in various cryptographic systems. This thesis examines various scalable implementations of elliptic curve scalar multiplication employing multiplicative inverse or field division in GF(p) focussing mainly on modular divison architectures. Next, this thesis presents a new architecture for modular division based on the variant of Extended Binary GCD algorithm. The main contribution at system level architecture to the modular division unit is use of counters in place of shift registers that are basis of the algorithm and modifying the algorithm to introduce a modular correction unit for the output logic. This results in 62% increase in speed with respect to a prototype design. Finally, using the modular division architecture an Elliptic Curve ALU in GF(p) was implemented which can be used as the core arithmetic unit of an elliptic curve processor. The resulting architecture was targeted to Xilinx Vertex2v6000-bf957 FPGA device and can be implemented for different elliptic curves for almost all practical values of field p. The frequency of the ALU is 58.8 MHz for 128-bits utilizing 20% of the device at 27712 gates which is 30% faster than a prototype implementation with a 2% increase in area utilization. The ALU was tested to perform Diffie-Hellman Key Agreement Scheme and is suitable for other public-key cryptographic algorithms

    Efficient Design and implementation of Elliptic Curve Cryptography on FPGA

    Get PDF

    Novel algorithms and hardware architectures for Montgomery Multiplication over GF(p)

    Get PDF
    This report describes the design and implementation results in FPGAs of a scalable hardware architecture for computing modular multiplication in prime fields GF(pp), based on the Montgomery multiplication (MM) algorithm. Starting from an existing digit-serial version of the MM algorithm, a novel {\it digit-digit} based MM algorithm is derived and two hardware architectures that compute that algorithm are described. In the proposed approach, the input operands (multiplicand, multiplier and modulus) are represented using as radix β=2k\beta = 2^k. Operands of arbitrary size can be multiplied with modular reduction using almost the same hardware since the multiplier\u27s kernel module that performs the modular multiplication depends only on kk. The novel hardware architectures proposed in this paper were verified by modeling them using VHDL and implementing them in the Xilinx FPGAs Spartan and Virtex5. Design trade-offs are analyzed considering different operand sizes commonly used in cryptography and different values for kk. The proposed designs for MM are well suited to be implemented in modern FPGAs, making use of available dedicated multiplier and memory blocks reducing drastically the FPGA\u27s standard logic while keeping an acceptable performance compared with other implementation approaches. From the Virtex5 implementation, the proposed MM multiplier reaches a throughput of 242Mbps using only 219 FPGA slices and achieving a 1024-bit modular multiplication in 4.21μ\musecs

    A survey of hardware implementations of elliptic curve cryptographic systems

    No full text
    Elliptic Curve Cryptography (ECC) has gained much recognition over the last decades and has established itself among the well known public-key cryptography schemes, not least due its smaller key size and relatively lower computational effort compared to RSA. The wide employment of Elliptic Curve Cryptography in many different application areas has been leading to a variety of implementation types and domains ranging from pure software approaches over hardware implemenations to hardware/software co-designs. The following review provides an overview of state of the art hardware implemenations of ECC, specifically in regard to their targeted design goals. In this context the suitability of the hardware/software approach in regard to the security challenges opposed by the low-end embedded devices of the Internet of Things is briefly examined. The paper also outlines ECC’s vulnerability against quantum attacks and references one possible solution to that problem

    Pipelining GF(P) Elliptic Curve Cryptography Computation

    Get PDF
    This paper proposes a new method to compute Elliptic Curve Cryptography in Galois Fields GF(p). The method incorporates pipelining to utilize the benefit of both parallel and serial methodology used before. It allows the exploitation of the inherited independency that exists in elliptic curve point addition and doubling operations. The results showed attraction because of its improvement over many parallel and serial techniques of elliptic curve crypto-computations

    Unified field multiplier for GF(p) and GF(2 n) with novel digit encoding

    Get PDF
    In recent years, there has been an increase in demand for unified field multipliers for Elliptic Curve Cryptography in the electronics industry because they provide flexibility for customers to choose between Prime (GF(p)) and Binary (GF(2")) Galois Fields. Also, having the ability to carry out arithmetic over both GF(p) and GF(2") in the same hardware provides the possibility of performing any cryptographic operation that requires the use of both fields. The unified field multiplier is relatively future proof compared with multipliers that only perform arithmetic over a single chosen field. The security provided by the architecture is also very important. It is known that the longer the key length, the more susceptible the system is to differential power attacks due to the increased amount of data leakage. Therefore, it is beneficial to design hardware that is scalable, so that more data can be processed per cycle. Another advantage of designing a multiplier that is capable of dealing with long word length is improvement in performance in terms of delay, because less cycles are needed. This is very important because typical elliptic curve cryptography involves key size of 160 bits. A novel unified field radix-4 multiplier using Montgomery Multiplication for the use of G(p) and GF(2") has been proposed. This design makes use of the unexploited state in number representation for operation in GF(2") where all carries are suppressed. The addition is carried out using a modified (4:2) redundant adder to accommodate the extra 1 * state. The proposed adder and the partial product generator design are capable of radix-4 operation, which reduces the number of computation cycles required. Also, the proposed adder is more scalable than existing designs.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Versatile Montgomery Multiplier Architectures

    Get PDF
    Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery\u27s method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption
    corecore