924 research outputs found

    Low-Latency Elliptic Curve Scalar Multiplication

    Get PDF
    This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9ms on the NVIDIA GTX 500 architecture family. The presented methods and implementation considerations can be applied to any parallel 32-bit architectur

    Hardware/software optimizations for elliptic curve scalar multiplication on hybrid FPGAs

    Get PDF
    Elliptic curve cryptography (ECC) offers a viable alternative to Rivest-Shamir-Adleman (RSA) by delivering equivalent security with a smaller key size. This has several advantages, including smaller bandwidth demands, faster key exchange, and lower latency encryption and decryption. The fundamental operation for ECC is scalar point multiplication, wherein a point P on an elliptic curve defined over a finite field is multiplied by a scalar k. The complexity of this operation requires a hardware implementation to achieve high performance. The algorithms involved in scalar point multiplication are constantly evolving, incorporating the latest developments in number theory to improve computation time. These competing needs, high performance and flexibility, have caused previous implementations to either limit their adaptability or to incur performance losses. This thesis explores the use of a hybrid-FPGA for scalar point multiplication. A hybrid- FPGA contains a general purpose processor (GPP) in addition to reconfigurable fabric. This allows for a software/hardware co-design with low latency communication between the GPP and custom hardware. The elliptic curve operations and finite field inversion are programmed in C code. All other finite field arithmetic is implemented in the FPGA hardware, providing higher performance while retaining flexibility. The resulting implementation achieves speedups ranging from 24 times to 55 times faster than an optimized software implementation executing on a Pentium II workstation. The scalability of the design is investigated in two directions: faster finite field multiplication and increased instruction level parallelism exploitation. Increasing the number of parallel arithmetic units beyond two is shown to be less efficient than increasing the speed of the finite field multiplier

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms

    Get PDF
    We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor

    Analysis of Parallel Montgomery Multiplication in CUDA

    Get PDF
    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared

    Throughput/Area-efficient ECC Processor Using Montgomery Point Multiplication on FPGA

    Get PDF
    High throughput while maintaining low resource is a key issue for elliptic curve cryptography (ECC) hardware implementations in many applications. In this brief, an ECC processor architecture over Galois fields is presented, which achieves the best reported throughput/area performance on field-programmable gate array (FPGA) to date. A novel segmented pipelining digit serial multiplier is developed to speed up ECC point multiplication. To achieve low latency, a new combined algorithm is developed for point addition and point doubling with careful scheduling. A compact and flexible distributed-RAM-based memory unit design is developed to increase speed while keeping area low. Further optimizations were made via timing constraints and logic level modifications at the implementation level. The proposed architecture is implemented on Virtex4 (V4), Virtex5 (V5), and Virtex7 (V7) FPGA technologies and, respectively, achieved throughout/slice figures of 19.65, 65.30, and 64.48 (106/(Seconds × Slices))

    Efficient Arithmetic for the Implementation of Elliptic Curve Cryptography

    Get PDF
    The technology of elliptic curve cryptography is now an important branch in public-key based crypto-system. Cryptographic mechanisms based on elliptic curves depend on the arithmetic of points on the curve. The most important arithmetic is multiplying a point on the curve by an integer. This operation is known as elliptic curve scalar (or point) multiplication operation. A cryptographic device is supposed to perform this operation efficiently and securely. The elliptic curve scalar multiplication operation is performed by combining the elliptic curve point routines that are defined in terms of the underlying finite field arithmetic operations. This thesis focuses on hardware architecture designs of elliptic curve operations. In the first part, we aim at finding new architectures to implement the finite field arithmetic multiplication operation more efficiently. In this regard, we propose novel schemes for the serial-out bit-level (SOBL) arithmetic multiplication operation in the polynomial basis over F_2^m. We show that the smallest SOBL scheme presented here can provide about 26-30\% reduction in area-complexity cost and about 22-24\% reduction in power consumptions for F_2^{163} compared to the current state-of-the-art bit-level multiplier schemes. Then, we employ the proposed SOBL schemes to present new hybrid-double multiplication architectures that perform two multiplications with latency comparable to the latency of a single multiplication. Then, in the second part of this thesis, we investigate the different algorithms for the implementation of elliptic curve scalar multiplication operation. We focus our interest in three aspects, namely, the finite field arithmetic cost, the critical path delay, and the protection strength from side-channel attacks (SCAs) based on simple power analysis. In this regard, we propose a novel scheme for the scalar multiplication operation that is based on processing three bits of the scalar in the exact same sequence of five point arithmetic operations. We analyse the security of our scheme and show that its security holds against both SCAs and safe-error fault attacks. In addition, we show how the properties of the proposed elliptic curve scalar multiplication scheme yields an efficient hardware design for the implementation of a single scalar multiplication on a prime extended twisted Edwards curve incorporating 8 parallel multiplication operations. Our comparison results show that the proposed hardware architecture for the twisted Edwards curve model implemented using the proposed scalar multiplication scheme is the fastest secure SCA protected scalar multiplication scheme over prime field reported in the literature

    Hardware Implementation of Efficient Elliptic Curve Scalar Multiplication using Vedic Multiplier

    Get PDF
    This paper presents an area efficient and high-speed FPGA implementation of scalar multiplication using a Vedic multiplier. Scalar multiplication is the most important operation in Elliptic Curve Cryptography(ECC), which used for public key generation and the performance of ECC greatly depends on it. The scalar multiplication is multiplying integer k with scalar P to compute  Q=kP, where k is private key and P is a base point on the Elliptic curve. The Scalar multiplication underlying finite field arithmetic operation i.e. addition multiplication, squaring and inversion to compute Q. From these finite field operations, multiplication is the most time-consuming operation, occupy more device space and it dominates the speed of Scalar multiplication. This paper presents an efficient implementation of finite field multiplication using a Vedic multiplier.  The scalar multiplier is designed over Galois Binary field GF(2233) for field size=233-bit which is secured curve according to NIST.  The performances of the proposed design are evaluated by comparing it with  Karatsuba based scalar multiplier for area and delay. The results show that the proposed scalar multiplication using Vedic multiplier has consumed 22% less area on FPGA and also has 12% less delay, than Karatsuba, based scalar multiplier. The scalar multiplier is coded in Verilog HDL, synthesize and simulated in Xilinx 13.2 ISE on Virtex6 FPGA
    corecore