256 research outputs found

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device

    Enhancing an Embedded Processor Core with a Cryptographic Unit for Performance and Security

    Get PDF
    We present a set of low-cost architectural enhancements to accelerate the execution of certain arithmetic operations common in cryptographic applications on an extensible embedded processor core. The proposed enhancements are generic in the sense that they can be beneficially applied in almost any RISC processor. We implemented the enhancements in form of a cryptographic unit (CU) that offers the programmer an extended instruction set. The CU features a 128-bit wide register file and datapath, which enables it to process 128-bit words and perform 128-bit loads/stores. We analyze the speed-up factors for some arithmetic operations and public-key cryptographic algorithms obtained through these enhancements. In addition, we evaluate the hardware overhead (i.e. silicon area) of integrating the CU into an embedded RISC processor. Our experimental results show that the proposed architectural enhancements allow for a significant performance gain for both RSA and ECC at the expense of an acceptable increase in silicon area. We also demonstrate that the proposed enhancements facilitate the protection of cryptographic algorithms against certain types of side-channel attacks and present an AES implementation hardened against cache-based attacks as a case study

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    Hardware implementation of elliptic curve Diffie-Hellman key agreement scheme in GF(p)

    Get PDF
    With the advent of technology there are many applications that require secure communication. Elliptic Curve Public-key Cryptosystems are increasingly becoming popular due to their small key size and efficient algorithm. Elliptic curves are widely used in various key exchange techniques including Diffie-Hellman Key Agreement scheme. Modular multiplication and modular division are one of the basic operations in elliptic curve cryptography. Much effort has been made in developing efficient modular multiplication designs, however few works has been proposed for the modular division. Nevertheless, these operations are needed in various cryptographic systems. This thesis examines various scalable implementations of elliptic curve scalar multiplication employing multiplicative inverse or field division in GF(p) focussing mainly on modular divison architectures. Next, this thesis presents a new architecture for modular division based on the variant of Extended Binary GCD algorithm. The main contribution at system level architecture to the modular division unit is use of counters in place of shift registers that are basis of the algorithm and modifying the algorithm to introduce a modular correction unit for the output logic. This results in 62% increase in speed with respect to a prototype design. Finally, using the modular division architecture an Elliptic Curve ALU in GF(p) was implemented which can be used as the core arithmetic unit of an elliptic curve processor. The resulting architecture was targeted to Xilinx Vertex2v6000-bf957 FPGA device and can be implemented for different elliptic curves for almost all practical values of field p. The frequency of the ALU is 58.8 MHz for 128-bits utilizing 20% of the device at 27712 gates which is 30% faster than a prototype implementation with a 2% increase in area utilization. The ALU was tested to perform Diffie-Hellman Key Agreement Scheme and is suitable for other public-key cryptographic algorithms

    Reconfigurable Architecture for Elliptic Curve Cryptography Using FPGA

    Get PDF
    The high performance of an elliptic curve (EC) crypto system depends efficiently on the arithmetic in the underlying finite field. We have to propose and compare three levels of Galois Field , , and . The proposed architecture is based on Lopez-Dahab elliptic curve point multiplication algorithm, which uses Gaussian normal basis for field arithmetic. The proposed is based on an efficient Montgomery add and double algorithm, also the Karatsuba-Ofman multiplier and Itoh-Tsujii algorithm are used as the inverse component. The hardware design is based on optimized finite state machine (FSM), with a single cycle 193 bits multiplier, field adder, and field squarer. The another proposed architecture is based on applications for which compactness is more important than speed. The FPGA’s dedicated multipliers and carry-chain logic are used to obtain the small data path. The different optimization at the hardware level improves the acceleration of the ECC scalar multiplication, increases frequency and the speed of operation such as key generation, encryption, and decryption. Finally, we have to implement our design using Xilinx XC4VLX200 FPGA device

    Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

    Get PDF
    Efficient implementation of the number theoretic transform(NTT), also known as the discrete Fourier transform(DFT) over a finite field, has been studied actively for decades and found many applications in digital signal processing. In 1971 Schonhage and Strassen proposed an NTT based asymptotically fast multiplication method with the asymptotic complexity O(m log m log log m) for multiplication of mm-bit integers or (m-1)st degree polynomials. Schonhage and Strassen\u27s algorithm was known to be the asymptotically fastest multiplication algorithm until Furer improved upon it in 2007. However, unfortunately, both algorithms bear significant overhead due to the conversions between the time and frequency domains which makes them impractical for small operands, e.g. less than 1000 bits in length as used in many applications. With this work we investigate for the first time the practical application of the NTT, which found applications in digital signal processing, to finite field multiplication with an emphasis on elliptic curve cryptography(ECC). We present efficient parameters for practical application of NTT based finite field multiplication to ECC which requires key and operand sizes as short as 160 bits in length. With this work, for the first time, the use of NTT based finite field arithmetic is proposed for ECC and shown to be efficient. We introduce an efficient algorithm, named DFT modular multiplication, for computing Montgomery products of polynomials in the frequency domain which facilitates efficient multiplication in GF(p^m). Our algorithm performs the entire modular multiplication, including modular reduction, in the frequency domain, and thus eliminates costly back and forth conversions between the frequency and time domains. We show that, especially in computationally constrained platforms, multiplication of finite field elements may be achieved more efficiently in the frequency domain than in the time domain for operand sizes relevant to ECC. This work presents the first hardware implementation of a frequency domain multiplier suitable for ECC and the first hardware implementation of ECC in the frequency domain. We introduce a novel area/time efficient ECC processor architecture which performs all finite field arithmetic operations in the frequency domain utilizing DFT modular multiplication over a class of Optimal Extension Fields(OEF). The proposed architecture achieves extension field modular multiplication in the frequency domain with only a linear number of base field GF(p) multiplications in addition to a quadratic number of simpler operations such as addition and bitwise rotation. With its low area and high speed, the proposed architecture is well suited for ECC in small device environments such as smart cards and wireless sensor networks nodes. Finally, we propose an adaptation of the Itoh-Tsujii algorithm to the frequency domain which can achieve efficient inversion in a class of OEFs relevant to ECC. This is the first time a frequency domain finite field inversion algorithm is proposed for ECC and we believe our algorithm will be well suited for efficient constrained hardware implementations of ECC in affine coordinates

    Low Power Elliptic Curve Cryptography

    Get PDF
    This M.S. thesis introduces new modulus scaling techniques for transforming a class of primes into special forms which enable efficient arithmetic. The scaling technique may be used to improve multiplication and inversion in finite fields. We present an efficient inversion algorithm that utilizes the structure of a scaled modulus. Our inversion algorithm exhibits superior performance to the Euclidean algorithm and lends itself to efficient hardware implementation due to its simplicity. Using the scaled modulus technique and our specialized inversion algorithm we develop an elliptic curve processor architecture. The resulting architecture successfully utilizes redundant representation of elements in GF(p) and provides a low-power, high speed, and small footprint specialized elliptic curve implementation. We also introduce a unified Montgomery multiplier architecture working on the extension fields GF(p), GF(2) and GF(3). With the increasing research activity for identity based encryption schemes, there has been an increasing need for arithmetic operations in field GF(3). Since we based our research on low-power and small footprint applications, we designed a unified architecture rather than having a seperate hardware for GF{3}. To the best of our knowledge, this is the first time a unified architecture was built working on three different extension fields
    corecore