1,116 research outputs found

    Parametric, Secure and Compact Implementation of RSA on FPGA

    Get PDF
    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Analysis of Parallel Montgomery Multiplication in CUDA

    Get PDF
    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared

    Homomorphic Data Isolation for Hardware Trojan Protection

    Full text link
    The interest in homomorphic encryption/decryption is increasing due to its excellent security properties and operating facilities. It allows operating on data without revealing its content. In this work, we suggest using homomorphism for Hardware Trojan protection. We implement two partial homomorphic designs based on ElGamal encryption/decryption scheme. The first design is a multiplicative homomorphic, whereas the second one is an additive homomorphic. We implement the proposed designs on a low-cost Xilinx Spartan-6 FPGA. Area utilization, delay, and power consumption are reported for both designs. Furthermore, we introduce a dual-circuit design that combines the two earlier designs using resource sharing in order to have minimum area cost. Experimental results show that our dual-circuit design saves 35% of the logic resources compared to a regular design without resource sharing. The saving in power consumption is 20%, whereas the number of cycles needed remains almost the sam

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    Maximizing the Efficiency using Montgomery Multipliers on FPGA in RSA Cryptography for Wireless Sensor Networks

    Get PDF
    The architecture and modeling of RSA public key encryption/decryption systems are presented in this work. Two different architectures are proposed, mMMM42 (modified Montgomery Modular Multiplier 4 to 2 Carry Save Architecture) and RSACIPHER128 to check the suitability for implementation in Wireless Sensor Nodes to utilize the same in Wireless Sensor Networks. It can easily be fitting into systems that require different levels of security by changing the key size. The processing time is increased and space utilization is reduced in FPGA due to its reusability. VHDL code is synthesized and simulated using Xilinx-ISE for both the architectures. Architectures are compared in terms of area and time. It is verified that this architecture support for a key size of 128bits. The implementation of RSA encryption/decryption algorithm on FPGA using 128 bits data and key size with RSACIPHER128 gives good result with 50% less utilization of hardware. This design is also implemented for ASIC using Mentor Graphics

    Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms

    Get PDF
    We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device
    corecore