8 research outputs found

    Parametric, Secure and Compact Implementation of RSA on FPGA

    Get PDF
    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks

    Efficient Hardware Accelerator for IPSec Based on Partial Reconfiguration on Xilinx FPGAs

    Full text link
    Abstract—In this paper we present a practical low-end embed-ded system solution for Internet Protocol Security (IPSec) imple-mented on the smallest Xilinx Field Programmable Gate Array (FPGA) device in the Virtex 4 family. The proposed solution supports the three main IPSec protocols: Encapsulating Security Payload (ESP), Authentication Header (AH) and Internet Key Exchange (IKE). This system uses efficiently hardware-software co-design and partial reconfiguration techniques. Thanks to utilization of both methods we were able to save a significant portion of hardware resources with a relatively small penalty in terms of performance. In this work we propose a division of the basic mechanisms of IPSec protocols, namely cryptographic algorithms and their modes of operation to be implemented either in software or hardware. Through this, we were able to combine the high performance offered by a hardware solution with the flexibility of a software implementation. We show that a typical IPSec protocol configuration can be combined with Partial Reconfiguration techniques in order to efficiently utilize hardware resources. Index Terms—Partial reconfiguration; IPSec; Xilinx FPGA I

    LLMonPro: Low-Latency Montgomery Modular Multiplication Suitable for Verifiable Delay Functions

    Get PDF
    This study presents a method to perform low-latency modular multiplication operation based on both Montgomery and Ozturk methods. The design space exploration of the proposed method on a latest FPGA device is also given. Through series of experiments on the FPGA using an high-level synthesis tool, optimal parameter selection of the proposed method for the low-latency constraint is also presented for the proposed technique

    An RSA Encryption Hardware Algorithm using a Single DSP Block and a Single Block RAM on the FPGA

    Full text link

    Smaller Keys for Code-Based Cryptography: QC-MDPC McEliece Implementations on Embedded Devices

    Get PDF
    In the last years code-based cryptosystems were established as promising alternatives for asymmetric cryptography since they base their security on well-known NP-hard problems and still show decent performance on a wide range of computing platforms. The main drawback of code-based schemes, including the popular proposals by McEliece and Niederreiter, are the large keys whose size is inherently determined by the underlying code. In a very recent approach, Misoczki et al. proposed to use quasi-cyclic MDPC (QC-MDPC) codes that allow for a very compact key representation. In this work, we investigate novel implementations of the McEliece scheme using such QC-MDPC codes tailored for embedded devices, namely a Xilinx Virtex-6 FPGA and an 8-bit AVR microcontroller. In particular, we evaluate and improve different approaches to decode QC-MDPC codes. Besides competitive performance for encryption and decryption on the FPGA, we achieved a very compact implementation on the microcontroller using only 4,800 and 9,600 bits for the public and secret key at 80 bits of equivalent symmetric security

    Fast, compact and secure implementation of rsa on dedicated hardware

    Get PDF
    RSA is the most popular Public Key Cryptosystem (PKC) and is heavily used today. PKC comes into play, when two parties, who have previously never met, want to create a secure channel between them. The core operation in RSA is modular multiplication, which requires lots of computational power especially when the operands are longer than 1024-bits. Although today’s powerful PC’s can easily handle one RSA operation in a fraction of a second, small devices such as PDA’s, cell phones, smart cards, etc. have limited computational power, thus there is a need for dedicated hardware which is specially designed to meet the demand of this heavy calculation. Additionally, web servers, which thousands of users can access at the same time, need to perform many PKC operations in a very short time and this can create a performance bottleneck. Special algorithms implemented on dedicated hardware can take advantage of true massive parallelism and high utilization of the data path resulting in high efficiency in terms of both power and execution time while keeping the chip cost low. We will use the “Montgomery Modular Multiplication” algorithm in our implementation, which is considered one of the most efficient multiplication schemes, and has many applications in PKC. In the first part of the thesis, our “2048-bit Radix-4 based Modular Multiplier” design is introduced and compared with the conventional radix-2 modular multipliers of previous works. Our implementation for 2048-bit modular multiplication features up to 82% shorter execution time with 33% increase in the area over the conventional radix-2 designs and can achieve 132 MHz on a Xilinx xc2v6000 FPGA. The proposed multiplier has one of the fastest execution times in terms of latency and performs better than (37% better) our reference radix-2 design in terms of time-area product. The results are similar in the ASIC case where we implement our design for UMC 0.18 μm technology. In the second part, a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit intended for inexpensive FPGAs are presented. The design utilizes hardwired block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The deployed method is based on the Montgomery modular multiplication algorithm and the architecture utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications* in 7.62 μs and 27.0 μs respectively with approximately the same device usage on Xilinx Spartan-3E 500. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine easily fits into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks with an insignificant overhead in area and execution time. The upper limit on the operand precision is dictated only by the available Block-RAM to accommodate the operands within the FPGA. This design is also compared to the first one, considering the relative advantages and disadvantages of each circuit

    Highly secure cryptographic computations against side-channel attacks

    Get PDF
    Side channel attacks (SCAs) have been considered as great threats to modern cryptosystems, including RSA and elliptic curve public key cryptosystems. This is because the main computations involved in these systems, as the Modular Exponentiation (ME) in RSA and scalar multiplication (SM) in elliptic curve system, are potentially vulnerable to SCAs. Montgomery Powering Ladder (MPL) has been shown to be a good choice for ME and SM with counter-measures against certain side-channel attacks. However, recent research shows that MPL is still vulnerable to some advanced attacks [21, 30 and 34]. In this thesis, an improved sequence masking technique is proposed to enhance the MPL\u27s resistance towards Differential Power Analysis (DPA). Based on the new technique, a modified MPL with countermeasure in both data and computation sequence is developed and presented. Two efficient hardware architectures for original MPL algorithm are also presented by using binary and radix-4 representations, respectively

    Reconfigurable Architectures for Cryptographic Systems

    No full text
    Field Programmable Gate Arrays (FPGAs) are suitable platforms for implementing cryptographic algorithms in hardware due to their flexibility, good performance and low power consumption. Computer security is becoming increasingly important and security requirements such as key sizes are quickly evolving. This creates the need for customisable hardware designs for cryptographic operations capable of covering a large design space. In this thesis we explore the four design dimensions relevant to cryptography - speed, area, power consumption and security of the crypto-system - by developing parametric designs for public-key generation and encryption as well as side-channel attack countermeasures. There are four contributions. First, we present new architectures for Montgomery multiplication and exponentiation based on variable pipelining and variable serial replication. Our implementations of these architectures are compared to the best implementations in the literature and the design space is explored in terms of speed and area trade-offs. Second, we generalise our Montgomery multiplier design ideas by developing a parametric model to allow rapid optimisation of a general class of algorithms containing loops with dependencies carried from one iteration to the next. By predicting the throughput and the area of the design, our model facilitates and speeds up design space exploration. Third, we develop new architectures for primality testing including the first hardware architecture for the NIST approved Lucas primality test. We explore the area, speed and power consumption trade-offs by comparing our Lucas architectures on CPU, FPGA and ASIC. Finally, we tackle the security issue by presenting two novel power attack countermeasures based on on-chip power monitoring. Our constant power framework uses a closed-loop control system to keep the power consumption of any FPGA implementation constant. Our attack detection framework uses a network of ring-oscillators to detect the insertion of a shunt resistor-based power measurement circuit on a device's power rail. This countermeasure is lightweight and has a relatively low power overhead compared to existing masking and hiding countermeasures
    corecore