156 research outputs found

    Parametric, Secure and Compact Implementation of RSA on FPGA

    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as the storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and the number of words to meet system requirements such as available resources, precision, and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in the literature. A 2040-bit modular exponentiation engine can easily fit into a Xilinx Spartan-3E 500; moreover, the exponentiation circuit withstands known side-channel attacks.
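For readers unfamiliar with the Montgomery modular multiplication that this abstract builds on, the following is a minimal software sketch of the textbook reduction step (REDC). It is not the authors' FPGA architecture; the function names `montgomery_setup`, `redc`, and `mont_mul` and the parameter `k` (with R = 2**k) are hypothetical names chosen for illustration.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n and R = 2**k > n."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                # used to convert into Montgomery form
    return n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k               # exact division by R is just a shift
    return u - n if u >= n else u


def mont_mul(a, b, n, k):
    """Compute a * b mod n (a, b < n) using only shifts, masks, and multiplies."""
    n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)         # a * R mod n
    b_bar = redc(b * r2, n, k, n_prime)         # b * R mod n
    c_bar = redc(a_bar * b_bar, n, k, n_prime)  # a * b * R mod n
    return redc(c_bar, n, k, n_prime)           # leave Montgomery form
```

The point of the technique is that reduction modulo n is replaced by reduction modulo a power of two, which is what makes it attractive for hardware built around fixed-width multipliers.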

    Efficient Hardware Accelerator for IPSec Based on Partial Reconfiguration on Xilinx FPGAs

    Abstract: In this paper we present a practical low-end embedded system solution for Internet Protocol Security (IPSec) implemented on the smallest Xilinx Field Programmable Gate Array (FPGA) device in the Virtex 4 family. The proposed solution supports the three main IPSec protocols: Encapsulating Security Payload (ESP), Authentication Header (AH) and Internet Key Exchange (IKE). The system makes efficient use of hardware-software co-design and partial reconfiguration techniques. By using both methods we were able to save a significant portion of hardware resources with a relatively small penalty in performance. In this work we propose a division of the basic mechanisms of IPSec protocols, namely cryptographic algorithms and their modes of operation, to be implemented either in software or in hardware. Through this, we were able to combine the high performance offered by a hardware solution with the flexibility of a software implementation. We show that a typical IPSec protocol configuration can be combined with partial reconfiguration techniques in order to utilize hardware resources efficiently. Index Terms: Partial reconfiguration; IPSec; Xilinx FPGA

    A Programmable SoC-Based Accelerator for Privacy-Enhancing Technologies and Functional Encryption

    A multitude of privacy-enhancing technologies (PETs) has been presented recently to solve the privacy problems of contemporary services utilizing cloud computing. Many of them are based on additively homomorphic encryption (AHE), which allows the computation of additions on encrypted data. The main technical obstacles to the adoption of PETs in practical systems are related to performance overheads compared with current privacy-violating alternatives. In this article, we present a hardware/software (HW/SW) codesign for programmable systems-on-chip (SoCs) that is designed for accelerating applications based on Paillier encryption. Our implementation is a microcode-based multicore architecture that is suitable for accelerating various PETs using AHE with large-integer modular arithmetic. We instantiate the implementation in a Xilinx Zynq-7000 programmable SoC and provide performance evaluations in real hardware. We also investigate its efficiency in a high-end Xilinx UltraScale+ programmable SoC. We evaluate the implementation with two target use cases that have relevance in PETs: privacy-preserving computation of squared Euclidean distances over encrypted data and multi-input functional encryption (FE) for inner products. Both of them represent the first hardware acceleration results for such operations, and in particular, the latter is among the very first published implementation results of FE on any platform.
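The additive homomorphism of Paillier encryption mentioned in this abstract can be illustrated with a small sketch. This is a toy with tiny primes and the common simplification g = n + 1, not the accelerator described in the paper; the function names and the primes 1009 and 1013 are assumptions made purely for illustration and offer no security.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation from two distinct primes (g is fixed to n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid for g = n + 1; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n as (n + 1)^m * r^n mod n^2 with a random unit r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m using L(x) = (x - 1) // n applied to c^lam mod n^2."""
    lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k modulo n."""
    return pow(c, k, n * n)
```

Every operation here is a modular multiplication or exponentiation modulo n squared, which is why large-integer modular arithmetic dominates the cost of AHE-based applications.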

    Dynamic Polymorphic Reconfiguration to Effectively “CLOAK” a Circuit’s Function

    Today's society has become more dependent on the integrity and protection of digital information used in daily transactions, resulting in an ever-increasing need for information security. Additionally, the need for faster and more secure cryptographic algorithms to provide this information security has become paramount. Hardware implementations of cryptographic algorithms provide the necessary increase in throughput, but at the cost of leaking critical information. Side Channel Analysis (SCA) attacks allow an attacker to exploit the regular and predictable power signatures leaked by cryptographic functions used in algorithms such as RSA. This research focuses on a means to counteract this vulnerability by creating a Critically Low Observable Anti-Tamper Keeping Circuit (CLOAK) capable of continuously changing the way it functions in both power and timing. This research has determined that a polymorphic circuit design capable of varying circuit power consumption and timing can protect a cryptographic device from Electromagnetic Analysis (EMA) attacks. In essence, we are effectively CLOAKing the circuit functions from an attacker.

    An RSA Encryption Hardware Algorithm using a Single DSP Block and a Single Block RAM on the FPGA


    Fast, compact and secure implementation of RSA on dedicated hardware

    RSA is the most popular Public Key Cryptosystem (PKC) and is heavily used today. PKC comes into play when two parties who have never previously met want to create a secure channel between them. The core operation in RSA is modular multiplication, which requires a great deal of computational power, especially when the operands are longer than 1024 bits. Although today's powerful PCs can easily handle one RSA operation in a fraction of a second, small devices such as PDAs, cell phones, and smart cards have limited computational power, so there is a need for dedicated hardware designed specifically to meet the demand of this heavy calculation. Additionally, web servers, which thousands of users can access at the same time, need to perform many PKC operations in a very short time, and this can create a performance bottleneck. Special algorithms implemented on dedicated hardware can take advantage of true massive parallelism and high utilization of the data path, resulting in high efficiency in terms of both power and execution time while keeping the chip cost low. We use the Montgomery modular multiplication algorithm in our implementation, which is considered one of the most efficient multiplication schemes and has many applications in PKC. In the first part of the thesis, our 2048-bit radix-4 modular multiplier design is introduced and compared with the conventional radix-2 modular multipliers of previous works. Our implementation for 2048-bit modular multiplication features up to 82% shorter execution time with a 33% increase in area over the conventional radix-2 designs and can achieve 132 MHz on a Xilinx xc2v6000 FPGA. The proposed multiplier has one of the fastest execution times in terms of latency and performs 37% better than our reference radix-2 design in terms of time-area product. The results are similar in the ASIC case, where we implement our design in UMC 0.18 μm technology.
In the second part, a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit intended for inexpensive FPGAs are presented. The design utilizes hardwired block multipliers as the main functional unit and Block-RAM as the storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and the number of words to meet system requirements such as available resources, precision, and timing constraints. The deployed method is based on the Montgomery modular multiplication algorithm, and the architecture utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively, with approximately the same device usage on a Xilinx Spartan-3E 500. The multiplier uses a moderate amount of system resources while achieving the best area-time product in the literature. A 2040-bit modular exponentiation engine easily fits into a Xilinx Spartan-3E 500; moreover, the exponentiation circuit withstands known side-channel attacks with an insignificant overhead in area and execution time. The upper limit on the operand precision is dictated only by the available Block-RAM to accommodate the operands within the FPGA. This design is also compared to the first one, considering the relative advantages and disadvantages of each circuit.
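The effect of the radix and word-count parameters described in this abstract can be seen in a word-serial form of Montgomery multiplication. The sketch below is a generic software model, not the thesis design: the function name `mont_mul_words` and the parameters `w` (bits per digit, so the radix is 2**w) and `s` (number of digits) are hypothetical. With `w = 1` it behaves like a radix-2 multiplier and with `w = 2` like a radix-4 one, which halves the number of loop iterations.

```python
def mont_mul_words(a, b, n, w, s):
    """Return a * b * 2^(-w*s) mod n, consuming b one w-bit digit per iteration.

    Assumes n is odd, n < 2**(w*s), and 0 <= a, b < n.
    """
    base = 1 << w
    mask = base - 1
    n0_inv = (-pow(n, -1, base)) % base  # only the lowest digit of n matters here
    acc = 0
    for i in range(s):
        b_i = (b >> (w * i)) & mask            # next digit of b
        acc += a * b_i                         # partial product
        m = ((acc & mask) * n0_inv) & mask     # makes the lowest digit of acc zero
        acc = (acc + m * n) >> w               # exact division by the radix
    return acc - n if acc >= n else acc        # acc < 2n, so one subtraction suffices
```

The result stays in Montgomery form, so a caller would multiply by 2**(w*s) modulo n (or run one more Montgomery multiplication by R squared mod n) to recover the ordinary product.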

    ALU for mbedTLS Diffie-Hellman Parameters Generator on FPGA Embedded Processor System

    A safe prime is a prime p for which (p-1)/2 is also prime. The commonly used Diffie-Hellman key exchange algorithm relies on very large safe primes as its group modulus. In practice, crypto software libraries implement a specific Diffie-Hellman parameter generator that searches for safe primes with the Rabin-Miller probabilistic primality test. Without any proven theory to predict their occurrence among the natural numbers, generator programs generally start at a randomly seeded odd positive integer of a predetermined size and perform primality tests over incrementing candidates until one succeeds. The staggeringly low density of safe primes means that a prohibitive amount of computing resources must be dedicated to the generation process. As a result, power-conscious mobile and embedded devices can no longer compute the standard 2048-bit safe primes without causing prolonged disruption to overall system performance. Based on hot-path analysis of the generator program, a parallelized and pipelined ALU is proposed and implemented on the FPGA embedded processor system. Utilizing merely 3% of the LUTs (584/17600) and 20% of the DSPs (16/80) available on the Xilinx Zynq 7010 All Programmable SoC, the suggested design is theoretically capable of offsetting more than 90% of the CPU utilization needed for the entire safe prime generation process. Such results demonstrate the deficiency of today's general-purpose CPUs in handling certain complex and resource-intensive computations. Such scenarios greatly incentivize the integration of programmable hardware with fixed-design CPUs. Additional research is suggested to focus on automating the processes of locating the specific CPU-intensive task, translating that task onto programmable hardware, and providing a software-accessible interface to enable fast development and deployment of hot-function-based programmable hardware designs.
From there, programmable hardware-assisted computing platforms can be further enhanced to dynamically program hardware modules based on real-time utilization to achieve even greater overall system performance. A new system design paradigm can potentially be introduced as a result.
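The search loop this abstract describes (start at an odd candidate, test, increment, repeat) can be sketched in a few lines. This is an illustrative model only and not the mbedTLS generator; the function names `is_probable_prime` and `next_safe_prime`, the small-prime filter, and the default of 32 Miller-Rabin rounds are assumptions.

```python
import secrets

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def is_probable_prime(n, rounds=32):
    """Miller-Rabin test: False means composite, True means probably prime."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random witness in [2, n - 2]
        x = pow(a, d, n)                  # the modular exponentiation hot path
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def next_safe_prime(start):
    """Return the first odd p >= start such that p and (p - 1) // 2 both pass the test."""
    candidate = start | 1
    while True:
        if is_probable_prime((candidate - 1) // 2) and is_probable_prime(candidate):
            return candidate
        candidate += 2
```

Nearly all the time is spent inside `pow(a, d, n)`, that is, in large-integer modular multiplication, which is consistent with the abstract's argument for offloading that arithmetic to a dedicated ALU.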

    Speed reading in the dark: Accelerating functional encryption for quadratic functions with reprogrammable hardware

    Functional encryption is a new paradigm for encryption in which decryption does not reveal the entire plaintext but only some function of it. Functional encryption has great potential in privacy-enhancing technologies but suffers from excessive computational overheads. We introduce the first hardware accelerator that supports functional encryption for quadratic functions. Our accelerator is implemented on a reprogrammable system-on-chip following the hardware/software codesign methodology. We benchmark our implementation for two privacy-preserving machine learning applications: (1) classification of handwritten digits from the MNIST database and (2) classification of clothes images from the Fashion MNIST database. In both cases, classification is performed with encrypted images. We show that our implementation offers speedups of over 200 times compared to a published software implementation and permits applications that are infeasible with software-only solutions.

    Highly secure cryptographic computations against side-channel attacks

    Side-channel attacks (SCAs) are considered great threats to modern cryptosystems, including RSA and elliptic curve public key cryptosystems. This is because the main computations involved in these systems, such as modular exponentiation (ME) in RSA and scalar multiplication (SM) in elliptic curve systems, are potentially vulnerable to SCAs. The Montgomery Powering Ladder (MPL) has been shown to be a good choice for ME and SM, with countermeasures against certain side-channel attacks. However, recent research shows that MPL is still vulnerable to some advanced attacks [21, 30 and 34]. In this thesis, an improved sequence masking technique is proposed to enhance the MPL's resistance to Differential Power Analysis (DPA). Based on the new technique, a modified MPL with countermeasures in both data and computation sequence is developed and presented. Two efficient hardware architectures for the original MPL algorithm are also presented, using binary and radix-4 representations, respectively.
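As background for this abstract, the original Montgomery Powering Ladder for modular exponentiation can be sketched as follows. This shows only the unmodified ladder, not the masked variant the thesis proposes; the function name `montgomery_ladder_pow` is hypothetical. Note also that a Python `if` on a key bit is not constant-time, so the sketch illustrates the regular operation pattern rather than an actual side-channel defense.

```python
def montgomery_ladder_pow(base, exponent, modulus):
    """Compute base**exponent mod modulus with one multiply and one square per bit."""
    if exponent < 0 or modulus <= 0:
        raise ValueError("exponent must be non-negative and modulus positive")
    r0 = 1 % modulus
    r1 = base % modulus
    # Invariant at every step: r1 == r0 * base (mod modulus).
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0
```

Unlike plain square-and-multiply, every exponent bit triggers the same pair of operations, which removes the simplest power-trace distinction between 0 and 1 bits. The advanced attacks cited in the abstract target what remains, such as which register is updated.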