174 research outputs found

    Parametric, Secure and Compact Implementation of RSA on FPGA

    Get PDF
    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Homomorphic Data Isolation for Hardware Trojan Protection

    Full text link
    The interest in homomorphic encryption/decryption is increasing due to its excellent security properties and operating facilities. It allows operating on data without revealing its content. In this work, we suggest using homomorphism for Hardware Trojan protection. We implement two partial homomorphic designs based on ElGamal encryption/decryption scheme. The first design is a multiplicative homomorphic, whereas the second one is an additive homomorphic. We implement the proposed designs on a low-cost Xilinx Spartan-6 FPGA. Area utilization, delay, and power consumption are reported for both designs. Furthermore, we introduce a dual-circuit design that combines the two earlier designs using resource sharing in order to have minimum area cost. Experimental results show that our dual-circuit design saves 35% of the logic resources compared to a regular design without resource sharing. The saving in power consumption is 20%, whereas the number of cycles needed remains almost the sam

    Comparison of Scalable Montgomery Modular Multiplication Implementations Embedded in Reconfigurable Hardware

    No full text
    International audienceThis paper presents a comparison of possible approaches for an efficient implementation of Multiple-word radix-2 Montgomery Modular Multiplication (MM) on modern Field Programmable Gate Arrays (FPGAs). The hardware implementation of MM coprocessor is fully scalable what means that it can be reused in order to generate long-precision results independently on the word length of the originally proposed coprocessor. The first of analyzed implementations uses a data path based on traditionally used redundant carry-save adders, the second one exploits, in scalable designs not yet applied, standard carry-propagate adders with fast carry chain logic. As a control unit and a platform for purely software implementation an embedded soft-core processor Altera NIOS is employed. All implementations use large embedded memory blocks available in recent FPGAs. Speed and logic requirements comparisons are performed on the optimized software and combined hardware-software designs in Altera FPGAs. The issues of targeting a design specifically for a FPGA are considered taking into account the underlying architecture imposed by the target FPGA technology. It is shown that the coprocessors based on carry-save adders and carry-propagate adders provide comparable results in constrained FPGA implementations but in case of carry-propagate logic, the solution requires less embedded memory and provides some additional implementation advantages presented in the paper

    Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms

    Get PDF
    We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor

    Maximizing the Efficiency using Montgomery Multipliers on FPGA in RSA Cryptography for Wireless Sensor Networks

    Get PDF
    The architecture and modeling of RSA public key encryption/decryption systems are presented in this work. Two different architectures are proposed, mMMM42 (modified Montgomery Modular Multiplier 4 to 2 Carry Save Architecture) and RSACIPHER128 to check the suitability for implementation in Wireless Sensor Nodes to utilize the same in Wireless Sensor Networks. It can easily be fitting into systems that require different levels of security by changing the key size. The processing time is increased and space utilization is reduced in FPGA due to its reusability. VHDL code is synthesized and simulated using Xilinx-ISE for both the architectures. Architectures are compared in terms of area and time. It is verified that this architecture support for a key size of 128bits. The implementation of RSA encryption/decryption algorithm on FPGA using 128 bits data and key size with RSACIPHER128 gives good result with 50% less utilization of hardware. This design is also implemented for ASIC using Mentor Graphics

    Dynamic Polymorphic Reconfiguration to Effectively “CLOAK” a Circuit’s Function

    Get PDF
    Today\u27s society has become more dependent on the integrity and protection of digital information used in daily transactions resulting in an ever increasing need for information security. Additionally, the need for faster and more secure cryptographic algorithms to provide this information security has become paramount. Hardware implementations of cryptographic algorithms provide the necessary increase in throughput, but at a cost of leaking critical information. Side Channel Analysis (SCA) attacks allow an attacker to exploit the regular and predictable power signatures leaked by cryptographic functions used in algorithms such as RSA. In this research the focus on a means to counteract this vulnerability by creating a Critically Low Observable Anti-Tamper Keeping Circuit (CLOAK) capable of continuously changing the way it functions in both power and timing. This research has determined that a polymorphic circuit design capable of varying circuit power consumption and timing can protect a cryptographic device from an Electromagnetic Analysis (EMA) attacks. In essence, we are effectively CLOAKing the circuit functions from an attacker

    Hardware and Software Multi-precision Implementations of Cryptographic Algorithms

    Get PDF
    The software implementations of cryptographic algorithms are considered to be very slow, when there are requirements of multi-precision arithmetic operations on very long integers. These arithmetic operations may include addition, subtraction, multiplication, division and exponentiation. Several research papers have been published providing different solutions to make these operations faster. Digital Signature Algorithm (DSA) is a cryptographic application that requires multi-precision arithmetic operations. These arithmetic operations are mostly based upon modular multiplication and exponentiation on integers of the size of 1024 bits. The use of such numbers is an essential part of providing high security against the cryptanalytic attacks on the authenticated messages. When these operations are implemented in software, performance in terms of speed becomes very low. The major focus of the thesis is the study of various arithmetic operations for public key cryptography and selecting the fast multi-precision arithmetic algorithms for hardware implementation. These selected algorithms are implemented in hardware and software for performance comparison and they are used to implement Digital Signature Algorithm for performance analysis

    Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation

    Get PDF
    This paper describes a comparison of two Montgomery modular multiplication architectures: a systolic and a multiplexed. Both implementations target FPGA devices. The modular multiplication is employed in modular exponentiation processes, which are the most important operations of some public-key cryptographic algorithms, including the most popular of them, the RSA. The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative and is composed of multiplier blocks in parallel with the new simplified Processing Elements, and it provides a pipelined operation mode. We compare the time × area efficiency for both architectures as well as an RSA application. The systolic implementation can run the 1024 bits RSA decryption process in just 3.23 ms, and the multiplexed architecture executes the same operation in 4.36 ms, but the second approach saves up to 28% of logical resources. These results are competitive with the state-of-the-art performance
    corecore