174 research outputs found
Parametric, Secure and Compact Implementation of RSA on FPGA
We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as
available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our
design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This
conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We
show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra
functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area
Homomorphic Data Isolation for Hardware Trojan Protection
The interest in homomorphic encryption/decryption is increasing due to its
excellent security properties and operating facilities. It allows operating on
data without revealing its content. In this work, we suggest using homomorphism
for Hardware Trojan protection. We implement two partial homomorphic designs
based on ElGamal encryption/decryption scheme. The first design is a
multiplicative homomorphic, whereas the second one is an additive homomorphic.
We implement the proposed designs on a low-cost Xilinx Spartan-6 FPGA. Area
utilization, delay, and power consumption are reported for both designs.
Furthermore, we introduce a dual-circuit design that combines the two earlier
designs using resource sharing in order to have minimum area cost. Experimental
results show that our dual-circuit design saves 35% of the logic resources
compared to a regular design without resource sharing. The saving in power
consumption is 20%, whereas the number of cycles needed remains almost the sam
Comparison of Scalable Montgomery Modular Multiplication Implementations Embedded in Reconfigurable Hardware
International audienceThis paper presents a comparison of possible approaches for an efficient implementation of Multiple-word radix-2 Montgomery Modular Multiplication (MM) on modern Field Programmable Gate Arrays (FPGAs). The hardware implementation of MM coprocessor is fully scalable what means that it can be reused in order to generate long-precision results independently on the word length of the originally proposed coprocessor. The first of analyzed implementations uses a data path based on traditionally used redundant carry-save adders, the second one exploits, in scalable designs not yet applied, standard carry-propagate adders with fast carry chain logic. As a control unit and a platform for purely software implementation an embedded soft-core processor Altera NIOS is employed. All implementations use large embedded memory blocks available in recent FPGAs. Speed and logic requirements comparisons are performed on the optimized software and combined hardware-software designs in Altera FPGAs. The issues of targeting a design specifically for a FPGA are considered taking into account the underlying architecture imposed by the target FPGA technology. It is shown that the coprocessors based on carry-save adders and carry-propagate adders provide comparable results in constrained FPGA implementations but in case of carry-propagate logic, the solution requires less embedded memory and provides some additional implementation advantages presented in the paper
Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms
We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms
in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations
of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that
it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor
Customisable arithmetic hardware designs
Imperial Users onl
Maximizing the Efficiency using Montgomery Multipliers on FPGA in RSA Cryptography for Wireless Sensor Networks
The architecture and modeling of RSA public key encryption/decryption systems are presented in this work. Two different architectures are proposed, mMMM42 (modified Montgomery Modular Multiplier 4 to 2 Carry Save Architecture) and RSACIPHER128 to check the suitability for implementation in Wireless Sensor Nodes to utilize the same in Wireless Sensor Networks. It can easily be fitting into systems that require different levels of security by changing the key size. The processing time is increased and space utilization is reduced in FPGA due to its reusability. VHDL code is synthesized and simulated using Xilinx-ISE for both the architectures. Architectures are compared in terms of area and time. It is verified that this architecture support for a key size of 128bits. The implementation of RSA encryption/decryption algorithm on FPGA using 128 bits data and key size with RSACIPHER128 gives good result with 50% less utilization of hardware. This design is also implemented for ASIC using Mentor Graphics
Dynamic Polymorphic Reconfiguration to Effectively “CLOAK” a Circuit’s Function
Today\u27s society has become more dependent on the integrity and protection of digital information used in daily transactions resulting in an ever increasing need for information security. Additionally, the need for faster and more secure cryptographic algorithms to provide this information security has become paramount. Hardware implementations of cryptographic algorithms provide the necessary increase in throughput, but at a cost of leaking critical information. Side Channel Analysis (SCA) attacks allow an attacker to exploit the regular and predictable power signatures leaked by cryptographic functions used in algorithms such as RSA. In this research the focus on a means to counteract this vulnerability by creating a Critically Low Observable Anti-Tamper Keeping Circuit (CLOAK) capable of continuously changing the way it functions in both power and timing. This research has determined that a polymorphic circuit design capable of varying circuit power consumption and timing can protect a cryptographic device from an Electromagnetic Analysis (EMA) attacks. In essence, we are effectively CLOAKing the circuit functions from an attacker
Hardware and Software Multi-precision Implementations of Cryptographic Algorithms
The software implementations of cryptographic algorithms are considered to be very slow, when there are requirements of multi-precision arithmetic operations on very long integers. These arithmetic operations may include addition, subtraction, multiplication, division and exponentiation. Several research papers have been published providing different solutions to make these operations faster. Digital Signature Algorithm (DSA) is a cryptographic application that requires multi-precision arithmetic operations. These arithmetic operations are mostly based upon modular multiplication and exponentiation on integers of the size of 1024 bits. The use of such numbers is an essential part of providing high security against the cryptanalytic attacks on the authenticated messages. When these operations are implemented in software, performance in terms of speed becomes very low. The major focus of the thesis is the study of various arithmetic operations for public key cryptography and selecting the fast multi-precision arithmetic algorithms for hardware implementation. These selected algorithms are implemented in hardware and software for performance comparison and they are used to implement Digital Signature Algorithm for performance analysis
Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation
This paper describes a comparison of two Montgomery
modular multiplication architectures: a systolic and a
multiplexed. Both implementations target FPGA devices. The
modular multiplication is employed in modular exponentiation
processes, which are the most important operations of some
public-key cryptographic algorithms, including the most popular
of them, the RSA. The proposed systolic architecture presents
a high-radix implementation with a one-dimensional array of
Processing Elements. The multiplexed implementation is a new
alternative and is composed of multiplier blocks in parallel
with the new simplified Processing Elements, and it provides a
pipelined operation mode. We compare the time × area efficiency
for both architectures as well as an RSA application. The systolic
implementation can run the 1024 bits RSA decryption process
in just 3.23 ms, and the multiplexed architecture executes the
same operation in 4.36 ms, but the second approach saves up to
28% of logical resources. These results are competitive with the
state-of-the-art performance
- …