47 research outputs found
Efficient Pipelining for Modular Multiplication Architectures in Prime Fields
This paper presents a pipelined architecture of a modular Montgomery multiplier, which is suitable to be used in public key coprocessors. Starting from a baseline implementation of the Montgomery algorithm, a more compact pipelined version is derived. The design makes use of 16bit integer multiplication blocks that are available on recently manufactured FPGAs. The critical path is optimized by omitting the exact computation of intermediate results in the Montgomery algorithm using a 6-2 carry-save notation. This results in a high-speed architecture, which outperforms previously designed Montgomery multipliers. Because a very popular application of Montgomery multiplication is public key cryptography, we compare our implementation to the state-of-the-art in Montgomery multipliers on the basis of performance results for 1024-bit RSA
Implementing Homomorphic Encryption Based Secure Feedback Control for Physical Systems
This paper is about an encryption based approach to the secure implementation
of feedback controllers for physical systems. Specifically, Paillier's
homomorphic encryption is used to digitally implement a class of linear dynamic
controllers, which includes the commonplace static gain and PID type feedback
control laws as special cases. The developed implementation is amenable to
Field Programmable Gate Array (FPGA) realization. Experimental results,
including timing analysis and resource usage characteristics for different
encryption key lengths, are presented for the realization of an inverted
pendulum controller; as this is an unstable plant, the control is necessarily
fast
Hardware Implementations of Scalable and Unified Elliptic Curve Cryptosystem Processors
As the amount of information exchanged through the network grows, so does the demand for increased security over the transmission of this information. As the growth of computers increased in the past few decades, more sophisticated methods of cryptography have been developed. One method of transmitting data securely over the network is by using symmetric-key cryptography. However, a drawback of symmetric-key cryptography is the need to exchange the shared key securely. One of the solutions is to use public-key cryptography.
One of the modern public-key cryptography algorithms is called Elliptic Curve Cryptography (ECC). The advantage of ECC over some older algorithms is the smaller number of key sizes to provide a similar level of security. As a result, implementations of ECC are much faster and consume fewer resources. In order to achieve better performance, ECC operations are often offloaded onto hardware to alleviate the workload from the servers' processors.
The most important and complex operation in ECC schemes is the elliptic curve point multiplication (ECPM). This thesis explores the implementation of hardware accelerators that offload the ECPM operation to hardware. These processors are referred to as ECC processors, or simply ECPs. This thesis targets the efficient hardware implementation of ECPs specifically for the 15 elliptic curves recommended by the National Institute of Standards and Technology (NIST).
The main contribution of this thesis is the implementation of highly efficient hardware for scalable and unified finite field arithmetic units that are used in the design of ECPs. In this thesis, scalability refers to the processor's ability to support multiple key sizes without the need to reconfigure the hardware. By doing so, the hardware does not need to be redesigned for the server to handle different levels of security. Unified refers to the ability of the ECP to handle both prime and binary fields. The resultant designs are valuable to the research community and industry, as a single hardware device is able to handle a wide range of ECC operations efficiently and at high speeds. Thus, improving the ability of network servers to handle secure transaction more quickly and improve productivity at lower costs
Analysis and Comparison of Different Multiplier
Multiplication is one of the important parameter in various digital applications such as in digital signal processor, microprocessor so in this paper firstly we analyse various 4*4 multiplier circuit and then analyses various 12*12 bit multiplier circuit. After then their parameters i.e area, power and delay are analyzed. All these multiplier are designed in Verilog language and synthesized on Xilinx ISE simulator and using cadence RTL schematic respectively. Multipliers included in this paper are Array multiplier, Radix-4 multiplier, Radix-8 multiplier, Wallace Multiplier and Conventional multiplier. On comparison it is found that for 4*4 and 12*12 multiplier, array multiplier have highest delay but have less power consumption while Booth multiplier(Radix-4) is having high speed with moderate power consumption
Versatile Montgomery Multiplier Architectures
Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive.
Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery\u27s method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)).
Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n).
Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption
Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation
This paper describes a comparison of two Montgomery
modular multiplication architectures: a systolic and a
multiplexed. Both implementations target FPGA devices. The
modular multiplication is employed in modular exponentiation
processes, which are the most important operations of some
public-key cryptographic algorithms, including the most popular
of them, the RSA. The proposed systolic architecture presents
a high-radix implementation with a one-dimensional array of
Processing Elements. The multiplexed implementation is a new
alternative and is composed of multiplier blocks in parallel
with the new simplified Processing Elements, and it provides a
pipelined operation mode. We compare the time × area efficiency
for both architectures as well as an RSA application. The systolic
implementation can run the 1024 bits RSA decryption process
in just 3.23 ms, and the multiplexed architecture executes the
same operation in 4.36 ms, but the second approach saves up to
28% of logical resources. These results are competitive with the
state-of-the-art performance
Pipelining GF(P) Elliptic Curve Cryptography Computation
This paper proposes a new method to compute Elliptic Curve Cryptography in Galois Fields GF(p). The method incorporates pipelining to utilize the benefit of both parallel and serial methodology used before. It allows the exploitation of the inherited independency that exists in elliptic curve point addition and doubling operations. The results showed attraction because of its improvement over many parallel and serial techniques of elliptic curve crypto-computations
Fast 160-Bits GF (P) Elliptic Curve Crypto Hardware of High-Radix Scalable Multipliers
In this paper, a fast hardware architecture for elliptic curve cryptography computation in Galois Field GF(p) is proposed. The architecture is implemented for 160-bits, as its data size to handle. The design adopts projective coordinates to eliminate most of the required GF(p) inversion calculations replacing them with several multiplication operations. The hardware is intended to be scalable, which allows the hardware to compute long precision numbers in a repetitive way. The design involves four parallel scalable multipliers to gain the best speed. This scalable design was implemented in different versions depending on the area and speed. All scalable implementations were compared with an available FPGA design. The proposed scalable hardware showed interesting results in both area and speed. It also showed some area-time flexibility to accommodate the variation needed by different crypto applications
Fast 160-Bits GF (P) Elliptic Curve Crypto Hardware of High-Radix Scalable Multipliers
In this paper, a fast hardware architecture for elliptic curve cryptography computation in Galois Field GF(p) is proposed. The architecture is implemented for 160-bits, as its data size to handle. The design adopts projective coordinates to eliminate most of the required GF(p) inversion calculations replacing them with several multiplication operations. The hardware is intended to be scalable, which allows the hardware to compute long precision numbers in a repetitive way. The design involves four parallel scalable multipliers to gain the best speed. This scalable design was implemented in different versions depending on the area and speed. All scalable implementations were compared with an available FPGA design. The proposed scalable hardware showed interesting results in both area and speed. It also showed some area-time flexibility to accommodate the variation needed by different crypto applications
Fast, compact and secure implementation of rsa on dedicated hardware
RSA is the most popular Public Key Cryptosystem (PKC) and is heavily used today. PKC comes into play, when two parties, who have previously never met, want to create a secure channel between them. The core operation in RSA is modular multiplication, which requires lots of computational power especially when the operands are longer than 1024-bits. Although today’s powerful PC’s can easily handle one RSA operation in a fraction of a second, small devices such as PDA’s, cell phones, smart cards, etc. have limited computational power, thus there is a need for dedicated hardware which is specially designed to meet the demand of this heavy calculation. Additionally, web servers, which thousands of users can access at the same time, need to perform many PKC operations in a very short time and this can create a performance bottleneck. Special algorithms implemented on dedicated hardware can take advantage of true massive parallelism and high utilization of the data path resulting in high efficiency in terms of both power and execution time while keeping the chip cost low. We will use the “Montgomery Modular Multiplication” algorithm in our implementation, which is considered one of the most efficient multiplication schemes, and has many applications in PKC. In the first part of the thesis, our “2048-bit Radix-4 based Modular Multiplier” design is introduced and compared with the conventional radix-2 modular multipliers of previous works. Our implementation for 2048-bit modular multiplication features up to 82% shorter execution time with 33% increase in the area over the conventional radix-2 designs and can achieve 132 MHz on a Xilinx xc2v6000 FPGA. The proposed multiplier has one of the fastest execution times in terms of latency and performs better than (37% better) our reference radix-2 design in terms of time-area product. The results are similar in the ASIC case where we implement our design for UMC 0.18 μm technology. In the second part, a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit intended for inexpensive FPGAs are presented. The design utilizes hardwired block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The deployed method is based on the Montgomery modular multiplication algorithm and the architecture utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications* in 7.62 μs and 27.0 μs respectively with approximately the same device usage on Xilinx Spartan-3E 500. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine easily fits into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks with an insignificant overhead in area and execution time. The upper limit on the operand precision is dictated only by the available Block-RAM to accommodate the operands within the FPGA. This design is also compared to the first one, considering the relative advantages and disadvantages of each circuit