6 research outputs found
Parametric, Secure and Compact Implementation of RSA on FPGA
We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as
available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our
design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks
Teaching FPGA Security
International audienceTeaching FPGA security to electrical engineering students is new at graduate level. It requires a wide field of knowledge and a lot of time. This paper describes a compact course on FPGA security that is available to electrical engineering master's students at the Saint-Etienne Institute of Telecom, University of Lyon, France. It is intended for instructors who wish to design a new course on this topic. The paper reviews the motivation for the course, the pedagogical issues involved, the curriculum, the lab materials and tools used, and the results. Details are provided on two original lab sessions, in particular, a compact lab that requires students to perform differential power analysis of FPGA implementation of the AES symmetric cipher. The paper gives numerous relevant references to allow the reader to prepare a similar curriculum
Recommended from our members
RSA in hardware
textThis report presents the RSA encryption and decryption schemes and discusses several methods for expediting the computations required, specifically the modular exponentiation operation that is required for RSA. A hardware implementation of the CIOS (Coarsely Integrated Operand Scanning) algorithm for modular multiplication is attempted on a XILINX Spartan3 FPGA in the TLL-5000 development platform used at the University of Texas at Austin. The development of the hardware is discussed in detail and some Verilog source code is provided for an implementation of modular multiplication. Some source code is also provided for an RSA executable to run on the TLL-6219 ARM-based development platform, to be used to generate test vectors.Electrical and Computer Engineerin
Fast, compact and secure implementation of rsa on dedicated hardware
RSA is the most popular Public Key Cryptosystem (PKC) and is heavily used today. PKC comes into play, when two parties, who have previously never met, want to create a secure channel between them. The core operation in RSA is modular multiplication, which requires lots of computational power especially when the operands are longer than 1024-bits. Although today’s powerful PC’s can easily handle one RSA operation in a fraction of a second, small devices such as PDA’s, cell phones, smart cards, etc. have limited computational power, thus there is a need for dedicated hardware which is specially designed to meet the demand of this heavy calculation. Additionally, web servers, which thousands of users can access at the same time, need to perform many PKC operations in a very short time and this can create a performance bottleneck. Special algorithms implemented on dedicated hardware can take advantage of true massive parallelism and high utilization of the data path resulting in high efficiency in terms of both power and execution time while keeping the chip cost low. We will use the “Montgomery Modular Multiplication” algorithm in our implementation, which is considered one of the most efficient multiplication schemes, and has many applications in PKC. In the first part of the thesis, our “2048-bit Radix-4 based Modular Multiplier” design is introduced and compared with the conventional radix-2 modular multipliers of previous works. Our implementation for 2048-bit modular multiplication features up to 82% shorter execution time with 33% increase in the area over the conventional radix-2 designs and can achieve 132 MHz on a Xilinx xc2v6000 FPGA. The proposed multiplier has one of the fastest execution times in terms of latency and performs better than (37% better) our reference radix-2 design in terms of time-area product. The results are similar in the ASIC case where we implement our design for UMC 0.18 μm technology. In the second part, a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit intended for inexpensive FPGAs are presented. The design utilizes hardwired block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The deployed method is based on the Montgomery modular multiplication algorithm and the architecture utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications* in 7.62 μs and 27.0 μs respectively with approximately the same device usage on Xilinx Spartan-3E 500. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine easily fits into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks with an insignificant overhead in area and execution time. The upper limit on the operand precision is dictated only by the available Block-RAM to accommodate the operands within the FPGA. This design is also compared to the first one, considering the relative advantages and disadvantages of each circuit
Uso eficiente de aritmética redundante en FPGAs
Hasta hace pocos años, la utilización de aritmética redundante en FPGAs había
sido descartada por dos razones principalmente. En primer lugar, por el buen
rendimiento que ofrecían los sumadores de acarreo propagado, gracias a la lógica de
de acarreo que poseían de fábrica y al pequeño tamaño de los operandos en las
aplicaciones típicas para FPGAs. En segundo lugar, el excesivo consumo de área que
las herramientas de síntesis obtenían cuando mapeaban unidades que trabajan en carrysave.
En este trabajo, se muestra que es posible la utilización de aritmética redundante
carry-save en FPGAs de manera eficiente, consiguiendo un aumento en la velocidad de
operación con un consumo de recursos razonable. Se ha introducido un nuevo formato
redundante doble carry-save y se ha demostrado que la manera óptima para la
realización de multiplicadores de elevado ancho de palabra es la combinación de
multiplicadores empotrados con sumadores carry-save.Till a few years ago, redundant arithmetic had been discarded to be use in FPGA
mainly for two reasons. First, the efficient results obtained using carry-propagate adders
thanks to the carry-logic embedded in FPGAs and the small sizes of operands in typical
FPGA applications. Second, the high number of resources that the synthesis tools
utilizes to implement carry-save circuits.
In this work, it is demonstrated that carry-save arithmetic can be efficiently used
in FPGA, obtaining an important speed improvement with a reasonable area cost. A
new redundant format, double carry-save, has been introduced, and the optimal
implementation of large size multipliers has been shown based on embedded multipliers
and carry-save adders