1,856 research outputs found
Efficient Software Implementations of Modular Exponentiation
RSA computations have a significant effect on the workloads of SSL/TLS servers, and therefore their software implementations on general purpose processors are an important target for optimization. We concentrate here on 512-bit modular exponentiation, used for 1024-bit RSA. We propose optimizations in two directions. At the primitives’ level, we study and improve the performance of an “Almost” Montgomery Multiplication. At the exponentiation level, we propose a method to reduce the cost of protecting the w-ary exponentiation algorithm against cache/timing side channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation, which outperforms the currently fastest publicly available alternative. When measured on the latest x86-64 architecture, the 2nd Generation Intel® Core™ processor, our implementation is 43% faster than that of the current version of OpenSSL (1.0.0d)
Fast modular squaring with AVX512IFMA
Modular exponentiation represents a signicant workload for public key cryptosystems. Examples include not only the classical RSA, DSA, and DH algorithms, but also the partially homomorphic Paillier encryption. As a result, efficient software implementations of modular exponentiation are an important target for optimization. This paper studies methods for using Intel\u27s forthcoming AVX512 Integer Fused Multiply Accumulate (AVX512IFMA) instructions in order to speed up modular (Montgomery) squaring, which dominates the cost of the exponentiation. We further show how a minor tweak in the architectural definition of AVX512IFMA has the potential to further speed up modular squaring
Parametric, Secure and Compact Implementation of RSA on FPGA
We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as
available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our
design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks
Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms
We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms
in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations
of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that
it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This
conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We
show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra
functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area
Secure and Efficient RNS Approach for Elliptic Curve Cryptography
Scalar multiplication, the main operation in elliptic
curve cryptographic protocols, is vulnerable to side-channel
(SCA) and fault injection (FA) attacks. An efficient countermeasure
for scalar multiplication can be provided by using alternative
number systems like the Residue Number System (RNS). In RNS,
a number is represented as a set of smaller numbers, where each
one is the result of the modular reduction with a given moduli
basis. Under certain requirements, a number can be uniquely
transformed from the integers to the RNS domain (and vice
versa) and all arithmetic operations can be performed in RNS.
This representation provides an inherent SCA and FA resistance
to many attacks and can be further enhanced by RNS arithmetic
manipulation or more traditional algorithmic countermeasures.
In this paper, extending our previous work, we explore the
potentials of RNS as an SCA and FA countermeasure and provide
an description of RNS based SCA and FA resistance means. We
propose a secure and efficient Montgomery Power Ladder based
scalar multiplication algorithm on RNS and discuss its SCAFA
resistance. The proposed algorithm is implemented on an
ARM Cortex A7 processor and its SCA-FA resistance is evaluated
by collecting preliminary leakage trace results that validate our
initial assumptions
Recommended from our members
ASIC design and implementation of a parallel exponentiation algorithm using optimized scalable Montgomery multipliers
Modular exponentiation and modular multiplication are the most used operations in current cryptographic systems. Some well-known cryptographic algorithms, such as RSA, Diffie-Hellman key exchange, and DSA, require modular exponentiation operations. This is performed with a series of modular multiplications to the extent of its exponent in a certain fashion depending on the exponentiation algorithm used. Cryptographic functions are very likely to be applied in current applications that perform information exchange to secure, verify, or authenticate data. Most notable is the use of such applications in Internet based information exchange. Smart cards, hand-helds, cell phones and many other small devices also need to perform information exchange and are likely to apply cryptographic functions. A hardware solution to perform a cryptographic function is generally faster and more secure than a software solution. Thus, a fast and area efficient modular exponentiation hardware solution would provide a better infrastructure for current cryptographic techniques. In certain cryptographic algorithms, very large precisions are used. Further, the precision may vary. Most of the hardware designs for modular multiplication and modular exponentiation are fixed-precision solutions. A scalable Montgomery Multiplier (MM) to perform modular multiplication has been proposed and can operate on input values of any bit-size, but the maximum bit-size should be known and is the limiting factor. The multiplier can calculate any operand size less than the maximal precision. However, this design's parameters should be optimized depending on the operand precision for which the design is used. A software application was developed in C to find the optimized design for the scalable MM module. It performs area-time trade-off for the most commonly used precisions in order to obtain a fast and area efficient solution for the common case. A modular exponentiation system is developed using this scalable multiplier design. Since the multiplier can operate on any operand size up to a certain maximum value, the exponentiation system that utilizes the multiplier will inherit the same capability. This thesis work presents the design and implementation of an exponentiation algorithm in hardware utilizing the optimized scalable Montgomery Multiplier. The design uses a parallel exponentiation algorithm to reduce the total computation time. The modular exponentiation system experimental results are analyzed and compared with software and other hardware implementations
- …