1,756 research outputs found
Efficient Software Implementations of Modular Exponentiation
RSA computations have a significant effect on the workloads of SSL/TLS servers, and therefore their software implementations on general purpose processors are an important target for optimization. We concentrate here on 512-bit modular exponentiation, used for 1024-bit RSA. We propose optimizations in two directions. At the primitives’ level, we study and improve the performance of an “Almost” Montgomery Multiplication. At the exponentiation level, we propose a method to reduce the cost of protecting the w-ary exponentiation algorithm against cache/timing side channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation, which outperforms the currently fastest publicly available alternative. When measured on the latest x86-64 architecture, the 2nd Generation Intel® Core™ processor, our implementation is 43% faster than that of the current version of OpenSSL (1.0.0d)
Fast modular squaring with AVX512IFMA
Modular exponentiation represents a signicant workload for public key cryptosystems. Examples include not only the classical RSA, DSA, and DH algorithms, but also the partially homomorphic Paillier encryption. As a result, efficient software implementations of modular exponentiation are an important target for optimization. This paper studies methods for using Intel\u27s forthcoming AVX512 Integer Fused Multiply Accumulate (AVX512IFMA) instructions in order to speed up modular (Montgomery) squaring, which dominates the cost of the exponentiation. We further show how a minor tweak in the architectural definition of AVX512IFMA has the potential to further speed up modular squaring
Parametric, Secure and Compact Implementation of RSA on FPGA
We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as
available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our
design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks
Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms
We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms
in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations
of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that
it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This
conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We
show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra
functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area
Secure and Efficient RNS Approach for Elliptic Curve Cryptography
Scalar multiplication, the main operation in elliptic
curve cryptographic protocols, is vulnerable to side-channel
(SCA) and fault injection (FA) attacks. An efficient countermeasure
for scalar multiplication can be provided by using alternative
number systems like the Residue Number System (RNS). In RNS,
a number is represented as a set of smaller numbers, where each
one is the result of the modular reduction with a given moduli
basis. Under certain requirements, a number can be uniquely
transformed from the integers to the RNS domain (and vice
versa) and all arithmetic operations can be performed in RNS.
This representation provides an inherent SCA and FA resistance
to many attacks and can be further enhanced by RNS arithmetic
manipulation or more traditional algorithmic countermeasures.
In this paper, extending our previous work, we explore the
potentials of RNS as an SCA and FA countermeasure and provide
an description of RNS based SCA and FA resistance means. We
propose a secure and efficient Montgomery Power Ladder based
scalar multiplication algorithm on RNS and discuss its SCAFA
resistance. The proposed algorithm is implemented on an
ARM Cortex A7 processor and its SCA-FA resistance is evaluated
by collecting preliminary leakage trace results that validate our
initial assumptions
- …