1,756 research outputs found

    Efficient Software Implementations of Modular Exponentiation

    Get PDF
    RSA computations have a significant effect on the workloads of SSL/TLS servers, and therefore their software implementations on general purpose processors are an important target for optimization. We concentrate here on 512-bit modular exponentiation, used for 1024-bit RSA. We propose optimizations in two directions. At the primitives’ level, we study and improve the performance of an “Almost” Montgomery Multiplication. At the exponentiation level, we propose a method to reduce the cost of protecting the w-ary exponentiation algorithm against cache/timing side channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation, which outperforms the currently fastest publicly available alternative. When measured on the latest x86-64 architecture, the 2nd Generation Intel® Core™ processor, our implementation is 43% faster than that of the current version of OpenSSL (1.0.0d)

    Fast modular squaring with AVX512IFMA

    Get PDF
    Modular exponentiation represents a signicant workload for public key cryptosystems. Examples include not only the classical RSA, DSA, and DH algorithms, but also the partially homomorphic Paillier encryption. As a result, efficient software implementations of modular exponentiation are an important target for optimization. This paper studies methods for using Intel\u27s forthcoming AVX512 Integer Fused Multiply Accumulate (AVX512IFMA) instructions in order to speed up modular (Montgomery) squaring, which dominates the cost of the exponentiation. We further show how a minor tweak in the architectural definition of AVX512IFMA has the potential to further speed up modular squaring

    Parametric, Secure and Compact Implementation of RSA on FPGA

    Get PDF
    We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks

    Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms

    Get PDF
    We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Secure and Efficient RNS Approach for Elliptic Curve Cryptography

    Get PDF
    Scalar multiplication, the main operation in elliptic curve cryptographic protocols, is vulnerable to side-channel (SCA) and fault injection (FA) attacks. An efficient countermeasure for scalar multiplication can be provided by using alternative number systems like the Residue Number System (RNS). In RNS, a number is represented as a set of smaller numbers, where each one is the result of the modular reduction with a given moduli basis. Under certain requirements, a number can be uniquely transformed from the integers to the RNS domain (and vice versa) and all arithmetic operations can be performed in RNS. This representation provides an inherent SCA and FA resistance to many attacks and can be further enhanced by RNS arithmetic manipulation or more traditional algorithmic countermeasures. In this paper, extending our previous work, we explore the potentials of RNS as an SCA and FA countermeasure and provide an description of RNS based SCA and FA resistance means. We propose a secure and efficient Montgomery Power Ladder based scalar multiplication algorithm on RNS and discuss its SCAFA resistance. The proposed algorithm is implemented on an ARM Cortex A7 processor and its SCA-FA resistance is evaluated by collecting preliminary leakage trace results that validate our initial assumptions
    corecore