5 research outputs found

    Area Efficient Modular Reduction in Hardware for Arbitrary Static Moduli

    Full text link
    Modular reduction is a crucial operation in many post-quantum cryptographic schemes, including the Kyber key exchange method or Dilithium signature scheme. However, it can be computationally expensive and pose a performance bottleneck in hardware implementations. To address this issue, we propose a novel approach for computing modular reduction efficiently in hardware for arbitrary static moduli. Unlike other commonly used methods such as Barrett or Montgomery reduction, the method does not require any multiplications. It is not dependent on properties of any particular choice of modulus for good performance and low area consumption. Its major strength lies in its low area consumption, which was reduced by 60% for optimized and up to 90% for generic Barrett implementations for Kyber and Dilithium. Additionally, it is well suited for parallelization and pipelining and scales linearly in hardware resource consumption with increasing operation width. All operations can be performed in the bit-width of the modulus, rather than the size of the number being reduced. This shortens carry chains and allows for faster clocking. Moreover, our method can be executed in constant time, which is essential for cryptography applications where timing attacks can be used to obtain information about the secret key.Comment: 7 pages, 2 figure

    PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication

    Get PDF
    Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck. In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest. We implemented and tested our design on FPGA. We highlight the promise of optimized hardware over state-of-the-art GPU- based MSM solver in terms of speed and energy expenditure

    A Fast Modular Reduction Method

    Get PDF
    We put forth a lookup-table-based modular reduction method which partitions the binary string of an integer to be reduced into blocks according to its runs. Its complexity depends on the amount of runs in the binary string. We show that the new reduction is almost twice as fast as the popular Barrett’s reduction, and provide a thorough complexity analysis of the method
    corecore