Area Efficient Modular Reduction in Hardware for Arbitrary Static Moduli
Modular reduction is a crucial operation in many post-quantum cryptographic
schemes, including the Kyber key exchange method and the Dilithium signature scheme.
However, it can be computationally expensive and pose a performance bottleneck
in hardware implementations. To address this issue, we propose a novel approach
for computing modular reduction efficiently in hardware for arbitrary static
moduli. Unlike other commonly used methods such as Barrett or Montgomery
reduction, the method does not require any multiplications. It is not dependent
on properties of any particular choice of modulus for good performance and low
area consumption. Its major strength lies in its low area consumption, which
is 60% lower than that of optimized and up to 90% lower than that of generic
Barrett implementations for Kyber and Dilithium. Additionally, it is well suited for
parallelization and pipelining and scales linearly in hardware resource
consumption with increasing operation width. All operations can be performed in
the bit-width of the modulus, rather than the size of the number being reduced.
This shortens carry chains and allows for faster clocking. Moreover, our method
can be executed in constant time, which is essential for cryptography
applications where timing attacks can be used to obtain information about the
secret key.
Comment: 7 pages, 2 figures
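For orientation, the following minimal sketch contrasts the Barrett baseline named in the abstract with a classic multiplication-free shift-and-subtract reduction. It is not the method proposed in the paper, whose details the abstract does not give. The function names, the choice of the Kyber modulus 3329 as an example, and the 24-bit input width are assumptions made for illustration.

```python
Q = 3329                 # Kyber modulus, used here only as an example
K = 24                   # assumed input width: Q*Q < 2**24
M = (1 << K) // Q        # precomputed Barrett constant floor(2**K / Q)


def barrett_reduce(x: int) -> int:
    """Barrett reduction of 0 <= x < 2**K modulo Q (needs two multiplications)."""
    t = (x * M) >> K     # quotient estimate, at most one below the true quotient
    r = x - t * Q        # remainder candidate in [0, 2*Q)
    if r >= Q:           # a single conditional subtraction finishes the job
        r -= Q
    return r


def shift_subtract_reduce(x: int, q: int, width: int) -> int:
    """Multiplication-free reduction: consume x bit by bit, MSB first.

    The loop count depends only on `width`, not on the value of x, and the
    running remainder never exceeds 2*q - 1, so every operation fits in one
    bit more than the modulus width.
    """
    r = 0
    for i in reversed(range(width)):
        r = (r << 1) | ((x >> i) & 1)   # shift in the next bit of x
        if r >= q:                       # keep the running remainder below q
            r -= q
    return r


if __name__ == "__main__":
    x = 1234567
    print(barrett_reduce(x), shift_subtract_reduce(x, Q, K), x % Q)
```

Python's `if` is not constant time; a hardware or C version would replace the branch with a masked subtraction.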
PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication
Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck.
In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest.
We implemented and tested our design on an FPGA. We highlight the promise of optimized hardware over state-of-the-art GPU-based MSM solvers in terms of speed and energy expenditure.
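To make the MSM problem concrete, here is a minimal sketch that computes the sum of k_i * P_i both naively and with the well-known bucket (Pippenger-style) method. It is not the pipelined design or the algorithmic approach of the paper. As a stand-in for elliptic-curve points it uses integers under addition modulo a prime; the names `GROUP_ORDER`, `group_add`, `msm_naive` and `msm_bucket`, and the window and scalar sizes, are hypothetical choices for illustration.

```python
GROUP_ORDER = 1_000_003  # toy additive group Z_p standing in for curve points
ZERO = 0                 # identity element of the toy group


def group_add(a: int, b: int) -> int:
    """Group operation of the toy group (a real MSM would add curve points)."""
    return (a + b) % GROUP_ORDER


def msm_naive(scalars, points):
    """Reference result: sum of k_i * P_i computed directly in the toy group."""
    return sum(k * p for k, p in zip(scalars, points)) % GROUP_ORDER


def msm_bucket(scalars, points, window_bits=4, scalar_bits=16):
    """Bucket method: only group additions, no scalar multiplications."""
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    result = ZERO
    for w in reversed(range(num_windows)):
        # Shift the partial result left by one window (repeated doubling).
        for _ in range(window_bits):
            result = group_add(result, result)
        # Sort each point into the bucket named by its scalar digit.
        buckets = [ZERO] * (1 << window_bits)
        for k, p in zip(scalars, points):
            digit = (k >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = group_add(buckets[digit], p)
        # Compute sum(d * buckets[d]) with a running sum, highest bucket first.
        running, window_sum = ZERO, ZERO
        for b in reversed(buckets[1:]):
            running = group_add(running, b)
            window_sum = group_add(window_sum, running)
        result = group_add(result, window_sum)
    return result


if __name__ == "__main__":
    ks = [3, 65535, 1024, 77]
    ps = [5, 999_999, 123_456, 42]
    print(msm_bucket(ks, ps) == msm_naive(ks, ps))
```

The bucket stage dominates the cost for large inputs, which is why hardware designs tend to focus on keeping the point-addition unit busy.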
A Fast Modular Reduction Method
We put forth a lookup-table-based modular reduction method which partitions the binary string of an integer to be reduced into blocks according to its runs. Its complexity depends on the number of runs in the binary string. We show that the new reduction is almost twice as fast as the popular Barrett’s reduction, and provide a thorough complexity analysis of the method.
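The abstract does not spell out the algorithm, so the sketch below is only one plausible reading of a run-based table reduction, not the authors' method. It relies on the identity that a run of ones occupying bits i to j-1 equals 2^j - 2^i, so each run costs two lookups in a table of 2^i mod m. The names `build_power_table` and `reduce_by_runs` are hypothetical.

```python
def build_power_table(m: int, max_bits: int) -> list:
    """Table of 2**i mod m for i = 0..max_bits, built by doubling only."""
    table = [1 % m]
    for _ in range(max_bits):
        t = table[-1] * 2
        if t >= m:
            t -= m
        table.append(t)
    return table


def reduce_by_runs(x: int, m: int, table: list) -> int:
    """Reduce x modulo m with one table difference per run of 1-bits.

    Requires len(table) > x.bit_length(). The accumulator is kept in
    [0, m) with a single conditional correction after each run.
    """
    acc = 0
    i = 0
    n = x.bit_length()
    while i < n:
        if (x >> i) & 1:
            j = i
            while j < n and (x >> j) & 1:   # find the end of this run of ones
                j += 1
            acc += table[j] - table[i]      # run value 2**j - 2**i, reduced
            if acc < 0:
                acc += m
            elif acc >= m:
                acc -= m
            i = j
        else:
            i += 1
    return acc


if __name__ == "__main__":
    m = 3329
    table = build_power_table(m, 32)
    x = 0b1111_0000_1111_1111_0011
    print(reduce_by_runs(x, m, table), x % m)
```

In this reading the work scales with the number of runs rather than the number of bits, which matches the complexity claim in the abstract.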