International Association for Cryptologic Research (IACR)
Abstract
Many lattice-based cryptosystems rely on the ring learning with errors (Ring-LWE) problem.
The most critical and computationally intensive operation of these Ring-LWE-based cryptosystems is polynomial multiplication over rings.
In this paper, we exploit the number theoretic transform (NTT) to build a family of scalable polynomial multiplier architectures,
which provide designers with a trade-off choice of speed vs. area.
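As a software illustration of the underlying technique (not of the hardware architectures themselves), the following minimal sketch multiplies two polynomials through the NTT. The ring Z_q[x]/(x^n + 1), the toy parameters n = 4, q = 17, psi = 2, and all function names are assumptions chosen for readability; the transform is written in its naive quadratic form rather than with butterflies.

```python
# Illustrative parameters (assumptions, not taken from the paper):
# ring Z_Q[x]/(x^N + 1), with PSI a primitive 2N-th root of unity mod Q.
Q, N, PSI = 17, 4, 2
OMEGA = PSI * PSI % Q  # primitive N-th root of unity mod Q


def ntt(a, root):
    """Naive O(n^2) transform: A[k] = sum_j a[j] * root^(j*k) mod Q."""
    n = len(a)
    return [sum(a[j] * pow(root, j * k, Q) for j in range(n)) % Q
            for k in range(n)]


def multiply(a, b):
    """Product of a and b in Z_Q[x]/(x^N + 1) via the NTT domain."""
    # Scaling by powers of PSI turns cyclic convolution into negacyclic.
    a_s = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    b_s = [b[i] * pow(PSI, i, Q) % Q for i in range(N)]
    # Pointwise multiplication of the two transformed operands.
    c_hat = [x * y % Q for x, y in zip(ntt(a_s, OMEGA), ntt(b_s, OMEGA))]
    # Inverse transform: use OMEGA^-1, scale by N^-1, undo the PSI scaling.
    c_s = ntt(c_hat, pow(OMEGA, -1, Q))
    n_inv = pow(N, -1, Q)
    return [c_s[i] * n_inv * pow(PSI, -i, Q) % Q for i in range(N)]


def schoolbook(a, b):
    """Reference O(n^2) multiplication with reduction modulo x^N + 1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            sign = -1 if i + j >= N else 1  # x^N wraps around to -1
            c[(i + j) % N] = (c[(i + j) % N] + sign * a[i] * b[j]) % Q
    return c
```

The modular inverses via `pow(x, -1, Q)` need Python 3.8 or later. Comparing `multiply` against `schoolbook` on the same inputs is a quick way to check the sketch.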
Our polynomial multipliers can compute the product of two n-degree polynomials in about (1.5n lg n + 1.5n)/b clock cycles,
where b is the number of butterfly operators.
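To make the speed vs. area trade-off concrete, the cycle estimate can be evaluated for a few values of b. This is a small sketch that assumes lg denotes the base-2 logarithm and that n is a power of two; the function name and the value n = 256 are hypothetical, chosen only for illustration.

```python
import math


def estimated_cycles(n, b):
    """Approximate cycle count (1.5 n lg n + 1.5 n) / b, with lg taken as log2."""
    return (1.5 * n * math.log2(n) + 1.5 * n) / b


# Hypothetical example: n = 256 with 1, 2 and 4 butterfly operators.
for b in (1, 2, 4):
    print(b, estimated_cycles(256, b))
```

Under these assumptions the estimate drops from 3456 cycles with one butterfly operator to 864 with four, since the formula scales inversely with b.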
In addition, we exploit the cancellation lemma to reduce the required ROM storage.
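The cancellation lemma states that, for roots of unity, omega_{dn}^{dk} = omega_n^k. One common consequence, sketched below, is that the twiddle factors of every smaller sub-transform already appear in the table for the full-size transform, so a single table read with a stride can serve all stages. The parameters (q = 17, n = 8, omega = 2) and the names `rom` and `stage_twiddles` are assumptions for illustration; the exact ROM organization of the proposed design is not described in this abstract.

```python
# Illustrative parameters: 2 is a primitive 8th root of unity mod 17.
Q, N, OMEGA = 17, 8, 2

# Single table holding the N/2 twiddle factors of the full-size transform.
rom = [pow(OMEGA, k, Q) for k in range(N // 2)]


def stage_twiddles(m):
    """Twiddles for a size-m sub-transform, read from rom with stride N // m."""
    stride = N // m
    return rom[::stride][: m // 2]


def direct_twiddles(m):
    """Same twiddles computed from scratch with omega_m = OMEGA^(N/m)."""
    omega_m = pow(OMEGA, N // m, Q)
    return [pow(omega_m, k, Q) for k in range(m // 2)]
```

For each sub-transform size m in 2, 4, 8, `stage_twiddles(m)` and `direct_twiddles(m)` return the same list, which is the cancellation lemma at work.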
Experimental results on a Spartan-6 FPGA show that, compared with the compact design,
the proposed polynomial multiplier architectures achieve an average speedup of 3 times
and consume fewer block RAMs and slices.
Compared with the state-of-the-art high-speed design,
the proposed hardware architectures save up to 46.64\% of the clock cycles
and improve the utilization rate of the main data processing units by 42.27\%.
Meanwhile, our designs can save up to 29.41\% of the block RAMs