CRYSTALS-Dilithium is a lattice-based signature scheme to be standardized by
NIST as the primary post-quantum signature algorithm. In this work, we present a
thorough study of optimizing Dilithium implementations using the Advanced Vector
Extensions (AVX), specifically AVX2 and the latest AVX-512.
We first present an improved parallel small polynomial multiplication with
tailored early evaluation (PSPM-TEE) to further speed up the signing procedure,
which results in a speedup of 5\%-6\% compared with the original PSPM Dilithium
implementation. We then present a tailored reduction method that is simpler and
faster than Montgomery reduction. Our optimized AVX2 implementation achieves a
speedup of 3\%-8\% over the state-of-the-art Dilithium AVX2 software. Finally,
we propose the first fully vectorized implementation of Dilithium using AVX-512,
obtained by carefully vectorizing most Dilithium functions with AVX-512
instructions to improve time and space efficiency simultaneously.
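A small polynomial multiplication can exploit a property of Dilithium signing: the challenge c has exactly tau coefficients equal to +1 or -1 (all others zero), and the secret polynomials s1, s2 have coefficients in [-eta, eta], so every coefficient of c*s1 or c*s2 is bounded by tau*eta. The C sketch below illustrates only that smallness property with a plain sparse schoolbook multiplication in Z[X]/(X^256 + 1). It is not the PSPM or PSPM-TEE algorithm, and the helper name sparse_small_mul and its (position, sign) input format are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N 256 /* Dilithium polynomial degree */

/* Hypothetical helper: r = c * s in Z[X]/(X^N + 1), where the challenge c is
   given as tau (position, sign) pairs with sign = +1 or -1, and s has small
   coefficients in [-eta, eta]. Each r[i] is a sum of tau terms of magnitude
   <= eta, so |r[i]| <= tau * eta (at most 196 across Dilithium2/3/5); no
   modular reduction is needed and int16_t coefficients suffice. */
static void sparse_small_mul(int16_t r[N], const uint8_t pos[],
                             const int8_t sign[], int tau, const int8_t s[N]) {
  for (int i = 0; i < N; i++)
    r[i] = 0;
  for (int k = 0; k < tau; k++) {
    assert(sign[k] == 1 || sign[k] == -1);
    int p = pos[k];
    for (int j = 0; j < N; j++) {
      int16_t term = (int16_t)(sign[k] * s[j]);
      if (p + j < N)
        r[p + j] += term;     /* ordinary product term X^(p+j) */
      else
        r[p + j - N] -= term; /* X^N = -1: wrap around with a sign flip */
    }
  }
}
```

For example, with c = X and s = 1 + 2X^255, the product is X + 2X^256 = X - 2, so r[0] = -2 and r[1] = 1. Skipping the zero coefficients of c is what makes the loop cost tau*N instead of N*N.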
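For context on the reduction step: Dilithium's modulus is q = 8380417 = 2^23 - 2^13 + 1, and its reference implementation reduces products with Montgomery reduction. The sketch below reproduces that Montgomery routine and, for contrast, a hypothetical shift-and-add reduction (special_form_reduce) that uses 2^23 = 2^13 - 1 (mod q) to avoid any multiplication by q. The second function only illustrates how the special form of q can be exploited; it is not the tailored reduction proposed in this work. Both functions assume two's-complement conversions and arithmetic right shifts, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417     /* q = 2^23 - 2^13 + 1 */
#define QINV 58728449 /* q^(-1) mod 2^32 */

/* Montgomery reduction as in the Dilithium reference implementation:
   for |a| <= 2^31 * q, returns r with r = a * 2^(-32) (mod q) and -q < r < q. */
static int32_t montgomery_reduce(int64_t a) {
  assert(a <= ((int64_t)Q << 31) && a >= -((int64_t)Q << 31));
  int32_t t = (int32_t)((int64_t)(int32_t)a * QINV); /* t = a * q^(-1) mod 2^32 */
  return (int32_t)((a - (int64_t)t * Q) >> 32);      /* exact division by 2^32 */
}

/* Hypothetical special-form reduction (not the paper's tailored method):
   write a = a1 * 2^23 + a0 with 0 <= a0 < 2^23, then use 2^23 = 2^13 - 1
   (mod q). For any int32_t a the result is congruent to a mod q and lies in
   [-2096896, 10477312], i.e. |r| < 2^24 (a partial reduction). */
static int32_t special_form_reduce(int32_t a) {
  int32_t a0 = a & ((1 << 23) - 1); /* low 23 bits */
  int32_t a1 = a >> 23;             /* arithmetic shift: floor(a / 2^23) */
  return a1 * (1 << 13) - a1 + a0;  /* a1 * (2^13 - 1) + a0: shifts and adds only */
}
```

The Montgomery variant returns a fully reduced value but needs two 64-bit multiplications; the special-form variant needs none, at the price of a wider output range that a later step must absorb.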
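To make the lane arithmetic concrete: a 512-bit register holds 16 int32 coefficients, so one pass over a 256-coefficient polynomial takes 16 iterations instead of the 32 needed with 8-lane AVX2 registers. The sketch below vectorizes the reference implementation's scalar reduce32 with AVX-512F intrinsics and falls back to scalar code at run time. It assumes GCC or Clang on x86-64 (for the target attribute and __builtin_cpu_supports); the poly_reduce* names are hypothetical, and this is not the authors' code.

```c
#include <assert.h>
#include <stdint.h>

#define N 256
#define Q 8380417

/* Scalar reduce32 as in the Dilithium reference code: for a <= 2^31 - 2^22 - 1,
   returns r = a (mod q) with -6283009 <= r <= 6283008. */
static int32_t reduce32(int32_t a) {
  assert(a <= INT32_MAX - (1 << 22));
  int32_t t = (a + (1 << 22)) >> 23; /* rounded quotient by 2^23 */
  return a - t * Q;
}

static void poly_reduce_scalar(int32_t p[N]) {
  for (int i = 0; i < N; i++)
    p[i] = reduce32(p[i]);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX512_PATH 1
/* Same computation on 16 int32 lanes per 512-bit register. */
__attribute__((target("avx512f")))
static void poly_reduce_avx512(int32_t p[N]) {
  const __m512i q = _mm512_set1_epi32(Q);
  const __m512i half = _mm512_set1_epi32(1 << 22);
  for (int i = 0; i < N; i += 16) {
    __m512i a = _mm512_loadu_si512((const void *)&p[i]);
    __m512i t = _mm512_srai_epi32(_mm512_add_epi32(a, half), 23);
    a = _mm512_sub_epi32(a, _mm512_mullo_epi32(t, q)); /* a - t * q per lane */
    _mm512_storeu_si512((void *)&p[i], a);
  }
}
#endif

/* Dispatch at run time so the sketch also runs on CPUs without AVX-512. */
static void poly_reduce(int32_t p[N]) {
#ifdef HAVE_AVX512_PATH
  if (__builtin_cpu_supports("avx512f")) {
    poly_reduce_avx512(p);
    return;
  }
#endif
  poly_reduce_scalar(p);
}
```

Unaligned loads and stores are used so the coefficient arrays need no 64-byte alignment; an optimized implementation would typically align its polynomial buffers and keep intermediate values in registers across several operations.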
With all these optimizations, our AVX-512 implementation improves performance by
37.3\%/50.7\%/39.7\% in key generation, 34.1\%/37.1\%/42.7\% in signing, and
38.1\%/38.7\%/40.7\% in verification for the Dilithium2/3/5 parameter sets,
respectively. To the best of our knowledge, our AVX-512 implementation achieves the
best performance for Dilithium on the Intel x64 CPU platform to date.

Comment: 13 pages, 5 figures