Time-memory Trade-offs for Saber+ on Memory-constrained RISC-V

Jipeng Zhang; Junhao Huang; Sujoy Sinha Roy; Zhe Liu

Publication date: 29 November 2021
Publisher: International Association for Cryptologic Research (IACR)

Abstract

Saber is a module-lattice-based key encapsulation scheme that has been selected as a finalist in the NIST Post-Quantum Cryptography Standardization Project. Because Saber computes on considerably large matrices and vectors of polynomials, its efficient implementation on memory-constrained IoT devices is very challenging. In this paper, we present an implementation of Saber with a minor tweak to the original Saber protocol that reduces memory consumption and improves performance. We call this tweaked implementation 'Saber+'; it differs from Saber in that it uses different generation methods for the public matrix $\boldsymbol{A}$ and the secret vector $\boldsymbol{s}$ for memory optimization. Our highly optimized software implementation of Saber+ on a memory-constrained RISC-V platform achieves a 48% performance improvement over the best state-of-the-art memory-optimized implementation of the original Saber. Specifically, we present various memory and performance optimizations for Saber+ on a memory-constrained RISC-V microcontroller with merely 16KB of memory available. We utilize the Number Theoretic Transform (NTT) to speed up polynomial multiplication in Saber+. To optimize cycle counts and memory consumption during the NTT, we carefully compare the efficiency of the complete and incomplete NTTs with platform-specific optimizations. We implement 4-layer merging in the complete NTT and 3-layer merging in the 6-layer incomplete NTT. An improved on-the-fly generation strategy for the public matrix and secret vector in Saber+ results in a low memory footprint. Furthermore, by combining different optimization strategies, we explore various time-memory trade-offs. Our software implementation of Saber+ on the selected RISC-V core takes just 3,809K, 3,594K, and 3,193K clock cycles for key generation, encapsulation, and decapsulation, respectively, while consuming at most 4.8KB of stack.
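The memory benefit of on-the-fly generation can be illustrated with a minimal sketch. It is not the paper's actual generation method: the per-entry expansion `gen_poly` (SHAKE-128 over the seed plus the indices), the toy secret sampler `gen_secret`, and the schoolbook multiplier are all assumptions chosen only to keep the example self-contained. The only Saber facts used are the ring $\mathbb{Z}_q[x]/(x^{256}+1)$ with $q = 2^{13}$ and a module rank of 3. The idea is that each polynomial of $\boldsymbol{A}$ is derived right before it is multiplied and accumulated, so only one matrix entry is held in memory at a time instead of all nine.

```python
import hashlib

N = 256        # polynomial degree: ring is Z_q[x]/(x^N + 1)
Q = 1 << 13    # Saber modulus q = 2^13
L = 3          # module rank (Saber's middle parameter set)

def gen_poly(seed, i, j):
    """Hypothetical per-entry expansion of A[i][j] (an assumption, not Saber+'s method)."""
    stream = hashlib.shake_128(seed + bytes([i, j])).digest(2 * N)
    # Two bytes per coefficient, reduced to 13 bits.
    return [int.from_bytes(stream[2 * k:2 * k + 2], "little") % Q for k in range(N)]

def gen_secret(seed, j):
    """Toy small-coefficient secret in [-4, 4] (not the real centered binomial sampler)."""
    stream = hashlib.shake_128(seed + b"s" + bytes([j])).digest(N)
    return [b % 9 - 4 for b in stream]

def mul_acc(acc, a, s):
    """acc += a * s in Z_q[x]/(x^N + 1), schoolbook; x^N wraps around as -1."""
    for x in range(N):
        for y in range(N):
            k = x + y
            if k < N:
                acc[k] = (acc[k] + a[x] * s[y]) % Q
            else:
                acc[k - N] = (acc[k - N] - a[x] * s[y]) % Q

def matvec_on_the_fly(seed, s):
    """Compute A*s while keeping only one polynomial of A alive at a time."""
    out = []
    for i in range(L):
        acc = [0] * N
        for j in range(L):
            a = gen_poly(seed, i, j)   # generated, used once, then discarded
            mul_acc(acc, a, s[j])
        out.append(acc)
    return out

def matvec_full(seed, s):
    """Reference version: materialize all L*L polynomials of A first."""
    A = [[gen_poly(seed, i, j) for j in range(L)] for i in range(L)]
    out = []
    for i in range(L):
        acc = [0] * N
        for j in range(L):
            mul_acc(acc, A[i][j], s[j])
        out.append(acc)
    return out
```

Both functions return the same vector for a given seed and secret. In this sketch the on-the-fly variant needs buffer space for one polynomial of $\boldsymbol{A}$, whereas the reference variant needs space for nine. A real implementation would replace the schoolbook multiplier with an NTT-based one.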

Available Versions

Cryptology ePrint Archive

oai:eprint.iacr.org:2021/1552

Last time updated on 25/08/2023