10 research outputs found

    Saber on ARM CCA-secure module lattice-based key encapsulation on ARM

    Get PDF
    The CCA-secure lattice-based post-quantum key encapsulation scheme Saber is a candidate in the NIST\u27s post-quantum cryptography standardization process. In this paper, we study the implementation aspects of Saber in resource-constrained microcontrollers from the ARM Cortex-M series which are very popular for realizing IoT applications. In this work, we carefully optimize various parts of Saber for speed and memory. We exploit digital signal processing instructions and efficient memory access for a fast implementation of polynomial multiplication. We also use memory efficient Karatsuba and just-in-time strategy for generating the public matrix of the module lattice to reduce the memory footprint. We also show that our optimizations can be combined with each other seamlessly to provide various speed-memory trade-offs. Our speed optimized software takes just 1,147K, 1,444K, and 1,543K clock cycles on a Cortex-M4 platform for key generation, encapsulation and decapsulation respectively. Our memory efficient software takes 4,786K, 6,328K, and 7,509K clock cycles on an ultra resource-constrained Cortex-M0 platform for key generation, encapsulation, and decapsulation respectively while consuming only 6.2 KB of memory at most. These results show that lattice-based key encapsulation schemes are perfectly practical for securing IoT devices from quantum computing attacks

    Faster NTRU on ARM Cortex-M4 with TMVP-based multiplication

    Get PDF
    The Number Theoretic Transform (NTT), Toom-Cook, and Karatsuba are the most commonly used algorithms for implementing lattice-based ?nalists of the NIST PQC competition. In this paper, we propose Toeplitz matrix-vector product (TMVP) based algorithms for multiplication for all parameter sets of NTRU. We implement the pro- posed algorithms on ARM Cortex-M4. The results show that TMVP- based multiplication algorithms using the four-way TMVP formula are more e?cient for NTRU. Our algorithms outperform the Toom-Cook method by up to 25.3%, and the NTT method by up to 19.8%. More- over, our algorithms require less stack space than the others in most cases. We also observe the impact of these improvements on the overall performance of NTRU. We speed up the encryption, decryption, en- capsulation, and decapsulation by up to 13.7%,17.5%,3.5%, and 14.1%, respectively, compared to state-of-the-art implementation

    Memory-Efficient High-Speed Implementation of Kyber on Cortex-M4

    Get PDF
    This paper presents an optimized software implementation of the module-lattice-based key-encapsulation mechanism Kyber for the ARM Cortex-M4 microcontroller. Kyber is one of the round-2 candidates in the NIST post-quantum project. In the center of our work are novel optimization techniques for the number-theoretic transform (NTT) inside Kyber, which make very efficient use of the computational power offered by the “vector” DSP instructions of the target architecture. We also present results for the recently updated parameter sets of Kyber which equally benefit from our optimizations. As a result of our efforts we present software that is 18% faster than an earlier implementation of Kyber optimized for the Cortex-M4 by the Kyber submitters. Our NTT is more than twice as fast as the NTT in that software. Our software runs at about the same speed as the latest speed-optimized implementation of the other module-lattice based round-2 NIST PQC candidate Saber. However, for our Kyber software, this performance is achieved with a much smaller RAM footprint. Kyber needs less than half of the RAM of what the considerably slower RAM-optimized version of Saber uses. Our software does not make use of any secret-dependent branches or memory access and thus offers state-of-the-art protection against timing attack

    Saber on ARM: CCA-secure module lattice-based key encapsulation on ARM

    No full text
    The CCA-secure lattice-based post-quantum key encapsulation scheme Saber is a candidate in the NIST’s post-quantum cryptography standardization process. In this paper, we study the implementation aspects of Saber in resourceconstrained microcontrollers from the ARM Cortex-M series which are very popular for realizing IoT applications. In this work, we carefully optimize various parts of Saber for speed and memory. We exploit digital signal processing instructions and efficient memory access for a fast implementation of polynomial multiplication. We also use memory efficient Karatsuba and just-in-time strategy for generating the public matrix of the module lattice to reduce the memory footprint. We also show that our optimizations can be combined with each other seamlessly to provide various speed-memory trade-offs. Our speed optimized software takes just 1,147K, 1,444K, and 1,543K clock cycles on a Cortex-M4 platform for key generation, encapsulation and decapsulation respectively. Our memory efficient software takes 4,786K, 6,328K, and 7,509K clock cycles on an ultra resource-constrained Cortex-M0 platform for key generation, encapsulation, and decapsulation respectively while consuming only 6.2 KB of memory at most. These results show that lattice-based key encapsulation schemes are perfectly practical for securing IoT devices from quantum computing attacks

    Saber on ARM. CCA-secure module lattice-based key encapsulation on ARM

    No full text
    status: publishe

    Saber on ESP32

    Get PDF
    Saber, a CCA-secure lattice-based post-quantum key encapsulation scheme, is one of the second round candidate algorithms in the post-quantum cryptography standardization process of the US National Institute of Standards and Technology (NIST) in 2019. In this work, we provide an efficient implementation of Saber on ESP32, an embedded microcontroller designed for IoT environment with WiFi and Bluetooth support. RSA coprocessor was used to speed up the polynomial multiplications for Kyber variant in a CHES 2019 paper. We propose an improved implementation utilizing the big integer coprocessor for the polynomial multiplications in Saber, which contains significant lower software overhead and takes a better advantage of the big integer coprocessor on ESP32. By using the fast implementation of polynomial multiplications, our single-core version implementation of Saber takes 1639K, 2123K, 2193K clock cycles on ESP32 for key generation, encapsulation and decapsulation respectively. Benefiting from the dual core feature on ESP32, we speed up the implementation of Saber by rearranging the computing steps and assigning proper tasks to two cores executing in parallel. Our dual-core version implementation takes 1176K, 1625K, 1514K clock cycles for key generation, encapsulation and decapsulation respectively
    corecore