7 research outputs found

    Preprocess-then-NTT Technique and Its Applications to KYBER and NEWHOPE

    The Number Theoretic Transform (NTT) provides an efficient algorithm for multiplying large-degree polynomials. It is commonly used in cryptographic schemes based on the hardness of the Ring Learning With Errors (RLWE) problem, which is a popular basis for post-quantum key exchange, encryption and digital signatures. To apply the NTT, the modulus q must satisfy q = 1 mod 2n, so RLWE-based schemes have to choose an oversized modulus, which leads to excessive bandwidth. In this work, we present the “Preprocess-then-NTT (PtNTT)” technique, which weakens the restriction on the modulus q: we only require q = 1 mod n or q = 1 mod n/2. Based on this technique, we provide new parameter settings for KYBER and NEWHOPE (two NIST candidates). In these new schemes, we can reduce the public key size and ciphertext size at the cost of very little efficiency loss
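The congruence conditions mentioned in this abstract can be checked directly. The sketch below is only an illustration: the helper name `ntt_modulus_conditions` is hypothetical, and the (q, n) pairs are moduli used by published versions of NewHope and Kyber, not the new parameter sets proposed in the paper.

```python
def ntt_modulus_conditions(q: int, n: int) -> dict:
    """Report which of the congruence conditions on q hold for ring dimension n.

    Hypothetical helper for illustration; n is assumed to be a power of two.
    """
    return {
        "q = 1 mod 2n": q % (2 * n) == 1,
        "q = 1 mod n": q % n == 1,
        "q = 1 mod n/2": q % (n // 2) == 1,
    }


# Well-known (q, n) pairs, used here only as examples.
examples = {
    "NewHope (q=12289, n=1024)": (12289, 1024),
    "Kyber round 1 (q=7681, n=256)": (7681, 256),
    "Kyber round 2 (q=3329, n=256)": (3329, 256),
}

for name, (q, n) in examples.items():
    print(name, ntt_modulus_conditions(q, n))
```

A modulus that satisfies only the weaker conditions, such as q = 3329 with n = 256, does not support a full negacyclic NTT down to degree-one factors, which is the situation a relaxed condition on q is meant to accommodate.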

    Tight bound on NewHope failure probability

    The NewHope Key Encapsulation Mechanism (KEM) was presented at USENIX 2016 by Alkim et al. and is one of the remaining lattice-based candidates in the post-quantum standardization process initiated by NIST. However, despite the relative simplicity of the protocol, the bound on the decapsulation failure probability resulting from the original analysis is not tight. In this work we refine this analysis to get a tight upper bound on this probability, which happens to be much lower than what was originally evaluated. As a consequence we propose a set of alternative parameters, increasing the security and the compactness of the scheme. However, using a smaller modulus prevents the use of a full NTT algorithm to perform multiplications of elements in dimension 512 or 1024. Nonetheless, similarly to previous works, we combine different multiplication algorithms and show that our new parameters are competitive on a constant-time vectorized implementation. Our most compact parameters bring a speed-up of 17% (resp. 11%) in performance while saving more than 19% of the bandwidth requirements and increasing the security by 10% (resp. 7%) in dimension 512 (resp. 1024)

    Sapphire: A Configurable Crypto-Processor for Post-Quantum Lattice-based Protocols (Extended Version)

    Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire – a lattice cryptography processor with configurable parameters. Efficient sampling, with a SHA-3-based PRNG, provides two orders of magnitude energy savings; a single-port RAM-based number theoretic transform memory architecture is proposed, which provides 124k-gate area savings; while a low-power modular arithmetic unit accelerates polynomial computations. Our test chip was fabricated in TSMC 40nm low-power CMOS process, with the Sapphire cryptographic core occupying 0.28 mm² area consisting of 106k logic gates and 40.25 KB SRAM. Sapphire can be programmed with custom instructions for polynomial arithmetic and sampling, and it is coupled with a low-power RISC-V micro-processor to demonstrate NIST Round 2 lattice-based CCA-secure key encapsulation and signature protocols Frodo, NewHope, qTESLA, CRYSTALS-Kyber and CRYSTALS-Dilithium, achieving up to an order of magnitude improvement in performance and energy-efficiency compared to state-of-the-art hardware implementations. All key building blocks of Sapphire are constant-time and secure against timing and simple power analysis side-channel attacks. We also discuss how masking-based DPA countermeasures can be implemented on the Sapphire core without any changes to the hardware

    Number Theoretic Transform and Its Applications in Lattice-based Cryptosystems: A Survey

    The number theoretic transform (NTT) is the most efficient method for multiplying two high-degree polynomials with integer coefficients, owing to its algorithmic and implementation advantages, and is consequently widely used and fundamental in practical implementations of lattice-based cryptographic schemes. In particular, recent works have shown that NTT can be used even in schemes without NTT-friendly rings, and can outperform other multiplication algorithms. In this paper, we first review the basic concepts of polynomial multiplication, convolution and NTT. Subsequently, we systematically introduce basic radix-2 fast NTT algorithms in an algebraic way via the Chinese Remainder Theorem. We then elaborate on recent advances in methods that weaken the restrictions on the parameter conditions of NTT. Furthermore, we systematically describe how to choose an appropriate NTT strategy for a given ring. Later, we introduce the applications of NTT in the lattice-based cryptographic schemes of the NIST post-quantum cryptography standardization competition. Finally, we present some possible future research directions
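As a rough illustration of the radix-2 NTT multiplication this survey covers, the following minimal sketch multiplies two polynomials in Z_q[X]/(X^n + 1) and compares the result with schoolbook multiplication. All function names are hypothetical, the code assumes n is a power of two and q is a prime with q = 1 mod 2n, and it is written for readability rather than speed or constant-time behaviour.

```python
def find_primitive_root_2n(q, n):
    """Return psi with psi^n = -1 mod q, i.e. a primitive 2n-th root of unity."""
    if (q - 1) % (2 * n) != 0:
        raise ValueError("need q = 1 mod 2n")
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, q):
    """Recursive radix-2 Cooley-Tukey transform: out[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    if n == 1:
        return a[:]
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % q
        out[i] = (even[i] + t) % q
        out[i + n // 2] = (even[i] - t) % q  # uses omega^(n/2) = -1
        w = w * omega % q
    return out


def negacyclic_mul_ntt(a, b, q):
    """Multiply a and b modulo (X^n + 1, q) using twisting by powers of psi."""
    n = len(a)
    psi = find_primitive_root_2n(q, n)
    omega = psi * psi % q
    psi_inv = pow(psi, -1, q)  # modular inverse, Python 3.8+
    n_inv = pow(n, -1, q)
    a_t = [a[i] * pow(psi, i, q) % q for i in range(n)]
    b_t = [b[i] * pow(psi, i, q) % q for i in range(n)]
    prod = [x * y % q for x, y in zip(ntt(a_t, omega, q), ntt(b_t, omega, q))]
    c_t = ntt(prod, pow(omega, -1, q), q)  # inverse transform, up to the 1/n factor
    return [c_t[i] * n_inv * pow(psi_inv, i, q) % q for i in range(n)]


def negacyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) multiplication modulo (X^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q  # X^n = -1
    return c
```

Calling `negacyclic_mul_ntt([1, 2, 3, 4], [5, 6, 7, 8], 17)` should agree with `negacyclic_mul_schoolbook` on the same inputs, since 17 = 1 mod 8.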

    AKCN-E8: Compact and Flexible KEM from Ideal Lattice

    A remarkable breakthrough in mathematics in recent years is the proof of the long-standing conjecture that sphere packing (i.e., packing unit balls) in the $E_8$ lattice is optimal, in the sense of achieving the best density \cite{V17} for sphere packing in $\mathbb{R}^8$. In this work, based on the $E_8$ lattice code, we design a mechanism for asymmetric key consensus from noise (AKCN), referred to as AKCN-E8, for error correction and key consensus. As a direct application of the AKCN-E8 code, we present a highly practical key encapsulation mechanism (KEM) from ideal lattices based on the ring learning with errors (RLWE) problem. Compared to the RLWE-based NewHope-KEM \cite{newhope-NIST}, which is a variant of NewHope-Usenix \cite{newhope15} and is now a promising candidate in the second round of the NIST post-quantum cryptography (PQC) standardization competition, our AKCN-E8-KEM has the following advantages:
    * The size of the shared key is doubled.
    * More compact ciphertexts, at the same or even higher security level.
    * More flexible parameter selection for tradeoffs among security, ciphertext size and error probability

    cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

    Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARK like Groth16 has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). Therefore, this paper presents cuZK, an efficient GPU implementation of zkSNARK with the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm. This MSM algorithm achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Along with our self-designed MSM scheme and well-studied NTT scheme, cuZK achieves the parallelization of all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) reducing redundant data transfer and 2) overlapping data transfer and device computation. The evaluation results show that our MSM module provides over 2.08× (up to 2.94×) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65× (up to 4.86×) speedup on standard benchmarks and 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin
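For context on the serial baseline named in this abstract, here is a minimal sketch of the Pippenger bucket method for multi-scalar multiplication. It is not cuZK's parallel algorithm. To stay self-contained it uses the additive group of integers modulo a hypothetical modulus `M` as a stand-in for an elliptic-curve group; the function name, `window_bits` and `scalar_bits` are illustrative assumptions.

```python
def pippenger_msm(scalars, points, add, zero, window_bits=4, scalar_bits=16):
    """Compute sum_i scalars[i] * points[i] using only the group operation `add`.

    Scalars are assumed to be non-negative and smaller than 2**scalar_bits.
    """
    mask = (1 << window_bits) - 1
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    result = zero
    for w in reversed(range(num_windows)):
        # Shift the accumulated result left by one window (window_bits doublings).
        for _ in range(window_bits):
            result = add(result, result)
        # Sort points into buckets according to the current scalar digit.
        buckets = [zero] * (1 << window_bits)
        for k, p in zip(scalars, points):
            digit = (k >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Running-sum trick: computes sum_d d * buckets[d] with additions only.
        running = zero
        window_sum = zero
        for d in range(len(buckets) - 1, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


# Stand-in group: integers modulo M under addition (an assumption for illustration).
M = 1_000_003


def add_mod(x, y):
    return (x + y) % M


scalars = [12345, 54321, 65535, 1, 0, 40000]
points = [17, 999_999, 123_456, 42, 7, 500_000]
msm_result = pippenger_msm(scalars, points, add_mod, 0)
```

In this stand-in group the result can be cross-checked against `sum(k * p for k, p in zip(scalars, points)) % M`; with a real curve group, `add` would be point addition and `zero` the point at infinity.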

    Breaking DPA-protected Kyber via the pair-pointwise multiplication

    We present a new template attack that allows us to recover the secret key in Kyber directly from the polynomial multiplication in the decapsulation process. This multiplication corresponds to pair-pointwise multiplications between the NTT representations of the secret key and an input ciphertext. For each pair-point multiplication, a pair of secret coefficients are multiplied in isolation with a pair of ciphertext coefficients, leading to side-channel information which depends solely on these two pairs of values. Hence, we propose to exploit leakage coming from each pair-point multiplication and use it for identifying the values of all secret coefficients. Interestingly, the same leakage is present in DPA-protected implementations. Namely, masked implementations of Kyber simply compute the pair-pointwise multiplication process sequentially on secret shares, allowing us to apply the same strategy for recovering the secret coefficients of each share of the key. Moreover, as we show, our attack can be easily extended to target designs implementing shuffling of the polynomial multiplication. We also show that our attacks can be generalised to work with a known ciphertext rather than a chosen one. To evaluate the effectiveness of our attack, we target the open source implementation of masked Kyber from the mkm4 repository. We conduct extensive simulations which confirm high success rates in the Hamming weight model, even when running the simplest versions of our attack with a minimal number of templates. We show that the success probabilities of our attacks can be increased exponentially only by a linear (in the modulus q) increase in the number of templates. Additionally, we provide partial experimental evidence of our attack’s success. 
    In fact, we show via power traces that, if we build templates for pairs of coefficients used within a pair-point multiplication, we can perform a key extraction by simply calculating the difference between the target trace and the templates. Our attack is simple and straightforward, and should not require any deep learning or heavy machinery for template building or matching. Our work shows that countermeasures such as masking and shuffling may not be enough for protecting the polynomial multiplication in lattice-based schemes against very basic side-channel attacks
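The pair-pointwise multiplication targeted by this attack can be sketched as follows. This is a minimal illustration, assuming Kyber's modulus q = 3329 and a product of two degree-one polynomials modulo X^2 - zeta. The function names are hypothetical, zeta = 17 is only an example value, and the Hamming-weight leakage is a noise-free toy model, not the template-building procedure from the paper.

```python
Q = 3329  # Kyber modulus


def pair_pointwise_mul(a, b, zeta, q=Q):
    """Multiply (a0 + a1*X) by (b0 + b1*X) modulo (X^2 - zeta, q)."""
    a0, a1 = a
    b0, b1 = b
    c0 = (a0 * b0 + a1 * b1 * zeta) % q
    c1 = (a0 * b1 + a1 * b0) % q
    return c0, c1


def hamming_weight(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def simulated_leakage(secret_pair, ciphertext_pair, zeta, q=Q):
    """Toy leakage: Hamming weights of the two output coefficients.

    The result depends only on one pair of secret coefficients and one pair
    of ciphertext coefficients, which is the isolation the abstract describes.
    """
    c0, c1 = pair_pointwise_mul(secret_pair, ciphertext_pair, zeta, q)
    return hamming_weight(c0), hamming_weight(c1)


product = pair_pointwise_mul((1, 2), (3, 4), zeta=17)
leak = simulated_leakage((1, 2), (3, 4), zeta=17)
```

Because each product involves only two secret coefficients, an attacker in this toy model would need templates for pairs of values in Z_q rather than for the whole key at once.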