96 research outputs found

    SaberX4: High-throughput Software Implementation of Saber Key Encapsulation Mechanism

    Saber is a module lattice-based CCA-secure key encapsulation mechanism (KEM) which has been shortlisted for the second round of NIST's Post Quantum Cryptography Standardization project. To attain simplicity and efficiency on constrained devices, the Saber algorithm is serial by construction. However, on high-end platforms, such as modern Intel processors with AVX2 instructions, Saber achieves limited speedup using vector processing instructions due to its serial nature. In this paper we overcome this algorithmic bottleneck and propose a high-throughput software implementation of Saber, which we call 'SaberX4', targeting modern Intel processors with AVX2 vector processing support. We apply the batching technique at the highest level of the implementation hierarchy and process four Saber KEM operations in parallel using the AVX2 vector processing instructions. Our proof-of-concept software implementation of SaberX4 achieves nearly 1.5 times higher throughput at the cost of latency degradation within acceptable margins, compared to the AVX2-optimized non-batched implementation of Saber by its authors. We anticipate that both latency and throughput of SaberX4 will improve in the future with improved computer architectures and more optimization efforts.

    Constant-time BCH Error-Correcting Code

    Error-correcting codes can be useful in reducing the decryption failure rate of several lattice-based and code-based public-key encryption schemes. Two schemes in round 2 of NIST's post-quantum cryptography standardisation project, namely LAC and HQC, use the strong error-correcting BCH code. However, direct application of the BCH code in decryption algorithms of public-key schemes could open new avenues for attacks. For example, a recent attack exploited non-constant-time execution of the BCH code to reduce the security of LAC. In this paper we analyse the BCH error-correcting code, identify computation steps that cause timing variations and design the first constant-time BCH algorithm. We implement our algorithm in software and evaluate its resistance against timing attacks by performing leakage detection tests. To study the computational overhead of the countermeasures, we integrated our constant-time BCH code in the reference and optimised implementations of the LAC scheme as a case study, and observed slowdowns by factors of nearly 1.1 and 1.4, respectively, for the CCA-secure primitive.

    Constant-time discrete Gaussian sampling

    © 2018 IEEE. Sampling from a discrete Gaussian distribution is an indispensable part of lattice-based cryptography. Several recent works have shown that the timing leakage from a non-constant-time implementation of the discrete Gaussian sampling algorithm could be exploited to recover the secret. In this paper, we propose a constant-time implementation of the Knuth-Yao random walk algorithm for discrete Gaussian sampling. Since the random walk is dictated by a set of input random bits, we can express the generated sample as a function of the input random bits. Hence, our constant-time implementation expresses the unique mapping of the input random bits to the output sample bits as a Boolean expression of the random bits. We use bit-slicing to generate multiple samples in batches and thus increase the throughput of our constant-time sampling manifold. Our experiments on an Intel i7-Broadwell processor show that our method can be as much as 2.4 times faster than the constant-time implementation of cumulative distribution table based sampling and consumes exponentially less memory than the Knuth-Yao algorithm with shuffling for a similar level of security.

    Arithmetic of τ-adic Expansions for Lightweight Koblitz Curve Cryptography

    Koblitz curves allow very efficient elliptic curve cryptography. The reason is that one can trade expensive point doublings for cheap Frobenius endomorphisms by representing the scalar as a τ-adic expansion. Typically, elliptic curve cryptosystems, such as ECDSA, also require the scalar as an integer. This results in a need for conversions between integers and the τ-adic domain, which are costly and hinder the use of Koblitz curves on very constrained devices, such as RFID tags, wireless sensors, or certain applications of the Internet of Things. We provide solutions to this problem by showing how complete cryptographic processes, such as ECDSA signing, can be completed in the τ-adic domain with very few resources. This allows outsourcing conversions to a more powerful party. We provide several algorithms for performing arithmetic operations in the τ-adic domain. In particular, we introduce a new representation allowing more efficient and secure computations compared to the algorithms available in the preliminary version of this work from CARDIS 2014. We also provide datapath extensions with different speed and side-channel resistance properties that require areas from less than one hundred to a few hundred gate equivalents on 0.13-μm CMOS. These extensions are applicable to all Koblitz curves.
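
    To make the integer-to-τ-adic conversion mentioned in this abstract concrete, here is a minimal Python sketch of the textbook τ-adic non-adjacent form (Solinas's TNAF), assuming τ satisfies τ² − μτ + 2 = 0 with μ = ±1. It is not the new representation proposed in the paper, and the function names `tnaf` and `evaluate` are hypothetical.

```python
def tnaf(n, mu):
    """Return tau-adic NAF digits of the integer n, least significant first.

    Assumes tau^2 - mu*tau + 2 = 0 with mu in {1, -1} (Koblitz curve setting).
    Elements of Z[tau] are tracked as pairs (a, b) meaning a + b*tau.
    """
    if mu not in (1, -1):
        raise ValueError("mu must be 1 or -1")
    a, b = n, 0
    digits = []
    while a != 0 or b != 0:
        if a % 2:
            # Pick u in {1, -1} so that the next digit is forced to be zero.
            u = 2 - ((a - 2 * b) % 4)
            a -= u
        else:
            u = 0
        digits.append(u)
        # Exact division of (a + b*tau) by tau; a is even at this point.
        a, b = b + mu * (a // 2), -(a // 2)
    return digits


def evaluate(digits, mu):
    """Evaluate sum(digits[i] * tau^i) and return it as a pair (c, d) = c + d*tau."""
    c, d = 0, 0
    for u in reversed(digits):
        # Horner step: multiply by tau, using tau^2 = mu*tau - 2, then add u.
        c, d = -2 * d + u, c + mu * d
    return c, d
```

    Evaluating the digits returned by `tnaf(n, mu)` with `evaluate` should give back `(n, 0)`, which is a convenient round-trip check of the conversion.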

    SASTA: Ambushing Hybrid Homomorphic Encryption Schemes with a Single Fault

    The rising tide of data breaches targeting large data storage centres and servers has raised serious privacy and security concerns. Homomorphic Encryption schemes offer an effective defence against such attacks, but their adoption is hindered by substantial computational and communication overhead, both on the server and client sides. This challenge led to the development of Hybrid Homomorphic Encryption (HHE) schemes to reduce the cost of client-side computation and communication. Despite the existence of a multitude of HHE schemes in the literature, their security analysis is still in its infancy, especially in the context of physical attacks like Differential Fault Analysis (DFA). This work aims to address this critical gap for HHE schemes defined over prime fields (F_p-HHE) by introducing, implementing and validating SASTA, the first DFA on F_p-HHE and the first nonce-respecting FA over any HHE scheme. In this pursuit, we introduce a new nonce-respecting fault model (all current fault attacks on HHE schemes require a nonce-reuse), which leads to a unique attack that completely exploits both the asymmetric and symmetric facets of HHE. We target F_p-HHE schemes as they offer support for integer or real arithmetic, enabling more versatile applications, like machine learning, and better performance. The fault model benefits from what we call the mirror-effect, which allows the attack to work both on the client and the server. Our analysis reveals a significant vulnerability: a single fault within the Keccak permutation, employed as an extendable output function, results in complete key recovery for the Pasta HHE scheme. Moreover, this vulnerability extends to other HHE schemes, including Rasta, Masta, and Hera, amplifying the scope and impact of SASTA. For experimental validation, we mount an actual fault attack using a ChipWhisperer-Lite board on the Keccak permutation. Following this, we also discuss the conventional countermeasures to defend against SASTA.
    Overall, SASTA constitutes the first nonce-respecting FA of HHE that offers new insights into how server-side or client-side computations can be manipulated for F_p-HHE schemes to recover the entire key with just a single fault. This work reaffirms the orthogonality of convenience and attack vulnerability and should contribute to the landscape of future HHE schemes.

    Exploring RNS for Isogeny-based Cryptography

    Isogeny-based cryptography suffers from long running times because it requires a large amount of large-integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. We also evaluate our modular reduction method by realizing a cryptoprocessor for isogeny-based SIDH key exchange. On a Xilinx Ultrascale+ FPGA, the proposed cryptoprocessor consumes 151,009 LUTs, 143,171 FFs and 1,056 DSPs. It achieves 250 MHz clock frequency and finishes the key exchange for SIDH in 3.8 and 4.9 ms.
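
    As a rough illustration of why RNS exposes parallelism, the sketch below splits integers into independent residues, multiplies channel by channel, and recombines with the Chinese Remainder Theorem. The tiny base `BASE` and the helper names are assumptions chosen for readability; the sketch does not implement the paper's modular reduction by a prime outside the base.

```python
from math import prod

# Hypothetical toy RNS base: pairwise-coprime moduli (real designs use wider ones).
BASE = (251, 253, 255, 256)
M = prod(BASE)  # dynamic range: results are only unique modulo M


def to_rns(x):
    """Represent x by its residues modulo each base element."""
    return tuple(x % m for m in BASE)


def rns_mul(a, b):
    """Multiply channel-wise; each channel is independent, hence parallelisable."""
    return tuple((x * y) % m for x, y, m in zip(a, b, BASE))


def from_rns(residues):
    """Recombine residues with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, BASE):
        m_i = M // m
        # pow(m_i, -1, m) is the modular inverse (Python 3.8+).
        total += r * m_i * pow(m_i, -1, m)
    return total % M
```

    The product is recovered correctly only while it stays below `M`; reducing modulo some other large prime is the expensive step that the abstract says the paper addresses.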

    PROTEUS: A Tool to generate pipelined Number Theoretic Transform Architectures for FHE and ZKP applications

    Emerging cryptographic algorithms such as fully homomorphic encryption (FHE) and zero-knowledge proof (ZKP) perform arithmetic involving very large polynomials. One fundamental and time-consuming polynomial operation is the number theoretic transform (NTT), which is a generalization of the fast Fourier transform. Hardware platforms such as FPGAs could be used to accelerate the NTTs in FHE and ZKP protocols. One major problem is that the FHE and ZKP protocols require different parameter sets, e.g., polynomial degree and coefficient size, depending on their applications. Therefore, a basic research question is: how can scalable hardware architectures be designed for accelerating NTTs in the FHE and ZKP protocols? In this paper, we present 'PROTEUS', an open-source and parametric tool that generates synthesizable bandwidth-efficient NTT architectures for user-specified parameter sets. The architectures can be tuned to utilize different memory bandwidths and parameters, which is a very important design requirement in both FHE and ZKP protocols. The generated NTT architectures show a significant performance speedup compared to similar NTT architectures on FPGA. Further comparisons with the state of the art show a reduction of up to 23% and 35% in terms of DSP and BRAM utilization.
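
    For readers unfamiliar with the NTT, the following is a minimal software sketch of a recursive radix-2 transform and its use for cyclic polynomial multiplication. The toy parameters (modulus `Q = 17`, length `N = 8`, root `OMEGA = 2`) are assumptions chosen only so the arithmetic is easy to follow; they are unrelated to the parameter sets or the hardware architectures generated by PROTEUS.

```python
Q = 17      # toy prime modulus (assumption)
N = 8       # transform length, a power of two (assumption)
OMEGA = 2   # primitive N-th root of unity mod Q: 2^8 = 1 and 2^4 = -1 (mod 17)


def ntt(a, omega, q):
    """Recursive radix-2 Cooley-Tukey NTT of a (length must be a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q            # butterfly, upper output
        out[k + n // 2] = (even[k] - t) % q   # butterfly, lower output
        w = w * omega % q
    return out


def intt(a, omega, q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n_inv = pow(len(a), -1, q)
    return [x * n_inv % q for x in ntt(a, pow(omega, -1, q), q)]


def cyclic_mul(a, b, omega=OMEGA, q=Q):
    """Multiply polynomials modulo (x^n - 1, q) via pointwise products."""
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    return intt([x * y % q for x, y in zip(fa, fb)], omega, q)
```

    Each butterfly in the loop is the unit that a pipelined hardware design would replicate; changing the polynomial degree or coefficient size corresponds to changing `N` and `Q` here.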