202 research outputs found

    SASTA: Ambushing Hybrid Homomorphic Encryption Schemes with a Single Fault

    Get PDF
    The rising tide of data breaches targeting large data storage centres, and servers has raised serious privacy and security concerns. Homomorphic Encryption schemes offer an effective defence against such attacks, but their adoption is hindered by substantial computational and communication overhead, both on the server and client sides. This challenge led to the development of Hybrid Homomorphic Encryption (HHE) schemes to reduce the cost of client-side computation and communication. Despite the existence of a multitude of HHE schemes in the literature, their security analysis is still in its infancy, especially in the context of physical attacks like Differential Fault Analysis (DFA). This work aims to address this critical gap for HHE schemes defined over prime fields (Fp − HHE) by introducing, implementing and validating SASTA, the first DFA on Fp − HHE and the first nonce-respecting FA over any HHE scheme. In this pursuit, we introduce a new nonce-respecting fault model (all current fault attacks on HHE schemes require a nonce-reuse), which leads to a unique attack that completely exploits both the asymmetric and symmetric facets of HHE. We target Fp − HHE schemes as they offer support for integer or real arithmetic, enabling more versatile applications, like machine learning, and better performance. The fault model benefits from what we call the mirror-effect, which allows the attack to work both on the client and the server. Our analysis reveals a significant vulnerability: a single fault within the Keccak permutation, employed as an extendable output function, results in complete key recovery for the Pasta HHE scheme. Moreover, this vulnerability extends to other HHE schemes, including Rasta, Masta, and Hera, amplifying the scope and impact of SASTA. For experimental validation, we mount an actual fault attack using ChipWhisperer-Lite board on the Keccak permutation. Following this, we also discuss the conventional countermeasures to defend against SASTA. Overall, SASTA constitutes the first nonce-respecting FA of HHE that offers new insights into how server-side or client-side computations can be manipulated for Fp − HHE schemes to recover the entire key with just a single fault. This work reaffirms the orthogonality of convenience and attack vulnerability and should contribute to the landscape of future HHE schemes

    Towards High-speed ASIC Implementations of Post-Quantum Cryptography

    Get PDF
    In this brief, we realize different architectural techniques towards improving the performance of post-quantum cryptography (PQC) algorithms when implemented as hardware accelerators on an application-specific integrated circuit (ASIC) platform. Having SABER as a case study, we designed a 256-bit wide architecture geared for high-speed cryptographic applications that incorporates smaller and distributed SRAM memory blocks. Moreover, we have adapted the building blocks of SABER to process 256-bit words. We have also used a buffer technique for efficient polynomial coefficient multiplications to reduce the clock cycle count. Finally, double-sponge functions are combined serially (one after another) in a high-speed KECCAK core to improve the hash operations of SHA/SHAKE. For key-generation, encapsulation, and decapsulation operations of SABER, our 256-bit wide accelerator with a single sponge function is 1.71x, 1.45x, and 1.78x faster compared to the raw clock cycle count of a serialized SABER design. Similarly, our 256-bit implementation with double-sponge functions takes 1.08x, 1.07x & 1.06x fewer clock cycles compared to its single-sponge counterpart. The studied optimization techniques are not specific to SABER - they can be utilized for improving the performance of other lattice-based PQC accelerators

    Kavach: Lightweight masking techniques for polynomial arithmetic in lattice-based cryptography

    Get PDF
    Lattice-based cryptography has laid the foundation of various modern-day cryptosystems that cater to several applications, including post-quantum cryptography. For structured lattice-based schemes, polynomial arithmetic is a fundamental part. In several instances, the performance optimizations come from implementing compact multipliers due to the small range of the secret polynomial coefficients. However, this optimization does not easily translate to side-channel protected implementations since masking requires secret polynomial coefficients to be distributed over a large range. In this work, we address this problem and propose two novel generalized techniques, one for the number theoretic transform (NTT) based and another for the non-NTT-based polynomial arithmetic. Both these proposals enable masked polynomial multiplication while utilizing and retaining the small secret property. For demonstration, we used the proposed technique and instantiated masked multipliers for schoolbook as well as NTT-based polynomial multiplication. Both of these can utilize the compact multipliers used in the unmasked implementations. The schoolbook multiplication requires an extra polynomial accumulation along with the two polynomial multiplications for a first-order protected implementation. However, this cost is nothing compared to the area saved by utilizing the existing cheap multiplication units. We also extensively test the side-channel resistance of the proposed design through TVLA to guarantee its first-order security

    REED: Chiplet-Based Scalable Hardware Accelerator for Fully Homomorphic Encryption

    Get PDF
    Fully Homomorphic Encryption (FHE) has emerged as a promising technology for processing encrypted data without the need for decryption. Despite its potential, its practical implementation has faced challenges due to substantial computational overhead. To address this issue, we propose the firstfirst chiplet-based FHE accelerator design `REED\u27, which enables scalability and offers high throughput, thereby enhancing homomorphic encryption deployment in real-world scenarios. It incorporates well-known wafer yield issues during fabrication which significantly impacts production costs. In contrast to state-of-the-art approaches, we also address data exchange overhead by proposing a non-blocking inter-chiplet communication strategy. We incorporate novel pipelined Number Theoretic Transform and automorphism techniques, leveraging parallelism and providing high throughput. Experimental results demonstrate that REED 2.5D integrated circuit consumes 177 mm2^2 chip area, 82.5 W average power in 7nm technology, and achieves an impressive speedup of up to 5,982×\times compared to a CPU (24-core 2×\timesIntel X5690), and 2×\times better energy efficiency and 50\% lower development cost than state-of-the-art ASIC accelerator. To evaluate its practical impact, we are the firstfirst to benchmark an encrypted deep neural network training. Overall, this work successfully enhances the practicality and deployability of fully homomorphic encryption in real-world scenarios

    KaLi: A Crystal for Post-Quantum Security using Kyber and Dilithium

    Get PDF
    Quantum computers pose a threat to the security of communications over the internet. This imminent risk has led to the standardization of cryptographic schemes for protection in a post-quantum scenario. We present a design methodology for future implementations of such algorithms. This is manifested using the NIST selected digital signature scheme CRYSTALS-Dilithium and key encapsulation scheme CRYSTALS-Kyber. A unified architecture, \crystal, is proposed that can perform key generation, encapsulation, decapsulation, signature generation, and signature verification for all the security levels of CRYSTALS-Dilithium, and CRYSTALS-Kyber. A unified yet flexible polynomial arithmetic unit is designed that can processes Kyber operations twice as fast as Dilithium operations. Efficient memory management is proposed to achieve optimal latency. \crystal is explicitly tailored for ASIC platforms using multiple clock domains. On ASIC 28nm/65nm technology, it occupies 0.263/1.107 mm2^2 and achieves a clock frequency of 2GHz/560MHz for the fast clock used for memory unit. On Xilinx Zynq Ultrascale+ZCU102 FPGA, the proposed architecture uses 23,277 LUTs, 9,758 DFFs, 4 DSPs, and 24 BRAMs, at 270 MHz clock frequency. \crystal performs better than the standalone implementations of either of the two schemes. This is the first work to provide a unified design in hardware for both schemes

    A Unified Cryptoprocessor for Lattice-based Signature and Key-exchange

    Get PDF
    We propose design methodologies for building a compact, unified and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signature. Synergies in the two types of cryptographic primitives are used to make the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized targeting the signature scheme \u27CRYSTALS-Dilithium\u27 and the key encapsulation mechanism (KEM) \u27Saber\u27, both finalists in the NIST’s post-quantum cryptography standardization project. The programmable cryptoprocessor executes key generations, encapsulations, decapsulations, signature generations, and signature verifications for all the security levels of Dilithium and Saber. On a Xilinx Ultrascale+ FPGA, the proposed cryptoprocessor consumes 18,406 LUTs, 9,323 FFs, 4 DSPs, and 24 BRAMs. It achieves 200 MHz clock frequency and finishes CCA-secure key-generation/encapsulation/decapsulation operations for LightSaber in 29.6/40.4/ 58.3μ\mus; for Saber in 54.9/69.7/94.9μ\mus; and for FireSaber in 87.6/108.0/139.4μ\mus, respectively. It finishes key-generation/sign/verify operations for Dilithium-2 in 70.9/151.6/75.2μ\mus; for Dilithium-3 in 114.7/237/127.6μ\mus; and for Dilithium-5 in 194.2/342.1/228.9μ\mus, respectively, for the best-case scenario. On UMC 65nm library for ASIC the latency is improved by a factor of two due to a 2×\times increase in clock frequency

    Backdooring Post-Quantum Cryptography: Kleptographic Attacks on Lattice-based KEMs

    Get PDF
    Post-quantum Cryptography (PQC) has reached the verge of standardization competition, with Kyber as a winning candidate. In this work, we demonstrate practical backdoor insertion in Kyber through kleptrography. The backdoor can be inserted using classical techniques like ECDH or post-quantum Classic Mceliece. The inserted backdoor targets the key generation procedure where generated output public keys subliminally leak information about the secret key to the owner of the backdoor. We demonstrate first practical instantiations of such attack at the protocol level by validating it on TLS 1.3

    Accelerator for Computing on Encrypted Data

    Get PDF
    Fully homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations. In this paper, we present a complete instruction-set processor architecture ‘Medha’ for accelerating the cloud-side operations of an RNS variant of the HEAAN homomorphic encryption scheme. Medha has been designed following a modular hardware design approach to attain a fast computation time for computationally expensive homomorphic operations on encrypted data. At every level of the implementation hierarchy, we explore possibilities for parallel processing. Starting from hardware-friendly parallel algorithms for the basic building blocks, we gradually build heavily parallel RNS polynomial arithmetic units. Next, many of these parallel units are interconnected elegantly so that their interconnections require the minimum number of nets, therefore making the overall architecture placement-friendly on the implementation platform. As homomorphic encryption is computation- as well as data-centric, the speed of homomorphic evaluations depends greatly on the way the data variables are handled. For Medha, we take a memory-conservative design approach and get rid of any off-chip memory access during homomorphic evaluations. Our instruction-set accelerator Medha is programmable and it supports all homomorphic evaluation routines of the leveled fully RNS-HEAAN scheme. For a reasonably large parameter with the polynomial ring dimension 214 and ciphertext coefficient modulus 438-bit (corresponding to 128-bit security), we implemented Medha in a Xilinx Alveo U250 card. Medha achieves the fastest computation latency to date and is almost 2.4× faster in latency and also somewhat smaller in area than a state-of-the-art reconfigurable hardware accelerator for the same parameter

    Medha: Microcoded Hardware Accelerator for computing on Encrypted Data

    Get PDF
    Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data. First, we propose a divide-and-conquer technique that enables homomorphic evaluations in the polynomial ring RQ,2N = ZQ[x]/(x2N + 1) to use a hardware accelerator that has been built for the smaller ring RQ,N = ZQ[x]/(xN + 1). The technique makes it possible to use a single hardware accelerator flexibly for supporting several homomorphic encryption parameter sets. Next, we present several architectural design methods that we use to realize the flexible and instruction-set accelerator architecture, which we call ‘Medha’. At every level of the implementation hierarchy, we explore possibilities for parallel processing. Starting from hardware-friendly parallel algorithms for the basic building blocks, we gradually build heavily parallel RNS polynomial arithmetic units. Next, many of these parallel units are interconnected elegantly so that their interconnections require the minimum number of nets, therefore making the overall architecture placement-friendly on the platform. As homomorphic encryption is computation- as well as data-centric, the speed of homomorphic evaluations depends greatly on the way the data variables are handled. For Medha, we take a memory-conservative design approach and get rid of any off-chip memory access during homomorphic evaluations. Finally, we implement Medha in a Xilinx Alveo U250 FPGA and measure timing performances of the microcoded homomorphic addition, multiplication, key-switching, and rescaling routines for the leveled fully homomorphic encryption scheme RNSHEAAN at 200 MHz clock frequency. For the large parameter sets (log Q,N) = (438, 214) and (546, 215), Medha achieves accelerations by up to 68× and 78× times respectively compared to a highly optimized software implementation Microsoft SEAL running at 2.3 GHz

    Third-Generation Capsule Endoscopy Outperforms Second-Generation Based on the Detectability of Esophageal Varices

    Get PDF
    Background and Aim. The third-generation capsule endoscopy (SB3) was shown to have better image resolution than that of SB2. The aim of this study was to compare SB2 and SB3 regarding detectability of esophageal varices (EVs). Methods. Seventy-six consecutive liver cirrhosis patients (42 men; mean age: 67 years) received SB3, and 99 (58 men; mean age, 67 years old) received SB2. All patients underwent esophagogastroduodenoscopy within 1 month prior to capsule endoscopy as gold standard for diagnosis. The diagnosis using SB3 and SB2 for EVs was evaluated regarding form (F0–F3), location (Ls, Lm, and Li), and the red color (RC) sign of EVs. Results. SB2 and SB3 did not significantly differ on overall diagnostic rates for EV. Sensitivity, specificity, positive predictive value, and negative predictive value of SB2/SB3 for EV diagnosis were, respectively, 65%/81%, 100%/100%, 100%/100%, and 70%/62%. However, the diagnostic rates for EV form F1 were 81% using SB3 and 52% using SB2 (P=0.009). Further, the diagnostic rates for Ls/Lm varices were 79% using SB3 and 81% using SB2, and, for Li, varices were 84% using SB3 and 52% using SB2 (P=0.02). Conclusion. SB3 significantly improved the detectability of EVs compared with SB2
    • …
    corecore