113 research outputs found

    A Survey on Homomorphic Encryption Schemes: Theory and Implementation

    Full text link
    Legacy encryption systems depend on sharing a key (public or private) among the peers involved in exchanging an encrypted message. However, this approach poses privacy concerns. Especially with popular cloud services, the control over the privacy of the sensitive data is lost. Even when the keys are not shared, the encrypted material is shared with a third party that does not necessarily need to access the content. Moreover, untrusted servers, providers, and cloud operators can keep identifying elements of users long after users end the relationship with the services. Indeed, Homomorphic Encryption (HE), a special kind of encryption scheme, can address these concerns as it allows any third party to operate on the encrypted data without decrypting it in advance. Although this extremely useful feature of the HE scheme has been known for over 30 years, the first plausible and achievable Fully Homomorphic Encryption (FHE) scheme, which allows any computable function to perform on the encrypted data, was introduced by Craig Gentry in 2009. Even though this was a major achievement, different implementations so far demonstrated that FHE still needs to be improved significantly to be practical on every platform. First, we present the basics of HE and the details of the well-known Partially Homomorphic Encryption (PHE) and Somewhat Homomorphic Encryption (SWHE), which are important pillars of achieving FHE. Then, the main FHE families, which have become the base for the other follow-up FHE schemes are presented. Furthermore, the implementations and recent improvements in Gentry-type FHE schemes are also surveyed. Finally, further research directions are discussed. This survey is intended to give a clear knowledge and foundation to researchers and practitioners interested in knowing, applying, as well as extending the state of the art HE, PHE, SWHE, and FHE systems.Comment: - Updated. (October 6, 2017) - This paper is an early draft of the survey that is being submitted to ACM CSUR and has been uploaded to arXiv for feedback from stakeholder

    High-Performance VLSI Architectures for Lattice-Based Cryptography

    Get PDF
    Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade since they have post-quantum-resistant security and the remarkable construction of the algorithm. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, the efficient hardware implementations for these advanced cryptography schemes are demanding to achieve a high-performance implementation. This dissertation aims to investigate the novel and high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented to exploit the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC scheme. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers

    Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance

    Get PDF
    Homomorphic Encryption provides unique security solution for cloud computing. It ensures not only that data in cloud have confidentiality but also that data processing by cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as NTRU(Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. Conventional linear feedback shift register (LFSR) structure is expanded and modified for performing the truncated polynomial ring multiplication in LTV scheme in parallel. Novel and efficient modular multiplier, modular adder and modular subtractor are proposed to support high speed processing of LFSR operations. In addition, a family of special moduli are selected for high speed computation of modular operations. Though the area keeps the complexity of O(Nn^2) with no advantage in circuit level. The proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of the existing methods and the proposed work is presented, which shows the proposed work gains significant speed up over existing works

    A Lightweight Implementation of NTRUEncrypt for 8-bit AVR Microcontrollers

    Get PDF
    Introduced in 1996, NTRUEncrypt is not only one of the earliest but also one of the most scrutinized lattice-based cryptosystems and a serious contender in NIST’s ongoing Post-Quantum Cryptography (PQC) standardization project. An important criterion for the assessment of candidates is their computational cost in various hardware and software environments. This paper contributes to the evaluation of NTRUEncrypt on the ATmega class of AVR microcontrollers, which belongs to the most popular 8-bit platforms in the embedded domain. More concretely, we present AvrNtru, a carefully-optimized implementation of NTRUEncrypt that we developed from scratch with the goal of achieving high performance and resistance to timing attacks. AvrNtru complies with version 3.3 of the EESS#1 specification and supports recent product-form parameter sets like ees443ep1, ees587ep1, and ees743ep1. A full encryption operation (including mask generation and blinding- polynomial generation) using the ees443ep1 parameters takes 834,272 clock cycles on an ATmega1281 microcontroller; the decryption is slightly more costly and has an execution time of 1,061,683 cycles. When choosing the ees743ep1 parameters to achieve a 256-bit security level, 1,539,829 clock cycles are cost for encryption and 2,103,228 clock cycles for decryption. We achieved these results thanks to a novel hybrid technique for multiplication in truncated polynomial rings where one of the operands is a sparse ternary polynomial in product form. Our hybrid technique is inspired by Gura et al’s hybrid method for multiple-precision integer multiplication (CHES 2004) and takes advantage of the large register file of the AVR architecture to minimize the number of load instructions. A constant-time multiplication in the ring specified by the ees443ep1 parameters requires only 210,827 cycles, which sets a new speed record for the arithmetic component of a lattice-based cryptosystem on an 8-bit microcontroller

    A Family of Scalable Polynomial Multiplier Architectures for Ring-LWE Based Cryptosystems

    Get PDF
    Many lattice based cryptosystems are based on the Ring learning with errors (Ring-LWE) problem. The most critical and computationally intensive operation of these Ring-LWE based cryptosystems is polynomial multiplication over rings. In this paper, we exploit the number theoretic transform (NTT) to build a family of scalable polynomial multiplier architectures, which provide designers with a trade-off choice of speed vs. area. Our polynomial multipliers are capable to calculate the product of two nn-degree polynomials in about (1.5nlgn+1.5n)/b(1.5n\lg n + 1.5n)/b clock cycles, where bb is the number of the butterfly operators. In addition, we exploit the cancellation lemma to reduce the required ROM storage. The experimental results on a Spartan-6 FPGA show that the proposed polynomial multiplier architectures achieve a speedup of 3 times on average and consume less Block RAMs and slices when compared with the compact design. Compared with the state of the art of high-speed design, the proposed hardware architectures save up to 46.64\% clock cycles and improve the utilization rate of the main data processing units by 42.27\%. Meanwhile, our designs can save up to 29.41\% block RAMs

    PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

    Full text link
    High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes four novel contributions. First, parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. This can enable real-time processing for HE applications, as the number of clock cycles to process the polynomial is inversely proportional to the level of parallelism. Second, the proposed architecture eliminates the need for permuting the NTT outputs before their product is input to the iNTT. This reduces latency by n/4 clock cycles, where n is the length of the polynomial, and reduces buffer requirement by one delay-switch-delay circuit of size n. Third, an approach to select special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Fourth, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate as these feed-forward architectures can be pipelined at arbitrary levels

    Analysis of BCNS and Newhope Key-exchange Protocols

    Get PDF
    Lattice-based cryptographic primitives are believed to offer resilience against attacks by quantum computers. Following increasing interest from both companies and government agencies in building quantum computers, a number of works have proposed instantiations of practical post-quantum key-exchange protocols based on hard problems in lattices, mainly based on the Ring Learning With Errors (R-LWE) problem. In this work we present an analysis of Ring-LWE based key-exchange mechanisms and compare two implementations of Ring-LWE based key-exchange protocol: BCNS and NewHope. This is important as NewHope protocol implementation outperforms state-of-the art elliptic curve based Diffie-Hellman key-exchange X25519, thus showing that using quantum safe key-exchange is not only a viable option but also a faster one. Specifically, this thesis compares different reconciliation methods, parameter choices, noise sampling algorithms and performance

    HPKA: A High-Performance CRYSTALS-Kyber Accelerator Exploring Efficient Pipelining

    Get PDF
    CRYSTALS-Kyber (Kyber) was recently chosen as the first quantum resistant Key Encapsulation Mechanism (KEM) scheme for standardisation, after three rounds of the National Institute of Standards and Technology (NIST) initiated PQC competition which begin in 2016 and search of the best quantum resistant KEMs and digital signatures. Kyber is based on the Module-Learning with Errors (M-LWE) class of Lattice-based Cryptography, that is known to manifest efficiently on FPGAs. This work explores several architectural optimizations and proposes a high-performance and area-time (AT) product efficient hardware accelerator for Kyber. The proposed architectural optimizations include inter-module and intra-module pipelining, that are designed and balanced via FIFO based buffering to ensure maximum parallelisation. The implementation results show that compared to state-of-the-art designs, the proposed architecture delivers 25-51% speedups for Kyber\u27s three different security levels on Artix-7 and Zynq UltraScale+ devices, and a 50-75\% reduction in DSPs at comparable security level. Consequently, the proposed design achieve higher AT product efficiencies of 19-33%
    corecore