113 research outputs found
A Survey on Homomorphic Encryption Schemes: Theory and Implementation
Legacy encryption systems depend on sharing a key (public or private) among
the peers involved in exchanging an encrypted message. However, this approach
poses privacy concerns. Especially with popular cloud services, the control
over the privacy of the sensitive data is lost. Even when the keys are not
shared, the encrypted material is shared with a third party that does not
necessarily need to access the content. Moreover, untrusted servers, providers,
and cloud operators can keep identifying elements of users long after users end
the relationship with the services. Indeed, Homomorphic Encryption (HE), a
special kind of encryption scheme, can address these concerns as it allows any
third party to operate on the encrypted data without decrypting it in advance.
Although this extremely useful feature of the HE scheme has been known for over
30 years, the first plausible and achievable Fully Homomorphic Encryption (FHE)
scheme, which allows any computable function to perform on the encrypted data,
was introduced by Craig Gentry in 2009. Even though this was a major
achievement, different implementations so far demonstrated that FHE still needs
to be improved significantly to be practical on every platform. First, we
present the basics of HE and the details of the well-known Partially
Homomorphic Encryption (PHE) and Somewhat Homomorphic Encryption (SWHE), which
are important pillars of achieving FHE. Then, the main FHE families, which have
become the base for the other follow-up FHE schemes are presented. Furthermore,
the implementations and recent improvements in Gentry-type FHE schemes are also
surveyed. Finally, further research directions are discussed. This survey is
intended to give a clear knowledge and foundation to researchers and
practitioners interested in knowing, applying, as well as extending the state
of the art HE, PHE, SWHE, and FHE systems.Comment: - Updated. (October 6, 2017) - This paper is an early draft of the
survey that is being submitted to ACM CSUR and has been uploaded to arXiv for
feedback from stakeholder
High-Performance VLSI Architectures for Lattice-Based Cryptography
Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade since they have post-quantum-resistant security and the remarkable construction of the algorithm. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, the efficient hardware implementations for these advanced cryptography schemes are demanding to achieve a high-performance implementation.
This dissertation aims to investigate the novel and high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented to exploit the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC scheme. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers
Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance
Homomorphic Encryption provides unique security solution for cloud computing. It ensures not only that data in cloud have confidentiality but also that data processing by cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as NTRU(Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. Conventional linear feedback shift register (LFSR) structure is expanded and modified for performing the truncated polynomial ring multiplication in LTV scheme in parallel. Novel and efficient modular multiplier, modular adder and modular subtractor are proposed to support high speed processing of LFSR operations. In addition, a family of special moduli are selected for high speed computation of modular operations. Though the area keeps the complexity of O(Nn^2) with no advantage in circuit level. The proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of the existing methods and the proposed work is presented, which shows the proposed work gains significant speed up over existing works
A Lightweight Implementation of NTRUEncrypt for 8-bit AVR Microcontrollers
Introduced in 1996, NTRUEncrypt is not only one of the earliest but also one of the most scrutinized lattice-based cryptosystems and a serious contender in NIST’s ongoing Post-Quantum Cryptography (PQC) standardization project. An important criterion for the assessment of candidates is their computational cost in various hardware and software environments. This paper contributes to the evaluation of NTRUEncrypt on the ATmega class of AVR microcontrollers, which belongs to the most popular 8-bit platforms in the embedded domain. More concretely, we present AvrNtru, a carefully-optimized implementation of NTRUEncrypt that we developed from scratch with the goal of achieving high performance and resistance to timing attacks. AvrNtru complies with version 3.3 of the EESS#1 specification and supports recent product-form parameter sets like ees443ep1, ees587ep1, and ees743ep1. A full encryption operation (including mask generation and blinding- polynomial generation) using the ees443ep1 parameters takes 834,272 clock cycles on an ATmega1281 microcontroller; the decryption is slightly more costly and has an execution time of 1,061,683 cycles. When choosing the ees743ep1 parameters to achieve a 256-bit security level, 1,539,829 clock cycles are cost for encryption and 2,103,228 clock cycles for decryption. We achieved these results thanks to a novel hybrid technique for multiplication in truncated polynomial rings where one of the operands is a sparse ternary polynomial in product form. Our hybrid technique is inspired by Gura et al’s hybrid method for multiple-precision integer multiplication (CHES 2004) and takes advantage of the large register file of the AVR architecture to minimize the number of load instructions. A constant-time multiplication in the ring specified by the ees443ep1 parameters requires only 210,827 cycles, which sets a new speed record for the arithmetic component of a lattice-based cryptosystem on an 8-bit microcontroller
A Family of Scalable Polynomial Multiplier Architectures for Ring-LWE Based Cryptosystems
Many lattice based cryptosystems are based on the Ring learning with errors (Ring-LWE) problem.
The most critical and computationally intensive operation of these Ring-LWE based cryptosystems is polynomial multiplication over rings.
In this paper, we exploit the number theoretic transform (NTT) to build a family of scalable polynomial multiplier architectures,
which provide designers with a trade-off choice of speed vs. area.
Our polynomial multipliers are capable to calculate the product of two -degree polynomials in about clock cycles,
where is the number of the butterfly operators.
In addition, we exploit the cancellation lemma to reduce the required ROM storage.
The experimental results on a Spartan-6 FPGA show that the proposed polynomial multiplier architectures achieve a speedup of 3 times on average
and consume less Block RAMs and slices
when compared with the compact design.
Compared with the state of the art of high-speed design,
the proposed hardware architectures save up to 46.64\% clock cycles
and improve the utilization rate of the main data processing units by 42.27\%.
Meanwhile, our designs can save up to 29.41\% block RAMs
PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption
High-speed long polynomial multiplication is important for applications in
homomorphic encryption (HE) and lattice-based cryptosystems. This paper
addresses low-latency hardware architectures for long polynomial modular
multiplication using the number-theoretic transform (NTT) and inverse NTT
(iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into
multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes four
novel contributions. First, parallel NTT and iNTT architectures are proposed to
reduce the number of clock cycles to process the polynomials. This can enable
real-time processing for HE applications, as the number of clock cycles to
process the polynomial is inversely proportional to the level of parallelism.
Second, the proposed architecture eliminates the need for permuting the NTT
outputs before their product is input to the iNTT. This reduces latency by n/4
clock cycles, where n is the length of the polynomial, and reduces buffer
requirement by one delay-switch-delay circuit of size n. Third, an approach to
select special moduli is presented where the moduli can be expressed in terms
of a few signed power-of-two terms. Fourth, novel architectures for
pre-processing for computing residual polynomials using the CRT and
post-processing for combining the residual polynomials are proposed. These
architectures significantly reduce the area consumption of the pre-processing
and post-processing steps. The proposed long modular polynomial multiplications
are ideal for applications that require low latency and high sample rate as
these feed-forward architectures can be pipelined at arbitrary levels
Analysis of BCNS and Newhope Key-exchange Protocols
Lattice-based cryptographic primitives are believed to offer resilience against attacks by quantum computers. Following increasing interest from both companies and government agencies in building quantum computers, a number of works have proposed instantiations of practical post-quantum key-exchange protocols based on hard problems in lattices, mainly based on the Ring Learning With Errors (R-LWE) problem.
In this work we present an analysis of Ring-LWE based key-exchange mechanisms and compare two implementations of Ring-LWE based key-exchange protocol: BCNS and NewHope. This is important as NewHope protocol implementation outperforms state-of-the art elliptic curve based Diffie-Hellman key-exchange X25519, thus showing that using quantum safe key-exchange is not only a viable option but also a faster one. Specifically, this thesis compares different reconciliation methods, parameter choices, noise sampling algorithms and performance
HPKA: A High-Performance CRYSTALS-Kyber Accelerator Exploring Efficient Pipelining
CRYSTALS-Kyber (Kyber) was recently chosen as the first quantum resistant Key Encapsulation Mechanism (KEM) scheme for standardisation, after three rounds of the National Institute of Standards and Technology (NIST) initiated PQC competition which begin in 2016 and search of the best quantum resistant KEMs and digital signatures. Kyber is based on the Module-Learning with Errors (M-LWE) class of Lattice-based Cryptography, that is known to manifest efficiently on FPGAs. This work explores several architectural optimizations and proposes a high-performance and area-time (AT) product efficient hardware accelerator for Kyber. The proposed architectural optimizations include inter-module and intra-module pipelining, that are designed and balanced via FIFO based buffering to ensure maximum parallelisation. The implementation results show that compared to state-of-the-art designs, the proposed architecture delivers 25-51% speedups for Kyber\u27s three different security levels on Artix-7 and Zynq UltraScale+ devices, and a 50-75\% reduction in DSPs at comparable security level. Consequently, the proposed design achieve higher AT product efficiencies of 19-33%
- …