10 research outputs found

    Accelerating NTRU based Homomorphic Encryption using GPUs

    Get PDF
    In this work we introduce a large polynomial arithmetic library optimized for Nvidia GPUs to support fully homomorphic encryption schemes. To realize the large polynomial arithmetic library we convert the polynomial with large coefficients using the Chinese Remainder Theorem into many polynomials with small coefficients, and then carry out modular multiplications in the residue space using a custom developed discrete Fourier transform library. We further extend the library to support the homomorphic evaluation operations, i.e. addition, multiplication, and relinearization, in an NTRU based somewhat homomorphic encryption library. Finally, we put the library to use to evaluate homomorphic evaluation of two block ciphers: Prince and AES, which show 2.57 times and 7.6 times speedup, respectively, over an Intel Xeon software implementation

    Comparison of RISC-V and transport triggered architectures for a post-quantum cryptography application

    Get PDF
    Cryptography is one of the basic phenomena of security systems. However, some of the widely used public key cryptography algorithms can be broken by using quantum computers. Therefore, many post-quantum cryptography algorithms are proposed in recent years to handle this issue. NTRU is one of the most important of these quantum-safe algorithms. Besides the importance of cryptography algorithms, the architecture where they are implemented is also essential. In this study, we developed an NTRU public key cryptosystem application and designed several processors to compare them in many aspects. We address two dierent architectures in this work. The RISC-V is chosen as it is the most lately version of classical RISC architecture. As competitor to this, we preferred transport triggered architecture (TTA) which oers high level customization and scalability. Details of all dierent implementations and the test results obtained with them are shared and discussed

    Accelerating LTV based homomorphic encryption in reconfigurable hardware

    Get PDF
    After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The accelerator we present is connected via a fast PCIe interface to a CPU platform to provide homomorphic evaluation services to any application that needs to support blinded computations. Specifically we introduce a number theoretical transform based multiplier architecture capable of efficiently handling very large polynomials. When synthesized for the Xilinx Virtex 7 family the presented architecture can compute the product of large polynomials in under 6.25 msec making it the fastest multiplier design of its kind currently available in the literature and is more than 102 times faster than a software implementation. Using this multiplier we can compute a relinearization operation in 526 msec. When used as an accelerator, for instance, to evaluate the AES block cipher, we estimate a per block homomorphic evaluation performance of 442 msec yielding performance gains of 28.5 and 17 times over similar CPU and GPU implementations, respectively

    cuHE: A Homomorphic Encryption Accelerator Library

    Get PDF
    We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using number-theoretic transform (NTT) and Chinese remainder theorem (CRT) based methods are then extended to implement the primitives of the leveled homomorphic encryption scheme proposed by Löpez-Alt, Tromer and Vaikuntanathan. To compare the performance of the proposed CUDA library we implemented two applications: the Prince block cipher and homomorphic sorting algorithms on two GPU platforms in single GPU and multiple GPU configurations. We observed a speedup of 25 times and 51 times over the best previous GPU implementation for Prince with single and triple GPUs, respectively. Similarly for homomorphic sorting we obtained 12-41 times speedup depending on the number and size of the sorted elements

    Blind Web Search: How far are we from a privacy preserving search engine?

    Get PDF
    Recent rapid progress in fully homomorphic encryption (FHE) and somewhat homomorphic encryption (SHE) has catalyzed renewed efforts to develop efficient privacy preserving protocols. Several works have already appeared in the literature that provide solutions to these problems by employing FHE or SHE techniques. In this work, we focus on a natural application where privacy is a major concern: web search. An estimated 5 billion web queries are processed by the world\u27s leading search engines each day. It is no surprise, then, that privacy-preserving web search was proposed as the paragon FHE application in Gentry\u27s seminal FHE paper. Indeed, numerous proposals have emerged in the intervening years that attack various privatized search problems over encrypted user data, e.g. private information retrieval (PIR). Yet, there is no known work that focuses on implementing a completely blind web search engine using an FHE/SHE construction. In this work, we focus first on single keyword queries with exact matches, aiming toward real-world viability. We then discuss multiple-keyword searches and tackle a number of issues currently hindering practical implementation, such as communication and computational efficiency

    Accelerating Somewhat Homomorphic Evaluation using FPGAs

    Get PDF
    After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The accelerator we present is connected via a fast PCIe interface to a CPU platform to provide homomorphic evaluation services to any application that needs to support blinded computations. Specifically we introduce a number theoretical transform based multiplier architecture capable of efficiently handling very large polynomials. When synthesized for the Xilinx Virtex 7 family the presented architecture can compute the product of large polynomials in under 6.256.25~msec making it the fastest multiplier design of its kind currently available in the literature and is more than 102 times faster than a software implementation. Using this multiplier we can compute a relinearization operation in 526526 msec. When used as an accelerator, for instance, to evaluate the AES block cipher, we estimate a per block homomorphic evaluation performance of 442442~msec yielding performance gains of 28.528.5 and 1717 times over similar CPU and GPU implementations, respectively

    A Survey on Homomorphic Encryption Schemes: Theory and Implementation

    Full text link
    Legacy encryption systems depend on sharing a key (public or private) among the peers involved in exchanging an encrypted message. However, this approach poses privacy concerns. Especially with popular cloud services, the control over the privacy of the sensitive data is lost. Even when the keys are not shared, the encrypted material is shared with a third party that does not necessarily need to access the content. Moreover, untrusted servers, providers, and cloud operators can keep identifying elements of users long after users end the relationship with the services. Indeed, Homomorphic Encryption (HE), a special kind of encryption scheme, can address these concerns as it allows any third party to operate on the encrypted data without decrypting it in advance. Although this extremely useful feature of the HE scheme has been known for over 30 years, the first plausible and achievable Fully Homomorphic Encryption (FHE) scheme, which allows any computable function to perform on the encrypted data, was introduced by Craig Gentry in 2009. Even though this was a major achievement, different implementations so far demonstrated that FHE still needs to be improved significantly to be practical on every platform. First, we present the basics of HE and the details of the well-known Partially Homomorphic Encryption (PHE) and Somewhat Homomorphic Encryption (SWHE), which are important pillars of achieving FHE. Then, the main FHE families, which have become the base for the other follow-up FHE schemes are presented. Furthermore, the implementations and recent improvements in Gentry-type FHE schemes are also surveyed. Finally, further research directions are discussed. This survey is intended to give a clear knowledge and foundation to researchers and practitioners interested in knowing, applying, as well as extending the state of the art HE, PHE, SWHE, and FHE systems.Comment: - Updated. (October 6, 2017) - This paper is an early draft of the survey that is being submitted to ACM CSUR and has been uploaded to arXiv for feedback from stakeholder

    SoK: Fully Homomorphic Encryption Accelerators

    Full text link
    Fully Homomorphic Encryption~(FHE) is a key technology enabling privacy-preserving computing. However, the fundamental challenge of FHE is its inefficiency, due primarily to the underlying polynomial computations with high computation complexity and extremely time-consuming ciphertext maintenance operations. To tackle this challenge, various FHE accelerators have recently been proposed by both research and industrial communities. This paper takes the first initiative to conduct a systematic study on the 14 FHE accelerators -- cuHE/cuFHE, nuFHE, HEAT, HEAX, HEXL, HEXL-FPGA, 100×\times, F1, CraterLake, BTS, ARK, Poseidon, FAB and TensorFHE. We first make our observations on the evolution trajectory of these existing FHE accelerators to establish a qualitative connection between them. Then, we perform testbed evaluations of representative open-source FHE accelerators to provide a quantitative comparison on them. Finally, with the insights learned from both qualitative and quantitative studies, we discuss potential directions to inform the future design and implementation for FHE accelerators

    Accelerating Homomorphic Evaluation on Reconfigurable Hardware

    Get PDF
    Homomorphic encryption allows computation on encrypted data and makes it possible to securely outsource computational tasks to untrusted environments. However, all proposed schemes are quite inefficient and homomorphic evaluation of ciphertexts usually takes several seconds on high-end CPUs, even for evaluating simple functions. In this work we investigate the potential of FPGAs for speeding up those evaluation operations. We propose an architecture to accelerate schemes based on the ring learning with errors (RLWE) problem and specifically implemented the somewhat homomorphic encryption scheme YASHE, which was proposed by Bos, Lauter, Loftus, and Naehrig in 2013. Due to the large size of ciphertexts and evaluation keys, on-chip storage of all data is not possible and external memory is required. For efficient utilization of the external memory we propose an efficient double-buffered memory access scheme and a polynomial multiplier based on the number theoretic transform (NTT). For the parameter set (n=16384,log_2(q)=512) capable of evaluating 9 levels of multiplications, we can perform a homomorphic addition in 48.67 and a homomorphic multiplication in 0.94 ms
    corecore