7 research outputs found

    Design and implementation of a fast and scalable NTT-based polynomial multiplier architecture

    In this paper, we present an optimized FPGA implementation of a novel, fast and highly parallelized NTT-based polynomial multiplier architecture, which proves to be effective as an accelerator for lattice-based homomorphic cryptographic schemes. Because I/O operations are as time-consuming as NTT operations during homomorphic computations in a host processor/accelerator setting, we focus on balancing the time spent on NTT and I/O operations rather than on achieving the fastest NTT implementation possible on the target FPGA. Even with this goal, we achieved the fastest NTT implementation in the literature, to the best of our knowledge. As a proof of concept, we use our architecture in a framework for the Fan-Vercauteren (FV) homomorphic encryption scheme, following a hardware/software co-design approach in which polynomial multiplication operations are offloaded to the accelerator via the PCIe bus while the rest of the operations in the FV scheme are executed in software running on an off-the-shelf desktop computer. Specifically, our framework is optimized to accelerate the Simple Encrypted Arithmetic Library (SEAL), developed by the Cryptography Research Group at Microsoft Research, for the FV encryption scheme, where large-degree polynomial multiplications are used extensively. The hardware part of the proposed framework targets a Xilinx Virtex-7 FPGA device, and the framework achieves an almost 11x latency speedup for the offloaded operations compared to their pure software implementations.
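As a software illustration of the operation this abstract offloads to hardware, the following is a minimal sketch of NTT-based polynomial multiplication. It is not the paper's architecture: the modulus `MOD`, the generator `ROOT`, and the function names `ntt` and `poly_mul` are assumptions chosen for the example, and the sketch computes a plain product by zero-padding rather than the reduction modulo x^n + 1 that FV-style schemes typically use.

```python
# Illustrative parameters (assumptions, not taken from the paper):
MOD = 998244353   # NTT-friendly prime, 119 * 2**23 + 1
ROOT = 3          # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 number-theoretic transform over Z_MOD.

    len(values) must be a power of two. Returns a new list.
    """
    a = list(values)
    n = len(a)

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Butterfly stages of doubling length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # inverse root via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % MOD
                a[start + k] = (u + v) % MOD
                a[start + k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1

    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def poly_mul(f, g):
    """Multiply two coefficient lists modulo MOD using the NTT."""
    out_len = len(f) + len(g) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fa = ntt(list(f) + [0] * (size - len(f)))
    ga = ntt(list(g) + [0] * (size - len(g)))
    pointwise = [x * y % MOD for x, y in zip(fa, ga)]
    return ntt(pointwise, invert=True)[:out_len]
```

For instance, `poly_mul([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x^2 by 4 + 5x. The transform turns the quadratic-cost coefficient convolution into a pointwise product, which is why the NTT is the natural unit to accelerate.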

    Implementation and evaluation of improved Gaussian sampling for lattice trapdoors

    We report on our implementation of a new Gaussian sampling algorithm for lattice trapdoors. Lattice trapdoors are used in a wide array of lattice-based cryptographic schemes, including digital signatures, attribute-based encryption, program obfuscation and others. Our implementation provides Gaussian sampling for trapdoor lattices with prime moduli, and supports both single- and multi-threaded execution. We experimentally evaluate our implementation by using it in the GPV hash-and-sign digital signature scheme as a benchmark. We compare our design and implementation with prior work reported in the literature. The evaluation shows that our implementation 1) has smaller space requirements and faster runtime, 2) does not require multi-precision floating-point arithmetic, and 3) can be used for a broader range of cryptographic primitives than previous implementations.

    Implementation and Comparison of Lattice-based Identification Protocols on Smart Cards and Microcontrollers

    Most lattice-based cryptographic schemes that enjoy a security proof suffer from huge key sizes and heavy computations. This is also true for the simpler case of identification protocols. Recent progress on ideal lattices has significantly improved efficiency and made it possible to implement practical lattice-based cryptography on constrained devices such as FPGAs and smartphones. However, to the best of our knowledge, no previous attempts have been made to implement lattice-based schemes on smart cards. In this paper, we report the results of our implementation of several state-of-the-art, highly secure lattice-based identification protocols on smart cards and microcontrollers. Our results show that only a few of these protocols fit within the limitations of these devices. We also discuss the implementation challenges and techniques for performing lattice-based cryptography on constrained devices, which may be of independent interest.

    A custom accelerator for homomorphic encryption applications

    Since the introduction of the first fully homomorphic encryption scheme in 2009, numerous research works have been published with the aim of making fully homomorphic encryption practical for daily use. The first fully functional scheme and a few others that have been introduced have proven difficult to use in practical applications because of their inefficiency. Here, we propose a custom hardware accelerator, optimized for a class of reconfigurable logic, for schemes based on Lopez-Alt, Tromer and Vaikuntanathan's somewhat homomorphic encryption. Our design works as a co-processor that enables the operating system to offload the most compute-heavy operations to this specialized hardware. The core of our design is an efficient hardware implementation of a polynomial multiplier, as polynomial multiplication is the most compute-heavy operation of our target scheme. The presented architecture can compute the product of very large polynomials in under 6.25 ms, which is 102 times faster than its software implementation. For accelerating homomorphic applications, we estimate per-block homomorphic AES evaluation at 442 ms, which is 28.5 and 17 times faster than the CPU and GPU implementations, respectively. For homomorphic evaluation of the Prince block cipher, we estimate the performance at 52 ms, which is 66 times faster than the CPU implementation.

    SPQCop: Side-channel protected Post-Quantum Cryptoprocessor

    The past few decades have seen significant progress in practically realizable quantum technologies. It has been well known since the work of Peter Shor that large-scale quantum computers will threaten the security of most currently used public-key cryptographic algorithms. This has spurred the cryptography community to design algorithms that will remain safe even with the emergence of large-scale quantum computing systems. One effort in this direction is the ongoing post-quantum cryptography (PQC) competition, which has led to the design and analysis of many concrete cryptographic constructions. Among these, lattice-based algorithms have emerged as promising candidates. We therefore focus on the efficient implementation of Ring-LWE-based quantum-safe key-exchange algorithms. Further, deploying hardware that implements such algorithms in critical applications requires security against implementation attacks. In this work, we design a side-channel-resistant post-quantum cryptoprocessor that supports the NewHope-NIST, NewHope-USENIX and HILA5 key-exchange schemes. The implemented cryptoprocessor is highly optimized, with minimal overhead due to the countermeasures. It requires about 13,500 LUTs and 8,100 FFs. Owing to a heavily pipelined architecture, an operating speed of 406 MHz could be achieved on the latest 16 nm FPGAs, resulting in key-exchange times of only 158 µs, 157 µs and 148 µs for the above-mentioned designs, respectively. We also present detailed area and performance metrics for the different modules required for all the designs. To the best of our knowledge, this work presents the first side-channel-leakage-resistant post-quantum accelerator. Furthermore, it is also the fastest hardware implementation of NewHope-NIST.

    Diseño de IP CORES de cifrado aplicado a telecomunicaciones (Design of encryption IP cores applied to telecommunications)

    Telecommunications are now used on a massive scale, and most of the information transmitted is sensitive. Existing components can make that information unreadable to unauthorized third parties; however, they are not reconfigurable and cannot be improved to avoid privacy risks. This document covers the development of an AES-128/256 encryption IP core implemented on a programmable logic device, which can form part of a telecommunications system. The AES cipher consists of one IP core that encrypts the data and one IP core that recovers the original data. These IP cores were developed to be reconfigurable through the software of the embedded system that contains them, and reusable in other digital systems with other applications, because they use a standard protocol, AXI4-Stream, that lets them communicate with other systems using the same protocol. First, the state of the art of the last four years was surveyed, with particular attention to encryption algorithms on FPGAs. This was followed by a study of the concepts surrounding an AES cipher and of the elements needed for its hardware implementation. AES encrypts one 128-bit block at a time and uses the same 128/192/256-bit key to encrypt and to decrypt, which is why it is called a symmetric cipher. The encryption consists of a number of rounds applied to the input data block; the block that results from the final round is the encrypted data, also known as the ciphertext. The hardware architecture of the AES encryption standard was described and simulated in Verilog for both the encryption IP core and the decryption IP core. In addition, an AXI4-Stream communication interface was added so that the cores can communicate with any hardware module that has the same interface. The system was implemented on the Zedboard development board, whose main component is the Zynq. The development comprised two main elements: first, a hardware platform that includes the two IP cores; second, a software platform able to control data input to the system through a HyperTerminal session. This made it possible to verify AES-128 encryption and AES-128 decryption (both AXI4-Stream) of 128-bit data blocks. The operation of the designed hardware blocks was checked against the test vectors designed for this purpose by the National Institute of Standards and Technology (NIST) [1].
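The abstract mentions "a number of rounds" without giving the count. As a small sketch based on the AES standard (FIPS 197) rather than on the thesis itself, the round count follows from the key length as Nr = Nk + 6, where Nk is the key length in 32-bit words. The function name `aes_rounds` is hypothetical and used only for illustration.

```python
AES_BLOCK_BITS = 128  # AES always operates on 128-bit blocks


def aes_rounds(key_bits):
    """Return the number of AES rounds for a 128-, 192- or 256-bit key.

    Follows the FIPS 197 relation Nr = Nk + 6, with Nk the key
    length in 32-bit words.
    """
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys must be 128, 192 or 256 bits long")
    nk = key_bits // 32
    return nk + 6
```

Under this relation, the AES-128 configuration verified on the board runs 10 rounds per block, while an AES-256 configuration would run 14.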