4 research outputs found

    PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

    Full text link
    High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes four novel contributions. First, parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. This can enable real-time processing for HE applications, as the number of clock cycles to process the polynomial is inversely proportional to the level of parallelism. Second, the proposed architecture eliminates the need for permuting the NTT outputs before their product is input to the iNTT. This reduces latency by n/4 clock cycles, where n is the length of the polynomial, and reduces buffer requirement by one delay-switch-delay circuit of size n. Third, an approach to select special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Fourth, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate as these feed-forward architectures can be pipelined at arbitrary levels

    Efficient hardware arithmetic for inverted binary ring-LWE based post-quantum cryptography

    Get PDF
    Ring learning-with-errors (RLWE)-based encryption scheme is a lattice-based cryptographic algorithm that constitutes one of the most promising candidates for Post-Quantum Cryptography (PQC) standardization due to its efficient implementation and low computational complexity. Binary Ring-LWE (BRLWE) is a new optimized variant of RLWE, which achieves smaller computational complexity and higher efficient hardware implementations. In this paper, two efficient architectures based on Linear-Feedback Shift Register (LFSR) for the arithmetic used in Inverted Binary Ring-LWE (InvBRLWE)-based encryption scheme are presented, namely the operation of A center dot B+C over the polynomial ring Zq/(xn+1){Z}_q/(xn+1) . The first architecture optimizes the resource usage for major computation and has a novel input processing setup to speed up the overall processing latency with minimized input loading cycles. The second architecture deploys an innovative serial-in serial-out processing format to reduce the involved area usage further yet maintains a regular input loading time-complexity. Experimental results show that the architectures presented here improve the complexities obtained by competing schemes found in the literature, e.g., involving 71.23% less area-delay product than recent designs. Both architectures are highly efficient in terms of area-time complexities and can be extended for deploying in different lightweight application environments

    An Efficient and Parallel R-LWE Cryptoprocessor

    No full text

    An Efficient and Parallel R-LWE Cryptoprocessor

    No full text
    corecore