10 research outputs found

    Conversió de TFD's i convolucions circulars bidimensionals en unidimensionals

    Get PDF
    The paper describes an indexing scheme that allows the error-free computation of some two-dimensional operations as one-dimensional ones. In particular, the conditions which must be satisfied to compute DFT's and circular convolutions in this way are derived, together with the computational effort required. This makes it possible to take full advantage of some very fast linear devices, such as CCD's and SAW'S. A CCD implementation of circular convolutions which, by using the proposed indexing scheme, allows to save up to 50% shifts is given. Finally, the reordering of elements underlying each two-dimensional operation is characterized by studying the results of composing a direct and an inverse 2D-1 D mapping

    Factorization in Phase-Space Finite Geometry and Weak Mutually Unbiased Bases

    Get PDF
    A phase-space factorization of lines in finite geometry G(m) with variables in Zm and its correspondence in finite Hilbert space H(m) for m a non-prime was discussed. Using the method of Good [15], lines in G(m) were factorized as products of lines G(mi) where mi is a prime divisor of m. A lattice was formed between the non trivial sublines of G(m) and lines of G(mi) and between a subspace of H(m) and bases of H(mi) and existence of a link between lines in phase space finite geometry and bases in Hilbert space of finite quantum systems was discussed

    Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption

    Get PDF
    Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial multiplication. Existing implementations mostly focus on the case where the polynomials are defined over a power-of-two cyclotomic ring, allowing to make use of the simpler Cooley-Tukey NTT. However, generalized cyclotomics have several benefits in the BGV FHE scheme, including more SIMD plaintext slots and a simpler bootstrapping algorithm. We present a hardware architecture for the NTT targeting generalized cyclotomics within the context of the BGV FHE scheme. We explore different non-power-of-two NTT algorithms, including the Prime-Factor, Rader, and Bluestein NTTs. Our most efficient architecture targets the 21845-th cyclotomic polynomial --- a practical parameter for BGV --- with ideal properties for use with a combination of the Prime-Factor and Rader algorithms. The design achieves high throughput with optimized resource utilization, by leveraging parallel processing, pipelining, and reusing processing elements. Compared to Wu et al.\u27s VLSI architecture of the Bluestein NTT, our approach showcases 2×\times to 5×\times improved throughput and area efficiency. Simulation and implementation results on an AMD Alveo U250 FPGA demonstrate the feasibility of the proposed hardware design for FHE

    Multi-Parameter Support with NTTs for NTRU and NTRU Prime on Cortex-M4

    Get PDF
    We propose NTT implementations with each supporting at least one parameter of NTRU and one parameter of NTRU Prime. Our implementations are based on size-1440, size-1536, and size-1728 convolutions without algebraic assumptions on the target polynomial rings. We also propose several improvements for the NTT computation. Firstly, we introduce dedicated radix-(2, 3) butterflies combining Good–Thomas FFT and vector-radix FFT. In general, there are six dedicated radix-(2, 3) butterflies and they together support implicit permutations. Secondly, for odd prime radices, we show that the multiplications for one output can be replaced with additions/subtractions. We demonstrate the idea for radix-3 and show how to extend it to any odd prime. Our improvement also applies to radix-(2, 3) butterflies. Thirdly, we implement an incomplete version of Good–Thomas FFT for addressing potential code size issues. For NTRU, our polynomial multiplications outperform the state-of-the-art by 2.8%−10.3%. For NTRU Prime, our polynomial multiplications are slower than the state-of-the-art. However, the SotA exploits the specific structure of coefficient rings or polynomial moduli, while our NTT-based multiplications exploit neither and apply across different schemes. This reduces the engineering effort, including testing and verification

    SoK: Polynomial Multiplications for Lattice-Based Cryptosystems

    Get PDF
    We survey various mathematical tools used in software works multiplying polynomials in Zq[x]/⟨xn−αx−β⟩\mathbb{Z}_q [x] / ⟨x^n −αx−β⟩. In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2. There are three emphases in this paper: (i) modular arithmetic, (ii) homomorphisms, and (iii) vectorization. For modular arithmetic, we survey Montgomery, Barrett, and Plantard multiplications. For homomorphisms, we survey (a) various homomorphisms such as Cooley–Tukey FFT, Bruun’s FFT, Rader’s FFT, Karatsuba, and Toom– Cook; (b) various algebraic techniques for adjoining nice properties to the coefficient rings, including injections, Schönhage’s FFT, Nussbaumer’s FFT, and localization; and (c) various algebraic techniques related to the polynomial moduli, including twisting, composed multiplication, evaluation at ∞, Good–Thomas FFT, truncation, incomplete transformation, and Toeplitz matrix-vector product. For vectorization, we survey the relations between homomorphisms and the support of vector arithmetic. We then go through several case studies: We compare the implementations of modular multiplications used in Dilithium and Kyber, explain how the matrix-to-vector structure was exploited in Saber, and review the design choices of transformations for NTRU and NTRU Prime with vectorization. Finally, we outline several interesting implementation projects

    Number theoretic techniques applied to algorithms and architectures for digital signal processing

    Get PDF
    Many of the techniques for the computation of a two-dimensional convolution of a small fixed window with a picture are reviewed. It is demonstrated that Winograd's cyclic convolution and Fourier Transform Algorithms, together with Nussbaumer's two-dimensional cyclic convolution algorithms, have a common general form. Many of these algorithms use the theoretical minimum number of general multiplications. A novel implementation of these algorithms is proposed which is based upon one-bit systolic arrays. These systolic arrays are networks of identical cells with each cell sharing a common control and timing function. Each cell is only connected to its nearest neighbours. These are all attractive features for implementation using Very Large Scale Integration (VLSI). The throughput rate is only limited by the time to perform a one-bit full addition. In order to assess the usefulness to these systolic arrays a 'cost function' is developed to compare them with more conventional techniques, such as the Cooley-Tukey radix-2 Fast Fourier Transform (FFT). The cost function shows that these systolic arrays offer a good way of implementing the Discrete Fourier Transform for transforms up to about 30 points in length. The cost function is a general tool and allows comparisons to be made between different implementations of the same algorithm and between dissimilar algorithms. Finally a technique is developed for the derivation of Discrete Cosine Transform (DCT) algorithms from the Winograd Fourier Transform Algorithm. These DCT algorithms may be implemented by modified versions of the systolic arrays proposed earlier, but requiring half the number of cells
    corecore