10 research outputs found
Conversió de TFD's i convolucions circulars bidimensionals en unidimensionals (Conversion of two-dimensional DFTs and circular convolutions into one-dimensional ones)
The paper describes an indexing scheme that allows some two-dimensional operations to be computed, without error, as one-dimensional ones. In particular, it derives the conditions that must be satisfied to compute DFTs and circular convolutions in this way, together with the computational effort required. This makes it possible to take full advantage of some very fast linear devices, such as CCDs and SAWs. A CCD implementation of circular convolutions is given which, by using the proposed indexing scheme, saves up to 50% of the shifts. Finally, the reordering of elements underlying each two-dimensional operation is characterized by studying the results of composing a direct and an inverse 2D-1D mapping.
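As a rough illustration of the kind of 2D-to-1D conversion this abstract describes, the sketch below maps an N1 x N2 array onto a length-N1*N2 sequence with the Chinese remainder theorem index map n -> (n mod N1, n mod N2). This particular map and the requirement gcd(N1, N2) = 1 are assumptions made for the example; the abstract does not spell out the paper's own scheme or conditions. All function names are hypothetical.

```python
from math import gcd

def circ_conv_1d(a, b):
    """Direct 1D circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def circ_conv_2d(x, h):
    """Direct 2D circular convolution, used here only as a reference."""
    n1, n2 = len(x), len(x[0])
    return [[sum(x[i][j] * h[(r - i) % n1][(c - j) % n2]
                 for i in range(n1) for j in range(n2))
             for c in range(n2)]
            for r in range(n1)]

def to_1d(x):
    """2D -> 1D mapping: element n of the output is x[n mod N1][n mod N2]."""
    n1, n2 = len(x), len(x[0])
    if gcd(n1, n2) != 1:
        raise ValueError("this sketch assumes coprime dimensions")
    return [x[n % n1][n % n2] for n in range(n1 * n2)]

def to_2d(v, n1, n2):
    """Inverse 1D -> 2D mapping of to_1d."""
    out = [[0] * n2 for _ in range(n1)]
    for n, val in enumerate(v):
        out[n % n1][n % n2] = val
    return out

def circ_conv_2d_via_1d(x, h):
    """2D circular convolution computed as a single 1D circular convolution."""
    n1, n2 = len(x), len(x[0])
    return to_2d(circ_conv_1d(to_1d(x), to_1d(h)), n1, n2)
```

With integer inputs the two routes agree exactly, because the index map only reorders elements and introduces no extra arithmetic.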
Factorization in Phase-Space Finite Geometry and Weak Mutually Unbiased Bases
A phase-space factorization of lines in the finite geometry G(m), with variables in Z_m, and its counterpart in the finite Hilbert space H(m), is discussed for non-prime m. Using the method of Good [15], lines in G(m) are factorized as products of lines in G(m_i), where m_i is a prime divisor of m. A lattice is formed between the nontrivial sublines of G(m) and the lines of G(m_i), and between the subspaces of H(m) and the bases of H(m_i). The existence of a link between lines in phase-space finite geometry and bases in the Hilbert space of finite quantum systems is discussed.
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial multiplication. Existing implementations mostly focus on the case where the polynomials are defined over a power-of-two cyclotomic ring, which allows the use of the simpler Cooley-Tukey NTT. However, generalized cyclotomics have several benefits in the BGV FHE scheme, including more SIMD plaintext slots and a simpler bootstrapping algorithm.
We present a hardware architecture for the NTT targeting generalized cyclotomics within the context of the BGV FHE scheme. We explore different non-power-of-two NTT algorithms, including the Prime-Factor, Rader, and Bluestein NTTs. Our most efficient architecture targets the 21845-th cyclotomic polynomial (a practical parameter for BGV), which has ideal properties for use with a combination of the Prime-Factor and Rader algorithms. The design achieves high throughput with optimized resource utilization by leveraging parallel processing, pipelining, and reuse of processing elements. Compared to Wu et al.'s VLSI architecture of the Bluestein NTT, our approach shows 2× to 5× improved throughput and area efficiency. Simulation and implementation results on an AMD Alveo U250 FPGA demonstrate the feasibility of the proposed hardware design for FHE.
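The Rader step named in this abstract can be sketched in software as follows. This is a minimal illustration of Rader's algorithm over Z_q for a prime length p, not the hardware design the authors describe. The parameters in the usage note (p = 5, q = 11, omega = 3, g = 2) are toy values chosen for the example, and the length-(p-1) cyclic convolution is computed naively here, whereas a practical design would use a fast transform for it. Note that 21845 = 5 × 17 × 257, so every prime factor p has p - 1 equal to a power of two, which is what makes that inner convolution convenient.

```python
def naive_ntt(x, omega, q):
    """Reference length-p NTT: X[k] = sum_n x[n] * omega^(n*k) mod q."""
    p = len(x)
    return [sum(x[n] * pow(omega, n * k, q) for n in range(p)) % q
            for k in range(p)]

def rader_ntt(x, omega, g, q):
    """Length-p NTT (p prime) via one length-(p-1) cyclic convolution.

    Assumptions: omega has multiplicative order p modulo q, and g is a
    primitive root modulo p. Requires Python 3.8+ for pow(g, -1, p).
    """
    p = len(x)
    g_inv = pow(g, -1, p)
    # Reorder the inputs x[1..p-1] along powers of g.
    a = [x[pow(g, i, p)] for i in range(p - 1)]
    # Reorder the twiddle factors along powers of g^-1.
    b = [pow(omega, pow(g_inv, r, p), q) for r in range(p - 1)]
    # Cyclic convolution of length p-1 (naive here, for clarity only).
    conv = [sum(a[i] * b[(m - i) % (p - 1)] for i in range(p - 1)) % q
            for m in range(p - 1)]
    out = [0] * p
    out[0] = sum(x) % q
    for m in range(p - 1):
        # Output index k = g^-m picks up convolution term m plus x[0].
        out[pow(g_inv, m, p)] = (x[0] + conv[m]) % q
    return out
```

For example, `rader_ntt([1, 2, 3, 4, 5], omega=3, g=2, q=11)` returns the same list as `naive_ntt([1, 2, 3, 4, 5], 3, 11)`.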
Recommended from our members
An analytic representation of weak mutually unbiased bases
Quantum systems in the d-dimensional Hilbert space are considered. Mutually unbiased bases pose a deep problem in this area: finding all mutually unbiased bases in higher (non-prime) dimensions is still an open problem. We derive an alternative approach to mutually unbiased bases by studying a weaker concept, which we call weak mutually unbiased bases. We then compare three rather different structures. The first is weak mutually unbiased bases, for which the absolute value of the overlap of any two vectors in two different bases is 1/√k (where k∣d) or 0. The second is maximal lines through the origin in the Z(d) × Z(d) phase space. The third is an analytic representation in the complex plane based on Theta functions and their zeros. The analytic representation of the weak mutually unbiased bases is defined and its zeros are examined. It is shown that there is a correspondence (triality) that strongly links these three apparently different structures. We give an explicit breakdown of this triality.
Fast Fourier transforms and fast Wigner and Weyl functions in large quantum systems
Two methods for fast Fourier transforms are used in a quantum context. The first method is for systems with dimension of the Hilbert space D = d^n, with d an odd integer, and is inspired by the Cooley-Tukey formalism. The ‘large Fourier transform’ is expressed as a sequence of n ‘small Fourier transforms’ (together with some other transforms) in quantum systems with d-dimensional Hilbert space. Limitations of the method are discussed. In some special cases, the n Fourier transforms can be performed in parallel. The second method is for systems with dimension of the Hilbert space D = d_0...d_{n-1}, with d_0,...,d_{n-1} odd integers coprime to each other. It is inspired by the Good formalism, which in turn is based on the Chinese remainder theorem. In this case also the ‘large Fourier transform’ is expressed as a sequence of n ‘small Fourier transforms’ (that involve some constants related to the number theory that describes the formalism). The ‘small Fourier transforms’ can be performed in a classical computer or in a quantum computer (in which case we have the additional well-known advantages of quantum Fourier transform circuits). In the case that the small Fourier transforms are performed with a classical computer, complexity arguments for both methods show the reduction in computational time from O(D^2) to O(D log D). The second method is also used for the fast calculation of Wigner and Weyl functions in quantum systems with large finite dimension of the Hilbert space.
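A classical sketch of the Good (prime-factor) idea referred to in this abstract, for two coprime factors, is given below. It is a textbook-style illustration only and does not reproduce the paper's quantum formalism or its constants: the input index map n = (N2*a + N1*b) mod N and the Chinese remainder theorem output map are assumptions chosen for the example, and the function names are hypothetical.

```python
import cmath
from math import gcd

def dft(x):
    """Direct O(n^2) DFT, used both as the 'small' transform and as reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def good_thomas_dft(x, n1, n2):
    """Length n1*n2 DFT built from length-n1 and length-n2 DFTs, no twiddles."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with n1 and n2 coprime")
    # Input index map: n = (n2*a + n1*b) mod N.
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Small transforms of length n2 along each row.
    rows = [dft(row) for row in grid]
    # Small transforms of length n1 along each column; cols[k2][k1].
    cols = [dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Output index map (CRT): k corresponds to (k mod n1, k mod n2).
    return [cols[k % n2][k % n1] for k in range(n)]
```

Because the two index maps are chosen so that the exponent n*k/N splits into a*k1/N1 + b*k2/N2 modulo 1, no twiddle-factor multiplications are needed between the two stages, which is the main contrast with a Cooley-Tukey decomposition.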
Unitarily inequivalent local and global Fourier transforms in multipartite quantum systems
A multipartite system composed of n subsystems, each of which is described with ‘local variables’ in Z(d) and with a d-dimensional Hilbert space H(d), is considered. Local Fourier transforms in each subsystem are defined and related phase space methods are discussed (displacement operators, Wigner and Weyl functions, etc). A holistic view of the same system, which uses ‘global variables’ in Z(d^n) and a d^n-dimensional Hilbert space H(d^n), might be more appropriate in the case of strong interactions. A global Fourier transform is then defined and related phase space methods are discussed. The local formalism is compared and contrasted with the global formalism. Depending on the values of d and n, the local Fourier transform is unitarily inequivalent or unitarily equivalent to the global Fourier transform. Time evolution of the system in terms of both local and global variables is discussed. The formalism can be useful in the general area of fast Fourier transforms.
Multi-Parameter Support with NTTs for NTRU and NTRU Prime on Cortex-M4
We propose NTT implementations with each supporting at least one parameter of NTRU and one parameter of NTRU Prime. Our implementations are based on size-1440, size-1536, and size-1728 convolutions without algebraic assumptions on the target polynomial rings. We also propose several improvements for the NTT computation. Firstly, we introduce dedicated radix-(2, 3) butterflies combining Good–Thomas FFT and vector-radix FFT. In general, there are six dedicated radix-(2, 3) butterflies and they together support implicit permutations. Secondly, for odd prime radices, we show that the multiplications for one output can be replaced with additions/subtractions. We demonstrate the idea for radix-3 and show how to extend it to any odd prime. Our improvement also applies to radix-(2, 3) butterflies. Thirdly, we implement an incomplete version of Good–Thomas FFT for addressing potential code size issues. For NTRU, our polynomial multiplications outperform the state-of-the-art by 2.8%−10.3%. For NTRU Prime, our polynomial multiplications are slower than the state-of-the-art. However, the SotA exploits the specific structure of coefficient rings or polynomial moduli, while our NTT-based multiplications exploit neither and apply across different schemes. This reduces the engineering effort, including testing and verification
SoK: Polynomial Multiplications for Lattice-Based Cryptosystems
We survey various mathematical tools used in software works multiplying polynomials in . In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2.
There are three emphases in this paper: (i) modular arithmetic, (ii) homomorphisms, and (iii) vectorization. For modular arithmetic, we survey Montgomery, Barrett, and Plantard multiplications. For homomorphisms, we survey (a) various homomorphisms such as Cooley–Tukey FFT, Bruun’s FFT, Rader’s FFT, Karatsuba, and Toom–Cook; (b) various algebraic techniques for adjoining nice properties to the coefficient rings, including injections, Schönhage’s FFT, Nussbaumer’s FFT, and localization; and (c) various algebraic techniques related to the polynomial moduli, including twisting, composed multiplication, evaluation at ∞, Good–Thomas FFT, truncation, incomplete transformation, and Toeplitz matrix-vector product. For vectorization, we survey the relations between homomorphisms and the support of vector arithmetic. We then go through several case studies: We compare the implementations of modular multiplications used in Dilithium and Kyber, explain how the matrix-to-vector structure was exploited in Saber, and review the design choices of transformations for NTRU and NTRU Prime with vectorization. Finally, we outline several interesting implementation projects.
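Of the modular-arithmetic techniques listed in this abstract, Montgomery multiplication is perhaps the easiest to sketch. The example below uses the Kyber modulus q = 3329 with R = 2^16 as illustrative parameters. It is an unsigned textbook variant written for readability, not the signed, constant-time form used in optimized implementations, and the helper names are hypothetical.

```python
Q = 3329                               # Kyber modulus, used as an example
R_BITS = 16
R = 1 << R_BITS                        # Montgomery radix, coprime to Q
Q_NEG_INV = (-pow(Q, -1, R)) % R       # -Q^-1 mod R (Python 3.8+)

def montgomery_reduce(t):
    """Return t * R^-1 mod Q for 0 <= t < Q * R, without dividing by Q."""
    m = (t * Q_NEG_INV) & (R - 1)      # m = t * (-Q^-1) mod R
    u = (t + m * Q) >> R_BITS          # t + m*Q is divisible by R
    return u - Q if u >= Q else u      # u < 2Q, so one subtraction suffices

def to_mont(a):
    """Map a to its Montgomery form a * R mod Q."""
    return (a * R) % Q

def mont_mul(a_m, b_m):
    """Multiply two values in Montgomery form; the result stays in that form."""
    return montgomery_reduce(a_m * b_m)

def from_mont(a_m):
    """Map a value in Montgomery form back to the ordinary residue."""
    return montgomery_reduce(a_m)
```

For example, `from_mont(mont_mul(to_mont(1234), to_mont(2345)))` equals `(1234 * 2345) % 3329`. The conversion cost is usually amortized by keeping precomputed constants such as twiddle factors in Montgomery form.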
Number theoretic techniques applied to algorithms and architectures for digital signal processing
Many of the techniques for the computation of a two-dimensional convolution of a small fixed window with a picture are reviewed. It is demonstrated that Winograd's cyclic convolution and Fourier Transform Algorithms, together with Nussbaumer's two-dimensional cyclic convolution algorithms, have a common general form. Many of these algorithms use the theoretical minimum number of general multiplications. A novel implementation of these algorithms is proposed which is based upon one-bit systolic arrays. These systolic arrays are networks of identical cells, with each cell sharing a common control and timing function. Each cell is only connected to its nearest neighbours. These are all attractive features for implementation using Very Large Scale Integration (VLSI). The throughput rate is only limited by the time to perform a one-bit full addition. In order to assess the usefulness of these systolic arrays, a 'cost function' is developed to compare them with more conventional techniques, such as the Cooley-Tukey radix-2 Fast Fourier Transform (FFT). The cost function shows that these systolic arrays offer a good way of implementing the Discrete Fourier Transform for transforms up to about 30 points in length. The cost function is a general tool and allows comparisons to be made between different implementations of the same algorithm and between dissimilar algorithms. Finally, a technique is developed for the derivation of Discrete Cosine Transform (DCT) algorithms from the Winograd Fourier Transform Algorithm. These DCT algorithms may be implemented by modified versions of the systolic arrays proposed earlier, but requiring half the number of cells.