Composite Cyclotomic Fourier Transforms with Reduced Complexities
Discrete Fourier transforms (DFTs) over finite fields have widespread
applications in digital communication and storage systems. Hence, reducing the
computational complexities of DFTs is of great significance. Recently proposed
cyclotomic fast Fourier transforms (CFFTs) are promising due to their low
multiplicative complexities. Unfortunately, there are two issues with CFFTs:
(1) they rely on efficient short cyclic convolution algorithms, which have not
been investigated thoroughly yet, and (2) they have very high additive
complexities when directly implemented. In this paper, we address both issues.
One of the main contributions of this paper is efficient bilinear 11-point
cyclic convolution algorithms, which allow us to construct CFFTs over
GF. The other main contribution of this paper is that we propose
composite cyclotomic Fourier transforms (CCFTs). In comparison to previously
proposed fast Fourier transforms, our CCFTs achieve lower overall complexities
for moderate to long lengths, and the improvement significantly increases as
the length grows. Our 2047-point and 4095-point CCFTs are also the first efficient
DFTs of such lengths to the best of our knowledge. Finally, our CCFTs are also
advantageous for hardware implementations due to their regular and modular
structure. Comment: submitted to IEEE Trans. on Signal Processing
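Cyclotomic FFTs group the exponents 0..n-1 into cyclotomic cosets under multiplication by 2 modulo n = 2^m - 1, and each coset of size s leads to an s-point cyclic convolution. The following is a minimal sketch of that grouping only, not the authors' algorithm; the function name `cyclotomic_cosets` and the choice m = 11 (matching the 2047-point length mentioned above) are assumptions made for illustration.

```python
def cyclotomic_cosets(m):
    """Partition {0, ..., 2^m - 2} into cyclotomic cosets {s, 2s, 4s, ...} mod 2^m - 1."""
    n = (1 << m) - 1
    seen = set()
    cosets = []
    for s in range(n):
        if s in seen:
            continue
        coset = []
        k = s
        # Repeated doubling modulo n walks through the coset until it cycles back.
        while k not in coset:
            coset.append(k)
            k = (2 * k) % n
        seen.update(coset)
        cosets.append(coset)
    return cosets


if __name__ == "__main__":
    for m in (4, 11):
        sizes = sorted({len(c) for c in cyclotomic_cosets(m)})
        print(m, len(cyclotomic_cosets(m)), sizes)
```

For m = 11 every nonzero coset has size 11, which is consistent with the abstract's emphasis on 11-point cyclic convolutions for this length.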
Reduced-Complexity Decoder of Long Reed-Solomon Codes Based on Composite Cyclotomic Fourier Transforms
Long Reed-Solomon (RS) codes are desirable for digital communication and
storage systems due to their improved error performance, but the high
computational complexity of their decoders is a key obstacle to their adoption
in practice. As discrete Fourier transforms (DFTs) can evaluate a polynomial at
multiple points, efficient DFT algorithms are promising in reducing the
computational complexities of syndrome based decoders for long RS codes. In
this paper, we first propose partial composite cyclotomic Fourier transforms
(CCFTs) and then devise syndrome based decoders for long RS codes over large
finite fields based on partial CCFTs. The new decoders based on partial CCFTs
achieve a significant saving of computational complexities for long RS codes.
Since partial CCFTs have modular and regular structures, the new decoders are
suitable for hardware implementations. To further verify and demonstrate the
advantages of partial CCFTs, we implement in hardware the syndrome computation
block for a shortened RS code over GF. In comparison
to previous results based on Horner's rule, our hardware implementation not
only has a smaller gate count, but also achieves much higher throughputs. Comment: 7 pages, 1 figure
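For orientation, the Horner's-rule baseline that the abstract compares against can be sketched as follows. This is an illustrative toy, not the paper's decoder: the field GF(2^4) with primitive polynomial x^4 + x + 1, the element alpha = 2, and all function names are assumptions chosen to keep the example small. Each syndrome S_j = r(alpha^j) is one component of the DFT of the received word, which is why fast DFT algorithms apply.

```python
PRIM_POLY = 0b10011  # x^4 + x + 1, an assumed primitive polynomial for GF(2^4)


def gf_mul(a, b):
    """Multiply two GF(2^4) elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
        b >>= 1
    return r


def gf_pow(a, e):
    """Raise a to the non-negative integer power e by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def rs_generator(t, alpha=2):
    """Generator polynomial with roots alpha^1..alpha^(2t), coefficients low to high."""
    g = [1]
    for j in range(1, 2 * t + 1):
        root = gf_pow(alpha, j)
        nxt = [0] * (len(g) + 1)
        for i, c in enumerate(g):
            nxt[i + 1] ^= c               # c * x^i * x
            nxt[i] ^= gf_mul(c, root)     # c * x^i * root (minus equals plus in GF(2^m))
        g = nxt
    return g


def syndromes_horner(received, num_syndromes, alpha=2):
    """Evaluate r(alpha^j) for j = 1..num_syndromes using Horner's rule."""
    out = []
    for j in range(1, num_syndromes + 1):
        x = gf_pow(alpha, j)
        acc = 0
        for c in reversed(received):      # highest-degree coefficient first
            acc = gf_mul(acc, x) ^ c
        out.append(acc)
    return out
```

A valid codeword yields all-zero syndromes, and a single error of value e at position i yields S_j = e * alpha^(i*j).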
Faster polynomial multiplication over finite fields
Let p be a prime, and let M_p(n) denote the bit complexity of multiplying two
polynomials in F_p[X] of degree less than n. For n large compared to p, we
establish the bound M_p(n) = O(n log n 8^(log^* n) log p), where log^* is the
iterated logarithm. This is the first known Fürer-type complexity bound for
F_p[X], and improves on the previously best known bound M_p(n) = O(n log n log
log n log p).
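To make the notation concrete, here is a small sketch of the iterated logarithm log^* (taken base 2 here, which is an assumption) together with a schoolbook multiplication in F_p[X] that fixes what "multiplying two polynomials" means. The schoolbook routine is a quadratic-time reference only and is not the algorithm behind the stated bound; both function names are illustrative.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n before the result is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def poly_mul_mod_p(a, b, p):
    """Schoolbook product of two polynomials over F_p (coefficients low to high)."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


if __name__ == "__main__":
    for n in (2, 16, 65536):
        print(n, log_star(n))
    print(poly_mul_mod_p([1, 1], [1, 1], 2))  # (1 + X)^2 over F_2
```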
Faster Amortized FHEW bootstrapping using Ring Automorphisms
Amortized bootstrapping offers a way to simultaneously refresh many ciphertexts of a fully homomorphic encryption scheme, at a total cost comparable to that of refreshing a single ciphertext. An amortization method for FHEW-style cryptosystems was first proposed by Micciancio and Sorrell (ICALP 2018), who showed that the amortized cost of bootstrapping n FHEW-style ciphertexts can be reduced from basic cryptographic operations to just , for any constant . However, despite the promising asymptotic saving, the algorithm was rather impractical due to a large constant (exponential in ) hidden in the asymptotic notation. In this work, we propose an alternative amortized bootstrapping method with much smaller overhead, still achieving asymptotic amortized cost, but with a hidden constant that is only linear in , and with reduced noise growth. This is achieved by following the general strategy of Micciancio and Sorrell, but replacing their use of the Nussbaumer transform with a much more practical Number Theoretic Transform, with multiplication by twiddle factors implemented using ring automorphisms. A key technical ingredient is a new scheme switching technique, proposed in this paper, which may be of independent interest.
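As background for the phrase "ring automorphisms", the sketch below applies the map X -> X^k (k odd) to a plaintext polynomial in Z_q[X]/(X^n + 1) and checks that it respects multiplication. It operates on plain coefficient lists, not ciphertexts, and says nothing about how the paper realizes twiddle-factor multiplication; n is assumed to be a power of two, and the names and the toy modulus q = 17 are illustrative assumptions.

```python
def negacyclic_mul(a, b, q):
    """Schoolbook product in Z_q[X]/(X^n + 1), where n = len(a) = len(b)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q  # X^n = -1
    return out


def apply_automorphism(a, k, q):
    """Map a(X) to a(X^k) in Z_q[X]/(X^n + 1) for odd k."""
    n = len(a)
    if k % 2 == 0:
        raise ValueError("k must be odd")
    out = [0] * n
    for i, c in enumerate(a):
        e = (i * k) % (2 * n)          # X has multiplicative order 2n
        if e < n:
            out[e] = (out[e] + c) % q
        else:
            out[e - n] = (out[e - n] - c) % q  # X^(n + r) = -X^r
    return out


if __name__ == "__main__":
    a, b, q = [1, 2, 3, 4], [5, 6, 7, 8], 17
    lhs = apply_automorphism(negacyclic_mul(a, b, q), 3, q)
    rhs = negacyclic_mul(apply_automorphism(a, 3, q), apply_automorphism(b, 3, q), q)
    print(lhs == rhs)
```

The map only permutes coefficients and flips some signs, so it costs no multiplications on the coefficient vector.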
Compressible FHE with Applications to PIR
Homomorphic encryption (HE) is often viewed as impractical, both in communication and computation. Here we provide an additively homomorphic encryption scheme based on (ring) LWE with nearly optimal rate ( for any ). Moreover, we describe how to compress many FHE ciphertexts that may have come from a homomorphic evaluation (e.g., of the Gentry-Sahai-Waters (GSW) scheme), into fewer high-rate ciphertexts.
Using our high-rate HE scheme, we are able for the first time to describe a single-server private information retrieval (PIR) scheme with sufficiently low computational overhead so as to be practical for large databases. Single-server PIR inherently requires the server to perform at least one bit operation per database bit, and we describe a rate-(4/9) scheme with computation which is not so much worse than this inherent lower bound. In fact it is probably faster than whole-database AES encryption -- specifically under 1.8 mod- multiplication per database byte, where is about 50 to 60 bits.
Asymptotically, the computational overhead of our PIR scheme is \tilde{O}(\log \log \secparam + \log \log \log N), where \secparam is the security parameter and N is the number of database files, which are assumed to be sufficiently large.
Fast norm computation in smooth-degree Abelian number fields
This paper presents a fast method to compute algebraic norms of integral elements of smooth-degree cyclotomic fields, and, more generally, smooth-degree Galois number fields with commutative Galois groups. The typical scenario arising in -unit searches (for, e.g., class-group computation) is computing a -bit norm of an element of weight in a degree- field; this method then uses bit operations.
An operation count was already known in two easier special cases: norms from power-of-2 cyclotomic fields via towers of power-of-2 cyclotomic subfields, and norms from multiquadratic fields via towers of multiquadratic subfields. This paper handles more general Abelian fields by identifying tower-compatible integral bases supporting fast multiplication; in particular, there is a synergy between tower-compatible Gauss-period integral bases and a fast-multiplication idea from Rader.
As a baseline, this paper also analyzes various standard norm-computation techniques that apply to arbitrary number fields, concluding that all of these techniques use at least bit operations in the same scenario, even with fast subroutines for continued fractions and for complex FFTs. Compared to this baseline, algorithms dedicated to smooth-degree Abelian fields find each norm times faster, and finish norm computations inside -unit searches times faster
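The power-of-2 cyclotomic special case mentioned above can be illustrated with a short tower computation: in Z[x]/(x^n + 1), the product f(x) f(-x) is the relative norm down to the subfield generated by x^2, and repeating this halves the degree until an integer remains. The sketch below is an illustration under stated assumptions, not the paper's method: it uses schoolbook multiplication instead of a fast multiplication, and the function names are hypothetical.

```python
def negacyclic_mul(a, b):
    """Schoolbook product in Z[x]/(x^n + 1), where n = len(a) = len(b)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj  # x^n = -1
    return out


def norm_power_of_two_cyclotomic(f):
    """Absolute norm of f in Z[x]/(x^n + 1), n a power of two, via a subfield tower."""
    f = list(f)
    while len(f) > 1:
        # f(-x): negate the odd-index coefficients.
        conj = [c if i % 2 == 0 else -c for i, c in enumerate(f)]
        # Relative norm f(x) * f(-x); its odd-index coefficients vanish.
        g = negacyclic_mul(f, conj)
        # Reinterpret as an element of Z[y]/(y^(n/2) + 1) with y = x^2.
        f = g[0::2]
    return f[0]


if __name__ == "__main__":
    print(norm_power_of_two_cyclotomic([3, 4]))        # 3 + 4i in Z[i]
    print(norm_power_of_two_cyclotomic([1, 1, 0, 0]))  # 1 + zeta_8
```

With n = 2 this reduces to the familiar norm a^2 + b^2 of a Gaussian integer a + bi.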
Homomorphic Evaluation of the AES Circuit
We describe a working implementation of leveled homomorphic encryption (with or without bootstrapping) that can evaluate the AES-128 circuit. This implementation is built on top of the HElib library, whose design was inspired by an early version of the current work. Our main implementation (without bootstrapping) takes about 4 minutes and 3GB of RAM, running on a small laptop, to evaluate an entire AES-128 encryption operation. Using SIMD techniques, we can process up to 120 blocks in each such evaluation, yielding an amortized rate of just over 2 seconds per block.
For cases where further processing is needed after the AES computation, we describe a different setting that uses bootstrapping. We describe an implementation that lets us process 180 blocks in just over 18 minutes using 3.7GB of RAM on the same laptop, yielding an amortized rate of 6 seconds per block. We note that a somewhat better amortized per-block cost can be obtained using byte-slicing (and maybe also bit-slicing) implementations, at the cost of significantly slower wall-clock time for a single evaluation.
Type-II/III DCT/DST algorithms with reduced number of arithmetic operations
We present algorithms for the discrete cosine transform (DCT) and discrete
sine transform (DST), of types II and III, that achieve a lower count of real
multiplications and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation count is reduced
from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N.
Furthermore, we show that a further N multiplications may be saved by a certain
rescaling of the inputs or outputs, generalizing a well-known technique for N=8
by Arai et al. These results are derived by considering the DCT to be a special
case of a DFT of length 4N, with certain symmetries, and then pruning redundant
operations from a recent improved fast Fourier transform algorithm (based on a
recursive rescaling of the conjugate-pair split radix algorithm). The improved
algorithms for DCT-III, DST-II, and DST-III follow immediately from the
improved count for the DCT-II. Comment: 9 pages
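The statement that the DCT is a special case of a length-4N DFT with symmetries can be checked numerically. The sketch below places the N inputs at the odd positions of a length-4N real even sequence and reads the DCT-II off the first N DFT outputs. It uses a naive O(N^2) DFT in place of the split-radix FFT discussed in the abstract, assumes the unnormalized convention X_k = sum_n x_n cos(pi (n + 1/2) k / N), and all function names are illustrative.

```python
import cmath
import math


def naive_dft(y):
    """Direct O(M^2) DFT, used here only as a stand-in for a fast algorithm."""
    m = len(y)
    return [
        sum(y[t] * cmath.exp(-2j * cmath.pi * t * k / m) for t in range(m))
        for k in range(m)
    ]


def dct2_direct(x):
    """Unnormalized DCT-II straight from the definition."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
        for k in range(n)
    ]


def dct2_via_dft(x):
    """DCT-II obtained from a length-4N DFT of a symmetric, zero-interleaved input."""
    n = len(x)
    y = [0.0] * (4 * n)
    for j, v in enumerate(x):
        y[2 * j + 1] = v              # odd positions 1, 3, ..., 2N-1
        y[4 * n - 2 * j - 1] = v      # mirrored copies make the sequence even
    spectrum = naive_dft(y)
    # Each pair of mirrored samples contributes 2*cos(...), hence the division by 2.
    return [spectrum[k].real / 2 for k in range(n)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(max(abs(a - b) for a, b in zip(dct2_direct(x), dct2_via_dft(x))))
```

A fast algorithm would exploit the zeros and the symmetry of the padded input instead of computing all 4N outputs, which is the pruning step the abstract refers to.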