32 research outputs found

    Composite Cyclotomic Fourier Transforms with Reduced Complexities

    Full text link
    Discrete Fourier transforms~(DFTs) over finite fields have widespread applications in digital communication and storage systems. Hence, reducing the computational complexities of DFTs is of great significance. Recently proposed cyclotomic fast Fourier transforms (CFFTs) are promising due to their low multiplicative complexities. Unfortunately, there are two issues with CFFTs: (1) they rely on efficient short cyclic convolution algorithms, which has not been investigated thoroughly yet, and (2) they have very high additive complexities when directly implemented. In this paper, we address both issues. One of the main contributions of this paper is efficient bilinear 11-point cyclic convolution algorithms, which allow us to construct CFFTs over GF(211)(2^{11}). The other main contribution of this paper is that we propose composite cyclotomic Fourier transforms (CCFTs). In comparison to previously proposed fast Fourier transforms, our CCFTs achieve lower overall complexities for moderate to long lengths, and the improvement significantly increases as the length grows. Our 2047-point and 4095-point CCFTs are also first efficient DFTs of such lengths to the best of our knowledge. Finally, our CCFTs are also advantageous for hardware implementations due to their regular and modular structure.Comment: submitted to IEEE trans on Signal Processin

    Reduced-Complexity Decoder of Long Reed-Solomon Codes Based on Composite Cyclotomic Fourier Transforms

    Full text link
    Long Reed-Solomon (RS) codes are desirable for digital communication and storage systems due to their improved error performance, but the high computational complexity of their decoders is a key obstacle to their adoption in practice. As discrete Fourier transforms (DFTs) can evaluate a polynomial at multiple points, efficient DFT algorithms are promising in reducing the computational complexities of syndrome based decoders for long RS codes. In this paper, we first propose partial composite cyclotomic Fourier transforms (CCFTs) and then devise syndrome based decoders for long RS codes over large finite fields based on partial CCFTs. The new decoders based on partial CCFTs achieve a significant saving of computational complexities for long RS codes. Since partial CCFTs have modular and regular structures, the new decoders are suitable for hardware implementations. To further verify and demonstrate the advantages of partial CCFTs, we implement in hardware the syndrome computation block for a (2720,2550)(2720, 2550) shortened RS code over GF(212)(2^{12}). In comparison to previous results based on Horner's rule, our hardware implementation not only has a smaller gate count, but also achieves much higher throughputs.Comment: 7 pages, 1 figur

    Faster polynomial multiplication over finite fields

    Full text link
    Let p be a prime, and let M_p(n) denote the bit complexity of multiplying two polynomials in F_p[X] of degree less than n. For n large compared to p, we establish the bound M_p(n) = O(n log n 8^(log^* n) log p), where log^* is the iterated logarithm. This is the first known F\"urer-type complexity bound for F_p[X], and improves on the previously best known bound M_p(n) = O(n log n log log n log p)

    Faster Amortized FHEW bootstrapping using Ring Automorphisms

    Get PDF
    Amortized bootstrapping offers a way to simultaneously refresh many ciphertexts of a fully homomorphic encryption scheme, at a total cost comparable to that of refreshing a single ciphertext. An amortization method for FHEW-style cryptosystems was first proposed by (Micciancio and Sorrell, ICALP 2018), who showed that the amortized cost of bootstrapping n FHEW-style ciphertexts can be reduced from O(n)O(n) basic cryptographic operations to just O(nϵ)O(n^{\epsilon}), for any constant ϵ>0\epsilon>0. However, despite the promising asymptotic saving, the algorithm was rather inpractical due to a large constant (exponential in 1/ϵ1/\epsilon) hidden in the asymptotic notation. In this work, we propose an alternative amortized boostrapping method with much smaller overhead, still achieving O(nϵ)O(n^\epsilon) asymptotic amortized cost, but with a hidden constant that is only linear in 1/ϵ1/\epsilon, and with reduced noise growth. This is achieved following the general strategy of (Micciancio and Sorrell), but replacing their use of the Nussbaumer transform, with a much more practical Number Theoretic Transform, with multiplication by twiddle factors implemented using ring automorphisms. A key technical ingredient to do this is a new scheme switching technique proposed in this paper which may be of independent interest

    Compressible FHE with Applications to PIR

    Get PDF
    Homomorphic encryption (HE) is often viewed as impractical, both in communication and computation. Here we provide an additively homomorphic encryption scheme based on (ring) LWE with nearly optimal rate (1ϵ1-\epsilon for any ϵ>0\epsilon>0). Moreover, we describe how to compress many FHE ciphertexts that may have come from a homomorphic evaluation (e.g., of the Gentry-Sahai-Waters (GSW) scheme), into fewer high-rate ciphertexts. Using our high-rate HE scheme, we are able for the first time to describe a single-server private information retrieval (PIR) scheme with sufficiently low computational overhead so as to be practical for large databases. Single-server PIR inherently requires the server to perform at least one bit operation per database bit, and we describe a rate-(4/9) scheme with computation which is not so much worse than this inherent lower bound. In fact it is probably faster than whole-database AES encryption -- specifically under 1.8 mod-qq multiplication per database byte, where qq is about 50 to 60 bits. Asymptotically, the computational overhead of our PIR scheme is \tilde{O}(\log \log \secparam + \log \log \log N), where \secparam is the security parameter and NN is the number of database files, which are assumed to be sufficiently large

    Fast norm computation in smooth-degree Abelian number fields

    Get PDF
    This paper presents a fast method to compute algebraic norms of integral elements of smooth-degree cyclotomic fields, and, more generally, smooth-degree Galois number fields with commutative Galois groups. The typical scenario arising in SS-unit searches (for, e.g., class-group computation) is computing a Θ(nlogn)\Theta(n\log n)-bit norm of an element of weight n1/2+o(1)n^{1/2+o(1)} in a degree-nn field; this method then uses n(logn)3+o(1)n(\log n)^{3+o(1)} bit operations. An n(logn)O(1)n(\log n)^{O(1)} operation count was already known in two easier special cases: norms from power-of-2 cyclotomic fields via towers of power-of-2 cyclotomic subfields, and norms from multiquadratic fields via towers of multiquadratic subfields. This paper handles more general Abelian fields by identifying tower-compatible integral bases supporting fast multiplication; in particular, there is a synergy between tower-compatible Gauss-period integral bases and a fast-multiplication idea from Rader. As a baseline, this paper also analyzes various standard norm-computation techniques that apply to arbitrary number fields, concluding that all of these techniques use at least n2(logn)2+o(1)n^2(\log n)^{2+o(1)} bit operations in the same scenario, even with fast subroutines for continued fractions and for complex FFTs. Compared to this baseline, algorithms dedicated to smooth-degree Abelian fields find each norm n/(logn)1+o(1)n/(\log n)^{1+o(1)} times faster, and finish norm computations inside SS-unit searches n2/(logn)1+o(1)n^2/(\log n)^{1+o(1)} times faster

    Homomorphic Evaluation of the AES Circuit

    Get PDF
    We describe a working implementation of leveled homomorphic encryption (with or without bootstrapping) that can evaluate the AES-128 circuit. This implementation is built on top of the HElib library, whose design was inspired by an early version of the current work. Our main implementation (without bootstrapping) takes about 4 minutes and 3GB of RAM, running on a small laptop, to evaluate an entire AES-128 encryption operation. Using SIMD techniques, we can process upto 120 blocks in each such evaluation, yielding an amortized rate of just over 2 seconds per block. For cases where further processing is needed after the AES computation, we describe a different setting that uses bootstrapping. We describe an implementation that lets us process 180 blocks in just over 18 minutes using 3.7GB of RAM on the same laptop, yielding amortized 6 seconds/block. We note that somewhat better amortized per-block cost can be obtained using byte-slicing (and maybe also bit-slicing ) implementations, at the cost of significantly slower wall-clock time for a single evaluation

    Type-II/III DCT/DST algorithms with reduced number of arithmetic operations

    Full text link
    We present algorithms for the discrete cosine transform (DCT) and discrete sine transform (DST), of types II and III, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~ 2N log_2 N to ~ (17/9) N log_2 N for a power-of-two transform size N. Furthermore, we show that a further N multiplications may be saved by a certain rescaling of the inputs or outputs, generalizing a well-known technique for N=8 by Arai et al. These results are derived by considering the DCT to be a special case of a DFT of length 4N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DCT-III, DST-II, and DST-III follow immediately from the improved count for the DCT-II.Comment: 9 page
    corecore