27,012 research outputs found

    Analysis of Parallel Montgomery Multiplication in CUDA

    Get PDF
    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared

    Composite Cyclotomic Fourier Transforms with Reduced Complexities

    Full text link
    Discrete Fourier transforms~(DFTs) over finite fields have widespread applications in digital communication and storage systems. Hence, reducing the computational complexities of DFTs is of great significance. Recently proposed cyclotomic fast Fourier transforms (CFFTs) are promising due to their low multiplicative complexities. Unfortunately, there are two issues with CFFTs: (1) they rely on efficient short cyclic convolution algorithms, which has not been investigated thoroughly yet, and (2) they have very high additive complexities when directly implemented. In this paper, we address both issues. One of the main contributions of this paper is efficient bilinear 11-point cyclic convolution algorithms, which allow us to construct CFFTs over GF(211)(2^{11}). The other main contribution of this paper is that we propose composite cyclotomic Fourier transforms (CCFTs). In comparison to previously proposed fast Fourier transforms, our CCFTs achieve lower overall complexities for moderate to long lengths, and the improvement significantly increases as the length grows. Our 2047-point and 4095-point CCFTs are also first efficient DFTs of such lengths to the best of our knowledge. Finally, our CCFTs are also advantageous for hardware implementations due to their regular and modular structure.Comment: submitted to IEEE trans on Signal Processin
    • …
    corecore