122 research outputs found

    Analysis of Parallel Montgomery Multiplication in CUDA

    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in performance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared.
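
A minimal sketch of Montgomery modular multiplication, the operation named in the title above, may help make the abstract concrete. It is written in plain Python rather than CUDA and is not taken from the project itself; the function names `montgomery_setup`, `redc`, and `mont_mul` and the parameter `k` (with R = 2^k) are assumptions made for illustration.

```python
def montgomery_setup(n, k):
    """Return (R, n_prime) for an odd modulus n with R = 2**k > n.

    n_prime satisfies n * n_prime == -1 (mod R).
    Requires Python 3.8+ for pow() with a negative exponent.
    """
    assert n % 2 == 1 and (1 << k) > n
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # single conditional subtraction


def mont_mul(a, b, n, k):
    """Compute a * b mod n by way of the Montgomery representation."""
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n                 # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # product, still in Montgomery form
    return redc(prod_bar, n, k, n_prime)           # convert back to an ordinary residue
```

Converting into and out of Montgomery form costs more than one direct modular multiplication, so the technique pays off only when many multiplications share the same modulus, as in exponentiation or elliptic curve point arithmetic.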

    Low-Latency Elliptic Curve Scalar Multiplication

    This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9 ms on the NVIDIA GTX 500 architecture family. The presented methods and implementation considerations can be applied to any parallel 32-bit architecture.

    Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

    Asymmetric cryptographic algorithm (e.g., RSA and Elliptic Curve Cryptography) implementations on Graphics Processing Units (GPUs) have been researched for over a decade. The basic idea of most previous contributions is to exploit the highly parallel GPU architecture by porting the integer-based algorithms from general-purpose CPUs to GPUs to offer high performance. However, the great potential cryptographic computing power of GPUs, especially that of the more powerful floating-point instructions, has not been comprehensively investigated. In this paper, we fully exploit the floating-point computing power of GPUs through various designs, including a floating-point-based Montgomery multiplication/exponentiation algorithm and a Chinese Remainder Theorem (CRT) implementation on the GPU. For practical use of the proposed algorithm, a new method converts the input/output between octet strings and floating-point numbers, fully utilizing GPUs and further improving the overall performance by about 5%. The performance of RSA-2048/3072/4096 decryption on NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which is 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption exceeds the existing fastest integer-based result by 23%.
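
The CRT step mentioned in this abstract can be sketched with ordinary integers. The following is a textbook RSA-CRT decryption in Python, not the paper's floating-point GPU design; the function name `rsa_crt_decrypt` and the toy primes in the usage note are assumptions chosen for illustration.

```python
def rsa_crt_decrypt(c, p, q, d):
    """Decrypt ciphertext c with private exponent d using the CRT.

    p and q are the distinct prime factors of the modulus n = p * q.
    Requires Python 3.8+ for pow() with a negative exponent.
    """
    dp = d % (p - 1)                 # exponent reduced for the mod-p half
    dq = d % (q - 1)                 # exponent reduced for the mod-q half
    q_inv = pow(q, -1, p)            # q^(-1) mod p, precomputable per key
    m1 = pow(c, dp, p)               # half-size exponentiation mod p
    m2 = pow(c, dq, q)               # half-size exponentiation mod q
    h = (q_inv * (m1 - m2)) % p      # Garner recombination
    return m2 + h * q
```

With the toy key p = 61, q = 53, e = 17, d = 2753, encrypting a message m as `pow(m, 17, 61 * 53)` and passing the result to `rsa_crt_decrypt` returns m. The two half-size exponentiations are independent of each other, which is one reason the CRT form suits parallel hardware.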

    GPU and ASIC Acceleration of Elliptic Curve Scalar Point Multiplication

    As public information is increasingly communicated across public networks such as the internet, the use of public key cryptography to provide security services such as authentication, data integrity, and non-repudiation is ever-growing. Elliptic curve cryptography is being used now more than ever to fulfill the need for public key cryptography, as it provides security equivalent in strength to the entrenched RSA cryptography algorithm, but with much smaller key sizes and reduced computational cost. All elliptic curve cryptography operations rely on elliptic curve scalar point multiplication. In turn, scalar point multiplication depends heavily on finite field multiplication. In this dissertation, two major approaches are taken to accelerate the performance of scalar point multiplication. First, a series of very high performance finite field multiplier architectures have been implemented using domino logic in a CMOS process. Simulation results show that the proposed implementations are more efficient than similar designs in the literature when considering area and delay as performance metrics. The proposed implementations are suitable for integration with a CPU in order to provide a special-purpose finite field multiplication instruction useful for accelerating scalar point multiplication. The next major part of this thesis focuses on the use of consumer computer graphics cards to directly accelerate scalar point multiplication. A number of finite field multiplication algorithms suitable for graphics cards are developed, along with algorithms for finite field addition, subtraction, squaring, and inversion. The proposed graphics-card finite field arithmetic library is used to accelerate elliptic curve scalar point multiplication. The operation throughput and latency performance of the proposed implementation is characterized by a series of tests, and results are compared to the state of the art. 
Finally, it is shown that graphics cards can be used to significantly increase the operation throughput of scalar point multiplication operations, which makes their use viable for improving elliptic curve cryptography performance in a high-demand server environment
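
To illustrate how scalar point multiplication rests on finite field arithmetic, here is a small double-and-add sketch in Python. It uses affine coordinates on the textbook toy curve y^2 = x^3 + 2x + 2 over GF(17), which is an assumption chosen for readability and is far too small for real use; it is not the dissertation's GPU library, and it is not constant-time.

```python
P = 17  # field prime of the toy curve (illustrative only)
A = 2   # curve coefficient a in y^2 = x^3 + a*x + b


def ec_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                  # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P          # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point by right-to-left double-and-add."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = ec_add(result, addend)   # add when the current bit is set
        addend = ec_add(addend, addend)       # double for the next bit
        k >>= 1
    return result
```

Every call to `ec_add` costs one field inversion and a few field multiplications, which is why the speed of scalar multiplication tracks the speed of the underlying field arithmetic so closely.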

    On the Analysis of Public-Key Cryptologic Algorithms

    The RSA cryptosystem introduced in 1977 by Ron Rivest, Adi Shamir and Len Adleman is the most commonly deployed public-key cryptosystem. Elliptic curve cryptography (ECC), introduced in the mid-1980s by Neal Koblitz and Victor Miller, is becoming an increasingly popular alternative to RSA, offering competitive performance due to the use of smaller key sizes. Most recently hyperelliptic curve cryptography (HECC) has been demonstrated to have comparable and in some cases better performance than ECC. The security of RSA relies on the integer factorization problem whereas the security of (H)ECC is based on the (hyper)elliptic curve discrete logarithm problem ((H)ECDLP). In this thesis the practical performance of the best methods to solve these problems is analyzed and a method to generate secure ephemeral ECC parameters is presented. The best publicly known algorithm to solve the integer factorization problem is the number field sieve (NFS). Its most time consuming step is the relation collection step. We investigate the use of graphics processing units (GPUs) as accelerators for this step. In this context, methods to efficiently implement modular arithmetic and several factoring algorithms on GPUs are presented and their performance is analyzed in practice. In conclusion, it is shown that integrating state-of-the-art NFS software packages with our GPU software can lead to a speed-up of 50%. In the case of elliptic and hyperelliptic curves for cryptographic use, the best published method to solve the (H)ECDLP is the Pollard rho algorithm. This method can be made faster using classes of equivalence induced by curve automorphisms like the negation map. We present a practical analysis of their use to speed up Pollard rho for elliptic curves and genus 2 hyperelliptic curves defined over prime fields. As a case study, 4 curves at the 128-bit theoretical security level are analyzed in our software framework for Pollard rho to estimate their practical security level.
In addition, we present a novel many-core architecture to solve the ECDLP using the Pollard rho algorithm with the negation map on FPGAs. This architecture is used to estimate the cost of solving the Certicom ECCp-131 challenge with a cluster of FPGAs. Our design achieves a speed-up factor of about 4 compared to the state-of-the-art. Finally, we present an efficient method to generate unique, secure and unpredictable ephemeral ECC parameters to be shared by a pair of authenticated users for a single communication. It provides an alternative to the customary use of fixed ECC parameters obtained from publicly available standards designed by untrusted third parties. The effectiveness of our method is demonstrated with a portable implementation for regular PCs and Android smartphones. On a Samsung Galaxy S4 smartphone our implementation generates unique 128-bit secure ECC parameters in 50 milliseconds on average

    Utilizing graphics processing units in cryptographic applications.

    Fleissner Sebastian. Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 91-95). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1: Introduction
        1.1 The Legend of Hercules
        1.2 Background
        1.3 Research Purpose
        1.4 Research Overview
        1.5 Thesis Organization
    Chapter 2: Background and Definitions
        2.1 General Purpose GPU Computing
            2.1.1 Four Generations of GPU Hardware
            2.1.2 GPU Architecture & Terms
            2.1.3 General Purpose GPU Programming
            2.1.4 Shader Programming Languages
        2.2 Cryptography Overview
            2.2.1 "Alice, Bob, and Friends"
            2.2.2 Cryptographic Hash Functions
            2.2.3 Secret Key Ciphers
            2.2.4 Public Key Encryption
            2.2.5 Digital Signatures
        2.3 The Montgomery Method
            2.3.1 Pre-computation Step
            2.3.2 Obtaining the Montgomery Representation
            2.3.3 Calculating the Montgomery Product(s)
            2.3.4 Calculating final result
            2.3.5 The Montgomery Exponentiation Algorithm
        2.4 Elliptic Curve Cryptography
            2.4.1 Introduction
            2.4.2 Recommended Elliptic Curves
            2.4.3 Coordinate Systems
            2.4.4 Point Doubling
            2.4.5 Point Addition
            2.4.6 Double and Add
            2.4.7 Elliptic Curve Encryption
        2.5 Related Research
            2.5.1 Secret Key Cryptography on GPUs
            2.5.2 Remotely Keyed Cryptographics
    Chapter 3: Proposed Algorithms
        3.1 Introduction
        3.2 Chapter Organization
        3.3 Algorithm Design Issues
            3.3.1 Arithmetic Density and GPU Memory Access
            3.3.2 Encoding Large Integers with Floating Point Numbers
        3.4 GPU Montgomery Algorithms
            3.4.1 Introduction
            3.4.2 GPU-FlexM-Prod Specification
            3.4.3 GPU-FlexM-Mul Specification
            3.4.4 GPU-FlexM-Exp Specification
            3.4.5 GPU-FixM-Prod Specification
            3.4.6 GPU-FixM-Mul Specification
            3.4.7 GPU-FixM-Exp Specification
        3.5 GPU Elliptic Curve Algorithms
            3.5.1 GPU-EC-Double Specification
            3.5.2 GPU-EC-Add Specification
            3.5.3 GPU-EC-DoubleAdd Specification
    Chapter 4: Analysis of Proposed Algorithms
        4.1 Performance Analysis
            4.1.1 GPU-FlexM Algorithms
            4.1.2 GPU-FixM Algorithms
            4.1.3 GPU-EC Algorithms
            4.1.4 Summary
        4.2 Usability of Proposed Algorithms
            4.2.1 Signcryption
            4.2.2 Pure Asymmetric Encryption and Decryption
            4.2.3 Simultaneous Signing of Multiple Messages
            4.2.4 Relieving the Main Processor
    Chapter 5: Conclusions
        5.1 Research Results
        5.2 Future Research
    Bibliography

    On the Cryptanalysis of Public-Key Cryptography

    Nowadays, the most popular public-key cryptosystems are based on either the integer factorization or the discrete logarithm problem. The feasibility of solving these mathematical problems in practice is studied and techniques are presented to speed up the underlying arithmetic on parallel architectures. The fastest known approach to solve the discrete logarithm problem in groups of elliptic curves over finite fields is the Pollard rho method. The negation map can be used to speed up this calculation by a factor √2. It is well known that the random walks used by Pollard rho when combined with the negation map get trapped in fruitless cycles. We show that previously published approaches to deal with this problem are plagued by recurring cycles, and we propose effective alternative countermeasures. Furthermore, fast modular arithmetic is introduced which can take advantage of prime moduli of a special form using efficient "sloppy reduction." The effectiveness of these techniques is demonstrated by solving a 112-bit elliptic curve discrete logarithm problem using a cluster of PlayStation 3 game consoles: breaking a public-key standard and setting a new world record. The elliptic curve method (ECM) for integer factorization is the asymptotically fastest method to find relatively small factors of large integers. From a cryptanalytic point of view the performance of ECM gives information about secure parameter choices of some cryptographic protocols. We optimize ECM by proposing carry-free arithmetic modulo Mersenne numbers (numbers of the form 2^M - 1) especially suitable for parallel architectures. Our implementation of these techniques on a cluster of PlayStation 3 game consoles set a new record by finding a 241-bit prime factor of 2^1181 - 1. A normal form for elliptic curves introduced by Edwards results in the fastest elliptic curve arithmetic in practice.
Techniques to reduce the temporary storage and enhance the performance even further in the setting of ECM are presented. Our results enable one to run ECM efficiently on resource-constrained platforms such as graphics processing units.
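
The appeal of Mersenne moduli can be shown with a short sketch: because 2^M is congruent to 1 modulo 2^M - 1, reduction needs only shifts, masks, and additions. The Python function below illustrates that folding idea only; it is not the carry-free representation the abstract proposes, and the name `mod_mersenne` is an assumption made for illustration.

```python
def mod_mersenne(x, m):
    """Reduce a non-negative integer x modulo 2**m - 1 without division."""
    p = (1 << m) - 1
    while x > p:
        # Split x into low m bits and the remaining high bits.
        # Since 2**m == 1 (mod p), x == low + high (mod p).
        x = (x & p) + (x >> m)
    return 0 if x == p else x   # p itself is congruent to 0
```

For a product of two residues below 2^M - 1, the loop typically finishes in one or two iterations, so a modular multiplication costs little more than the integer multiplication itself.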