132 research outputs found

    Analysis of Parallel Montgomery Multiplication in CUDA

    For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public-key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in performance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication is compared with that of a GPU-based implementation in CUDA.
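
    As a concrete illustration (not code from the project above), the sketch below shows a single-limb Montgomery (REDC) multiplication run one-per-thread in CUDA, the "one independent multiplication per thread" layout such a comparison might use. It assumes a 64-bit odd modulus n < 2^63 with n_inv = -n^{-1} mod 2^64 precomputed on the host; real ECC moduli are 256 bits or more and require multi-limb arithmetic. The names (montmul64, montmul_batch) are hypothetical.

```cpp
#include <cstdint>

// Montgomery (REDC) product: returns a * b * 2^-64 mod n.
// Assumes n is odd, n < 2^63, and n_inv = -n^{-1} mod 2^64.
__device__ uint64_t montmul64(uint64_t a, uint64_t b, uint64_t n, uint64_t n_inv)
{
    uint64_t t_lo  = a * b;                      // low 64 bits of a*b
    uint64_t t_hi  = __umul64hi(a, b);           // high 64 bits of a*b
    uint64_t m     = t_lo * n_inv;               // m = t_lo * n' mod 2^64
    uint64_t mn_lo = m * n;
    uint64_t mn_hi = __umul64hi(m, n);
    // t + m*n is divisible by 2^64; keep only the high half (plus the carry
    // out of the low half), which equals (t + m*n) / 2^64 < 2n.
    uint64_t carry = (t_lo + mn_lo < t_lo) ? 1 : 0;
    uint64_t r = t_hi + mn_hi + carry;
    return (r >= n) ? r - n : r;                 // conditional subtraction
}

// One thread computes one independent Montgomery product of a batch.
__global__ void montmul_batch(const uint64_t* a, const uint64_t* b, uint64_t* out,
                              uint64_t n, uint64_t n_inv, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        out[i] = montmul64(a[i], b[i], n, n_inv);
}
```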

    HLS-Based Methodology for Fast Iterative Development Applied to Elliptic Curve Arithmetic

    High-Level Synthesis (HLS) is used by hardware developers to achieve a higher level of abstraction in circuit descriptions. In order to shorten hardware development time via HLS, we present an adaptation of the Iterative and Incremental Design (IID) methodology frequently used in software development. In particular, our methodology is relevant for developing applications of unusual complexity: it was applied here to large modular arithmetic, commonly used in cryptography applications (e.g., Elliptic Curves). Rapid feedback on circuit characteristics is used to evaluate deep architectural changes in a short time, greatly reducing the time-to-market with respect to hand-made designs. In addition, our approach is highly flexible, since the same generic high-level description can be used to produce an entire set of circuits, each with different area/performance trade-offs. Thanks to the proposed approach, any change to the initial specification (e.g., the curve used) is also very fast, whereas it may require a large effort in the case of hand-made designs.
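
    The following sketch is only an illustration of the kind of generic description the abstract alludes to, not the authors' actual code: a width-generic modular addition written in plain C++ (the input style accepted by HLS tools such as Vitis HLS), where the limb count is a template parameter and the loop unroll factor is the knob that trades area for latency. The names (mod_add, LIMBS) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Modular addition r = (a + b) mod n over LIMBS x 32-bit little-endian words.
// One instantiation per field size; in an HLS flow each instantiation (and each
// unroll factor, e.g. "#pragma HLS unroll factor=F" on the loops) yields a
// different circuit from the same source.
template <std::size_t LIMBS>
void mod_add(const uint32_t a[LIMBS], const uint32_t b[LIMBS],
             const uint32_t n[LIMBS], uint32_t r[LIMBS])
{
    uint32_t sum[LIMBS], diff[LIMBS];
    uint64_t carry = 0, borrow = 0;

    for (std::size_t i = 0; i < LIMBS; ++i) {             // a + b with carry chain
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;
        carry  = t >> 32;
    }
    for (std::size_t i = 0; i < LIMBS; ++i) {             // tentative (a + b) - n
        uint64_t t = (uint64_t)sum[i] - n[i] - borrow;
        diff[i] = (uint32_t)t;
        borrow  = (t >> 63) & 1;                          // 1 if the limb wrapped
    }
    bool reduced = (carry != 0) || (borrow == 0);         // was a + b >= n ?
    for (std::size_t i = 0; i < LIMBS; ++i)
        r[i] = reduced ? diff[i] : sum[i];
}

// e.g. mod_add<8> for a 256-bit field, mod_add<12> for a 384-bit field.
```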

    A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

    Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics, MP is used in computational number theory, geometric computation, experimental mathematics, and in some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic. The focus of our research is to build and analyze highly optimized libraries that allow the MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order of magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operations per second per dollar. What we find is that the SIMD design and the balance of compute, cache, and bandwidth resources on the GPU are quite different from the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations mean that an approach that works well on one generation might not run well on the next. Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high-performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high-precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs.
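
    As a minimal sketch of the instance-parallel layout such GPU libraries often start from (illustrative only, not the thesis's library, and with hypothetical names), the kernel below performs batched large-integer addition: each thread adds one pair of 256-bit operands stored as eight 32-bit limbs and writes out the final carry. Production designs move on to limb-per-thread and warp-synchronous carry schemes to keep the SIMD units and memory system busy.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int LIMBS = 8;   // 256-bit operands, little-endian limb order

// One thread adds one pair of LIMBS x 32-bit integers from a flat batch.
__global__ void mp_add_batch(const uint32_t* a, const uint32_t* b,
                             uint32_t* r, uint32_t* carry_out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;

    const uint32_t* ai = a + (std::size_t)i * LIMBS;
    const uint32_t* bi = b + (std::size_t)i * LIMBS;
    uint32_t*       ri = r + (std::size_t)i * LIMBS;

    uint64_t carry = 0;
    for (int j = 0; j < LIMBS; ++j) {
        uint64_t t = (uint64_t)ai[j] + bi[j] + carry;   // limb add with carry
        ri[j] = (uint32_t)t;
        carry = t >> 32;
    }
    carry_out[i] = (uint32_t)carry;                     // 0 or 1
}
```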