Search CORE

18 research outputs found

Analysis of Parallel Montgomery Multiplication in CUDA

Author: Liu Yuheng
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2013

For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in performance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication is compared with that of a GPU-based implementation in CUDA.

SJSU ScholarWorks
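The Montgomery multiplication step discussed in this abstract can be illustrated with a minimal serial Python sketch of textbook Montgomery reduction (REDC). It assumes an odd modulus n and R = 2^k > n; the function names and the toy modulus 101 are hypothetical choices for illustration, and this is not the CPU or CUDA code evaluated in the project.

```python
def montgomery_setup(n, k):
    """Return R = 2**k and n_prime with n * n_prime == -1 (mod R).

    Assumes n is odd and R > n.
    """
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # pow(x, -1, m) needs Python 3.8+
    return r, n_prime


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R**-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R, so u < 2n
    return u - n if u >= n else u


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values already in Montgomery form (x * R mod n)."""
    return redc(a_bar * b_bar, n, k, n_prime)


# Toy example: n = 101, R = 2**7 = 128.
n, k = 101, 7
r, n_prime = montgomery_setup(n, k)
a, b = 42, 77
a_bar, b_bar = (a * r) % n, (b * r) % n  # convert into Montgomery form
c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
c = redc(c_bar, n, k, n_prime)           # convert back out of Montgomery form
print(c, (a * b) % n)  # prints: 2 2
```

The point of the representation is that `redc` only needs masking, multiplication, and a shift by k bits instead of a division by n; the conversions into and out of Montgomery form are paid once per operand.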

Low-Latency Elliptic Curve Scalar Multiplication

Author: Bos Joppe
Publication date: 18/06/2018

This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9 ms on the NVIDIA GTX 500 architecture family. The presented methods and implementation considerations can be applied to any parallel 32-bit architecture.

RERO DOC Digital Library

Supporting Preemptive Task Executions and Memory Copies in GPGPUs

Author: Basaran Can
Kang Kyoung-Don
Publication venue: The Open Repository @ Binghamton (The ORB)
Publication date: 01/07/2012

GPGPUs (General-Purpose Graphics Processing Units) provide massive computational power. However, applying GPGPU technology to real-time computing is challenging because GPGPUs are non-preemptive. In particular, neither a job running on a GPGPU nor a data copy between a GPGPU and the CPU can be preempted. As a result, a high-priority job that arrives in the middle of a low-priority job execution or memory copy suffers from priority inversion. To address this problem, we present a new lightweight approach to supporting preemptive memory copies and job executions in GPGPUs. Moreover, in our approach, a GPGPU job and a memory copy between the GPGPU and the host CPU run concurrently to enhance responsiveness. To show the feasibility of our approach, we have implemented a prototype system for preemptive job executions and data copies in a GPGPU. The experimental results show that our approach can reliably bound response times. In addition, its response time is significantly shorter than those of both the unmodified GPGPU runtime system, which supports no preemption, and an advanced GPGPU model designed to support prioritization and performance isolation via preemptive data copies.

The Open Repository @Binghamton (The ORB)

Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

Author: Fangyu Zheng
Jiankuo Dong
Jingqiang Lin
Jiwu Jing
Wuqiong Pan
Yuan Zhao
Publication venue: Hindawi Limited
Publication date: 01/01/2017

Implementations of asymmetric cryptographic algorithms (e.g., RSA and Elliptic Curve Cryptography) on Graphics Processing Units (GPUs) have been researched for over a decade. The basic idea of most previous contributions is to exploit the highly parallel GPU architecture by porting integer-based algorithms from general-purpose CPUs to GPUs to offer high performance. However, the considerable cryptographic computing potential of GPUs, especially of their more powerful floating-point instructions, has in fact not been comprehensively investigated. In this paper, we fully exploit the floating-point computing power of GPUs through various designs, including a floating-point-based Montgomery multiplication/exponentiation algorithm and a Chinese Remainder Theorem (CRT) implementation on the GPU. For practical use of the proposed algorithm, we also introduce a new method that converts the input/output between octet strings and floating-point numbers, fully utilizing GPUs and further improving overall performance by about 5%. The performance of RSA-2048/3072/4096 decryption on an NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which is 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption outperforms the existing fastest integer-based result by 23%.

Crossref

Directory of Open Access Journals
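The CRT decryption mentioned in this abstract splits one full-size RSA exponentiation into two half-size exponentiations modulo p and q. These are independent of each other, so they can run in parallel, and a cheap recombination follows. The sketch below is an assumption-laden illustration: it uses plain Python integers rather than the paper's floating-point Montgomery arithmetic, Garner's recombination formula, and the well-known toy key p = 61, q = 53, e = 17, d = 2753. The function name is hypothetical.

```python
def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption via the CRT with Garner recombination."""
    dp = d % (p - 1)           # reduced private exponents
    dq = d % (q - 1)
    q_inv = pow(q, -1, p)      # q^-1 mod p (Python 3.8+)
    m1 = pow(c, dp, p)         # these two half-size exponentiations
    m2 = pow(c, dq, q)         # do not depend on each other
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q          # unique m < p*q with m = m1 mod p, m = m2 mod q


# Textbook toy key (far too small for real use).
p, q, e, d = 61, 53, 17, 2753
n = p * q
c = pow(65, e, n)              # encrypt the message 65
m = rsa_crt_decrypt(c, p, q, d)
print(m)  # prints: 65
```

Because each half-size exponentiation works on operands of roughly half the bit length, the CRT path is commonly used to cut RSA private-key cost; the same structure applies whatever multiplication kernel sits underneath.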

Selected RNS Bases for Modular Multiplication

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

A coprocessor for secure and high speed modular arithmetic

Author: Nicolas Guillermin
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 04/07/2011

We present a coprocessor design for fast arithmetic over large numbers of cryptographic sizes. Our design provides an efficient way to prevent side-channel analysis as well as fault analysis targeting modular arithmetic with large prime or composite numbers. These two countermeasures are therefore suitable both for Elliptic Curve Cryptography over prime fields and for RSA, with or without CRT. To do so, we use the residue number system (RNS) efficiently to protect against leakage and faults, while retaining its ability to execute modular arithmetic on large numbers quickly. We illustrate our countermeasure with a fully protected RSA-CRT implementation using our architecture, and show that it is possible to execute a secure 1024-bit RSA-CRT in less than 0.7 ms on an FPGA.

Cryptology ePrint Archive
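In a residue number system, an integer is represented by its residues modulo a set of pairwise coprime moduli. Multiplication then acts on each residue channel independently and without carries, which is what makes RNS attractive for hardware. The following minimal Python sketch assumes a hypothetical toy base (13, 17, 19, 23) and CRT-based reconstruction. It shows only plain RNS arithmetic, not RNS Montgomery reduction or the coprocessor's leakage and fault countermeasures.

```python
from math import prod  # Python 3.8+


def to_rns(x, base):
    """Represent x by its residues modulo each (pairwise coprime) modulus."""
    return [x % m for m in base]


def rns_mul(xs, ys, base):
    """Channel-wise product: no carries cross between moduli."""
    return [(a * b) % m for a, b, m in zip(xs, ys, base)]


def from_rns(residues, base):
    """Reconstruct x mod M = prod(base) with the Chinese Remainder Theorem."""
    big_m = prod(base)
    x = 0
    for r, m in zip(residues, base):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)
    return x % big_m


base = (13, 17, 19, 23)  # hypothetical toy base, M = 96577
xs, ys = to_rns(123, base), to_rns(456, base)
product = from_rns(rns_mul(xs, ys, base), base)
print(product)  # prints: 56088 (exact because 123 * 456 < M)
```

The result is only exact while the true product stays below M = prod(base). Real designs therefore choose a base large enough for the operands and pair it with a reduction step.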

Faster Modular Arithmetic For Isogeny Based Crypto on Embedded Devices

Author: Joppe W. Bos
Simon J. Friedberger
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 01/09/2018

We show how to implement the Montgomery reduction algorithm for isogeny-based cryptography so that it can utilize the unsigned multiply accumulate accumulate long (UMAAL) instruction present on modern ARM architectures. This results in a practical speed-up by a factor of 1.34 compared to the approach used by SIKE, the supersingular isogeny-based submission to the ongoing post-quantum standardization effort. Moreover, motivated by the recent work of Costello and Hisil (ASIACRYPT 2017), which shows that there is only a moderate degradation in performance when evaluating large odd-degree isogenies, we search for more general supersingular-isogeny-friendly moduli. Using graphics processing units to accelerate this search, we find many such moduli, which allow for faster implementations on embedded devices. By combining these two approaches, we make the modular reduction 1.5 times as fast on a 32-bit ARM platform.

Cryptology ePrint Archive
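UMAAL computes RdHi:RdLo = Rn × Rm + RdLo + RdHi for 32-bit operands. The result always fits in 64 bits because (2^32 − 1)^2 + 2(2^32 − 1) = 2^64 − 1, so a multiply-accumulate step with an incoming carry completes in one instruction. The Python sketch below emulates that behaviour in a hypothetical `umaal` helper and uses it in a generic word-serial Montgomery reduction with 32-bit words. It is an illustrative assumption, not the SIKE-specific reduction or the special moduli from the paper. The 2-word modulus 2^61 − 1 is chosen only for the example.

```python
MASK32 = (1 << 32) - 1


def umaal(lo, hi, a, b):
    """Emulate UMAAL semantics: (hi:lo) <- a*b + lo + hi for 32-bit inputs.

    The 64-bit result never overflows: (2**32-1)**2 + 2*(2**32-1) == 2**64-1.
    """
    t = a * b + lo + hi
    return t & MASK32, t >> 32  # (new lo, new hi)


def to_words(x, count):
    """Split x into `count` little-endian 32-bit words."""
    return [(x >> (32 * i)) & MASK32 for i in range(count)]


def from_words(words):
    return sum(w << (32 * i) for i, w in enumerate(words))


def mont_redc(t_words, n_words, n0_inv):
    """Word-serial Montgomery reduction: return t * 2**(-32*s) mod n.

    Assumes t < n * 2**(32*s), s = len(n_words), n0_inv = -n**-1 mod 2**32.
    """
    s = len(n_words)
    t = list(t_words) + [0]  # one spare word for the final carry
    for i in range(s):
        m = (t[i] * n0_inv) & MASK32  # chosen so that word i of t becomes 0
        carry = 0
        for j in range(s):
            # t[i+j] + m*n[j] + carry as one UMAAL-style step
            t[i + j], carry = umaal(t[i + j], carry, m, n_words[j])
        k = i + s
        while carry:  # ripple the leftover carry into the upper words
            total = t[k] + carry
            t[k], carry = total & MASK32, total >> 32
            k += 1
    u = from_words(t[s:])  # low s words are zero: this divides by 2**(32*s)
    n = from_words(n_words)
    return u - n if u >= n else u


# Example with a 2-word modulus, R = 2**64.
n = (1 << 61) - 1
s = 2
R = 1 << (32 * s)
n0_inv = (-pow(n, -1, 1 << 32)) % (1 << 32)
a, b = 123456789, 987654321
result = mont_redc(to_words(a * b, 2 * s), to_words(n, s), n0_inv)
print(result == (a * b * pow(R, -1, n)) % n)  # prints: True
```

Each inner-loop iteration is a single `umaal` call with no separate carry-handling instruction, which is the property that makes UMAAL a natural fit for the inner loop of multiprecision Montgomery reduction on 32-bit ARM.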

Cofactorization on Graphics Processing Units

Author: A. Moss
A.K. Lenstra
C. Pomerance
D. Loebenberger
D.A. Osvik
D.J. Bernstein
H. Hisil
H.M. Edwards
H.W. Lenstra Jr.
J. Gilger
J. Pelzl
J. Yang
J.M. Pollard
J.W. Bos
K. Gaj
M.O. Rabin
O. Harrison
P. Zimmermann
P.L. Montgomery
R. Szerwinski
R.P. Brent
S. Collange
T. Güneysu
T. Jebelean
T. Kleinjung
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those processors to focus on the memory-intensive sieving and results in more useful NFS relations being found in less time.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Cryptology ePrint Archive