21 research outputs found

Efficient ECM Factorization in Parallel with the Lyness Map

Author: Hone Andrew N.W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/02/2020

The Lyness map is a birational map in the plane which provides one of the simplest discrete analogues of a Hamiltonian system with one degree of freedom, having a conserved quantity and an invariant symplectic form. As an example of a symmetric Quispel-Roberts-Thompson (QRT) map, each generic orbit of the Lyness map lies on a curve of genus one, and corresponds to a sequence of points on an elliptic curve which is one of the fibres in a pencil of biquadratic curves in the plane. Here we present a version of the elliptic curve method (ECM) for integer factorization, which is based on iteration of the Lyness map with a particular choice of initial data. More precisely, we give an algorithm for scalar multiplication of a point on an arbitrary elliptic curve over Q, which is represented by one of the curves in the Lyness pencil. In order to avoid field inversion (I), and require only field multiplication (M), squaring (S) and addition, projective coordinates in P^1 × P^1 are used. Neglecting multiplication by curve constants (assumed small), each addition of the chosen point uses 2M, while each doubling step requires 15M. We further show that the doubling step can be implemented efficiently in parallel with four processors, dropping the effective cost to 4M. In contrast, the fastest algorithms in the literature use twisted Edwards curves (equivalent to Montgomery curves), which correspond to a subset of all elliptic curves. Scalar multiplication on twisted Edwards curves with suitable small curve constants uses 8M for point addition and 4M+4S for point doubling, both of which can be run in parallel with four processors to yield effective costs of 2M and 1M + 1S, respectively. Thus our scalar multiplication algorithm should require, on average, roughly twice as many multiplications per bit as state-of-the-art methods using twisted Edwards curves. In our conclusions, we discuss applications where the use of Lyness curves may provide potential advantages.
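
To make the "conserved quantity" in this abstract concrete, here is a minimal Python sketch. It assumes the textbook one-parameter Lyness recurrence x_{n+1} x_{n-1} = x_n + a, which is not necessarily the parametrization or the projective, inversion-free algorithm used in the paper. The function names are illustrative only. Exact rational arithmetic is used so the invariant can be compared without rounding error.

```python
from fractions import Fraction

def lyness_step(x, y, a):
    """One step of the Lyness map (x, y) -> (y, (y + a) / x)."""
    return y, (y + a) / x

def lyness_invariant(x, y, a):
    """First integral K = (x + 1)(y + 1)(x + y + a) / (x y) of the map above."""
    return (x + 1) * (y + 1) * (x + y + a) / (x * y)

def orbit(x, y, a, steps):
    """Return the starting point followed by `steps` iterates of the map."""
    points = [(x, y)]
    for _ in range(steps):
        x, y = lyness_step(x, y, a)
        points.append((x, y))
    return points

if __name__ == "__main__":
    a = Fraction(3)
    pts = orbit(Fraction(1), Fraction(2), a, 10)
    # Every point of the orbit gives the same value of K, i.e. the orbit
    # stays on one level curve of the invariant.
    print({lyness_invariant(x, y, a) for x, y in pts})
```

Each level set of K is one curve of the pencil, so iterating the map moves a point along a single curve. For the special value a = 1 the map is periodic with period 5, which makes a quick sanity check.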

arXiv.org e-Print Archive

Crossref

Kent Academic Repository

Cryptology ePrint Archive

Low-Latency Elliptic Curve Scalar Multiplication

Author: Bos Joppe
Publication date: 18/06/2018

This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9 ms on the NVIDIA GTX 500 architecture family. The presented methods and implementation considerations can be applied to any parallel 32-bit architecture.

RERO DOC Digital Library

Revisiting ECM on GPUs

Authors: Christine Priplata, Colin Stahlke, Jan Richter-Brockmann, Jonas Wloka, Thorsten Kleinjung, Tim Güneysu
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 14/10/2020

Modern public-key cryptography is a crucial part of our contemporary life where a secure communication channel with another party is needed. With the advance of more powerful computing architectures – especially Graphics Processing Units (GPUs) – traditional approaches like RSA and Diffie-Hellman schemes are more and more in danger of being broken. We present a highly optimized implementation of Lenstra's ECM algorithm customized for GPUs. Our implementation uses state-of-the-art elliptic curve arithmetic and optimized integer arithmetic while providing the possibility of arbitrarily scaling ECM's parameters allowing an application even for larger discrete logarithm problems. Furthermore, the proposed software is not limited to any specific GPU generation and is to the best of our knowledge the first implementation supporting multiple device computation. To this end, for a bound of B1=8,192 and a modulus size of 192 bit, we achieve a throughput of 214 thousand ECM trials per second on a modern RTX 2080 Ti GPU considering only the first stage of ECM. To solve the Discrete Logarithm Problem for larger bit sizes, our software can easily support larger parameter sets such that a throughput of 2,781 ECM trials per second is achieved using B1=50,000, B2=5,000,000, and a modulus size of 448 bit.
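
For readers unfamiliar with what one "ECM trial" with a bound B1 involves, the following is a toy Python sketch of stage 1 only. It is not the implementation described in the abstract: it swaps in affine short-Weierstrass arithmetic with a modular inversion per addition, where optimized implementations use inversion-free Edwards or Montgomery arithmetic. The curve family y^2 = x^3 + a*x - a through the point (1, 1), the default bound and all function names are assumptions chosen for illustration.

```python
from math import gcd

class FactorFound(Exception):
    """Raised when a modular inverse fails, exposing a divisor of n."""
    def __init__(self, factor):
        super().__init__(factor)
        self.factor = factor

def inv_mod(d, n):
    d %= n
    g = gcd(d, n)
    if g != 1:
        raise FactorFound(g)  # g may equal n, which is useless for factoring
    return pow(d, -1, n)      # modular inverse (Python 3.8+)

def ec_add(P, Q, a, n):
    """Affine addition on y^2 = x^3 + a*x + b (mod n); None is the identity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return x3, (lam * (x1 - x3) - y1) % n

def ec_mul(k, P, a, n):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R

def ecm_stage1(n, b1=200, max_curves=50):
    """Try curves y^2 = x^3 + a*x - a through (1, 1); return a factor or None."""
    primes = [p for p in range(2, b1 + 1)
              if all(p % q for q in range(2, int(p ** 0.5) + 1))]
    for a in range(1, max_curves + 1):
        P = (1, 1)
        try:
            for p in primes:
                pk = p
                while pk * p <= b1:   # largest power of p not exceeding b1
                    pk *= p
                P = ec_mul(pk, P, a, n)
                if P is None:
                    break             # reached the identity, try the next curve
        except FactorFound as hit:
            if hit.factor != n:
                return hit.factor
    return None

if __name__ == "__main__":
    n = 455839                        # 599 * 761
    print(ecm_stage1(n))
```

A trial succeeds when the group order of the curve modulo some prime factor of n is B1-powersmooth, so throughput in trials per second translates directly into how many curves can be tested for a given bound.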

Cryptology ePrint Archive

High-Performance Modular Multiplication on the Cell Processor

Author: Bos Joppe Willem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/12/2011

This paper presents software implementation speed records for modular multiplication arithmetic on the synergistic processing elements of the Cell broadband engine (Cell) architecture. The focus is on moduli which are of special interest in elliptic curve cryptography, that is, moduli of bit-lengths ranging from 192- to 521-bit. Finite field arithmetic using primes which allow particularly fast reduction is compared to Montgomery multiplication. The special primes considered are the five recommended NIST primes, as specified in the FIPS 186-3 standard, and the prime used in the elliptic curve Curve25519. While presented and benchmarked on the Cell architecture, the proposed techniques to efficiently implement the modular multiplication algorithms are suited to run on any architecture which is able to perform multiple computations concurrently, e.g. graphics processing units.
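
As a reminder of what Montgomery multiplication computes, here is a minimal Python sketch using arbitrary-precision integers. It shows the arithmetic only, not the word-level or SIMD techniques of the paper. The function names and the choice R = 2^bits are illustrative assumptions; the modulus must be odd and smaller than R.

```python
def montgomery_setup(n, bits):
    """Return (r, n_prime) with r = 2**bits and n * n_prime = -1 (mod r)."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r    # requires n odd (Python 3.8+)
    return r, n_prime

def redc(t, n, bits, n_prime):
    """Montgomery reduction: return t * r^(-1) mod n for 0 <= t < r * n."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask  # m = t * n_prime mod r
    u = (t + m * n) >> bits            # exact division by r
    return u - n if u >= n else u

def mont_mul(a_bar, b_bar, n, bits, n_prime):
    """Multiply two residues given in Montgomery form (x_bar = x * r mod n)."""
    return redc(a_bar * b_bar, n, bits, n_prime)

if __name__ == "__main__":
    n = 2**255 - 19                    # the Curve25519 prime, used as an example
    bits = 256
    r, n_prime = montgomery_setup(n, bits)
    a, b = 123456789, 987654321
    a_bar, b_bar = a * r % n, b * r % n             # convert to Montgomery form
    product = redc(mont_mul(a_bar, b_bar, n, bits, n_prime), n, bits, n_prime)
    print(product == a * b % n)
```

The reduction replaces division by n with masks and shifts by a power of two, which is why it suits generic moduli, whereas special primes such as the NIST primes admit even cheaper dedicated reduction.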

Infoscience - École polytechnique fédérale de Lausanne

Fast GPGPU-Based Elliptic Curve Scalar Multiplication

Authors: Eric M. Mahé, Jean-Marie Chauvet
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 17/03/2014

This paper presents a fast implementation to compute the scalar multiplication of elliptic curve points based on a "General-Purpose computing on Graphics Processing Units" (GPGPU) approach. A GPU implementation using Dan Bernstein's Curve25519, an elliptic curve over a 255-bit prime field complying with the new 128-bit security level, computes the scalar multiplication in less than a microsecond on AMD's R9 290X GPU. The presented methods and implementation considerations can be applied to any parallel architecture.

CiteSeerX

Cryptology ePrint Archive

Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

Authors: Fangyu Zheng, Jiankuo Dong, Jingqiang Lin, Jiwu Jing, Wuqiong Pan, Yuan Zhao
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Implementations of asymmetric cryptographic algorithms (e.g., RSA and Elliptic Curve Cryptography) on Graphics Processing Units (GPUs) have been researched for over a decade. The basic idea of most previous contributions is to exploit the highly parallel GPU architecture and port the integer-based algorithms from general-purpose CPUs to GPUs, to offer high performance. However, the great potential cryptographic computing power of GPUs, especially that of the more powerful floating-point instructions, has not in fact been comprehensively investigated. In this paper, we fully exploit the floating-point computing power of GPUs through various designs, including a floating-point-based Montgomery multiplication/exponentiation algorithm and a Chinese Remainder Theorem (CRT) implementation on the GPU. For practical usage of the proposed algorithm, a new method is used to convert the input/output between octet strings and floating-point numbers, fully utilizing GPUs and further improving the overall performance by about 5%. The performance of RSA-2048/3072/4096 decryption on NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which achieves 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption outperforms the existing fastest integer-based result by 23%.
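
The CRT step mentioned in this abstract can be sketched in a few lines of Python. This is the standard textbook RSA-CRT decryption with a toy key, using plain integer arithmetic rather than the floating-point GPU method of the paper. The key values and function names are illustrative assumptions and far too small for real use.

```python
def rsa_crt_params(p, q, d):
    """Precompute the CRT exponents and the inverse of q modulo p."""
    return d % (p - 1), d % (q - 1), pow(q, -1, p)   # Python 3.8+

def rsa_crt_decrypt(c, p, q, dp, dq, q_inv):
    """Decrypt c with two half-size exponentiations and recombine (Garner)."""
    m1 = pow(c, dp, p)
    m2 = pow(c, dq, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q

if __name__ == "__main__":
    p, q, e, d = 61, 53, 17, 2753     # toy key: n = 3233, e * d = 1 mod 3120
    n = p * q
    dp, dq, q_inv = rsa_crt_params(p, q, d)
    message = 65
    cipher = pow(message, e, n)
    print(rsa_crt_decrypt(cipher, p, q, dp, dq, q_inv) == message)
```

Splitting one exponentiation modulo n into two modulo p and q halves the operand size, which is the main reason CRT is attractive for RSA decryption on any platform.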

Crossref

Directory of Open Access Journals

Modular SIMD arithmetic in Mathemagix

Authors: Lecerf Grégoire, Quintin Guillaume, van der Hoeven Joris
Publication date: 29/06/2014

Modular integer arithmetic occurs in many algorithms for computer algebra, cryptography, and error-correcting codes. Although recent microprocessors typically offer a wide range of highly optimized arithmetic functions, modular integer operations still require dedicated implementations. In this article, we survey existing algorithms for modular integer arithmetic, and present detailed vectorized counterparts. We also present several applications, such as fast modular Fourier transforms and multiplication of integer polynomials and matrices. The vectorized algorithms have been implemented in C++ inside the free computer algebra and analysis system Mathemagix. The performance of our implementation is illustrated by various benchmarks.

arXiv.org e-Print Archive

HAL-UNILIM

HAL-Polytechnique

Faster Modular Arithmetic For Isogeny Based Crypto on Embedded Devices

Authors: Joppe W. Bos, Simon J. Friedberger
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 01/09/2018

We show how to implement the Montgomery reduction algorithm for isogeny-based cryptography such that it can utilize the unsigned multiply accumulate accumulate long instruction present on modern ARM architectures. This results in a practical speed-up of a factor 1.34 compared to the approach used by SIKE: the supersingular isogeny-based submission to the ongoing post-quantum standardization effort. Moreover, motivated by the recent work of Costello and Hisil (ASIACRYPT 2017), which shows that there is only a moderate degradation in performance when evaluating large odd-degree isogenies, we search for more general supersingular-isogeny-friendly moduli. Using graphics processing units to accelerate this search, we find many such moduli which allow for faster implementations on embedded devices. By combining these two approaches we manage to make the modular reduction 1.5 times as fast on a 32-bit ARM platform.

Cryptology ePrint Archive

ECM at Work

Authors: Joppe W. Bos, Thorsten Kleinjung
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 07/09/2012

The performance of the elliptic curve method (ECM) for integer factorization plays an important role in the security assessment of RSA-based protocols as a cofactorization tool inside the number field sieve. The efficient arithmetic for Edwards curves found an application by speeding up ECM. We propose techniques based on generating and combining addition-subtraction chains to optimize Edwards ECM in terms of both performance and memory requirements. This makes our approach very suitable for memory-constrained devices such as graphics processing units (GPUs). For commonly used ECM parameters we are able to lower the required memory by up to a factor of 55 compared to the state-of-the-art Edwards ECM approach. Our ECM implementation on a GTX 580 GPU sets a new throughput record, outperforming the best GPU, CPU and FPGA results reported in the literature.

Cryptology ePrint Archive