7,274 research outputs found
Analysis of Parallel Montgomery Multiplication in CUDA
For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
An Energy-Efficient Reconfigurable DTLS Cryptographic Engine for Securing Internet-of-Things Applications
This paper presents the first hardware implementation of the Datagram
Transport Layer Security (DTLS) protocol to enable end-to-end security for the
Internet of Things (IoT). A key component of this design is a reconfigurable
prime field elliptic curve cryptography (ECC) accelerator, which is 238x and 9x
more energy-efficient compared to software and state-of-the-art hardware
respectively. Our full hardware implementation of the DTLS 1.3 protocol
provides 438x improvement in energy-efficiency over software, along with code
size and data memory usage as low as 8 KB and 3 KB respectively. The
cryptographic accelerators are coupled with an on-chip low-power RISC-V
processor to benchmark applications beyond DTLS with up to two orders of
magnitude energy savings. The test chip, fabricated in 65 nm CMOS, demonstrates
hardware-accelerated DTLS sessions while consuming 44.08 uJ per handshake, and
0.89 nJ per byte of encrypted data at 16 MHz and 0.8 V.Comment: Published in IEEE Journal of Solid-State Circuits (JSSC
Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems
Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security
- …