Using z14 Fused-Multiply-Add Instructions to Accelerate Elliptic Curve Cryptography

Abstract

Due to growing commercial applications like Blockchain, the performance of large-integer arithmetic is the focus of both academic and industrial research. IBM introduced a new integer fused multiply-add instruction in z14, called VMSL, to accelerate such workloads. Unlike their floating-point counterparts, there are a variety of integer fused multiply-add instruction designs. VMSL multiplies two pairs of radix 2562^{56} inputs, sums the two results together with an additional 128-bit input, and stores the resulting 128-bit value in a vector register. In this paper, we will describe the unique features of VMSL, the ways in which it is inherently more efficient than alternative specifications, in particular by enabling multiple carry strategies. We will then look at the issues we encountered implementing Montgomery Modular Multiplication for Elliptic Curve Cryptography on z14, including radix choice, mixed radices, instruction selection to trade instruction count for latency, and VMSL-specific optimizations for Montgomery-friendly moduli. The best choices resulted in a 20% increase in throughput

    Similar works