6 research outputs found

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

Author: Ajay Joshi, David Kaeli, Gilbert Jonatan, Evelio Mora, José L. Abellán, Alexander Ingare, Kaustubh Shivdikar, Yuhui Bao, Neal Livesay, John Kim, Rashmi Agrawal, Michael Shen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2023

Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE. While prior efforts recommend moving to custom accelerators to accelerate FHE computing, these solutions lack cost-effectiveness and scalability. In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem that is available in the cloud. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions and improving performance. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined 64-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by 19%. Finally, we propose a Locality-Aware Block Scheduler (LABS) that improves FHE workload performance, exploiting the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of 796×, 14.2×, and 2.3× over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.
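
The abstract singles out modular reduction as one of the most frequently executed operations in FHE. As a rough software illustration of what such an operation involves, the sketch below uses Barrett reduction, a common division-free technique. This is an assumption chosen for illustration: the abstract does not say which algorithm the MOD-unit implements, and the function names (`barrett_precompute`, `barrett_reduce`, `mod_mul`) are hypothetical.

```python
def barrett_precompute(q):
    """Precompute the Barrett constants for a modulus q > 1.

    k is the bit length of q and mu = floor(4**k / q). Both depend only
    on q, so they are computed once and reused for every reduction.
    """
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_reduce(x, q, k, mu):
    """Return x mod q for 0 <= x < q*q without dividing by q.

    The quotient x // q is estimated with a multiply and a shift; the
    estimate can be slightly too small, so the remainder is corrected
    with a small number of conditional subtractions.
    """
    t = (x * mu) >> (2 * k)   # approximate quotient
    r = x - t * q             # approximate remainder, r >= 0
    while r >= q:             # final correction
        r -= q
    return r


def mod_mul(a, b, q, k, mu):
    """Modular multiplication a*b mod q for 0 <= a, b < q."""
    return barrett_reduce(a * b, q, k, mu)


if __name__ == "__main__":
    q = (1 << 61) - 1         # illustrative prime modulus (assumption)
    k, mu = barrett_precompute(q)
    a, b = 123456789012345678, 987654321098765432
    print(mod_mul(a, b, q, k, mu) == (a * b) % q)
```

Python integers are arbitrary precision, so this sketch sidesteps the word-size limits that a 64-bit hardware datapath would have to handle explicitly.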

DIGITUM Universidad de Murcia (España)

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

Author: Abellán José L.
Agrawal Rashmi
Bao Yuhui
Ingare Alexander
Jonatan Gilbert
Joshi Ajay
Kaeli David
Kim John
Livesay Neal
Mora Evelio
Shen Michael
Shivdikar Kaustubh
Publication date: 19/09/2023

64-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by 19%. Finally, we propose a Locality-Aware Block Scheduler (LABS) that exploits the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of 796×, 14.2×, and 2.3× over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.

arXiv.org e-Print Archive

Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs

Author: Abellán José L.
Agrawal Rashmi
Jonatan Gilbert
Joshi Ajay
Kaeli David
Kim John
Livesay Neal
Mora Evelio
Shivdikar Kaustubh
Publication venue: IEEE Computer Society
Publication date: 08/03/2023

© 2023. This manuscript version is made available under the CC-BY 4.0 license http://creativecommons.org/licenses/by/4.0/ This document is the accepted version of a Published Work that appeared in final form in IEEE Micro. To access the final edited and published work see https://doi.org/10.1109/MM.2023.3253052

Fully Homomorphic Encryption (FHE) is a rapidly developing technology that enables computation directly on encrypted data, making it a compelling solution for security in cloud-based systems. In addition, modern FHE schemes are believed to be resistant to quantum attacks. Although FHE offers unprecedented potential for security, current implementations suffer from prohibitively high latency. Finite field arithmetic operations, particularly the multiplication of high-degree polynomials, are key computational bottlenecks. The parallel processing capabilities provided by modern Graphics Processing Units (GPUs) make them compelling candidates to target these highly parallelizable workloads. In this article, we discuss methods to accelerate polynomial multiplication with GPUs, with the goal of making FHE practical.
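
To make the polynomial-multiplication bottleneck concrete, here is a minimal CPU-side sketch of multiplication in Z_q[X]/(X^n + 1) using the number-theoretic transform (NTT), a standard way to replace the quadratic schoolbook product with transforms and element-wise products. It is an assumption that this matches the methods discussed in the article; the abstract does not name them. The toy parameters (q = 17, n = 4, psi = 2) and all function names are illustrative only, and a GPU implementation would parallelize the butterfly loop rather than recurse.

```python
def ntt(a, omega, q):
    """Recursive radix-2 NTT of a (length a power of two) modulo q.

    omega must be a primitive len(a)-th root of unity modulo q.
    """
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % q            # butterfly: twiddle times odd part
        out[i] = (even[i] + t) % q
        out[i + n // 2] = (even[i] - t) % q
        w = w * omega % q
    return out


def negacyclic_mul(a, b, psi, q):
    """Multiply a and b in Z_q[X]/(X^n + 1).

    psi must be a primitive 2n-th root of unity modulo q, n = len(a).
    """
    n = len(a)
    omega = psi * psi % q
    psi_pow = [pow(psi, i, q) for i in range(n)]
    # Pre-scaling by powers of psi turns the cyclic convolution computed
    # by the NTT into the negacyclic one needed for X^n + 1.
    a_hat = ntt([x * p % q for x, p in zip(a, psi_pow)], omega, q)
    b_hat = ntt([x * p % q for x, p in zip(b, psi_pow)], omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    # Inverse NTT: forward transform with omega^-1, then scale by n^-1.
    c = ntt(c_hat, pow(omega, -1, q), q)
    n_inv = pow(n, -1, q)
    psi_inv = pow(psi, -1, q)
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a, b, q):
    """Quadratic-time reference product, used only to check the NTT path."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:                      # X^n wraps around to -1
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    q, psi = 17, 2                     # 2 has order 8 = 2n modulo 17
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    print(negacyclic_mul(a, b, psi, q) == schoolbook_negacyclic(a, b, q))
```

The modular inverses rely on `pow(x, -1, q)`, which requires Python 3.8 or later.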

DIGITUM Universidad de Murcia (España)