Incorporating fully homomorphic encryption (FHE) into the inference process
of a convolutional neural network (CNN) has drawn enormous attention as a
viable approach for achieving private inference (PI). FHE allows delegating the entire
computation process to the server while ensuring the confidentiality of
sensitive client-side data. However, a practical FHE implementation of a CNN
faces significant hurdles, primarily due to FHE's substantial computational and
memory overhead. To address these challenges, we propose a set of
optimizations, which includes GPU/ASIC acceleration, an efficient activation
function, and an optimized packing scheme. We evaluate our method using the
ResNet models on the CIFAR-10 and ImageNet datasets, achieving an improvement
of several orders of magnitude over prior work and reducing the latency of
encrypted CNN inference to 1.4 seconds on an NVIDIA A100 GPU. We also show that
the latency drops to a mere 0.03 seconds with a custom hardware design.

Comment: 3 pages, 1 figure, appears at DISCC 2023 (2nd Workshop on Data
Integrity and Secure Cloud Computing), in conjunction with the 56th
International Symposium on Microarchitecture (MICRO 2023)