On-device machine learning (ML) inference can enable the use of private user
data on user devices without revealing it to remote servers. However, a pure
on-device solution to private ML inference is impractical for many applications
that rely on embedding tables that are too large to be stored on-device. In
particular, recommendation models typically use multiple embedding tables, each
on the order of 1-10 GB in size, making them impractical to store on-device.
To overcome this barrier, we propose the use of private information retrieval
(PIR) to efficiently and privately retrieve embeddings from servers without
sharing any private information. As off-the-shelf PIR algorithms are usually
too computationally intensive to directly use for latency-sensitive inference
tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR
with the downstream ML application to obtain further speedup. Our GPU
acceleration strategy improves system throughput by more than 20× over
an optimized CPU PIR implementation, and our PIR-ML co-design provides an over
5× additional throughput improvement at fixed model quality. Together,
for various on-device ML applications such as recommendation and language
modeling, our system on a single V100 GPU can serve up to 100,000 queries per
second -- a >100× throughput improvement over a CPU-based baseline --
while maintaining model accuracy.
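To make the PIR primitive concrete, the following is a minimal sketch of a classic two-server XOR-based PIR lookup over an embedding table. It is a generic illustration of private embedding retrieval, not the GPU-accelerated scheme described above; the table dimensions, dtype, and the assumption of two non-colluding servers are hypothetical choices for the example.

```python
import numpy as np

# Toy two-server XOR-based PIR over an embedding table.
# Illustrative only: not the paper's GPU-accelerated protocol.

NUM_ROWS, DIM = 1024, 16
rng = np.random.default_rng(0)

# Both (non-colluding) servers hold identical copies of the embedding table,
# stored here as raw bytes so rows can be combined with XOR.
table = rng.integers(0, 256, size=(NUM_ROWS, DIM), dtype=np.uint8)

def client_query(index, num_rows):
    """Split a one-hot selector for `index` into two random XOR shares."""
    share_a = rng.integers(0, 2, size=num_rows, dtype=np.uint8)
    one_hot = np.zeros(num_rows, dtype=np.uint8)
    one_hot[index] = 1
    share_b = share_a ^ one_hot  # share_a XOR share_b == one_hot selector
    return share_a, share_b

def server_answer(share, db):
    """XOR together the rows selected by this share; one share alone reveals nothing."""
    selected = db[share.astype(bool)]
    if len(selected) == 0:
        return np.zeros(db.shape[1], dtype=np.uint8)
    return np.bitwise_xor.reduce(selected, axis=0)

def client_reconstruct(ans_a, ans_b):
    """XOR the two answers; rows selected by both shares cancel, leaving the target row."""
    return ans_a ^ ans_b

idx = 42
qa, qb = client_query(idx, NUM_ROWS)
embedding = client_reconstruct(server_answer(qa, table), server_answer(qb, table))
assert np.array_equal(embedding, table[idx])
```

Each server touches roughly half the table per query, which hints at why naive PIR is too slow for latency-sensitive inference and why GPU acceleration and PIR-ML co-design are needed.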