Increasing privacy concerns have given rise to Private Inference (PI). In PI,
both the client's personal data and the service provider's trained model are
kept confidential. State-of-the-art PI protocols combine several cryptographic
primitives: Homomorphic Encryption (HE), Secret Sharing (SS), Garbled Circuits
(GC), and Oblivious Transfer (OT). Today, PI remains largely arcane and too
slow for practical use, despite the need and recent performance improvements.
This paper addresses PI's shortcomings with a detailed characterization of a
standard high-performance protocol to build foundational knowledge and
intuition in the systems community. The characterization pinpoints all sources
of inefficiency -- compute, communication, and storage. A notable aspect of
this work is that it considers inference request arrival rates rather than studying
individual inferences in isolation. Prior to this work, and without considering
arrival rate, it has been assumed that PI pre-computations can be handled
offline and their overheads ignored. We show this is not the case. The offline
costs in PI are so high that they are often incurred online, as there is
insufficient downtime to hide pre-compute latency. Leveraging insights from our
characterization, we further propose three optimizations to address the
computation (layer-parallel HE), communication (wireless slot allocation), and
storage (Client-Garbler) overheads. Compared to the state-of-the-art PI
protocol, the optimizations provide a total PI speedup of 1.8×, with the
ability to sustain inference requests at up to a 2.24× greater rate.

Comment: 12 figures