DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Prior work on Private Inference (PI)--inference performed directly on
encrypted input--has focused on minimizing a network's ReLUs, which have been
assumed to dominate PI latency, rather than its FLOPs. Recent work has shown
that FLOPs can no longer be ignored in PI and incur high latency penalties. In this
paper, we develop DeepReShape, a network redesign technique that tailors
architectures to PI constraints, optimizing for both ReLUs and FLOPs for the
first time. The key insight is that a strategic allocation of channels
such that the network's ReLUs are aligned in their criticality order
simultaneously optimizes ReLU and FLOPs efficiency. DeepReShape automates
network development with an efficient process, and we call generated networks
HybReNets. We evaluate DeepReShape using standard PI benchmarks and demonstrate
a 2.1% accuracy gain with a 5.2x runtime improvement at iso-ReLU on
CIFAR-100, and an 8.7x runtime improvement at iso-accuracy on
TinyImageNet. Furthermore, we demystify the input network selection in prior
ReLU optimizations and shed light on the key network attributes enabling PI
efficiency.
Comment: 37 pages, 23 figures, and 17 tables
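As a rough illustration of the channel-allocation insight, the sketch below (ours, with made-up stage shapes and criticality scores, not DeepReShape's actual procedure) splits a fixed ReLU budget across stages in proportion to assumed per-stage ReLU criticality, then derives channel widths from each stage's feature-map size, since a stage's ReLU count is channels x height x width.

```python
# Hypothetical sketch: allocate channels so that each stage's share of the
# ReLU budget follows an assumed criticality ordering of its ReLUs.

def allocate_channels(relu_budget, fmap_hw, criticality):
    """relu_budget: total ReLUs allowed; fmap_hw: (H, W) per stage;
    criticality: relative importance assumed for each stage's ReLUs."""
    total = sum(criticality)
    channels = []
    for (h, w), c in zip(fmap_hw, criticality):
        stage_relus = relu_budget * c / total      # this stage's budget share
        channels.append(max(1, round(stage_relus / (h * w))))  # ReLUs = C*H*W
    return channels

# Example: four stages of a CIFAR-style network, deeper stages assumed
# to hold more critical ReLUs.
print(allocate_channels(
    relu_budget=200_000,
    fmap_hw=[(32, 32), (16, 16), (8, 8), (4, 4)],
    criticality=[1, 2, 4, 8],
))  # e.g., [13, 104, 833, 6667]
```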
A Garbled Circuit Accelerator for Arbitrary, Fast Privacy-Preserving Computation
Privacy and security have rapidly emerged as priorities in system design. One
powerful solution for providing both is privacy-preserving computation (PPC), where
functions are computed directly on encrypted data and control can be provided
over how data is used. Garbled circuits (GCs) are a PPC technology that provides
both confidential computing and control over data use. The challenge is
that they incur significant performance overheads compared to plaintext. This
paper proposes a novel garbled circuit accelerator and compiler, named HAAC, to
mitigate performance overheads and make privacy-preserving computation more
practical. HAAC is a hardware-software co-design. GCs are exemplars of
co-design as programs are completely known at compile time, i.e., all
dependences, memory accesses, and control flow are fixed. The design philosophy
of HAAC is to keep hardware simple and efficient, maximizing area devoted to
our proposed custom execution units and other circuits essential for high
performance (e.g., on-chip storage). The compiler can leverage its program
understanding to realize hardware's performance potential by generating
effective instruction schedules, data layouts, and orchestrating off-chip
events. In taking this approach we can achieve ASIC performance/efficiency
without sacrificing generality. Insights of our approach include how co-design
enables expressing arbitrary GC programs as streams, which simplifies hardware
and enables complete memory-compute decoupling, and the development of a
scratchpad that captures data reuse by tracking program execution, eliminating
the need for costly hardware-managed caches and tagging logic. We evaluate HAAC
with VIP-Bench and achieve a speedup of 608x in 4.3mm² of area.
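To make the "programs as streams" insight concrete, here is a small sketch (ours, not HAAC's compiler) of what fully static programs enable: the gate DAG is topologically ordered into a flat instruction stream, and each wire's last use is computed at compile time, so a scratchpad can free entries deterministically with no tags or hardware-managed caching.

```python
# Hypothetical sketch: compile a fixed garbled-circuit gate DAG into a
# linear instruction stream annotated with which wires die at each step.

from graphlib import TopologicalSorter

def compile_stream(gates):
    """gates: {gate_id: (op, [input_wire_ids])}; every wire is a gate id."""
    order = list(TopologicalSorter(
        {g: set(ins) for g, (_, ins) in gates.items()}).static_order())
    last_use = {}
    for pos, g in enumerate(order):
        for w in gates[g][1]:
            last_use[w] = pos                  # overwritten until final reader
    return [(g, gates[g][0], gates[g][1],
             [w for w in gates[g][1] if last_use[w] == pos])   # freeable wires
            for pos, g in enumerate(order)]

# Tiny example: out = AND(XOR(a, b), b).
gates = {"a": ("IN", []), "b": ("IN", []),
         "x": ("XOR", ["a", "b"]), "out": ("AND", ["x", "b"])}
for instr in compile_stream(gates):
    print(instr)   # (gate, op, inputs, wires safe to evict afterwards)
```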
Sphynx: ReLU-Efficient Network Design for Private Inference
The emergence of deep learning has been accompanied by privacy concerns
surrounding users' data and service providers' models. We focus on private
inference (PI), where the goal is to perform inference on a user's data sample
using a service provider's model. Existing PI methods for deep networks enable
cryptographically secure inference with little drop in functionality; however,
they incur severe latency costs, primarily caused by non-linear network
operations (such as ReLUs). This paper presents Sphynx, a ReLU-efficient
network design method based on micro-search strategies for convolutional cell
design. Sphynx achieves Pareto dominance over all existing private inference
methods on CIFAR-100. We also design large-scale networks that support
cryptographically private inference on Tiny-ImageNet and ImageNet.
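The cell search itself is beyond the scope of an abstract, but the Pareto-dominance test such a search relies on is easy to state. Below is a minimal sketch (ours, with invented cell names and numbers) that keeps only candidates no other candidate beats on both ReLU count and accuracy.

```python
# Hypothetical sketch: filter candidate cells to the ReLU/accuracy
# Pareto front (fewer ReLUs is better, higher accuracy is better).

def pareto_front(candidates):
    """candidates: list of (name, relu_count, accuracy)."""
    front = []
    for name, relus, acc in candidates:
        dominated = any(r <= relus and a >= acc and (r, a) != (relus, acc)
                        for _, r, a in candidates)
        if not dominated:
            front.append((name, relus, acc))
    return front

cells = [("cell_A", 50_000, 74.1), ("cell_B", 80_000, 74.0),
         ("cell_C", 30_000, 71.5), ("cell_D", 30_000, 72.9)]
print(pareto_front(cells))  # cell_A dominates cell_B; cell_D dominates cell_C
```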
Weightless: Lossy Weight Encoding For Deep Neural Network Compression
The large memory requirements of deep neural networks limit their deployment
and adoption on many devices. Model compression methods effectively reduce the
memory requirements of these models, usually through applying transformations
such as weight pruning or quantization. In this paper, we present a novel
scheme for lossy weight encoding which complements conventional compression
techniques. The encoding is based on the Bloomier filter, a probabilistic data
structure that can save space at the cost of introducing random errors.
Leveraging the ability of neural networks to tolerate these imperfections and
by re-training around the errors, the proposed technique, Weightless, can
compress DNN weights by up to 496x with the same model accuracy. This results
in up to a 1.51x improvement over the state-of-the-art.
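A minimal sketch of the retrain-around-errors loop, in PyTorch, with a crude uniform quantizer standing in for the paper's Bloomier filter (whose hash-based construction is longer). The structure is the point: encode the weights lossily, decode them back with errors included, fine-tune around the damage, and re-encode.

```python
# Hypothetical sketch: fine-tune a model so it tolerates a lossy weight
# codec's errors. A uniform quantizer stands in for the Bloomier filter.

import torch

def lossy_roundtrip(w, bits=4):
    # Stand-in codec: quantization error plays the role of the Bloomier
    # filter's occasional false-positive lookups.
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    return torch.round(w / scale) * scale

def retrain_around_errors(model, loss_fn, data, epochs=1, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        with torch.no_grad():              # snap weights to codec outputs
            for p in model.parameters():
                p.copy_(lossy_roundtrip(p))
        for x, y in data:                  # then adapt around the errors
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    with torch.no_grad():                  # ship exactly the encoded weights
        for p in model.parameters():
            p.copy_(lossy_roundtrip(p))
    return model
```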
Towards Fast and Scalable Private Inference
Privacy and security have rapidly emerged as first-order design constraints.
Users now demand more protection over who can see their data (confidentiality)
as well as how it is used (control). Here, existing cryptographic techniques
for security fall short: they secure data when stored or communicated but must
decrypt it for computation. Fortunately, a new paradigm of computing exists,
which we refer to as privacy-preserving computation (PPC). Emerging PPC
technologies can be leveraged for secure outsourced computation or to enable
two parties to compute without revealing either party's secret data. Despite
their phenomenal potential to revolutionize user protection in the digital age,
the realization has been limited due to exorbitant computational,
communication, and storage overheads.
This paper reviews recent efforts on addressing various PPC overheads using
private inference (PI) in neural network as a motivating application. First,
the problem and various technologies, including homomorphic encryption (HE),
secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are
introduced. Next, a characterization of their overheads when used to implement
PI is covered. The characterization motivates the need for both GCs and HE
accelerators. Then two solutions are presented: HAAC for accelerating GCs and
RPU for accelerating HE. To conclude, results and effects are shown with a
discussion on what future work is needed to overcome the remaining overheads of
PI.
Comment: Appears in the 20th ACM International Conference on Computing Frontiers
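Of the primitives surveyed, additive secret sharing is the simplest to show in a few lines. The sketch below (ours, not from the paper) splits a value into two uniformly random-looking shares over a 2^64 ring; neither share alone reveals anything, and addition can be done share-wise with no communication.

```python
# Minimal illustration of two-party additive secret sharing over Z_{2^64}.

import secrets

MOD = 2 ** 64

def share(x):
    r = secrets.randbelow(MOD)       # uniformly random mask
    return r, (x - r) % MOD          # (party 0's share, party 1's share)

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

a0, a1 = share(20)
b0, b1 = share(22)
# Each party adds its own shares locally; the secrets stay hidden.
assert reconstruct((a0 + b0) % MOD, (a1 + b1) % MOD) == 42
```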
Characterizing and Optimizing End-to-End Systems for Private Inference
Increasing privacy concerns have given rise to Private Inference (PI). In PI,
both the client's personal data and the service provider's trained model are
kept confidential. State-of-the-art PI protocols combine several cryptographic
primitives: Homomorphic Encryption (HE), Secret Sharing (SS), Garbled Circuits
(GC), and Oblivious Transfer (OT). Today, PI remains largely arcane and too
slow for practical use, despite the need and recent performance improvements.
This paper addresses PI's shortcomings with a detailed characterization of a
standard high-performance protocol to build foundational knowledge and
intuition in the systems community. The characterization pinpoints all sources
of inefficiency -- compute, communication, and storage. A notable aspect of
this work is the use of inference request arrival rates rather than studying
individual inferences in isolation. Prior to this work, and without considering
arrival rate, it has been assumed that PI pre-computations can be handled
offline and their overheads ignored. We show this is not the case. The offline
costs in PI are so high that they are often incurred online, as there is
insufficient downtime to hide pre-compute latency. We further propose three
optimizations to address the computation (layer-parallel HE), communication
(wireless slot allocation), and storage (Client-Garbler) overheads leveraging
insights from our characterization. Compared to the state-of-the-art PI
protocol, the optimizations provide a total PI speedup of 1.8x, with the
ability to sustain inference requests at up to a 2.24x greater rate.
Comment: 12 figures
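The arrival-rate argument is easy to reproduce with a toy queueing model (ours, with hypothetical timings, not the paper's workload): when inter-arrival gaps are long enough, pre-computation hides in the downtime and clients see only online latency; when requests arrive faster, pre-compute bleeds into every request and latency grows without bound.

```python
# Hypothetical sketch: each inference needs a pre-compute phase of length
# `precompute` and an online phase of length `online`. Pre-compute for the
# next request starts as soon as the current one finishes.

def client_latency(arrivals, precompute, online):
    """arrivals: sorted request times; assumes the first request's
    pre-compute completed before time zero."""
    ready_at, lat = 0.0, []
    for t in arrivals:
        start = max(t, ready_at)       # wait if pre-compute is still running
        finish = start + online
        lat.append(finish - t)         # client-visible latency
        ready_at = finish + precompute
    return lat

print(client_latency([0, 10, 20, 30], precompute=6.0, online=1.0))
# [1.0, 1.0, 1.0, 1.0] -- downtime hides the pre-compute
print(client_latency([0, 2, 4, 6], precompute=6.0, online=1.0))
# [1.0, 6.0, 11.0, 16.0] -- pre-compute is effectively online
```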