Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform
Companies increasingly expose machine learning (ML) models trained over
sensitive user data to untrusted domains, such as end-user devices and
wide-access model stores. We present Sage, a differentially private (DP) ML
platform that bounds the cumulative leakage of training data through models.
Sage builds upon the rich literature on DP ML algorithms and contributes
pragmatic solutions to two of the most pressing systems challenges of global
DP: running out of privacy budget and the privacy-utility tradeoff. To address
the former, we develop block composition, a new privacy loss accounting method
that leverages the growing database regime of ML workloads to keep training
models endlessly on a sensitive data stream while enforcing a global DP
guarantee for the stream. To address the latter, we develop privacy-adaptive
training, a process that trains a model on growing amounts of data and/or with
increasing privacy parameters until, with high probability, the model meets
developer-configured quality criteria. Together, these two techniques illustrate how a systems focus on
characteristics of ML workloads enables pragmatic solutions that are not
apparent when one focuses on individual algorithms, as most DP ML literature
does.
Comment: Extended version of a paper presented at the 27th ACM Symposium on
Operating Systems Principles (SOSP '19).
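As a rough illustration of the privacy-adaptive training loop described in the abstract, the Python sketch below shows one way such a loop could be structured. The helper names (train_fn, eval_fn), the retry schedule, and the acceptance test are assumptions made for illustration, not Sage's actual API; in particular, a real implementation must also account for the privacy cost of the quality evaluation itself.

    # Hypothetical sketch of privacy-adaptive training (not Sage's actual API).
    # train_fn(data, epsilon) is assumed to be an (epsilon, delta)-DP training
    # routine; eval_fn(model) is assumed to return a DP confidence interval
    # (lower, upper) on the model's quality metric.
    def privacy_adaptive_training(data_blocks, train_fn, eval_fn, quality_target,
                                  epsilons=(0.1, 0.5, 1.0)):
        """Retrain on growing data and/or larger privacy budgets until the model
        meets the developer-configured quality target with high probability."""
        data = []
        for block, eps in zip(data_blocks, epsilons):
            data.extend(block)                   # grow the training set by one block
            model = train_fn(data, epsilon=eps)  # one DP training run
            lower, _ = eval_fn(model)            # high-probability lower bound on quality
            if lower >= quality_target:          # accept only if the target holds w.h.p.
                return model
        return None                              # target unattainable at these budgets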
Practical Privacy Filters and Odometers with Rényi Differential Privacy and Applications to Differentially Private Deep Learning
Differential Privacy (DP) is the leading approach to privacy preserving deep
learning. As such, there are multiple efforts to provide drop-in integration of
DP into popular frameworks. These efforts, which add noise to each gradient
computation to make it DP, rely on composition theorems to bound the total
privacy loss incurred over this sequence of DP computations.
However, existing composition theorems present a tension between efficiency
and flexibility. Most theorems require all computations in the sequence to have
a predefined DP parameter, called the privacy budget. This prevents the design
of training algorithms that adapt the privacy budget on the fly, or that
terminate early to reduce the total privacy loss. Alternatively, the few
existing composition results for adaptive privacy budgets provide complex
bounds on the privacy loss, with constants too large to be practical.
In this paper, we study DP composition under adaptive privacy budgets through
the lens of Rényi Differential Privacy, proving a simpler composition theorem
with smaller constants, making it practical enough to use in algorithm design.
We demonstrate two applications of this theorem for DP deep learning: adapting
the noise or batch size online to improve a model's accuracy within a fixed
total privacy loss, and stopping early when fine-tuning a model to reduce total
privacy loss.
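The Python sketch below illustrates the kind of bookkeeping such a result enables: per-step Rényi divergences are accumulated for a noise scale chosen on the fly, and training stops early once the converted (epsilon, delta) bound reaches the target. It uses the textbook Gaussian-mechanism RDP bound alpha / (2 * sigma^2) for a sensitivity-1 release (no subsampling amplification) and the classic RDP-to-(epsilon, delta) conversion; it is not the paper's filter or odometer construction.

    import math

    ORDERS = [1.5, 2, 4, 8, 16, 32, 64]

    def gaussian_rdp(sigma, alpha):
        """RDP of the Gaussian mechanism at order alpha, for sensitivity 1."""
        return alpha / (2 * sigma ** 2)

    def rdp_to_eps(rdp_by_order, delta):
        """Classic conversion from accumulated RDP to an (epsilon, delta) bound."""
        return min(rdp + math.log(1 / delta) / (alpha - 1)
                   for alpha, rdp in rdp_by_order.items())

    def train_with_adaptive_noise(max_steps, eps_budget, delta=1e-5):
        rdp = {alpha: 0.0 for alpha in ORDERS}
        for t in range(max_steps):
            sigma = 1.0 if t < max_steps // 2 else 2.0  # noise scale adapted on the fly
            # ... release a gradient perturbed with Gaussian noise of scale sigma ...
            for alpha in ORDERS:                        # account for this step at all orders
                rdp[alpha] += gaussian_rdp(sigma, alpha)
            if rdp_to_eps(rdp, delta) >= eps_budget:    # stop early once the budget is spent
                return t + 1
        return max_steps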
DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy
In this paper, we introduce a new type of leakage associated with modern
encrypted databases, called update pattern leakage. We formalize the
definition and security model of encrypted databases with DP update patterns,
and we propose DP-Sync, a framework that extends existing encrypted database
schemes to meet this model. DP-Sync guarantees that the entire data update
history over the outsourced data structure is protected by differential
privacy. This is achieved by imposing differentially private strategies that
dictate the data owner's synchronization of local data.
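To make the idea of a differentially private synchronization strategy concrete, here is a hedged Python sketch of one possible strategy (not necessarily one of DP-Sync's): at each round the owner uploads a Laplace-noised number of records, padding with dummy encrypted records, so the update volume the server observes is differentially private with respect to the true local insertions. Names and parameters are illustrative only, and repeated rounds would consume additional budget under composition.

    import random

    def laplace_noise(scale):
        """Sample Laplace(0, scale) as the difference of two exponentials."""
        return random.expovariate(1 / scale) - random.expovariate(1 / scale)

    def dp_flush_size(true_pending, epsilon, sensitivity=1):
        """Noisy count of records to upload this round (clipped at zero)."""
        return max(0, round(true_pending + laplace_noise(sensitivity / epsilon)))

    def sync_round(local_buffer, epsilon):
        """Return the batch of real + dummy encrypted records to upload."""
        k = dp_flush_size(len(local_buffer), epsilon)
        real = [local_buffer.pop(0) for _ in range(min(k, len(local_buffer)))]
        dummies = [b"DUMMY"] * (k - len(real))  # pad so the server sees exactly k records
        return real + dummies                   # unsent records stay buffered for later rounds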
Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation
Recently, a number of approaches and techniques have been introduced for
reporting software statistics with strong privacy guarantees. These range from
abstract algorithms to comprehensive systems with varying assumptions and built
upon local differential privacy mechanisms and anonymity. Based on the
Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified
large improvements in privacy guarantees without loss of utility by making
reports anonymous. However, these results either comprise systems with
seemingly disparate mechanisms and attack models, or formal statements with
little guidance to practitioners. Addressing this, we provide a formal
treatment and offer prescriptive guidelines for privacy-preserving reporting
with anonymity. We revisit the ESA framework with a simple, abstract model of
attackers as well as assumptions covering it and other proposed systems of
anonymity. In light of new formal privacy bounds, we examine the limitations of
sketch-based encodings and ESA mechanisms such as data-dependent crowds. We
also demonstrate how the ESA notion of fragmentation (reporting data aspects in
separate, unlinkable messages) improves privacy/utility tradeoffs both in terms
of local and central differential-privacy guarantees. Finally, to help
practitioners understand the applicability and limitations of
privacy-preserving reporting, we report on a large number of empirical
experiments. We use real-world datasets with heavy-tailed or near-flat
distributions, which pose the greatest difficulty for our techniques; in
particular, we focus on data drawn from images that can be easily visualized in
a way that highlights reconstruction errors. Showing the promise of the
approach, and of independent interest, we also report on experiments using
anonymous, privacy-preserving reporting to train high-accuracy deep neural
networks on standard tasks (MNIST and CIFAR-10).
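As a concrete, if simplified, picture of fragmentation, the Python sketch below splits a client's record into one message per attribute, applies local randomization to each, and submits the messages independently so that the shuffler can break any linkage between them. The binary randomized-response encoding and the attribute names are illustrative assumptions, not the encodings evaluated in the paper.

    import math, random

    def randomized_response(bit, epsilon):
        """Standard epsilon-LDP randomized response for a single bit."""
        p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1)
        return bit if random.random() < p_keep else 1 - bit

    def fragment_report(record, epsilon_per_fragment):
        """Turn {attribute: bit} into separate, unlinkable single-attribute messages."""
        fragments = [(name, randomized_response(bit, epsilon_per_fragment))
                     for name, bit in record.items()]
        random.shuffle(fragments)   # submitted independently; the shuffler strips
        return fragments            # any metadata that could link them back together

    # Example: one client's report fragmented into three independent messages.
    messages = fragment_report({"crashed": 1, "new_user": 0, "beta_channel": 1}, 1.0)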