Differentially Private Empirical Risk Minimization
Privacy-preserving machine learning algorithms are crucial for the
increasingly common setting in which personal data, such as medical or
financial records, are analyzed. We provide general techniques to produce
privacy-preserving approximations of classifiers learned via (regularized)
empirical risk minimization (ERM). These algorithms are private under the
ε-differential privacy definition due to Dwork et al. (2006). First, we
apply the output perturbation ideas of Dwork et al. (2006) to ERM
classification. Then we propose a new method, objective perturbation, for
privacy-preserving machine learning algorithm design. This method entails
perturbing the objective function before optimizing over classifiers. If the
loss and regularizer satisfy certain convexity and differentiability criteria,
we prove theoretical results showing that our algorithms preserve privacy, and
provide generalization bounds for linear and nonlinear kernels. We further
present a privacy-preserving technique for tuning the parameters in general
machine learning algorithms, thereby providing end-to-end privacy guarantees
for the training process. We apply these results to produce privacy-preserving
analogues of regularized logistic regression and support vector machines. We
obtain encouraging results from evaluating their performance on real
demographic and benchmark data sets. Our results show that both theoretically
and empirically, objective perturbation is superior to the previous
state-of-the-art, output perturbation, in managing the inherent tradeoff
between privacy and learning performance.
Comment: 40 pages, 7 figures; accepted to the Journal of Machine Learning Research.
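Objective perturbation adds a random linear term to the regularized ERM objective before optimizing, so the minimizer itself is randomized. A minimal sketch for L2-regularized logistic regression, assuming feature vectors with L2 norm at most 1 and labels in {-1, +1}; the noise calibration follows the high-level recipe in the abstract but the constants, function name, and plain gradient-descent solver are illustrative, not the paper's verified accounting:

```python
import numpy as np

def objective_perturbation_lr(X, y, eps, lam, steps=500, lr=0.5, seed=None):
    """Sketch of objective perturbation for regularized logistic regression.

    Assumes rows of X have L2 norm <= 1 and y has entries in {-1, +1}.
    Noise scale is illustrative, not a verified privacy analysis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw noise b with density proportional to exp(-(eps/2) * ||b||_2):
    # a uniform direction scaled by a Gamma(d, 2/eps)-distributed norm.
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    b = direction * rng.gamma(shape=d, scale=2.0 / eps)

    # Minimize: mean logistic loss + (lam/2)||w||^2 + (b . w)/n
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        sig = 1.0 / (1.0 + np.exp(margins))          # sigma(-margins)
        grad = -(X.T @ (y * sig)) / n + lam * w + b / n
        w -= lr * grad
    return w
```

Because the noise enters the objective rather than the released output, the optimizer balances it against the data-fit term, which is the intuition behind objective perturbation outperforming output perturbation.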
Near-Optimal Algorithms for Differentially-Private Principal Components
Principal components analysis (PCA) is a standard tool for identifying good
low-dimensional approximations to data in high dimension. Many data sets of
interest contain private or sensitive information about individuals. Algorithms
which operate on such data should be sensitive to the privacy risks in
publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we
investigate the theory and empirical performance of differentially private
approximations to PCA and propose a new method which explicitly optimizes the
utility of the output. We show that the sample complexity of the proposed
method differs from the existing procedure in the scaling with the data
dimension, and that our method is nearly optimal in terms of this scaling. We
furthermore illustrate our results, showing that on real data there is a large
performance gap between the existing method and our method.Comment: 37 pages, 8 figures; final version to appear in the Journal of
Machine Learning Research, preliminary version was at NIPS 201
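The baseline that the abstract compares against can be sketched as input perturbation: add symmetric Gaussian noise to the empirical second-moment matrix, then take its top eigenvectors. This is the noise-on-input baseline, not the utility-optimizing method the paper proposes; the noise scale below uses the standard Gaussian-mechanism calibration and is illustrative, assuming rows with L2 norm at most 1:

```python
import numpy as np

def noisy_covariance_pca(X, k, eps, delta, seed=None):
    """Baseline DP-PCA sketch: perturb the second-moment matrix with
    symmetric Gaussian noise, then eigendecompose. Assumes each row of X
    has L2 norm <= 1; the noise scale is illustrative only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = (X.T @ X) / n                      # empirical second-moment matrix
    # Changing one row moves A by at most O(1/n), so noise scales as 1/n.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / (n * eps)
    E = rng.normal(0.0, sigma, size=(d, d))
    E = (E + E.T) / np.sqrt(2.0)           # symmetrize the noise matrix
    vals, vecs = np.linalg.eigh(A + E)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]                  # top-k eigenvectors, shape (d, k)
```

The sample-complexity gap the abstract describes is precisely between procedures of this noisy-matrix form and the proposed method's scaling with the dimension d.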
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond
We consider vertical logistic regression (VLR) trained with mini-batch
gradient descent -- a setting that has attracted growing interest in
industry and has proven useful in a wide range of applications, including
finance and medical research. We provide a comprehensive and rigorous privacy
analysis of VLR in a class of open-source Federated Learning frameworks, where
the protocols may differ from one another, yet all implicitly share a
procedure for obtaining local gradients. We first consider the honest-but-curious
threat model, in which the protocol's detailed implementation is set aside and
only the shared gradient procedure, abstracted as an oracle, is assumed. We find
that even in this general setting, a single feature dimension and the labels can
still be recovered by the other party under suitable constraints on the batch
size, demonstrating the potential vulnerability of all frameworks
following the same philosophy. Then we look into a popular instantiation of the
protocol based on Homomorphic Encryption (HE). We propose an active attack that
significantly weakens the batch-size constraints of the previous analysis by
generating and compressing auxiliary ciphertexts. To address the privacy leakage
within the HE-based protocol, we develop a simple-yet-effective countermeasure
based on Differential Privacy (DP), and provide both utility and privacy
guarantees for the updated algorithm. Finally, we empirically verify the
effectiveness of our attack and defense on benchmark datasets. Altogether, our
findings suggest that all vertical federated learning frameworks that solely
depend on HE might contain severe privacy risks, and DP, which has already
demonstrated its power in horizontal federated learning, can also play a
crucial role in the vertical setting, especially when coupled with HE or secure
multi-party computation (MPC) techniques.
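The DP countermeasure the abstract describes can be sketched in the familiar clip-then-noise style: bound each example's gradient contribution, then add calibrated Gaussian noise before the gradient leaves the party. This generic recipe is an assumption on our part; the paper's actual mechanism and its privacy accounting may differ:

```python
import numpy as np

def privatize_gradients(per_example_grads, clip_norm, noise_multiplier,
                        seed=None):
    """Generic DP gradient release: clip each per-example gradient to
    clip_norm, sum, add Gaussian noise, and average. A sketch of the kind
    of defense the abstract describes, not the paper's exact mechanism.
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(per_example_grads, dtype=float)   # shape (batch, dim)
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    G = G / np.maximum(1.0, norms / clip_norm)       # per-example clipping
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=G.shape[1])
    return (G.sum(axis=0) + noise) / len(G)          # noisy mean gradient
```

Clipping caps any single example's influence on the shared gradient, which is what lets the added noise translate into a formal DP guarantee while degrading utility only mildly.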
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become
commonplace, e.g., for application telemetry, error reporting, or demographic
profiling. This paper describes a principled systems architecture---Encode,
Shuffle, Analyze (ESA)---for performing such monitoring with high utility while
also protecting user privacy. The ESA design, and its Prochlo implementation,
are informed by our practical experiences with an existing, large deployment of
privacy-preserving software monitoring.
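The Encode, Shuffle, Analyze pipeline can be sketched as three stages, where the shuffler's random permutation of a large batch breaks the link between a report and its sender before aggregation. The record fields and function names below are hypothetical, chosen only to illustrate the division of responsibilities:

```python
import random
from collections import Counter

def encode(report):
    """Encoder (client side): keep only the coarse value being measured,
    dropping the user identifier before the report leaves the device."""
    return report["app_version"]

def shuffle(encoded_reports, seed=None):
    """Shuffler: batch and randomly permute reports so the analyzer cannot
    link a report to its sender by origin or arrival order."""
    rng = random.Random(seed)
    batch = list(encoded_reports)
    rng.shuffle(batch)
    return batch

def analyze(shuffled_batch):
    """Analyzer: compute population-level statistics over the anonymous
    batch; it never sees identities, only shuffled values."""
    return Counter(shuffled_batch)

reports = [{"user_id": i, "app_version": v}
           for i, v in enumerate(["1.2", "1.3", "1.3", "1.2", "1.3"])]
histogram = analyze(shuffle([encode(r) for r in reports], seed=0))
```

The design's key point is architectural rather than purely algorithmic: each stage is run by a different party, so no single party holds both identities and values.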