10,683 research outputs found
Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning, where as common in practice, the data is not given at
once, but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk of , where is the
dimensionality of the data. While from the results of [Bassily et al. 2014]
this bound is tight in the worst-case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.Comment: To appear in PODS 201
Differentially Private Empirical Risk Minimization
Privacy-preserving machine learning algorithms are crucial for the
increasingly common setting in which personal data, such as medical or
financial records, are analyzed. We provide general techniques to produce
privacy-preserving approximations of classifiers learned via (regularized)
empirical risk minimization (ERM). These algorithms are private under the
-differential privacy definition due to Dwork et al. (2006). First we
apply the output perturbation ideas of Dwork et al. (2006), to ERM
classification. Then we propose a new method, objective perturbation, for
privacy-preserving machine learning algorithm design. This method entails
perturbing the objective function before optimizing over classifiers. If the
loss and regularizer satisfy certain convexity and differentiability criteria,
we prove theoretical results showing that our algorithms preserve privacy, and
provide generalization bounds for linear and nonlinear kernels. We further
present a privacy-preserving technique for tuning the parameters in general
machine learning algorithms, thereby providing end-to-end privacy guarantees
for the training process. We apply these results to produce privacy-preserving
analogues of regularized logistic regression and support vector machines. We
obtain encouraging results from evaluating their performance on real
demographic and benchmark data sets. Our results show that both theoretically
and empirically, objective perturbation is superior to the previous
state-of-the-art, output perturbation, in managing the inherent tradeoff
between privacy and learning performance.Comment: 40 pages, 7 figures, accepted to the Journal of Machine Learning
Researc
A Hybrid Approach to Privacy-Preserving Federated Learning
Federated learning facilitates the collaborative training of models without
the sharing of raw data. However, recent attacks demonstrate that simply
maintaining data locality during training processes does not provide sufficient
privacy guarantees. Rather, we need a federated learning system capable of
preventing inference over both the messages exchanged during training and the
final trained model while ensuring the resulting model also has acceptable
predictive accuracy. Existing federated learning approaches either use secure
multiparty computation (SMC) which is vulnerable to inference or differential
privacy which can lead to low accuracy given a large number of parties with
relatively small amounts of data each. In this paper, we present an alternative
approach that utilizes both differential privacy and SMC to balance these
trade-offs. Combining differential privacy with secure multiparty computation
enables us to reduce the growth of noise injection as the number of parties
increases without sacrificing privacy while maintaining a pre-defined rate of
trust. Our system is therefore a scalable approach that protects against
inference threats and produces models with high accuracy. Additionally, our
system can be used to train a variety of machine learning models, which we
validate with experimental results on 3 different machine learning algorithms.
Our experiments demonstrate that our approach out-performs state of the art
solutions
- …