Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning where, as is common in practice, the data is not
given all at once but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk of roughly √d, where d is the
dimensionality of the data. While from the results of [Bassily et al. 2014]
this bound is tight in the worst case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.

Comment: To appear in PODS 2017
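As a rough illustration of the two ideas in this abstract, here is a minimal Python sketch: a baseline that re-runs a hypothetical private batch ERM routine on the full history at regular intervals with naive budget splitting, plus a single noisy projected-gradient step in the flavor of the first regression mechanism. The routine names, noise scales, and budget accounting are illustrative assumptions, not the paper's construction or analysis.

```python
import numpy as np

def private_batch_erm(data, epsilon, rng=np.random.default_rng()):
    """Hypothetical stand-in for any private batch ERM mechanism.
    Here: a Laplace-noised mean, assuming records lie in [0, 1]
    (so the mean has sensitivity 1/n)."""
    n = len(data)
    return float(np.mean(data) + rng.laplace(scale=1.0 / (n * epsilon)))

def incremental_erm(stream, total_epsilon, retrain_every=100):
    """Baseline transformation: re-run the private batch mechanism on
    the whole observed history at regular intervals, splitting the
    privacy budget naively across re-trainings (illustrative
    composition only; `stream` is given as a list here)."""
    history, model = [], None
    runs = max(1, len(stream) // retrain_every)
    eps_per_run = total_epsilon / runs
    for t, record in enumerate(stream, start=1):
        history.append(record)
        if t % retrain_every == 0:
            model = private_batch_erm(history, eps_per_run)
        yield model  # current private estimate for the history so far

def noisy_projected_gradient_step(w, grad, step, radius, noise_scale,
                                  rng=np.random.default_rng()):
    """Flavor of the paper's first regression mechanism: perturb the
    incremental gradient, take a gradient step, then project onto the
    constraint set (here an l2 ball). noise_scale stands in for the
    calibration the paper actually derives."""
    w = w - step * (grad + rng.normal(scale=noise_scale, size=grad.shape))
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)
```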
Differentially Private Empirical Risk Minimization with Sparsity-Inducing Norms
Differential privacy aims to preserve prediction quality while bounding the
privacy impact on individuals whose information is contained in the data. We
consider differentially private risk minimization problems with
regularizers that induce structured sparsity. These regularizers are known to
be convex but they are often non-differentiable. We analyze the standard
differentially private algorithms, such as output perturbation, Frank-Wolfe and
objective perturbation. Output perturbation is a differentially private
algorithm that is known to perform well for minimizing risks that are strongly
convex. Previous works have derived excess risk bounds that are independent of
the dimensionality. In this paper, we assume a particular class of convex but
non-smooth regularizers that induce structured sparsity and loss functions for
generalized linear models. We also consider differentially private Frank-Wolfe
algorithms to optimize the dual of the risk minimization problem. We derive
excess risk bounds for both these algorithms. Both the bounds depend on the
Gaussian width of the unit ball of the dual norm. We also show that objective
perturbation of the risk minimization problems is equivalent to the output
perturbation of a dual optimization problem. This is the first work that
analyzes the dual optimization problems of risk minimization problems in the
context of differential privacy.
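For intuition, the sketch below shows a generic differentially private Frank-Wolfe iteration over an l1 ball, selecting each vertex from noisy linear scores. This is a primal illustration under an assumed noise calibration, not the dual-domain variant or the Gaussian-width analysis from the abstract.

```python
import numpy as np

def private_frank_wolfe(grad_fn, dim, radius, steps, noise_scale,
                        rng=np.random.default_rng()):
    """Sketch of a differentially private Frank-Wolfe loop over an l1
    ball, whose vertices are +/- radius * e_i. The linear-minimization
    step picks a vertex from Laplace-perturbed inner-product scores.
    noise_scale is a placeholder for the sensitivity/epsilon
    calibration a real privacy proof would fix."""
    w = np.zeros(dim)
    for t in range(steps):
        g = grad_fn(w)
        # score of vertex +r*e_i is r*g_i, of -r*e_i is -r*g_i
        scores = radius * np.concatenate([g, -g])
        scores += rng.laplace(scale=noise_scale, size=2 * dim)
        k = int(np.argmin(scores))
        vertex = np.zeros(dim)
        vertex[k % dim] = radius if k < dim else -radius
        gamma = 2.0 / (t + 2.0)          # standard Frank-Wolfe step size
        w = (1.0 - gamma) * w + gamma * vertex
    return w

# Usage: lasso-style least squares on synthetic data
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 20)), rng.normal(size=200)
w_hat = private_frank_wolfe(lambda w: X.T @ (X @ w - y) / len(y),
                            dim=20, radius=1.0, steps=50, noise_scale=0.1)
```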
Differentially Private Empirical Risk Minimization
Privacy-preserving machine learning algorithms are crucial for the
increasingly common setting in which personal data, such as medical or
financial records, are analyzed. We provide general techniques to produce
privacy-preserving approximations of classifiers learned via (regularized)
empirical risk minimization (ERM). These algorithms are private under the
ε-differential privacy definition due to Dwork et al. (2006). First, we
apply the output perturbation ideas of Dwork et al. (2006) to ERM
classification. Then we propose a new method, objective perturbation, for
privacy-preserving machine learning algorithm design. This method entails
perturbing the objective function before optimizing over classifiers. If the
loss and regularizer satisfy certain convexity and differentiability criteria,
we prove theoretical results showing that our algorithms preserve privacy, and
provide generalization bounds for linear and nonlinear kernels. We further
present a privacy-preserving technique for tuning the parameters in general
machine learning algorithms, thereby providing end-to-end privacy guarantees
for the training process. We apply these results to produce privacy-preserving
analogues of regularized logistic regression and support vector machines. We
obtain encouraging results from evaluating their performance on real
demographic and benchmark data sets. Our results show that both theoretically
and empirically, objective perturbation is superior to the previous
state-of-the-art, output perturbation, in managing the inherent tradeoff
between privacy and learning performance.

Comment: 40 pages, 7 figures, accepted to the Journal of Machine Learning
Research
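A simplified sketch of objective perturbation for l2-regularized logistic regression, in the spirit of this paper: a random linear term b^T w / n is added to the objective before optimizing. The adjusted-epsilon correction for the loss's smoothness constants is omitted here, so the noise scale should be read as an assumption rather than the paper's exact calibration.

```python
import numpy as np
from scipy.optimize import minimize

def objective_perturbation_logreg(X, y, lam, epsilon,
                                  rng=np.random.default_rng()):
    """Simplified objective perturbation for l2-regularized logistic
    regression. Assumes feature rows with ||x_i|| <= 1 and labels in
    {-1, +1}; the exact epsilon adjustment for the loss's smoothness
    constants is omitted (illustrative noise scale only)."""
    n, d = X.shape
    # Draw b with uniform direction and Gamma(d, 2/epsilon)-distributed norm.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = direction * rng.gamma(shape=d, scale=2.0 / epsilon)

    def perturbed_objective(w):
        margins = y * (X @ w)
        loss = np.mean(np.logaddexp(0.0, -margins))  # stable logistic loss
        return loss + 0.5 * lam * (w @ w) + (b @ w) / n

    return minimize(perturbed_objective, np.zeros(d)).x
```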
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset.
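For context, here is a minimal sketch of the private multiplicative weights mechanism for linear queries that this work generalizes: a synthetic histogram is updated multiplicatively whenever its answer disagrees too much with a noisy answer computed on the true data. The noise scale, threshold, and per-query budget accounting are placeholders, not the Hardt-Rothblum calibration.

```python
import numpy as np

def private_multiplicative_weights(hist, queries, epsilon, eta=0.1,
                                   threshold=0.05,
                                   rng=np.random.default_rng()):
    """Sketch of private multiplicative weights for linear queries.
    `hist` is the true dataset as a normalized histogram over a finite
    universe; each query q is a vector in [0, 1]^|universe| answered
    as <q, hist>. Noise scale, threshold, and budget use are
    simplified placeholders."""
    hist = np.asarray(hist, dtype=float)
    x = np.full(len(hist), 1.0 / len(hist))  # uniform synthetic histogram
    answers = []
    for q in queries:
        q = np.asarray(q, dtype=float)
        noisy_true = q @ hist + rng.laplace(scale=1.0 / epsilon)
        synth = q @ x
        if abs(noisy_true - synth) <= threshold:
            answers.append(synth)  # synthetic answer is already accurate
        else:
            sign = 1.0 if noisy_true > synth else -1.0
            x = x * np.exp(eta * sign * q)  # multiplicative weights update
            x /= x.sum()
            answers.append(q @ x)
    return answers
```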