23 research outputs found
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset
Tight Lower Bounds for Differentially Private Selection
A pervasive task in the differential privacy literature is to select the
items of "highest quality" out of a set of items, where the quality of each
item depends on a sensitive dataset that must be protected. Variants of this
task arise naturally in fundamental problems like feature selection and
hypothesis testing, and also as subroutines for many sophisticated
differentially private algorithms.
The standard approaches to these tasks---repeated use of the exponential
mechanism or the sparse vector technique---approximately solve this problem
given a dataset of samples. We provide a tight lower
bound for some very simple variants of the private selection problem. Our lower
bound shows that a sample of size is required
even to achieve a very minimal accuracy guarantee.
Our results are based on an extension of the fingerprinting method to sparse
selection problems. Previously, the fingerprinting method has been used to
provide tight lower bounds for answering an entire set of queries, but
often only some much smaller set of queries are relevant. Our extension
allows us to prove lower bounds that depend on both the number of relevant
queries and the total number of queries
Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning, where as common in practice, the data is not given at
once, but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk of , where is the
dimensionality of the data. While from the results of [Bassily et al. 2014]
this bound is tight in the worst-case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.Comment: To appear in PODS 201