7,461 research outputs found
A simple and practical algorithm for differentially private data release
We present new theoretical results on differentially private data release useful with respect to any target class of counting queries, coupled with experimental results on a variety of real world data sets.
Specifically, we study a simple combination of the multiplicative weights approach of [Hardt and Rothblum, 2010] with the exponential mechanism of [McSherry and Talwar, 2007]. The multiplicative weights framework allows us to maintain and improve a distribution approximating a given data set with respect to a set of counting queries. We use the exponential mechanism to select those queries most incorrectly tracked by the current distribution. Combing the two, we quickly approach a distribution that agrees with the data set on the given set of queries up to small error.
The resulting algorithm and its analysis is simple, but nevertheless improves upon previous work in terms of both error and running time. We also empirically demonstrate the practicality of our approach on several data sets commonly used in the statistical community for contingency table release
Differentially Private Model Selection with Penalized and Constrained Likelihood
In statistical disclosure control, the goal of data analysis is twofold: The
released information must provide accurate and useful statistics about the
underlying population of interest, while minimizing the potential for an
individual record to be identified. In recent years, the notion of differential
privacy has received much attention in theoretical computer science, machine
learning, and statistics. It provides a rigorous and strong notion of
protection for individuals' sensitive information. A fundamental question is
how to incorporate differential privacy into traditional statistical inference
procedures. In this paper we study model selection in multivariate linear
regression under the constraint of differential privacy. We show that model
selection procedures based on penalized least squares or likelihood can be made
differentially private by a combination of regularization and randomization,
and propose two algorithms to do so. We show that our private procedures are
consistent under essentially the same conditions as the corresponding
non-private procedures. We also find that under differential privacy, the
procedure becomes more sensitive to the tuning parameters. We illustrate and
evaluate our method using simulation studies and two real data examples
Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output
We consider accurately answering smooth queries while preserving differential
privacy. A query is said to be -smooth if it is specified by a function
defined on whose partial derivatives up to order are all
bounded. We develop an -differentially private mechanism for the
class of -smooth queries. The major advantage of the algorithm is that it
outputs a synthetic database. In real applications, a synthetic database output
is appealing. Our mechanism achieves an accuracy of , and runs in polynomial time. We also
generalize the mechanism to preserve -differential privacy
with slightly improved accuracy. Extensive experiments on benchmark datasets
demonstrate that the mechanisms have good accuracy and are efficient
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become
commonplace, e.g., for application telemetry, error reporting, or demographic
profiling. This paper describes a principled systems architecture---Encode,
Shuffle, Analyze (ESA)---for performing such monitoring with high utility while
also protecting user privacy. The ESA design, and its Prochlo implementation,
are informed by our practical experiences with an existing, large deployment of
privacy-preserving software monitoring.
(cont.; see the paper
Learning Coverage Functions and Private Release of Marginals
We study the problem of approximating and learning coverage functions. A
function is a coverage function, if
there exists a universe with non-negative weights for each
and subsets of such that . Alternatively, coverage functions can be described
as non-negative linear combinations of monotone disjunctions. They are a
natural subclass of submodular functions and arise in a number of applications.
We give an algorithm that for any , given random and uniform
examples of an unknown coverage function , finds a function that
approximates within factor on all but -fraction of the
points in time . This is the first fully-polynomial
algorithm for learning an interesting class of functions in the demanding PMAC
model of Balcan and Harvey (2011). Our algorithms are based on several new
structural properties of coverage functions. Using the results in (Feldman and
Kothari, 2014), we also show that coverage functions are learnable agnostically
with excess -error over all product and symmetric
distributions in time . In contrast, we show that,
without assumptions on the distribution, learning coverage functions is at
least as hard as learning polynomial-size disjoint DNF formulas, a class of
functions for which the best known algorithm runs in time
(Klivans and Servedio, 2004).
As an application of our learning results, we give simple
differentially-private algorithms for releasing monotone conjunction counting
queries with low average error. In particular, for any , we obtain
private release of -way marginals with average error in time
- …