Fast Private Data Release Algorithms for Sparse Queries
We revisit the problem of accurately answering large classes of statistical
queries while preserving differential privacy. Previous approaches to this
problem have either been very general but have not had run-time polynomial in
the size of the database, have applied only to very limited classes of queries,
or have relaxed the notion of worst-case error guarantees. In this paper we
consider the large class of sparse queries, which take non-zero values on only
polynomially many universe elements. We give efficient query release algorithms
for this class, in both the interactive and the non-interactive setting. Our
algorithms also achieve better accuracy bounds than previous general techniques
do when applied to sparse queries: our bounds are independent of the universe
size. In fact, even the runtime of our interactive mechanism is independent of
the universe size, and so can be implemented in the "infinite universe" model
in which no finite universe need be specified by the data curator.
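To make the sparse-query setting concrete, here is a minimal illustrative sketch (not the paper's mechanism) of answering a single sparse counting query with the standard Laplace mechanism. The query is represented only by its polynomially many non-zero universe elements, so storage and runtime are independent of the universe size; the function names and the dictionary representation are hypothetical.

    import math
    import random

    def laplace_noise(scale):
        # Draw one sample from the Laplace(0, scale) distribution.
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def answer_sparse_query(records, query_support, epsilon):
        # records: the curator's database, a list of universe elements.
        # query_support: dict mapping the query's non-zero universe elements
        # to values in [0, 1]; every other element is implicitly 0, so the
        # universe itself never needs to be enumerated.
        n = len(records)
        true_answer = sum(query_support.get(x, 0.0) for x in records) / n
        # Changing one record moves the average by at most 1/n, so Laplace
        # noise of scale 1/(epsilon * n) gives epsilon-differential privacy
        # for this single query.
        return true_answer + laplace_noise(1.0 / (epsilon * n))

Answering many queries this way would require composing the privacy losses; the paper's algorithms are designed precisely to do better than such naive composition.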
Differential Privacy for the Analyst via Private Equilibrium Computation
We give new mechanisms for answering exponentially many queries from multiple
analysts on a private database, while protecting differential privacy both for
the individuals in the database and for the analysts. That is, our mechanism's
answer to each query is nearly insensitive to changes in the queries asked by
other analysts. Our mechanism is the first to offer differential privacy on the
joint distribution over analysts' answers, providing privacy for data analysts
even if the other data analysts collude or register multiple accounts. In some
settings, we are able to achieve nearly optimal error rates (even compared to
mechanisms which do not offer analyst privacy), and we are able to extend our
techniques to handle non-linear queries. Our analysis is based on a novel view
of the private query-release problem as a two-player zero-sum game, which may
be of independent interest.
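For context, a common way in the query-release literature to cast the problem as a two-player zero-sum game (stated generically here, not necessarily in the exact form used in this paper) is the following: a data player chooses a distribution p over the data universe X, serving as a synthetic database, a query player chooses a query q from the class Q, and the query player's payoff is the error of the synthetic data on that query,

    \mathrm{payoff}(p, q) \;=\; \bigl|\, q(D) - \mathbb{E}_{x \sim p}[q(x)] \,\bigr| .

An approximate equilibrium strategy for the data player is then a synthetic distribution that is simultaneously accurate on every query in Q, so computing such an equilibrium under differential privacy yields a private query-release mechanism.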
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset.
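For reference, the sketch below shows the private multiplicative weights mechanism for linear queries (the Hardt-Rothblum setting this work extends), not the convex-minimization extension itself; the step size, noise scale, and threshold are placeholder values rather than the calibrated parameters from the analysis.

    import math
    import random

    def laplace_noise(scale):
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def private_mw(true_hist, queries, eta=0.05, noise_scale=0.01, threshold=0.05):
        # true_hist: normalized histogram of the sensitive data over the universe.
        # queries: linear queries, each a list of values in [0, 1], one per
        # universe element.  Returns one answer per query.
        m = len(true_hist)
        x = [1.0 / m] * m                      # public synthetic histogram
        answers = []
        for q in queries:
            true_ans = sum(qi * hi for qi, hi in zip(q, true_hist))
            synth_ans = sum(qi * xi for qi, xi in zip(q, x))
            noisy_err = (true_ans - synth_ans) + laplace_noise(noise_scale)
            if abs(noisy_err) <= threshold:
                # "Lazy" round: the synthetic histogram is already accurate.
                answers.append(synth_ans)
                continue
            # "Update" round: answer with the noisy true value and push the
            # synthetic histogram toward agreement with it.
            answers.append(synth_ans + noisy_err)
            sign = 1.0 if noisy_err > 0 else -1.0
            x = [xi * math.exp(eta * sign * qi) for xi, qi in zip(x, q)]
            total = sum(x)
            x = [xi / total for xi in x]
        return answers

The privacy analysis hinges on the fact that only the (few) update rounds consume the privacy budget, which is what lets the mechanism answer exponentially many queries.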
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is, up to polynomial factors, equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to all Boolean conjunctions with 1%
average error. This represents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms.
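As a concrete illustration of the access model underlying these results, here is a small sketch of a statistical query (SQ) oracle and of how a Boolean conjunction is posed as a statistical query; the class and function names are made up for the example, and the oracle models the SQ tolerance simply as bounded random perturbation.

    import random

    class SQOracle:
        # Answers statistical queries E_{x ~ D}[phi(x)] over a dataset of
        # Boolean attribute vectors, up to an additive tolerance tau.
        def __init__(self, dataset, tau=0.01):
            self.dataset = dataset
            self.tau = tau

        def query(self, phi):
            avg = sum(phi(x) for x in self.dataset) / len(self.dataset)
            return avg + random.uniform(-self.tau, self.tau)

    def conjunction(attrs):
        # The Boolean conjunction over the given attribute indices, viewed
        # as a {0,1}-valued predicate and hence as a statistical query.
        return lambda x: 1.0 if all(x[i] for i in attrs) else 0.0

    # Example: the fraction of records satisfying x[0] AND x[2]:
    #   oracle = SQOracle(dataset)
    #   oracle.query(conjunction([0, 2]))

Exhaustively issuing one such query per conjunction is the trivial solution mentioned above; the results here characterize when substantially fewer statistical queries suffice.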
Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
This paper has been replaced with http://digitalcommons.ilr.cornell.edu/ldi/37.
We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
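For readers less familiar with the privacy technology referenced above, the standard definitions of (ε, δ)-differential privacy and of (α, β)-accuracy for a query-release mechanism M over a query class Q are, in their usual textbook form (the paper's production possibilities frontier additionally relies on the specific accuracy bound of the Private Multiplicative Weights mechanism, not reproduced here):

    \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
    \quad\text{for all neighboring databases } D, D' \text{ and all output sets } S,

    \Pr\Bigl[\,\max_{q \in Q}\ \bigl|M(D)(q) - q(D)\bigr| \le \alpha \Bigr] \;\ge\; 1 - \beta .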
Privacy-Preserving Public Information for Sequential Games
In settings with incomplete information, players can find it difficult to
coordinate to find states with good social welfare. For example, in financial
settings, if a collection of financial firms have limited information about
each other's strategies, some large number of them may choose the same
high-risk investment in hopes of high returns. While this might be acceptable
in some cases, the economy can be hurt badly if many firms make investments in
the same risky market segment and it fails. One reason why many firms might end
up choosing the same segment is that they do not have information about other
firms' investments (imperfect information may lead to 'bad' game states).
Directly reporting all players' investments, however, raises confidentiality
concerns for both individuals and institutions.
In this paper, we explore whether information about the game-state can be
publicly announced in a manner that maintains the privacy of the actions of the
players, and still suffices to deter players from reaching bad game-states. We
show that in many games of interest, it is possible for players to avoid these
bad states with the help of privacy-preserving, publicly-announced information.
We model the behavior of players in this imperfect-information setting in two
ways, as greedy and as undominated strategic behavior, and we prove guarantees on
social welfare that certain kinds of privacy-preserving information can help
attain. Furthermore, we design a counter with improved privacy guarantees under
continual observation.
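The counter under continual observation mentioned at the end builds on the well-known binary-tree (tree-aggregation) technique of Chan, Shi, and Song and of Dwork et al.; the sketch below shows that baseline technique rather than the improved counter constructed in the paper, with simplified parameters.

    import math
    import random

    def laplace_noise(scale):
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def tree_counter(stream, epsilon):
        # stream: list of 0/1 values arriving over T time steps.
        # Returns a noisy running count after every step.  Each bit affects at
        # most `levels` tree nodes, so Laplace noise of scale levels/epsilon
        # per node keeps the entire sequence of outputs epsilon-differentially
        # private, while the error at each step is only polylogarithmic in T.
        T = len(stream)
        levels = int(math.log2(T)) + 1 if T > 1 else 1
        scale = levels / epsilon
        noisy_node = {}                  # (level, index) -> noisy partial sum
        outputs = []
        for t in range(1, T + 1):
            # Decompose [1, t] into at most `levels` dyadic intervals; reuse
            # the noisy partial sum of each interval, sampling its noise once.
            total, pos = 0.0, 0
            for j in reversed(range(levels)):
                block = 1 << j
                if pos + block <= t:
                    key = (j, pos // block)
                    if key not in noisy_node:
                        noisy_node[key] = sum(stream[pos:pos + block]) + laplace_noise(scale)
                    total += noisy_node[key]
                    pos += block
            outputs.append(total)
        return outputs

Sampling each node's noise exactly once and reusing it across time steps is what keeps the privacy loss bounded over the whole stream; the counter designed in the paper tightens the guarantees of this baseline.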