Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy
Differential privacy is a promising privacy-preserving paradigm for
statistical query processing over sensitive data. It works by injecting random
noise into each query result, such that it is provably hard for the adversary
to infer the presence or absence of any individual record from the published
noisy results. The main objective in differentially private query processing is
to maximize the accuracy of the query results, while satisfying the privacy
guarantees. Previous work, notably \cite{LHR+10}, has suggested that with an
appropriate strategy, processing a batch of correlated queries as a whole
achieves considerably higher accuracy than answering them individually.
However, to our knowledge there is currently no practical solution to find such
a strategy for an arbitrary query batch; existing methods either return
strategies of poor quality (often worse than naive methods) or require
prohibitively expensive computations for even moderately large domains.
Motivated by this, we propose low-rank mechanism (LRM), the first practical
differentially private technique for answering batch linear queries with high
accuracy. LRM works for both exact (i.e., \epsilon-) and approximate (i.e.,
(\epsilon, \delta)-) differential privacy definitions. We derive the
utility guarantees of LRM, and provide guidance on how to set the privacy
parameters given the user's utility expectation. Extensive experiments using
real data demonstrate that our proposed method consistently outperforms
state-of-the-art query processing solutions under differential privacy, by
large margins. Comment: ACM Transactions on Database Systems (ACM TODS). arXiv admin note:
text overlap with arXiv:1212.230
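As a point of reference for the noise-injection idea in the abstract above, the following is a minimal sketch of the naive baseline, not of LRM itself: every linear query in the batch is answered with independent Laplace noise scaled to the L1 sensitivity of the whole batch. The function names, the histogram representation of the data, and the assumption that neighboring databases differ by one in a single histogram cell are illustrative choices, not taken from the paper.

```python
import random


def l1_sensitivity(matrix):
    """Largest column L1 norm: the most one histogram cell can shift the answers."""
    n_cols = len(matrix[0])
    return max(sum(abs(row[j]) for row in matrix) for j in range(n_cols))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def matvec(matrix, vector):
    """Exact answers of the linear queries (rows of matrix) on the histogram."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def noisy_batch_answers(workload, histogram, epsilon, rng):
    """Answer each query independently under epsilon-differential privacy (assumed setup)."""
    scale = l1_sensitivity(workload) / epsilon
    return [ans + laplace_noise(scale, rng) for ans in matvec(workload, histogram)]


if __name__ == "__main__":
    workload = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]  # three overlapping range sums
    print(noisy_batch_answers(workload, [10, 20, 30], 1.0, random.Random(0)))
```

Because the three queries overlap, the sensitivity (and therefore the noise scale) grows with the batch, which is the inefficiency that batch-aware strategies try to avoid.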
Convex Optimization for Linear Query Processing under Approximate Differential Privacy
Differential privacy enables organizations to collect accurate aggregates
over sensitive data with strong, rigorous guarantees on individuals' privacy.
Previous work has found that under differential privacy, computing multiple
correlated aggregates as a batch, using an appropriate \emph{strategy}, may
yield higher accuracy than computing each of them independently. However,
finding the best strategy that maximizes result accuracy is non-trivial, as it
involves solving a complex constrained optimization program that appears to be
non-linear and non-convex. Hence, in the past much effort has been devoted to
solving this non-convex optimization program. Existing approaches include
various sophisticated heuristics and expensive numerical solutions. None of
them, however, guarantees to find the optimal solution of this optimization
problem.
This paper points out that under (\epsilon, \delta)-differential privacy,
the optimal solution of the above constrained optimization problem in search of
a suitable strategy can be found, rather surprisingly, by solving a simple and
elegant convex optimization program. Then, we propose an efficient algorithm
based on Newton's method, which we prove to always converge to the optimal
solution with linear global convergence rate and quadratic local convergence
rate. Empirical evaluations demonstrate the accuracy and efficiency of the
proposed solution. Comment: to appear in ACM SIGKDD 201
Efficient Batch Query Answering Under Differential Privacy
Differential privacy is a rigorous privacy condition achieved by randomizing
query answers. This paper develops efficient algorithms for answering multiple
queries under differential privacy with low error. We pursue this goal by
advancing a recent approach called the matrix mechanism, which generalizes
standard differentially private mechanisms. This new mechanism works by first
answering a different set of queries (a strategy) and then inferring the
answers to the desired workload of queries. Although a few strategies are known
to work well on specific workloads, finding the strategy which minimizes error
on an arbitrary workload is intractable. We prove a new lower bound on the
optimal error of this mechanism, and we propose an efficient algorithm that
approaches this bound for a wide range of workloads. Comment: 6 figures, 22 pages
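The two-step idea in the abstract above (answer a strategy privately, then infer the workload answers) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the strategy `strategy` and a reconstruction matrix `recon` with workload = recon x strategy are supplied by hand, whereas the matrix mechanism infers the workload answers from the noisy strategy answers and the paper searches for a good strategy. All names are hypothetical, and Laplace noise with one-cell neighboring histograms is assumed.

```python
import random


def l1_sensitivity(matrix):
    """Largest column L1 norm of a query matrix."""
    n_cols = len(matrix[0])
    return max(sum(abs(row[j]) for row in matrix) for j in range(n_cols))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def laplace_noise(scale, rng):
    """Laplace(0, scale) as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def strategy_answers(recon, strategy, histogram, epsilon, rng):
    """Perturb the strategy answers, then map them back to the workload."""
    scale = l1_sensitivity(strategy) / epsilon
    noisy = [v + laplace_noise(scale, rng) for v in matvec(strategy, histogram)]
    return matvec(recon, noisy)  # post-processing costs no extra privacy


if __name__ == "__main__":
    workload = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]  # total, left, right
    strategy = [[1, 1, 0, 0], [0, 0, 1, 1]]                # left, right only
    recon = [[1, 1], [1, 0], [0, 1]]                       # total = left + right
    assert matmul(recon, strategy) == workload
    print(l1_sensitivity(workload), l1_sensitivity(strategy))
    print(strategy_answers(recon, strategy, [5, 3, 2, 7], 0.5, random.Random(1)))
```

In this toy case the strategy has half the sensitivity of the workload, so each strategy answer receives less noise, and the reconstructed answers are mutually consistent (the noisy total equals the sum of the noisy halves).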
An Adaptive Mechanism for Accurate Query Answering under Differential Privacy
We propose a novel mechanism for answering sets of counting queries under
differential privacy. Given a workload of counting queries, the mechanism
automatically selects a different set of "strategy" queries to answer
privately, using those answers to derive answers to the workload. The main
algorithm proposed in this paper approximates the optimal strategy for any
workload of linear counting queries. With no cost to the privacy guarantee, the
mechanism improves significantly on prior approaches and achieves near-optimal
error for many workloads, when applied under (\epsilon, \delta)-differential
privacy. The result is an adaptive mechanism which can help users achieve good
utility without requiring that they reason carefully about the best formulation
of their task. Comment: VLDB2012. arXiv admin note: substantial text overlap with
arXiv:1103.136
Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy
Differential privacy is a promising privacy-preserving paradigm for
statistical query processing over sensitive data. It works by injecting random
noise into each query result, such that it is provably hard for the adversary
to infer the presence or absence of any individual record from the published
noisy results. The main objective in differentially private query processing is
to maximize the accuracy of the query results, while satisfying the privacy
guarantees. Previous work, notably the matrix mechanism, has suggested that
processing a batch of correlated queries as a whole can potentially achieve
considerable accuracy gains, compared to answering them individually. However,
as we point out in this paper, the matrix mechanism is mainly of theoretical
interest; in particular, several inherent problems in its design limit its
accuracy in practice, which almost never exceeds that of naive methods. In
fact, we are not aware of any existing solution that can effectively optimize a
query batch under differential privacy. Motivated by this, we propose the
Low-Rank Mechanism (LRM), the first practical differentially private technique
for answering batch queries with high accuracy, based on a low rank
approximation of the workload matrix. We prove that the accuracy provided by
LRM is close to the theoretical lower bound for any mechanism to answer a batch
of queries under differential privacy. Extensive experiments using real data
demonstrate that LRM consistently outperforms state-of-the-art query processing
solutions under differential privacy, by large margins. Comment: VLDB201
Differentially Private Mixture of Generative Neural Networks
Generative models are used in a wide range of applications building on large
amounts of contextually rich information. Due to possible privacy violations of
the individuals whose data is used to train these models, however, publishing
or sharing generative models is not always viable. In this paper, we present a
novel technique for privately releasing generative models and entire
high-dimensional datasets produced by these models. We model the generator
distribution of the training data with a mixture of generative neural
networks. These are trained together and collectively learn the generator
distribution of a dataset. Data is divided into clusters, using a novel
differentially private kernel k-means, then each cluster is given to separate
generative neural networks, such as Restricted Boltzmann Machines or
Variational Autoencoders, which are trained only on their own cluster using
differentially private gradient descent. We evaluate our approach using the
MNIST dataset, as well as call detail records and transit datasets, showing
that it produces realistic synthetic samples, which can also be used to
accurately compute an arbitrary number of counting queries. Comment: A shorter version of this paper appeared at the 17th IEEE
International Conference on Data Mining (ICDM 2017). This is the full
version, published in IEEE Transactions on Knowledge and Data Engineering
(TKDE
MVG Mechanism: Differential Privacy under Matrix-Valued Query
Differential privacy mechanism design has traditionally been tailored for a
scalar-valued query function. Although many mechanisms such as the Laplace and
Gaussian mechanisms can be extended to a matrix-valued query function by adding
i.i.d. noise to each element of the matrix, this method is often suboptimal as
it forfeits an opportunity to exploit the structural characteristics typically
associated with matrix analysis. To address this challenge, we propose a novel
differential privacy mechanism called the Matrix-Variate Gaussian (MVG)
mechanism, which adds a matrix-valued noise drawn from a matrix-variate
Gaussian distribution, and we rigorously prove that the MVG mechanism preserves
(\epsilon, \delta)-differential privacy. Furthermore, we introduce the concept
of directional noise made possible by the design of the MVG mechanism.
Directional noise allows the impact of the noise on the utility of the
matrix-valued query function to be moderated. Finally, we experimentally
demonstrate the performance of our mechanism using three matrix-valued queries
on three privacy-sensitive datasets. We find that the MVG mechanism notably
outperforms four previous state-of-the-art approaches, and provides comparable
utility to the non-private baseline. Comment: Appeared in CCS'1
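To make "matrix-valued noise drawn from a matrix-variate Gaussian distribution" concrete, here is a minimal sampling sketch. It uses the standard construction Z = A G B^T, where G has i.i.d. standard normal entries, so that the row covariance is A A^T and the column covariance is B B^T. It shows only how such noise can be drawn; how the covariances must be calibrated to obtain the privacy guarantee is derived in the paper and is not reproduced here. The function and parameter names are hypothetical.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sample_matrix_variate_gaussian(row_factor, col_factor, rng):
    """Draw Z = A G B^T with G i.i.d. N(0, 1).

    row_factor (A) and col_factor (B) are assumed factors of the desired
    row covariance A A^T and column covariance B B^T.
    """
    inner_rows = len(row_factor[0])
    inner_cols = len(col_factor[0])
    g = [[rng.gauss(0.0, 1.0) for _ in range(inner_cols)]
         for _ in range(inner_rows)]
    return matmul(matmul(row_factor, g), transpose(col_factor))


if __name__ == "__main__":
    # Unequal row scales: noise is concentrated in the first row direction,
    # loosely in the spirit of the "directional noise" described above.
    row_factor = [[2.0, 0.0], [0.0, 0.1]]
    col_factor = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    noise = sample_matrix_variate_gaussian(row_factor, col_factor, random.Random(7))
    for row in noise:
        print(row)
```

Adding i.i.d. noise to each element corresponds to the special case where both factors are scaled identity matrices.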
The Geometry of Differential Privacy: the Sparse and Approximate Cases
In this work, we study trade-offs between accuracy and privacy in the context
of linear queries over histograms. This is a rich class of queries that
includes contingency tables and range queries, and has been a focus of a long
line of work. For a set of $d$ linear queries over a database $x \in \mathbb{R}^N$, we
seek to find the differentially private mechanism that has the minimum mean
squared error. For pure differential privacy, an approximation to
the optimal mechanism is known. Our first contribution is to give an approximation guarantee for the case of (\eps,\delta)-differential
privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise
to the answers. We prove its approximation guarantee relative to the hereditary
discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex
geometry.
We next consider this question in the case when the number of queries exceeds
the number of individuals in the database, i.e. when $d > n$ for $d$ queries and $n$ individuals. It is known that better mechanisms exist in this setting. Our second
main contribution is to give an (\eps,\delta)-differentially private
mechanism which is optimal up to a \polylog(d,N) factor for any given query
set and any given upper bound on the database size. This approximation is
achieved by coupling the Gaussian noise addition approach with a linear
regression step. We give an analogous result for the \eps-differential
privacy setting. We also improve on the mean squared error upper bound for
answering counting queries on a database of size $n$ by Blum, Ligett, and Roth,
and match the lower bound implied by the work of Dinur and Nissim up to
logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism
enables us to derive the first polylogarithmic approximation to the hereditary
discrepancy of a matrix
- …