5,975 research outputs found
Beating Randomized Response on Incoherent Matrices
Computing accurate low rank approximations of large matrices is a fundamental
data mining task. In many applications however the matrix contains sensitive
information about individuals. In such case we would like to release a low rank
approximation that satisfies a strong privacy guarantee such as differential
privacy. Unfortunately, to date the best known algorithm for this task that
satisfies differential privacy is based on naive input perturbation or
randomized response: Each entry of the matrix is perturbed independently by a
sufficiently large random noise variable, a low rank approximation is then
computed on the resulting matrix.
We give (the first) significant improvements in accuracy over randomized
response under the natural and necessary assumption that the matrix has low
coherence. Our algorithm is also very efficient and finds a constant rank
approximation of an m x n matrix in time O(mn). Note that even generating the
noise matrix required for randomized response already requires time O(mn)
Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy
Differential privacy is a promising privacy-preserving paradigm for
statistical query processing over sensitive data. It works by injecting random
noise into each query result, such that it is provably hard for the adversary
to infer the presence or absence of any individual record from the published
noisy results. The main objective in differentially private query processing is
to maximize the accuracy of the query results, while satisfying the privacy
guarantees. Previous work, notably the matrix mechanism, has suggested that
processing a batch of correlated queries as a whole can potentially achieve
considerable accuracy gains, compared to answering them individually. However,
as we point out in this paper, the matrix mechanism is mainly of theoretical
interest; in particular, several inherent problems in its design limit its
accuracy in practice, which almost never exceeds that of naive methods. In
fact, we are not aware of any existing solution that can effectively optimize a
query batch under differential privacy. Motivated by this, we propose the
Low-Rank Mechanism (LRM), the first practical differentially private technique
for answering batch queries with high accuracy, based on a low rank
approximation of the workload matrix. We prove that the accuracy provided by
LRM is close to the theoretical lower bound for any mechanism to answer a batch
of queries under differential privacy. Extensive experiments using real data
demonstrate that LRM consistently outperforms state-of-the-art query processing
solutions under differential privacy, by large margins.Comment: VLDB201
Wishart Mechanism for Differentially Private Principal Components Analysis
We propose a new input perturbation mechanism for publishing a covariance
matrix to achieve -differential privacy. Our mechanism uses a
Wishart distribution to generate matrix noise. In particular, We apply this
mechanism to principal component analysis. Our mechanism is able to keep the
positive semi-definiteness of the published covariance matrix. Thus, our
approach gives rise to a general publishing framework for input perturbation of
a symmetric positive semidefinite matrix. Moreover, compared with the classic
Laplace mechanism, our method has better utility guarantee. To the best of our
knowledge, Wishart mechanism is the best input perturbation approach for
-differentially private PCA. We also compare our work with
previous exponential mechanism algorithms in the literature and provide near
optimal bound while having more flexibility and less computational
intractability.Comment: A full version with technical proofs. Accepted to AAAI-1
Near-Optimal Algorithms for Differentially-Private Principal Components
Principal components analysis (PCA) is a standard tool for identifying good
low-dimensional approximations to data in high dimension. Many data sets of
interest contain private or sensitive information about individuals. Algorithms
which operate on such data should be sensitive to the privacy risks in
publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we
investigate the theory and empirical performance of differentially private
approximations to PCA and propose a new method which explicitly optimizes the
utility of the output. We show that the sample complexity of the proposed
method differs from the existing procedure in the scaling with the data
dimension, and that our method is nearly optimal in terms of this scaling. We
furthermore illustrate our results, showing that on real data there is a large
performance gap between the existing method and our method.Comment: 37 pages, 8 figures; final version to appear in the Journal of
Machine Learning Research, preliminary version was at NIPS 201
The Geometry of Differential Privacy: the Sparse and Approximate Cases
In this work, we study trade-offs between accuracy and privacy in the context
of linear queries over histograms. This is a rich class of queries that
includes contingency tables and range queries, and has been a focus of a long
line of work. For a set of linear queries over a database , we
seek to find the differentially private mechanism that has the minimum mean
squared error. For pure differential privacy, an approximation to
the optimal mechanism is known. Our first contribution is to give an approximation guarantee for the case of (\eps,\delta)-differential
privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise
to the answers. We prove its approximation guarantee relative to the hereditary
discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex
geometry.
We next consider this question in the case when the number of queries exceeds
the number of individuals in the database, i.e. when . It is known that better mechanisms exist in this setting. Our second
main contribution is to give an (\eps,\delta)-differentially private
mechanism which is optimal up to a \polylog(d,N) factor for any given query
set and any given upper bound on . This approximation is
achieved by coupling the Gaussian noise addition approach with a linear
regression step. We give an analogous result for the \eps-differential
privacy setting. We also improve on the mean squared error upper bound for
answering counting queries on a database of size by Blum, Ligett, and Roth,
and match the lower bound implied by the work of Dinur and Nissim up to
logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism
enables us to derive the first polylogarithmic approximation to the hereditary
discrepancy of a matrix
- …