
    Beating Randomized Response on Incoherent Matrices

    Computing accurate low-rank approximations of large matrices is a fundamental data mining task. In many applications, however, the matrix contains sensitive information about individuals. In such cases we would like to release a low-rank approximation that satisfies a strong privacy guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this task that satisfies differential privacy is based on naive input perturbation or randomized response: each entry of the matrix is perturbed independently by a sufficiently large random noise variable, and a low-rank approximation is then computed on the resulting matrix. We give the first significant improvements in accuracy over randomized response under the natural and necessary assumption that the matrix has low coherence. Our algorithm is also very efficient and finds a constant-rank approximation of an m x n matrix in time O(mn). Note that even generating the noise matrix required for randomized response already takes time O(mn).
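
    As a point of reference, the randomized-response baseline described above can be sketched in a few lines of Python: perturb every entry with independent Gaussian noise, then truncate the SVD. The noise scale below assumes unit sensitivity per entry and is illustrative rather than the paper's exact calibration; the full SVD used here is also costlier than the O(mn) algorithm the abstract claims.

    import numpy as np

    def randomized_response_low_rank(A, k, eps, delta, rng=None):
        rng = np.random.default_rng(rng)
        # Gaussian-mechanism scale, assuming changing one entry moves A by at most 1
        sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        noisy = A + rng.normal(scale=sigma, size=A.shape)   # O(mn) noise generation
        U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]               # best rank-k approximation of the noisy matrix

    # Example: a rank-2 private approximation of a random 100 x 50 matrix
    A = np.random.default_rng(0).random((100, 50))
    A_hat = randomized_response_low_rank(A, k=2, eps=1.0, delta=1e-6)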

    Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy

    Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for an adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably the matrix mechanism, has suggested that processing a batch of correlated queries as a whole can achieve considerable accuracy gains compared to answering them individually. However, as we point out in this paper, the matrix mechanism is mainly of theoretical interest; in particular, several inherent problems in its design limit its accuracy in practice, which almost never exceeds that of naive methods. In fact, we are not aware of any existing solution that can effectively optimize a query batch under differential privacy. Motivated by this, we propose the Low-Rank Mechanism (LRM), the first practical differentially private technique for answering batch queries with high accuracy, based on a low-rank approximation of the workload matrix. We prove that the accuracy provided by LRM is close to the theoretical lower bound for any mechanism to answer a batch of queries under differential privacy. Extensive experiments using real data demonstrate that LRM consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins.
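
    The core idea behind LRM can be conveyed with a minimal sketch (our simplification, not the paper's optimization procedure): factor the workload matrix W into B @ L with small rank r, answer the r intermediate queries L @ x under Laplace noise, and recombine with B. The SVD-based factorization below is an assumed stand-in for LRM's optimized decomposition.

    import numpy as np

    def low_rank_answer(W, x, r, eps, rng=None):
        rng = np.random.default_rng(rng)
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        B, L = U[:, :r] * s[:r], Vt[:r, :]        # W ~= B @ L, with B: d x r and L: r x n
        sens = np.abs(L).sum(axis=0).max()        # L1 sensitivity of L @ x: max column L1 norm
        noisy = L @ x + rng.laplace(scale=sens / eps, size=r)
        return B @ noisy                          # noisy answers to all d queries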

    Wishart Mechanism for Differentially Private Principal Components Analysis

    We propose a new input perturbation mechanism for publishing a covariance matrix that achieves $(\epsilon, 0)$-differential privacy. Our mechanism uses a Wishart distribution to generate matrix noise. In particular, we apply this mechanism to principal component analysis. Our mechanism is able to preserve the positive semi-definiteness of the published covariance matrix. Thus, our approach gives rise to a general publishing framework for input perturbation of a symmetric positive semidefinite matrix. Moreover, compared with the classic Laplace mechanism, our method has a better utility guarantee. To the best of our knowledge, the Wishart mechanism is the best input perturbation approach for $(\epsilon, 0)$-differentially private PCA. We also compare our work with previous exponential mechanism algorithms in the literature and provide a near-optimal bound while offering more flexibility and lower computational cost.
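
    A minimal sketch of the input-perturbation step, assuming a Wishart noise matrix with d+1 degrees of freedom and an illustrative scale (not the paper's exact calibration): since a Wishart sample is positive semidefinite by construction, the published covariance stays PSD.

    import numpy as np

    def wishart_perturb(C, eps, rng=None):
        rng = np.random.default_rng(rng)
        d = C.shape[0]
        scale = 1.0 / eps                                     # assumed noise scale
        G = rng.normal(scale=np.sqrt(scale), size=(d, d + 1))
        return C + G @ G.T                                    # Wishart(d+1, scale*I) noise, PSD by construction

    # Private PCA: eigendecompose the perturbed covariance
    C = np.cov(np.random.default_rng(1).random((200, 5)), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(wishart_perturb(C, eps=1.0))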

    Near-Optimal Algorithms for Differentially-Private Principal Components

    Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.
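
    For intuition, exponential-mechanism approaches to private PCA score a candidate direction v by the variance it captures, v^T C v, and sample v with probability proportional to exp(eps * n * v^T C v / 2). The toy random-walk sampler below conveys the flavor only; the paper uses a dedicated sampler with formal guarantees, and this proposal is not exactly symmetric.

    import numpy as np

    def private_top_direction(C, eps, n, steps=5000, step=0.1, rng=None):
        rng = np.random.default_rng(rng)
        v = rng.normal(size=C.shape[0])
        v /= np.linalg.norm(v)
        score = lambda u: 0.5 * eps * n * (u @ C @ u)
        for _ in range(steps):
            w = v + step * rng.normal(size=v.shape)         # random-walk proposal on the sphere
            w /= np.linalg.norm(w)
            if np.log(rng.random()) < score(w) - score(v):  # Metropolis accept step
                v = w
        return v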

    The Geometry of Differential Privacy: the Sparse and Approximate Cases

    In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \mathbb{R}^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\epsilon,\delta)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\epsilon,\delta)$-differentially private mechanism which is optimal up to a $\mathrm{polylog}(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\epsilon$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.
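
    The "Gaussian noise plus regression" recipe in the second contribution admits a short sketch, with two simplifications we are making for illustration: the noise here is i.i.d. rather than the correlated noise shaped by the convex-geometric analysis, and the plain least-squares step ignores the known bound $n$ on $\|x\|_1$, which a faithful implementation would enforce with a constrained solver.

    import numpy as np

    def gaussian_plus_regression(A, x, eps, delta, rng=None):
        rng = np.random.default_rng(rng)
        sens = np.linalg.norm(A, axis=0).max()                # L2 sensitivity: max column norm of A
        sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        y = A @ x + rng.normal(scale=sigma, size=A.shape[0])  # i.i.d. Gaussian noise (simplified)
        x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)         # regression step back onto range(A)
        return A @ x_hat                                      # mutually consistent noisy answers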