
    Beating Randomized Response on Incoherent Matrices

    Computing accurate low-rank approximations of large matrices is a fundamental data mining task. In many applications, however, the matrix contains sensitive information about individuals. In such cases we would like to release a low-rank approximation that satisfies a strong privacy guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this task that satisfies differential privacy is based on naive input perturbation or randomized response: each entry of the matrix is perturbed independently by a sufficiently large random noise variable, and a low-rank approximation is then computed on the resulting matrix. We give the first significant improvements in accuracy over randomized response under the natural and necessary assumption that the matrix has low coherence. Our algorithm is also very efficient and finds a constant-rank approximation of an m x n matrix in time O(mn). Note that even generating the noise matrix required for randomized response already takes time O(mn).
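
    As a point of reference, the randomized-response baseline described above can be sketched in a few lines of Python: perturb every entry with independent Gaussian noise, then truncate the SVD. The noise scale below assumes unit sensitivity per entry and is illustrative rather than the paper's exact calibration; the full SVD used here is also costlier than the O(mn) algorithm the abstract claims.

    import numpy as np

    def randomized_response_low_rank(A, k, eps, delta, rng=None):
        rng = np.random.default_rng(rng)
        # Gaussian-mechanism scale, assuming changing one entry moves A by at most 1
        sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        noisy = A + rng.normal(scale=sigma, size=A.shape)   # O(mn) noise generation
        U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]               # best rank-k approximation of the noisy matrix

    # Example: a rank-2 private approximation of a random 100 x 50 matrix
    A = np.random.default_rng(0).random((100, 50))
    A_hat = randomized_response_low_rank(A, k=2, eps=1.0, delta=1e-6)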

    Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy

    Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for an adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably the matrix mechanism, has suggested that processing a batch of correlated queries as a whole can achieve considerable accuracy gains compared to answering them individually. However, as we point out in this paper, the matrix mechanism is mainly of theoretical interest; in particular, several inherent problems in its design limit its accuracy in practice, which almost never exceeds that of naive methods. In fact, we are not aware of any existing solution that can effectively optimize a query batch under differential privacy. Motivated by this, we propose the Low-Rank Mechanism (LRM), the first practical differentially private technique for answering batch queries with high accuracy, based on a low-rank approximation of the workload matrix. We prove that the accuracy provided by LRM is close to the theoretical lower bound for any mechanism to answer a batch of queries under differential privacy. Extensive experiments using real data demonstrate that LRM consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins.
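
    The core idea behind LRM can be conveyed with a minimal sketch (our simplification, not the paper's optimization procedure): factor the workload matrix W into B @ L with small rank r, answer the r intermediate queries L @ x under Laplace noise, and recombine with B. The SVD-based factorization below is an assumed stand-in for LRM's optimized decomposition.

    import numpy as np

    def low_rank_answer(W, x, r, eps, rng=None):
        rng = np.random.default_rng(rng)
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        B, L = U[:, :r] * s[:r], Vt[:r, :]        # W ~= B @ L, with B: d x r and L: r x n
        sens = np.abs(L).sum(axis=0).max()        # L1 sensitivity of L @ x: max column L1 norm
        noisy = L @ x + rng.laplace(scale=sens / eps, size=r)
        return B @ noisy                          # noisy answers to all d queries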

    Wishart Mechanism for Differentially Private Principal Components Analysis

    We propose a new input perturbation mechanism for publishing a covariance matrix that achieves $(\epsilon, 0)$-differential privacy. Our mechanism uses a Wishart distribution to generate matrix noise. In particular, we apply this mechanism to principal component analysis. Our mechanism is able to preserve the positive semi-definiteness of the published covariance matrix. Thus, our approach gives rise to a general publishing framework for input perturbation of a symmetric positive semidefinite matrix. Moreover, compared with the classic Laplace mechanism, our method has a better utility guarantee. To the best of our knowledge, the Wishart mechanism is the best input perturbation approach for $(\epsilon, 0)$-differentially private PCA. We also compare our work with previous exponential mechanism algorithms in the literature and provide a near-optimal bound while offering more flexibility and lower computational cost.
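
    A minimal sketch of the input-perturbation step, assuming a Wishart noise matrix with d+1 degrees of freedom and an illustrative scale (not the paper's exact calibration): since a Wishart sample is positive semidefinite by construction, the published covariance stays PSD.

    import numpy as np

    def wishart_perturb(C, eps, rng=None):
        rng = np.random.default_rng(rng)
        d = C.shape[0]
        scale = 1.0 / eps                                     # assumed noise scale
        G = rng.normal(scale=np.sqrt(scale), size=(d, d + 1))
        return C + G @ G.T                                    # Wishart(d+1, scale*I) noise, PSD by construction

    # Private PCA: eigendecompose the perturbed covariance
    C = np.cov(np.random.default_rng(1).random((200, 5)), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(wishart_perturb(C, eps=1.0))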

    Near-Optimal Algorithms for Differentially-Private Principal Components

    Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.
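
    For intuition, exponential-mechanism approaches to private PCA score a candidate direction v by the variance it captures, v^T C v, and sample v with probability proportional to exp(eps * n * v^T C v / 2). The toy random-walk sampler below conveys the flavor only; the paper uses a dedicated sampler with formal guarantees, and this proposal is not exactly symmetric.

    import numpy as np

    def private_top_direction(C, eps, n, steps=5000, step=0.1, rng=None):
        rng = np.random.default_rng(rng)
        v = rng.normal(size=C.shape[0])
        v /= np.linalg.norm(v)
        score = lambda u: 0.5 * eps * n * (u @ C @ u)
        for _ in range(steps):
            w = v + step * rng.normal(size=v.shape)         # random-walk proposal on the sphere
            w /= np.linalg.norm(w)
            if np.log(rng.random()) < score(w) - score(v):  # Metropolis accept step
                v = w
        return v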

    The Geometry of Differential Privacy: the Sparse and Approximate Cases

    In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \mathbb{R}^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\epsilon,\delta)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\epsilon,\delta)$-differentially private mechanism which is optimal up to a $\mathrm{polylog}(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\epsilon$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.
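
    The "Gaussian noise plus regression" recipe in the second contribution admits a short sketch, with two simplifications we are making for illustration: the noise here is i.i.d. rather than the correlated noise shaped by the convex-geometric analysis, and the plain least-squares step ignores the known bound $n$ on $\|x\|_1$, which a faithful implementation would enforce with a constrained solver.

    import numpy as np

    def gaussian_plus_regression(A, x, eps, delta, rng=None):
        rng = np.random.default_rng(rng)
        sens = np.linalg.norm(A, axis=0).max()                # L2 sensitivity: max column norm of A
        sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        y = A @ x + rng.normal(scale=sigma, size=A.shape[0])  # i.i.d. Gaussian noise (simplified)
        x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)         # regression step back onto range(A)
        return A @ x_hat                                      # mutually consistent noisy answers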