    An Improved Private Mechanism for Small Databases

    We study the problem of answering a workload of linear queries $\mathcal{Q}$, on a database of size at most $n = o(|\mathcal{Q}|)$ drawn from a universe $\mathcal{U}$ under the constraint of (approximate) differential privacy. Nikolov, Talwar, and Zhang~\cite{NTZ} proposed an efficient mechanism that, for any given $\mathcal{Q}$ and $n$, answers the queries with average error that is at most a factor polynomial in $\log |\mathcal{Q}|$ and $\log |\mathcal{U}|$ worse than the best possible. Here we improve on this guarantee and give a mechanism whose competitiveness ratio is at most polynomial in $\log n$ and $\log |\mathcal{U}|$, and has no dependence on $|\mathcal{Q}|$. Our mechanism is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in place of an ad-hoc noise distribution, we use a distribution which is in a sense optimal for the projection mechanism, and analyze it using convex duality and the restricted invertibility principle. Comment: To appear in ICALP 2015, Track
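    As a rough illustration of the projection-mechanism template this abstract builds on (perturb the true answers, then project back onto the set of answers consistent with some database), here is a minimal numpy sketch. The standard Gaussian-mechanism noise and the Frank-Wolfe projection step are assumptions of the sketch, not the optimized noise distribution analyzed in the paper.

```python
import numpy as np

def projection_mechanism(A, hist, epsilon, delta, fw_steps=500, rng=None):
    """A: (k, U) query matrix with entries in [0, 1], one row per linear query.
    hist: length-U vector of nonnegative counts; n = hist.sum().
    Returns approximate normalized answers to all k queries."""
    rng = np.random.default_rng() if rng is None else rng
    k, U = A.shape
    n = hist.sum()
    y_true = A @ (hist / n)                       # true normalized answers

    # Gaussian mechanism: one person changes each normalized answer by at most
    # 1/n, so the L2 sensitivity of the answer vector is sqrt(k)/n.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(k) / (epsilon * n)
    y_noisy = y_true + rng.normal(0.0, sigma, size=k)

    # Post-processing: project y_noisy onto conv{A e_1, ..., A e_U}, the answer
    # vectors of one-row databases, using Frank-Wolfe over the simplex.
    p = np.full(U, 1.0 / U)
    for t in range(fw_steps):
        grad = A.T @ (A @ p - y_noisy)            # gradient of 0.5*||A p - y||^2
        i_star = int(np.argmin(grad))             # best simplex vertex
        gamma = 2.0 / (t + 2.0)
        p = (1.0 - gamma) * p
        p[i_star] += gamma
    return A @ p                                  # consistent, projected answers
```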

    Private Multiplicative Weights Beyond Linear Queries

    A wide variety of fundamental data analyses in machine learning, such as linear and logistic regression, require minimizing a convex function defined by the data. Since the data may contain sensitive information about individuals, and these analyses can leak that sensitive information, it is important to be able to solve convex minimization in a privacy-preserving way. A series of recent results show how to accurately solve a single convex minimization problem in a differentially private manner. However, the same data is often analyzed repeatedly, and little is known about solving multiple convex minimization problems with differential privacy. For simpler data analyses, such as linear queries, there are remarkable differentially private algorithms such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS 2010) that accurately answer exponentially many distinct queries. In this work, we extend these results to the case of convex minimization and show how to give accurate and differentially private solutions to *exponentially many* convex minimization problems on a sensitive dataset.
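    For context, the private multiplicative weights mechanism referenced above maintains a synthetic distribution over the data universe and applies a multiplicative update whenever a noisy check says the synthetic answer to the current query is too far off. The sketch below shows only that update loop for linear queries; the learning rate, the fixed per-update Laplace scale, and the update cap are illustrative stand-ins for the sparse-vector accounting of the actual mechanism.

```python
import numpy as np

def private_mw(queries, hist, eps_per_update, threshold, max_updates, rng=None):
    """queries: (k, U) matrix with entries in [0, 1]; hist: length-U count vector.
    Returns a synthetic distribution whose answers roughly track the true ones."""
    rng = np.random.default_rng() if rng is None else rng
    k, U = queries.shape
    n = hist.sum()
    true_ans = queries @ (hist / n)
    synth = np.full(U, 1.0 / U)          # start from the uniform distribution
    eta = threshold / 2.0                # learning rate (an assumed choice)
    updates = 0
    for j in range(k):
        q = queries[j]
        err = true_ans[j] - q @ synth
        noisy_err = err + rng.laplace(0.0, 1.0 / (eps_per_update * n))
        if abs(noisy_err) <= threshold or updates >= max_updates:
            continue                     # synthetic answer already close enough
        # Multiplicative update: shift weight toward elements that reduce the error.
        synth = synth * np.exp(eta * np.sign(noisy_err) * q)
        synth /= synth.sum()
        updates += 1
    return synth
```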

    Tight Lower Bounds for Differentially Private Selection

    A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks (repeated use of the exponential mechanism or the sparse vector technique) approximately solve this problem given a dataset of $n = O(\sqrt{k}\log d)$ samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size $n = \Omega(\sqrt{k}\log d)$ is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of $d$ queries, but often only some much smaller set of $k$ queries is relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries.
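    The "repeated use of the exponential mechanism" baseline mentioned above can be sketched as peeling: select the $k$ items one at a time, each round sampling an item with probability proportional to exp(eps_round * quality / 2). The naive even split of the budget across rounds and the sensitivity-1 quality scores are assumptions of this sketch.

```python
import numpy as np

def private_top_k(qualities, k, epsilon, rng=None):
    """qualities: length-d array of data-dependent scores with sensitivity 1.
    Returns the indices of k privately selected items."""
    rng = np.random.default_rng() if rng is None else rng
    eps_round = epsilon / k                  # naive budget split over the k rounds
    remaining = list(range(len(qualities)))
    chosen = []
    for _ in range(k):
        scores = np.array([qualities[i] for i in remaining])
        logits = eps_round * (scores - scores.max()) / 2.0   # stabilized exponent
        probs = np.exp(logits)
        probs /= probs.sum()
        pick = int(rng.choice(len(remaining), p=probs))
        chosen.append(remaining.pop(pick))
    return chosen
```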

    Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods

    This paper has been replaced with http://digitalcommons.ilr.cornell.edu/ldi/37. We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
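    As a toy version of the planner's trade-off described above, the snippet below grid-searches the privacy-loss parameter that maximizes a simple welfare function along a stylized accuracy frontier. Both the frontier alpha(eps) = c / sqrt(eps * n) and the linear disutility weights are assumptions for illustration, not the paper's calibrated technology or preferences.

```python
import numpy as np

def optimal_privacy_loss(n, c=1.0, accuracy_weight=2.0, privacy_weight=1.0):
    """Maximize a toy welfare function W(eps) = -a*alpha(eps) - b*eps on a grid."""
    eps_grid = np.linspace(0.01, 5.0, 5000)
    alpha = c / np.sqrt(eps_grid * n)             # stylized accuracy frontier
    welfare = -accuracy_weight * alpha - privacy_weight * eps_grid
    best = int(np.argmax(welfare))
    return eps_grid[best], alpha[best]

eps_star, alpha_star = optimal_privacy_loss(n=10_000)
print(f"toy optimum: epsilon ~ {eps_star:.3f}, implied error ~ {alpha_star:.4f}")
```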

    Fast Private Data Release Algorithms for Sparse Queries

    We revisit the problem of accurately answering large classes of statistical queries while preserving differential privacy. Previous approaches to this problem have been very general but not run-time polynomial in the size of the database, have applied only to very limited classes of queries, or have relaxed the notion of worst-case error guarantees. In this paper we consider the large class of sparse queries, which take non-zero values on only polynomially many universe elements. We give efficient query release algorithms for this class, in both the interactive and the non-interactive setting. Our algorithms also achieve better accuracy bounds than previous general techniques do when applied to sparse queries: our bounds are independent of the universe size. In fact, even the runtime of our interactive mechanism is independent of the universe size, and so can be implemented in the "infinite universe" model in which no finite universe need be specified by the data curator.
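    The key point that sparsity buys runtime independent of the universe size can be seen already in the simple interactive Laplace baseline below: a sparse query is stored only by its support, so answering it touches just those elements. This is only the baseline, not the release algorithms of the paper, and the per-query budget split is an assumption.

```python
import numpy as np

def answer_sparse_query(db_counts, query, n, eps_per_query, rng=None):
    """db_counts: dict mapping universe element -> count in the database.
    query: dict mapping universe element -> value in [0, 1] (its sparse support).
    Runtime is O(|support(query)|), with no dependence on the universe size."""
    rng = np.random.default_rng() if rng is None else rng
    exact = sum(v * db_counts.get(elem, 0) for elem, v in query.items()) / n
    # Laplace mechanism: the normalized answer has sensitivity 1/n per person.
    return exact + rng.laplace(0.0, 1.0 / (eps_per_query * n))
```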

    Distributed Private Heavy Hitters

    In this paper, we give efficient algorithms and lower bounds for solving the heavy hitters problem while preserving differential privacy in the fully distributed local model. In this model, there are $n$ parties, each of which possesses a single element from a universe of size $N$. The heavy hitters problem is to find the identity of the most common element shared amongst the $n$ parties. In the local model, there is no trusted database administrator, and so the algorithm must interact with each of the $n$ parties separately, using a differentially private protocol. We give tight information-theoretic upper and lower bounds on the accuracy to which this problem can be solved in the local model (giving a separation between the local model and the more common centralized model of privacy), as well as computationally efficient algorithms even in the case where the data universe size $N$ may be exponentially large.
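    A generic local-model frequency oracle of the flavor used for heavy hitters looks like the following: each party applies randomized response to a one-hot encoding of (a hash of) its element, and the untrusted aggregator debiases the sums. This is a standard randomized-response construction for illustration, not the specific protocol or bounds of the paper; hashing into m buckets is an assumption to keep reports small when the universe size $N$ is huge.

```python
import numpy as np

def randomize(bucket, m, epsilon, rng):
    """One party's report: per-bit randomized response on a one-hot vector.
    Two one-hot inputs differ in two bits, so keeping each bit truthful with
    probability e^(eps/2) / (e^(eps/2) + 1) gives eps-local differential privacy."""
    p = np.exp(epsilon / 2.0) / (np.exp(epsilon / 2.0) + 1.0)
    onehot = np.zeros(m)
    onehot[bucket] = 1.0
    keep = rng.random(m) < p
    return np.where(keep, onehot, 1.0 - onehot)

def estimate_counts(reports, m, epsilon):
    """Aggregator: debias summed reports into per-bucket count estimates."""
    p = np.exp(epsilon / 2.0) / (np.exp(epsilon / 2.0) + 1.0)
    n = len(reports)
    sums = np.sum(reports, axis=0)
    return (sums - n * (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
m, epsilon = 64, 1.0
# Toy data: element 0 is the heavy hitter; x % m stands in for a hash function.
data = rng.choice(8, size=5000, p=[0.3] + [0.1] * 7)
reports = [randomize(x % m, m, epsilon, rng) for x in data]
print("estimated heavy-hitter bucket:", int(np.argmax(estimate_counts(reports, m, epsilon))))
```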

    Exploiting Metric Structure for Efficient Private Query Release

    We consider the problem of privately answering queries defined on databases which are collections of points belonging to some metric space. We give simple, computationally efficient algorithms for answering distance queries defined over an arbitrary metric. Distance queries are specified by points in the metric space, and ask for the average distance from the query point to the points contained in the database, according to the specified metric. Our algorithms run efficiently in the database size and the dimension of the space, and operate in both the online query release setting and the offline setting, in which they must generate, in polynomial time, a fixed data structure that can answer all queries of interest. This represents one of the first subclasses of linear queries for which efficient algorithms are known for the private query release problem, circumventing known hardness results for generic linear queries.
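    A minimal online version of the distance-query idea above: the answer is the average distance from the query point to the database points, and replacing one database point moves that average by at most D/n when all distances are bounded by D, so per-query Laplace noise suffices. The Euclidean metric, the diameter bound, and the per-query budget split are assumptions of this sketch, not the paper's offline data-structure construction.

```python
import numpy as np

def answer_distance_query(db_points, query_point, diameter, eps_per_query, rng=None):
    """db_points: (n, d) array; query_point: (d,) array; all distances from the
    query point to database points are assumed to lie in [0, diameter]."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(db_points)
    avg_dist = np.mean(np.linalg.norm(db_points - query_point, axis=1))
    # Swapping one of the n points changes the average by at most diameter / n.
    return avg_dist + rng.laplace(0.0, diameter / (n * eps_per_query))
```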