127,404 research outputs found

    An Improved Private Mechanism for Small Databases

    Full text link
    We study the problem of answering a workload of linear queries Q\mathcal{Q}, on a database of size at most n=o(Q)n = o(|\mathcal{Q}|) drawn from a universe U\mathcal{U} under the constraint of (approximate) differential privacy. Nikolov, Talwar, and Zhang~\cite{NTZ} proposed an efficient mechanism that, for any given Q\mathcal{Q} and nn, answers the queries with average error that is at most a factor polynomial in logQ\log |\mathcal{Q}| and logU\log |\mathcal{U}| worse than the best possible. Here we improve on this guarantee and give a mechanism whose competitiveness ratio is at most polynomial in logn\log n and logU\log |\mathcal{U}|, and has no dependence on Q|\mathcal{Q}|. Our mechanism is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in place of an ad-hoc noise distribution, we use a distribution which is in a sense optimal for the projection mechanism, and analyze it using convex duality and the restricted invertibility principle.Comment: To appear in ICALP 2015, Track

    The Geometry of Differential Privacy: the Sparse and Approximate Cases

    Full text link
    In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of dd linear queries over a database xRNx \in \R^N, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an O(log2d)O(\log^2 d) approximation to the optimal mechanism is known. Our first contribution is to give an O(log2d)O(\log^2 d) approximation guarantee for the case of (\eps,\delta)-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when d>nx1d > n \triangleq \|x\|_1. It is known that better mechanisms exist in this setting. Our second main contribution is to give an (\eps,\delta)-differentially private mechanism which is optimal up to a \polylog(d,N) factor for any given query set AA and any given upper bound nn on x1\|x\|_1. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the \eps-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size nn by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix AA

    Differentially Private Release and Learning of Threshold Functions

    Full text link
    We prove new upper and lower bounds on the sample complexity of (ϵ,δ)(\epsilon, \delta) differentially private algorithms for releasing approximate answers to threshold functions. A threshold function cxc_x over a totally ordered domain XX evaluates to cx(y)=1c_x(y) = 1 if yxy \le x, and evaluates to 00 otherwise. We give the first nontrivial lower bound for releasing thresholds with (ϵ,δ)(\epsilon,\delta) differential privacy, showing that the task is impossible over an infinite domain XX, and moreover requires sample complexity nΩ(logX)n \ge \Omega(\log^*|X|), which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with n2(1+o(1))logXn \le 2^{(1+ o(1))\log^*|X|} samples. This improves the previous best upper bound of 8(1+o(1))logX8^{(1 + o(1))\log^*|X|} (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with (ϵ,δ)(\epsilon,\delta) differential privacy and learning without privacy. For properly learning thresholds in \ell dimensions, this lower bound extends to nΩ(logX)n \ge \Omega(\ell \cdot \log^*|X|). To obtain our results, we give reductions in both directions from releasing and properly learning thresholds and the simpler interior point problem. Given a database DD of elements from XX, the interior point problem asks for an element between the smallest and largest elements in DD. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.Comment: 43 page

    Fast Private Data Release Algorithms for Sparse Queries

    Full text link
    We revisit the problem of accurately answering large classes of statistical queries while preserving differential privacy. Previous approaches to this problem have either been very general but have not had run-time polynomial in the size of the database, have applied only to very limited classes of queries, or have relaxed the notion of worst-case error guarantees. In this paper we consider the large class of sparse queries, which take non-zero values on only polynomially many universe elements. We give efficient query release algorithms for this class, in both the interactive and the non-interactive setting. Our algorithms also achieve better accuracy bounds than previous general techniques do when applied to sparse queries: our bounds are independent of the universe size. In fact, even the runtime of our interactive mechanism is independent of the universe size, and so can be implemented in the "infinite universe" model in which no finite universe need be specified by the data curator
    corecore