356 research outputs found

    Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

    Full text link
    Consider a database of nn people, each represented by a bit-string of length dd corresponding to the setting of dd binary attributes. A kk-way marginal query is specified by a subset SS of kk attributes, and a S|S|-dimensional binary vector β\beta specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to SS agrees with β\beta. Privately releasing approximate answers to a set of kk-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least Ω(min{n,dk2})\Omega(\min\{\sqrt{n},d^{\frac{k}{2}}\}) and at most O~(min{nd1/4,dk2})\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\}). However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small nn. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most O~(ndk/24)\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}}). This error bound is as good as the best known information theoretic upper bounds for k=2k=2. This bound is an improvement over previous work on efficiently releasing marginals when kk is small and when error o(n)o(n) is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis

    Accurate and Efficient Private Release of Datacubes and Contingency Tables

    Full text link
    A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to improve the accuracy of a given query set by answering some strategy queries more accurately than others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise

    Nearly Optimal Private Convolution

    Full text link
    We study computing the convolution of a private input xx with a public input hh, while satisfying the guarantees of (ϵ,δ)(\epsilon, \delta)-differential privacy. Convolution is a fundamental operation, intimately related to Fourier Transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying (ϵ,δ)(\epsilon, \delta)-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient -- it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lowerbounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants

    An Adaptive Mechanism for Accurate Query Answering under Differential Privacy

    Full text link
    We propose a novel mechanism for answering sets of count- ing queries under differential privacy. Given a workload of counting queries, the mechanism automatically selects a different set of "strategy" queries to answer privately, using those answers to derive answers to the workload. The main algorithm proposed in this paper approximates the optimal strategy for any workload of linear counting queries. With no cost to the privacy guarantee, the mechanism improves significantly on prior approaches and achieves near-optimal error for many workloads, when applied under (\epsilon, \delta)-differential privacy. The result is an adaptive mechanism which can help users achieve good utility without requiring that they reason carefully about the best formulation of their task.Comment: VLDB2012. arXiv admin note: substantial text overlap with arXiv:1103.136

    Faster Algorithms for Privately Releasing Marginals

    Get PDF
    We study the problem of releasing kk-way marginals of a database D({0,1}d)nD \in (\{0,1\}^d)^n, while preserving differential privacy. The answer to a kk-way marginal query is the fraction of DD's records x{0,1}dx \in \{0,1\}^d with a given value in each of a given set of up to kk columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et. al., PODS '07). We give an algorithm that runs in time dO(k)d^{O(\sqrt{k})} and releases a private summary capable of answering any kk-way marginal query with at most ±.01\pm .01 error on every query as long as ndO(k)n \geq d^{O(\sqrt{k})}. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of kk-way marginal queries, which is dΘ(k)d^{\Theta(k)} (for kdk \ll d)

    Marginal Release Under Local Differential Privacy

    Full text link
    Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals
    corecore