
    A simple and practical algorithm for differentially private data release

    We present new theoretical results on differentially private data release useful with respect to any target class of counting queries, coupled with experimental results on a variety of real-world data sets. Specifically, we study a simple combination of the multiplicative weights approach of [Hardt and Rothblum, 2010] with the exponential mechanism of [McSherry and Talwar, 2007]. The multiplicative weights framework allows us to maintain and improve a distribution approximating a given data set with respect to a set of counting queries. We use the exponential mechanism to select those queries most incorrectly tracked by the current distribution. Combining the two, we quickly approach a distribution that agrees with the data set on the given set of queries up to small error. The resulting algorithm and its analysis are simple, but nevertheless improve upon previous work in terms of both error and running time. We also empirically demonstrate the practicality of our approach on several data sets commonly used in the statistical community for contingency table release.
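
    The select, measure, and update loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data set is assumed to be a histogram over a small finite domain, each counting query is assumed to be a 0/1 vector over that domain, and the names `mwem_sketch`, `exponential_mechanism`, and `laplace` are hypothetical. The even split of the privacy budget between selection and measurement, and returning the final distribution rather than an average over rounds, are also choices made for brevity.

```python
import math
import random


def dot(q, hist):
    """Answer a counting query q (0/1 vector) on a histogram."""
    return sum(qi * hi for qi, hi in zip(q, hist))


def laplace(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exponential_mechanism(scores, eps, rng):
    # Sample index i with probability proportional to exp(eps * score_i / 2),
    # assuming every score has sensitivity 1. Subtracting the max avoids overflow.
    top = max(scores)
    weights = [math.exp(eps * (s - top) / 2.0) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def mwem_sketch(true_hist, queries, eps, rounds, seed=0):
    """Hypothetical sketch: multiplicative weights plus exponential mechanism."""
    rng = random.Random(seed)
    n = sum(true_hist)
    size = len(true_hist)
    approx = [n / size] * size          # uniform start, scaled to n records
    eps_round = eps / (2.0 * rounds)    # assumed split: half select, half measure

    for _ in range(rounds):
        # 1. Privately select a query the current approximation tracks badly.
        errors = [abs(dot(q, true_hist) - dot(q, approx)) for q in queries]
        q = queries[exponential_mechanism(errors, eps_round, rng)]

        # 2. Measure the selected query with Laplace noise.
        measured = dot(q, true_hist) + laplace(1.0 / eps_round, rng)

        # 3. Multiplicative weights update toward the noisy measurement.
        step = (measured - dot(q, approx)) / (2.0 * n)
        approx = [a * math.exp(qx * step) for a, qx in zip(approx, q)]

        # Renormalize so the approximation still sums to n.
        total = sum(approx)
        approx = [a * n / total for a in approx]

    return approx
```

    For example, `mwem_sketch([40, 30, 20, 10], [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]], eps=1.0, rounds=5)` returns a four-bin synthetic histogram that sums to 100; how closely it matches the true counts depends on the noise drawn.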

    Differentially Private Model Selection with Penalized and Constrained Likelihood

    In statistical disclosure control, the goal of data analysis is twofold: the released information must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning, and statistics. It provides a rigorous and strong notion of protection for individuals' sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. In this paper we study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples.

    Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output

    We consider accurately answering smooth queries while preserving differential privacy. A query is said to be $K$-smooth if it is specified by a function defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all bounded. We develop an $\epsilon$-differentially private mechanism for the class of $K$-smooth queries. The major advantage of the algorithm is that it outputs a synthetic database. In real applications, a synthetic database output is appealing. Our mechanism achieves an accuracy of $O(n^{-\frac{K}{2d+K}}/\epsilon)$, and runs in polynomial time. We also generalize the mechanism to preserve $(\epsilon, \delta)$-differential privacy with slightly improved accuracy. Extensive experiments on benchmark datasets demonstrate that the mechanisms have good accuracy and are efficient.

    Private Multiplicative Weights Beyond Linear Queries

    A wide variety of fundamental data analyses in machine learning, such as linear and logistic regression, require minimizing a convex function defined by the data. Since the data may contain sensitive information about individuals, and these analyses can leak that sensitive information, it is important to be able to solve convex minimization in a privacy-preserving way. A series of recent results show how to accurately solve a single convex minimization problem in a differentially private manner. However, the same data is often analyzed repeatedly, and little is known about solving multiple convex minimization problems with differential privacy. For simpler data analyses, such as linear queries, there are remarkable differentially private algorithms such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS 2010) that accurately answer exponentially many distinct queries. In this work, we extend these results to the case of convex minimization and show how to give accurate and differentially private solutions to *exponentially many* convex minimization problems on a sensitive dataset.

    Prochlo: Strong Privacy for Analytics in the Crowd

    The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. (cont.; see the paper)

    Learning Coverage Functions and Private Release of Marginals

    We study the problem of approximating and learning coverage functions. A function $c: 2^{[n]} \rightarrow \mathbf{R}^{+}$ is a coverage function, if there exists a universe $U$ with non-negative weights $w(u)$ for each $u \in U$ and subsets $A_1, A_2, \ldots, A_n$ of $U$ such that $c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u)$. Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We give an algorithm that for any $\gamma,\delta>0$, given random and uniform examples of an unknown coverage function $c$, finds a function $h$ that approximates $c$ within factor $1+\gamma$ on all but $\delta$-fraction of the points in time $poly(n,1/\gamma,1/\delta)$. This is the first fully-polynomial algorithm for learning an interesting class of functions in the demanding PMAC model of Balcan and Harvey (2011). Our algorithms are based on several new structural properties of coverage functions. Using the results in (Feldman and Kothari, 2014), we also show that coverage functions are learnable agnostically with excess $\ell_1$-error $\epsilon$ over all product and symmetric distributions in time $n^{\log(1/\epsilon)}$. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas, a class of functions for which the best known algorithm runs in time $2^{\tilde{O}(n^{1/3})}$ (Klivans and Servedio, 2004). As an application of our learning results, we give simple differentially-private algorithms for releasing monotone conjunction counting queries with low average error. In particular, for any $k \leq n$, we obtain private release of $k$-way marginals with average error $\bar{\alpha}$ in time $n^{O(\log(1/\bar{\alpha}))}$.
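
    The definition of a coverage function given in this abstract can be made concrete with a small example. The sketch below is only an illustration of the definition: the function name `coverage`, the universe elements, the weights, and the subsets are all made up for the example and do not come from the paper.

```python
def coverage(S, sets, weight):
    """Evaluate c(S): the total weight of the union of the sets A_i for i in S.

    S      -- iterable of indices i
    sets   -- dict mapping each index i to the subset A_i of the universe
    weight -- dict mapping each universe element u to its weight w(u) >= 0
    """
    covered = set()
    for i in S:
        covered |= sets[i]          # union over the chosen subsets
    return sum(weight[u] for u in covered)


# Hypothetical universe U = {a, b, c, d} with non-negative weights.
weight = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 3.0}
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}

c_1 = coverage({1}, sets, weight)        # weight of {a, b}
c_2 = coverage({2}, sets, weight)        # weight of {b, c}
c_12 = coverage({1, 2}, sets, weight)    # weight of {a, b, c}; b counted once
```

    Because the shared element `b` is counted only once, `c_12` is smaller than `c_1 + c_2`, which is the diminishing-returns behaviour that makes coverage functions a subclass of submodular functions.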