845 research outputs found

    Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output

    Full text link
    We consider accurately answering smooth queries while preserving differential privacy. A query is said to be KK-smooth if it is specified by a function defined on [1,1]d[-1,1]^d whose partial derivatives up to order KK are all bounded. We develop an ϵ\epsilon-differentially private mechanism for the class of KK-smooth queries. The major advantage of the algorithm is that it outputs a synthetic database. In real applications, a synthetic database output is appealing. Our mechanism achieves an accuracy of O(nK2d+K/ϵ)O (n^{-\frac{K}{2d+K}}/\epsilon ), and runs in polynomial time. We also generalize the mechanism to preserve (ϵ,δ)(\epsilon, \delta)-differential privacy with slightly improved accuracy. Extensive experiments on benchmark datasets demonstrate that the mechanisms have good accuracy and are efficient

    Nearly Optimal Private Convolution

    Full text link
    We study computing the convolution of a private input xx with a public input hh, while satisfying the guarantees of (ϵ,δ)(\epsilon, \delta)-differential privacy. Convolution is a fundamental operation, intimately related to Fourier Transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying (ϵ,δ)(\epsilon, \delta)-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient -- it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lowerbounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants

    Learning Coverage Functions and Private Release of Marginals

    Full text link
    We study the problem of approximating and learning coverage functions. A function c:2[n]R+c: 2^{[n]} \rightarrow \mathbf{R}^{+} is a coverage function, if there exists a universe UU with non-negative weights w(u)w(u) for each uUu \in U and subsets A1,A2,,AnA_1, A_2, \ldots, A_n of UU such that c(S)=uiSAiw(u)c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u). Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We give an algorithm that for any γ,δ>0\gamma,\delta>0, given random and uniform examples of an unknown coverage function cc, finds a function hh that approximates cc within factor 1+γ1+\gamma on all but δ\delta-fraction of the points in time poly(n,1/γ,1/δ)poly(n,1/\gamma,1/\delta). This is the first fully-polynomial algorithm for learning an interesting class of functions in the demanding PMAC model of Balcan and Harvey (2011). Our algorithms are based on several new structural properties of coverage functions. Using the results in (Feldman and Kothari, 2014), we also show that coverage functions are learnable agnostically with excess 1\ell_1-error ϵ\epsilon over all product and symmetric distributions in time nlog(1/ϵ)n^{\log(1/\epsilon)}. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas, a class of functions for which the best known algorithm runs in time 2O~(n1/3)2^{\tilde{O}(n^{1/3})} (Klivans and Servedio, 2004). As an application of our learning results, we give simple differentially-private algorithms for releasing monotone conjunction counting queries with low average error. In particular, for any knk \leq n, we obtain private release of kk-way marginals with average error αˉ\bar{\alpha} in time nO(log(1/αˉ))n^{O(\log(1/\bar{\alpha}))}

    What Can We Learn Privately?

    Full text link
    Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms.Comment: 35 pages, 2 figure
    corecore