
    Less is More: Revisiting Gaussian Mechanism for Differential Privacy

    In this paper, we identify that the classic Gaussian mechanism and its variants for differential privacy all suffer from \textbf{the curse of full-rank covariance matrices}, and hence the expected accuracy losses of these mechanisms applied to high-dimensional query results, e.g., in $\mathbb{R}^M$, all increase linearly with $M$. To lift this curse, we design a Rank-1 Singular Multivariate Gaussian Mechanism (R1SMG). It achieves $(\epsilon,\delta)$-DP on query results in $\mathbb{R}^M$ by perturbing the results with noise following a singular multivariate Gaussian distribution whose covariance matrix is a \textbf{randomly} generated rank-1 positive semi-definite matrix. In contrast, the classic Gaussian mechanism and its variants all consider \textbf{deterministic} full-rank covariance matrices. Our idea is motivated by a clue in Dwork et al.'s work on the Gaussian mechanism that has been ignored in the literature: when multivariate Gaussian noise with a full-rank covariance matrix is projected onto an orthonormal basis of $\mathbb{R}^M$, only the coefficient of a single basis vector contributes to the privacy guarantee. This paper makes the following technical contributions. (i) R1SMG achieves an $(\epsilon,\delta)$-DP guarantee on query results in $\mathbb{R}^M$ while the magnitude of the additive noise decreases with $M$. Therefore, \textbf{less is more}: a smaller amount of noise suffices to sanitize higher-dimensional query results. When $M \rightarrow \infty$, the expected accuracy loss converges to ${2(\Delta_2 f)^2}/{\epsilon}$, where $\Delta_2 f$ is the $\ell_2$ sensitivity of the query function $f$. (ii) Compared with other mechanisms, R1SMG is less likely to generate noise with a large magnitude that overwhelms the query results, because the kurtosis and skewness of the nondeterministic accuracy loss introduced by R1SMG are larger than those introduced by other mechanisms.
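    The construction can be pictured with a short sketch: instead of adding an $M$-dimensional Gaussian vector with a full-rank covariance, one draws a single random direction in $\mathbb{R}^M$ and perturbs the query result only along it, so the noise covariance is a random rank-1 matrix. The Python snippet below is a minimal illustration of that idea, not the paper's calibrated mechanism; the function name and the variance parameter sigma2 are placeholders, since the exact $(\epsilon,\delta)$ calibration of the noise scale derived in the paper is not reproduced here.

```python
# Minimal sketch of noise with a randomly generated rank-1 covariance
# (sigma2 * u u^T for a random unit vector u).  The value of sigma2 needed
# for an (epsilon, delta)-DP guarantee is NOT derived here; it is a placeholder.
import numpy as np

def r1smg_sketch(query_result, sigma2, rng=None):
    """Perturb an M-dimensional query result along a single random direction."""
    rng = np.random.default_rng() if rng is None else rng
    m = query_result.shape[0]
    # Uniformly random unit vector in R^M (the random rank-1 direction).
    direction = rng.standard_normal(m)
    direction /= np.linalg.norm(direction)
    # Scalar Gaussian coefficient along that direction; covariance = sigma2 * u u^T.
    coeff = rng.normal(loc=0.0, scale=np.sqrt(sigma2))
    return query_result + coeff * direction

# Example: perturb a 1000-dimensional query result with a placeholder noise scale.
noisy = r1smg_sketch(np.zeros(1000), sigma2=4.0)
```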

    Comparing Population Means under Local Differential Privacy: with Significance and Power

    A statistical hypothesis test determines whether a hypothesis should be rejected based on samples from populations. In particular, randomized controlled experiments (or A/B testing) that compare population means using, e.g., t-tests have been widely deployed in technology companies to aid in making data-driven decisions. Samples used in these tests are collected from users and may contain sensitive information. Both the data collection and the testing process may compromise individuals' privacy. In this paper, we study how to conduct hypothesis tests to compare population means while preserving privacy. We use the notion of local differential privacy (LDP), which has recently emerged as the main tool to ensure each individual's privacy without the need for a trusted data collector. We propose LDP tests that inject noise into every user's data in the samples before collecting them (so users do not need to trust the data collector) and draw conclusions with bounded type-I (significance level) and type-II (1 - power) errors. Our approaches can be extended to the scenario where some users require LDP while others are willing to provide exact data. We report experimental results on real-world datasets to verify the effectiveness of our approaches. Comment: Full version of an AAAI 2018 conference paper.
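    As a rough picture of this pipeline, the toy Python sketch below (not the paper's actual test statistic) has each user release a Laplace-perturbed value and lets the analyst compare the noisy group means with a two-sample z-test whose standard error is computed from the noisy observations themselves; the assumed [0, 1] data range, the Laplace mechanism, and the z-test are illustrative assumptions rather than the paper's construction.

```python
# Toy sketch: epsilon-LDP release of bounded values followed by a z-test on the
# noisy means.  Assumes each value lies in [0, 1] (sensitivity 1); these choices
# are illustrative, not the paper's exact procedure.
import numpy as np
from scipy import stats

def ldp_perturb(values, epsilon, rng=None):
    """Each user adds Laplace(1/epsilon) noise locally before sending data."""
    rng = np.random.default_rng() if rng is None else rng
    return values + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=values.shape)

def compare_means_ldp(noisy_a, noisy_b):
    """Two-sided z-test for equal means on locally perturbed samples."""
    # The Laplace noise is zero-mean, so the noisy means stay unbiased; it
    # inflates the variance (by 2/epsilon^2 per user), which the empirical
    # variances below already include, costing power rather than validity.
    diff = noisy_a.mean() - noisy_b.mean()
    se = np.sqrt(noisy_a.var(ddof=1) / len(noisy_a) + noisy_b.var(ddof=1) / len(noisy_b))
    z = diff / se
    return z, 2 * stats.norm.sf(abs(z))

# Example: two groups of users, each releasing their value under epsilon = 1 LDP.
rng = np.random.default_rng(0)
group_a = ldp_perturb(rng.uniform(size=5000), epsilon=1.0, rng=rng)
group_b = ldp_perturb(rng.uniform(size=5000) * 0.9, epsilon=1.0, rng=rng)
z_stat, p_value = compare_means_ldp(group_a, group_b)
```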

    One-shot Empirical Privacy Estimation for Federated Learning

    Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or for empirically measuring privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution), are tailored to specific tasks and model architectures, and require retraining the model many times (typically on the order of thousands). These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel "one-shot" approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture or task. We show that our method provides provably correct estimates for privacy loss under the Gaussian mechanism, and we demonstrate its performance on a well-established FL benchmark dataset under several adversarial models.
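    To make the Gaussian-mechanism case concrete, the short Python sketch below shows only the final conversion step such an estimator needs: turning an empirically measured noise scale into an $(\epsilon, \delta)$ estimate by inverting the classic Gaussian-mechanism calibration. How the paper actually obtains that noise-scale estimate from a single training run is not reproduced here, and the sensitivity and $\delta$ values are illustrative.

```python
# Sketch of mapping an empirically estimated noise scale back to epsilon for a
# Gaussian mechanism, by inverting the classic calibration
#   sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
# a conservative bound stated for epsilon < 1.  sigma_hat would come from the
# empirical estimation procedure, which is not shown here.
import math

def epsilon_from_noise_scale(sigma_hat, sensitivity, delta):
    """Estimate epsilon of a Gaussian mechanism with measured noise std sigma_hat."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / sigma_hat

# Example: clipping norm 1.0, measured noise std 5.0, delta = 1e-5  ->  eps ~ 0.97
print(epsilon_from_noise_scale(sigma_hat=5.0, sensitivity=1.0, delta=1e-5))
```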