
    Less is More: Revisiting Gaussian Mechanism for Differential Privacy

    In this paper, we identify that the classic Gaussian mechanism and its variants for differential privacy all suffer from \textbf{the curse of full-rank covariance matrices}, and hence the expected accuracy losses of these mechanisms applied to high-dimensional query results, e.g., in $\mathbb{R}^M$, all increase linearly with $M$. To lift this curse, we design a Rank-1 Singular Multivariate Gaussian Mechanism (R1SMG). It achieves $(\epsilon,\delta)$-DP on query results in $\mathbb{R}^M$ by perturbing the results with noise following a singular multivariate Gaussian distribution whose covariance matrix is a \textbf{randomly} generated rank-1 positive semi-definite matrix. In contrast, the classic Gaussian mechanism and its variants all consider \textbf{deterministic} full-rank covariance matrices. Our idea is motivated by a clue in Dwork et al.'s work on the Gaussian mechanism that has been ignored in the literature: when multivariate Gaussian noise with a full-rank covariance matrix is projected onto an orthonormal basis of $\mathbb{R}^M$, only the coefficient of a single basis vector contributes to the privacy guarantee. This paper makes the following technical contributions. (i) R1SMG achieves an $(\epsilon,\delta)$-DP guarantee on query results in $\mathbb{R}^M$ while the magnitude of the additive noise decreases with $M$. Therefore, \textbf{less is more}: a smaller amount of noise suffices to sanitize higher-dimensional query results. When $M \rightarrow \infty$, the expected accuracy loss converges to ${2(\Delta_2 f)^2}/{\epsilon}$, where $\Delta_2 f$ is the $\ell_2$ sensitivity of the query function $f$. (ii) Compared with other mechanisms, R1SMG is less likely to generate noise with a large magnitude that overwhelms the query results, because the kurtosis and skewness of the nondeterministic accuracy loss introduced by R1SMG are larger than those introduced by other mechanisms.
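    The construction can be pictured with a short sketch: instead of adding an $M$-dimensional Gaussian vector with a full-rank covariance, one draws a single random direction in $\mathbb{R}^M$ and perturbs the query result only along it, so the noise covariance is a random rank-1 matrix. The Python snippet below is a minimal illustration of that idea, not the paper's calibrated mechanism; the function name and the variance parameter sigma2 are placeholders, since the exact $(\epsilon,\delta)$ calibration of the noise scale derived in the paper is not reproduced here.

```python
# Minimal sketch of noise with a randomly generated rank-1 covariance
# (sigma2 * u u^T for a random unit vector u).  The value of sigma2 needed
# for an (epsilon, delta)-DP guarantee is NOT derived here; it is a placeholder.
import numpy as np

def r1smg_sketch(query_result, sigma2, rng=None):
    """Perturb an M-dimensional query result along a single random direction."""
    rng = np.random.default_rng() if rng is None else rng
    m = query_result.shape[0]
    # Uniformly random unit vector in R^M (the random rank-1 direction).
    direction = rng.standard_normal(m)
    direction /= np.linalg.norm(direction)
    # Scalar Gaussian coefficient along that direction; covariance = sigma2 * u u^T.
    coeff = rng.normal(loc=0.0, scale=np.sqrt(sigma2))
    return query_result + coeff * direction

# Example: perturb a 1000-dimensional query result with a placeholder noise scale.
noisy = r1smg_sketch(np.zeros(1000), sigma2=4.0)
```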

    Comparing Population Means under Local Differential Privacy: with Significance and Power

    A statistical hypothesis test determines whether a hypothesis should be rejected based on samples from populations. In particular, randomized controlled experiments (or A/B testing) that compare population means using, e.g., t-tests have been widely deployed in technology companies to aid in making data-driven decisions. Samples used in these tests are collected from users and may contain sensitive information. Both the data collection and the testing process may compromise individuals' privacy. In this paper, we study how to conduct hypothesis tests to compare population means while preserving privacy. We use the notion of local differential privacy (LDP), which has recently emerged as the main tool to ensure each individual's privacy without the need for a trusted data collector. We propose LDP tests that inject noise into every user's data in the samples before collecting them (so users do not need to trust the data collector) and draw conclusions with bounded type-I (significance level) and type-II (1 - power) errors. Our approaches can be extended to the scenario where some users require LDP while others are willing to provide exact data. We report experimental results on real-world datasets to verify the effectiveness of our approaches. Comment: Full version of an AAAI 2018 conference paper.
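    As a rough picture of this pipeline, the toy Python sketch below (not the paper's actual test statistic) has each user release a Laplace-perturbed value and lets the analyst compare the noisy group means with a two-sample z-test whose standard error is computed from the noisy observations themselves; the assumed [0, 1] data range, the Laplace mechanism, and the z-test are illustrative assumptions rather than the paper's construction.

```python
# Toy sketch: epsilon-LDP release of bounded values followed by a z-test on the
# noisy means.  Assumes each value lies in [0, 1] (sensitivity 1); these choices
# are illustrative, not the paper's exact procedure.
import numpy as np
from scipy import stats

def ldp_perturb(values, epsilon, rng=None):
    """Each user adds Laplace(1/epsilon) noise locally before sending data."""
    rng = np.random.default_rng() if rng is None else rng
    return values + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=values.shape)

def compare_means_ldp(noisy_a, noisy_b):
    """Two-sided z-test for equal means on locally perturbed samples."""
    # The Laplace noise is zero-mean, so the noisy means stay unbiased; it
    # inflates the variance (by 2/epsilon^2 per user), which the empirical
    # variances below already include, costing power rather than validity.
    diff = noisy_a.mean() - noisy_b.mean()
    se = np.sqrt(noisy_a.var(ddof=1) / len(noisy_a) + noisy_b.var(ddof=1) / len(noisy_b))
    z = diff / se
    return z, 2 * stats.norm.sf(abs(z))

# Example: two groups of users, each releasing their value under epsilon = 1 LDP.
rng = np.random.default_rng(0)
group_a = ldp_perturb(rng.uniform(size=5000), epsilon=1.0, rng=rng)
group_b = ldp_perturb(rng.uniform(size=5000) * 0.9, epsilon=1.0, rng=rng)
z_stat, p_value = compare_means_ldp(group_a, group_b)
```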

    One-shot Empirical Privacy Estimation for Federated Learning

    Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or for empirically measuring privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution), are tailored to specific tasks and model architectures, and require retraining the model many times (typically on the order of thousands). These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel "one-shot" approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture or task. We show that our method provides provably correct estimates for privacy loss under the Gaussian mechanism, and we demonstrate its performance on a well-established FL benchmark dataset under several adversarial models.
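    To make the Gaussian-mechanism case concrete, the short Python sketch below shows only the final conversion step such an estimator needs: turning an empirically measured noise scale into an $(\epsilon, \delta)$ estimate by inverting the classic Gaussian-mechanism calibration. How the paper actually obtains that noise-scale estimate from a single training run is not reproduced here, and the sensitivity and $\delta$ values are illustrative.

```python
# Sketch of mapping an empirically estimated noise scale back to epsilon for a
# Gaussian mechanism, by inverting the classic calibration
#   sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
# a conservative bound stated for epsilon < 1.  sigma_hat would come from the
# empirical estimation procedure, which is not shown here.
import math

def epsilon_from_noise_scale(sigma_hat, sensitivity, delta):
    """Estimate epsilon of a Gaussian mechanism with measured noise std sigma_hat."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / sigma_hat

# Example: clipping norm 1.0, measured noise std 5.0, delta = 1e-5  ->  eps ~ 0.97
print(epsilon_from_noise_scale(sigma_hat=5.0, sensitivity=1.0, delta=1e-5))
```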