Less is More: Revisiting Gaussian Mechanism for Differential Privacy
In this paper, we identify that the classic Gaussian mechanism and its
variants for differential privacy all suffer from \textbf{the curse of
full-rank covariance matrices}, and hence the expected accuracy losses of these
mechanisms applied to high dimensional query results, e.g., in $\mathbb{R}^M$,
all increase linearly with $M$.
To lift this curse, we design a Rank-1 Singular Multivariate Gaussian
Mechanism (R1SMG). It achieves $(\epsilon, \delta)$-DP on query results in
$\mathbb{R}^M$ by perturbing the results with noise following a singular
multivariate Gaussian distribution, whose covariance matrix is a
\textbf{randomly} generated rank-1 positive semi-definite matrix. In contrast,
the classic Gaussian mechanism and its variants all consider
\textbf{deterministic} full-rank covariance matrices. Our idea is motivated by
a clue from Dwork et al.'s work on the Gaussian mechanism that has been ignored
in the literature: when projecting multivariate Gaussian noise with a full-rank
covariance matrix onto an orthonormal basis of $\mathbb{R}^M$, only the
coefficient of a single basis vector contributes to the privacy guarantee.
This paper makes the following technical contributions.
(i) R1SMG achieves an $(\epsilon, \delta)$-DP guarantee on query results in
$\mathbb{R}^M$, while the magnitude of the additive noise decreases with $M$.
Therefore, \textbf{less is more}, i.e., less noise is able to
sanitize higher dimensional query results. When $M \to \infty$, the
expected accuracy loss converges to a constant determined by $\epsilon$ and the
$\ell_2$ sensitivity $\Delta_2 f$ of the query function $f$.
(ii) Compared with other mechanisms, R1SMG is less likely to generate noise
with large magnitude that overwhelms the query results, because the kurtosis
and skewness of the nondeterministic accuracy loss introduced by R1SMG are
larger than those introduced by other mechanisms.
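For intuition, here is a minimal sketch of the noise generation the abstract describes: draw a uniformly random direction $u$ in $\mathbb{R}^M$ and a single Gaussian coefficient, so the noise covariance is the randomly generated rank-1 PSD matrix $\sigma^2 u u^\top$. The scale `sigma` and its calibration to $(\epsilon, \delta)$ are placeholders, not the paper's calibration:

```python
import numpy as np

def r1smg_sketch(query_result: np.ndarray, sigma: float,
                 rng: np.random.Generator = None) -> np.ndarray:
    """Illustrative rank-1 singular Gaussian perturbation.

    Draws a direction u uniformly from the unit sphere in R^M and a
    scalar g ~ N(0, sigma^2), then returns query_result + g * u.
    The noise covariance is sigma^2 * u u^T: a randomly generated
    rank-1 positive semi-definite matrix, as the abstract describes.
    NOTE: sigma must be calibrated as in the paper to obtain
    (eps, delta)-DP; that calibration is not reproduced here.
    """
    if rng is None:
        rng = np.random.default_rng()
    m = query_result.shape[0]
    u = rng.standard_normal(m)
    u /= np.linalg.norm(u)          # uniform random direction on the sphere
    g = rng.normal(0.0, sigma)      # single Gaussian coefficient
    return query_result + g * u
```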
Comparing Population Means under Local Differential Privacy: with Significance and Power
A statistical hypothesis test determines whether a hypothesis should be
rejected based on samples from populations. In particular, randomized
controlled experiments (or A/B testing) that compare population means using,
e.g., t-tests, have been widely deployed in technology companies to aid in
making data-driven decisions. Samples used in these tests are collected from
users and may contain sensitive information. Both the data collection and the
testing process may compromise individuals' privacy. In this paper, we study
how to conduct hypothesis tests to compare population means while preserving
privacy. We use the notion of local differential privacy (LDP), which has
recently emerged as the main tool to ensure each individual's privacy without
the need for a trusted data collector. We propose LDP tests that inject noise
into every user's data in the samples before collecting them (so users do not
need to trust the data collector), and draw conclusions with bounded type-I
error (significance level) and type-II error (1 - power). Our approaches can be
extended to the scenario where some users require LDP while others are willing
to provide exact data. We report experimental results on real-world datasets to
verify the effectiveness of our approaches.
Comment: Full version of an AAAI 2018 conference paper.
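As a rough illustration of the pipeline the abstract describes (not the paper's actual estimators), each user could privatize their own bounded value before collection, here with a Laplace mechanism as an assumed choice, and the analyst would compare the noisy sample means with a standard two-sample test. All function names are hypothetical:

```python
import numpy as np
from scipy import stats

def ldp_release(x: float, eps: float, lo: float, hi: float,
                rng: np.random.Generator) -> float:
    """Each user perturbs their own value before it is collected, so the
    collector never sees raw data: eps-LDP via the Laplace mechanism."""
    x = float(np.clip(x, lo, hi))
    return x + rng.laplace(0.0, (hi - lo) / eps)

def compare_means(noisy_a: np.ndarray, noisy_b: np.ndarray):
    """Welch two-sample t-test on the privatized samples; the injected
    Laplace variance, 2 * ((hi - lo) / eps)**2 per user, inflates the
    empirical variances and hence reduces power."""
    return stats.ttest_ind(noisy_a, noisy_b, equal_var=False)

rng = np.random.default_rng(0)
a = np.array([ldp_release(x, 1.0, 0.0, 1.0, rng) for x in rng.random(500)])
b = np.array([ldp_release(x, 1.0, 0.0, 1.0, rng) for x in 0.1 + rng.random(500)])
print(compare_means(a, b))
```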
One-shot Empirical Privacy Estimation for Federated Learning
Privacy estimation techniques for differentially private (DP) algorithms are
useful for comparing against analytical bounds or for empirically measuring
privacy loss in settings where known analytical bounds are not tight. However,
existing privacy auditing techniques usually make strong assumptions on the
adversary (e.g., knowledge of intermediate model iterates or the training data
distribution), are tailored to specific tasks and model architectures, and
require retraining the model many times (typically on the order of thousands).
These shortcomings make deploying such techniques at scale difficult in
practice, especially in federated settings where model training can take days
or weeks. In this work, we present a novel "one-shot" approach that can
systematically address these challenges, allowing efficient auditing or
estimation of the privacy loss of a model during the same, single training run
used to fit model parameters, and without requiring any a priori knowledge
about the model architecture or task. We show that our method provides provably
correct estimates for privacy loss under the Gaussian mechanism, and we
demonstrate its performance on a well-established FL benchmark dataset under
several adversarial models.
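The paper's statistic and analysis are not reproduced here, but one standard ingredient in audits of this kind is converting an estimated effective Gaussian-DP parameter $\mu$ (e.g., inferred from how strongly the final model correlates with randomly inserted canaries) into an $(\epsilon, \delta)$ bound, via the Gaussian-DP conversion of Dong, Roth, and Su. A hedged sketch of that conversion step:

```python
import numpy as np
from scipy.stats import norm

def gdp_delta(eps: float, mu: float) -> float:
    """delta(eps) achievable under mu-Gaussian-DP (Dong, Roth & Su)."""
    return (norm.cdf(-eps / mu + mu / 2)
            - np.exp(eps) * norm.cdf(-eps / mu - mu / 2))

def eps_from_mu(mu: float, delta: float) -> float:
    """Invert delta(eps) by bisection; delta(eps) is decreasing in eps."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# Example: an audit that estimates an effective mu of 0.5 from a single
# training run would report the corresponding epsilon at delta = 1e-5.
print(eps_from_mu(mu=0.5, delta=1e-5))
```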