Locally Private Causal Inference
Local differential privacy (LDP) is a differential privacy (DP) paradigm in
which individuals first apply a DP mechanism to their data (often by adding
noise) before transmitting the result to a curator. LDP ensures strong user
privacy protection because the curator does not have access to any of the
user's original information. On the curator's side, however, the noise added
for privacy introduces additional bias and variance into the analyses; it is
therefore important for analysts to incorporate the privacy noise into valid
statistical inference. In this article, we develop methodologies to infer
causal effects from privatized data under the Rubin Causal Model framework.
First, we present asymptotically unbiased and consistent estimators with their
variance estimators and plug-in confidence intervals. Second, we develop a
Bayesian nonparametric methodology along with a blocked Gibbs sampling
algorithm, which performs well in terms of MSE for tight privacy budgets.
Finally, we present simulation studies to evaluate the performance of our
proposed frequentist and Bayesian methodologies for various privacy budgets,
resulting in useful suggestions for performing causal inference for privatized
data.
Comment: 24 pages
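To make the LDP setting concrete, here is a minimal sketch (not the paper's estimators) of the simplest local mechanism: each user adds Laplace noise scaled to the privacy budget before sending their value, and the curator forms an unbiased mean estimate with a plug-in confidence interval. The bounds on the data, the budget `eps=1.0`, and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_release(x, eps):
    """Each user adds Laplace(1/eps) noise locally before transmitting.
    Assumes each x_i lies in [0, 1], so the sensitivity is 1 (assumption)."""
    return x + rng.laplace(scale=1.0 / eps, size=x.shape)

def ldp_mean_ci(z, level=1.96):
    """Unbiased mean estimate and plug-in CI from the privatized reports.
    The sample variance of z already includes the 2/eps^2 noise variance,
    so the usual standard error remains valid."""
    n = len(z)
    mean = z.mean()
    se = z.std(ddof=1) / np.sqrt(n)
    return mean, (mean - level * se, mean + level * se)

x = rng.uniform(0, 1, size=20_000)   # confidential data, true mean 0.5
z = ldp_release(x, eps=1.0)          # all the curator ever sees
mean, (lo, hi) = ldp_mean_ci(z)
```

Note how the privacy noise inflates the interval width relative to the non-private case; the causal estimators in the paper must account for exactly this kind of extra variance.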
Optimizing Noise for f-Differential Privacy via Anti-Concentration and Stochastic Dominance
In this paper, we establish anti-concentration inequalities for additive
noise mechanisms which achieve f-differential privacy (f-DP), a notion of
privacy phrased in terms of a tradeoff function (a.k.a. ROC curve) which
limits the ability of an adversary to determine which individuals were in the
database. We show that canonical noise distributions (CNDs), proposed by Awan
and Vadhan (2023), match the anti-concentration bounds at half-integer values,
indicating that their tail behavior is near-optimal. We also show that all CNDs
are sub-exponential, regardless of the f-DP guarantee. In the case of
log-concave CNDs, we show that they are the stochastically smallest noise
compared to any other noise distributions with the same privacy guarantee. In
terms of integer-valued noise, we propose a new notion of discrete CND and
prove that a discrete CND always exists, can be constructed by rounding a
continuous CND, and that the discrete CND is unique when designed for a
statistic with sensitivity 1. We further show that the discrete CND at
sensitivity 1 is stochastically smallest compared to other integer-valued
noises. Our theoretical results shed light on the different types of privacy
guarantees possible in the f-DP framework and can be incorporated in more
complex mechanisms to optimize performance.
Comment: 17 pages before appendix, 25 pages total, 6 figures
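The tradeoff-function view of privacy can be illustrated with the standard Gaussian-DP example (Dong et al., 2022), which is background for this abstract rather than the paper's CND construction: under mu-GDP, an adversary testing membership at type I error alpha can achieve type II error no smaller than Phi(Phi^{-1}(1 - alpha) - mu).

```python
from statistics import NormalDist

N = NormalDist()  # standard normal

def gdp_tradeoff(alpha, mu):
    """Tradeoff function of mu-GDP: the smallest type II error an adversary
    can achieve when testing membership at type I error alpha."""
    return N.cdf(N.inv_cdf(1 - alpha) - mu)

# Stricter privacy (smaller mu) forces a larger type II error on the adversary.
assert gdp_tradeoff(0.05, mu=0.5) > gdp_tradeoff(0.05, mu=2.0)
# mu = 0 is perfect privacy: the curve degenerates to f(alpha) = 1 - alpha.
assert abs(gdp_tradeoff(0.3, mu=0.0) - 0.7) < 1e-12
```

A canonical noise distribution is, informally, the noise whose induced tradeoff curve matches a target f exactly; the abstract's anti-concentration results bound how light such a distribution's tails can be.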
One Step to Efficient Synthetic Data
A common approach to synthetic data is to sample from a fitted model. We show
that under general assumptions, this approach results in a sample with
inefficient estimators and whose joint distribution is inconsistent with the
true distribution. Motivated by this, we propose a general method of producing
synthetic data, which is widely applicable for parametric models, has
asymptotically efficient summary statistics, and is both easily implemented and
highly computationally efficient. Our approach allows for the construction of
both partially synthetic datasets, which preserve certain summary statistics,
as well as fully synthetic data which satisfy the strong guarantee of
differential privacy (DP), both with the same asymptotic guarantees. We also
provide theoretical and empirical evidence that the distribution from our
procedure converges to the true distribution. Besides our focus on synthetic
data, our procedure can also be used to perform approximate hypothesis tests in
the presence of intractable likelihood functions.
Comment: 17 pages before appendices/references
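The paper's general procedure is not spelled out in the abstract, but the "partially synthetic" idea of forcing synthetic data to reproduce target summary statistics can be sketched for a normal model: draw from the fitted distribution, then apply a location-scale adjustment so the synthetic sample matches the targets exactly. The function name and the exact-matching adjustment are illustrative assumptions, not the paper's one-step estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def adjusted_synthetic_normal(n, target_mean, target_var):
    """Hypothetical sketch: sample from the fitted normal model, then
    standardize and rescale so the synthetic sample reproduces the target
    summary statistics exactly (a 'partially synthetic' flavor)."""
    y = rng.normal(target_mean, np.sqrt(target_var), size=n)
    y = (y - y.mean()) / y.std(ddof=1)  # exact mean 0, sample variance 1
    return target_mean + np.sqrt(target_var) * y

syn = adjusted_synthetic_normal(1000, target_mean=2.0, target_var=4.0)
```

If the targets are themselves released through a DP mechanism, the same adjustment yields fully synthetic data whose summaries carry the privacy guarantee of the released statistics.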
Simulation-based, Finite-sample Inference for Privatized Data
Privacy protection methods, such as differentially private mechanisms,
introduce noise into resulting statistics which often produces complex and
intractable sampling distributions. In this paper, we propose a
simulation-based "repro sample" approach to produce statistically valid
confidence intervals and hypothesis tests, which builds on the work of Xie and
Wang (2022). We show that this methodology is applicable to a wide variety of
private inference problems, appropriately accounts for biases introduced by
privacy mechanisms (such as by clamping), and improves over other
state-of-the-art inference methods such as the parametric bootstrap in terms of
the coverage and type I error of the private inference. We also develop
significant improvements and extensions for the repro sample methodology for
general models (not necessarily related to privacy), including 1) modifying the
procedure to ensure guaranteed coverage and type I errors, even accounting for
Monte Carlo error, and 2) proposing efficient numerical algorithms to implement
the confidence intervals and p-values.
Comment: 25 pages before references and appendices, 42 pages total, 10 figures, 9 tables
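The core simulation-based idea can be sketched as test inversion: for each candidate parameter value, simulate the privatized statistic many times (with the simulation randomness fixed) and keep the candidate if the observed statistic is plausible under it. This toy version privatizes a mean with Laplace noise and inverts over a grid; the grid, budget, and acceptance rule are simplifying assumptions, not the repro-sample procedure itself.

```python
import numpy as np

def repro_confidence_set(obs, thetas, n, eps, reps=2000, level=0.95):
    """Sketch of a simulation-based confidence set for a normal mean
    released via the Laplace mechanism: keep each candidate theta whose
    simulated distribution of the privatized statistic covers obs."""
    rng = np.random.default_rng(42)
    # Fix the simulation randomness once, shared across all candidates.
    data_noise = rng.normal(size=(reps, n))
    dp_noise = rng.laplace(scale=1.0 / (n * eps), size=reps)
    kept = []
    for theta in thetas:
        sims = (theta + data_noise).mean(axis=1) + dp_noise
        lo, hi = np.quantile(sims, [(1 - level) / 2, (1 + level) / 2])
        if lo <= obs <= hi:
            kept.append(theta)
    return kept

# Observed privatized mean from hypothetical data with true theta = 0.
cs = repro_confidence_set(obs=0.01, thetas=np.linspace(-1, 1, 201),
                          n=400, eps=1.0)
```

Because the acceptance check uses simulated quantiles, the resulting set automatically absorbs both sampling and privacy noise; the paper's refinements additionally control the Monte Carlo error in those quantiles.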
Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies
Differentially private (DP) mechanisms protect individual-level information
by introducing randomness into the statistical analysis procedure. Despite the
availability of numerous DP tools, there remains a lack of general techniques
for conducting statistical inference under DP. We examine a DP bootstrap
procedure that releases multiple private bootstrap estimates to infer the
sampling distribution and construct confidence intervals (CIs). Our privacy
analysis presents new results on the privacy cost of a single DP bootstrap
estimate, applicable to any DP mechanisms, and identifies some misapplications
of the bootstrap in the existing literature. Using the Gaussian-DP (GDP)
framework (Dong et al., 2022), we show that the release of B DP bootstrap
estimates from mechanisms satisfying (mu/sqrt((2 - 2/e)B))-GDP
asymptotically satisfies mu-GDP as B goes to infinity. Moreover, we use
deconvolution with the DP bootstrap estimates to accurately infer the sampling
distribution, which is novel in DP. We derive CIs from our density estimate for
tasks such as population mean estimation, logistic regression, and quantile
regression, and we compare them to existing methods using simulations and
real-world experiments on 2016 Canada Census data. Our private CIs achieve the
nominal coverage level and offer the first approach to private inference for
quantile regression.
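A minimal version of the DP bootstrap idea: release B bootstrap estimates, each perturbed with Gaussian noise of known scale, then correct the empirical spread for that known noise variance when forming a CI. The variance subtraction below is a crude stand-in for the paper's deconvolution step, and the noise scale `sigma` is an illustrative assumption rather than a calibrated GDP budget.

```python
import numpy as np

rng = np.random.default_rng(7)

def dp_bootstrap_ci(x, B=500, sigma=0.05, level=1.96):
    """Sketch: release B Gaussian-noised bootstrap means, then subtract the
    known noise variance from their empirical spread (a crude stand-in for
    deconvolution) to get a CI for the population mean."""
    n = len(x)
    idx = rng.integers(0, n, size=(B, n))          # B bootstrap resamples
    noisy = x[idx].mean(axis=1) + rng.normal(0.0, sigma, size=B)
    center = noisy.mean()
    spread = max(noisy.var(ddof=1) - sigma**2, 0.0)  # remove noise variance
    se = np.sqrt(spread)
    return center - level * se, center + level * se

x = rng.normal(10.0, 1.0, size=1000)   # hypothetical confidential data
lo, hi = dp_bootstrap_ci(x)
```

Note that naively treating the noisy bootstrap estimates as if they were noiseless would overstate the sampling variability, which is one flavor of the bootstrap misapplications the abstract alludes to.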