
    Locally Private Causal Inference

    Local differential privacy (LDP) is a differential privacy (DP) paradigm in which individuals first apply a DP mechanism to their data (often by adding noise) before transmitting the result to a curator. LDP ensures strong user privacy protection because the curator never has access to any of the user's original information. On the curator's side, however, the noise added for privacy introduces additional bias and variance into their analyses; it is therefore important for analysts to incorporate the privacy noise into valid statistical inference. In this article, we develop methodologies to infer causal effects from privatized data under the Rubin Causal Model framework. First, we present asymptotically unbiased and consistent estimators with their variance estimators and plug-in confidence intervals. Second, we develop a Bayesian nonparametric methodology along with a blocked Gibbs sampling algorithm, which performs well in terms of MSE for tight privacy budgets. Finally, we present simulation studies to evaluate the performance of our proposed frequentist and Bayesian methodologies for various privacy budgets, resulting in useful suggestions for performing causal inference on privatized data. (Comment: 24 pages)
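    As a point of reference for the setting this abstract describes, the sketch below shows a standard ε-LDP mechanism (Warner-style randomized response) applied by each individual before transmission, together with the curator-side debiasing that the added noise requires. The function names are illustrative, and this is not the paper's causal-inference methodology.

```python
import numpy as np

def randomized_response(bit, epsilon, rng):
    """Each individual locally privatizes one binary value: report it
    truthfully with probability e^eps / (1 + e^eps), otherwise flip it.
    This satisfies epsilon-LDP before anything reaches the curator."""
    p_true = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if rng.random() < p_true else 1 - bit

def debiased_mean(reports, epsilon):
    """Curator-side unbiased estimate of the true proportion, correcting
    for the bias introduced by the local randomization."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
epsilon = 1.0
true_bits = rng.integers(0, 2, size=10_000)   # individuals' raw binary data
reports = [randomized_response(b, epsilon, rng) for b in true_bits]
print(np.mean(true_bits), debiased_mean(reports, epsilon))
```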

    Optimizing Noise for f-Differential Privacy via Anti-Concentration and Stochastic Dominance

    In this paper, we establish anti-concentration inequalities for additive noise mechanisms which achieve f-differential privacy (f-DP), a notion of privacy phrased in terms of a tradeoff function (a.k.a. ROC curve) f which limits the ability of an adversary to determine which individuals were in the database. We show that canonical noise distributions (CNDs), proposed by Awan and Vadhan (2023), match the anti-concentration bounds at half-integer values, indicating that their tail behavior is near-optimal. We also show that all CNDs are sub-exponential, regardless of the f-DP guarantee. In the case of log-concave CNDs, we show that they are the stochastically smallest noise compared to any other noise distribution with the same privacy guarantee. For integer-valued noise, we propose a new notion of discrete CND and prove that a discrete CND always exists, can be constructed by rounding a continuous CND, and is unique when designed for a statistic with sensitivity 1. We further show that the discrete CND at sensitivity 1 is stochastically smallest compared to other integer-valued noises. Our theoretical results shed light on the different types of privacy guarantees possible in the f-DP framework and can be incorporated into more complex mechanisms to optimize performance. (Comment: 17 pages before appendix, 25 pages total, 6 figures)
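    As a rough illustration of the rounding construction mentioned in the abstract, the sketch below rounds draws from a continuous noise distribution to integers for a sensitivity-1 count. It assumes, as in the Gaussian-DP case, that a zero-mean Gaussian with scale 1/μ plays the role of the continuous CND; it does not reproduce the paper's general construction, uniqueness, or optimality results.

```python
import numpy as np

def discrete_gdp_noise(mu, size, rng):
    """Integer-valued noise obtained by rounding draws from a continuous
    noise distribution, mirroring the rounding construction described above.
    Assumption: for mu-GDP at sensitivity 1, a zero-mean Gaussian with
    scale 1/mu is used as the continuous noise."""
    continuous = rng.normal(loc=0.0, scale=1.0 / mu, size=size)
    return np.rint(continuous).astype(int)

rng = np.random.default_rng(1)
true_count = 42                                  # sensitivity-1 statistic (a count)
private_count = true_count + int(discrete_gdp_noise(mu=1.0, size=1, rng=rng)[0])
print(private_count)
```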

    One Step to Efficient Synthetic Data

    A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, and fully synthetic data, which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Beyond our focus on synthetic data, our procedure can also be used to perform approximate hypothesis tests in the presence of intractable likelihood functions. (Comment: 17 pages, before appendices/references)
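    For context, the snippet below sketches the "sample from a fitted model" baseline that the abstract critiques, using a Gaussian fit as an assumed example; it is not the authors' one-step procedure. The synthetic summary statistics carry both estimation noise and resampling noise, illustrating the kind of inefficiency the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(2)
real = rng.normal(loc=5.0, scale=2.0, size=500)       # stand-in "real" data

# Baseline: fit a parametric (Gaussian) model to the data, then draw a
# fully synthetic dataset from the fitted model.
mu_hat, sigma_hat = real.mean(), real.std(ddof=1)
synthetic = rng.normal(loc=mu_hat, scale=sigma_hat, size=real.size)

# The synthetic summaries differ from the fitted values by an extra layer
# of sampling noise on top of the estimation noise.
print(mu_hat, synthetic.mean())
print(sigma_hat, synthetic.std(ddof=1))
```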

    Simulation-based, Finite-sample Inference for Privatized Data

    Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often produces complex and intractable sampling distributions. In this paper, we propose a simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests, which builds on the work of Xie and Wang (2022). We show that this methodology is applicable to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as by clamping), and improves over other state-of-the-art inference methods such as the parametric bootstrap in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to ensure guaranteed coverage and type I errors, even accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to implement the confidence intervals and p-values. (Comment: 25 pages before references and appendices, 42 pages total, 10 figures, 9 tables)
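    To give a flavor of simulation-based inference for privatized statistics, the sketch below inverts a Monte Carlo test for the mean of clamped, Laplace-noised data: candidate values whose simulated releases make the observed release look typical are retained as the confidence set. It is a generic schematic under an assumed Gaussian data model with hypothetical function names, not the repro sample procedure of Xie and Wang (2022) or the refinements developed in this paper.

```python
import numpy as np

def privatize_mean(x, epsilon, clamp, rng):
    """Clamp each record to [0, clamp], average, and add Laplace noise
    calibrated to the sensitivity clamp/n (a standard epsilon-DP release)."""
    clipped = np.clip(x, 0.0, clamp)
    return clipped.mean() + rng.laplace(scale=clamp / (len(x) * epsilon))

def simulation_ci(obs, n, epsilon, clamp, grid, reps=500, alpha=0.05, seed=0):
    """Invert a Monte Carlo test: keep every candidate mean theta whose
    simulated private releases make the observed release look typical.
    Assumes the data follow N(theta, 1); schematic only."""
    rng = np.random.default_rng(seed)
    keep = []
    for theta in grid:
        sims = np.array([
            privatize_mean(rng.normal(theta, 1.0, size=n), epsilon, clamp, rng)
            for _ in range(reps)
        ])
        lo, hi = np.quantile(sims, [alpha / 2.0, 1.0 - alpha / 2.0])
        if lo <= obs <= hi:
            keep.append(theta)
    return (min(keep), max(keep)) if keep else None

rng = np.random.default_rng(3)
data = rng.normal(2.0, 1.0, size=200)
obs = privatize_mean(data, epsilon=1.0, clamp=5.0, rng=rng)
print(simulation_ci(obs, n=200, epsilon=1.0, clamp=5.0,
                    grid=np.linspace(1.5, 2.5, 41)))
```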

    Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

    Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure. Despite the availability of numerous DP tools, there remains a lack of general techniques for conducting statistical inference under DP. We examine a DP bootstrap procedure that releases multiple private bootstrap estimates to infer the sampling distribution and construct confidence intervals (CIs). Our privacy analysis presents new results on the privacy cost of a single DP bootstrap estimate, applicable to any DP mechanism, and identifies some misapplications of the bootstrap in the existing literature. Using the Gaussian-DP (GDP) framework (Dong et al., 2022), we show that the release of B DP bootstrap estimates from mechanisms satisfying (μ/√((2−2/e)B))-GDP asymptotically satisfies μ-GDP as B goes to infinity. Moreover, we use deconvolution with the DP bootstrap estimates to accurately infer the sampling distribution, which is novel in DP. We derive CIs from our density estimate for tasks such as population mean estimation, logistic regression, and quantile regression, and we compare them to existing methods using simulations and real-world experiments on 2016 Canada Census data. Our private CIs achieve the nominal coverage level and offer the first approach to private inference for quantile regression.
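    The composition rule quoted in the abstract translates directly into a per-release budget. The sketch below computes the per-bootstrap-estimate GDP parameter for a target overall μ and, assuming a Gaussian mechanism with noise scale sensitivity/μ for a μ-GDP release (Dong et al., 2022), the corresponding noise scale; the values μ = 1, B = 100, and sensitivity = 0.01 are illustrative.

```python
import math

def per_estimate_mu(total_mu, B):
    """Per-release GDP parameter so that B DP bootstrap estimates compose
    (asymptotically in B) to total_mu-GDP, per the formula quoted above:
    mu / sqrt((2 - 2/e) * B)."""
    return total_mu / math.sqrt((2.0 - 2.0 / math.e) * B)

total_mu, B, sensitivity = 1.0, 100, 0.01   # illustrative values
mu_b = per_estimate_mu(total_mu, B)
sigma = sensitivity / mu_b                  # Gaussian mechanism scale for mu_b-GDP
print(round(mu_b, 4), round(sigma, 4))
```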