On the Differential Privacy of Bayesian Inference
We study how to communicate findings of Bayesian inference to third parties,
while preserving the strong guarantee of differential privacy. Our main
contributions are four different algorithms for private Bayesian inference on
probabilistic graphical models. These include two mechanisms for adding noise
to the Bayesian updates, either directly to the posterior parameters, or to
their Fourier transform so as to preserve update consistency. We also utilise a
recently introduced posterior sampling mechanism, for which we prove bounds for
the specific but general case of discrete Bayesian networks; and we introduce a
maximum-a-posteriori private mechanism. Our analysis includes utility and
privacy bounds, with a novel focus on the influence of graph structure on
privacy. Worked examples and experiments with Bayesian naïve Bayes and
Bayesian linear regression illustrate the application of our mechanisms.
Comment: AAAI 2016, Feb 2016, Phoenix, Arizona, United States
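To make the first of these ideas concrete, here is a minimal sketch of adding Laplace noise directly to the posterior parameters of a conjugate Beta-Bernoulli update. It is an illustration under assumed choices (the replace-one-record sensitivity bound and the clamping of the released parameters), not the paper's exact mechanism:

    import numpy as np

    def private_beta_posterior(data, alpha0, beta0, epsilon, rng=None):
        # Replacing one record changes the success count by at most 1 and
        # the failure count by at most 1, so the L1 sensitivity of the
        # count pair is 2; Laplace noise with scale 2/epsilon therefore
        # gives an epsilon-differentially private release.
        rng = np.random.default_rng() if rng is None else rng
        successes = int(np.sum(data))
        failures = len(data) - successes
        scale = 2.0 / epsilon
        noisy_alpha = alpha0 + successes + rng.laplace(0.0, scale)
        noisy_beta = beta0 + failures + rng.laplace(0.0, scale)
        # Clamp so the released values are still valid Beta parameters.
        return max(noisy_alpha, 1e-3), max(noisy_beta, 1e-3)

    # 100 coin flips, Beta(1, 1) prior, privacy budget epsilon = 1.
    flips = np.random.default_rng(0).binomial(1, 0.7, size=100)
    print(private_beta_posterior(flips, 1.0, 1.0, epsilon=1.0))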
Differentially Private Exponential Random Graphs
We propose methods to release and analyze synthetic graphs in order to
protect privacy of individual relationships captured by the social network.
Proposed techniques aim at fitting and estimating a wide class of exponential
random graph models (ERGMs) in a differentially private manner, and thus offer
rigorous privacy guarantees. More specifically, we use the randomized response
mechanism to release networks under ε-edge differential privacy. To
maintain utility for statistical inference, treating the original graph as
missing, we propose a way to use likelihood based inference and Markov chain
Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks.
We demonstrate the usefulness of the proposed techniques on a real data
example.
Comment: minor edit
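The randomized response step can be sketched as follows; the calibration p = e^ε / (1 + e^ε) for the probability of reporting a dyad truthfully is the standard one for edge differential privacy and is assumed here rather than quoted from the paper:

    import numpy as np

    def randomized_response_graph(adj, epsilon, rng=None):
        # Report each dyad truthfully with probability
        # p = e^eps / (1 + e^eps) and flip it otherwise.
        rng = np.random.default_rng() if rng is None else rng
        p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
        n = adj.shape[0]
        flips = np.triu(rng.random((n, n)) > p_keep, k=1)  # one draw per dyad
        noisy = np.triu(adj, k=1) ^ flips
        return noisy | noisy.T  # symmetrize the released graph

    adj = np.zeros((5, 5), dtype=bool)
    adj[0, 1] = adj[1, 0] = True
    print(randomized_response_graph(adj, epsilon=np.log(3)).astype(int))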
Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Motivated by a real-life problem of sharing social network data that contain
sensitive personal information, we propose a novel approach to release and
analyze synthetic graphs in order to protect privacy of individual
relationships captured by the social network while maintaining the validity of
statistical results. A case study using a version of the Enron e-mail corpus
dataset demonstrates the application and usefulness of the proposed techniques
in solving the challenging problem of maintaining privacy \emph{and} supporting
open access to network data to ensure reproducibility of existing studies and
discovering new scientific insights that can be obtained by analyzing such
data. We use a simple yet effective randomized response mechanism to generate
synthetic networks under ε-edge differential privacy, and then use
likelihood based inference for missing data and Markov chain Monte Carlo
techniques to fit exponential-family random graph models to the generated
synthetic networks.
Comment: Updated, 39 pages
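For reference, the textbook privacy accounting behind dyad-level randomized response (assumed here; the paper's exact parametrization may differ): if each dyad is reported truthfully with probability p > 1/2 and flipped with probability 1 − p, then two graphs differing in a single edge produce outputs whose likelihood ratio is at most p / (1 − p), so the release satisfies ε-edge differential privacy with

    ε = ln( p / (1 − p) ),   e.g. p = 0.75 gives ε = ln 3 ≈ 1.10.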
PreFair: Privately Generating Justifiably Fair Synthetic Data
When a database is protected by Differential Privacy (DP), its usability is
limited in scope. In this scenario, generating a synthetic version of the data
that mimics the properties of the private data allows users to perform any
operation on the synthetic data, while maintaining the privacy of the original
data. Therefore, multiple works have been devoted to devising systems for DP
synthetic data generation. However, such systems may preserve or even magnify
properties of the data that make it unfair, rendering the synthetic data unfit
for use. In this work, we present PreFair, a system that allows for DP fair
synthetic data generation. PreFair extends the state-of-the-art DP data
generation mechanisms by incorporating a causal fairness criterion that ensures
fair synthetic data. We adapt the notion of justifiable fairness to fit the
synthetic data generation scenario. We further study the problem of generating
DP fair synthetic data, showing its intractability and designing algorithms
that are optimal under certain assumptions. We also provide an extensive
experimental evaluation, showing that PreFair generates synthetic data that is
significantly fairer than the data generated by leading DP data generation
mechanisms, while remaining faithful to the private data.
Comment: 15 pages, 11 figures
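The abstract does not spell out PreFair's generation mechanism, so the following sketch shows only the generic noisy-histogram baseline that DP synthetic data generators commonly build on; the function name, the public value range, and the fallback for an all-zero histogram are assumptions of this illustration:

    import numpy as np

    def dp_synthetic_histogram(records, value_range, n_bins, epsilon,
                               n_synth, rng=None):
        # Bin edges must be data-independent, so the range is a public
        # parameter. One record lands in exactly one bin, giving the
        # histogram L1 sensitivity 1; Laplace(1/epsilon) noise is epsilon-DP.
        rng = np.random.default_rng() if rng is None else rng
        counts, edges = np.histogram(records, bins=n_bins, range=value_range)
        noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=n_bins)
        probs = np.clip(noisy, 0.0, None)
        if probs.sum() == 0.0:          # all mass noised away: fall back
            probs = np.ones(n_bins)
        probs = probs / probs.sum()
        bins = rng.choice(n_bins, size=n_synth, p=probs)
        # Sample synthetic values uniformly within each chosen bin.
        return rng.uniform(edges[bins], edges[bins + 1])

    ages = np.random.default_rng(1).normal(40.0, 10.0, size=1000)
    synth = dp_synthetic_histogram(ages, (0.0, 100.0), 20, 0.5, 1000)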
Noise-Aware Inference for Differential Privacy
Domains involving sensitive human data, such as health care, human mobility, and online activity, are becoming increasingly dependent upon machine learning algorithms. This leads to scenarios in which data owners wish to protect the privacy of individuals comprising the sensitive data, while at the same time data modelers wish to analyze and draw conclusions from the data. Thus there is a growing demand to develop effective private inference methods that can marry the needs of both parties. For this we turn to differential privacy, which provides a framework for executing algorithms in a private fashion by injecting specifically-designed randomization at various points in the process. The majority of existing work proceeds by ignoring the injected randomization, potentially leading to pathologies in algorithmic performance. There is, however, a small body of existing work that performs inference over the injected randomization in an attempt to design more principled algorithms. This thesis summarizes the subfield of noise-aware differentially private inference and contributes novel algorithms for important problems.
The differential privacy literature provides a multitude of privacy mechanisms. We opt for sufficient statistics perturbation (SSP), in which the sufficient statistics, quantities that capture all information the data carry about the model parameters, are corrupted with random noise and released to the public. This mechanism offers desirable efficiency properties in comparison to alternatives. In this thesis we develop principled methods that directly account for the injected noise in three settings: maximum likelihood estimation of undirected graphical models, Bayesian inference for exponential family models, and Bayesian inference for conditional regression models.
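As a minimal illustration of SSP (not one of the thesis's algorithms), consider Bernoulli data, whose sufficient statistic is simply the sum of the observations:

    import numpy as np

    def ssp_release(data, epsilon, rng=None):
        # The sufficient statistic for Bernoulli data is the sum of the
        # binary observations; one record changes it by at most 1, so
        # Laplace(1/epsilon) noise yields an epsilon-DP release.
        rng = np.random.default_rng() if rng is None else rng
        noisy_sum = float(np.sum(data)) + rng.laplace(0.0, 1.0 / epsilon)
        return noisy_sum, len(data)

    # Noise-aware inference then conditions on the noisy statistic itself:
    # p(theta | s_noisy) ∝ p(theta) * sum_s p(s_noisy | s) p(s | theta, n)

The last comment is the point of noise-aware inference: the posterior conditions on the released noisy statistic, marginalizing over the unknown true statistic, rather than pretending the injected noise is absent.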
Causal Discovery Under Local Privacy
Differential privacy is a widely adopted framework designed to safeguard the
sensitive information of data providers within a data set. It is based on the
application of controlled noise at the interface between the server that stores
and processes the data, and the data consumers. Local differential privacy is a
variant that allows data providers to apply the privatization mechanism
themselves to their own data. It therefore provides protection even in
contexts where the server, or even the data collector, cannot be trusted.
The introduction of noise, however, inevitably affects the utility of the data,
particularly by distorting the correlations between individual data components.
This distortion can prove detrimental to tasks such as causal discovery. In
this paper, we consider various well-known locally differentially private
mechanisms and compare the trade-off between the privacy they provide, and the
accuracy of the causal structure produced by algorithms for causal learning
when applied to data obfuscated by these mechanisms. Our analysis yields
valuable insights for selecting appropriate local differentially private
protocols for causal discovery tasks. We foresee that our findings will aid
researchers and practitioners in conducting locally private causal discovery
- …
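As one concrete example of the well-known local mechanisms discussed, here is Warner-style binary randomized response together with the usual unbiased frequency estimator; the calibration shown is the standard one and is assumed rather than taken from the paper:

    import numpy as np

    def local_randomized_response(value, epsilon, rng=None):
        # Each provider reports the true bit with probability
        # e^eps / (1 + e^eps) and its negation otherwise, so the data
        # are privatized before they ever leave the provider.
        rng = np.random.default_rng() if rng is None else rng
        p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
        return value if rng.random() < p_truth else 1 - value

    def estimate_frequency(reports, epsilon):
        # Debias the privatized reports: E[report] = f*(2p-1) + (1-p),
        # solved for the true frequency f.
        p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
        return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)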