2,753 research outputs found
On the Differential Privacy of Bayesian Inference
We study how to communicate findings of Bayesian inference to third parties,
while preserving the strong guarantee of differential privacy. Our main
contributions are four different algorithms for private Bayesian inference on
proba-bilistic graphical models. These include two mechanisms for adding noise
to the Bayesian updates, either directly to the posterior parameters, or to
their Fourier transform so as to preserve update consistency. We also utilise a
recently introduced posterior sampling mechanism, for which we prove bounds for
the specific but general case of discrete Bayesian networks; and we introduce a
maximum-a-posteriori private mechanism. Our analysis includes utility and
privacy bounds, with a novel focus on the influence of graph structure on
privacy. Worked examples and experiments with Bayesian na{\"i}ve Bayes and
Bayesian linear regression illustrate the application of our mechanisms.Comment: AAAI 2016, Feb 2016, Phoenix, Arizona, United State
Fast Differentially Private Matrix Factorization
Differentially private collaborative filtering is a challenging task, both in
terms of accuracy and speed. We present a simple algorithm that is provably
differentially private, while offering good performance, using a novel
connection of differential privacy to Bayesian posterior sampling via
Stochastic Gradient Langevin Dynamics. Due to its simplicity the algorithm
lends itself to efficient implementation. By careful systems design and by
exploiting the power law behavior of the data to maximize CPU cache bandwidth
we are able to generate 1024 dimensional models at a rate of 8.5 million
recommendations per second on a single PC
A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the of magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators
- …