Differential Privacy Applications to Bayesian and Linear Mixed Model Estimation
We consider a particular maximum likelihood estimator (MLE) and a computationally intensive Bayesian method for differentially private estimation of the linear mixed-effects model (LMM) with normal random errors. The LMM is important because it is used in small area estimation and detailed industry tabulations that present significant challenges for confidentiality protection of the underlying data. For a problem in which the small-area variation is at the county level, the differentially private MLE performs well compared with the regular MLE but deteriorates as the protection increases. More dimensions of random effects are needed to adequately represent the time dimension of the data, and for these cases the differentially private MLE cannot be computed. The direct Bayesian approach for the same model uses an informative, but reasonably diffuse, prior to compute the posterior predictive distribution for the random effects. The differential privacy of this approach is estimated by direct computation of the relevant odds ratios after deleting influential observations according to various criteria.
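For reference, the standard definition of ε-differential privacy bounds a ratio of output probabilities of the kind the abstract refers to. The notation below (mechanism M, neighboring data sets D and D', output set S) is an assumption for illustration and is not taken from the abstract, which may rely on a different variant of the definition.

```latex
\[
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,]
\quad \text{for all neighboring } D, D' \text{ and all measurable } S .
\]
```

A smaller ε forces the two probabilities closer together, which is the sense in which protection "increases".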
Accurate and Efficient Private Release of Datacubes and Contingency Tables
A central problem in releasing aggregate information about sensitive data is
to do so accurately while providing a privacy guarantee on the output. Recent
work focuses on the class of linear queries, which includes basic counting
queries, data cubes, and contingency tables. The goal is to maximize the
utility of their output, while giving a rigorous privacy guarantee. Most
results follow a common template: pick a "strategy" set of linear queries to
apply to the data, then use the noisy answers to these queries to reconstruct
the queries of interest. This entails either picking a strategy set that is
hoped to be good for the queries, or performing a costly search over the space
of all possible strategies.
In this paper, we propose a new approach that balances accuracy and
efficiency: we show how to improve the accuracy of a given query set by
answering some strategy queries more accurately than others. This leads to an
efficient optimal noise allocation for many popular strategies, including
wavelets, hierarchies, Fourier coefficients and more. For the important case of
marginal queries we show that this strictly improves on previous methods, both
analytically and empirically. Our results also extend to ensuring that the
returned query answers are consistent with an (unknown) data set at minimal
extra cost in terms of time and noise.
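
The strategy template described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the 4-cell histogram, the hierarchical strategy matrix, the workload, and the per-level budget split are all hypothetical, and weighted least squares is just one common reconstruction choice. The sketch only shows that a uniform and a non-uniform noise allocation can share the same privacy cost; it does not compute the optimal allocation the abstract refers to.

```python
import numpy as np

def privacy_cost(S, scales):
    """Epsilon spent by adding Laplace(scales[i]) noise to row i of S @ x.

    Changing one record shifts S @ x by (at most) one column of S, so the
    cost is the largest column sum of |S[i, j]| / scales[i].
    """
    return float(np.max(np.sum(np.abs(S) / scales[:, None], axis=0)))

def answer_workload(W, S, x, scales, rng):
    """Answer workload W from noisy strategy answers (illustrative only)."""
    noisy = S @ x + rng.laplace(loc=0.0, scale=scales)
    # Weighted least squares: rows with less noise get more weight.
    weights = 1.0 / scales
    x_hat, *_ = np.linalg.lstsq(S * weights[:, None], noisy * weights, rcond=None)
    return W @ x_hat

# Hypothetical hierarchical strategy over 4 cells: total, two halves, four leaves.
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)
W = np.array([[1, 1, 1, 0], [0, 1, 1, 1]], dtype=float)  # two range queries
x = np.array([10.0, 20.0, 30.0, 40.0])                   # toy histogram

eps = 1.0
# Uniform allocation: every column of S has L1 norm 3, so scale = 3 / eps.
uniform = np.full(7, 3.0 / eps)
# Non-uniform allocation (assumed split): each level gets its own share of eps.
level_eps = np.array([0.2, 0.3, 0.3, 0.5, 0.5, 0.5, 0.5]) * eps
skewed = 1.0 / level_eps

rng = np.random.default_rng(0)
for name, scales in [("uniform", uniform), ("skewed", skewed)]:
    print(name, privacy_cost(S, scales), answer_workload(W, S, x, scales, rng))
```

Both allocations spend the same ε, but they distribute noise differently across the strategy rows. Which split gives the lowest error depends on the workload, and choosing it well is the optimization problem the abstract describes.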