12,916 research outputs found
Privacy Games: Optimal User-Centric Data Obfuscation
In this paper, we design user-centric obfuscation mechanisms that impose the
minimum utility loss for guaranteeing user's privacy. We optimize utility
subject to a joint guarantee of differential privacy (indistinguishability) and
distortion privacy (inference error). This double shield of protection limits
the information leakage through obfuscation mechanism as well as the posterior
inference. We show that the privacy achieved through joint
differential-distortion mechanisms against optimal attacks is as large as the
maximum privacy that can be achieved by either of these mechanisms separately.
Their utility cost is also not larger than what either of the differential or
distortion mechanisms imposes. We model the optimization problem as a
leader-follower game between the designer of obfuscation mechanism and the
potential adversary, and design adaptive mechanisms that anticipate and protect
against optimal inference algorithms. Thus, the obfuscation mechanism is
optimal against any inference algorithm
A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the of magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators
Controlling for individual heterogeneity in longitudinal models, with applications to student achievement
Longitudinal data tracking repeated measurements on individuals are highly
valued for research because they offer controls for unmeasured individual
heterogeneity that might otherwise bias results. Random effects or mixed models
approaches, which treat individual heterogeneity as part of the model error
term and use generalized least squares to estimate model parameters, are often
criticized because correlation between unobserved individual effects and other
model variables can lead to biased and inconsistent parameter estimates.
Starting with an examination of the relationship between random effects and
fixed effects estimators in the standard unobserved effects model, this article
demonstrates through analysis and simulation that the mixed model approach has
a ``bias compression'' property under a general model for individual
heterogeneity that can mitigate bias due to uncontrolled differences among
individuals. The general model is motivated by the complexities of longitudinal
student achievement measures, but the results have broad applicability to
longitudinal modeling.Comment: Published at http://dx.doi.org/10.1214/07-EJS057 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …