12,916 research outputs found

    Privacy Games: Optimal User-Centric Data Obfuscation

    Full text link
    In this paper, we design user-centric obfuscation mechanisms that impose the minimum utility loss for guaranteeing user's privacy. We optimize utility subject to a joint guarantee of differential privacy (indistinguishability) and distortion privacy (inference error). This double shield of protection limits the information leakage through obfuscation mechanism as well as the posterior inference. We show that the privacy achieved through joint differential-distortion mechanisms against optimal attacks is as large as the maximum privacy that can be achieved by either of these mechanisms separately. Their utility cost is also not larger than what either of the differential or distortion mechanisms imposes. We model the optimization problem as a leader-follower game between the designer of obfuscation mechanism and the potential adversary, and design adaptive mechanisms that anticipate and protect against optimal inference algorithms. Thus, the obfuscation mechanism is optimal against any inference algorithm

    A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data

    Get PDF
    Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the of magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators

    Controlling for individual heterogeneity in longitudinal models, with applications to student achievement

    Full text link
    Longitudinal data tracking repeated measurements on individuals are highly valued for research because they offer controls for unmeasured individual heterogeneity that might otherwise bias results. Random effects or mixed models approaches, which treat individual heterogeneity as part of the model error term and use generalized least squares to estimate model parameters, are often criticized because correlation between unobserved individual effects and other model variables can lead to biased and inconsistent parameter estimates. Starting with an examination of the relationship between random effects and fixed effects estimators in the standard unobserved effects model, this article demonstrates through analysis and simulation that the mixed model approach has a ``bias compression'' property under a general model for individual heterogeneity that can mitigate bias due to uncontrolled differences among individuals. The general model is motivated by the complexities of longitudinal student achievement measures, but the results have broad applicability to longitudinal modeling.Comment: Published at http://dx.doi.org/10.1214/07-EJS057 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org
    • …
    corecore