Microdata Disclosure by Resampling - Empirical Findings for . . .


A problem that statistical offices and research institutes face when releasing microdata is the preservation of confidentiality. Traditional methods of avoiding disclosure often destroy the structure of the data, i.e., the loss of information can be substantial. In this paper I discuss an alternative technique for creating scientific-use files that reproduce the characteristics of the original data quite well. It is based on an idea of Fienberg (1997 and 1994) [4], [5]: estimate the empirical multivariate cumulative distribution function of the data and resample from it to obtain synthetic data. The procedure creates datasets (the resamples) that have the same characteristics as the original survey data. I present applications of this method to (a) simulated data and (b) innovation survey data, the Mannheim Innovation Panel (MIP), and compare resampling with a traditional method of disclosure control, perturbation with multiplicative error, with respect to confidentiality on the one hand and the usability of the perturbed data for different kinds of analyses on the other. Resampling reproduces univariate and multivariate distributions better. Linear regression results can be reproduced quite well with perturbed data as well as with resamples. However, data anonymized by multiplicative perturbation protect better against re-identification than resamples do.
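To make the two approaches compared in the abstract more concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a single numeric variable, whereas the paper works with the multivariate distribution. It also assumes a uniform noise factor for the multiplicative perturbation and a piecewise-linear empirical CDF for the resampling. The function names `perturb_multiplicative` and `resample_from_ecdf` and the parameter `spread` are hypothetical and chosen only for illustration.

```python
import random


def perturb_multiplicative(values, spread=0.1, rng=None):
    """Multiply each value by an independent factor drawn uniformly
    from [1 - spread, 1 + spread] (assumed noise distribution)."""
    if rng is None:
        rng = random.Random()
    return [v * rng.uniform(1.0 - spread, 1.0 + spread) for v in values]


def resample_from_ecdf(values, n, rng=None):
    """Draw n synthetic values by inverting a piecewise-linear
    empirical CDF built from the sorted observations.
    Univariate simplification of the multivariate idea."""
    if rng is None:
        rng = random.Random()
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    synthetic = []
    for _ in range(n):
        # Pick a random position along the sorted sample ...
        pos = rng.random() * (len(xs) - 1)
        i = min(int(pos), len(xs) - 2)
        frac = pos - i
        # ... and interpolate between the two neighbouring observations.
        synthetic.append(xs[i] + frac * (xs[i + 1] - xs[i]))
    return synthetic


# Illustrative values only, not data from the paper.
example_rng = random.Random(0)
original = [12.0, 15.5, 9.8, 20.1, 17.3]
print(perturb_multiplicative(original, 0.1, example_rng))
print(resample_from_ecdf(original, 5, example_rng))
```

In this sketch every perturbed value stays tied to one original record, while the resampled values are new draws that lie between observed values and have no one-to-one link to individual records. A multivariate version would also have to preserve the dependence between variables, which this example does not attempt.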


oaioai:CiteSeerX.psu: time updated on 10/22/2014

This paper was published in CiteSeerX.
