When data are made available to others to analyze for their own purposes, steps must be taken to ensure confidentiality, that is, to ensure that the identities of the persons or institutions studied are not disclosed and cannot be deduced. Disclosure risk analysis is conducted in order to create a public-use file (PUF) from confidential, or restricted-use, data. Based on this analysis of disclosure risks, statistical disclosure limitation (SDL) methodologies are applied to the data to create the PUF. The PUF is the only version of the microdata to which most researchers ever have access, and it is the version from which much of the utility of the data is extracted. Therefore, the decisions made in creating the PUF are very important. These include variable changes (e.g., deletions, recodes) and the selection of SDL methods (e.g., data swapping, imputation, collapsing categories), and they must match both the key intended purposes of the data collection and the disclosure risk. Typically, decisions regarding disclosure risk are made after data collection is completed. This article describes a new model for conducting disclosure risk analysis for the creation of PUFs that moves decisions regarding disclosure risk to the beginning of the survey research process. Early thinking and decision-making regarding disclosure risk can lead to a more analytically useful PUF and to the optimal set of data products (tables, maps, online analysis, and so on, in addition to the PUF). The efficiencies the model creates between the various stages of the research process will shorten the time between data collection and data release, thus increasing the value of the shared data to secondary analysts and to science.
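The link between disclosure risk analysis and SDL choices such as recodes or collapsing categories can be illustrated with a toy example. The sketch below is not the model described in this article. It assumes one common risk heuristic, counting "sample uniques" (records whose combination of identifying key variables occurs only once in the file). It then shows how collapsing exact age into broader bands reduces that count. The records, the key variables, and the helper names `count_sample_uniques` and `collapse_age` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def count_sample_uniques(records, keys):
    """Count records whose combination of key-variable values appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)


def collapse_age(age):
    """Hypothetical recode: collapse exact age into 20-year bands (e.g., 23 -> '20-39')."""
    low = (age // 20) * 20
    return f"{low}-{low + 19}"


# Invented toy records; not real survey data.
records = [
    {"age": 23, "sex": "F", "region": "North"},
    {"age": 27, "sex": "F", "region": "North"},
    {"age": 41, "sex": "M", "region": "South"},
    {"age": 45, "sex": "M", "region": "South"},
    {"age": 88, "sex": "F", "region": "South"},
]
keys = ["age", "sex", "region"]  # assumed identifying key variables

# Risk before SDL: every record is unique on exact age, sex, and region.
before = count_sample_uniques(records, keys)

# Apply the recode (collapsing categories), then re-measure risk.
recoded = [{**r, "age": collapse_age(r["age"])} for r in records]
after = count_sample_uniques(recoded, keys)

print(before, after)  # prints "5 1"
```

The remaining unique record (an 88-year-old in the top band) shows why risk is usually re-measured after each recode. Further treatment, such as top-coding or another SDL method, may still be needed, and each step trades analytic detail for protection.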