Fully Synthetic Data for Complex Surveys

Mathur, Shirley; Reiter, Jeremy P.; Si, Yajuan

Authors: Shirley Mathur, Jeremy P. Reiter, Yajuan Si
Publication date: 16 September 2023

Abstract

When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as the public use files. We use the framework of multiple imputation to enable variance estimation under two data generation strategies. In the first, we generate multiple data sets from each simple random sample, whereas in the second, we generate a single synthetic data set from each simple random sample. We present multiple imputation combining rules for each setting. We illustrate each approach and the repeated sampling properties of the combining rules using simulation studies.
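The four steps in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the function names are hypothetical, the pseudo-population step uses plain weight-proportional resampling as a stand-in for the weighted finite population Bayesian bootstrap, and the synthesis model is a simple normal distribution with plug-in estimates chosen purely for brevity.

```python
import numpy as np

def make_pseudo_population(y, weights, rng):
    """Build one pseudo-population of size N = sum of the survey weights.

    ASSUMPTION: weight-proportional resampling with replacement stands in
    for the weighted finite population Bayesian bootstrap named in the abstract.
    """
    N = int(round(weights.sum()))
    idx = rng.choice(len(y), size=N, replace=True, p=weights / weights.sum())
    return y[idx]

def synthesize(y, weights, n_pops=5, n_per_sample=1, rng=None):
    """Return a list of synthetic data sets, each the size of the original sample.

    n_per_sample > 1 mirrors the first strategy (several synthetic data sets
    per simple random sample); n_per_sample == 1 mirrors the second.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    synthetic = []
    for _ in range(n_pops):
        pop = make_pseudo_population(y, weights, rng)
        # Simple random sample (without replacement) from the pseudo-population.
        srs = rng.choice(pop, size=n, replace=False)
        # ASSUMPTION: a normal synthesis model with plug-in mean and sd;
        # a real synthesizer would be richer and would propagate parameter uncertainty.
        mu, sigma = srs.mean(), srs.std(ddof=1)
        for _ in range(n_per_sample):
            synthetic.append(rng.normal(mu, sigma, size=n))
    return synthetic
```

For example, `synthesize(y, weights, n_pops=3, n_per_sample=2)` would produce six synthetic data sets, two from each of three simple random samples. The combining rules for variance estimation are given in the paper and are not reproduced here.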

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2309.09115

Last time updated on 10/10/2023