
    Multiple imputation for sharing precise geographies in public use data

    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data. Comment: Published at http://dx.doi.org/10.1214/11-AOAS506 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality

    "To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage approach to generating synthetic data that enables agencies to release different numbers of imputations for different variables. Generation in two stages can reduce computational burdens, decrease disclosure risk, and increase inferential accuracy relative to generation in one stage. We present methods for obtaining inferences from such data. We describe the application of two-stage synthesis to creating a public use file for a German business database." (Author's abstract, IAB-Doku) Keywords: IAB-Betriebspanel (IAB Establishment Panel), data preparation, data anonymization, data protection, applied statistics, statistical method, labor market research, imputation methods

    A Survey of Irradiated Pillars, Globules, and Jets in the Carina Nebula

    We present wide-field, deep narrowband H₂, Brγ, Hα, [S II], [O III], and broadband I and K-band images of the Carina star formation region. The new images provide a large-scale overview of all the H₂ and Brγ emission present in over a square degree centered on this signature star-forming complex. By comparing these images with archival HST and Spitzer images we observe how intense UV radiation from O and B stars affects star formation in molecular clouds. We use the images to locate new candidate outflows and identify the principal shock waves and irradiated interfaces within dozens of distinct areas of star-forming activity. Shocked molecular gas in jets traces the parts of the flow that are most shielded from the intense UV radiation. Combining the H₂ and optical images gives a more complete view of the jets, which are sometimes only visible in H₂. The Carina region hosts several compact young clusters, and the gas within these clusters is affected by radiation from both the cluster stars and the massive stars nearby. The Carina Nebula is ideal for studying the physics of young H II regions and PDRs, as it contains multiple examples of walls and irradiated pillars at various stages of development. Some of the pillars have detached from their host molecular clouds to form proplyds. Fluorescent H₂ outlines the interfaces between the ionized and molecular gas, and after removing continuum, we detect spatial offsets between the Brγ and H₂ emission along the irradiated interfaces. These spatial offsets can be used to test current models of PDRs once synthetic maps of these lines become available. Comment: Accepted in the Astronomical Journal

    Estimating propensity scores with missing covariate data using general location mixture models

    In many observational studies, researchers estimate causal effects using propensity scores, e.g., by matching or sub-classifying on the scores. Estimation of propensity scores is complicated when some values of the covariates are missing. We propose to use multiple imputation to create completed datasets, from which propensity scores can be estimated, with a general location mixture model. The model assumes that the control units are a latent mixture of (i) units whose covariates are drawn from the same distributions as the treated units’ covariates and (ii) units whose covariates are drawn from different distributions. This formulation reduces the influence of control units outside the treated units’ region of the covariate space on the estimation of parameters in the imputation model, which can result in more plausible imputations and better balance in the true covariate distributions. We illustrate the benefits of the latent class modeling approach with simulations and with an observational study of the effect of breastfeeding on children’s cognitive abilities.
