10,395 research outputs found
Multiple imputation for sharing precise geographies in public use data
When releasing data to the public, data stewards are ethically and often
legally obligated to protect the confidentiality of data subjects' identities
and sensitive attributes. They also strive to release data that are informative
for a wide range of secondary analyses. Achieving both objectives is
particularly challenging when data stewards seek to release highly resolved
geographical information. We present an approach for protecting the
confidentiality of data with geographic identifiers based on multiple
imputation. The basic idea is to convert geography to latitude and longitude,
estimate a bivariate response model conditional on attributes, and simulate new
latitude and longitude values from these models. We illustrate the proposed
methods using data describing causes of death in Durham, North Carolina. In the
context of the application, we present a straightforward tool for generating
simulated geographies and attributes based on regression trees, and we present
methods for assessing disclosure risks with such simulated data.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS506 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
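The synthesis step summarized in this abstract (convert geography to latitude and longitude, fit a bivariate response model conditional on attributes, then draw replacement coordinates) can be sketched as follows. This is a minimal sketch, not the authors' tool. It swaps their regression-tree synthesizer for a bivariate normal linear regression, with parameters drawn from the posterior under an assumed standard noninformative prior. The function name `synthesize_locations` and the simulated Durham-area records are hypothetical.

```python
import numpy as np


def synthesize_locations(X, Y, rng):
    """Return one synthetic copy of the (latitude, longitude) columns.

    X   : (n, p) attribute design matrix, including an intercept column.
    Y   : (n, 2) observed latitude and longitude.
    rng : numpy Generator.

    Assumes Y | X follows a bivariate normal linear model. Parameters are
    drawn from their posterior under a noninformative prior, so repeated
    calls give independent synthetic copies (multiple imputations).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                # (p, 2) least-squares coefficients
    resid = Y - X @ B_hat
    S = resid.T @ resid                      # 2x2 residual cross-product matrix

    # Sigma ~ Inverse-Wishart(n - p, S): build a Wishart(n - p, S^-1) draw
    # as a sum of outer products of normal vectors, then invert it.
    Z = rng.multivariate_normal(np.zeros(2), np.linalg.inv(S), size=n - p)
    Sigma = np.linalg.inv(Z.T @ Z)

    # B | Sigma ~ MatrixNormal(B_hat, (X'X)^-1, Sigma).
    B = (B_hat
         + np.linalg.cholesky(XtX_inv)
         @ rng.standard_normal((p, 2))
         @ np.linalg.cholesky(Sigma).T)

    # Replace every record's coordinates with a draw from the sampled model.
    return X @ B + rng.multivariate_normal(np.zeros(2), Sigma, size=n)


# Hypothetical records: age and a binary cause-of-death indicator.
rng = np.random.default_rng(2012)
n = 500
age = rng.uniform(20, 90, n)
cause = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), age, cause])
Y = np.column_stack([
    35.99 + 0.001 * age + 0.02 * cause + rng.normal(0, 0.03, n),
    -78.90 - 0.0005 * age + 0.01 * cause + rng.normal(0, 0.03, n),
])

# m = 5 synthetic versions of the geography; attributes are released as-is.
synthetic = [synthesize_locations(X, Y, rng) for _ in range(5)]
```

Drawing the coefficients and covariance afresh for each copy, rather than reusing the least-squares fit, lets the between-copy spread reflect parameter uncertainty. Analysts need that spread when they combine results across the released datasets.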
Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality
"To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage approach to generating synthetic data that enables agencies to release different numbers of imputations for different variables. Generation in two stages can reduce computational burdens, decrease disclosure risk, and increase inferential accuracy relative to generation in one stage. We present methods for obtaining inferences from such data. We describe the application of two-stage synthesis to creating a public use file for a German business database." (Author's abstract, IAB-Doku)
Keywords: IAB-Betriebspanel (IAB Establishment Panel), data preparation, data anonymization, data protection, applied statistics, statistical methods, labor market research, imputation methods
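The abstract says the paper presents methods for obtaining inferences from such data but does not state them. As a point of reference only, the sketch below implements the standard one-stage combining rules for partially synthetic data: the point estimate is the average $\bar{q}$, the total variance is $T_p = \bar{u} + b/m$, and the degrees of freedom are $(m-1)(1 + \bar{u}/(b/m))^2$. The two-stage rules derived in the paper, which account for the nesting of imputations, are not reproduced here. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine results from m partially synthetic datasets (one-stage rules).

    estimates : point estimates q_l, one per synthetic dataset.
    variances : estimated variances u_l of those point estimates.
    Returns (q_bar, T_p, df) for t-based inference.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances for m >= 2")
    q_bar = mean(estimates)   # released point estimate
    b = variance(estimates)   # between-dataset variance (divisor m - 1)
    u_bar = mean(variances)   # average within-dataset variance
    T_p = u_bar + b / m       # total variance for partially synthetic data
    df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else float("inf")
    return q_bar, T_p, df


# Hypothetical estimates of one regression coefficient from m = 5 datasets.
q_bar, T_p, df = combine_partially_synthetic(
    [2.0, 2.2, 1.8, 2.1, 1.9], [0.04] * 5
)
# q_bar ≈ 2.0, T_p ≈ 0.045, df ≈ 324
```

Note that the between-dataset variance enters only as $b/m$ here. Under partial synthesis the original data are fully observed, so $\bar{u}$ already carries the sampling variance.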
A Survey of Irradiated Pillars, Globules, and Jets in the Carina Nebula
We present wide-field, deep narrowband H₂, Brγ, Hα, [S II],
[O III], and broadband I- and K-band images of the Carina star formation region.
The new images provide a large-scale overview of all the H₂ and Brγ
emission present in over a square degree centered on this signature
star-forming complex. By comparing these images with archival HST and Spitzer images
we observe how intense UV radiation from O and B stars affects star formation
in molecular clouds. We use the images to locate new candidate outflows and
identify the principal shock waves and irradiated interfaces within dozens of
distinct areas of star-forming activity. Shocked molecular gas in jets traces
the parts of the flow that are most shielded from the intense UV radiation.
Combining the H₂ and optical images gives a more complete view of the jets,
which are sometimes visible only in H₂. The Carina region hosts several
compact young clusters, and the gas within these clusters is affected by
radiation from both the cluster stars and the massive stars nearby. The Carina
Nebula is ideal for studying the physics of young H II regions and photodissociation regions (PDRs), as it
contains multiple examples of walls and irradiated pillars at various stages of
development. Some of the pillars have detached from their host molecular clouds
to form proplyds. Fluorescent H₂ outlines the interfaces between the ionized
and molecular gas, and after subtracting the continuum, we detect spatial offsets
between the Brγ and H₂ emission along the irradiated interfaces.
These spatial offsets can be used to test current models of PDRs once synthetic
maps of these lines become available.
Comment: Accepted for publication in the Astronomical Journal.
Estimating propensity scores with missing covariate data using general location mixture models
In many observational studies, researchers estimate causal effects using propensity scores, e.g., by matching or sub-classifying on the scores. Estimation of propensity scores is complicated when some values of the covariates are missing. We propose to use multiple imputation to create completed datasets, from which propensity scores can be estimated, with a general location mixture model. The model assumes that the control units are a latent mixture of (i) units whose covariates are drawn from the same distributions as the treated units' covariates and (ii) units whose covariates are drawn from different distributions. This formulation reduces the influence of control units outside the treated units' region of the covariate space on the estimation of parameters in the imputation model, which can result in more plausible imputations and better balance in the true covariate distributions. We illustrate the benefits of the latent class modeling approach with simulations and with an observational study of the effect of breast feeding on children's cognitive abilities.
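The abstract names sub-classification on the estimated scores and analysis of multiply imputed completed datasets, but not how the per-dataset estimates are pooled. A minimal sketch of those downstream steps follows. It assumes propensity scores have already been estimated on each completed dataset (the general location mixture imputation model itself is not implemented here). It also assumes point estimates are pooled by simple averaging, as in Rubin's combining rules. The function names and toy data are hypothetical.

```python
from statistics import mean


def subclassification_effect(scores, treated, outcome, n_strata=5):
    """Estimate a treatment effect by sub-classifying on propensity scores.

    Units are sorted by score and cut into n_strata groups of roughly equal
    size. Within each group, the treated-minus-control difference in mean
    outcomes is computed, and the differences are averaged with weights
    proportional to group size. Groups lacking either arm are skipped.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) / n_strata
    total, weight = 0.0, 0
    for k in range(n_strata):
        idx = order[round(k * size):round((k + 1) * size)]
        t = [outcome[i] for i in idx if treated[i]]
        c = [outcome[i] for i in idx if not treated[i]]
        if t and c:
            total += len(idx) * (mean(t) - mean(c))
            weight += len(idx)
    if weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total / weight


def mi_point_estimate(completed):
    """Average the effect estimates across completed datasets.

    completed : list of (scores, treated, outcome) tuples, one per
    imputation, with propensity scores already estimated on that dataset.
    """
    return mean([subclassification_effect(*d) for d in completed])


# Toy data: a constant treatment effect of 3 in every completed dataset.
scores = [i / 10 for i in range(10)]
treated = [i % 2 for i in range(10)]
outcome = [5 + 3 * t for t in treated]
effect = mi_point_estimate([(scores, treated, outcome)] * 3)
```

In practice each completed dataset would have its own imputed covariates and therefore its own scores. The averaging step is unchanged; a full analysis would also pool the within-dataset variances.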
- …
