Statistical disclosure control for numeric microdata via sequential joint probability preserving data shuffling
Traditional perturbative statistical disclosure control (SDC) approaches such
as microaggregation, noise addition, rank swapping, etc., perturb the data in an
"ad hoc" way in the sense that while they manage to preserve some particular
aspects of the data, they end up modifying others. Synthetic data approaches
based on the fully conditional specification data synthesis paradigm, on the
other hand, aim to generate new datasets that follow the same joint probability
distribution as the original data. These synthetic data approaches, however,
rely on either parametric statistical models or non-parametric machine
learning models, which need to fit the original data well in order to generate
credible and useful synthetic data. Another important drawback is that they
tend to perform better when the variables are synthesized in the correct causal
order (i.e., in the same order as the true data generating process), which is
often unknown in practice. To circumvent these issues, we propose a fully
non-parametric and model-free perturbative SDC approach that approximates the
joint distribution of the original data via sequential applications of
restricted permutations to the numerical microdata (where the restricted
permutations are guided by the joint distribution of a discretized version of
the data). Empirical comparisons against popular SDC approaches, using both
real and simulated datasets, suggest that the proposed approach is competitive
in terms of the trade-off between confidentiality and data utility.
Comment: 25 pages, 12 figures
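The abstract above gives no algorithmic detail, so the following is only a minimal sketch of the general idea of a restricted permutation, not the authors' method. It assumes a single guide variable discretized into quantile bins and shuffles a second variable only within those bins. The helper name `restricted_shuffle`, the parameter `n_bins`, and the simulated data are all hypothetical.

```python
import numpy as np

def restricted_shuffle(x, guide, n_bins=4, seed=0):
    """Permute x only within quantile bins of guide (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    guide = np.asarray(guide, dtype=float)
    # Discretize the guide variable using its interior quantile edges.
    edges = np.quantile(guide, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(guide, edges)
    out = x.copy()
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        # Values move between records, but never leave their bin.
        out[idx] = x[rng.permutation(idx)]
    return out, bins

# Simulated example data (an assumption, not taken from the paper).
rng = np.random.default_rng(1)
guide = rng.normal(size=200)
x = guide + 0.5 * rng.normal(size=200)
shuffled, bins = restricted_shuffle(x, guide)
```

Because values are only rearranged, the marginal distribution of `x` is preserved exactly, and restricting the rearrangement to bins of `guide` keeps the coarse association between the two variables. How the paper chains such steps across many variables is not stated in the abstract and is not modeled here.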
Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes
Causal inference approaches in systems genetics exploit quantitative trait
loci (QTL) genotypes to infer causal relationships among phenotypes. The
genetic architecture of each phenotype may be complex, and poorly estimated
genetic architectures may compromise the inference of causal relationships
among phenotypes. Existing methods assume QTLs are known or inferred without
regard to the phenotype network structure. In this paper we develop a
QTL-driven phenotype network method (QTLnet) to jointly infer a causal
phenotype network and associated genetic architecture for sets of correlated
phenotypes. Randomization of alleles during meiosis and the unidirectional
influence of genotype on phenotype allow the inference of QTLs causal to
phenotypes. Causal relationships among phenotypes can be inferred using these
QTL nodes, enabling us to distinguish among phenotype networks that would
otherwise be distribution equivalent. We jointly model phenotypes and QTLs
using homogeneous conditional Gaussian regression models, and we derive a
graphical criterion for distribution equivalence. We validate the QTLnet
approach in a simulation study. Finally, we illustrate with simulated data and
a real example how QTLnet can be used to infer both direct and indirect effects
of QTLs and phenotypes that co-map to a genomic region.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS288 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …