219 research outputs found
Sharp Total Variation Bounds for Finitely Exchangeable Arrays
In this article we demonstrate the relationship between finitely exchangeable
arrays and finitely exchangeable sequences. We then derive sharp bounds on the
total variation distance between distributions of finitely and infinitely
exchangeable arrays.
MALTS: Matching After Learning to Stretch
We introduce a flexible framework that produces high-quality almost-exact
matches for causal inference. Most prior work in matching uses ad-hoc distance
metrics, often leading to poor quality matches, particularly when there are
irrelevant covariates. In this work, we learn an interpretable distance metric
for matching, which leads to substantially higher quality matches. The learned
distance metric stretches the covariate space according to each covariate's
contribution to outcome prediction: this stretching means that mismatches on
important covariates carry a larger penalty than mismatches on irrelevant
covariates. Our ability to learn flexible distance metrics leads to matches
that are interpretable and useful for the estimation of conditional average
treatment effects.
Comment: 40 pages, 5 Tables, 12 Figures
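The stretching idea in the MALTS abstract can be illustrated with a minimal sketch. It assumes a diagonal stretch (one weight per covariate) and weights fixed by hand; in MALTS the metric is learned from data, so the names `stretched_distance`, `nearest_match`, and the weight values below are hypothetical and not the authors' implementation.

```python
import numpy as np

def stretched_distance(x, z, weights):
    """Weighted Euclidean distance: each covariate difference is scaled by its weight."""
    diff = (np.asarray(x, dtype=float) - np.asarray(z, dtype=float)) * np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def nearest_match(unit, candidates, weights):
    """Index of the candidate closest to `unit` under the stretched distance."""
    dists = [stretched_distance(unit, c, weights) for c in candidates]
    return int(np.argmin(dists))

# Hypothetical weights (assumed, not learned): covariate 0 matters for the
# outcome, covariate 1 is nearly irrelevant.
weights = np.array([5.0, 0.1])

treated = np.array([0.0, 0.0])
controls = np.array([
    [0.1, 3.0],  # close on the important covariate, far on the irrelevant one
    [1.0, 0.0],  # far on the important covariate, exact on the irrelevant one
])

stretched_choice = nearest_match(treated, controls, weights)
unweighted_choice = nearest_match(treated, controls, np.ones(2))
```

With these illustrative numbers the stretched metric picks the first control, while the unweighted metric picks the second, which shows how a mismatch on an important covariate is penalized more heavily.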
Multiple Imputation Using Gaussian Copulas
Missing observations are pervasive throughout empirical research, especially
in the social sciences. Despite multiple approaches to dealing adequately with
missing data, many scholars still fail to address this vital issue. In this
paper, we present a simple-to-use method for generating multiple imputations
using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff,
2007) allows scholars to attain estimation results that have good coverage and
small bias. The use of copulas to model the dependence among variables will
enable researchers to construct valid joint distributions of the data, even
without knowledge of the actual underlying marginal distributions. Multiple
imputations are then generated by drawing observations from the resulting
posterior joint distribution and replacing the missing values. Using simulated
and observational data from published social science research, we compare
imputation via Gaussian copulas with two other widely used imputation methods:
MICE and Amelia II. Our results suggest that the Gaussian copula approach has a
slightly smaller bias, higher coverage rates, and narrower confidence intervals
compared to the other methods. This is especially true when the variables with
missing data are not normally distributed. These results, combined with
theoretical guarantees and ease of use, suggest that the approach examined
provides an attractive alternative for applied researchers undertaking multiple
imputations.
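A rough sketch of the copula idea for two variables follows. It is a simplification under stated assumptions: ranks stand in for the unknown marginals, the latent correlation is a plug-in estimate from complete pairs, and only `y` has missing values. The method in the abstract (Hoff, 2007) draws from a posterior instead, so this is not that procedure, and the function names `normal_scores` and `impute_once` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(column):
    """Map observed values to latent normal scores via their ranks; NaNs stay NaN."""
    column = np.asarray(column, dtype=float)
    scores = np.full(column.shape, np.nan)
    observed = ~np.isnan(column)
    ranks = column[observed].argsort().argsort() + 1  # ranks 1..n, ties not averaged
    n = int(observed.sum())
    scores[observed] = [STD_NORMAL.inv_cdf(r / (n + 1)) for r in ranks]
    return scores

def impute_once(x, y, rng):
    """One imputation of missing y given fully observed x (plug-in sketch)."""
    y = np.asarray(y, dtype=float)
    zx, zy = normal_scores(x), normal_scores(y)
    obs = ~np.isnan(y)
    miss = ~obs
    # Latent correlation estimated from pairs where y is observed.
    rho = np.corrcoef(zx[obs], zy[obs])[0, 1]
    resid_sd = np.sqrt(max(0.0, 1.0 - rho ** 2))
    # Draw latent scores for the missing y from the conditional normal.
    z_draw = rho * zx[miss] + resid_sd * rng.standard_normal(int(miss.sum()))
    # Back-transform through the empirical quantiles of the observed y.
    u = np.array([STD_NORMAL.cdf(z) for z in z_draw])
    filled = y.copy()
    filled[miss] = np.quantile(y[obs], u)
    return filled

# Simulated, non-normal example data (purely illustrative).
rng = np.random.default_rng(0)
x = rng.exponential(size=200)
y = x ** 2 + rng.normal(scale=0.1, size=200)
y_missing = y.copy()
y_missing[:40] = np.nan

# Several imputed copies, as in multiple imputation.
imputations = [impute_once(x, y_missing, rng) for _ in range(5)]
```

Because the back-transform uses empirical quantiles, imputed values stay within the range of the observed `y`, whatever its marginal distribution.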
Damaging de novo mutations diminish motor skills in children on the autism spectrum
In individuals with autism spectrum disorder (ASD), de novo mutations have previously been shown to be significantly correlated with lower IQ but not with the core characteristics of ASD: deficits in social communication and interaction and restricted interests and repetitive patterns of behavior. We extend these findings by demonstrating in the Simons Simplex Collection that damaging de novo mutations in ASD individuals are also significantly and convincingly correlated with measures of impaired motor skills. This correlation is not explained by a correlation between IQ and motor skills. We find that IQ and motor skills are distinctly associated with damaging mutations and, in particular, that motor skills are a more sensitive indicator of mutational severity than is IQ, as judged by mutational type and target gene. We use this finding to propose a combined classification of phenotypic severity: mild (little impairment of either), moderate (impairment mainly to motor skills), and severe (impairment of both IQ and motor skills).
Hierarchical array priors for ANOVA decompositions of cross-classified data
ANOVA decompositions are a standard method for describing and estimating
heterogeneity among the means of a response variable across levels of multiple
categorical factors. In such a decomposition, the complete set of main effects
and interaction terms can be viewed as a collection of vectors, matrices and
arrays that share various index sets defined by the factor levels. For many
types of categorical factors, it is plausible that an ANOVA decomposition
exhibits some consistency across orders of effects, in that the levels of a
factor that have similar main-effect coefficients may also have similar
coefficients in higher-order interaction terms. In such a case, estimation of
the higher-order interactions should be improved by borrowing information from
the main effects and lower-order interactions. To take advantage of such
patterns, this article introduces a class of hierarchical prior distributions
for collections of interaction arrays that can adapt to the presence of such
interactions. These prior distributions are based on a type of array-variate
normal distribution, for which a covariance matrix for each factor is
estimated. This prior is able to adapt to potential similarities among the
levels of a factor, and incorporate any such information into the estimation of
the effects in which the factor appears. In the presence of such similarities,
this prior is able to borrow information from well-estimated main effects and
lower-order interactions to assist in the estimation of higher-order terms for
which data information is limited.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS685 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
