219 research outputs found

    Sharp Total Variation Bounds for Finitely Exchangeable Arrays

    In this article we demonstrate the relationship between finitely exchangeable arrays and finitely exchangeable sequences. We then derive sharp bounds on the total variation distance between distributions of finitely and infinitely exchangeable arrays.

    MALTS: Matching After Learning to Stretch

    We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects. Comment: 40 pages, 5 tables, 12 figures.
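    The stretching idea in this abstract can be illustrated with a minimal sketch. It assumes the per-covariate stretch weights are already given (here they are hand-picked, hypothetical values), whereas the paper learns them from outcome data; the function names and the toy data are illustrative only and are not the authors' implementation.

```python
import math

def stretched_distance(x, y, weights):
    """Euclidean distance after scaling each covariate difference by its weight."""
    return math.sqrt(sum((w * (a - b)) ** 2 for a, b, w in zip(x, y, weights)))

def nearest_match(unit, candidates, weights):
    """Index of the candidate closest to `unit` under the stretched distance."""
    return min(
        range(len(candidates)),
        key=lambda i: stretched_distance(unit, candidates[i], weights),
    )

# Hypothetical data: covariate 0 matters for the outcome, covariate 1 is irrelevant.
treated = [1.0, 0.0]
controls = [
    [1.1, 5.0],  # close on the important covariate, far on the irrelevant one
    [3.0, 0.1],  # far on the important covariate, close on the irrelevant one
]

# Equal weights: the irrelevant covariate dominates the choice of match.
unweighted = nearest_match(treated, controls, [1.0, 1.0])
# Assumed "learned" weights: the irrelevant covariate is shrunk to zero.
stretched = nearest_match(treated, controls, [1.0, 0.0])
```

    With equal weights the second control is chosen; once the irrelevant covariate is down-weighted, the first control, which agrees on the covariate that matters, becomes the match.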

    Multiple Imputation Using Gaussian Copulas

    Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite multiple approaches to dealing adequately with missing data, many scholars still fail to address this vital issue. In this paper, we present a simple-to-use method for generating multiple imputations using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff, 2007) allows scholars to attain estimation results that have good coverage and small bias. The use of copulas to model the dependence among variables will enable researchers to construct valid joint distributions of the data, even without knowledge of the actual underlying marginal distributions. Multiple imputations are then generated by drawing observations from the resulting posterior joint distribution and replacing the missing values. Using simulated and observational data from published social science research, we compare imputation via Gaussian copulas with two other widely used imputation methods: MICE and Amelia II. Our results suggest that the Gaussian copula approach has a slightly smaller bias, higher coverage rates, and narrower confidence intervals compared to the other methods. This is especially true when the variables with missing data are not normally distributed. These results, combined with theoretical guarantees and ease of use, suggest that the approach examined provides an attractive alternative for applied researchers undertaking multiple imputations.
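    The general idea of imputing through a Gaussian copula can be sketched for two variables as follows. This is a deliberately simplified illustration, not the rank-likelihood posterior sampler of Hoff (2007): it plugs in a single correlation estimate and ignores parameter uncertainty, and the function names and toy data are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for the copula transform

def normal_scores(values):
    """Map each value to a normal score via its rank, ignoring the true margin."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD.inv_cdf(rank / (n + 1))
    return scores

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def impute_y(x0, xs, ys, rng):
    """Draw one imputed y for a row with observed x0 and missing y."""
    n = len(xs)
    rho = pearson(normal_scores(xs), normal_scores(ys))
    # Empirical CDF of x0, kept strictly inside (0, 1).
    u = (sum(x <= x0 for x in xs) + 0.5) / (n + 1)
    z0 = STD.inv_cdf(u)
    # Conditional normal draw on the latent scale.
    sd = max(0.0, 1.0 - rho ** 2) ** 0.5
    z_draw = rng.gauss(rho * z0, sd)
    # Map back through the empirical quantiles of the observed y values.
    idx = min(n - 1, int(STD.cdf(z_draw) * n))
    return sorted(ys)[idx]

# Hypothetical complete cases: y is a skewed, non-normal function of x plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(1, 41)]
ys = [2.718281828 ** x + rng.random() for x in xs]
draws = [impute_y(2.5, xs, ys, rng) for _ in range(5)]  # five imputations
```

    Because the draws are mapped back through the empirical quantiles of `ys`, every imputed value lies within the observed range even though the margin is far from normal.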

    Damaging de novo mutations diminish motor skills in children on the autism spectrum

    In individuals with autism spectrum disorder (ASD), de novo mutations have previously been shown to be significantly correlated with lower IQ but not with the core characteristics of ASD: deficits in social communication and interaction and restricted interests and repetitive patterns of behavior. We extend these findings by demonstrating in the Simons Simplex Collection that damaging de novo mutations in ASD individuals are also significantly and convincingly correlated with measures of impaired motor skills. This correlation is not explained by a correlation between IQ and motor skills. We find that IQ and motor skills are distinctly associated with damaging mutations and, in particular, that motor skills are a more sensitive indicator of mutational severity than is IQ, as judged by mutational type and target gene. We use this finding to propose a combined classification of phenotypic severity: mild (little impairment of either), moderate (impairment mainly to motor skills), and severe (impairment of both IQ and motor skills).

    Hierarchical array priors for ANOVA decompositions of cross-classified data

    ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays that can adapt to the presence of such interactions. These prior distributions are based on a type of array-variate normal distribution, for which a covariance matrix for each factor is estimated. This prior is able to adapt to potential similarities among the levels of a factor, and incorporate any such information into the estimation of the effects in which the factor appears. In the presence of such similarities, this prior is able to borrow information from well-estimated main effects and lower-order interactions to assist in the estimation of higher-order terms for which data information is limited. Comment: Published at http://dx.doi.org/10.1214/13-AOAS685 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).