
    Invariant Causal Prediction for Nonlinear Models

    An important problem in many domains is to predict how a system will respond to interventions. This task is inherently linked to estimating the system's underlying causal structure. To this end, Invariant Causal Prediction (ICP) (Peters et al., 2016) has been proposed, which learns a causal model by exploiting the invariance of causal relations across data from different environments. For linear models, the implementation of ICP is relatively straightforward. The nonlinear case is more challenging because nonparametric tests for conditional independence are difficult to perform. In this work, we present and evaluate an array of methods for nonlinear and nonparametric versions of ICP for learning the causal parents of given target variables. We find that an approach which first fits a nonlinear model to data pooled over all environments and then tests for differences between the residual distributions across environments is quite robust across a large variety of simulation settings. We call this procedure the "invariant residual distribution test". In general, we observe that the performance of all approaches depends critically on the true (unknown) causal structure, and that it becomes challenging to achieve high power if the parental set includes more than two variables. As a real-world example, we consider fertility rate modelling, which is central to world population projections. We explore predicting the effect of hypothetical interventions using the accepted models from nonlinear ICP. The results reaffirm the previously observed central causal role of child mortality rates.
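The two-step procedure named in the abstract above (fit a model on pooled data, then compare residual distributions across environments) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a cubic polynomial least-squares fit stands in for the nonlinear model, a permutation test on a Kolmogorov-Smirnov statistic stands in for the residual comparison, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def poly_features(X, degree=3):
    """Intercept plus powers 1..degree of each column (assumed nonlinear basis)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.append(X ** d)
    return np.column_stack(cols)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def invariant_residual_test(X, y, env, n_perm=200, degree=3, seed=0):
    """Permutation p-value for 'residuals have the same distribution in every environment'."""
    rng = np.random.default_rng(seed)
    # Step 1: fit one model on data pooled over all environments.
    Phi = poly_features(X, degree)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ beta

    # Step 2: compare each environment's residuals with those of all other environments.
    def stat(labels):
        return max(
            ks_statistic(resid[labels == e], resid[labels != e])
            for e in np.unique(labels)
        )

    observed = stat(env)
    perm = np.array([stat(rng.permutation(env)) for _ in range(n_perm)])
    return (1 + np.sum(perm >= observed)) / (1 + n_perm)

# Simulated data (hypothetical): X1 is the only causal parent of Y; X2 is unrelated to Y.
rng = np.random.default_rng(1)
n = 300
env = np.repeat([0, 1], n)
x1 = np.concatenate([rng.normal(0, 1, n), rng.normal(3, 1, n)])  # X1 is shifted in environment 1
x2 = rng.normal(0, 1, 2 * n)
y = 0.5 * x1 ** 2 + 0.5 * rng.normal(0, 1, 2 * n)

p_parent = invariant_residual_test(x1, y, env)     # candidate set {X1}
p_nonparent = invariant_residual_test(x2, y, env)  # candidate set {X2}
print(f"p-value for {{X1}}: {p_parent:.3f}, p-value for {{X2}}: {p_nonparent:.3f}")
```

In this toy setting the set {X2} should be rejected, because omitting the true parent leaves the environment shift visible in the residuals, while {X1} is invariant by construction. A full ICP procedure would repeat the test over candidate predictor sets and intersect the accepted ones; that step is omitted here.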

    Unsupervised Domain Adaptation with Copula Models

    We study the task of unsupervised domain adaptation, where no labeled data from the target domain is provided during training. To deal with the potential discrepancy between the source and target distributions, both in features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predictive densities beyond the common exponential family, and (b) it lets us leverage Sklar's theorem, the essence of the copula formulation relating the joint density to the copula dependency functions, to find effective feature mappings that mitigate the domain mismatch. By transforming the data to a copula domain, we show on a number of benchmark datasets (including human emotion estimation), and using different regression models for prediction, that we can achieve a more robust and accurate estimation of target labels compared to recently proposed feature transformation (adaptation) methods. Comment: IEEE International Workshop On Machine Learning for Signal Processing 201
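For reference, Sklar's theorem, which the abstract above relies on, can be stated as follows. The notation (joint CDF $F$, marginal CDFs $F_i$, copula $C$, copula density $c$, marginal densities $f_i$) is chosen here for illustration and is not taken from the paper.

```latex
% Sklar's theorem: any joint CDF factors through its marginals and a copula C.
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)

% When densities exist, the joint density separates into the copula density
% (dependence structure) and the product of the marginal densities.
f(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{i=1}^{d} f_i(x_i)
```

The "copula domain" in this reading is the space of the transformed variables $u_i = F_i(x_i)$, each of which is uniform on $[0, 1]$ when $F_i$ is continuous.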

    Sample Selection in Models of Academic Performance

    This article shows how admission and enrollment processes affect the interpretation of simple validation studies of academic performance. In a competitive market for students, optimal behavior of admissions committees and applicants drives the simple correlation between test scores and performance toward zero, regardless of the relationship in the population of prospective students. Data from our university’s MBA program support the prediction that applicants exhibit a higher correlation between test scores and undergraduate GPAs than do current students. This suggests that standard validation studies will understate the importance of GMAT scores in predicting the performance of potential MBA students.

    Do Community-Level Models Account for the Effects of Biotic Interactions? A Comparison of Community-Level and Species Distribution Modeling of Rocky Mountain Conifers

    Community-level models (CLMs) aim to improve species distribution modeling (SDM) methods by explicitly incorporating the influences of interacting species. However, the ability of CLMs to appropriately account for biotic interactions is unclear. We applied CLM and SDM methods to predict the distributions of three dominant conifer tree species in the U.S. Rocky Mountains and compared CLM and SDM predictive accuracy as well as the ability of each approach to accurately reproduce species co-occurrence patterns. We specifically evaluated the performance of two statistical algorithms, MARS and CForest, within both CLM and SDM frameworks. Across all species, differences in SDM and CLM predictive accuracy were slight and can be attributed to differences in model structure rather than to accounting for the effects of biotic interactions. In addition, CLMs generally over-predicted species co-occurrence, while SDMs under-predicted co-occurrence. Our results demonstrate no real improvement in the ability of CLMs to account for biotic interactions relative to SDMs. We conclude that alternative modeling approaches are needed in order to accurately account for the effects of biotic interactions on species distributions.