Learning Instrumental Variables with Structural and Non-Gaussianity Assumptions
Learning a causal effect from observational data requires strong assumptions. One possible method is to use instrumental variables, which are typically justified by background knowledge. It is possible, under further assumptions, to discover whether a variable is structurally instrumental to a target causal effect X→Y. However, the few existing approaches say little about how general these assumptions can be, or about how to express possible equivalence classes of solutions. We present instrumental variable discovery methods that systematically characterize which causal effects can and cannot be discovered under local graphical criteria that define instrumental variables, without reconstructing full causal graphs. We also introduce the first methods to exploit non-Gaussianity assumptions, highlighting identifiability problems and solutions. Because such models are difficult to estimate from finite data, we investigate how to strengthen the assumptions so as to make the statistical problem more manageable.
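As background for the instrumental-variable setting this abstract assumes, here is a minimal illustrative sketch (not the paper's discovery method) of how a known valid instrument identifies a linear causal effect via the Wald ratio; all variable names, coefficients, and distributions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 2.0  # true causal effect of X on Y

# Linear SCM with a hidden confounder U and a valid instrument Z:
# Z affects Y only through X, and is independent of U.
U = rng.normal(size=n)               # unmeasured confounder
Z = rng.normal(size=n)               # instrument
X = 1.0 * Z + U + rng.normal(size=n)
Y = beta * X + U + rng.normal(size=n)

# Naive OLS is biased upward by the confounding through U ...
beta_ols = np.cov(X, Y)[0, 1] / np.var(X)

# ... while the IV (Wald) ratio recovers beta: cov(Z,Y) / cov(Z,X).
beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

print(f"OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

The discovery problem the abstract addresses is precisely what this sketch takes for granted: knowing that Z satisfies the graphical instrument conditions in the first place.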
Identification and Estimation of Causal Effects Using non-Gaussianity and Auxiliary Covariates
Assessing causal effects in the presence of unmeasured confounding is a
challenging problem. Although auxiliary variables, such as instrumental
variables, are commonly used to identify causal effects, they are often
unavailable in practice due to stringent and untestable conditions. To address
this issue, previous studies have utilized linear structural equation models
to show that the causal effect can be identifiable when noise variables of the
treatment and outcome are both non-Gaussian. In this paper, we investigate the
problem of identifying the causal effect using auxiliary covariates and
non-Gaussianity from the treatment. Our key idea is to characterize the impact
of the unmeasured confounders using an observed covariate, assuming the
confounders are all Gaussian. The auxiliary covariate can be an invalid
instrument or an invalid
proxy variable. We demonstrate that the causal effect can be identified using
this measured covariate, even when the only source of non-Gaussianity comes
from the treatment. We then extend the identification results to the
multi-treatment setting and provide sufficient conditions for identification.
Based on our identification results, we propose a simple and efficient
procedure for calculating causal effects and show the √n-consistency of
the proposed estimator. Finally, we evaluate the performance of our estimator
through simulation studies and an application.
Comment: 16 pages, 7 figures
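A minimal sketch of the general identification idea this abstract builds on (a generic cumulant-based estimator, not the paper's procedure): when the confounder and outcome noise are Gaussian but the treatment noise is skewed, their third cumulants vanish and the causal effect falls out as a ratio of third moments, β = cum(X,X,Y) / cum(X,X,X). All parameters and distributions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
beta = 1.5  # true causal effect

U = rng.normal(size=n)                    # Gaussian unmeasured confounder
eps_x = rng.exponential(1.0, size=n) - 1  # skewed (non-Gaussian) treatment noise
X = U + eps_x
Y = beta * X + U + rng.normal(size=n)     # Gaussian outcome noise

# For centred variables, third-order cross-cumulants equal third moments.
# Gaussian terms (U and the outcome noise) contribute zero, so
#   cum(X,X,Y) = beta * kappa3(eps_x)  and  cum(X,X,X) = kappa3(eps_x),
# giving beta = cum(X,X,Y) / cum(X,X,X).
xc, yc = X - X.mean(), Y - Y.mean()
beta_hat = np.mean(xc * xc * yc) / np.mean(xc ** 3)

print(f"beta_hat = {beta_hat:.3f}")
```

The ratio is only defined when the treatment noise actually has a nonzero third cumulant, which is why the non-Gaussianity of the treatment carries the identification.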
Exporting and productivity as part of the growth process: causal evidence from a data-driven structural VAR
This paper introduces a little-known category of estimators, linear non-Gaussian vector autoregression models (acyclic or cyclic), imported from the machine learning literature, to revisit a well-known debate: does exporting increase firm productivity, or is it only more productive firms that remain in the export market? We focus on a relatively well-studied country (Chile) and on already-exporting firms (i.e. the intensive margin of exporting). We explicitly look at the co-evolution of productivity and growth, and attempt to ascertain both contemporaneous and lagged causal relationships. Our findings suggest that exporting does not have any causal influence on the other variables; instead, exporting seems to be determined by other dimensions of firm growth. With respect to learning by exporting (LBE), we find no evidence that export growth causes productivity growth within the period, and very little evidence that export growth has a causal effect on subsequent TFP growth.
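A hedged sketch of the two-stage idea behind such non-Gaussian structural VARs: first estimate the lagged coefficients by ordinary least squares, then (in a second stage only indicated here) run LiNGAM/ICA on the residuals to recover the contemporaneous causal structure. The simulated series and coefficients are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 20_000

# Simulate a bivariate VAR(1): z_t = B @ z_{t-1} + e_t, e.g. "productivity
# growth drives export growth" through the lagged coefficient B[1, 0].
B = np.array([[0.5, 0.0],
              [0.3, 0.4]])
Z = np.zeros((T, 2))
for t in range(1, T):
    # Non-Gaussian (uniform) shocks: what the LiNGAM stage would exploit.
    Z[t] = B @ Z[t - 1] + rng.uniform(-1, 1, size=2)

# Stage 1: estimate the lagged coefficient matrix by OLS.
X_lag, X_now = Z[:-1], Z[1:]
W, *_ = np.linalg.lstsq(X_lag, X_now, rcond=None)
B_hat = W.T

# Stage 2 (not shown): feed the residuals to LiNGAM/ICA to identify the
# contemporaneous (acyclic or cyclic) structure among the shocks.
residuals = X_now - X_lag @ W

print(np.round(B_hat, 2))
```

Stage 1 is standard VAR estimation; the machine-learning contribution the abstract refers to lies entirely in stage 2, which needs the non-Gaussian shocks to orient contemporaneous edges.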
Invariant Causal Prediction for Nonlinear Models
An important problem in many domains is to predict how a system will respond
to interventions. This task is inherently linked to estimating the system's
underlying causal structure. To this end, Invariant Causal Prediction (ICP)
(Peters et al., 2016) has been proposed which learns a causal model exploiting
the invariance of causal relations using data from different environments. When
considering linear models, the implementation of ICP is relatively
straightforward. However, the nonlinear case is more challenging due to the
difficulty of performing nonparametric tests for conditional independence. In
this work, we present and evaluate an array of methods for nonlinear and
nonparametric versions of ICP for learning the causal parents of given target
variables. We find that an approach which first fits a nonlinear model with
data pooled over all environments and then tests for differences between the
residual distributions across environments is quite robust across a large
variety of simulation settings. We call this procedure "invariant residual
distribution test". In general, we observe that the performance of all
approaches is critically dependent on the true (unknown) causal structure and
it becomes challenging to achieve high power if the parental set includes more
than two variables. As a real-world example, we consider fertility rate
modelling which is central to world population projections. We explore
predicting the effect of hypothetical interventions using the accepted models
from nonlinear ICP. The results reaffirm the previously observed central causal
role of child mortality rates.
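A toy version of the "invariant residual distribution test" the abstract describes, under invented simulation settings: fit a nonlinear (here quadratic) model on data pooled over environments, then test whether the residual distributions differ across environments (using SciPy's two-sample Kolmogorov-Smirnov test). The true parent set should pass; the empty set should fail:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
n = 3000

def simulate(env_shift):
    X = rng.normal(loc=env_shift, scale=1.0, size=n)  # X shifts across environments
    Y = 0.5 * X ** 2 + rng.normal(size=n)             # Y | X is invariant
    return X, Y

X0, Y0 = simulate(0.0)   # environment 0
X1, Y1 = simulate(2.0)   # environment 1
X = np.concatenate([X0, X1])
Y = np.concatenate([Y0, Y1])
env = np.concatenate([np.zeros(n), np.ones(n)])

def residual_invariance_pvalue(features, target):
    """Fit a quadratic model on pooled data, then KS-test whether the
    residual distributions differ between the two environments."""
    if features is None:                          # empty candidate set
        res = target - target.mean()
    else:
        coefs = np.polyfit(features, target, deg=2)
        res = target - np.polyval(coefs, features)
    return ks_2samp(res[env == 0], res[env == 1]).pvalue

p_parent = residual_invariance_pvalue(X, Y)    # true parent set {X}: invariant
p_empty = residual_invariance_pvalue(None, Y)  # empty set: residuals shift

print(f"p(parent set) = {p_parent:.3f}, p(empty set) = {p_empty:.2e}")
```

A candidate set is accepted when the test fails to reject invariance; as the abstract notes, power degrades quickly as the candidate parental sets grow, which this two-variable toy does not exhibit.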
Distribution-Based Causal Inference: A Review and Practical Guidance for Epidemiologists
Peer reviewed
Perturbations and Causality in Gaussian Latent Variable Models
Causal inference is a challenging problem with observational data alone. The
task becomes easier when having access to data from perturbing the underlying
system, even when happening in a non-randomized way: this is the setting we
consider, encompassing also latent confounding variables. To identify causal
relations among a collection of covariates and a response variable, existing
procedures rely on at least one of the following assumptions: i) the response
variable remains unperturbed, ii) the latent variables remain unperturbed, and
iii) the latent effects are dense. In this paper, we examine a perturbation
model for interventional data, which can be viewed as a mixed-effects linear
structural causal model, over a collection of Gaussian variables that does not
satisfy any of these conditions. We propose a maximum-likelihood estimator --
dubbed DirectLikelihood -- that exploits system-wide invariances to uniquely
identify the population causal structure from unspecific perturbation data, and
our results carry over to linear structural causal models without requiring
Gaussianity. We illustrate the utility of our framework on synthetic data as
well as real data involving California reservoirs and protein expressions.