6 research outputs found
Causal Inference Methods For Bias Correction In Data Analyses
Many problems in the empirical sciences and in rational decision making require causal, rather than associative, reasoning. The field of causal inference is concerned with establishing and quantifying cause-effect relationships in order to inform interventions, even in the absence of direct experimentation or randomization. With the proliferation of massive datasets, it is crucial that we develop principled approaches to drawing actionable conclusions from imperfect information. Inferring valid causal conclusions is impeded by the fact that data are unstructured and subject to different sources of bias. The types of bias considered in this thesis include: confounding bias induced by common causes of observed exposures and outcomes; estimation bias induced by high-dimensional data and the curse of dimensionality; discriminatory bias encoded in data that reflect historical patterns of discrimination and inequality; and missing-data bias, where instantiations of variables are systematically missing.
The focus of this thesis is on the development of novel causal and statistical methodologies to better understand and resolve these pressing challenges. We draw on methodological insights from both machine learning/artificial intelligence and statistical theory. Specifically, we use ideas from graphical modeling to encode our assumptions about the underlying data generating mechanisms in a clear and succinct manner. Further, we use ideas from nonparametric and semiparametric theory to enable the use of flexible machine learning models in the estimation of causal effects that are identified as functions of the observed data.
This thesis makes four main contributions. First, we bridge the gap between identification and semiparametric estimation of causal effects that are identified in causal graphical models with unmeasured confounders. Second, we use semiparametric inference theory for marginal structural models to give the first general approach to causal sufficient dimension reduction of a high-dimensional treatment. Third, we address conceptual, methodological, and practical gaps in assessing and overcoming disparities in automated decision making using causal inference and constrained optimization. Fourth, we use graphical representations of missing data mechanisms and provide a complete characterization of identification of the underlying joint distribution when some variables are systematically missing and others are unmeasured.
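To make the first contribution concrete, a standard building block for semiparametric estimation of causal effects is the doubly robust (AIPW) estimator of the adjustment functional. The sketch below is not from the thesis; it is a minimal illustration with a single hypothetical binary confounder `Z`, where the nuisance models are saturated empirical cell means:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical simulated data: binary confounder Z, treatment A, outcome Y.
Z = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.3 + 0.4 * Z)        # treatment probability depends on Z
Y = 2.0 * A + Z + rng.normal(0, 1, n)     # true average treatment effect = 2

def cell_mean(mask, values):
    """Empirical mean of `values` within the cell selected by `mask`."""
    return values[mask].mean()

# Nuisance estimates: propensity e(Z) = P(A=1|Z) and outcome regressions
# m_a(Z) = E[Y|A=a, Z], estimated by cell means since Z is binary.
e = np.where(Z == 1, cell_mean(Z == 1, A), cell_mean(Z == 0, A))
m1 = np.where(Z == 1, cell_mean((Z == 1) & (A == 1), Y),
                      cell_mean((Z == 0) & (A == 1), Y))
m0 = np.where(Z == 1, cell_mean((Z == 1) & (A == 0), Y),
                      cell_mean((Z == 0) & (A == 0), Y))

# AIPW estimate of E[Y | do(A=1)] - E[Y | do(A=0)]: consistent if either
# the propensity model or the outcome model is correct (double robustness).
psi1 = A / e * (Y - m1) + m1
psi0 = (1 - A) / (1 - e) * (Y - m0) + m0
ate = (psi1 - psi0).mean()
```

In practice the cell means would be replaced by flexible machine learning fits of the nuisance functions, which is exactly where the semiparametric theory referenced above comes in.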
Deep Causal Learning: Representation, Discovery and Inference
Causal learning has attracted much attention in recent years because causality reveals the essential relationships between things and indicates how the world evolves. However, traditional causal learning methods face many problems and bottlenecks, such as high-dimensional unstructured variables, combinatorial optimization problems, unknown interventions, unobserved confounders, selection bias, and estimation bias. Deep causal learning, that is, causal learning based on deep neural networks, brings new insights for addressing these problems. While many deep learning-based causal discovery and causal inference methods have been proposed, there is a lack of reviews exploring the internal mechanisms by which deep learning improves causal learning. In this article, we comprehensively review how deep learning can contribute to causal learning by addressing conventional challenges from three aspects: representation, discovery, and inference. We point out that deep causal learning is important for both the theoretical extension and the applied expansion of causal science, and is an indispensable part of general artificial intelligence. We conclude the article with a summary of open issues and potential directions for future work.
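One concrete example of how the combinatorial search problem in causal discovery is recast for gradient-based methods (the setting this review covers) is the NOTEARS-style continuous acyclicity constraint, which replaces the discrete DAG constraint with a smooth penalty h(W) = tr(exp(W ∘ W)) − d that is zero exactly when the weighted adjacency matrix W encodes a DAG. A minimal numpy sketch, not taken from the article:

```python
import numpy as np

def notears_acyclicity(W: np.ndarray) -> float:
    """Smooth DAG penalty h(W) = tr(exp(W * W)) - d.

    Computed via the power series truncated at d terms: this is exact for
    DAGs, because W * W (element-wise square) is then nilpotent, and it is
    strictly positive whenever W contains a directed cycle (every cycle has
    length <= d, so some trace(M^k) with k <= d is positive).
    """
    d = W.shape[0]
    M = W * W                  # element-wise square removes edge signs
    P = np.eye(d)
    h, fact = 0.0, 1.0
    for k in range(1, d + 1):
        P = P @ M              # P = M^k
        fact *= k
        h += np.trace(P) / fact
    return h

# A 3-node DAG (0 -> 1 -> 2) versus a graph with a 2-cycle (0 <-> 1).
W_dag = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
W_cyc = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
```

Because h is differentiable in W, structure learning can be posed as a continuous optimization problem and plugged into standard deep learning toolchains, which is the key move behind many of the deep causal discovery methods the review surveys.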
Semiparametric Inference For Causal Effects In Graphical Models With Hidden Variables
Identification theory for causal effects in causal models associated with hidden variable directed acyclic graphs (DAGs) is well studied. However, the corresponding algorithms are underused due to the complexity of estimating the identifying functionals they output. In this work, we bridge the gap between identification and estimation of population-level causal effects involving a single treatment and a single outcome. We derive influence function based estimators that exhibit double robustness for the identified effects in a large class of hidden variable DAGs where the treatment satisfies a simple graphical criterion; this class includes models yielding the adjustment and front-door functionals as special cases. We also provide necessary and sufficient conditions under which the statistical model of a hidden variable DAG is nonparametrically saturated and implies no equality constraints on the observed data distribution. Further, we derive an important class of hidden variable DAGs that imply observed data distributions observationally equivalent (up to equality constraints) to fully observed DAGs. In these classes of DAGs, we derive estimators that achieve the semiparametric efficiency bounds for the target of interest where the treatment satisfies our graphical criterion. Finally, we provide a sound and complete identification algorithm that directly yields a weight-based estimation strategy for any identifiable effect in hidden variable causal models.