Variational Temporal Deconfounder for Individualized Treatment Effect Estimation from Longitudinal Observational Data
Estimating treatment effects, especially individualized treatment effects
(ITE), from observational data is challenging because of confounding bias.
Existing approaches for estimating treatment effects from longitudinal
observational data are usually built upon the strong assumption of
"unconfoundedness", which is hard to fulfill in real-world practice. In this
paper, we propose the Variational Temporal Deconfounder (VTD), an approach that
leverages deep variational embeddings in the longitudinal setting using proxies
(i.e., surrogate variables that stand in for unobserved confounders).
Specifically, VTD leverages observed proxies to learn a hidden embedding that
reflects the true hidden confounders in the observational data. As such, our
VTD method does not rely on the "unconfoundedness" assumption. We test our VTD
method on both synthetic and real-world clinical data, and the results show
that, compared with existing models, our approach is effective when hidden
confounding is the leading source of bias.
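
The abstract gives no implementation details; purely as an illustration, here is a minimal PyTorch sketch of the generic proxy-based variational recipe it describes (an encoder maps observed proxies to a latent confounder, which then conditions the outcome model together with the treatment). The class name, architecture, and objective below are hypothetical assumptions, not the authors' VTD.

    # Hypothetical sketch (not the authors' VTD): an encoder maps observed
    # proxies to a Gaussian posterior over a latent confounder z; z and the
    # treatment then condition the outcome model, following the generic
    # variational recipe.
    import torch
    import torch.nn as nn

    class ProxyDeconfounder(nn.Module):
        def __init__(self, proxy_dim, latent_dim, hidden_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(proxy_dim, hidden_dim), nn.ReLU())
            self.mu = nn.Linear(hidden_dim, latent_dim)
            self.logvar = nn.Linear(hidden_dim, latent_dim)
            # Outcome head conditions on the inferred confounder and treatment.
            self.outcome = nn.Sequential(
                nn.Linear(latent_dim + 1, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1))

        def forward(self, proxies, treatment):
            h = self.encoder(proxies)
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: sample z from the posterior.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            y_hat = self.outcome(torch.cat([z, treatment], dim=-1))
            # KL term regularizes the posterior toward a standard normal prior.
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
            return y_hat, kl

A full longitudinal version would presumably make the encoder recurrent over time steps; this single-step sketch only illustrates the structure.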
R-miss-tastic: a unified platform for missing values methods and workflows
Missing values are unavoidable when working with data. Their occurrence is
exacerbated as more data from different sources become available. However, most
statistical models and visualization methods require complete data, and
improper handling of missing data results in information loss or biased
analyses. Since the seminal work of Rubin (1976), there has been a burgeoning
literature on missing values with heterogeneous aims and motivations. This has
resulted in the development of various methods, formalizations, and tools
(including a large number of R packages and Python modules). However, for
practitioners, it remains challenging to decide which method is most suited for
their problem, partially because handling missing data is still not a topic
systematically covered in statistics or data science curricula.
To help address this challenge, we have launched a unified platform:
"R-miss-tastic", which aims to provide an overview of standard missing values
problems, methods, how to handle them in analyses, and relevant implementations
of methodologies. In the same perspective, we have also developed several
pipelines in R and Python to allow for a hands-on illustration of how to handle
missing values in various statistical tasks such as estimation and prediction,
while ensuring reproducibility of the analyses. This will hopefully also
provide some guidance on deciding which method to choose for a specific problem
and data. The objective of this work is not only to comprehensively organize
materials, but also to create standardized analysis workflows, and to provide a
common ground for discussions among the community. This platform is thus suited
for beginners, students, more advanced analysts and researchers.Comment: 38 pages, 9 figure
Improving Evaluation Methods for Causal Modeling
Causal modeling is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. Active communities of researchers in machine learning, statistics, social science, and other fields develop and enhance algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for expanding the standard techniques for evaluating algorithms that construct causal models. Specifically, we argue for the addition of evaluation techniques that use interventional measures rather than structural or observational measures, and that apply those measures to empirical data rather than synthetic data. We survey current practice in evaluation and show that, while the evaluation techniques we advocate are rarely used in practice, they are feasible and produce substantially different results than structural measures and synthetic data do. We also provide a protocol for generating observational-style data sets from experimental data, allowing the creation of a large number of data sets suitable for evaluating causal modeling algorithms. We then perform a large-scale evaluation of seven causal modeling methods over 37 data sets drawn from randomized controlled trials, as well as simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal modeling method.
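
As a rough illustration of the idea behind deriving observational-style data sets from experimental data, the sketch below subsamples a simulated randomized trial so that treatment assignment becomes correlated with a covariate, inducing confounding. All specifics (the logistic retention rule, linear outcome, and effect size) are assumptions for illustration, not the paper's exact protocol.

    # Illustrative sketch: turn randomized data into an observational-style
    # sample by keeping each unit with a probability that depends on both
    # its covariate x and its treatment t, so treatment and x become
    # correlated in the retained sample.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    x = rng.normal(size=n)                    # pre-treatment covariate
    t = rng.integers(0, 2, size=n)            # randomized treatment
    y = 2.0 * t + x + rng.normal(size=n)      # outcome; true effect is 2.0

    # Retention probability sigmoid((2t - 1) * x): treated units with high x
    # and control units with low x are more likely to be kept.
    p_keep = 1.0 / (1.0 + np.exp(-(2 * t - 1) * x))
    keep = rng.random(n) < p_keep
    x_obs, t_obs, y_obs = x[keep], t[keep], y[keep]

    # The naive difference in means is now biased relative to the true effect.
    naive = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()
    print(f"true effect: 2.0, naive estimate on confounded sample: {naive:.2f}")

Because the true treatment effect is known from the randomized design, evaluations on such derived data sets can score causal modeling algorithms with interventional measures rather than purely structural ones.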