Causal Deep Reinforcement Learning using Observational Data
Deep reinforcement learning (DRL) requires collecting large amounts of
interventional data, which is sometimes expensive and even unethical in the
real world, for example in autonomous driving and medicine. Offline
reinforcement learning promises to alleviate this issue by exploiting the vast
amount of observational data available in the real world. However,
observational data may mislead the learning agent to undesirable outcomes if
the behavior policy that generates the data depends on unobserved random
variables (i.e., confounders). In this paper, we propose two deconfounding
methods in DRL to address this problem. The methods first calculate the
importance of each sample using causal inference techniques, and then adjust
each sample's impact on the loss function by reweighting or resampling the
offline dataset so that it is unbiased. These
deconfounding methods can be flexibly combined with the existing model-free DRL
algorithms such as soft actor-critic and deep Q-learning, provided that a weak
condition can be satisfied by the loss functions of these algorithms. We prove
the effectiveness of our deconfounding methods and validate them
experimentally.
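The reweighting and resampling steps described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's estimator: it assumes the "importance" of a sample is its normalized inverse propensity, and the function names and toy numbers are hypothetical.

```python
import random


def importance_weights(propensities):
    """Normalized inverse-propensity weights.

    Assumption: this stands in for the per-sample importance; the paper's
    actual causal-inference estimator is not given in the abstract.
    """
    raw = [1.0 / p for p in propensities]
    total = sum(raw)
    return [w / total for w in raw]


def reweighted_loss(per_sample_losses, weights):
    """Option 1 (reweighting): weighted sum of per-sample losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses))


def resample(dataset, weights, k, seed=0):
    """Option 2 (resampling): draw k items with replacement, with
    probability proportional to the weights."""
    rng = random.Random(seed)
    return rng.choices(dataset, weights=weights, k=k)


# Hypothetical toy data: probability that the behavior policy chose the
# logged action, and the loss an off-the-shelf DRL algorithm assigns.
propensities = [0.8, 0.8, 0.2, 0.5]
losses = [1.0, 2.0, 3.0, 4.0]

weights = importance_weights(propensities)
loss = reweighted_loss(losses, weights)
batch = resample(list(range(len(losses))), weights, k=8)
```

Rarely chosen actions (low propensity) receive larger weights, so they contribute more to the loss or appear more often in the resampled batch.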
A Perspective on Individualized Treatment Effects Estimation from Time-series Health Data
The burden of disease is rising worldwide, with unequal treatment efficacy
for patient populations that are underrepresented in clinical trials.
Healthcare, however, is driven by the average population effect of medical
treatments and therefore takes a "one-size-fits-all" approach that does not
necessarily fit each patient best. These facts suggest a pressing need
for methodologies to study individualized treatment effects (ITE) to drive
personalized treatment. Despite the increased interest in
machine-learning-driven ITE estimation models, the vast majority focus on
tabular data with limited review and understanding of methodologies proposed
for time-series electronic health records (EHRs). To this end, this work
provides an overview of ITE works for time-series data and insights into future
research. The work summarizes the latest work in the literature and reviews it
in light of theoretical assumptions, types of treatment settings, and
computational frameworks. Furthermore, this work discusses challenges and
future research directions for ITEs in a time-series setting. We hope this work
opens new directions and serves as a resource for understanding one of the
exciting yet under-studied research areas.
Causal Reinforcement Learning using Observational and Interventional Data
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs. We consider here a scenario where the learning agent can collect online experiences through direct interactions with the environment (interventional data), but also has access to a large collection of offline experiences, obtained by observing another agent interacting with the environment (observational data). A key ingredient that makes this situation non-trivial is that we allow the observed agent to interact with the environment based on hidden information, which is not observed by the learning agent. We then ask the following questions: can the online and offline experiences be safely combined for learning a causal model? And can we expect the offline experiences to improve the agent's performance? To answer these questions, we import ideas from the well-established causal framework of do-calculus, and we express model-based reinforcement learning as a causal inference problem. Then, we propose a general yet simple methodology for leveraging offline data during learning. In a nutshell, the method relies on learning a latent-based causal transition model that explains both the interventional and observational regimes, and then using the recovered latent variable to infer the standard POMDP transition model via deconfounding. We prove our method is correct and efficient in the sense that it attains better generalization guarantees due to the offline data (in the asymptotic case), and we illustrate its effectiveness empirically on synthetic toy problems. Our contribution aims at bridging the gap between the fields of reinforcement learning and causality.
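To illustrate why observational data generated from hidden information cannot be naively treated as interventional data, the following sketch compares the two regimes on a toy discrete model. The variables and probabilities are hypothetical, and the adjustment shown is the standard sum over a known latent variable, not the paper's learned latent-based transition model.

```python
# Hypothetical toy model: a hidden binary variable u drives both the
# observed agent's action a and the outcome y.
p_u = {0: 0.5, 1: 0.5}                      # P(u)
p_a1_given_u = {0: 0.1, 1: 0.9}             # P(a=1 | u), observed agent's policy
p_y1_given_a_u = {                          # P(y=1 | a, u), keyed by (a, u)
    (0, 0): 0.2, (1, 0): 0.4,
    (0, 1): 0.6, (1, 1): 0.8,
}


def p_a_given_u(a, u):
    """P(a | u) for a binary action."""
    return p_a1_given_u[u] if a == 1 else 1.0 - p_a1_given_u[u]


def observational(a):
    """P(y=1 | a): what the offline (observational) data shows."""
    num = sum(p_u[u] * p_a_given_u(a, u) * p_y1_given_a_u[(a, u)] for u in p_u)
    den = sum(p_u[u] * p_a_given_u(a, u) for u in p_u)
    return num / den


def interventional(a):
    """P(y=1 | do(a)): adjust for u by averaging over its prior P(u)."""
    return sum(p_u[u] * p_y1_given_a_u[(a, u)] for u in p_u)


obs_effect = observational(1) - observational(0)     # confounded estimate
true_effect = interventional(1) - interventional(0)  # causal effect
```

In this toy model the confounded estimate overstates the effect of the action, which is the kind of bias that deconfounding through a recovered latent variable is meant to remove.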
Causal Reinforcement Learning: A Survey
Reinforcement learning is an essential paradigm for solving sequential
decision problems under uncertainty. Despite many remarkable achievements in
recent decades, applying reinforcement learning methods in the real world
remains challenging. One of the main obstacles is that reinforcement learning
agents lack a fundamental understanding of the world and must therefore learn
from scratch through numerous trial-and-error interactions. They may also face
challenges in providing explanations for their decisions and generalizing the
acquired knowledge. Causality, however, offers a notable advantage as it can
formalize knowledge in a systematic manner and leverage invariance for
effective knowledge transfer. This has led to the emergence of causal
reinforcement learning, a subfield of reinforcement learning that seeks to
enhance existing algorithms by incorporating causal relationships into the
learning process. In this survey, we comprehensively review the literature on
causal reinforcement learning. We first introduce the basic concepts of
causality and reinforcement learning, and then explain how causality can
address core challenges in non-causal reinforcement learning. We categorize and
systematically review existing causal reinforcement learning approaches based
on their target problems and methodologies. Finally, we outline open issues and
future directions in this emerging field.
Comment: 48 pages, 10 figures
A Survey on Causal Reinforcement Learning
While Reinforcement Learning (RL) has achieved tremendous success in sequential
decision-making problems across many domains, it still faces key challenges of
data inefficiency and a lack of interpretability. Interestingly, many
researchers have recently leveraged insights from the causality literature,
producing a flourishing body of work that brings the merits of causality to
bear on the challenges of RL. It is therefore necessary and valuable to
collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL
methods, and investigate what causality can contribute to RL.
In particular, we divide existing CRL approaches into two categories according
to whether their causality-based information is given in advance or not. We
further analyze each category in terms of the formalization of different
models, including the Markov Decision Process (MDP), the Partially Observed
Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic
Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and
open-source resources, discuss emerging applications, and outline promising
prospects for the future development of CRL.
Comment: 29 pages, 20 figures