10,314 research outputs found
Epistemic Risk-Sensitive Reinforcement Learning
We develop a framework for interacting with uncertain environments in reinforcement learning (RL) by leveraging preferences in the form of utility functions. We claim that there is value in considering different risk measures during learning. In this framework, the preference for risk can be tuned by varying a risk parameter, and the resulting behavior can be risk-averse, risk-neutral, or risk-taking depending on the parameter choice. We evaluate our framework for learning problems with model uncertainty. We measure and control for \emph{epistemic} risk using dynamic programming (DP) and policy gradient-based algorithms. The risk-averse behavior is then compared with the behavior of the optimal risk-neutral policy in environments with epistemic risk.
Comment: 8 pages, 2 figures
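The abstract does not reproduce the algorithms themselves. As a rough illustration of the DP variant, the sketch below replaces the average over an ensemble of posterior-sampled transition models with an exponential-utility certainty equivalent, so a single scalar tunes the attitude toward epistemic risk. The function names, the (K, S, A, S) tensor layout, and the choice of exponential utility are assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

def certainty_equivalent(values, beta):
    """Utility-based certainty equivalent (1/beta) * log E[exp(beta * V)].
    beta < 0 gives risk-averse, beta > 0 risk-seeking behavior; the limit
    beta -> 0 recovers the ordinary risk-neutral expectation.
    (Exponential utility is an assumption of this sketch.)"""
    if abs(beta) < 1e-8:
        return values.mean(axis=0)
    m = (beta * values).max(axis=0)  # log-sum-exp trick for stability
    return (m + np.log(np.exp(beta * values - m).mean(axis=0))) / beta

def risk_sensitive_value_iteration(P_samples, R, gamma, beta, iters=500):
    """P_samples: (K, S, A, S) transition models sampled from a posterior
    over the dynamics, so spread across the K models captures epistemic
    uncertainty. R: (S, A) rewards. Returns a value function whose
    sensitivity to model uncertainty is controlled by beta."""
    K, S, A, _ = P_samples.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q-values under each sampled model: shape (K, S, A)
        Q_k = R[None] + gamma * np.einsum('ksat,t->ksa', P_samples, V)
        # Aggregate across models through the utility, not a plain mean
        Q = certainty_equivalent(Q_k, beta)
        V = Q.max(axis=1)
    return V
```

With beta = 0 this reduces to ordinary value iteration on the model average; pushing beta negative increasingly penalizes states whose value the sampled models disagree on.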
Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control
Uncertainty quantification is one of the central challenges for machine learning in real-world applications. In reinforcement learning, an agent confronts two kinds of uncertainty: epistemic uncertainty and aleatoric uncertainty. Disentangling and evaluating these uncertainties simultaneously can improve the agent's final performance, accelerate training, and facilitate quality assurance after deployment. In this work, we propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks that extends the Deep Deterministic Policy Gradient algorithm (DDPG). It exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical experiments showing that our variant of DDPG outperforms vanilla DDPG without uncertainty estimation in benchmark tasks on robotic control and power-grid optimization.
Comment: 10 pages, 6 figures. Accepted to International Joint Conference on Neural Networks (IJCNN 2022), July 18-23, Padua, Italy
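The abstract does not spell out how the two uncertainties are separated. A common recipe with an ensemble of distributional (quantile) critics, sketched below under that assumption, is to read aleatoric uncertainty from the spread of the averaged return distribution and epistemic uncertainty from disagreement between ensemble members; the names and the alpha/kappa parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def decompose_uncertainty(quantiles):
    """quantiles: (n_critics, n_quantiles) return quantiles predicted by an
    ensemble of distributional critics for one (state, action).
    Aleatoric ~ spread within the averaged return distribution;
    epistemic ~ disagreement between the ensemble members' mean estimates."""
    mean_dist = quantiles.mean(axis=0)
    aleatoric = mean_dist.std()
    epistemic = quantiles.mean(axis=1).std()
    return aleatoric, epistemic

def actor_score(quantiles, alpha=0.25, kappa=1.0):
    """Hypothetical actor objective: a CVaR over the averaged quantiles makes
    the policy averse to aleatoric risk, while an additive bonus proportional
    to epistemic uncertainty rewards exploring unfamiliar actions."""
    _, epistemic = decompose_uncertainty(quantiles)
    sorted_q = np.sort(quantiles.mean(axis=0))
    k = max(1, int(alpha * sorted_q.size))
    cvar = sorted_q[:k].mean()  # mean of the worst alpha-fraction of returns
    return cvar + kappa * epistemic
```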
One Risk to Rule Them All: Addressing Distributional Shift in Offline Reinforcement Learning via Risk-Aversion
Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is not feasible. In such domains, decision-making should take into consideration the risk of catastrophic outcomes; in other words, it should be risk-averse. An additional challenge of offline RL is avoiding distributional shift, i.e., ensuring that state-action pairs visited by the policy remain near those in the dataset. Previous works on risk in offline RL combine offline RL techniques (to avoid distributional shift) with risk-sensitive RL algorithms (to achieve risk-aversion). In this work, we propose risk-aversion as a mechanism to jointly address both of these issues. We propose a model-based approach, and use an ensemble of models to estimate epistemic uncertainty in addition to aleatoric uncertainty. We train a policy that is risk-averse and avoids high-uncertainty actions. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that are inherently risky due to environment stochasticity. Thus, by only introducing risk-aversion, we avoid distributional shift in addition to achieving risk-aversion to aleatoric risk. Our algorithm, 1R2R, achieves strong performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.
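The core idea, that one risk measure can penalize both uncertainty sources at once, admits a compact illustration. The sketch below pools return samples across an ensemble of dynamics models and takes a CVaR over the pooled set; the function name, the (K, N) sample layout, and the use of CVaR as the risk measure are assumptions of this sketch rather than the paper's exact objective.

```python
import numpy as np

def pooled_cvar_target(model_returns, alpha=0.1):
    """model_returns: (K, N) one-step returns predicted for a candidate
    (state, action) by K ensemble dynamics models, each drawing N stochastic
    samples. Disagreement across the K models reflects epistemic uncertainty;
    spread within each model's N samples reflects aleatoric uncertainty.
    A CVaR over the pooled samples penalizes both at once: out-of-distribution
    actions make the models diverge, and the pessimistic tail of that pooled
    distribution drags the training target down."""
    pooled = np.sort(model_returns.ravel())
    k = max(1, int(alpha * pooled.size))
    return pooled[:k].mean()  # average of the worst alpha-fraction
```

Because in-dataset actions yield agreeing models and a tight pooled distribution, the same pessimistic target that enforces aleatoric risk-aversion also steers the policy back toward the data, which is the joint effect the abstract describes.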