Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning
Many real-world domains require safe decision making in uncertain
environments. In this work, we introduce a deep reinforcement learning
framework for approaching this important problem. We consider a distribution
over transition models, and apply a risk-averse perspective towards model
uncertainty through the use of coherent distortion risk measures. We provide
robustness guarantees for this framework by showing it is equivalent to a
specific class of distributionally robust safe reinforcement learning problems.
Unlike existing approaches to robustness in deep reinforcement learning,
however, our formulation does not involve minimax optimization. This leads to
an efficient, model-free implementation of our approach that only requires
standard data collection from a single training environment. In experiments on
continuous control tasks with safety constraints, we demonstrate that our
framework produces robust performance and safety at deployment time across a
range of perturbed test environments.
Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
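As a concrete reading of the abstract's key ingredient, here is a minimal Python sketch of a coherent distortion risk measure applied to value estimates drawn under a distribution over transition models. The CVaR distortion, the five-model sample, and every name in the snippet are illustrative assumptions, not the paper's implementation.

import numpy as np

def distortion_weights(n, g):
    # w_i = g(i/n) - g((i-1)/n): weight placed on the i-th smallest sample.
    u = np.arange(n + 1) / n
    return np.diff(g(u))

def risk_averse_value(model_values, g):
    # Distortion risk measure of value estimates gathered from a sampled
    # set of transition models; concave-start distortions weight the
    # worst-performing models most heavily.
    z = np.sort(np.asarray(model_values))  # ascending: worst outcomes first
    w = distortion_weights(len(z), g)
    return float(np.dot(w, z))

# CVaR at level alpha is one example of a coherent distortion risk measure.
alpha = 0.25
g_cvar = lambda u: np.minimum(u / alpha, 1.0)

# Hypothetical value estimates of one state-action pair under five
# transition models drawn from the model distribution.
values = [3.1, 2.7, -1.5, 2.9, 0.4]
print(risk_averse_value(values, g_cvar))  # dominated by the worst models

Because the distortion concentrates weight on the worst-performing models, optimizing this quantity is risk-averse to model uncertainty while remaining a plain expectation-style average, with no minimax step.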
One Risk to Rule Them All: Addressing Distributional Shift in Offline Reinforcement Learning via Risk-Aversion
Offline reinforcement learning (RL) is suitable for safety-critical domains
where online exploration is not feasible. In such domains, decision-making
should take into consideration the risk of catastrophic outcomes. In other
words, decision-making should be risk-averse. An additional challenge of
offline RL is avoiding distributional shift, i.e., ensuring that state-action
pairs visited by the policy remain near those in the dataset. Previous works on
risk in offline RL combine offline RL techniques (to avoid distributional
shift) with risk-sensitive RL algorithms (to achieve risk-aversion). In this
work, we propose risk-aversion as a mechanism to jointly address both of these
issues. We propose a model-based approach, and use an ensemble of models to
estimate epistemic uncertainty, in addition to aleatoric uncertainty. We train
a policy that is risk-averse, and avoids high uncertainty actions.
Risk-aversion to epistemic uncertainty prevents distributional shift, as areas
not covered by the dataset have high epistemic uncertainty. Risk-aversion to
aleatoric uncertainty discourages actions that are inherently risky due to
environment stochasticity. Thus, introducing risk-aversion alone suffices both to avoid distributional shift and to achieve risk-aversion to aleatoric risk. Our algorithm, 1R2R, achieves strong performance on deterministic benchmarks and outperforms existing approaches for risk-sensitive objectives in stochastic domains.
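To make the pooled-uncertainty idea concrete, the following sketch pools value samples across an ensemble and scores an action by CVaR, so both model disagreement and environment noise drag the score down. The Gaussian value heads, the alpha level, and all names here are hypothetical assumptions, not the 1R2R implementation.

import numpy as np

rng = np.random.default_rng(0)

def pooled_value_samples(ensemble, state, action, n=10):
    # Draw value samples from every ensemble member and pool them, so the
    # sample set carries aleatoric spread (within a model) and epistemic
    # spread (disagreement across models).
    out = []
    for model in ensemble:
        mu, sigma = model(state, action)  # each member predicts a value dist.
        out.append(rng.normal(mu, sigma, n))
    return np.concatenate(out)

def cvar(samples, alpha=0.1):
    # Mean of the worst alpha-fraction of samples.
    z = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(z))))
    return float(z[:k].mean())

# Two hypothetical models that disagree about an out-of-distribution action:
ensemble = [lambda s, a: (1.0, 0.2), lambda s, a: (-2.0, 0.2)]
print(cvar(pooled_value_samples(ensemble, None, None)))  # low: disagreement

An out-of-distribution action makes the ensemble disagree, widening the pooled sample set and pulling its CVaR down, so a risk-averse policy avoids it; this is the mechanism by which risk-aversion alone curbs distributional shift, as the abstract describes.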
SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Although Reinforcement Learning (RL) is effective for sequential
decision-making problems under uncertainty, it still fails to thrive in
real-world systems where risk or safety is a binding constraint. In this paper,
we formulate the RL problem with safety constraints as a non-zero-sum game.
When instantiated with maximum entropy RL, this formulation yields a safe
adversarially guided soft actor-critic framework, called SAAC. In SAAC, the
adversary aims to break the safety constraint while the RL agent aims to
maximize the constrained value function given the adversary's policy. The
safety constraint on the agent's value function manifests only as a repulsion
term between the agent's and the adversary's policies. Unlike previous
approaches, SAAC can address different safety criteria such as safe
exploration, mean-variance risk sensitivity, and CVaR-like coherent risk
sensitivity. We illustrate the design of the adversary for these constraints.
Then, in each of these variations, we show that the agent differentiates itself from the adversary's unsafe actions while learning to solve the task.
Finally, for challenging continuous control tasks, we demonstrate that SAAC
achieves faster convergence, better efficiency, and fewer failures to satisfy
the safety constraints than risk-averse distributional RL and risk-neutral soft
actor-critic algorithms.
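The repulsion term can be pictured with a short PyTorch sketch; the diagonal-Gaussian policy parameterization, the beta weight, and the function names are assumptions for illustration rather than the SAAC objective as published.

import torch

def gaussian_kl(mu_p, log_std_p, mu_q, log_std_q):
    # KL(p || q) between diagonal Gaussian policies, summed over action dims.
    var_p, var_q = (2 * log_std_p).exp(), (2 * log_std_q).exp()
    return (log_std_q - log_std_p
            + (var_p + (mu_p - mu_q) ** 2) / (2 * var_q) - 0.5).sum(-1)

def agent_actor_loss(q_value, logp, mu_agent, log_std_agent,
                     mu_adv, log_std_adv, alpha=0.2, beta=1.0):
    # Standard soft actor-critic term plus a repulsion term: minimizing this
    # loss maximizes the KL divergence from the adversary's policy, pushing
    # the agent away from actions the adversary favors.
    sac_term = alpha * logp - q_value
    repulsion = -beta * gaussian_kl(mu_agent, log_std_agent,
                                    mu_adv, log_std_adv)
    return (sac_term + repulsion).mean()

Under this reading, the only trace of the safety constraint in the agent's update is the repulsion between the two policies: the agent follows the usual SAC objective while being pushed away from whatever the adversary has learned to favor.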