Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning
In real-world tasks, reinforcement learning (RL) agents frequently encounter situations that were not present during training. To ensure reliable performance, RL agents need to be robust against worst-case situations. The robust RL framework addresses this challenge via a worst-case optimization between an agent and an adversary. Previous robust RL algorithms are either sample-inefficient, lack robustness guarantees, or do not scale to large problems. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem while attaining near-optimal sample complexity guarantees. RH-UCRL is a model-based reinforcement learning (MBRL) algorithm that effectively distinguishes between epistemic and aleatoric uncertainty and efficiently explores both the agent's and the adversary's decision spaces during policy learning. We scale RH-UCRL to complex tasks via neural network ensemble models as well as neural network policies. Experimentally, we demonstrate that RH-UCRL outperforms other robust deep RL algorithms in a variety of adversarial environments.
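The central selection rule described above, optimism over the agent's choices combined with pessimism over the adversary's, can be illustrated with a small sketch. The following is a minimal, hypothetical NumPy illustration, not the authors' implementation: the `ensemble_value` function, the action discretizations, and the confidence-bound width `beta` are all assumptions made for the example. Disagreement across ensemble members stands in for epistemic uncertainty; the adversary responds to a lower confidence bound while the agent selects actions against an upper confidence bound.

```python
import numpy as np

# Fixed per-member perturbations standing in for disagreement between
# trained ensemble members (purely illustrative).
member_shifts = 0.1 * np.random.default_rng(0).standard_normal(5)

def ensemble_value(agent_a, adv_a):
    """Hypothetical ensemble of value estimates for a joint
    (agent action, adversary action) pair; the spread across the
    members plays the role of epistemic uncertainty."""
    base = np.sin(3 * agent_a) - 0.5 * np.cos(2 * adv_a)
    return base + member_shifts * (1.0 + abs(agent_a))

beta = 1.0                                   # confidence-bound width (assumed)
agent_actions = np.linspace(-1.0, 1.0, 21)   # assumed discretizations
adv_actions = np.linspace(-1.0, 1.0, 21)

def lcb(a, z):
    # Lower confidence bound: the adversary plans pessimistically.
    v = ensemble_value(a, z)
    return v.mean() - beta * v.std()

def ucb(a, z):
    # Upper confidence bound: the agent plans optimistically.
    v = ensemble_value(a, z)
    return v.mean() + beta * v.std()

# Agent picks the action whose optimistic value is best under the
# adversary's pessimistic best response.
best_a = max(
    agent_actions,
    key=lambda a: ucb(a, min(adv_actions, key=lambda z: lcb(a, z))),
)
print(f"selected agent action: {best_a:+.2f}")
```

In this toy version the epistemic spread grows with the magnitude of the agent's action, so the upper bound rewards exploring uncertain actions while the adversary's lower bound penalizes relying on them, mirroring the optimism/pessimism split the abstract describes.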
Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents. Leveraging the powerful Stochastic Gradient Langevin Dynamics (SGLD), we present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our algorithm consistently outperforms existing baselines, in terms of generalization across different training and testing conditions, on several MuJoCo environments. Our experiments also show that, even for objective functions that entirely ignore potential environmental shifts, our sampling approach remains highly robust in comparison to standard RL algorithms.
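The sampling idea can be sketched on a toy saddle-point problem: instead of deterministic gradient ascent-descent between the two players, each player takes gradient steps perturbed by Gaussian noise, as in SGLD, and the deployed parameters are an average over the sampled iterates. The objective `J`, step size `eta`, and temperature `tau` below are assumptions chosen for illustration; this is a minimal sketch of the update rule, not the paper's algorithm applied to RL.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_J(theta, phi):
    """Gradients of a toy saddle objective
    J(theta, phi) = theta*phi - 0.5*theta**2 + 0.5*phi**2
    (an assumed stand-in for the two-player RL objective).
    The agent ascends J; the adversary descends it."""
    return phi - theta, theta + phi

eta = 0.05   # step size (assumed)
tau = 0.01   # temperature of the Langevin noise (assumed)
theta, phi = 1.0, -1.0
samples = []

for _ in range(2000):
    g_theta, g_phi = grad_J(theta, phi)
    noise_scale = np.sqrt(2.0 * eta * tau)
    # SGLD-style updates: gradient steps plus injected Gaussian noise,
    # so the iterates sample a distribution over parameters rather than
    # converging to a single point of the game.
    theta += eta * g_theta + noise_scale * rng.standard_normal()
    phi += -eta * g_phi + noise_scale * rng.standard_normal()
    samples.append((theta, phi))

# Deploy an average over the later sampled iterates rather than the
# last iterate, in the spirit of the sampling perspective.
theta_bar = np.mean([s[0] for s in samples[len(samples) // 2 :]])
print(f"averaged agent parameter: {theta_bar:+.3f} (saddle point at 0)")
```

On this toy objective the noisy ascent-descent iterates spiral around the saddle point instead of cycling or diverging, and averaging the sampled iterates recovers a parameter near the equilibrium, which is the intuition behind replacing pure gradient play with Langevin sampling.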