Falsification-Based Robust Adversarial Reinforcement Learning
Reinforcement learning (RL) has achieved tremendous progress in solving
various sequential decision-making problems, e.g., control tasks in robotics.
However, RL methods often fail to generalize to safety-critical scenarios since
policies are overfitted to training environments. Previously, robust
adversarial reinforcement learning (RARL) was proposed to train an adversarial
network that applies disturbances to a system, which improves robustness in
test scenarios. A drawback of neural-network-based adversaries is that
integrating system requirements without handcrafting sophisticated reward
signals is difficult. Safety falsification methods allow one to find a set of
initial conditions as well as an input sequence, such that the system violates
a given property formulated in temporal logic. In this paper, we propose
falsification-based RARL (FRARL), the first generic framework that integrates
temporal-logic falsification into adversarial learning to improve policy
robustness. With the falsification method, no extra reward function needs to be
hand-crafted for the adversary. We evaluate our approach on a braking
assistance system and an adaptive cruise control system of autonomous vehicles.
Experiments show that policies trained with a falsification-based adversary
generalize better and violate the safety specification less often in test
scenarios than those trained without an adversary or with an adversarial
network.

Comment: 11 pages, 3 figures
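
As a rough illustration of the alternation FRARL describes (train the policy, run a falsifier against a temporal-logic safety property to find a violating initial condition and disturbance sequence, then train against that counterexample), here is a minimal self-contained sketch on a toy braking task. The ToyBrakingEnv, the threshold policy, the random-search falsifier, and all constants are illustrative assumptions, not the authors' implementation:

    import random

    class ToyBrakingEnv:
        """Car approaching a stationary obstacle; disturbances weaken the brakes."""
        horizon = 30
        def rollout(self, brake_at, init_gap, disturbances):
            gap, speed, traj = init_gap, 2.0, []
            for d in disturbances:             # d in [-0.2, 0] degrades braking force
                brake = 0.5 + d if gap < brake_at else 0.0
                speed = max(0.0, speed - brake)
                gap -= speed * 0.1
                traj.append(gap)
            return traj

    def robustness(traj):
        # Robustness of the STL property "always gap > 0": worst margin over time.
        return min(traj)

    def falsify(env, brake_at, budget=300):
        """Random-search falsifier: seek an initial gap and disturbance sequence
        that drive the robustness below zero (i.e., cause a collision)."""
        worst, worst_rob = None, float("inf")
        for _ in range(budget):
            init_gap = random.uniform(1.0, 4.0)
            dists = [random.uniform(-0.2, 0.0) for _ in range(env.horizon)]
            rob = robustness(env.rollout(brake_at, init_gap, dists))
            if rob < worst_rob:
                worst_rob, worst = rob, (init_gap, dists)
        return worst, worst_rob

    def frarl_loop(iterations=20):
        env, brake_at = ToyBrakingEnv(), 0.5   # policy: brake once gap < brake_at
        rob = float("-inf")
        for _ in range(iterations):
            (init_gap, dists), rob = falsify(env, brake_at)
            if rob >= 0:                       # falsifier found no violation
                break
            # "Policy update" against the counterexample: brake earlier until fixed.
            while robustness(env.rollout(brake_at, init_gap, dists)) < 0:
                brake_at += 0.1
        return brake_at, rob

    if __name__ == "__main__":
        brake_at, rob = frarl_loop()
        print(f"braking threshold: {brake_at:.2f}, last falsifier robustness: {rob:.3f}")

The falsifier here stands in for the temporal-logic falsification engine; its robustness value plays the role of the adversary's objective, so no separate adversarial reward has to be designed.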
Robust Reinforcement Learning via Adversarial Kernel Approximation
Robust Markov Decision Processes (RMDPs) provide a framework for sequential
decision-making that is robust to perturbations on the transition kernel.
However, robust reinforcement learning (RL) approaches in RMDPs do not scale
well to realistic online settings with high-dimensional domains. By
characterizing the adversarial kernel in RMDPs, we propose a novel approach for
online robust RL that approximates the adversarial kernel and uses a standard
(non-robust) RL algorithm to learn a robust policy. Notably, our approach can
be applied on top of any underlying RL algorithm, enabling easy scaling to
high-dimensional domains. Experiments on classic control tasks, MinAtar, and
the DeepMind Control Suite demonstrate the effectiveness and applicability of
our method.
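
As a concrete, hedged illustration of the idea: for one simple uncertainty set (R-contamination), the adversarial kernel is known in closed form as a mixture of the nominal kernel and a point mass on the lowest-value next state, so sampling from that mixture and feeding the samples to a standard learner yields a robust policy. The chain MDP, tabular Q-learning, and all constants below are illustrative assumptions, not the paper's characterization or experiments:

    import random

    N_STATES, ACTIONS = 5, (0, 1)          # small chain MDP; action 1 moves right
    GOAL, DELTA, GAMMA = N_STATES - 1, 0.1, 0.95

    def nominal_step(s, a):
        """Nominal kernel: the intended move succeeds with probability 0.9."""
        move = 1 if a == 1 else -1
        if random.random() < 0.9:
            return max(0, min(GOAL, s + move))
        return s

    def adversarial_step(s, a, V):
        """Approximate adversarial kernel for R-contamination: with probability
        DELTA jump to the lowest-value state, otherwise follow the nominal kernel."""
        if random.random() < DELTA:
            return min(range(N_STATES), key=lambda sp: V[sp])
        return nominal_step(s, a)

    def train(robust=True, episodes=2000, alpha=0.1, eps=0.1):
        Q = [[0.0, 0.0] for _ in range(N_STATES)]
        for _ in range(episodes):
            s = 0
            for _ in range(50):
                if random.random() < eps:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda x: Q[s][x])
                V = [max(q) for q in Q]    # current value estimates drive the adversary
                sp = adversarial_step(s, a, V) if robust else nominal_step(s, a)
                r = 1.0 if sp == GOAL else 0.0
                # Standard Q-learning update; robustness enters only via the sampling.
                Q[s][a] += alpha * (r + GAMMA * max(Q[sp]) - Q[s][a])
                s = sp
                if s == GOAL:
                    break
        return Q

    if __name__ == "__main__":
        Q = train(robust=True)
        print("greedy policy:", [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)])

Because the pessimism enters only through the sampled next states, the tabular update could be swapped for any off-the-shelf algorithm (DQN, SAC, and so on), which is the scaling argument the abstract makes.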