Reinforcement Learning with Perturbed Rewards
Recent studies have shown that reinforcement learning (RL) models are
vulnerable in various noisy scenarios. For instance, the observed reward
channel is often subject to noise in practice (e.g., when rewards are collected
through sensors), and is therefore not credible. In addition, for applications
such as robotics, a deep reinforcement learning (DRL) algorithm can be
manipulated to produce arbitrary errors by receiving corrupted rewards. In this
paper, we consider noisy RL problems with perturbed rewards, which can be
approximated with a confusion matrix. We develop a robust RL framework that
enables agents to learn in noisy environments where only perturbed rewards are
observed. Our solution framework builds on existing RL/DRL algorithms and
is the first to address the biased noisy reward setting without any assumptions on
the true distribution (e.g., the zero-mean Gaussian noise assumed in previous
works). The core ideas of our solution include estimating a reward confusion
matrix and defining a set of unbiased surrogate rewards. We prove the
convergence and sample complexity of our approach. Extensive experiments on
different DRL platforms show that trained policies based on our estimated
surrogate reward can achieve higher expected rewards, and converge faster than
existing baselines. For instance, the state-of-the-art PPO algorithm obtains
84.6% and 80.8% improvements in average score across five Atari games, with
error rates of 10% and 30% respectively. Comment: AAAI 2020 (Spotlight)
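The surrogate-reward idea can be illustrated with a short sketch. Assuming a finite set of reward values and a confusion matrix C with C[i, j] = P(observe reward j | true reward i) (which the paper estimates from data; here it is taken as given), solving C r_hat = r for r_hat yields surrogate values whose expectation under the noise equals the true reward. The function and variable names below are illustrative, not the authors' code.

```python
import numpy as np

def surrogate_rewards(C, true_rewards):
    """Given C[i, j] = P(observe reward j | true reward i) and the vector of
    true reward values, return surrogate values r_hat such that
    E[r_hat(observed) | true reward i] = true_rewards[i]."""
    # Solving C @ r_hat = true_rewards makes the surrogate unbiased, since
    # E[r_hat | true i] = sum_j C[i, j] * r_hat[j] = (C @ r_hat)[i].
    return np.linalg.solve(C, true_rewards)

# Binary-reward example: rewards {-1, +1} with flip rates 0.1 and 0.3.
C = np.array([[0.9, 0.1],   # true -1 observed as -1 / +1
              [0.3, 0.7]])  # true +1 observed as -1 / +1
r = np.array([-1.0, 1.0])
r_hat = surrogate_rewards(C, r)

# During training, each observed (noisy) reward is replaced by its surrogate.
observed_index = 1                   # e.g. the agent observed reward +1
reward_for_update = r_hat[observed_index]
print(r_hat, C @ r_hat)              # C @ r_hat recovers the true reward vector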
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
AI systems are increasingly applied to complex tasks that involve interaction
with humans. During training, such systems are potentially dangerous, as they
haven't yet learned to avoid actions that could cause serious harm. How can an
AI system explore and learn without making a single mistake that harms humans
or otherwise causes serious damage? For model-free reinforcement learning,
having a human "in the loop" and ready to intervene is currently the only way
to prevent all catastrophes. We formalize human intervention for RL and show
how to reduce the human labor required by training a supervised learner to
imitate the human's intervention decisions. We evaluate this scheme on Atari
games, with a Deep RL agent being overseen by a human for four hours. When the
class of catastrophes is simple, we are able to prevent all catastrophes
without affecting the agent's learning (whereas an RL baseline fails due to
catastrophic forgetting). However, this scheme is less successful when
catastrophes are more complex: it reduces but does not eliminate catastrophes
and the supervised learner fails on adversarial examples found by the agent.
Extrapolating to more challenging environments, we show that our implementation
would not scale (due to the infeasible amount of human labor required). We
outline extensions of the scheme that are necessary if we are to train
model-free agents without a single catastrophe.
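The oversight scheme can be sketched as an environment wrapper: an overseer (initially a human, later a supervised "blocker" trained on the human's decisions) inspects each proposed action and substitutes a safe one when it would cause a catastrophe. The sketch below assumes a classic Gym-style step/reset interface returning a 4-tuple; the blocker callable, safe_action, and penalty value are assumptions for illustration, not the paper's implementation.

```python
class InterventionWrapper:
    """Wraps an environment so an overseer can veto dangerous actions."""

    def __init__(self, env, blocker, safe_action, penalty=-1.0):
        self.env = env                # underlying RL environment
        self.blocker = blocker        # callable: (observation, action) -> bool (True = block)
        self.safe_action = safe_action
        self.penalty = penalty        # reward given to the agent for a blocked action
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        blocked = self.blocker(self._last_obs, action)
        if blocked:
            # Substitute a safe action; the penalty teaches the agent
            # not to propose the blocked action again.
            action = self.safe_action
        obs, reward, done, info = self.env.step(action)
        if blocked:
            reward = self.penalty
        self._last_obs = obs
        info = dict(info, blocked=blocked)
        return obs, reward, done, info
```

Logging the (observation, action, blocked) triples from the human phase provides the training data for the supervised blocker that later replaces the human.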
CopyCAT: Taking Control of Neural Policies with Constant Attacks
We propose a new perspective on adversarial attacks against deep
reinforcement learning agents. Our main contribution is CopyCAT, a targeted
attack able to consistently lure an agent into following an outsider's policy.
It is pre-computed and therefore fast at inference time, and could thus be used
in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel
read-only setting. In this setting, the adversary cannot directly modify the
agent's state -- its representation of the environment -- but can only attack
the agent's observation -- its perception of the environment. Directly
modifying the agent's state would require write access to the agent's inner
workings, and we argue that this assumption is too strong in realistic settings. Comment: AAMAS 202
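The read-only setting can be sketched as a filter on observations: the attacker never touches the agent's internal state, only adds a pre-computed perturbation to each frame before the agent perceives it, which is why the per-step cost is compatible with real time. The per-target-action masks, the eps bound, and outsider_policy below are hypothetical placeholders standing in for CopyCAT's pre-computed perturbations, not the paper's exact procedure.

```python
import numpy as np

def attacked_observation(obs, target_action, masks, eps=0.05):
    """Add a pre-computed perturbation to the observation so the victim agent
    is lured toward `target_action`. `masks[a]` is a fixed perturbation
    associated with target action `a`; being pre-computed, the per-step cost
    is a single addition and clip."""
    delta = np.clip(masks[target_action], -eps, eps)   # bounded perturbation
    return np.clip(obs + delta, 0.0, 1.0)              # stay in valid pixel range

# At interaction time (read-only access to observations, not to the agent):
# a = outsider_policy(obs)                      # action the attacker wants taken
# obs_seen_by_agent = attacked_observation(obs, a, masks)
# the agent then acts on obs_seen_by_agent instead of obs
```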