EMI: Exploration with Mutual Information
Reinforcement learning algorithms struggle when the reward signal is very
sparse. In these cases, naive random exploration methods essentially rely on a
random walk to stumble onto a rewarding state. Recent works utilize intrinsic
motivation to guide the exploration via generative models, predictive forward
models, or discriminative modeling of novelty. We propose EMI, an exploration
method that constructs embedding representations of states and actions without
relying on generative decoding of the full observation, instead extracting
predictive signals that guide exploration via forward prediction in the
representation space. Our experiments show competitive results on challenging
locomotion tasks with continuous control and on image-based exploration tasks
with discrete actions on Atari. The source code is available at
https://github.com/snu-mllab/EMI .
Comment: Accepted and to appear at ICML 2019.
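To make the forward-prediction idea concrete, here is a minimal, hypothetical PyTorch sketch: an intrinsic reward computed as the prediction error of a forward model that operates entirely in a learned embedding space, with no generative decoding of the observation. The class name, network shapes, and dimensions are illustrative assumptions; EMI's actual mutual-information objective for training the embeddings is not shown.

```python
import torch
import torch.nn as nn

class LatentForwardModel(nn.Module):
    """Hypothetical sketch: embed states and actions, predict the next
    state embedding, and use the prediction error as an exploration bonus."""
    def __init__(self, obs_dim, act_dim, emb_dim=64):
        super().__init__()
        # State and action encoders (architectures are illustrative).
        self.phi = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))
        self.psi = nn.Sequential(nn.Linear(act_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))
        # Forward model works purely in the embedding space.
        self.forward_model = nn.Linear(2 * emb_dim, emb_dim)

    def intrinsic_reward(self, obs, act, next_obs):
        """Forward-prediction error in the representation space,
        added to the environment reward to guide exploration."""
        z, u = self.phi(obs), self.psi(act)
        z_next_pred = self.forward_model(torch.cat([z, u], dim=-1))
        z_next = self.phi(next_obs)
        return (z_next_pred - z_next).pow(2).sum(dim=-1)

# Usage with made-up dimensions and random data:
model = LatentForwardModel(obs_dim=8, act_dim=2)
obs, act, next_obs = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 8)
bonus = model.intrinsic_reward(obs, act, next_obs)  # shape: (32,)
```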
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the
learning trajectories of agents by challenging them with tasks adapted to
their capacities. In recent years, they have been used to improve sample
efficiency and asymptotic performance, to organize exploration, to encourage
generalization, or to solve sparse reward problems, among others. The ambition
of this work is twofold: 1) to present a compact and accessible introduction
to the Automatic Curriculum Learning literature, and 2) to draw a bigger
picture of the current state of the art in ACL to encourage the cross-breeding
of existing concepts and the emergence of new ideas.
Comment: Accepted at IJCAI 2020.
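As a toy illustration of the kind of mechanism such curriculum teachers implement (not an algorithm taken from the survey itself), the sketch below samples tasks in proportion to absolute learning progress, estimated as the change in an exponentially smoothed return; all names and constants are hypothetical.

```python
import random

class LearningProgressCurriculum:
    """Hypothetical minimal ACL teacher: sample tasks in proportion to
    absolute learning progress, so the agent is challenged with tasks
    matched to its current capacities."""
    def __init__(self, task_ids, ema=0.1):
        self.ema = ema
        self.returns = {t: 0.0 for t in task_ids}   # smoothed return per task
        self.progress = {t: 1.0 for t in task_ids}  # optimistic init: try everything

    def sample_task(self):
        # Tasks where performance is changing fastest get sampled most.
        weights = [abs(p) + 1e-6 for p in self.progress.values()]
        return random.choices(list(self.progress), weights=weights, k=1)[0]

    def update(self, task, episode_return):
        old = self.returns[task]
        new = (1 - self.ema) * old + self.ema * episode_return
        self.returns[task] = new
        # Learning progress = smoothed rate of change of the return.
        self.progress[task] = (1 - self.ema) * self.progress[task] + self.ema * (new - old)

# Usage with hypothetical task names:
teacher = LearningProgressCurriculum(["easy", "medium", "hard"])
task = teacher.sample_task()
teacher.update(task, episode_return=1.0)
```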
VIME: Variational Information Maximizing Exploration
Scalable and effective exploration remains a key challenge in reinforcement
learning (RL). While there are methods with optimality guarantees in the
setting of discrete state and action spaces, these methods cannot be applied in
high-dimensional deep RL scenarios. As such, most contemporary RL relies on
simple heuristics such as epsilon-greedy exploration or adding Gaussian noise
to the controls. This paper introduces Variational Information Maximizing
Exploration (VIME), an exploration strategy based on maximization of
information gain about the agent's belief of environment dynamics. We propose a
practical implementation, using variational inference in Bayesian neural
networks, which efficiently handles continuous state and action spaces. VIME
modifies the MDP reward function, and can be applied with several different
underlying RL algorithms. We demonstrate that VIME achieves significantly
better performance compared to heuristic exploration methods across a variety
of continuous control tasks and algorithms, including tasks with very sparse
rewards.
Comment: Published in Advances in Neural Information Processing Systems 29
(NIPS 2016), pages 1109-1117.
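A minimal NumPy sketch of the reward modification, assuming a fully factorized Gaussian variational posterior over the dynamics model's weights (so the information-gain KL divergence has a closed form). The function names, the scaling coefficient eta, and the toy parameters are assumptions, and the variational update that produces the new posterior is omitted.

```python
import numpy as np

def gaussian_kl(mu_new, sigma_new, mu_old, sigma_old):
    """KL( q_new || q_old ) between fully factorized Gaussian variational
    posteriors over the dynamics-model weights (closed form)."""
    return np.sum(
        np.log(sigma_old / sigma_new)
        + (sigma_new**2 + (mu_new - mu_old)**2) / (2 * sigma_old**2)
        - 0.5
    )

def vime_reward(env_reward, mu_new, sigma_new, mu_old, sigma_old, eta=0.01):
    """Modified MDP reward: r' = r + eta * information gain, where the gain
    from a transition is approximated by the KL divergence between the BNN
    posterior after and before updating on that transition."""
    return env_reward + eta * gaussian_kl(mu_new, sigma_new, mu_old, sigma_old)

# Hypothetical usage: a dynamics model with 100 weights.
rng = np.random.default_rng(0)
mu_old, sigma_old = rng.normal(size=100), np.full(100, 0.1)
mu_new, sigma_new = mu_old + 0.01 * rng.normal(size=100), np.full(100, 0.1)
print(vime_reward(env_reward=0.0, mu_new=mu_new, sigma_new=sigma_new,
                  mu_old=mu_old, sigma_old=sigma_old))
```

Because the bonus is just an additive term on the reward, this scheme can sit on top of several different underlying RL algorithms, as the abstract notes.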