RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Exploration in sparse reward environments remains one of the key challenges
of model-free reinforcement learning. Instead of solely relying on extrinsic
rewards provided by the environment, many state-of-the-art methods use
intrinsic rewards to encourage exploration. However, we show that existing
methods fall short in procedurally-generated environments where an agent is
unlikely to visit a state more than once. We propose a novel type of intrinsic
reward which encourages the agent to take actions that lead to significant
changes in its learned state representation. We evaluate our method on multiple
challenging procedurally-generated tasks in MiniGrid, as well as on tasks with
high-dimensional observations used in prior work. Our experiments demonstrate
that this approach is more sample efficient than existing exploration methods,
particularly for procedurally-generated MiniGrid environments. Furthermore, we
analyze the learned behavior as well as the intrinsic reward received by our
agent. In contrast to previous approaches, our intrinsic reward does not
diminish during the course of training and it rewards the agent substantially
more for interacting with objects that it can control.
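
As a rough illustration of the idea, the intrinsic reward can be sketched as the
distance between consecutive learned state embeddings, normalized by an episodic
visitation count (a minimal sketch: the embedding phi, trained via forward and
inverse dynamics models in the paper, is left abstract here, and all names are
illustrative):

import numpy as np

def ride_reward(phi_s, phi_s_next, episodic_count):
    # Impact of the transition: how far the learned state representation moved.
    impact = np.linalg.norm(phi_s_next - phi_s)
    # Discount by how often the new state was visited in this episode, so the
    # agent cannot farm reward by bouncing between the same pair of states.
    return impact / np.sqrt(episodic_count)

# Toy usage with random stand-in embeddings from a hypothetical encoder phi.
phi_s, phi_s_next = np.random.randn(64), np.random.randn(64)
print(ride_reward(phi_s, phi_s_next, episodic_count=3))
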
CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control
Intrinsic motivation is a promising exploration technique for solving
reinforcement learning tasks with sparse or absent extrinsic rewards. There
exist two technical challenges in implementing intrinsic motivation: 1) how to
design a proper intrinsic objective to facilitate efficient exploration; and 2)
how to combine the intrinsic objective with the extrinsic objective to help
find better solutions. In the current literature, the intrinsic objectives are
all designed in a task-agnostic manner and combined with the extrinsic
objective via simple addition (or used by itself for reward-free pre-training).
In this work, we show that these designs fail in typical sparse-reward
continuous control tasks. To address the problem, we propose Constrained
Intrinsic Motivation (CIM) to leverage readily attainable task priors to
construct a constrained intrinsic objective, and at the same time, exploit the
Lagrangian method to adaptively balance the intrinsic and extrinsic objectives
via a simultaneous-maximization framework. We empirically show, on multiple
sparse-reward continuous control tasks, that our CIM approach achieves greatly
improved performance and sample efficiency over state-of-the-art methods.
Moreover, the key techniques of our CIM can also be plugged into existing
methods to boost their performance.
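
The abstract does not spell out the update, but a generic Lagrangian relaxation
of a constrained objective, with the multiplier adapted by dual ascent, might
look like the following sketch (names and constants are illustrative; the
paper's task-prior-based constraint is not reproduced here):

def lagrangian_step(r_ext, r_int, lam, intrinsic_target, lr_lam=1e-3):
    # Combined reward seen by the policy: extrinsic plus multiplier-weighted
    # intrinsic term.
    r_total = r_ext + lam * r_int
    # Dual ascent on the multiplier: lam grows while the intrinsic constraint
    # (r_int >= intrinsic_target) is violated and decays once it is satisfied,
    # so the two objectives are balanced adaptively rather than by a fixed sum.
    lam = max(0.0, lam + lr_lam * (intrinsic_target - r_int))
    return r_total, lam
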
Learning with AMIGo: Adversarially Motivated Intrinsic Goals
A key challenge for reinforcement learning (RL) consists of learning in
environments with sparse extrinsic rewards. In contrast to current RL methods,
humans are able to learn new skills with little or no reward by using various
forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating --
as a form of meta-learning -- a goal-generating teacher that proposes
Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student"
policy in the absence of (or alongside) environment reward. Specifically,
through a simple but effective "constructively adversarial" objective, the
teacher learns to propose increasingly challenging -- yet achievable -- goals
that allow the student to learn general skills for acting in a new environment,
independent of the task to be solved. We show that our method generates a
natural curriculum of self-proposed goals which ultimately allows the agent to
solve challenging procedurally-generated tasks where other forms of intrinsic
motivation and state-of-the-art RL methods fail.
Comment: 18 pages, 6 figures; published at The Ninth International Conference
on Learning Representations (ICLR 2021).
Generative models for sequential dynamics in active inference
A central theme of theoretical neurobiology is that most of our cognitive operations require the processing of discrete sequences of items, and that this processing in turn emerges from continuous neuronal dynamics. Notable examples are sequences of words during linguistic communication or sequences of locations during navigation. In this perspective, we address the problem of sequential brain processing through the lens of active inference, which inherits from a Helmholtzian view of the predictive (Bayesian) brain. Underlying active inference is a generative model; namely, a probabilistic description of how (observable) consequences are generated by (unobservable) causes. We show that one can account for many aspects of sequential brain processing by assuming the brain entails a generative model of the sensed world that comprises central pattern generators, narratives, or well-defined sequences. We provide examples in the domains of motor control (e.g., handwriting) and perception (e.g., birdsong recognition), through to planning and understanding (e.g., language). The solutions to these problems include the use of sequences of attracting points to direct complex movements, and the move from continuous representations of auditory speech signals to the discrete words that generate those signals.
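
In the discrete formulations commonly used in active inference, such a
generative model factorizes as P(o_{1:T}, s_{1:T}) = P(s_1) * prod_t
P(o_t | s_t) P(s_{t+1} | s_t). A minimal two-state sketch of sequence
generation under such a model (the matrices are toy illustrations, not values
from the paper):

import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1],   # likelihood P(o | s): column s gives the
              [0.1, 0.9]])  # distribution over observations for that cause
B = np.array([[0.1, 0.9],   # transition P(s' | s): a strongly cyclic mapping,
              [0.9, 0.1]])  # i.e., a well-defined (near-deterministic) sequence

s = 0
for t in range(4):
    o = rng.choice(2, p=A[:, s])  # observable consequence of the hidden cause
    print(f"t={t}: hidden state {s} -> observation {o}")
    s = rng.choice(2, p=B[:, s])  # advance the sequence of hidden states
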
A Computational Model of Learning Flexible Navigation in a Maze by Layout-Conforming Replay of Place Cells
Recent experimental observations have shown that the reactivation of
hippocampal place cells (PC) during sleep or immobility depicts trajectories
that can go around barriers and can flexibly adapt to a changing maze layout.
Such layout-conforming replay sheds light on how the activity of place cells
supports the learning of flexible navigation of an animal in a dynamically
changing maze. However, existing computational models of replay fall short of
generating layout-conforming replay, restricting their usage to simple
environments, like linear tracks or open fields. In this paper, we propose a
computational model that generates layout-conforming replay and explains how
such replay drives the learning of flexible navigation in a maze. First, we
propose a Hebbian-like rule to learn inter-PC synaptic strengths while the
animal explores a maze. Then we use a continuous attractor network (CAN) with
feedback inhibition to model the interaction among place cells and hippocampal
interneurons. The activity bump of place cells drifts along a path in the maze,
which models layout-conforming replay. During replay in rest, the synaptic
strengths from place cells to striatal medium spiny neurons (MSN) are learned
by a novel dopamine-modulated three-factor rule to store place-reward
associations. During goal-directed navigation, the CAN periodically generates
replay trajectories from the animal's location for path planning, and the
trajectory leading to a maximal MSN activity is followed by the animal. We have
implemented our model in a high-fidelity virtual rat in the MuJoCo physics
simulator. Extensive experiments have demonstrated that its superior
flexibility during navigation in a maze is due to the continuous re-learning of
inter-PC and PC-MSN synaptic strengths.
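
Minimal sketches of the two plasticity rules the abstract names (learning
rates, bounds, and eligibility traces from the full model are omitted, and all
names are illustrative):

import numpy as np

def hebbian_pc_update(W_pc, pc_rates_pre, pc_rates_post, eta=0.01):
    # Hebbian-like rule: place cells that fire together along the animal's
    # path strengthen their mutual connections, so the weight matrix comes to
    # encode the maze layout (barriers break co-activation, so no "shortcut"
    # synapses form through walls).
    return W_pc + eta * np.outer(pc_rates_post, pc_rates_pre)

def three_factor_msn_update(W_msn, pc_rates, msn_rates, dopamine, eta=0.05):
    # Dopamine-modulated three-factor rule: the Hebbian pre x post term is
    # gated by a reward-related dopamine signal, storing place-reward
    # associations in the PC -> MSN weights during replay at rest.
    return W_msn + eta * dopamine * np.outer(msn_rates, pc_rates)
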