A Survey of Reinforcement Learning Informed by Natural Language
To be successful in real-world tasks, Reinforcement Learning (RL) needs to
exploit the compositional, relational, and hierarchical structure of the world,
and learn to transfer it to the task at hand. Recent advances in representation
learning for language make it possible to build models that acquire world
knowledge from text corpora and integrate this knowledge into downstream
decision making problems. We thus argue that the time is right to investigate a
tight integration of natural language understanding into RL in particular. We
survey the state of the field, including work on instruction following, text
games, and learning from textual domain knowledge. Finally, we call for the
development of new environments as well as further investigation into the
potential uses of recent Natural Language Processing (NLP) techniques for such
tasks.
Comment: Published at IJCAI'19
Hierarchical Reinforcement Learning with Deep Nested Agents
Deep hierarchical reinforcement learning has gained a lot of attention in
recent years due to its ability to produce state-of-the-art results in
challenging environments where non-hierarchical frameworks fail to learn useful
policies. However, as problem domains become more complex, deep hierarchical
reinforcement learning can become inefficient, leading to longer convergence
times and poor performance. We introduce the Deep Nested Agent framework, which
is a variant of deep hierarchical reinforcement learning where information from
the main agent is propagated to the low level agent by incorporating
this information into the nested agent's state. We demonstrate the
effectiveness and performance of the Deep Nested Agent framework by applying it
to three scenarios in Minecraft with comparisons to a deep non-hierarchical
single-agent framework as well as a deep hierarchical framework.
Comment: 11 pages
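To make this mechanism concrete, here is a minimal sketch of the state-augmentation step. The dimensions and the `main_policy`/`nested_policy` stubs are hypothetical placeholders, not the framework's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def main_policy(state):
    # Hypothetical high-level policy: picks one of 4 abstract commands.
    return int(rng.integers(4))

def nested_policy(aug_state):
    # Hypothetical low-level policy over 8 primitive actions.
    return int(rng.integers(8))

state = rng.normal(size=16)            # raw environment observation
command = main_policy(state)           # the main agent's decision
# Core idea: information from the main agent is propagated downward by
# incorporating it directly into the nested agent's state.
aug_state = np.concatenate([state, one_hot(command, 4)])
action = nested_policy(aug_state)
```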
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with
arXiv:1404.782
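One way to picture a controller RNN exploiting the algorithmic information content of a world-model RNN is sketched below: the controller forms a query, the model rolls its recurrent dynamics forward on it, and the controller reads the answer back out. All weights, shapes, and the query/readout scheme are illustrative assumptions, not the architecture proposed in the report.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size for both networks (arbitrary)

W_m = rng.normal(size=(D, D)) * 0.1      # hypothetical world-model weights
U_m = rng.normal(size=(D, D)) * 0.1
W_c = rng.normal(size=(D, 2 * D)) * 0.1  # hypothetical controller weights

h_m = np.zeros(D)                        # world model's recurrent state
obs = rng.normal(size=D)

# The controller forms a query from the observation and the model's state;
# "thinking" here means consulting the model instead of acting blindly.
query = np.tanh(W_c @ np.concatenate([obs, h_m]))
h_m = np.tanh(W_m @ query + U_m @ h_m)   # the model's answer / prediction
action_logits = h_m[:4]                  # controller reads out of the model
```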
Deep Successor Reinforcement Learning
Learning robust value functions given raw observations and rewards is now
possible with model-free and model-based deep reinforcement learning
algorithms. There is a third alternative, called Successor Representations
(SR), which decomposes the value function into two components -- a reward
predictor and a successor map. The successor map represents the expected future
state occupancy from any given state and the reward predictor maps states to
scalar rewards. The value function of a state can be computed as the inner
product between the successor map and the reward weights. In this paper, we
present DSR, which generalizes SR within an end-to-end deep reinforcement
learning framework. DSR has several appealing properties including: increased
sensitivity to distal reward changes due to factorization of reward and world
dynamics, and the ability to extract bottleneck states (subgoals) given
successor maps trained under a random policy. We show the efficacy of our
approach on two diverse environments given raw pixel observations -- simple
grid-world domains (MazeBase) and the Doom game engine.
Comment: 10 pages, 6 figures
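The factorization described above fits in a few lines. Below is a tabular sketch assuming a small discrete MDP; the state count, learning rate, and tabular reward predictor are arbitrary simplifications of DSR's deep-network version.

```python
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
psi = np.eye(n_states)   # successor map: expected future state occupancy
w = np.zeros(n_states)   # reward weights (the reward predictor)

def sr_update(s, s_next, r):
    # TD update of the successor map under the current policy:
    #   psi(s) <- psi(s) + alpha * (1_s + gamma * psi(s') - psi(s))
    target = np.eye(n_states)[s] + gamma * psi[s_next]
    psi[s] += alpha * (target - psi[s])
    # Fit the reward predictor so that w[s'] tracks observed rewards.
    w[s_next] += alpha * (r - w[s_next])

def value(s):
    # The value function is the inner product between the successor
    # map and the reward weights.
    return psi[s] @ w

sr_update(0, 1, r=0.0)
sr_update(1, 2, r=1.0)
v0 = value(0)   # V(0) = psi(0) . w
```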
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning
The problem of sparse rewards is one of the hardest challenges in
contemporary reinforcement learning. Hierarchical reinforcement learning (HRL)
tackles this problem by using a set of temporally-extended actions, or options,
each of which has its own subgoal. These subgoals are normally handcrafted for
specific tasks. Here, though, we introduce a generic class of subgoals with
broad applicability in the visual domain. Underlying our approach (in common
with work using "auxiliary tasks") is the hypothesis that the ability to
control aspects of the environment is an inherently useful skill to have. We
incorporate such subgoals in an end-to-end hierarchical reinforcement learning
system and test two variants of our algorithm on a number of games from the
Atari suite. We highlight the advantage of our approach in one of the hardest
games -- Montezuma's Revenge -- for which the ability to handle sparse rewards
is key. Our agent learns several times faster than the current state-of-the-art
HRL agent in this game, reaching a similar level of performance. UPDATE
22/11/17: We found that a standard A3C agent with a simple shaped reward, i.e.
extrinsic reward + feature control intrinsic reward, has comparable performance
to our agent in Montezuma's Revenge. In light of the new experiments performed,
the advantage of our HRL approach can be attributed more to its ability to
learn useful features from intrinsic rewards rather than its ability to explore
and reuse abstracted skills with hierarchical components. This has led us to a
new conclusion about the result.
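As a rough illustration of a feature-control subgoal, the sketch below pays an intrinsic reward proportional to how much a chosen feature's activation changes between consecutive observations. The random-projection feature map and all shapes are stand-ins for the learned visual features used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.normal(size=(32, 64)) * 0.1   # stand-in feature map

def features(obs):
    # Placeholder for learned features (e.g., conv activations).
    return np.tanh(W_feat @ obs)

def intrinsic_reward(obs, next_obs, k):
    # Subgoal "control feature k": the option is rewarded in
    # proportion to how much it moves that feature's activation.
    return abs(features(next_obs)[k] - features(obs)[k])

obs, next_obs = rng.normal(size=64), rng.normal(size=64)
r_int = intrinsic_reward(obs, next_obs, k=3)
```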
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively
small set of states that are likely to be useful as subgoals, in concert with
the learning of corresponding skill policies to achieve those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment,
suitable for large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the Atari 2600 game Montezuma's Revenge.
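A minimal realization of this idea is incremental clustering over a stream of recent states, with centroids acting as candidate subgoals and an intrinsic reward for reaching one. The online k-means update and distance threshold below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
K, dim = 4, 16
centroids = rng.normal(size=(K, dim))   # candidate subgoal prototypes
counts = np.ones(K)

def observe(state):
    # Incremental (online) k-means over recent experience: each state
    # nudges its nearest centroid, so centroids drift toward recurring
    # regions of the state space -- candidate subgoals.
    k = int(np.argmin(np.linalg.norm(centroids - state, axis=1)))
    counts[k] += 1
    centroids[k] += (state - centroids[k]) / counts[k]
    return k

def subgoal_reward(state, k, tol=0.5):
    # Intrinsic motivation: +1 when the agent nears subgoal k.
    return float(np.linalg.norm(state - centroids[k]) < tol)

for _ in range(1000):   # stream of recent experiences (trajectories)
    observe(rng.normal(size=dim))
```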
Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but accelerates transfer to new
tasks. We demonstrate the attention mechanism anticipates and identifies useful
latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.
Comment: International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposium
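The crossmodal step can be pictured as attention over per-modality embeddings: score each modality, softmax the scores, and fuse the result before it reaches the option-critic policy. The encoders, dimensions, and scoring vector below are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # shared embedding size (arbitrary)

# Hypothetical per-modality encoder outputs for one timestep.
video_emb = rng.normal(size=d)
audio_emb = rng.normal(size=d)
modalities = np.stack([video_emb, audio_emb])

# A learned scoring vector decides how much each modality matters now.
w_attn = rng.normal(size=d) * 0.1
scores = modalities @ w_attn
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over modalities

# Fused input to the downstream policy: irrelevant sensor modalities
# receive small weights and are effectively filtered out.
fused = weights @ modalities
```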
Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning
Deep learning (DL) advances state-of-the-art reinforcement learning (RL) by
incorporating deep neural networks that learn representations of the input to
RL. However, the conventional deep neural network architecture is limited in
learning representations for multi-task RL (MT-RL), as different tasks can
call for different kinds of representations. In this paper, we thus propose a
novel deep neural network architecture, the generalization tower network
(GTN), which can achieve MT-RL within a single learned model. Specifically,
the GTN architecture is composed of both horizontal and vertical streams. The
horizontal streams learn representations shared across similar tasks. In
contrast, the vertical streams are introduced to handle more diverse tasks by
encoding hierarchically shared knowledge across them. The effectiveness of
the introduced vertical streams is validated by experimental results.
Experiments on 51 Atari games further verify that our GTN architecture
advances the state of the art in MT-RL.
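One plausible reading of the horizontal/vertical layout is sketched below: a shared horizontal stream deepens the representation while vertical streams tap it at each depth and feed task-specific heads. Layer counts, dimensions, and the aggregation rule are guesses for illustration, not the published GTN.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Horizontal stream: layers shared across similar tasks.
H = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
# Vertical streams: tap the horizontal stream at increasing depth,
# exposing a hierarchy of shared knowledge.
V = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
heads = {t: rng.normal(size=(4, d)) * 0.1 for t in ("task_a", "task_b")}

def forward(x, task):
    taps = []
    for W_h, W_v in zip(H, V):
        x = np.tanh(W_h @ x)           # move along the horizontal stream
        taps.append(np.tanh(W_v @ x))  # branch into a vertical stream
    rep = np.sum(taps, axis=0)         # aggregate hierarchical features
    return heads[task] @ rep           # task-specific action logits

logits = forward(rng.normal(size=d), "task_a")
```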
Learning to Compose Skills
We present a differentiable framework capable of learning a wide variety of
compositions of simple policies that we call skills. By recursively composing
skills with themselves, we can create hierarchies that display complex
behavior. Skill networks are trained to generate skill-state embeddings that
are provided as inputs to a trainable composition function, which in turn
outputs a policy for the overall task. Our experiments on an environment
consisting of multiple collect and evade tasks show that this architecture is
able to quickly build complex skills from simpler ones. Furthermore, the
learned composition function displays some transfer to unseen combinations of
skills, allowing for zero-shot generalization.
Comment: Presented at NIPS 2017 Deep RL Symposium
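A minimal sketch of the composition mechanism, assuming three hypothetical skill networks and a single linear composition function: each skill emits a skill-state embedding, and the composition function maps the stacked embeddings to a policy over primitive actions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions = 16, 5

# Hypothetical skill networks (stand-ins for trained skill policies).
skills = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
W_comp = rng.normal(size=(n_actions, 3 * d)) * 0.1  # composition function

def compose(state):
    # Each skill summarizes the state from its own perspective...
    embs = [np.tanh(S @ state) for S in skills]
    # ...and the trainable composition function turns the stacked
    # embeddings into a single policy for the overall task.
    logits = W_comp @ np.concatenate(embs)
    e = np.exp(logits - logits.max())
    return e / e.sum()   # action distribution

pi = compose(rng.normal(size=d))
```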
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
We consider the problem of exploration in meta reinforcement learning. Two
new meta reinforcement learning algorithms are suggested: E-MAML and E-RL².
Results are presented on a novel environment we call 'Krazy World' and a set
of maze environments. We show E-MAML and E-RL² deliver better performance on
tasks where exploration is important
- …