9,298 research outputs found
Generative Temporal Models with Spatial Memory for Partially Observed Environments
In model-based reinforcement learning, generative and temporal models of
environments can be leveraged to boost agent performance, either by tuning the
agent's representations during training or via use as part of an explicit
planning mechanism. However, their application in practice has been limited to
simplistic environments, due to the difficulty of training such models in
larger, potentially partially-observed and 3D environments. In this work we
introduce a novel action-conditioned generative model of such challenging
environments. The model features a non-parametric spatial memory system in
which we store learned, disentangled representations of the environment.
Low-dimensional spatial updates are computed using a state-space model that
makes use of knowledge on the prior dynamics of the moving agent, and
high-dimensional visual observations are modelled with a Variational
Auto-Encoder. The result is a scalable architecture capable of performing
coherent predictions over hundreds of time steps across a range of partially
observed 2D and 3D environments.Comment: ICML 201
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
We propose a lifelong learning system that has the ability to reuse and
transfer knowledge from one task to another while efficiently retaining the
previously learned knowledge-base. Knowledge is transferred by learning
reusable skills to solve tasks in Minecraft, a popular video game which is an
unsolved and high-dimensional lifelong learning problem. These reusable skills,
which we refer to as Deep Skill Networks, are then incorporated into our novel
Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using
two techniques: (1) a deep skill array and (2) skill distillation, our novel
variation of policy distillation (Rusu et. al. 2015) for learning skills. Skill
distillation enables the HDRLN to efficiently retain knowledge and therefore
scale in lifelong learning, by accumulating knowledge and encapsulating
multiple reusable skills into a single distilled network. The H-DRLN exhibits
superior performance and lower learning sample complexity compared to the
regular Deep Q Network (Mnih et. al. 2015) in sub-domains of Minecraft
- …