Efficient Learning and Planning with Compressed Predictive States
Predictive state representations (PSRs) offer an expressive framework for
modelling partially observable systems. By compactly representing systems as
functions of observable quantities, the PSR learning approach avoids using
local-minima prone expectation-maximization and instead employs a globally
optimal moment-based algorithm. Moreover, since PSRs do not require a
predetermined latent state structure as an input, they offer an attractive
framework for model-based reinforcement learning when agents must plan without
a priori access to a system model. Unfortunately, the expressiveness of PSRs
comes with significant computational cost, and this cost is a major factor
inhibiting the use of PSRs in applications. In order to alleviate this
shortcoming, we introduce the notion of compressed PSRs (CPSRs). The CPSR
learning approach combines recent advancements in dimensionality reduction,
incremental matrix decomposition, and compressed sensing. We show how this
approach provides a principled avenue for learning accurate approximations of
PSRs, drastically reducing the computational costs associated with learning
while also providing effective regularization. Going further, we propose a
planning framework which exploits these learned models, and we show that this
approach facilitates model-learning and planning in large complex partially
observable domains, a task that is infeasible without the principled use of
compression.
Comment: 45 pages, 10 figures, submitted to the Journal of Machine Learning
Research
Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions
Temporal-difference (TD) networks are a class of predictive state
representations that use well-established TD methods to learn models of
partially observable dynamical systems. Previous research with TD networks has
dealt only with dynamical systems with finite sets of observations and actions.
We present an algorithm for learning TD network representations of dynamical
systems with continuous observations and actions. Our results show that the
algorithm is capable of learning accurate and robust models of several noisy
continuous dynamical systems. The algorithm presented here is the first fully
incremental method for learning a predictive representation of a continuous
dynamical system.
Comment: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty
in Artificial Intelligence (UAI2009)
COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration
Data efficiency and robustness to task-irrelevant perturbations are
long-standing challenges for deep reinforcement learning algorithms. Here we
introduce a modular approach to addressing these challenges in a continuous
control environment, without using hand-crafted or supervised information. Our
Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically
motivated exploration and unsupervised learning to build object-based models of
its environment and action space. Subsequently, it can learn a variety of tasks
through model-based search in very few steps and excel on structured hold-out
tests of policy robustness.
General Value Function Networks
State construction is important for learning in partially observable
environments. A general purpose strategy for state construction is to learn the
state update using a Recurrent Neural Network (RNN), which updates the internal
state using the current internal state and the most recent observation. This
internal state provides a summary of the observed sequence, to facilitate
accurate predictions and decision-making. At the same time, RNNs can be hard to
specify and train for non-experts. Training RNNs is notoriously tricky,
particularly as the common strategy to approximate gradients back in time,
called truncated Back-prop Through Time (BPTT), can be sensitive to the
truncation window. Further, domain expertise, which can usually help constrain
the function class and so improve trainability, can be difficult to
incorporate into complex recurrent units used within RNNs. In this work, we
explore how to use multi-step predictions, as a simple and general approach to
inject prior knowledge, while retaining much of the generality and learning
power behind RNNs. In particular, we revisit the idea of using predictions to
construct state and ask: does constraining (parts of) the state to consist of
predictions about the future improve RNN trainability? We formulate a novel RNN
architecture, called a General Value Function Network (GVFN), where each
internal state component corresponds to a prediction about the future
represented as a value function. We first provide an objective for optimizing
GVFNs, and derive several algorithms to optimize this objective. We then show
that GVFNs are more robust to the truncation level, in many cases only
requiring one-step gradient updates.
Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning
Temporal observations such as videos contain essential information about the
dynamics of the underlying scene, but they are often interleaved with
inessential, predictable details. One way of dealing with this problem is by
focusing on the most informative moments in a sequence. We propose a model that
learns to discover these important events and the times when they occur and
uses them to represent the full sequence. We do so using a hierarchical
Keyframe-Inpainter (KeyIn) model that first generates a video's keyframes and
then inpaints the rest by generating the frames at the intervening times. We
propose a fully differentiable formulation to efficiently learn this procedure.
We show that KeyIn finds informative keyframes in several datasets with
different dynamics and visual properties. KeyIn outperforms other recent
hierarchical predictive models for planning. For more details, please see the
project website at https://sites.google.com/view/keyin.
Comment: Conference on Learning for Dynamics and Control, 2020. Website:
https://sites.google.com/view/keyin/hom
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Modern reinforcement learning algorithms reach super-human performance on
many board and video games, but they are sample inefficient, i.e. they
typically require significantly more playing experience than humans to reach an
equal performance level. To improve sample efficiency, an agent may build a
model of the environment and use planning methods to update its policy. In this
article we introduce Variational State Tabulation (VaST), which maps an
environment with a high-dimensional state space (e.g. the space of visual
inputs) to an abstract tabular model. Prioritized sweeping with small backups,
a highly efficient planning method, can then be used to update state-action
values. We show how VaST can rapidly learn to maximize reward in tasks like 3D
navigation and efficiently adapt to sudden changes in rewards or transition
probabilities.
Comment: Accepted at ICML 2018; camera-ready version
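The planning step named in the abstract above can be illustrated on a toy tabular model. What follows is a minimal sketch of classic prioritized sweeping. It is not the small-backups variant and not the VaST authors' code. The five-state chain, the `model` dictionary, and the constants `GAMMA` and `THETA` are assumptions made purely for illustration.

```python
import heapq

GAMMA = 0.9   # discount factor (assumed)
THETA = 1e-6  # minimum error that justifies a backup (assumed)

# Hypothetical deterministic chain: model[(state, action)] = (next_state, reward).
# State N - 1 is terminal; entering it yields reward 1.
N = 5
ACTIONS = ("left", "right")
model = {}
for s in range(N - 1):
    model[(s, "left")] = (max(s - 1, 0), 0.0)
    model[(s, "right")] = (s + 1, 1.0 if s + 1 == N - 1 else 0.0)

# predecessors[s2] holds every (state, action) pair that leads into s2.
predecessors = {}
for (s, a), (s2, _) in model.items():
    predecessors.setdefault(s2, set()).add((s, a))

Q = {sa: 0.0 for sa in model}


def state_value(s):
    """Greedy value of a state; terminal states have value 0."""
    values = [Q[(s, a)] for a in ACTIONS if (s, a) in Q]
    return max(values) if values else 0.0


def backup_error(s, a):
    """One-step Bellman error of Q[(s, a)] under the tabular model."""
    s2, r = model[(s, a)]
    return r + GAMMA * state_value(s2) - Q[(s, a)]


def prioritized_sweeping(max_backups=1000):
    """Back up state-action pairs in order of decreasing Bellman error."""
    queue = []
    for sa in model:
        err = abs(backup_error(*sa))
        if err > THETA:
            heapq.heappush(queue, (-err, sa))
    backups = 0
    while queue and backups < max_backups:
        _, (s, a) = heapq.heappop(queue)
        Q[(s, a)] += backup_error(s, a)
        backups += 1
        # A changed value at s can invalidate every pair that leads into s.
        for ps, pa in predecessors.get(s, ()):
            err = abs(backup_error(ps, pa))
            if err > THETA:
                heapq.heappush(queue, (-err, (ps, pa)))
    return backups


backups = prioritized_sweeping()
```

Because the queue is ordered by error magnitude, the reward at the end of the chain propagates backwards in a handful of backups instead of repeated full sweeps over all state-action pairs.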
Learning to Make Predictions In Partially Observable Environments Without a Generative Model
When faced with the problem of learning a model of a high-dimensional
environment, a common approach is to limit the model to make only a restricted
set of predictions, thereby simplifying the learning problem. These partial
models may be directly useful for making decisions or may be combined together
to form a more complete, structured model. However, in partially observable
(non-Markov) environments, standard model-learning methods learn generative
models, i.e. models that provide a probability distribution over all possible
futures (such as POMDPs). It is not straightforward to restrict such models to
make only certain predictions, and doing so does not always simplify the
learning problem. In this paper we present prediction profile models:
non-generative partial models for partially observable systems that make only a
given set of predictions, and are therefore far simpler than generative models
in some cases. We formalize the problem of learning a prediction profile model
as a transformation of the original model-learning problem, and show
empirically that one can learn prediction profile models that make a small set
of important predictions even in systems that are too complex for standard
generative models.
Entity Abstraction in Visual Model-Based Reinforcement Learning
This paper tests the hypothesis that modeling a scene in terms of entities
and their local interactions, as opposed to modeling the scene globally,
provides a significant benefit in generalizing to physical tasks in a
combinatorial space the learner has not encountered before. We present
object-centric perception, prediction, and planning (OP3), which to the best of
our knowledge is the first fully probabilistic entity-centric dynamic latent
variable framework for model-based reinforcement learning that acquires entity
representations from raw visual observations without supervision and uses them
to predict and plan. OP3 enforces entity-abstraction -- symmetric processing of
each entity representation with the same locally-scoped function -- which
enables it to scale to model different numbers and configurations of objects
from those in training. Our approach to solving the key technical challenge of
grounding these entity representations to actual objects in the environment is
to frame this variable binding problem as an inference problem, and we develop
an interactive inference algorithm that uses temporal continuity and
interactive feedback to bind information about object properties to the entity
variables. On block-stacking tasks, OP3 generalizes to novel block
configurations and more objects than observed during training, outperforming an
oracle model that assumes access to object supervision and achieving two to
three times better accuracy than a state-of-the-art video prediction model that
does not exhibit entity abstraction.
Comment: Accepted at CoRL 201
Plan2Vec: Unsupervised Representation Learning by Latent Plans
In this paper we introduce plan2vec, an unsupervised representation learning
approach that is inspired by reinforcement learning. Plan2vec constructs a
weighted graph on an image dataset using near-neighbor distances, and then
extrapolates this local metric to a global embedding by distilling the
path integral over planned paths. When applied to control, plan2vec offers a
compute- and sample-efficient way to learn goal-conditioned value estimates
that are accurate over long horizons. We demonstrate the effectiveness of
plan2vec on one simulated and two challenging real-world image datasets.
Experimental results show that plan2vec successfully amortizes the planning
cost, enabling reactive planning that is linear in memory and computation
complexity rather than exhaustive over the entire state space.
Comment: code available at https://geyang.github.io/plan2ve
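The graph-construction idea in the abstract above can be sketched on toy data. The example below is an illustrative assumption, not the plan2vec implementation: it uses 2D points in place of images, Euclidean distance as the local metric, and Dijkstra's algorithm as a stand-in planner. It omits the step of distilling the resulting distances into a learned embedding. The helper names `knn_graph` and `shortest_path_lengths` are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Link each point to its k nearest neighbours, weighted by local distance."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph


def shortest_path_lengths(graph, source):
    """Dijkstra: planned path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy "dataset": points along an L-shaped corridor (assumed for illustration).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
graph = knn_graph(points, k=2)
geodesic = shortest_path_lengths(graph, source=0)
straight_line = math.dist(points[0], points[4])
```

Here `geodesic[4]` follows the corridor and is longer than `straight_line`. That along-the-data distance is the kind of global metric that a planner over local neighbour distances provides.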
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.
Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with
arXiv:1404.782