Planning from Images with Deep Latent Gaussian Process Dynamics
Planning is a powerful approach to control problems with known environment
dynamics. In unknown environments the agent needs to learn a model of the
system dynamics to make planning applicable. This is particularly challenging
when the underlying states are only indirectly observable through images. We
propose to learn a deep latent Gaussian process dynamics (DLGPD) model that
learns low-dimensional system dynamics from environment interactions with
visual observations. The method infers latent state representations from
observations using neural networks and models the system dynamics in the
learned latent space with Gaussian processes. All parts of the model can be
trained jointly by optimizing a lower bound on the likelihood of transitions in
image space. We evaluate the proposed approach on the pendulum swing-up task
while using the learned dynamics model for planning in latent space in order to
solve the control problem. We also demonstrate that our method can quickly
adapt a trained agent to changes in the system dynamics from just a few
rollouts. We compare our approach to a state-of-the-art purely deep learning
based method and demonstrate the advantages of combining Gaussian processes
with deep learning for data efficiency and transfer learning.Comment: Accepted for publication at the 2nd Annual Conference on Learning for
Dynamics and Control (L4DC) 2020, with supplementary material. First two
authors contributed equall
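The core recipe, reduced to a sketch: a neural encoder compresses images into a low-dimensional latent state, and a Gaussian process regresses latent transitions, giving the uncertainty estimates that make data-efficient planning and adaptation possible. The code below is an illustrative approximation of that pipeline, not the authors' implementation; the encoder architecture, latent dimensionality, and use of scikit-learn's GP are assumptions, and joint training via the ELBO is omitted.

```python
# Illustrative sketch: neural encoder + GP dynamics in latent space.
import torch
import torch.nn as nn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class Encoder(nn.Module):
    """Maps 64x64 grayscale images to a 2-D latent state."""
    def __init__(self, latent_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )
    def forward(self, img):
        return self.net(img)

# Encode observed transitions (o_t, a_t, o_{t+1}) into latent pairs.
encoder = Encoder()
obs = torch.rand(100, 1, 64, 64)       # o_t (placeholder data)
next_obs = torch.rand(100, 1, 64, 64)  # o_{t+1}
actions = torch.rand(100, 1)           # a_t
with torch.no_grad():
    z, z_next = encoder(obs).numpy(), encoder(next_obs).numpy()

# GP dynamics model: predict z_{t+1} from (z_t, a_t) with an RBF kernel.
X = np.concatenate([z, actions.numpy()], axis=1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, z_next)
z_pred, z_std = gp.predict(X[:5], return_std=True)  # mean and uncertainty
```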
Learning Latent Dynamics for Planning from Pixels
Planning has been very successful for control tasks with known environment
dynamics. To leverage planning in unknown environments, the agent needs to
learn the dynamics from interactions with the world. However, learning dynamics
models that are accurate enough for planning has been a long-standing
challenge, especially in image-based domains. We propose the Deep Planning
Network (PlaNet), a purely model-based agent that learns the environment
dynamics from images and chooses actions through fast online planning in latent
space. To achieve high performance, the dynamics model must accurately predict
the rewards ahead for multiple time steps. We approach this using a latent
dynamics model with both deterministic and stochastic transition components.
Moreover, we propose a multi-step variational inference objective that we name
latent overshooting. Using only pixel observations, our agent solves continuous
control tasks with contact dynamics, partial observability, and sparse rewards,
which exceed the difficulty of tasks that were previously solved by planning
with learned models. PlaNet uses substantially fewer episodes and reaches final
performance close to and sometimes higher than strong model-free algorithms.
Comment: 20 pages, 12 figures, 1 table.
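PlaNet's "fast online planning in latent space" is typically realized with the cross-entropy method over action sequences, scored entirely by the learned model without decoding images. The sketch below illustrates that planner under assumed interfaces: `dynamics` and `reward` stand in for the learned transition model and reward head and are not the paper's actual API.

```python
# Cross-entropy method (CEM) planning in a learned latent space.
import torch

def cem_plan(z0, dynamics, reward, horizon=12, candidates=1000,
             iters=10, top_k=100, action_dim=1):
    """Iteratively refit a Gaussian over action sequences to the
    best-scoring candidates under the learned model.
    z0: (1, latent_dim); dynamics: (B, latent) x (B, action) -> (B, latent);
    reward: (B, latent) -> (B,)."""
    mean = torch.zeros(horizon, action_dim)
    std = torch.ones(horizon, action_dim)
    for _ in range(iters):
        # Sample candidate action sequences from the current belief.
        actions = mean + std * torch.randn(candidates, horizon, action_dim)
        z = z0.expand(candidates, -1)
        returns = torch.zeros(candidates)
        for t in range(horizon):
            z = dynamics(z, actions[:, t])   # latent rollout, no decoding
            returns += reward(z)             # predicted reward per step
        elite = actions[returns.topk(top_k).indices]
        mean, std = elite.mean(0), elite.std(0)
    return mean[0]  # execute the first action, then replan
```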
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
We introduce Embed to Control (E2C), a method for model learning and control
of non-linear dynamical systems from raw pixel images. E2C consists of a deep
generative model, belonging to the family of variational autoencoders, that
learns to generate image trajectories from a latent space in which the dynamics
is constrained to be locally linear. Our model is derived directly from an
optimal control formulation in latent space, supports long-term prediction of
image sequences and exhibits strong performance on a variety of complex control
problems.
Comment: Final NIPS version.
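The locally linear constraint means the transition takes the form z_{t+1} = A(z_t) z_t + B(z_t) u_t + o(z_t), with the matrices predicted from the current latent. A minimal sketch of such a transition module, with illustrative layer sizes:

```python
# Locally linear latent transition in the spirit of E2C.
import torch
import torch.nn as nn

class LocallyLinearTransition(nn.Module):
    def __init__(self, z_dim=3, u_dim=1, hidden=64):
        super().__init__()
        self.z_dim, self.u_dim = z_dim, u_dim
        self.net = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU())
        self.A = nn.Linear(hidden, z_dim * z_dim)   # state matrix
        self.B = nn.Linear(hidden, z_dim * u_dim)   # control matrix
        self.o = nn.Linear(hidden, z_dim)           # offset

    def forward(self, z, u):
        h = self.net(z)
        A = self.A(h).view(-1, self.z_dim, self.z_dim)
        B = self.B(h).view(-1, self.z_dim, self.u_dim)
        o = self.o(h)
        # z_{t+1} = A z_t + B u_t + o, with A, B, o functions of z_t.
        return (A @ z.unsqueeze(-1)).squeeze(-1) \
             + (B @ u.unsqueeze(-1)).squeeze(-1) + o
```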
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning
In this paper we study how to learn stochastic, multimodal transition
dynamics in reinforcement learning (RL) tasks. We focus on evaluating
transition function estimation, while we defer planning over this model to
future work. Stochasticity is a fundamental property of many task environments.
However, discriminative function approximators have difficulty estimating
multimodal stochasticity. In contrast, deep generative models do capture
complex high-dimensional outcome distributions. First we discuss why, amongst
such models, conditional variational inference (VI) is theoretically most
appealing for model-based RL. Subsequently, we compare different VI models on
their ability to learn complex stochasticity on simulated functions, as well as
on a typical RL gridworld with multimodal dynamics. Results show VI
successfully predicts multimodal outcomes, but also robustly ignores these for
deterministic parts of the transition dynamics. In summary, we show a robust
method to learn multimodal transitions using function approximation, which is a
key preliminary for model-based RL in stochastic domains.
Comment: Scaling Up Reinforcement Learning (SURL) Workshop @ European Machine Learning Conference (ECML).
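Conditional variational inference for a transition model amounts to training a conditional VAE on triples (s, a, s'): an encoder q(z | s, a, s') and a decoder p(s' | s, a, z), optimized with the usual ELBO. A minimal sketch under assumed state and action sizes, not the paper's exact architecture:

```python
# Conditional VAE over transitions, capturing multimodal p(s' | s, a).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionCVAE(nn.Module):
    def __init__(self, s_dim=2, a_dim=1, z_dim=4, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(s_dim + a_dim + s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim))          # -> (mu, log_var)
        self.dec = nn.Sequential(
            nn.Linear(s_dim + a_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, s_dim))              # -> predicted s'

    def forward(self, s, a, s_next):
        # Encode the full transition, sample z with the usual trick.
        mu, log_var = self.enc(torch.cat([s, a, s_next], -1)).chunk(2, -1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        s_pred = self.dec(torch.cat([s, a, z], -1))
        recon = F.mse_loss(s_pred, s_next)
        kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
        return recon + kl  # negative ELBO for one transition batch
```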
Robust Locally-Linear Controllable Embedding
Embed-to-control (E2C) is a model for solving high-dimensional optimal
control problems by combining variational auto-encoders with locally-optimal
controllers. However, the E2C model suffers from two major drawbacks: 1) its
objective function does not correspond to the likelihood of the data sequence
and 2) the variational encoder used for embedding typically has large
variational approximation error, especially when there is noise in the system
dynamics. In this paper, we present a new model for learning robust
locally-linear controllable embedding (RCE). Our model directly estimates the
predictive conditional density of the future observation given the current one,
while introducing the bottleneck between the current and future observations.
Although the bottleneck provides a natural embedding candidate for control, our
RCE model introduces additional specific structures in the generative graphical
model so that the model dynamics can be robustly linearized. We also propose a
principled variational approximation of the embedding posterior that takes the
future observation into account, and thus, makes the variational approximation
more robust against the noise. Experimental results show that RCE outperforms
the E2C model, and does so significantly when the underlying dynamics is noisy.
Comment: 13 pages.
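The structural choice the abstract highlights, that the variational posterior over the embedding should also condition on the future observation, can be sketched as an amortized q(z_t | x_t, x_{t+1}). Everything below (flattened image inputs, layer sizes) is illustrative rather than the RCE model itself:

```python
# Future-aware amortized posterior: q(z_t | x_t, x_{t+1}).
import torch
import torch.nn as nn

class FutureAwarePosterior(nn.Module):
    """Encodes a flattened image pair into a latent sample."""
    def __init__(self, x_dim=64 * 64, z_dim=3, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim))   # -> (mu, log_var)

    def forward(self, x_t, x_next):
        # Conditioning on x_{t+1} makes the posterior more robust to
        # noise in the dynamics than q(z_t | x_t) alone.
        mu, log_var = self.net(torch.cat([x_t, x_next], -1)).chunk(2, -1)
        return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
```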
Learning Plannable Representations with Causal InfoGAN
In recent years, deep generative models have been shown to 'imagine'
convincing high-dimensional observations such as images, audio, and even video,
learning directly from raw data. In this work, we ask how to imagine
goal-directed visual plans -- a plausible sequence of observations that
transition a dynamical system from its current configuration to a desired goal
state, which can later be used as a reference trajectory for control. We focus
on systems with high-dimensional observations, such as images, and propose an
approach that naturally combines representation learning and planning. Our
framework learns a generative model of sequential observations, where the
generative process is induced by a transition in a low-dimensional planning
model, and an additional noise. By maximizing the mutual information between
the generated observations and the transition in the planning model, we obtain
a low-dimensional representation that best explains the causal nature of the
data. We structure the planning model to be compatible with efficient planning
algorithms, and we propose several such models based on either discrete or
continuous states. Finally, to generate a visual plan, we project the current
and goal observations onto their respective states in the planning model, plan
a trajectory, and then use the generative model to transform the trajectory to
a sequence of observations. We demonstrate our method on imagining plausible
visual plans of rope manipulation.
Comment: ICML / IJCAI / AAMAS 2018 Workshop on Planning and Learning (PAL-18).
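The mutual-information objective follows the InfoGAN recipe: a generator maps a planning-model transition (s, s') plus noise to an observation pair, and an auxiliary network Q recovers the transition from the generated pair, so that maximizing Q's log-likelihood lower-bounds the mutual information. A hedged sketch with illustrative sizes and a Gaussian (MSE) surrogate for Q:

```python
# InfoGAN-style mutual-information term over planning transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

s_dim, noise_dim, obs_dim = 4, 8, 256   # illustrative sizes

G = nn.Sequential(nn.Linear(2 * s_dim + noise_dim, 128), nn.ReLU(),
                  nn.Linear(128, 2 * obs_dim))          # -> (o, o')
Q = nn.Sequential(nn.Linear(2 * obs_dim, 128), nn.ReLU(),
                  nn.Linear(128, 2 * s_dim))            # recover (s, s')

s = torch.randn(32, s_dim)                 # current planning state
s_next = s + 0.1 * torch.randn(32, s_dim)  # local transition
noise = torch.randn(32, noise_dim)
obs_pair = G(torch.cat([s, s_next, noise], -1))

# MI lower bound (up to a constant): log Q(s, s' | generated pair),
# here with a fixed-variance Gaussian Q, i.e. an MSE surrogate to minimize.
mi_loss = F.mse_loss(Q(obs_pair), torch.cat([s, s_next], -1))
```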
Adaptive Path-Integral Autoencoder: Representation Learning and Planning for Dynamical Systems
We present a representation learning algorithm that learns a low-dimensional
latent dynamical system from high-dimensional sequential raw data,
e.g., video. The framework builds upon recent advances in amortized inference
methods that use both an inference network and a refinement procedure to output
samples from a variational distribution given an observation sequence, and
takes advantage of the duality between control and inference to approximately
solve the intractable inference problem using the path integral control
approach. The learned dynamical model can be used to predict and plan the
future states; we also present the efficient planning method that exploits the
learned low-dimensional latent dynamics. Numerical experiments show that the
proposed path-integral control based variational inference method leads to
tighter lower bounds in statistical model learning of sequential data. The
supplementary video is available at https://youtu.be/xCp35crUoLQ.
Comment: Neural Information Processing Systems (NeurIPS) 2018.
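The path-integral control step that drives the refinement can be summarized as: sample perturbed control sequences, weight the resulting trajectories by their exponentiated negative path cost, and move the nominal controls toward the weighted average. A simplified sketch, with `rollout_cost` as an assumed stand-in for evaluating a control sequence under the learned latent model:

```python
# One path-integral refinement step over a nominal control sequence.
import torch

def path_integral_update(u_nom, rollout_cost, samples=64, sigma=0.5,
                         temperature=1.0):
    """u_nom: (horizon, u_dim) nominal controls; rollout_cost maps a
    control sequence to a scalar path cost under the learned model."""
    eps = sigma * torch.randn(samples, *u_nom.shape)   # exploration noise
    costs = torch.stack([rollout_cost(u_nom + e) for e in eps])
    weights = torch.softmax(-costs / temperature, dim=0)
    # Weighted average of the perturbations (the path-integral update).
    return u_nom + (weights.view(-1, 1, 1) * eps).sum(0)
```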
VMAV-C: A Deep Attention-based Reinforcement Learning Algorithm for Model-based Control
Recent breakthroughs in Go play and strategic games have witnessed the great
potential of reinforcement learning in intelligently scheduling in uncertain
environment, but some bottlenecks are also encountered when we generalize this
paradigm to universal complex tasks. Among them, the low efficiency of data
utilization in model-free reinforcement algorithms is of great concern. In
contrast, the model-based reinforcement learning algorithms can reveal
underlying dynamics in learning environments and seldom suffer the data
utilization problem. To address the problem, a model-based reinforcement
learning algorithm with attention mechanism embedded is proposed as an
extension of World Models in this paper. We learn the environment model through
Mixture Density Network Recurrent Network(MDN-RNN) for agents to interact, with
combinations of variational auto-encoder(VAE) and attention incorporated in
state value estimates during the process of learning policy. In this way, agent
can learn optimal policies through less interactions with actual environment,
and final experiments demonstrate the effectiveness of our model in control
problem
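The World Models backbone the paper extends learns the environment through an MDN-RNN: an LSTM consumes (z_t, a_t) and emits a Gaussian mixture over the next latent z_{t+1}. A rough sketch of that component follows; the attention over VAE features that VMAV-C adds is omitted, and all sizes are illustrative.

```python
# MDN-RNN: recurrent mixture-density model over latent transitions.
import torch
import torch.nn as nn

class MDNRNN(nn.Module):
    def __init__(self, z_dim=32, a_dim=3, hidden=256, n_mix=5):
        super().__init__()
        self.n_mix, self.z_dim = n_mix, z_dim
        self.rnn = nn.LSTM(z_dim + a_dim, hidden, batch_first=True)
        # Per mixture component: weight logit, mean, log-std per z dim.
        self.head = nn.Linear(hidden, n_mix * (1 + 2 * z_dim))

    def forward(self, z, a):
        # z: (B, T, z_dim), a: (B, T, a_dim)
        h, _ = self.rnn(torch.cat([z, a], -1))     # (B, T, hidden)
        out = self.head(h).view(*h.shape[:2], self.n_mix,
                                1 + 2 * self.z_dim)
        logit_pi = out[..., 0]                      # mixture weights
        mu = out[..., 1:1 + self.z_dim]             # component means
        log_sigma = out[..., 1 + self.z_dim:]       # component log-stds
        return logit_pi, mu, log_sigma
```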
Entity Abstraction in Visual Model-Based Reinforcement Learning
This paper tests the hypothesis that modeling a scene in terms of entities
and their local interactions, as opposed to modeling the scene globally,
provides a significant benefit in generalizing to physical tasks in a
combinatorial space the learner has not encountered before. We present
object-centric perception, prediction, and planning (OP3), which to the best of
our knowledge is the first fully probabilistic entity-centric dynamic latent
variable framework for model-based reinforcement learning that acquires entity
representations from raw visual observations without supervision and uses them
to predict and plan. OP3 enforces entity-abstraction -- symmetric processing of
each entity representation with the same locally-scoped function -- which
enables it to scale to model different numbers and configurations of objects
from those in training. Our approach to solving the key technical challenge of
grounding these entity representations to actual objects in the environment is
to frame this variable binding problem as an inference problem, and we develop
an interactive inference algorithm that uses temporal continuity and
interactive feedback to bind information about object properties to the entity
variables. On block-stacking tasks, OP3 generalizes to novel block
configurations and more objects than observed during training, outperforming an
oracle model that assumes access to object supervision and achieving two to
three times better accuracy than a state-of-the-art video prediction model that
does not exhibit entity abstraction.
Comment: Accepted at CoRL 2019.
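Entity abstraction, symmetric processing of every entity representation with the same locally scoped function, can be illustrated with a shared transition module that aggregates pairwise effects across slots, so the parameter count is independent of the number of entities. This is a schematic sketch, not the OP3 architecture:

```python
# Shared per-entity transition with pairwise interaction effects.
import torch
import torch.nn as nn

class EntityTransition(nn.Module):
    def __init__(self, e_dim=16, a_dim=4, hidden=64):
        super().__init__()
        self.pairwise = nn.Sequential(
            nn.Linear(2 * e_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, e_dim))
        self.update = nn.Sequential(
            nn.Linear(2 * e_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, e_dim))

    def forward(self, entities, action):
        # entities: (B, K, e_dim); the same weights process every slot,
        # so K can differ between training and test time.
        B, K, D = entities.shape
        src = entities.unsqueeze(2).expand(B, K, K, D)
        dst = entities.unsqueeze(1).expand(B, K, K, D)
        effects = self.pairwise(torch.cat([src, dst], -1)).sum(2)  # (B,K,D)
        act = action.unsqueeze(1).expand(B, K, -1)
        return self.update(torch.cat([entities, effects, act], -1))
```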
Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning
Temporal observations such as videos contain essential information about the
dynamics of the underlying scene, but they are often interleaved with
inessential, predictable details. One way of dealing with this problem is by
focusing on the most informative moments in a sequence. We propose a model that
learns to discover these important events and the times when they occur and
uses them to represent the full sequence. We do so using a hierarchical
Keyframe-Inpainter (KeyIn) model that first generates a video's keyframes and
then inpaints the rest by generating the frames at the intervening times. We
propose a fully differentiable formulation to efficiently learn this procedure.
We show that KeyIn finds informative keyframes in several datasets with
different dynamics and visual properties. KeyIn outperforms other recent
hierarchical predictive models for planning. For more details, please see the
project website at https://sites.google.com/view/keyin.
Comment: Conference on Learning for Dynamics and Control, 2020. Website: https://sites.google.com/view/keyin/hom
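The keyframe-then-inpaint decomposition can be read as two stages: propose keyframes, then fill each gap conditioned on the two surrounding keyframes and a relative time in [0, 1]. Below is a sketch of the inpainting stage under that reading; the module internals are placeholders, not the paper's design.

```python
# Inpainting between two keyframes, conditioned on relative time.
import torch
import torch.nn as nn

class Inpainter(nn.Module):
    """Generates the frame at relative time t between two keyframes."""
    def __init__(self, frame_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, frame_dim))

    def forward(self, key_a, key_b, t):
        # t in [0, 1]: position of the target frame between the keyframes.
        return self.net(torch.cat([key_a, key_b, t], -1))

inpaint = Inpainter()
key_a, key_b = torch.randn(1, 128), torch.randn(1, 128)
# Fill three intermediate frames between the two keyframes.
frames = [inpaint(key_a, key_b, torch.tensor([[s]]))
          for s in (0.25, 0.5, 0.75)]
```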