Interpretable Latent Spaces for Learning from Demonstration
Effective human-robot interaction, such as in robot learning from human
demonstration, requires the learning agent to be able to ground abstract
concepts (such as those contained within instructions) in a corresponding
high-dimensional sensory input stream from the world. Models such as deep
neural networks, with high capacity through their large parameter spaces, can
be used to compress the high-dimensional sensory data to lower dimensional
representations. These low-dimensional representations facilitate symbol
grounding, but do not guarantee that the resulting representation is
human-interpretable. We propose a method which utilises the grouping of
user-defined symbols and their corresponding sensory observations in order to
align the learnt compressed latent representation with the semantic notions
contained in the abstract labels. We demonstrate this through experiments with
both simulated and real-world object data, showing that such alignment can be
achieved in a process of physical symbol grounding.
Comment: 12 pages, 6 figures, accepted at the Conference on Robot Learning
(CoRL) 2018, Zurich, Switzerland.
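
A minimal sketch of the underlying idea, pulling latents that share a user-defined symbol towards each other (PyTorch; the encoder architecture and the name symbol_alignment_loss are illustrative assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compress high-dimensional observations into a low-dimensional latent."""
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def symbol_alignment_loss(z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Pull latents that share a user-defined symbol towards their group centroid."""
    loss = z.new_zeros(())
    for label in labels.unique():
        group = z[labels == label]
        centroid = group.mean(dim=0, keepdim=True)
        loss = loss + ((group - centroid) ** 2).sum(dim=1).mean()
    return loss / labels.unique().numel()
```

In practice such a term would be combined with the usual reconstruction (and, for a VAE, KL) objectives, so the latent space both compresses the observations and stays organised around the abstract labels.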
Channel-Recurrent Autoencoding for Image Modeling
Despite recent successes in synthesizing faces and bedrooms, existing
generative models struggle to capture more complex image types, potentially due
to the oversimplification of their latent space constructions. To tackle this
issue, building on Variational Autoencoders (VAEs), we integrate recurrent
connections across channels into both the inference and generation steps, allowing
high-level features to be captured in a global-to-local, coarse-to-fine
manner. Combined with an adversarial loss, our channel-recurrent VAE-GAN
(crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of high
resolution images while maintaining the same level of computational efficiency.
Our model produces interpretable and expressive latent representations to
benefit downstream tasks such as image completion. Moreover, we propose two
novel regularizations, namely the KL objective weighting scheme over time steps
and mutual information maximization between transformed latent variables and
the outputs, to enhance the training.
Comment: Code: https://github.com/WendyShang/crVAE. Supplementary Materials:
http://www-personal.umich.edu/~shangw/wacv18_supplementary_material.pd
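
A minimal sketch of a channel-recurrent inference step with a per-step KL weighting (PyTorch; the block size, GRU hidden size, and linear weight schedule are illustrative assumptions, not the authors' settings):

```python
import torch
import torch.nn as nn

class ChannelRecurrentLatent(nn.Module):
    """The encoder feature vector is split into channel blocks and an RNN over
    the blocks emits one latent variable per step (coarse-to-fine)."""
    def __init__(self, channels: int, block: int, latent_dim: int):
        super().__init__()
        assert channels % block == 0
        self.block = block
        self.rnn = nn.GRU(input_size=block, hidden_size=128, batch_first=True)
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, feat: torch.Tensor):
        # feat: (N, C) pooled features, reshaped into (N, C // block, block) steps.
        n, c = feat.shape
        steps = feat.view(n, c // self.block, self.block)
        h, _ = self.rnn(steps)                       # (N, T, 128)
        mu, logvar = self.mu(h), self.logvar(h)      # one latent per channel block
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # Per-step KL against a unit Gaussian, weighted over time steps
        # (an assumed schedule standing in for the proposed weighting scheme).
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)    # (N, T)
        weights = torch.linspace(1.0, 0.5, kl.shape[1], device=kl.device)
        return z, (kl * weights).sum(dim=1).mean()
```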
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
Reinforcement learning has shown promise in learning policies that can solve
complex problems. However, manually specifying a good reward function can be
difficult, especially for intricate tasks. Inverse reinforcement learning
offers a useful paradigm to learn the underlying reward function directly from
expert demonstrations. Yet in reality, the corpus of demonstrations may contain
trajectories arising from a diverse set of underlying reward functions rather
than a single one. Thus, in inverse reinforcement learning, it is useful to
consider such a decomposition. The options framework in reinforcement learning
is specifically designed to decompose policies along similar lines. We therefore
extend the options framework and propose a method to simultaneously recover
reward options in addition to policy options. We leverage adversarial methods
to learn joint reward-policy options using only observed expert states. We show
that this approach works well in both simple and complex continuous control
tasks and shows significant performance increases in one-shot transfer
learning.
Comment: Accepted to the Thirty-Second AAAI Conference on Artificial
Intelligence (AAAI), 2018.
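
A minimal sketch of the joint reward-policy decomposition as a gated mixture over options (PyTorch; the network shapes and the name JointRewardPolicyOptions are illustrative, and the adversarial IRL training loop is omitted):

```python
import torch
import torch.nn as nn

class JointRewardPolicyOptions(nn.Module):
    """A gate over options selects among option-specific policy heads and
    reward heads, so rewards and policies are decomposed together."""
    def __init__(self, state_dim: int, action_dim: int, n_options: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                  nn.Linear(64, n_options))
        self.policies = nn.ModuleList(
            nn.Linear(state_dim, action_dim) for _ in range(n_options))
        self.rewards = nn.ModuleList(
            nn.Linear(state_dim, 1) for _ in range(n_options))

    def forward(self, state: torch.Tensor):
        w = torch.softmax(self.gate(state), dim=-1)                       # option weights
        actions = torch.stack([p(state) for p in self.policies], dim=1)   # (N, K, A)
        rewards = torch.stack([r(state) for r in self.rewards], dim=1)    # (N, K, 1)
        action = (w.unsqueeze(-1) * actions).sum(dim=1)                   # mixture action
        reward = (w.unsqueeze(-1) * rewards).sum(dim=1)                   # mixture reward
        return action, reward, w
```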
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
jointly. Extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 2017.
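
A minimal sketch of a single policy conditioned on a latent skill code, plus a head that predicts the code back from state-action pairs (PyTorch; the one-hot skill code and the class and method names are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class IntentionConditionedPolicy(nn.Module):
    """One multi-modal policy conditioned on a latent 'intention' (skill) code;
    a separate head recovers the code from (state, action), which is the kind
    of signal that lets unlabelled demonstrations be segmented into skills."""
    def __init__(self, state_dim: int, action_dim: int, n_skills: int):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim + n_skills, 128), nn.ReLU(),
            nn.Linear(128, action_dim))
        self.skill_head = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, n_skills))

    def act(self, state: torch.Tensor, skill_onehot: torch.Tensor) -> torch.Tensor:
        return self.policy(torch.cat([state, skill_onehot], dim=-1))

    def infer_skill_logits(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.skill_head(torch.cat([state, action], dim=-1))
```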
NewtonianVAE: Proportional Control and Goal Identification from Pixels via Physical Latent Spaces
Learning low-dimensional latent state space dynamics models has been a
powerful paradigm for enabling vision-based planning and learning for control.
We introduce a latent dynamics learning framework that is uniquely designed to
induce proportional controllability in the latent space, thus enabling the use
of much simpler controllers than prior work. We show that our learned dynamics
model enables proportional control from pixels, dramatically simplifies and
accelerates behavioural cloning of vision-based controllers, and provides
interpretable goal discovery when applied to imitation learning of switching
controllers from demonstration.
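
A minimal sketch of what proportional control in such a latent space amounts to (the gain and function name are illustrative; the learned encoder that produces z_t and z_goal is assumed):

```python
import torch

def proportional_control(z_t: torch.Tensor, z_goal: torch.Tensor,
                         gain: float = 1.0) -> torch.Tensor:
    """With a latent space trained to be proportionally controllable, a plain
    P-controller in latent coordinates suffices: drive z_t towards z_goal."""
    return gain * (z_goal - z_t)
```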
DeepVoxels: Learning Persistent 3D Feature Embeddings
In this work, we address the lack of 3D understanding of generative neural
networks by introducing a persistent 3D feature embedding for view synthesis.
To this end, we propose DeepVoxels, a learned representation that encodes the
view-dependent appearance of a 3D scene without having to explicitly model its
geometry. At its core, our approach is based on a Cartesian 3D grid of
persistent embedded features that learn to make use of the underlying 3D scene
structure. Our approach combines insights from 3D geometric computer vision
with recent advances in learning image-to-image mappings based on adversarial
loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction
of the scene, using a 2D re-rendering loss and enforces perspective and
multi-view geometry in a principled manner. We apply our persistent 3D scene
representation to the problem of novel view synthesis, demonstrating
high-quality results for a variety of challenging scenes.
Comment: Video: https://www.youtube.com/watch?v=HM_WsZhoGXw Supplemental
material:
https://drive.google.com/file/d/1BnZRyNcVUty6-LxAstN83H79ktUq8Cjp/view?usp=sharing
Code: https://github.com/vsitzmann/deepvoxels Project page:
https://vsitzmann.github.io/deepvoxels
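
A minimal sketch of a persistent Cartesian grid of learned features queried by trilinear interpolation (PyTorch; the resolution, feature width, and class name are illustrative, and the projection, rendering network, and 2D re-rendering loss of the full pipeline are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentFeatureVolume(nn.Module):
    """A learnable 3D grid of embedded features; features at arbitrary 3D query
    points are obtained by trilinear sampling of the grid."""
    def __init__(self, feat_dim: int = 16, resolution: int = 32):
        super().__init__()
        self.volume = nn.Parameter(
            torch.zeros(1, feat_dim, resolution, resolution, resolution))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) in normalised scene coordinates in [-1, 1].
        grid = points.reshape(1, -1, 1, 1, 3)                    # (1, N, 1, 1, 3)
        feats = F.grid_sample(self.volume, grid, align_corners=True)
        return feats.view(self.volume.shape[1], -1).t()          # (N, feat_dim)
```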