Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
learning jointly. The extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 201
Multi-View Frame Reconstruction with Conditional GAN
Multi-view frame reconstruction is an important problem, particularly when
multiple frames are missing and the past and future frames within the camera
are far apart from the missing ones. Realistic coherent frames can still be
reconstructed using corresponding frames from other overlapping cameras. We
propose an adversarial approach to learn the spatio-temporal representation of
the missing frame using a conditional Generative Adversarial Network (cGAN). The
conditional input to each cGAN is the preceding or following frames within the
camera or the corresponding frames in other overlapping cameras, all of which
are merged together using a weighted average. Representations learned from
frames within the camera are given more weight compared to the ones learned
from other cameras when they are close to the missing frames and vice versa.
Experiments on two challenging datasets demonstrate that our framework produces
results comparable to the state-of-the-art reconstruction method in a single
camera and achieves promising performance in the multi-camera scenario.
Comment: 5 pages, 4 figures, 3 tables, Accepted at IEEE Global Conference on
Signal and Information Processing, 201
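The weighted merge described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weighting function, so the exponential decay, the `tau` parameter, and the function names below are all assumptions made for illustration, not the authors' method. The sketch only shows the stated behavior: the within-camera representation gets more weight when the available frames are close to the missing one, and the other-camera representation gets more weight otherwise.

```python
import math


def merge_weights(gap_within, tau=2.0):
    """Return (w_within, w_other) for a hypothetical weighted average.

    gap_within: temporal distance (in frames) from the missing frame to the
        nearest available frame in the same camera.
    tau: assumed decay constant; not taken from the paper.
    """
    # Assumed form: the within-camera weight decays as the gap grows.
    w_within = math.exp(-gap_within / tau)
    # The two weights sum to one, so the result is a weighted average.
    return w_within, 1.0 - w_within


def merge_representations(rep_within, rep_other, gap_within, tau=2.0):
    """Merge two equal-length feature vectors with the weights above."""
    w_in, w_out = merge_weights(gap_within, tau)
    return [w_in * a + w_out * b for a, b in zip(rep_within, rep_other)]
```

With these assumed settings, a gap of one frame favors the within-camera representation, while a gap of ten frames shifts nearly all of the weight to the overlapping cameras.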
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
The recent advances in deep learning have made it possible to generate
photo-realistic images by using neural networks and even to extrapolate video
frames from an input video clip. In this paper, for the sake of both furthering
this exploration and our own interest in a realistic application, we study
image-to-video translation and particularly focus on the videos of facial
expressions. This problem challenges deep neural networks with an additional
temporal dimension compared to image-to-image translation. Moreover, its
single input image defeats most existing video generation methods that rely on
recurrent models. We propose a user-controllable approach so as to generate
video clips of various lengths from a single face image. The lengths and types
of the expressions are controlled by users. To this end, we design a novel
neural network architecture that can incorporate the user input into its skip
connections and propose several improvements to the adversarial training method
for the neural network. Experiments and user studies verify the effectiveness
of our approach. In particular, we highlight that even for face
images in the wild (downloaded from the Web and the authors' own photos), our
model can generate high-quality facial expression videos, of which about 50%
are labeled as real by Amazon Mechanical Turk workers.
Comment: 10 page