Search CORE

1,235 research outputs found

Time-Contrastive Networks: Self-Supervised Learning from Video

Author: Chebotar Yevgen
Hsu Jasmine
Jang Eric
Levine Sergey
Lynch Corey
Schaal Stefan
Sermanet Pierre
Publication venue
Publication date: 19/03/2018
Field of study

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat

arXiv.org e-Print Archive

Crossref

Sim2Real View Invariant Visual Servoing by Recurrent Control

Author: Jang Eric
Levine Sergey
Sadeghi Fereshteh
Toshev Alexander
Publication venue
Publication date: 20/12/2017
Field of study

Humans are remarkably proficient at controlling their limbs and tools from a wide range of viewpoints and angles, even in the presence of optical distortions. In robotics, this ability is referred to as visual servoing: moving a tool or end-point to a desired location using primarily visual feedback. In this paper, we study how viewpoint-invariant visual servoing skills can be learned automatically in a robotic manipulation scenario. To this end, we train a deep recurrent controller that can automatically determine which actions move the end-point of a robotic arm to a desired object. The problem that must be solved by this controller is fundamentally ambiguous: under severe variation in viewpoint, it may be impossible to determine the actions in a single feedforward operation. Instead, our visual servoing system must use its memory of past movements to understand how the actions affect the robot motion from the current viewpoint, correcting mistakes and gradually moving closer to the target. This ability is in stark contrast to most visual servoing methods, which either assume known dynamics or require a calibration phase. We show how we can learn this recurrent controller using simulated data and a reinforcement learning objective. We then describe how the resulting model can be transferred to a real-world robot by disentangling perception from control and only adapting the visual layers. The adapted model can servo to previously unseen objects from novel viewpoints on a real-world Kuka IIWA robotic arm. For supplementary videos, see: https://fsadeghi.github.io/Sim2RealViewInvariantServoComment: Supplementary video: https://fsadeghi.github.io/Sim2RealViewInvariantServ

arXiv.org e-Print Archive

Crossref

Pseudo-Dolly-In Video Generation Combining 3D Modeling and Image Reconstruction

Author: Kameda Yoshinari
Kitahara Itaru
Shishido Hidehiko
Yamanaka Kazuki
亀田能成
北原格
宍戸英彦
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2017
Field of study

This paper proposes a pseudo-dolly-in video generation method that reproduces motion parallax by applying image reconstruction processing to multi-view videos. Since dolly-in video is taken by moving a camera forward to reproduce motion parallax, we can present a sense of immersion. However, at a sporting event in a large-scale space, moving a camera is difficult. Our research generates dolly-in video from multi-view images captured by fixed cameras. By applying the Image-Based Modeling technique, dolly-in video can be generated. Unfortunately, the video quality is often damaged by the 3D estimation error. On the other hand, Bullet-Time realizes high-quality video observation. However, moving the virtual-viewpoint from the capturing positions is difficult. To solve these problems, we propose a method to generate a pseudo-dolly-in image by installing 3D estimation and image reconstruction techniques into Bullet-Time and show its effectiveness by applying it to multi-view videos captured at an actual soccer stadium. In the experiment, we compared the proposed method with digital zoom images and with the dolly-in video generated from the Image-Based Modeling and Rendering method.Published in: 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct) Date of Conference: 9-13 Oct. 2017 Conference Location: Nantes, Franc

Tsukuba Repository

Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation

Author: Bai Yunfei
Fang Kuan
Hinterstoisser Stefan
Kalakrishnan Mrinal
Savarese Silvio
Publication venue
Publication date: 03/03/2018
Field of study

Learning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels. In this paper, we present a multi-task domain adaptation framework for instance grasping in cluttered scenes by utilizing simulated robot experiments. Our neural network takes monocular RGB images and the instance segmentation mask of a specified target object as inputs, and predicts the probability of successfully grasping the specified object for each candidate motor command. The proposed transfer learning framework trains a model for instance grasping in simulation and uses a domain-adversarial loss to transfer the trained model to real robots using indiscriminate grasping data, which is available both in simulation and the real world. We evaluate our model in real-world robot experiments, comparing it with alternative model architectures as well as an indiscriminate grasping baseline.Comment: ICRA 201

arXiv.org e-Print Archive

Crossref

Xth Person View Video for Observation from Diverse Perspectives

Author: Kameda Yoshinari
Kitahara Itaru
Shimura Naoki
Shishido Hidehiko
Suzuki Kenji
亀田能成
北原格
宍戸英彦
鈴木健嗣
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2018
Field of study

Tsukuba Repository