MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation
Acquiring labeled 6D poses from real images is an expensive and
time-consuming task. Though massive amounts of synthetic RGB images are easy to
obtain, the models trained on them suffer from noticeable performance
degradation due to the synthetic-to-real domain gap. To mitigate this
degradation, we propose a practical self-supervised domain adaptation approach
that takes advantage of real RGB(-D) data without needing real pose labels. We
first pre-train the model with synthetic RGB images and then utilize real
RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is
self-supervised by the RGB-based pose-aware consistency and the depth-guided
object distance pseudo-label, which does not require the time-consuming online
differentiable rendering. We build our domain adaptation method based on the
recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We
experimentally demonstrate that our method achieves comparable performance
against its fully-supervised counterpart while outperforming existing
state-of-the-art approaches.
Comment: SCIA202
Adapting RGB pose estimation to new domains
2019 Spring. Includes bibliographical references.
Many multi-modal human-computer interaction (HCI) systems interact with users in real time by estimating the user's pose. Generally, they estimate human poses using depth sensors such as the Microsoft Kinect. For multi-modal HCI interfaces to gain traction in the real world, however, it would be better for pose estimation to be based on data from RGB cameras, which are more common and less expensive than depth sensors. This has motivated research into pose estimation from RGB images. Convolutional Neural Networks (CNNs) represent the state of the art in this literature, for example [1–5] and [6]. These systems estimate 2D human poses from RGB images. A problem with current CNN-based pose estimators is that they require large amounts of labeled data for training. If the goal is to train an RGB pose estimator for a new domain, the cost of collecting and, more importantly, labeling data can be prohibitive. A common solution is to train on publicly available pose data sets, but then the trained system is not tailored to the domain. We propose using RGB+D sensors to collect domain-specific data in the lab, and then training the RGB pose estimator using skeletons automatically extracted from the RGB+D data. This paper presents a case study of adapting the RMPE pose estimation network [4] to the domain of the DARPA Communicating with Computers (CWC) program [7], as represented by the EGGNOG data set [8]. We chose RMPE because it predicts both joint locations and Part Affinity Fields (PAFs) in real time. Our adaptation of RMPE trained on automatically labeled data outperforms the original RMPE on the EGGNOG data set.
Zero-Shot Deep Domain Adaptation
Domain adaptation is an important tool to transfer knowledge about a task
(e.g. classification) learned in a source domain to a second, or target domain.
Current approaches assume that task-relevant target-domain data is available
during training. We demonstrate how to perform domain adaptation when no such
task-relevant target-domain data is available. To tackle this issue, we propose
zero-shot deep domain adaptation (ZDDA), which uses privileged information from
task-irrelevant dual-domain pairs. ZDDA learns a source-domain representation
which is not only tailored for the task of interest but also close to the
target-domain representation. Therefore, the source-domain task of interest
solution (e.g. a classifier for classification tasks) which is jointly trained
with the source-domain representation can be applicable to both the source and
target representations. Using the MNIST, Fashion-MNIST, NIST, EMNIST, and SUN
RGB-D datasets, we show that ZDDA can perform domain adaptation in
classification tasks without access to task-relevant target-domain training
data. We also extend ZDDA to perform sensor fusion in the SUN RGB-D scene
classification task by simulating task-relevant target-domain representations
with task-relevant source-domain data. To the best of our knowledge, ZDDA is
the first domain adaptation and sensor fusion method which requires no
task-relevant target-domain data. The underlying principle is not particular to
computer vision data, but should be extensible to other domains.
Comment: This paper is accepted to the European Conference on Computer Vision
(ECCV), 201
Cross Modal Distillation for Supervision Transfer
In this work we propose a technique that transfers supervision between images
from different modalities. We use learned representations from a large labeled
modality as a supervisory signal for training representations for a new
unlabeled paired modality. Our method enables learning of rich representations
for unlabeled modalities and can be used as a pre-training procedure for new
modalities with limited labeled data. We show experimental results where we
transfer supervision from labeled RGB images to unlabeled depth and optical
flow images and demonstrate large improvements for both these cross modal
supervision transfers. Code, data and pre-trained models are available at
https://github.com/s-gupta/fast-rcnn/tree/distillation
Comment: Updated version (v2) contains additional experiments and result
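The supervision-transfer idea in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the frozen `teacher_features` function, the linear student, the squared-error feature-matching loss, and the three-dimensional "images" are all assumptions made for illustration. A student that sees only the unlabeled modality (here called depth) is trained to reproduce the features a frozen teacher computes on the paired labeled-modality (RGB) input.

```python
import random

def teacher_features(rgb):
    # Hypothetical frozen teacher: stands in for a network pre-trained
    # on the large labeled modality (RGB). It is never updated here.
    return [0.5 * rgb[0] - rgb[1], rgb[1] + 2.0 * rgb[2]]

def student_features(weights, depth):
    # Linear student operating on the unlabeled modality (depth).
    return [sum(w * d for w, d in zip(row, depth)) for row in weights]

def distillation_loss(student_feat, teacher_feat):
    # Mean squared error between student and teacher features.
    n = len(teacher_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n

def train_student(pairs, in_dim, out_dim, lr=0.1, epochs=200, seed=0):
    # pairs: list of (depth_input, teacher_feature) tuples.
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    for _ in range(epochs):
        for depth, target in pairs:
            pred = student_features(weights, depth)
            for j in range(out_dim):
                # Gradient of the mean squared error w.r.t. output j.
                g = 2.0 * (pred[j] - target[j]) / out_dim
                for i in range(in_dim):
                    weights[j][i] -= lr * g * depth[i]
    return weights

# Toy paired data: each depth "image" is the RGB vector reversed.
rgb_inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
pairs = [(list(reversed(rgb)), teacher_features(rgb)) for rgb in rgb_inputs]
student_weights = train_student(pairs, in_dim=3, out_dim=2)
```

No depth labels are used anywhere: the only training signal is the teacher's output on the paired RGB input, which is the sense in which supervision is transferred across modalities.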
Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation
Learning-based approaches to robotic manipulation are limited by the
scalability of data collection and accessibility of labels. In this paper, we
present a multi-task domain adaptation framework for instance grasping in
cluttered scenes by utilizing simulated robot experiments. Our neural network
takes monocular RGB images and the instance segmentation mask of a specified
target object as inputs, and predicts the probability of successfully grasping
the specified object for each candidate motor command. The proposed transfer
learning framework trains a model for instance grasping in simulation and uses
a domain-adversarial loss to transfer the trained model to real robots using
indiscriminate grasping data, which is available both in simulation and the
real world. We evaluate our model in real-world robot experiments, comparing it
with alternative model architectures as well as an indiscriminate grasping
baseline.
Comment: ICRA 201
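The abstract does not spell out its domain-adversarial loss, so the following is only a generic sketch of how such a loss is commonly set up, not the paper's formulation. The function names and the weighting factor `lam` are hypothetical. A domain classifier is trained to tell simulated features from real ones, while the feature extractor is trained on the task loss minus a weighted domain loss, which pushes it toward features the classifier cannot separate.

```python
import math

def domain_bce(p_real, is_real):
    # Binary cross-entropy for a domain classifier that outputs the
    # probability p_real that a feature came from the real domain.
    eps = 1e-12
    p = min(max(p_real, eps), 1.0 - eps)
    return -math.log(p) if is_real else -math.log(1.0 - p)

def feature_objective(task_loss, domain_loss, lam=0.1):
    # Objective minimized by the feature extractor: it lowers the task
    # loss while raising the domain classifier's loss (note the minus
    # sign). lam is an assumed trade-off weight.
    return task_loss - lam * domain_loss

# A maximally confused domain classifier (p = 0.5) has loss log(2).
confused = domain_bce(0.5, is_real=True)
objective = feature_objective(task_loss=1.0, domain_loss=confused, lam=0.1)
```

The domain classifier itself would minimize `domain_bce` directly; the opposing signs on the two objectives are what make the setup adversarial.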