Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes
The success of deep learning in computer vision is based on the availability of
large annotated datasets. To lower the need for hand-labeled images, virtually
rendered 3D worlds have recently gained popularity. Creating realistic 3D
content is challenging on its own and requires significant human effort. In
this work, we propose an alternative paradigm which combines real and synthetic
data for learning semantic instance segmentation and object detection models.
Exploiting the fact that not all aspects of the scene are equally important for
this task, we propose to augment real-world imagery with virtual objects of the
target category. Capturing real-world images at large scale is easy and cheap,
and directly provides real background appearances without the need for creating
complex 3D models of the environment. We present an efficient procedure to
augment real images with virtual objects. This allows us to create realistic
composite images which exhibit both realistic background appearance and a large
number of complex object arrangements. In contrast to modeling complete 3D
environments, our augmentation approach requires only a few user interactions
in combination with 3D shapes of the target object. Through extensive
experimentation, we determine the set of parameters that produces augmented
data which maximally enhances the performance of instance segmentation
models. Further, we demonstrate the utility of our approach by training
standard deep models for semantic instance segmentation and object detection of
cars in outdoor driving scenes. We test the models trained on our augmented
data on the KITTI 2015 dataset, which we have annotated with pixel-accurate
ground truth, and on the Cityscapes dataset. Our experiments demonstrate that
models trained on augmented imagery generalize better than those trained on
synthetic data or on a limited amount of annotated real data.
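A minimal sketch of the compositing step this augmentation paradigm rests on: blending a rendered crop of the virtual object into a real background image using the renderer's alpha mask. The function name and the NumPy-based interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def composite_object(background, render, alpha, x, y):
    """Alpha-blend a rendered object crop into a real background image.

    background: HxWx3 uint8 real street image
    render:     hxwx3 uint8 rendering of the virtual object
    alpha:      hxw float mask from the renderer (1.0 = object)
    (x, y):     top-left placement of the crop in the background
    """
    out = background.astype(np.float32)
    h, w = alpha.shape
    a = alpha[..., None]                      # broadcast mask over channels
    roi = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = a * render + (1.0 - a) * roi
    return out.astype(np.uint8)
```

Repeating this with varied object shapes, placements, and poses yields the large number of complex object arrangements the abstract refers to.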
LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Deep neural network (DNN) architectures have been shown to outperform
traditional pipelines for object segmentation and pose estimation using RGBD
data, but the performance of these DNN pipelines is directly tied to how
representative the training data is of the true data. Hence, a key requirement
for employing these methods in practice is to have a large set of labeled data
for your specific robotic manipulation task, a requirement that is not
generally satisfied by existing datasets. In this paper we develop a pipeline
to rapidly generate high quality RGBD data with pixelwise labels and object
poses. We use an RGBD camera to collect video of a scene from multiple
viewpoints and leverage existing reconstruction techniques to produce a dense
3D reconstruction. We label the 3D reconstruction using human-assisted
ICP fitting of object meshes. By reprojecting the results of labeling the 3D
scene we can produce labels for each RGBD image of the scene. This pipeline
enabled us to collect over 1,000,000 labeled object instances in just a few
days. We use this dataset to answer questions related to how much training data
is required, and of what quality the data must be, to achieve high performance
from a DNN architecture.
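As a hedged illustration of the reprojection step described above, the sketch below projects the labeled points of a dense reconstruction into a single camera frame to produce a pixelwise label image. The variable names, pinhole camera model, and simple point-wise z-buffer are assumptions; the paper's pipeline labels meshes fitted via human-assisted ICP.

```python
import numpy as np

def reproject_labels(points_xyz, point_labels, K, R, t, h, w):
    """Project labeled 3D reconstruction points into one camera frame,
    producing a pixelwise label image (0 = background).

    points_xyz:   Nx3 points of the dense reconstruction (world frame)
    point_labels: N   integer object ids from the human-assisted ICP fit
    K:            3x3 camera intrinsics
    R, t:         world-to-camera rotation (3x3) and translation (3,)
    """
    cam = points_xyz @ R.T + t            # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6
    cam, labels = cam[in_front], point_labels[in_front]
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]           # perspective divide
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    label_img = np.zeros((h, w), dtype=np.int32)
    # crude z-buffer: draw far points first so near points overwrite them
    order = np.argsort(-cam[valid, 2])
    label_img[v[valid][order], u[valid][order]] = labels[valid][order]
    return label_img
```

Running this once per camera pose in the video is what lets a single labeled reconstruction fan out into labels for every RGBD frame of the scene.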
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Although the performance of person Re-Identification (ReID) has been
significantly boosted, many challenging issues in real scenarios have not been
fully investigated, e.g., the complex scenes and lighting variations, viewpoint
and pose changes, and the large number of identities in a camera network. To
facilitate the research towards conquering those issues, this paper contributes
a new dataset called MSMT17 with many important features, e.g., 1) the raw
videos are taken by a 15-camera network deployed in both indoor and outdoor
scenes, 2) the videos cover a long period of time and present complex lighting
variations, and 3) it currently contains the largest number of annotated
identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe
that a domain gap commonly exists between datasets, which essentially causes a
severe performance drop when training and testing on different datasets. As a
result, available training data cannot be effectively leveraged for new
testing domains. To reduce the expensive cost of annotating new training
samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to
bridge the domain gap. Comprehensive experiments show that the domain gap can
be substantially narrowed by PTGAN.
Comment: 10 pages, 9 figures; accepted at CVPR 2018
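A brief sketch of the kind of identity-preserving constraint a person-transfer GAN such as PTGAN relies on: the transferred image is penalized for changing the person's foreground, while a CycleGAN-style adversarial term (not shown) adapts background and lighting to the target domain. The PyTorch interface, tensor shapes, and loss weighting below are assumptions, not the paper's exact formulation.

```python
import torch

def identity_preserving_loss(src, transferred, fg_mask):
    """Penalize appearance change on the person's foreground so identity
    survives the domain transfer while style adapts elsewhere.

    src, transferred: Bx3xHxW images in [0, 1]
    fg_mask:          Bx1xHxW soft person-segmentation mask
    """
    return ((transferred - src) * fg_mask).pow(2).mean()

# Hypothetical total objective: adversarial style term plus the foreground
# identity constraint, with an assumed (not published here) weight.
# total_loss = style_loss + 10.0 * identity_preserving_loss(src, fake, mask)
```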