Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes
The success of deep learning in computer vision is based on the availability of
large annotated datasets. To lower the need for hand-labeled images, virtually
rendered 3D worlds have recently gained popularity. Creating realistic 3D
content is challenging on its own and requires significant human effort. In
this work, we propose an alternative paradigm which combines real and synthetic
data for learning semantic instance segmentation and object detection models.
Exploiting the fact that not all aspects of the scene are equally important for
this task, we propose to augment real-world imagery with virtual objects of the
target category. Capturing real-world images at large scale is easy and cheap,
and directly provides real background appearances without the need for creating
complex 3D models of the environment. We present an efficient procedure to
augment real images with virtual objects. This allows us to create realistic
composite images which exhibit both realistic background appearance and a large
number of complex object arrangements. In contrast to modeling complete 3D
environments, our augmentation approach requires only a few user interactions
in combination with 3D shapes of the target object. Through extensive
experimentation, we determine the set of parameters that produces augmented
data which can maximally enhance the performance of instance segmentation
models. Further, we demonstrate the utility of our approach on training
standard deep models for semantic instance segmentation and object detection of
cars in outdoor driving scenes. We test the models trained on our augmented
data on the KITTI 2015 dataset, which we have annotated with pixel-accurate
ground truth, and on the Cityscapes dataset. Our experiments demonstrate that
models trained on augmented imagery generalize better than those trained on
synthetic data or on a limited amount of annotated real data.
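The central augmentation step, blending a rendered 3D object into a captured real photograph via its coverage mask, can be sketched with plain alpha compositing. This is a minimal NumPy illustration of that idea, not the authors' full pipeline; the function name and array shapes are assumptions:

```python
import numpy as np

def composite(background, render, alpha):
    """Alpha-blend a rendered virtual object onto a real background image.

    background: (H, W, 3) float array in [0, 1] -- captured real image
    render:     (H, W, 3) float array in [0, 1] -- rendered virtual object
    alpha:      (H, W) float array in [0, 1]    -- object coverage mask
    Pixels where alpha == 0 keep the real background appearance.
    """
    a = alpha[..., None]  # broadcast the mask over the colour channels
    return a * render + (1.0 - a) * background

# Toy example: paste a solid red "object" onto a grey background.
bg = np.full((4, 4, 3), 0.5)
obj = np.zeros((4, 4, 3)); obj[..., 0] = 1.0   # red object
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0  # object occupies the centre
out = composite(bg, obj, mask)
```

A real system would additionally handle object placement, lighting, and shadows, which is where most of the realism in the composites comes from.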
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation of Road Scenes
Light field cameras can provide rich angular and spatial information to
enhance image semantic segmentation for scene understanding in the field of
autonomous driving. However, the extensive angular information of light field
cameras contains a large amount of redundant data, which is overwhelming for
the limited hardware resource of intelligent vehicles. Besides, inappropriate
compression leads to information corruption and data loss. To excavate
representative information, we propose an Omni-Aperture Fusion model (OAFuser),
which leverages dense context from the central view and discovers the angular
information from sub-aperture images to generate a semantically-consistent
result. To avoid feature loss during network propagation and simultaneously
streamline the redundant information from the light field camera, we present a
simple yet very effective Sub-Aperture Fusion Module (SAFM) to embed
sub-aperture images into angular features without any additional memory cost.
Furthermore, to address the mismatched spatial information across viewpoints,
we present a Center Angular Rectification Module (CARM) that realizes feature
resorting and prevents feature occlusion caused by asymmetric information. Our
proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and
-Syn datasets and sets a new record of 84.93% in mIoU on the UrbanLF-Real
Extended dataset, with a gain of +4.53%. The source code of OAFuser will be
made publicly available at https://github.com/FeiBryantkit/OAFuser.
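The memory argument behind fusing many sub-aperture views can be illustrated with a running sum: views are accumulated into a single feature map rather than stacked, so the footprint stays constant regardless of how many views the light field provides. This is a generic sketch of that principle, not the paper's SAFM; the function name and shapes are assumptions:

```python
import numpy as np

def fuse_sub_apertures(center_feat, sub_aperture_feats):
    """Average central-view features with sub-aperture view features
    using a running sum, so peak memory does not grow with the number
    of views (no (N, C, H, W) stack is ever materialized)."""
    fused = center_feat.astype(np.float64).copy()
    n = 1
    for feat in sub_aperture_feats:  # each view yields a (C, H, W) map
        fused += feat                # accumulate in place
        n += 1
    return fused / n

# Toy example: central view of ones, two sub-aperture views of threes.
center = np.ones((2, 4, 4))
views = [np.full((2, 4, 4), 3.0) for _ in range(2)]
fused = fuse_sub_apertures(center, views)  # every entry is (1 + 3 + 3) / 3
```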
CASENet: Deep Category-Aware Semantic Edge Detection
Boundary and edge cues are highly beneficial in improving a wide variety of
vision tasks such as semantic segmentation, object recognition, stereo, and
object proposal generation. Recently, the problem of edge detection has been
revisited and significant progress has been made with deep learning. While
classical edge detection is a challenging binary problem in itself, the
category-aware semantic edge detection by nature is an even more challenging
multi-label problem. We model the problem such that each edge pixel can be
associated with more than one class as they appear in contours or junctions
belonging to two or more semantic classes. To this end, we propose a novel
end-to-end deep semantic edge learning architecture based on ResNet and a new
skip-layer architecture where category-wise edge activations at the top
convolution layer share and are fused with the same set of bottom layer
features. We then propose a multi-label loss function to supervise the fused
activations. We show that our proposed architecture benefits this problem with
better performance, and we outperform the current state-of-the-art semantic
edge detection methods by a large margin on standard data sets such as SBD and
Cityscapes.
Comment: Accepted to CVPR 2017
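The multi-label formulation above means each category gets an independent per-pixel sigmoid rather than a softmax over classes, so a single edge pixel can activate several classes at once. A minimal sketch of such a loss, using plain binary cross-entropy without CASENet's class reweighting (the function name and shapes are assumptions):

```python
import numpy as np

def multilabel_edge_loss(logits, targets):
    """Mean per-pixel, per-class binary cross-entropy for semantic edges.

    logits:  (K, H, W) raw class-wise edge activations
    targets: (K, H, W) binary maps; one pixel may be 1 for several classes
    """
    p = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per class
    eps = 1e-7                         # numerical safety for the logs
    bce = -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))
    return bce.mean()

# A pixel on a junction of two classes is labelled 1 for both of them.
logits = np.array([[[8.0]], [[8.0]], [[-8.0]]])  # 3 classes, 1x1 image
targets = np.array([[[1.0]], [[1.0]], [[0.0]]])
loss = multilabel_edge_loss(logits, targets)     # confident and correct
```

A softmax would force the classes to compete and could never assign high probability to two classes at the same pixel, which is exactly what junctions between semantic regions require.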