7,711 research outputs found
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image class labels only is
not acceptable, but localising them at the original image pixel resolution is
necessary. Boosted by the extraordinary ability of convolutional neural
networks (CNN) in creating semantic, high level and hierarchical image
features; excessive numbers of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We started with an
analysis of the public image sets and leaderboards for 2D semantic
segmantation, with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorised the approaches into three main periods, namely pre-and early deep
learning era, the fully convolutional era, and the post-FCN era. We technically
analysed the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.Comment: Updated with new studie
Cross Modal Distillation for Supervision Transfer
In this work we propose a technique that transfers supervision between images
from different modalities. We use learned representations from a large labeled
modality as a supervisory signal for training representations for a new
unlabeled paired modality. Our method enables learning of rich representations
for unlabeled modalities and can be used as a pre-training procedure for new
modalities with limited labeled data. We show experimental results where we
transfer supervision from labeled RGB images to unlabeled depth and optical
flow images and demonstrate large improvements for both these cross modal
supervision transfers. Code, data and pre-trained models are available at
https://github.com/s-gupta/fast-rcnn/tree/distillationComment: Updated version (v2) contains additional experiments and result
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic
information on street view images for autonomous driving. Recently, stixels,
stix-mantics, and tiered scene labeling methods have been proposed to model
street view images. We propose a 4-layer street view model, a compact
representation over the recently proposed stix-mantics model. Our layers encode
semantic classes like ground, pedestrians, vehicles, buildings, and sky in
addition to the depths. The only input to our algorithm is a pair of stereo
images. We use a deep neural network to extract the appearance features for
semantic classes. We use a simple and an efficient inference algorithm to
jointly estimate both semantic classes and layered depth values. Our method
outperforms other competing approaches in Daimler urban scene segmentation
dataset. Our algorithm is massively parallelizable, allowing a GPU
implementation with a processing speed about 9 fps.Comment: The paper will be presented in the 2015 Robotics: Science and Systems
Conference (RSS
BlitzNet: A Real-Time Deep Network for Scene Understanding
Real-time scene understanding has become crucial in many applications such as
autonomous driving. In this paper, we propose a deep architecture, called
BlitzNet, that jointly performs object detection and semantic segmentation in
one forward pass, allowing real-time computations. Besides the computational
gain of having a single network to perform several tasks, we show that object
detection and semantic segmentation benefit from each other in terms of
accuracy. Experimental results for VOC and COCO datasets show state-of-the-art
performance for object detection and segmentation among real time systems
- …