30,888 research outputs found
Exploring Context with Deep Structured models for Semantic Segmentation
State-of-the-art semantic image segmentation methods are mostly based on
training deep convolutional neural networks (CNNs). In this work, we proffer to
improve semantic segmentation with the use of contextual information. In
particular, we explore `patch-patch' context and `patch-background' context in
deep CNNs. We formulate deep structured models by combining CNNs and
Conditional Random Fields (CRFs) for learning the patch-patch context between
image regions. Specifically, we formulate CNN-based pairwise potential
functions to capture semantic correlations between neighboring patches.
Efficient piecewise training of the proposed deep structured model is then
applied in order to avoid repeated expensive CRF inference during the course of
back propagation. For capturing the patch-background context, we show that a
network design with traditional multi-scale image inputs and sliding pyramid
pooling is very effective for improving performance. We perform comprehensive
evaluation of the proposed method. We achieve new state-of-the-art performance
on a number of challenging semantic segmentation datasets including ,
-, , -, -,
-, and datasets. Particularly, we report an
intersection-over-union score of on the - dataset.Comment: 16 pages. Accepted to IEEE T. Pattern Analysis & Machine
Intelligence, 2017. Extended version of arXiv:1504.0101
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
During the last half decade, convolutional neural networks (CNNs) have
triumphed over semantic segmentation, which is one of the core tasks in many
applications such as autonomous driving. However, to train CNNs requires a
considerable amount of data, which is difficult to collect and laborious to
annotate. Recent advances in computer graphics make it possible to train CNNs
on photo-realistic synthetic imagery with computer-generated annotations.
Despite this, the domain mismatch between the real images and the synthetic
data cripples the models' performance. Hence, we propose a curriculum-style
learning approach to minimize the domain gap in urban scenery semantic
segmentation. The curriculum domain adaptation solves easy tasks first to infer
necessary properties about the target domain; in particular, the first task is
to learn global label distributions over images and local distributions over
landmark superpixels. These are easy to estimate because images of urban scenes
have strong idiosyncrasies (e.g., the size and spatial relations of buildings,
streets, cars, etc.). We then train a segmentation network while regularizing
its predictions in the target domain to follow those inferred properties. In
experiments, our method outperforms the baselines on two datasets and two
backbone networks. We also report extensive ablation studies about our
approach.Comment: This is the extended version of the ICCV 2017 paper "Curriculum
Domain Adaptation for Semantic Segmentation of Urban Scenes" with additional
GTA experimen
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
Discriminative Training of Deep Fully-connected Continuous CRF with Task-specific Loss
Recent works on deep conditional random fields (CRF) have set new records on
many vision tasks involving structured predictions. Here we propose a
fully-connected deep continuous CRF model for both discrete and continuous
labelling problems. We exemplify the usefulness of the proposed model on
multi-class semantic labelling (discrete) and the robust depth estimation
(continuous) problems.
In our framework, we model both the unary and the pairwise potential
functions as deep convolutional neural networks (CNN), which are jointly
learned in an end-to-end fashion. The proposed method possesses the main
advantage of continuously-valued CRF, which is a closed-form solution for the
Maximum a posteriori (MAP) inference.
To better adapt to different tasks, instead of using the commonly employed
maximum likelihood CRF parameter learning protocol, we propose task-specific
loss functions for learning the CRF parameters.
It enables direct optimization of the quality of the MAP estimates during the
course of learning.
Specifically, we optimize the multi-class classification loss for the
semantic labelling task and the Turkey's biweight loss for the robust depth
estimation problem.
Experimental results on the semantic labelling and robust depth estimation
tasks demonstrate that the proposed method compare favorably against both
baseline and state-of-the-art methods.
In particular, we show that although the proposed deep CRF model is
continuously valued, with the equipment of task-specific loss, it achieves
impressive results even on discrete labelling tasks
- …