Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
During the last half decade, convolutional neural networks (CNNs) have
triumphed over semantic segmentation, which is one of the core tasks in many
applications such as autonomous driving. However, training CNNs requires a
considerable amount of data, which is difficult to collect and laborious to
annotate. Recent advances in computer graphics make it possible to train CNNs
on photo-realistic synthetic imagery with computer-generated annotations.
Despite this, the domain mismatch between the real images and the synthetic
data cripples the models' performance. Hence, we propose a curriculum-style
learning approach to minimize the domain gap in urban scenery semantic
segmentation. The curriculum domain adaptation solves easy tasks first to infer
necessary properties about the target domain; in particular, the first task is
to learn global label distributions over images and local distributions over
landmark superpixels. These are easy to estimate because images of urban scenes
have strong idiosyncrasies (e.g., the size and spatial relations of buildings,
streets, cars, etc.). We then train a segmentation network while regularizing
its predictions in the target domain to follow those inferred properties. In
experiments, our method outperforms the baselines on two datasets and two
backbone networks. We also report extensive ablation studies about our
approach.
Comment: This is the extended version of the ICCV 2017 paper "Curriculum
Domain Adaptation for Semantic Segmentation of Urban Scenes" with additional
GTA experimen
Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
Most progress in semantic segmentation reports on daytime images taken under
favorable illumination conditions. We instead address the problem of semantic
segmentation of nighttime images and improve the state of the art by adapting
daytime models to nighttime without using nighttime annotations. Moreover, we
design a new evaluation framework to address the substantial uncertainty of
semantics in nighttime images. Our central contributions are: 1) a curriculum
framework to gradually adapt semantic segmentation models from day to night via
labeled synthetic images and unlabeled real images, both for progressively
darker times of day, which exploits cross-time-of-day correspondences for the
real images to guide the inference of their labels; 2) a novel
uncertainty-aware annotation and evaluation framework and metric for semantic
segmentation, designed for adverse conditions and including image regions
beyond human recognition capability in the evaluation in a principled fashion;
3) the Dark Zurich dataset, which comprises 2416 unlabeled nighttime and 2920
unlabeled twilight images with correspondences to their daytime counterparts
plus a set of 151 nighttime images with fine pixel-level annotations created
with our protocol, which serves as a first benchmark to perform our novel
evaluation. Experiments show that our guided curriculum adaptation
significantly outperforms state-of-the-art methods on real nighttime sets both
for standard metrics and our uncertainty-aware metric. Furthermore, our
uncertainty-aware evaluation reveals that selective invalidation of predictions
can lead to better results on data with ambiguous content such as our nighttime
benchmark and profit safety-oriented applications which involve invalid inputs.
Comment: ICCV 2019 camera-read
Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
We address the problem of semantic nighttime image segmentation and improve
the state of the art by adapting daytime models to nighttime without using
nighttime annotations. Moreover, we design a new evaluation framework to
address the substantial uncertainty of semantics in nighttime images. Our
central contributions are: 1) a curriculum framework to gradually adapt
semantic segmentation models from day to night through progressively darker
times of day, exploiting cross-time-of-day correspondences between daytime
images from a reference map and dark images to guide the label inference in the
dark domains; 2) a novel uncertainty-aware annotation and evaluation framework
and metric for semantic segmentation, including image regions beyond human
recognition capability in the evaluation in a principled fashion; 3) the Dark
Zurich dataset, comprising 2416 unlabeled nighttime and 2920 unlabeled twilight
images with correspondences to their daytime counterparts plus a set of 201
nighttime images with fine pixel-level annotations created with our protocol,
which serves as a first benchmark for our novel evaluation. Experiments show
that our map-guided curriculum adaptation significantly outperforms
state-of-the-art methods on nighttime sets both for standard metrics and our
uncertainty-aware metric. Furthermore, our uncertainty-aware evaluation reveals
that selective invalidation of predictions can improve results on data with
ambiguous content such as our benchmark and profit safety-oriented applications
involving invalid inputs.
Comment: IEEE T-PAMI 202
Play and Learn: Using Video Games to Train Computer Vision Models
Video games are a compelling source of annotated data as they can readily
provide fine-grained ground truth for diverse tasks. However, it is not clear
whether the synthetically generated data has enough resemblance to the
real-world images to improve the performance of computer vision models in
practice. We present experiments assessing the effectiveness on real-world data
of systems trained on synthetic RGB images that are extracted from a video
game. We collected over 60000 synthetic samples from a modern video game with
similar conditions to the real-world CamVid and Cityscapes datasets. We provide
several experiments to demonstrate that the synthetically generated RGB images
can be used to improve the performance of deep neural networks on both image
segmentation and depth estimation. These results show that a convolutional
network trained on synthetic data achieves a similar test error to a network
that is trained on real-world data for dense image classification. Furthermore,
the synthetically generated RGB images can provide similar or better results
compared to the real-world datasets if a simple domain adaptation technique is
applied. Our results suggest that collaboration with game developers for an
accessible interface to gather data is potentially a fruitful direction for
future work in computer vision.
Comment: To appear in the British Machine Vision Conference (BMVC), September
2016. -v2: fixed a typo in the reference