A Domain Agnostic Normalization Layer for Unsupervised Adversarial Domain Adaptation
We propose a normalization layer for unsupervised domain adaptation in semantic
scene segmentation. Normalization layers are known to improve convergence and
generalization and are part of many state-of-the-art fully-convolutional neural
networks. We show that conventional normalization layers worsen the performance
of current Unsupervised Adversarial Domain Adaptation (UADA), which is a method
to improve network performance on unlabeled datasets and the focus of our
research. Therefore, we propose a novel Domain Agnostic Normalization layer and
thereby unlock the benefits of normalization layers for unsupervised
adversarial domain adaptation. In our evaluation, we adapt from the synthetic
GTA5 dataset to the real Cityscapes dataset, a common benchmark experiment,
and surpass the state-of-the-art. As our normalization layer is domain agnostic
at test time, we furthermore demonstrate that UADA using Domain Agnostic
Normalization improves performance on unseen domains, specifically on
Apolloscape and Mapillary.
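
As a purely illustrative sketch of a normalization layer whose test-time behaviour does not depend on which domain a sample comes from, the snippet below normalizes each sample with its own per-channel spatial statistics instead of batch-level running statistics (essentially instance normalization with a shared affine transform). The class name PerSampleNorm2d and all details are assumptions for illustration, not the authors' actual layer.

    import torch
    import torch.nn as nn

    class PerSampleNorm2d(nn.Module):
        """Normalizes each sample with its own per-channel spatial statistics,
        so no batch-level (and hence domain-level) statistics are ever stored."""

        def __init__(self, num_channels, eps=1e-5):
            super().__init__()
            self.eps = eps
            # Affine parameters are learned once and shared across all domains.
            self.weight = nn.Parameter(torch.ones(1, num_channels, 1, 1))
            self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

        def forward(self, x):
            # x: (N, C, H, W); moments are taken over the spatial dimensions only.
            mean = x.mean(dim=(2, 3), keepdim=True)
            var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
            x_hat = (x - mean) / torch.sqrt(var + self.eps)
            return x_hat * self.weight + self.bias

Because this sketch accumulates nothing domain-specific during training, the same layer applies unchanged to source, target, or unseen-domain batches at test time.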
Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation
We propose a convolutional network with hierarchical classifiers for
per-pixel semantic segmentation, which can be trained on multiple,
heterogeneous datasets and exploit their semantic hierarchy. Our network is the
first to be simultaneously trained on three different datasets from the
intelligent vehicles domain, i.e. Cityscapes, GTSDB and Mapillary Vistas, and
is able to handle different semantic level-of-detail, class imbalances, and
different annotation types, i.e. dense per-pixel and sparse bounding-box
labels. We assess our hierarchical approach by comparing it against flat,
non-hierarchical classifiers, and show improvements in mean pixel accuracy of
13.0% for Cityscapes classes, 2.4% for Vistas classes, and 32.3% for GTSDB
classes. Our implementation achieves inference rates of 17 fps at a resolution
of 520x706 for 108 classes running on a GPU.
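
For readers unfamiliar with hierarchical per-pixel classifiers, the sketch below shows one common way to combine a coarse (super-class) head with a fine (sub-class) head: multiply the parent probability by a softmax taken over siblings only. The HierarchicalHead module and the parent_of mapping are hypothetical and only illustrate the general mechanism, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HierarchicalHead(nn.Module):
        """Two-level hierarchical per-pixel classifier (illustrative only)."""

        def __init__(self, in_channels, parent_of):
            # parent_of[j] is the super-class index of fine class j.
            super().__init__()
            self.register_buffer("parent_of", torch.tensor(parent_of))
            num_coarse = max(parent_of) + 1
            self.coarse = nn.Conv2d(in_channels, num_coarse, kernel_size=1)
            self.fine = nn.Conv2d(in_channels, len(parent_of), kernel_size=1)

        def forward(self, features):
            # features: (N, C, H, W) output of a shared backbone.
            p_coarse = F.softmax(self.coarse(features), dim=1)      # (N, Kc, H, W)
            fine_logits = self.fine(features)                       # (N, Kf, H, W)
            # Softmax over siblings: normalize each fine class only against
            # the other children of the same super-class.
            exp_fine = (fine_logits - fine_logits.max(dim=1, keepdim=True).values).exp()
            sums = torch.zeros_like(p_coarse).index_add_(1, self.parent_of, exp_fine)
            p_fine_given_parent = exp_fine / sums.index_select(1, self.parent_of)
            # Joint probability of a fine class = P(parent) * P(fine | parent).
            return p_fine_given_parent * p_coarse.index_select(1, self.parent_of)

With Cityscapes-style labels, parent_of might map road and sidewalk to a "flat" super-class and car, truck, and bus to "vehicle", so a pixel's fine prediction is always consistent with its coarse one.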
Panoptic Segmentation
We propose and study a task we name panoptic segmentation (PS). Panoptic
segmentation unifies the typically distinct tasks of semantic segmentation
(assign a class label to each pixel) and instance segmentation (detect and
segment each object instance). The proposed task requires generating a coherent
scene segmentation that is rich and complete, an important step toward
real-world vision systems. While early work in computer vision addressed
related image/scene parsing tasks, these are not currently popular, possibly
due to a lack of appropriate metrics or associated recognition challenges. To
address this, we propose a novel panoptic quality (PQ) metric that captures
performance for all classes (stuff and things) in an interpretable and unified
manner. Using the proposed metric, we perform a rigorous study of both human
and machine performance for PS on three existing datasets, revealing
interesting insights about the task. The aim of our work is to revive the
interest of the community in a more unified view of image segmentation.
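
The panoptic quality metric mentioned above has a simple closed form: PQ equals the sum of IoU over matched (true-positive) segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment are matched when their IoU exceeds 0.5, which makes the matching unique. The helper below is a minimal per-class sketch, not the official evaluation code; it assumes segment IoUs have already been computed.

    def panoptic_quality(matched_ious, num_pred, num_gt):
        """PQ for one class.

        matched_ious: IoUs of predicted/ground-truth segment pairs with IoU > 0.5
                      (each prediction and each GT segment appears at most once).
        num_pred, num_gt: total predicted / ground-truth segments of this class.
        """
        tp = len(matched_ious)
        fp = num_pred - tp                     # unmatched predicted segments
        fn = num_gt - tp                       # unmatched ground-truth segments
        if tp + fp + fn == 0:
            return float("nan")                # class absent from both; skip in averaging
        return sum(matched_ious) / (tp + 0.5 * fp + 0.5 * fn)

    # Two matches (IoU 0.8 and 0.6), one false positive, one false negative:
    print(panoptic_quality([0.8, 0.6], num_pred=3, num_gt=3))   # ~0.467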
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
In this work we present In-Place Activated Batch Normalization (InPlace-ABN),
a novel approach to drastically reduce the training memory footprint of
modern deep neural networks in a computationally efficient way. Our solution
substitutes the conventionally used succession of BatchNorm + Activation layers
with a single plugin layer, hence avoiding invasive framework surgery while
providing straightforward applicability for existing deep learning frameworks.
We obtain memory savings of up to 50% by dropping intermediate results and by
recovering required information during the backward pass through the inversion
of stored forward results, with only a minor increase (0.8-2%) in computation
time. Also, we demonstrate how frequently used checkpointing approaches can be
made computationally as efficient as InPlace-ABN. In our experiments on image
classification, we demonstrate on-par results on ImageNet-1k with
state-of-the-art approaches. On the memory-demanding task of semantic
segmentation, we report results for COCO-Stuff, Cityscapes and Mapillary
Vistas, obtaining new state-of-the-art results on the latter without additional
training data, using only single-scale inference and a single model. Code can be found at
https://github.com/mapillary/inplace_abn
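
A small sketch of the recovery idea described above, under the assumption that the activation is an invertible one such as leaky ReLU: because the stored output can be mapped back through the activation and BatchNorm's affine part, the intermediate buffer between them does not need to be kept. The helper names below are illustrative; the actual fused layer (and its rewritten gradient formulas) lives in the linked repository.

    import torch

    def leaky_relu(x, slope=0.01):
        return torch.where(x >= 0, x, slope * x)

    def invert_leaky_relu(y, slope=0.01):
        # Leaky ReLU is strictly monotonic, hence exactly invertible.
        return torch.where(y >= 0, y, y / slope)

    def invert_affine(y, weight, bias):
        # Undo BatchNorm's learned affine transform y = x_hat * weight + bias.
        return (y - bias) / weight

    # Forward: normalization omitted here; apply the affine part, then the activation.
    x_hat = torch.randn(8)                      # pretend this is the normalized input
    weight, bias = torch.tensor(1.5), torch.tensor(0.2)
    y = leaky_relu(x_hat * weight + bias)       # the only tensor that must be stored

    # Backward-pass recovery: reconstruct x_hat from y instead of keeping it in memory.
    x_hat_recovered = invert_affine(invert_leaky_relu(y), weight, bias)
    print(torch.allclose(x_hat, x_hat_recovered))   # True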
