Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation
We propose a convolutional network with hierarchical classifiers for
per-pixel semantic segmentation that can be trained on multiple,
heterogeneous datasets and exploit their semantic hierarchy. Our network is the
first to be simultaneously trained on three different datasets from the
intelligent vehicles domain, i.e. Cityscapes, GTSDB and Mapillary Vistas, and
can handle different semantic levels of detail, class imbalances, and
different annotation types, i.e. dense per-pixel and sparse bounding-box
labels. We assess our hierarchical approach by comparing it against flat,
non-hierarchical classifiers, and we show improvements in mean pixel accuracy
of 13.0% for Cityscapes classes, 2.4% for Vistas classes, and 32.3% for GTSDB
classes. Our implementation achieves inference rates of 17 fps at a resolution
of 520x706 for 108 classes running on a GPU.
Comment: IEEE Intelligent Vehicles 201
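For illustration, a minimal PyTorch sketch of one way such hierarchical
classifiers can be realized (not the paper's code; the module and parameter
names are hypothetical): a root softmax over coarse classes gates per-parent
subclass softmaxes, so the fine-class outputs are joint probabilities.

```python
import torch
import torch.nn as nn

class HierarchicalSegHead(nn.Module):
    """Root classifier over coarse classes plus per-parent subclass
    classifiers; fine-class probabilities are p(coarse) * p(fine|coarse)."""
    def __init__(self, feat_ch, num_coarse, fine_per_coarse):
        super().__init__()
        self.root = nn.Conv2d(feat_ch, num_coarse, 1)
        self.subheads = nn.ModuleList(
            [nn.Conv2d(feat_ch, n, 1) for n in fine_per_coarse])

    def forward(self, feats):
        p_root = self.root(feats).softmax(dim=1)   # (B, C_coarse, H, W)
        fine = []
        for k, head in enumerate(self.subheads):
            p_sub = head(feats).softmax(dim=1)     # p(fine | coarse = k)
            fine.append(p_root[:, k:k + 1] * p_sub)  # joint probability
        return torch.cat(fine, dim=1)              # sums to 1 over all fine classes

# Usage: three coarse parents with 5, 4 and 2 fine classes each.
feats = torch.randn(1, 256, 65, 88)
head = HierarchicalSegHead(256, num_coarse=3, fine_per_coarse=[5, 4, 2])
out = head(feats)  # (1, 11, 65, 88)
```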
A Domain Agnostic Normalization Layer for Unsupervised Adversarial Domain Adaptation
We propose a normalization layer for unsupervised domain adaptation in
semantic scene segmentation. Normalization layers are known to improve
convergence and generalization and are part of many state-of-the-art
fully-convolutional neural networks. We show that conventional normalization
layers worsen the performance of current Unsupervised Adversarial Domain
Adaptation (UADA), which is a method to improve network performance on
unlabeled datasets and the focus of our research. Therefore, we propose a
novel Domain Agnostic Normalization layer and thereby unlock the benefits of
normalization layers for unsupervised adversarial domain adaptation. In our
evaluation, we adapt from the synthetic GTA5 dataset to the real Cityscapes
dataset, a common benchmark experiment, and surpass the state-of-the-art. As
our normalization layer is domain agnostic at test time, we furthermore
demonstrate that UADA using Domain Agnostic Normalization improves performance
on unseen domains, specifically on Apolloscape and Mapillary.
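A minimal sketch of one plausible reading of such a layer (PyTorch; the class
name and the exact statistics scheme are assumptions, not the paper's
specification): per-domain batch statistics with shared affine parameters
during training, and statistics computed from the input itself at test time,
so no domain label is required on unseen domains.

```python
import torch
import torch.nn as nn

class DomainAgnosticNorm2d(nn.Module):
    """Training: the domain index selects domain-specific batch statistics.
    Test time: normalize with the input's own statistics (domain agnostic).
    Affine parameters are shared across domains."""
    def __init__(self, num_features, num_domains=2, eps=1e-5):
        super().__init__()
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(num_features, eps=eps, affine=False)
             for _ in range(num_domains)])
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.eps = eps

    def forward(self, x, domain=None):
        if self.training:
            x = self.bns[domain](x)  # domain label required during training
        else:
            # no domain label at test time: use the batch's own statistics
            mean = x.mean(dim=(0, 2, 3), keepdim=True)
            var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
            x = (x - mean) / torch.sqrt(var + self.eps)
        return (x * self.weight[None, :, None, None]
                + self.bias[None, :, None, None])
```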
Functional Outcome of Radical Retropubic Prostatectomy: Sexual function and urinary continence
Prostate cancer is the most common non-dermatological cancer in men in the
Western world. During the last two decades, the widespread use of early
detection programs based on prostate-specific antigen (PSA) screening has
resulted in an increase in diagnosed prostate cancer. Screening for prostate
cancer has resulted in a change in the characteristics of patients presenting
with localized prostate cancer. Nowadays, men diagnosed with this disease are
younger, have fewer comorbidities, and have a longer life expectancy.
Semantic Foreground Inpainting from Weak Supervision
Semantic scene understanding is an essential task for self-driving vehicles
and mobile robots. In our work, we aim to estimate a semantic segmentation map,
in which the foreground objects are removed and semantically inpainted with
background classes, from a single RGB image. This semantic foreground
inpainting task is performed by a single-stage convolutional neural network
(CNN) that contains our novel max-pooling as inpainting (MPI) module, which is
trained with weak supervision, i.e., it does not require manual background
annotations for the foreground regions to be inpainted. Our approach is
inherently more efficient than the previous two-stage state-of-the-art method,
and outperforms it by a margin of 3% IoU for the inpainted foreground regions
on Cityscapes. The performance margin increases to 6% IoU, when tested on the
unseen KITTI dataset. The code and the manually annotated datasets for testing
are shared with the research community at
https://github.com/Chenyang-Lu/semantic-foreground-inpainting.
Comment: RA-L and ICRA'2
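A minimal sketch of the max-pooling-as-inpainting idea (PyTorch; the function
name and iteration scheme are assumptions, not the released implementation):
features under the foreground mask are iteratively overwritten by max-pooled
surrounding features, so background context propagates into the masked
region. It assumes non-negative (e.g., post-ReLU) features, so that zeroed
foreground positions never win the max.

```python
import torch
import torch.nn.functional as F

def mpi_inpaint(feats, fg_mask, iters=8):
    """feats: (B, C, H, W) non-negative features.
    fg_mask: (B, 1, H, W), 1.0 where foreground should be overwritten.
    Each 3x3 max-pool iteration grows background features one pixel
    further into the masked region; wide regions need more iterations."""
    bg = feats * (1 - fg_mask)          # zero out foreground features
    filled = bg
    for _ in range(iters):
        pooled = F.max_pool2d(filled, kernel_size=3, stride=1, padding=1)
        filled = torch.where(fg_mask.bool(), pooled, bg)  # only fill fg
    return filled
```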
Image-Graph-Image Translation via Auto-Encoding
This work presents the first convolutional neural network that learns an
image-to-graph translation task without needing external supervision. Obtaining
graph representations of image content, where objects are represented as nodes
and their relationships as edges, is an important task in scene understanding.
Current approaches are fully supervised, thereby requiring
meticulous annotations. To overcome this, we are the first to present a
self-supervised approach based on a fully-differentiable auto-encoder in which
the bottleneck encodes the graph's nodes and edges. This self-supervised
approach can currently encode simple line drawings into graphs and obtains
comparable results to a fully-supervised baseline in terms of F1 score on
triplet matching. Besides these promising results, we provide several
directions for future research on how our approach can be extended to cover
more complex imagery.
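A minimal sketch of the auto-encoding structure described above (PyTorch; all
layer sizes and names are assumptions): a CNN encoder maps the image to node
embeddings plus a soft adjacency matrix, which together form the
differentiable graph bottleneck from which a decoder reconstructs the image,
so training needs only a reconstruction loss and no graph annotations.

```python
import torch
import torch.nn as nn

class ImageGraphImageAE(nn.Module):
    """Image -> (node embeddings, soft adjacency) -> image."""
    def __init__(self, num_nodes=8, node_dim=16, img_size=64):
        super().__init__()
        self.num_nodes, self.node_dim = num_nodes, node_dim
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (img_size // 4) ** 2, num_nodes * node_dim))
        self.edge_score = nn.Bilinear(node_dim, node_dim, 1)
        self.decoder = nn.Sequential(
            nn.Linear(num_nodes * node_dim + num_nodes ** 2,
                      64 * (img_size // 4) ** 2),
            nn.Unflatten(1, (64, img_size // 4, img_size // 4)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, img):
        nodes = self.encoder(img).view(-1, self.num_nodes, self.node_dim)
        # soft adjacency: a sigmoid-scored edge between every node pair
        a = nodes.unsqueeze(2).expand(-1, -1, self.num_nodes, -1)
        b = nodes.unsqueeze(1).expand(-1, self.num_nodes, -1, -1)
        adj = torch.sigmoid(self.edge_score(
            a.reshape(-1, self.node_dim), b.reshape(-1, self.node_dim)))
        adj = adj.view(-1, self.num_nodes, self.num_nodes)
        z = torch.cat([nodes.flatten(1), adj.flatten(1)], dim=1)
        return self.decoder(z), nodes, adj
```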
On Boosting Semantic Street Scene Segmentation with Weak Supervision
Training convolutional networks for semantic segmentation requires per-pixel
ground truth labels, which are very time consuming and hence costly to obtain.
Therefore, in this work, we research and develop a hierarchical deep network
architecture and the corresponding loss for semantic segmentation that can be
trained from weak supervision, such as bounding boxes or image level labels, as
well as from strong per-pixel supervision. We demonstrate that the
hierarchical structure and the simultaneous training on strong (per-pixel)
and weak (bounding-box) labels, even from separate datasets, consistently
increases performance compared to per-pixel-only training. Moreover, we
explore the more challenging case of adding weak image-level labels. We
collect street scene images and weak labels from the large-scale Open Images
dataset to generate the
OpenScapes dataset, and we use this novel dataset to increase segmentation
performance on two established per-pixel labeled datasets, Cityscapes and
Vistas. We report performance gains up to +13.2% mIoU on crucial street scene
classes, and inference speed of 20 fps on a Titan V GPU for Cityscapes at 512 x
1024 resolution. Our network and OpenScapes dataset are shared with the
research community.
Comment: Oral presentation IEEE IV 201
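A minimal sketch of combining strong and weak supervision in one loss
(PyTorch; the box term is one common multiple-instance reading of
bounding-box supervision, not necessarily the paper's exact loss, and all
names are hypothetical):

```python
import torch
import torch.nn.functional as F

def weak_box_loss(logits, boxes, classes):
    """For each (non-empty) box, encourage the most confident pixel inside
    the box to predict the box's class."""
    loss = 0.0
    for (x0, y0, x1, y1), c in zip(boxes, classes):
        region = logits[:, :, y0:y1, x0:x1]            # (B, C, h, w)
        log_p = region.log_softmax(dim=1)[:, c]        # log p(c) per pixel
        loss = loss - log_p.amax(dim=(1, 2)).mean()    # best pixel per image
    return loss / max(len(boxes), 1)

def joint_loss(strong_logits, strong_labels, weak_logits, boxes, classes,
               weak_weight=0.5):
    """Per-pixel cross-entropy on the strongly labeled batch plus the
    weighted weak box term on the weakly labeled batch."""
    strong = F.cross_entropy(strong_logits, strong_labels, ignore_index=255)
    return strong + weak_weight * weak_box_loss(weak_logits, boxes, classes)
```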
Exploiting image translations via ensemble self-supervised learning for Unsupervised Domain Adaptation
Unsupervised Domain Adaptation (UDA) aims to improve the generalization of models that are trained on a labeled source domain and tested on a real-world target domain. Recently, a UDA method was proposed that addresses the adaptation problem by combining ensemble learning with self-supervised learning. However, this method uses only the source domain to pretrain the model and employs a limited number of classifiers to create target pseudo-labels. To mitigate these deficiencies, in this work we explore the use of image translations in combination with ensemble learning and self-supervised learning. To expose the model to more variable pretraining data, our method creates multiple diverse image translations, which encourages the learning of domain-invariant features and thereby better generalization. These image translations also allow us to learn translation-specific classifiers, maximizing the number of classifiers in the ensemble and resulting in more robust target pseudo-labels. In addition, we propose to use the target domain in the pretraining stage to mitigate source-domain bias in the network. We evaluate our method on the standard UDA benchmarks, i.e., adapting GTA V and Synthia to Cityscapes, and achieve state-of-the-art results on the mIoU metric. Extensive ablation experiments are reported to highlight the advantageous properties of our UDA strategy.
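A minimal sketch of ensemble pseudo-labeling (PyTorch; the function name and
the confidence threshold are assumptions): softmax maps from the
translation-specific classifiers are averaged, and only pixels where the
ensemble is confident are kept as pseudo-labels for self-training.

```python
import torch

def ensemble_pseudo_labels(prob_maps, threshold=0.9, ignore_index=255):
    """prob_maps: list of (B, C, H, W) softmax outputs, one per
    translation-specific classifier. Returns (B, H, W) pseudo-labels with
    uncertain pixels set to ignore_index so they are excluded from the
    self-training loss."""
    mean_probs = torch.stack(prob_maps, dim=0).mean(dim=0)  # ensemble average
    conf, labels = mean_probs.max(dim=1)                    # per-pixel argmax
    labels[conf < threshold] = ignore_index                 # drop uncertain pixels
    return labels
```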