Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation
We propose a convolutional network with hierarchical classifiers for
per-pixel semantic segmentation, which can be trained on multiple
heterogeneous datasets and exploit their semantic hierarchy. Our network is the
first to be simultaneously trained on three different datasets from the
intelligent vehicles domain, i.e. Cityscapes, GTSDB and Mapillary Vistas, and
is able to handle different semantic level-of-detail, class imbalances, and
different annotation types, i.e. dense per-pixel and sparse bounding-box
labels. We assess our hierarchical approach by comparing it against flat,
non-hierarchical classifiers, and show improvements in mean pixel accuracy of
13.0% for Cityscapes classes, 2.4% for Vistas classes, and 32.3% for GTSDB
classes. Our implementation achieves inference rates of 17 fps at a resolution
of 520x706 for 108 classes on a GPU.
Comment: IEEE Intelligent Vehicles 201
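The hierarchical classifier idea above can be sketched as a two-level softmax head: a coarse classifier over superclasses and per-branch classifiers over their subclasses, with final probabilities obtained as p(sub) = p(super) * p(sub | super). The hierarchy, shapes, and function names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical 2-level hierarchy: superclass -> list of subclass channels
HIERARCHY = {0: [0, 1], 1: [2, 3, 4]}  # e.g. "flat" -> {road, sidewalk}, ...

def hierarchical_probs(super_logits, sub_logits):
    """Per-pixel p(sub) = p(super) * p(sub | super)."""
    p_super = softmax(super_logits)              # (H, W, n_super)
    branches = []
    for s, subs in HIERARCHY.items():
        p_sub = softmax(sub_logits[..., subs])   # normalize within the branch
        branches.append(p_super[..., s:s + 1] * p_sub)
    return np.concatenate(branches, axis=-1)     # (H, W, n_sub_total)

H, W = 2, 3
out = hierarchical_probs(np.random.randn(H, W, 2), np.random.randn(H, W, 5))
print(out.shape, np.allclose(out.sum(-1), 1.0))  # valid per-pixel distribution
```

Because each branch is normalized separately, datasets annotated only at the coarse level can supervise `p_super` alone, which is one way such a head accommodates heterogeneous label granularity.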
On Boosting Semantic Street Scene Segmentation with Weak Supervision
Training convolutional networks for semantic segmentation requires per-pixel
ground truth labels, which are very time consuming and hence costly to obtain.
Therefore, in this work, we research and develop a hierarchical deep network
architecture and the corresponding loss for semantic segmentation that can be
trained from weak supervision, such as bounding boxes or image level labels, as
well as from strong per-pixel supervision. We demonstrate that the hierarchical
structure and simultaneous training on strong (per-pixel) and weak
(bounding-box) labels, even from separate datasets, consistently improves
performance over per-pixel-only training. Moreover, we explore the more
challenging case of adding weak image-level labels. We collect street scene
images and weak labels from the immense Open Images dataset to generate the
OpenScapes dataset, and we use this novel dataset to increase segmentation
performance on two established per-pixel labeled datasets, Cityscapes and
Vistas. We report performance gains of up to +13.2% mIoU on crucial street-scene
classes, and an inference speed of 20 fps on a Titan V GPU for Cityscapes at 512 x
1024 resolution. Our network and the OpenScapes dataset are shared with the
research community.
Comment: Oral presentation IEEE IV 201
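One simple way to turn a bounding-box label into a training signal, in the spirit of the weak supervision described above, is a multiple-instance-style loss: at least one pixel inside the box should predict the box's class. This is a minimal sketch under that assumption, not the paper's actual loss; all names and the toy data are hypothetical:

```python
import numpy as np

def bbox_weak_loss(probs, box, cls):
    """MIL-style weak loss: the best pixel inside the box should
    predict `cls` with high probability."""
    y0, x0, y1, x1 = box
    inside = probs[y0:y1, x0:x1, cls]
    return -np.log(inside.max() + 1e-8)

def pixel_loss(probs, labels):
    """Standard per-pixel cross-entropy on strongly labeled pixels."""
    h, w = labels.shape
    p = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.log(p + 1e-8).mean()

# Toy example: 4x4 map, 3 classes, a box annotated with class 2
probs = np.full((4, 4, 3), 1.0 / 3)
probs[1:3, 1:3] = [0.05, 0.05, 0.90]          # confident class-2 region
weak = bbox_weak_loss(probs, (1, 1, 3, 3), cls=2)
print(weak)  # small: the box already contains a confident class-2 pixel
```

In joint training, a weighted sum of `pixel_loss` on the strongly labeled dataset and `bbox_weak_loss` on the box-labeled dataset would be minimized together, which mirrors the simultaneous strong/weak training the abstract describes.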
Non-parametric spatially constrained local prior for scene parsing on real-world data
Scene parsing aims to recognize the object category of every pixel in scene
images, and it plays a central role in image content understanding and computer
vision applications. However, accurate scene parsing from unconstrained
real-world data is still a challenging task. In this paper, we present the
non-parametric Spatially Constrained Local Prior (SCLP) for scene parsing on
realistic data. For a given query image, the non-parametric SCLP is learnt by
first retrieving a subset of most similar training images to the query image
and then collecting prior information about object co-occurrence statistics
between spatial image blocks and between adjacent superpixels from the
retrieved subset. The SCLP is powerful in capturing both long- and short-range
context about inter-object correlations in the query image and can be
effectively integrated with traditional visual features to refine the
classification results. Our experiments on the SIFT Flow and PASCAL-Context
benchmark datasets show that the non-parametric SCLP, used in conjunction with
superpixel-level visual features, achieves one of the top results compared
with state-of-the-art approaches.
Comment: 10 pages, journa
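The SCLP pipeline described above (retrieve similar training images, collect label co-occurrence statistics from them, then blend the prior with visual-feature scores) can be sketched as follows. The grid-adjacency simplification, the retrieval feature, and the linear blending weight are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def retrieve_similar(query_feat, train_feats, k=2):
    """Indices of the k training images nearest to the query
    (a global-feature distance stands in for the real retrieval step)."""
    d = np.linalg.norm(train_feats - query_feat, axis=1)
    return np.argsort(d)[:k]

def adjacency_prior(label_maps, n_classes):
    """Co-occurrence statistics between horizontally adjacent cells of the
    retrieved label maps (superpixel adjacency simplified to a grid)."""
    C = np.ones((n_classes, n_classes))          # Laplace smoothing
    for lab in label_maps:
        a, b = lab[:, :-1].ravel(), lab[:, 1:].ravel()
        np.add.at(C, (a, b), 1)
        np.add.at(C, (b, a), 1)
    return C / C.sum(axis=1, keepdims=True)      # rows: p(neighbor | class)

def refine(visual_scores, neighbor_label, prior, w=0.5):
    """Blend visual-feature class scores with the contextual prior."""
    return (1 - w) * visual_scores + w * prior[neighbor_label]

# Toy run: 3 training images with 2x3 label maps over 3 classes
train_feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = np.array([[[0, 0, 1], [1, 1, 2]],
                   [[0, 1, 1], [2, 2, 2]],
                   [[2, 2, 2], [2, 2, 2]]])
idx = retrieve_similar(np.array([0.1, 0.1]), train_feats, k=2)
prior = adjacency_prior(labels[idx], n_classes=3)
scores = refine(np.array([0.4, 0.4, 0.2]), neighbor_label=0, prior=prior)
print(idx, scores)
```

Because the prior is rebuilt per query from the retrieved subset, it adapts to each image without any trained parameters, which is what makes the approach non-parametric.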