Search CORE

13,628 research outputs found

The Cityscapes Dataset for Semantic Urban Scene Understanding

Author: Benenson Rodrigo
Cordts Marius
Enzweiler Markus
Franke Uwe
Omran Mohamed
Ramos Sebastian
Rehfeld Timo
Roth Stefan
Schiele Bernt
Publication venue
Publication date: 01/01/2016
Field of study

Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes is comprised of a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.Comment: Includes supplemental materia

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Impact of Ground Truth Annotation Quality on Performance of Semantic Image Segmentation of Traffic Conditions

Author: A Bouzid-Daho
A Garciagarcia
A Geiger
AS Chaudhary
C Flores
GJ Brostow
H Yu
M Abadi
M Everingham
M Müller
RK Sahoo
S Memon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2018
Field of study

Preparation of high-quality datasets for the urban scene understanding is a labor-intensive task, especially, for datasets designed for the autonomous driving applications. The application of the coarse ground truth (GT) annotations of these datasets without detriment to the accuracy of semantic image segmentation (by the mean intersection over union - mIoU) could simplify and speedup the dataset preparation and model fine tuning before its practical application. Here the results of the comparative analysis for semantic segmentation accuracy obtained by PSPNet deep learning architecture are presented for fine and coarse annotated images from Cityscapes dataset. Two scenarios were investigated: scenario 1 - the fine GT images for training and prediction, and scenario 2 - the fine GT images for training and the coarse GT images for prediction. The obtained results demonstrated that for the most important classes the mean accuracy values of semantic image segmentation for coarse GT annotations are higher than for the fine GT ones, and the standard deviation values are vice versa. It means that for some applications some unimportant classes can be excluded and the model can be tuned further for some classes and specific regions on the coarse GT dataset without loss of the accuracy even. Moreover, this opens the perspectives to use deep neural networks for the preparation of such coarse GT datasets.Comment: 10 pages, 6 figures, 2 tables, The Second International Conference on Computer Science, Engineering and Education Applications (ICCSEEA2019) 26-27 January 2019, Kiev, Ukrain

arXiv.org e-Print Archive

Crossref

Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime

Author: Dai Dengxin
Van Gool Luc
Publication venue
Publication date: 05/10/2018
Field of study

This work addresses the problem of semantic image segmentation of nighttime scenes. Although considerable progress has been made in semantic image segmentation, it is mainly related to daytime scenarios. This paper proposes a novel method to progressive adapt the semantic models trained on daytime scenes, along with large-scale annotations therein, to nighttime scenes via the bridge of twilight time -- the time between dawn and sunrise, or between sunset and dusk. The goal of the method is to alleviate the cost of human annotation for nighttime images by transferring knowledge from standard daytime conditions. In addition to the method, a new dataset of road scenes is compiled; it consists of 35,000 images ranging from daytime to twilight time and to nighttime. Also, a subset of the nighttime images are densely annotated for method evaluation. Our experiments show that our method is effective for model adaptation from daytime scenes to nighttime scenes, without using extra human annotation.Comment: Accepted to International Conference on Intelligent Transportation Systems (ITSC 2018

arXiv.org e-Print Archive

Repository for Publications and Research Data