The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo
video sequences recorded in streets from 50 different cities. 5000 images from
these sequences have high-quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
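Coarse annotations, such as those on the 20000 weakly-labeled Cityscapes images, leave many pixels unlabeled. One common way to use such data is to compute the training loss only over labeled pixels. The sketch below is illustrative only and is not part of the Cityscapes release: it computes a per-pixel cross-entropy that skips unlabeled pixels, assuming they carry the value 255 (a widespread convention in Cityscapes training pipelines; verify it against your own label mapping). The function name `masked_cross_entropy` is hypothetical.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed value for unlabeled ("void") pixels


def masked_cross_entropy(logits: np.ndarray, labels: np.ndarray,
                         ignore_label: int = IGNORE_LABEL) -> float:
    """Mean per-pixel cross-entropy computed over labeled pixels only.

    logits: (C, H, W) unnormalized class scores.
    labels: (H, W) integer class ids; ignore_label marks unlabeled pixels.
    """
    valid = labels != ignore_label
    if not valid.any():
        return 0.0  # nothing labeled in this image
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the true class at every labeled pixel.
    rows, cols = np.nonzero(valid)
    picked = log_probs[labels[rows, cols], rows, cols]
    return float(-picked.mean())


if __name__ == "__main__":
    scores = np.zeros((2, 2, 2))             # two classes, 2x2 image, uniform scores
    coarse = np.array([[0, 1], [255, 255]])  # bottom row left unlabeled
    print(masked_cross_entropy(scores, coarse))  # log(2), about 0.693
```

Because unlabeled pixels are simply excluded, a coarsely annotated image contributes gradient signal only where an annotator drew a region. Fine and coarse images can therefore share the same loss.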
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic
information on street view images for autonomous driving. Recently, stixels,
stix-mantics, and tiered scene labeling methods have been proposed to model
street view images. We propose a 4-layer street view model, a compact
representation over the recently proposed stix-mantics model. Our layers encode
semantic classes like ground, pedestrians, vehicles, buildings, and sky in
addition to the depths. The only input to our algorithm is a pair of stereo
images. We use a deep neural network to extract the appearance features for
semantic classes. We use a simple and efficient inference algorithm to
jointly estimate both semantic classes and layered depth values. Our method
outperforms competing approaches on the Daimler urban scene segmentation
dataset. Our algorithm is massively parallelizable, allowing a GPU
implementation that runs at about 9 fps.
Comment: The paper will be presented at the 2015 Robotics: Science and Systems
Conference (RSS)
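As a rough illustration of why a layered model is compact, the sketch below makes an assumption that is not taken from the paper. It assumes each image column is split by three boundary rows into four vertically stacked layers: sky, building, object (pedestrian or vehicle) and ground. Under this assumption, a column of any height is stored as three integers instead of one label per pixel. Depth could then be obtained from the stereo pair through the standard rectified-stereo relation Z = f * B / d. The names `ColumnLayers` and `disparity_to_depth` and the layer ordering are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnLayers:
    """Hypothetical compact encoding of one image column.

    Three row indices split the column, top to bottom, into four layers:
    sky, building, object (pedestrian or vehicle) and ground.
    """
    sky_end: int                # first row below the sky layer
    building_end: int           # first row below the building layer
    object_end: int             # first row below the object layer
    object_label: str = "vehicle"

    def expand(self, height: int) -> List[str]:
        """Recover a per-pixel label list for a column of `height` rows."""
        if not 0 <= self.sky_end <= self.building_end <= self.object_end <= height:
            raise ValueError("need 0 <= sky_end <= building_end <= object_end <= height")
        return (["sky"] * self.sky_end
                + ["building"] * (self.building_end - self.sky_end)
                + [self.object_label] * (self.object_end - self.building_end)
                + ["ground"] * (height - self.object_end))


def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Rectified-stereo relation Z = f * B / d (focal length in pixels, baseline in meters)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    column = ColumnLayers(sky_end=2, building_end=4, object_end=5, object_label="pedestrian")
    print(column.expand(7))
    # ['sky', 'sky', 'building', 'building', 'pedestrian', 'ground', 'ground']
    print(disparity_to_depth(disparity_px=10.0, focal_px=1000.0, baseline_m=0.5))  # 50.0
```

The ordering constraint checked in `expand` is what keeps the representation small. Each column can contain each layer at most once, so inference only has to search over a few boundary positions per column. Columns are independent, which is consistent with the parallelizability the abstract describes.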
Impact of Ground Truth Annotation Quality on Performance of Semantic Image Segmentation of Traffic Conditions
Preparing high-quality datasets for urban scene understanding is
labor-intensive, especially for datasets designed for autonomous driving
applications. Using coarse ground truth (GT) annotations of these datasets
without degrading semantic image segmentation accuracy (measured by mean
intersection over union, mIoU) could simplify and speed up dataset preparation
and model fine-tuning before practical deployment. Here we present a
comparative analysis of the semantic segmentation accuracy obtained with the
PSPNet deep learning architecture on finely and coarsely annotated images from
the Cityscapes dataset. Two scenarios were investigated: scenario 1, fine GT
images for both training and prediction; and scenario 2, fine GT images for
training and coarse GT images for prediction. The results demonstrate that,
for the most important classes, the mean segmentation accuracy is higher for
coarse GT annotations than for fine GT annotations, while the standard
deviation is lower. This means that, for some applications, unimportant
classes can be excluded and the model can be further tuned for selected
classes and specific regions on the coarse GT dataset without any loss of
accuracy. Moreover, this opens the prospect of using deep neural networks to
prepare such coarse GT datasets.
Comment: 10 pages, 6 figures, 2 tables, The Second International Conference on
Computer Science, Engineering and Education Applications (ICCSEEA2019), 26-27
January 2019, Kiev, Ukraine
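The mIoU metric mentioned above can be computed from a per-class confusion matrix. For each class c, IoU_c = TP / (TP + FP + FN), and mIoU averages IoU_c over the classes. The following is a minimal sketch, not the authors' evaluation code. It assumes integer label maps, the value 255 for unlabeled pixels, and confusion-matrix rows indexed by ground truth. The hypothetical helper `per_class_stats` shows one way to obtain the per-class mean and standard deviation over a set of images, the two statistics compared between the scenarios.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed value for unlabeled pixels


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int,
                     ignore_label: int = IGNORE_LABEL) -> np.ndarray:
    """conf[i, j] counts pixels with ground-truth class i predicted as class j."""
    valid = gt != ignore_label
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(conf: np.ndarray) -> np.ndarray:
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both maps."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as c, but another class in GT
    fn = conf.sum(axis=1) - tp  # class c in GT, predicted as something else
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)


def mean_iou(conf: np.ndarray) -> float:
    """mIoU: average IoU over the classes that actually occur."""
    return float(np.nanmean(per_class_iou(conf)))


def per_class_stats(per_image_iou) -> tuple:
    """Per-class mean and standard deviation over images (rows = images)."""
    arr = np.asarray(per_image_iou, dtype=float)
    return np.nanmean(arr, axis=0), np.nanstd(arr, axis=0)


if __name__ == "__main__":
    gt = np.array([[0, 0], [1, 255]])
    pred = np.array([[0, 1], [1, 0]])
    conf = confusion_matrix(gt, pred, num_classes=2)
    print(conf.tolist())   # [[1, 1], [0, 1]]
    print(mean_iou(conf))  # 0.5
```

Classes that appear in neither map get NaN rather than 0, so they do not drag the average down. Whether a given benchmark excludes them in the same way is an evaluation-protocol choice worth checking.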