The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo
video sequences recorded in the streets of 50 different cities. 5000 of these
images have high-quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
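
For readers who want to experiment with the released data, the following is a minimal loading sketch using the torchvision Cityscapes wrapper; the root path ./cityscapes is an assumed local layout, and it requires the official image and annotation packages to be downloaded and extracted there beforehand.

from torchvision.datasets import Cityscapes

# Fine pixel-level annotations (5000 images): train / val / test splits.
fine = Cityscapes("./cityscapes", split="train", mode="fine",
                  target_type=["semantic", "instance"])

# Coarse annotations (20000 additional images) for weakly-supervised methods.
coarse = Cityscapes("./cityscapes", split="train_extra", mode="coarse",
                    target_type="semantic")

image, (semantic_mask, instance_mask) = fine[0]  # PIL images by default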
Automatic Image Segmentation by Dynamic Region Merging
This paper addresses the automatic image segmentation problem in a region
merging style. With an initially over-segmented image, in which many
regions (or super-pixels) with homogeneous color are detected, image
segmentation is performed by iteratively merging the regions according to a
statistical test. There are two essential issues in a region merging algorithm:
the order of merging and the stopping criterion. In the proposed algorithm, these
two issues are solved by a novel predicate, which is defined by the sequential
probability ratio test (SPRT) and the maximum likelihood criterion. Starting
from an over-segmented image, neighboring regions are progressively merged if
there is evidence for merging according to this predicate. We show that the
merging order follows the principle of dynamic programming. This formulates
image segmentation as an inference problem, where the final segmentation is
established based on the observed image. We also prove that the produced
segmentation satisfies certain global properties. In addition, a faster
algorithm is developed to accelerate the region merging process, which
maintains a nearest neighbor graph in each iteration. Experiments on real
natural images are conducted to demonstrate the performance of the proposed
dynamic region merging algorithm.
Comment: 28 pages. This paper is under review in IEEE TI
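
To make the overall loop concrete, below is a minimal, illustrative sketch of greedy region merging on an over-segmented image. It uses a simple mean-color distance threshold as a stand-in for the paper's SPRT-based predicate, and it recomputes all candidate pairs in each iteration rather than maintaining the nearest neighbor graph of the faster algorithm; the function name, threshold value, and data layout are assumptions for illustration only.

import numpy as np

def merge_regions(labels, image, tau=20.0):
    # labels: (H, W) integer map of initial super-pixel ids
    # image:  (H, W, 3) color image
    # tau:    threshold on mean-color distance (stand-in for the SPRT predicate)
    labels = labels.copy()
    ids = np.unique(labels)
    # Per-region mean color and pixel count.
    stats = {int(i): (image[labels == i].mean(axis=0), int((labels == i).sum()))
             for i in ids}
    # Region adjacency from 4-connected pixel pairs with different labels.
    adj = {int(i): set() for i in ids}
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        diff = a != b
        for p, q in zip(a[diff], b[diff]):
            adj[int(p)].add(int(q))
            adj[int(q)].add(int(p))

    def color_dist(p, q):
        return float(np.linalg.norm(stats[p][0] - stats[q][0]))

    while True:
        # Merging order: always pick the currently most similar adjacent pair.
        pairs = [(color_dist(p, q), p, q) for p in adj for q in adj[p] if p < q]
        if not pairs:
            break
        d, p, q = min(pairs)
        if d > tau:            # stopping criterion: no pair passes the predicate
            break
        # Merge region q into region p; update labels, statistics, adjacency.
        labels[labels == q] = p
        (cp, n_p), (cq, n_q) = stats[p], stats[q]
        stats[p] = ((cp * n_p + cq * n_q) / (n_p + n_q), n_p + n_q)
        del stats[q]
        neighbors_of_q = adj.pop(q)
        adj[p] |= neighbors_of_q - {p}
        adj[p].discard(q)
        for r in neighbors_of_q:
            if r != p:
                adj[r].discard(q)
                adj[r].add(p)
    return labels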
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive, and only a small
amount of labeled data is available, particularly at the instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages in Conference on Computer Vision and Pattern Recognition (CVPR), 201
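
The geometric core of transferring 3D annotations into the image domain is projecting the vertices of a 3D bounding primitive through the camera. The sketch below shows only this pinhole projection step, not the paper's full transfer model; the intrinsics, pose, and box coordinates are made-up values for illustration.

import numpy as np

def project_primitive(corners_world, K, R, t):
    # corners_world: (N, 3) vertices of a 3D bounding primitive, world coordinates
    # K: (3, 3) camera intrinsics; R: (3, 3), t: (3,) world-to-camera pose
    cam = corners_world @ R.T + t        # world frame -> camera frame
    in_front = cam[:, 2] > 0             # keep vertices in front of the camera
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]       # perspective divide to pixel coordinates
    return pix, in_front

# Hypothetical example: a 2 m x 2 m x 1.5 m box roughly 10 m ahead of the camera.
box = np.array([[x, y, z] for x in (-1.0, 1.0) for y in (-1.0, 1.0)
                for z in (9.25, 10.75)])
K = np.array([[1000.0, 0.0, 960.0],      # generic intrinsics, not from the paper
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
pixels, visible = project_primitive(box, K, np.eye(3), np.zeros(3))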