Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive, and little
labeled data is available, particularly at instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages in Conference on Computer Vision and Pattern Recognition
(CVPR), 201
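The core idea of moving labels from 3D into the image can be illustrated with a minimal sketch. This is not the model from the paper, which the abstract describes only at a high level. It is a plain pinhole projection with a z-buffer, under several assumptions: points are already in camera coordinates with z pointing forward, the bounding primitives are axis-aligned boxes, and all names (`label_point`, `transfer_labels`, the `Box` tuple layout, label id 0 for "unlabeled") are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
# Hypothetical primitive format: (min corner, max corner, label id).
Box = Tuple[Point, Point, int]


def label_point(p: Point, boxes: List[Box]) -> int:
    """Return the label of the first box containing p, or 0 for 'unlabeled'."""
    for lo, hi, label in boxes:
        if all(lo[i] <= p[i] <= hi[i] for i in range(3)):
            return label
    return 0


def transfer_labels(
    points: List[Point],
    boxes: List[Box],
    f: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
) -> Dict[Tuple[int, int], int]:
    """Project labeled 3D points into a sparse 2D label map.

    Assumes a pinhole camera with focal length f and principal point (cx, cy).
    A z-buffer keeps only the nearest point per pixel, so occluded
    points do not overwrite visible ones.
    """
    depth: Dict[Tuple[int, int], float] = {}
    labels: Dict[Tuple[int, int], int] = {}
    for p in points:
        x, y, z = p
        if z <= 0:  # behind the camera
            continue
        u = int(round(f * x / z + cx))
        v = int(round(f * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z
            labels[(u, v)] = label_point(p, boxes)
    return labels


if __name__ == "__main__":
    boxes = [
        ((-0.5, -0.5, 4.0), (0.5, 0.5, 6.0), 1),   # near object
        ((-0.5, -0.5, 9.0), (1.5, 0.5, 11.0), 2),  # far object
    ]
    points = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]
    print(transfer_labels(points, boxes, 100.0, 50.0, 50.0, 100, 100))
```

In this toy setup the first two points fall on the same pixel, and the z-buffer keeps the label of the nearer one. The result is sparse. Producing dense, instance-aware labels requires an additional model on top of such a projection.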
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes is comprised of a large, diverse set of stereo
video sequences recorded in streets from 50 different cities. 5000 of these
images have high quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Training perception systems for self-driving cars requires substantial
annotations. However, manual labeling in 2D images is highly labor-intensive.
While existing datasets provide rich annotations for pre-recorded sequences,
they fall short in labeling rarely encountered viewpoints, potentially
hampering the generalization ability for perception models. In this paper, we
present PanopticNeRF-360, a novel approach that combines coarse 3D annotations
with noisy 2D semantic cues to generate consistent panoptic labels and
high-quality images from any viewpoint. Our key insight lies in exploiting the
complementarity of 3D and 2D priors to mutually enhance geometry and semantics.
Specifically, we propose to leverage noisy semantic and instance labels in both
3D and 2D spaces to guide geometry optimization. Simultaneously, the improved
geometry assists in filtering noise present in the 3D and 2D annotations by
merging them in 3D space via a learned semantic field. To further enhance
appearance, we combine MLP and hash grids to yield hybrid scene features,
striking a balance between high-frequency appearance and predominantly
contiguous semantics. Our experiments demonstrate PanopticNeRF-360's
state-of-the-art performance over existing label transfer methods on the
challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360
enables omnidirectional rendering of high-fidelity, multi-view and
spatiotemporally consistent appearance, semantic and instance labels. We make
our code and data available at https://github.com/fuxiao0719/PanopticNeRF
Comment: Project page: http://fuxiao0719.github.io/projects/panopticnerf360/.
arXiv admin note: text overlap with arXiv:2203.1522