The Second Monocular Depth Estimation Challenge
This paper discusses the results for the second edition of the Monocular
Depth Estimation Challenge (MDEC). This edition was open to methods using any
form of supervision, including fully-supervised, self-supervised, multi-task or
proxy depth. The challenge was based around the SYNS-Patches dataset, which
features a wide diversity of environments with high-quality dense ground-truth.
This includes complex natural environments, e.g. forests or fields, which are
greatly underrepresented in current benchmarks.
The challenge received eight unique submissions that outperformed the
provided SotA baseline on at least one of the pointcloud- or image-based
metrics. The
top supervised submission improved relative F-Score by 27.62%, while the top
self-supervised improved it by 16.61%. Supervised submissions generally
leveraged large collections of datasets to improve data diversity.
Self-supervised submissions instead updated the network architecture and
pretrained backbones. These results represent significant progress in the
field, while highlighting avenues for future research, such as reducing
interpolation artifacts at depth boundaries, improving self-supervised indoor
performance, and improving overall accuracy on natural images.
Comment: Published at CVPRW202
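The challenge ranks methods by a point-cloud F-score, the harmonic mean of precision and recall under a distance threshold. A minimal brute-force sketch of such a metric is below; the function name, threshold value, and nearest-neighbour strategy are illustrative assumptions, not the official MDEC evaluation code:

```python
import numpy as np

def fscore(pred, gt, threshold=0.1):
    """Point-cloud F-score (illustrative sketch, not the official metric).

    A predicted point counts as correct if its nearest ground-truth point
    lies within `threshold`, and vice versa for recall.
    """
    # Pairwise Euclidean distances, shape (N_pred, N_gt).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (d.min(axis=1) < threshold).mean()  # pred -> gt
    recall = (d.min(axis=0) < threshold).mean()     # gt -> pred
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A relative improvement such as the 27.62% quoted above would then be computed between a submission's F-score and the baseline's on the same point clouds. For real point clouds, a KD-tree would replace the quadratic distance matrix.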
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo
video sequences recorded in the streets of 50 different cities. 5000 of these
images have high-quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
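Pixel-level semantic labeling of the kind Cityscapes benchmarks is commonly scored by per-class intersection-over-union between predicted and annotated label maps. A minimal sketch of that computation follows; it is an assumption for illustration, not the official Cityscapes evaluation scripts:

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Per-class IoU from dense label maps (illustrative sketch).

    Returns an array of length `num_classes`; classes absent from both
    maps get NaN so they can be excluded from the mean.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)
```

Averaging the finite entries gives the mean IoU typically reported on such benchmarks.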
SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes
Self-supervised monocular depth estimation has shown impressive results in
static scenes. It relies on the multi-view consistency assumption to train
networks; however, this assumption is violated in dynamic object regions and
under occlusion. Consequently, existing methods show poor accuracy in dynamic
scenes, and the estimated depth maps are blurred at object boundaries, which
are usually occluded in other training views. In this paper, we propose
SC-DepthV3 to address these challenges. Specifically, we introduce an external
pretrained monocular depth estimation model to generate a single-image depth prior,
namely pseudo-depth, based on which we propose novel losses to boost
self-supervised training. As a result, our model can predict sharp and accurate
depth maps, even when training from monocular videos of highly-dynamic scenes.
We demonstrate the significantly superior performance of our method over
previous methods on six challenging datasets, and we provide detailed ablation
studies for the proposed terms. Source code and data will be released at
https://github.com/JiawangBian/sc_depth_pl
Comment: Under Review; the code will be available at
https://github.com/JiawangBian/sc_depth_pl
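One way to use a pseudo-depth prior in self-supervised training, as the abstract describes, is to penalise disagreement between the network's depth and the prior after removing the unknown monocular scale. The sketch below shows a simple median-normalised L1 consistency term; the function name and normalisation choice are illustrative assumptions, not the exact losses proposed in SC-DepthV3:

```python
import numpy as np

def pseudo_depth_loss(pred_depth, pseudo_depth, eps=1e-6):
    """Scale-invariant consistency with a pseudo-depth prior (sketch).

    Both depth maps are median-normalised before comparison, since a
    monocular prior is only defined up to an unknown global scale.
    """
    pred_n = pred_depth / (np.median(pred_depth) + eps)
    pseudo_n = pseudo_depth / (np.median(pseudo_depth) + eps)
    return np.abs(pred_n - pseudo_n).mean()
```

Because the normalisation cancels any global scale factor, the loss stays near zero when the prediction matches the prior up to scale, and grows where the two structures disagree, e.g. around dynamic objects.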