SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments
Different environments pose a great challenge to robust outdoor visual
perception for long-term autonomous driving, and the generalization of
learning-based algorithms to different environmental effects is still an open
problem. Although monocular depth prediction has been well studied recently,
few works focus on robust learning-based depth prediction across
different environments, e.g. changing illumination and seasons, owing to the
lack of a multi-environment real-world dataset and benchmark. To this end,
the first cross-season monocular depth prediction dataset and benchmark,
SeasonDepth, is built on the CMU Visual Localization dataset. To benchmark the
depth estimation performance under different environments, we investigate
representative and recent state-of-the-art open-source supervised,
self-supervised and domain adaptation depth prediction methods from the KITTI
benchmark using several newly-formulated metrics. Through extensive
experimental evaluation on the proposed dataset, the influence of multiple
environments on performance and robustness is analyzed qualitatively and
quantitatively, showing that long-term monocular depth prediction is still
challenging even with fine-tuning. We further identify promising avenues:
self-supervised training and stereo geometry constraints help to enhance
robustness to changing environments. The dataset is available at
https://seasondepth.github.io, and the benchmark toolkit is available at
https://github.com/SeasonDepth/SeasonDepth.
Comment: 19 pages, 13 figures
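As a rough illustration of the kind of per-environment evaluation the SeasonDepth abstract describes, the sketch below computes the widely used absolute relative error and summarizes it across environments. The abstract does not define its newly formulated metrics, so the function names, the environment labels, the sample values, and the mean/standard-deviation summary are all assumptions made for illustration, not the benchmark's definitions.

```python
import statistics

def abs_rel_error(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth (gt > 0)."""
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not ratios:
        raise ValueError("no valid ground-truth pixels")
    return sum(ratios) / len(ratios)

def summarize_across_environments(errors_by_env):
    """Return (mean, population std) of per-environment errors.

    A hypothetical summary: the mean reflects average performance and the
    spread reflects how much the error varies between environments.
    """
    values = list(errors_by_env.values())
    return statistics.mean(values), statistics.pstdev(values)

# Hypothetical flattened depth maps; 0 marks missing ground truth.
gt = [1.0, 2.0, 0.0, 4.0]
errors = {
    "sunny": abs_rel_error([1.1, 1.8, 5.0, 4.0], gt),
    "snowy": abs_rel_error([1.5, 2.6, 5.0, 3.0], gt),
}
mean_err, spread = summarize_across_environments(errors)
print(f"mean AbsRel = {mean_err:.3f}, spread across environments = {spread:.3f}")
```

A small spread would indicate that a method degrades little between environments; the toolkit linked above should be consulted for the metrics actually used.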
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
We present a new learning-based method for multi-frame depth estimation from
a color video, which is a fundamental problem in scene understanding, robot
navigation or handheld 3D reconstruction. While recent learning-based methods
estimate depth with high accuracy, 3D point clouds exported from their depth maps
often fail to preserve important geometric features (e.g., corners, edges,
planes) of man-made scenes. Widely-used pixel-wise depth errors do not
specifically penalize inconsistency on these features. These inaccuracies are
particularly severe when subsequent depth reconstructions are accumulated in an
attempt to scan a full environment containing man-made objects with such
features. Our depth estimation algorithm therefore introduces a Combined Normal
Map (CNM) constraint, which is designed to better preserve high-curvature
features and global planar regions. In order to further improve the depth
estimation accuracy, we introduce a new occlusion-aware strategy that
aggregates initial depth predictions from multiple adjacent views into one
final depth map and one occlusion probability map for the current reference
view. Our method outperforms the state of the art in terms of depth estimation
accuracy, and preserves essential geometric features of man-made indoor scenes
much better than other algorithms.
Comment: ECCV 202
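To make concrete why a normal-based constraint can capture geometry that pixel-wise depth errors miss, the sketch below derives approximate surface normals from a depth grid using central differences. It is not the Combined Normal Map constraint from the abstract above: the function name is hypothetical, and the code assumes an orthographic view with unit pixel spacing, ignoring camera intrinsics.

```python
import math

def normals_from_depth(depth):
    """Approximate unit normals for the interior pixels of a depth grid.

    Assumes an orthographic view with unit pixel spacing (camera intrinsics
    are ignored), so this is an illustration rather than a faithful model.
    """
    rows, cols = len(depth), len(depth[0])
    normals = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences of depth along x (columns) and y (rows).
            dz_dx = (depth[r][c + 1] - depth[r][c - 1]) / 2.0
            dz_dy = (depth[r + 1][c] - depth[r - 1][c]) / 2.0
            length = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            normals[(r, c)] = (-dz_dx / length, -dz_dy / length, 1.0 / length)
    return normals

# A tilted plane z = 2x + 1 has the same normal at every interior pixel.
plane = [[2.0 * c + 1.0 for c in range(4)] for _ in range(4)]
plane_normals = normals_from_depth(plane)
```

On a planar region every interior normal is identical, so a loss that compares predicted and reference normals penalizes small depth ripples that bend the surface even when each per-pixel depth error is small.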