ADU-Depth: Attention-based Distillation with Uncertainty Modeling for Depth Estimation
Monocular depth estimation is challenging due to its inherent ambiguity and
ill-posed nature, yet it is quite important to many applications. While recent
works achieve limited accuracy by designing increasingly complicated networks
to extract features with limited spatial geometric cues from a single RGB
image, we intend to introduce spatial cues by training a teacher network that
leverages left-right image pairs as inputs and transferring the learned 3D
geometry-aware knowledge to the monocular student network. Specifically, we
present a novel knowledge distillation framework, named ADU-Depth, with the
goal of leveraging the well-trained teacher network to guide the learning of
the student network, thus improving the precision of depth estimation with the help
of extra spatial scene information. To enable domain adaptation and ensure
effective and smooth knowledge transfer from teacher to student, we apply both
attention-adapted feature distillation and focal-depth-adapted response
distillation in the training stage. In addition, we explicitly model the
uncertainty of depth estimation to guide distillation in both feature space and
result space to better produce 3D-aware knowledge from monocular observations
and thus enhance the learning for hard-to-predict image regions. Our extensive
experiments on the real depth estimation datasets KITTI and DrivingStereo
demonstrate the effectiveness of the proposed method, which ranked 1st on the
challenging KITTI online benchmark.
Comment: accepted by CoRL 202
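The abstract does not state the loss it uses, so the following is only a minimal sketch of how a predicted uncertainty could weight a teacher-to-student distillation term. It assumes a common heteroscedastic form, exp(-s) * |d_student - d_teacher| + s, where s is a per-pixel log-variance predicted by the student. The function name `uncertainty_weighted_l1` and its arguments are hypothetical and are not taken from ADU-Depth.

```python
import math


def uncertainty_weighted_l1(student, teacher, log_var):
    """Mean of exp(-s) * |student - teacher| + s over all pixels.

    student, teacher: flat sequences of depth values (hypothetical inputs).
    log_var: per-pixel log-variance s predicted alongside the student depth.
    This is an assumed, generic formulation, not the paper's exact loss.
    """
    if not (len(student) == len(teacher) == len(log_var)):
        raise ValueError("all inputs must have the same length")
    if len(student) == 0:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for d_s, d_t, s in zip(student, teacher, log_var):
        # A large s down-weights the residual; the additive s term
        # penalizes claiming high uncertainty everywhere.
        total += math.exp(-s) * abs(d_s - d_t) + s
    return total / len(student)
```

With every log-variance set to zero, this reduces to a plain mean absolute error between student and teacher depths.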
Guided Stereo Matching
Stereo is a prominent technique to infer dense depth maps from images, and
deep learning further pushed forward the state-of-the-art, making end-to-end
architectures unrivaled when enough data is available for training. However,
deep networks suffer from significant drops in accuracy when dealing with new
environments. Therefore, in this paper, we introduce Guided Stereo Matching, a
novel paradigm that leverages a small number of sparse yet reliable depth
measurements retrieved from an external source to mitigate this
weakness. The additional sparse cues required by our method can be obtained
with any strategy (e.g., a LiDAR) and used to enhance features linked to
corresponding disparity hypotheses. Our formulation is general and fully
differentiable, so the additional sparse inputs can be exploited in
pre-trained deep stereo networks as well as when training a new instance from
scratch. Extensive experiments on three standard datasets and two
state-of-the-art deep architectures show that even with a small set of sparse
input cues, i) the proposed paradigm enables significant improvements to
pre-trained networks. Moreover, ii) training from scratch notably increases
accuracy and robustness to domain shifts. Finally, iii) it is applicable and
effective even with traditional stereo algorithms such as SGM.
Comment: CVPR 201
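The abstract says the sparse cues are used to enhance features linked to the corresponding disparity hypotheses, but gives no formula. The sketch below illustrates one plausible way to do that for a single pixel: scale the per-disparity features by a Gaussian centered on the hinted disparity and leave pixels without a hint unchanged. The function name `guide_features`, the peak height `k`, and the width `c` are illustrative assumptions, not values confirmed by the text above.

```python
import math


def guide_features(features, hint, k=10.0, c=1.0):
    """Modulate one pixel's per-disparity features with a sparse depth hint.

    features: sequence indexed by disparity hypothesis d = 0, 1, 2, ...
    hint: hinted disparity for this pixel, or None if no measurement exists.
    k, c: assumed Gaussian height and width (hypothetical defaults).
    """
    if hint is None:
        # No external measurement: pass the features through untouched.
        return list(features)
    guided = []
    for d, f in enumerate(features):
        # The weight peaks at k when d equals the hint and decays with distance.
        weight = k * math.exp(-((d - hint) ** 2) / (2.0 * c ** 2))
        guided.append(f * weight)
    return guided
```

Because the modulation is a smooth multiplicative weight, it stays differentiable with respect to the features, which is consistent with the claim that the formulation can be applied to pre-trained networks or used during training.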