Depth estimation is a critical technology in autonomous driving, and
multi-camera systems are often used to achieve 360° perception. These
360° camera sets usually have limited or low-quality overlap regions,
making multi-view stereo methods infeasible for the entire image.
Meanwhile, monocular methods may not produce consistent cross-view
predictions. To address these issues, we propose the Stereo Guided Depth
Estimation (SGDE) method, which enhances depth estimation of the full image by
explicitly utilizing multi-view stereo results in the overlap regions. We suggest
building virtual pinhole cameras to resolve the distortion problem of fisheye
cameras and unify the processing for the two types of 360° cameras. To
handle the varying noise in camera poses caused by unstable movement, we
employ a self-calibration method to obtain highly accurate relative poses
of adjacent cameras with small overlap. These poses enable robust stereo
methods to produce a high-quality depth prior in the overlap region.
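As an illustration, the sketch below shows one plausible way to build a virtual pinhole view from a fisheye image and run classical stereo matching on the rectified overlap to obtain such a depth prior. It uses standard OpenCV routines; the function names, parameter values, and the choice of SGBM matching are assumptions for exposition, not the paper's actual pipeline.

```python
# Minimal sketch of an overlap-region depth prior, assuming OpenCV-style
# fisheye intrinsics K, distortion D, rectifying rotation R, and the virtual
# pinhole projection matrix P. All settings here are illustrative.
import cv2
import numpy as np

def virtual_pinhole_view(img, K, D, R, P, size):
    """Remap a fisheye image onto a virtual pinhole (rectified) camera."""
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(K, D, R, P, size, cv2.CV_32FC1)
    return cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)

def overlap_depth_prior(left_rect, right_rect, fx, baseline):
    """Classical stereo matching on the rectified overlap to get a depth prior."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = fx * baseline / disparity[valid]  # depth = f * B / d
    return depth, valid
```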
This prior serves not only as an additional input but also as pseudo-labels
that enhance the accuracy of depth estimation methods and improve cross-view
prediction consistency. The effectiveness of SGDE is evaluated on one fisheye
camera dataset, Synthetic Urban, and two pinhole camera datasets, DDAD and
nuScenes. Our experiments demonstrate that SGDE is effective for both
supervised and self-supervised depth estimation, and highlight the potential of
our method for advancing downstream autonomous driving technologies, such as 3D
object detection and occupancy prediction.
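The pseudo-label role of the prior mentioned above could, for example, take the form of a masked regression loss applied only where the stereo prior is valid. The sketch below is one possible formulation under that assumption, not the paper's exact objective.

```python
# Hypothetical pseudo-label supervision from the overlap depth prior;
# the loss form and weighting are assumptions, not SGDE's stated objective.
import torch

def pseudo_label_loss(pred_depth, prior_depth, valid_mask, weight=1.0):
    """L1 penalty against the stereo depth prior, restricted to valid pixels."""
    mask = valid_mask.float()
    diff = torch.abs(pred_depth - prior_depth) * mask
    return weight * diff.sum() / mask.sum().clamp(min=1.0)
```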