CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity
Roadside camera-driven 3D object detection is a crucial task in intelligent
transportation systems, which extends the perception range beyond the
limitations of vision-centric vehicles and enhances road safety. Previous studies are limited to using either depth or height information alone; we find that depth and height both matter and are in fact complementary. The depth
feature encompasses precise geometric cues, whereas the height feature is
primarily focused on distinguishing between various categories of height
intervals, essentially providing semantic context. This insight motivates the
development of Complementary-BEV (CoBEV), a novel end-to-end monocular 3D
object detection framework that integrates depth and height to construct robust
BEV representations. In essence, CoBEV estimates each pixel's depth and height
distribution and lifts the camera features into 3D space for lateral fusion
using the newly proposed two-stage complementary feature selection (CFS)
module. A BEV feature distillation framework is also seamlessly integrated to
further enhance the detection accuracy from the prior knowledge of the
fusion-modal CoBEV teacher. We conduct extensive experiments on the public 3D
detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D, as well as
the private Supremind-Road dataset, demonstrating that CoBEV not only achieves new state-of-the-art accuracy, but is also significantly more robust than previous methods under challenging long-distance scenarios and noisy camera disturbances, and generalizes far better in heterologous settings with drastic changes in scene and camera parameters. For
the first time, the vehicle AP score of a camera model reaches 80% on
DAIR-V2X-I in terms of easy mode. The source code will be made publicly
available at https://github.com/MasterHow/CoBEV.
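To make the lifting idea concrete, here is a minimal PyTorch sketch (not the authors' implementation) of how per-pixel depth and height distributions can lift camera features into two volumes that are then fused. The bin counts, the mean-pooling shortcut standing in for geometric voxel pooling, and the sigmoid gate standing in for the two-stage CFS module are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class DepthHeightLift(nn.Module):
    """Toy stand-in for CoBEV-style depth/height lifting and fusion."""
    def __init__(self, c=64, depth_bins=80, height_bins=10):
        super().__init__()
        self.depth_head = nn.Conv2d(c, depth_bins, 1)    # per-pixel depth distribution
        self.height_head = nn.Conv2d(c, height_bins, 1)  # per-pixel height distribution
        self.gate = nn.Conv2d(2 * c, c, 1)               # crude stand-in for the CFS module

    def forward(self, feat):                              # feat: (B, C, H, W)
        p_depth = self.depth_head(feat).softmax(dim=1)    # (B, D, H, W)
        p_height = self.height_head(feat).softmax(dim=1)  # (B, Z, H, W)
        # Outer product lifts 2D features along each distribution (lift-splat style).
        vol_d = feat.unsqueeze(2) * p_depth.unsqueeze(1)  # (B, C, D, H, W)
        vol_h = feat.unsqueeze(2) * p_height.unsqueeze(1) # (B, C, Z, H, W)
        # The paper voxel-pools such volumes into a shared BEV grid using camera
        # geometry; here we simply reduce over the bin axis to keep shapes aligned.
        bev_d = vol_d.mean(dim=2)                         # (B, C, H, W)
        bev_h = vol_h.mean(dim=2)                         # (B, C, H, W)
        gate = torch.sigmoid(self.gate(torch.cat([bev_d, bev_h], dim=1)))
        return gate * bev_d + (1 - gate) * bev_h          # gated complementary fusion

feat = torch.randn(2, 64, 32, 88)
fused = DepthHeightLift()(feat)                           # (2, 64, 32, 88)
```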
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification
We introduce MultiDepth, a novel training strategy and convolutional neural
network (CNN) architecture that allows approaching single-image depth
estimation (SIDE) as a multi-task problem. SIDE is an important part of road
scene understanding. It thus plays a vital role in advanced driver assistance
systems and autonomous vehicles. Best results for the SIDE task so far have
been achieved using deep CNNs. However, optimization of regression problems,
such as estimating depth, is still a challenging task. For the related tasks of
image classification and semantic segmentation, numerous CNN-based methods with
robust training behavior have been proposed. Hence, in order to overcome the
notorious instability and slow convergence of depth value regression during
training, MultiDepth makes use of depth interval classification as an auxiliary
task. The auxiliary task can be disabled at test-time to predict continuous
depth values using the main regression branch more efficiently. We applied
MultiDepth to road scenes and present results on the KITTI depth prediction
dataset. In experiments, we show that end-to-end multi-task learning with both regression and classification considerably improves training and yields more accurate results.
Comment: Accepted for presentation at the IEEE Intelligent Transportation Systems Conference (ITSC) 201
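As a rough illustration of the multi-task idea, the following sketch combines a main depth-regression loss with an auxiliary depth-interval classification loss; the auxiliary head is simply not evaluated at test time. The log-spaced binning, L1 regression loss, and weighting `alpha` are assumptions rather than the paper's exact choices:

```python
import math
import torch
import torch.nn.functional as F

def depth_to_bins(depth, n_bins=32, d_min=1.0, d_max=80.0):
    # Discretize continuous depth into interval labels; log-spaced bins are a
    # common choice, and this exact binning is an assumption, not the paper's.
    log_d = torch.log(depth.clamp(d_min, d_max))
    t = (log_d - math.log(d_min)) / (math.log(d_max) - math.log(d_min))
    return (t * (n_bins - 1)).round().long()

def multitask_loss(pred_depth, pred_logits, gt_depth, alpha=0.5):
    # pred_depth: (B, 1, H, W) regression output (main branch, kept at test time)
    # pred_logits: (B, K, H, W) auxiliary classification logits (K must equal n_bins;
    #              this branch is disabled at test time)
    reg = F.l1_loss(pred_depth, gt_depth)              # main regression loss
    bins = depth_to_bins(gt_depth.squeeze(1))          # (B, H, W) interval labels
    cls = F.cross_entropy(pred_logits, bins)           # auxiliary classification loss
    return reg + alpha * cls                           # alpha is an assumed weighting
```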
NENet: Monocular Depth Estimation via Neural Ensembles
Depth estimation is gaining widespread popularity in the computer vision community, yet it remains quite difficult to recover an accurate depth map from a single RGB image. In this work, we observe that
existing methods tend to exhibit asymmetric errors, which might open up a new
direction for accurate and robust depth estimation. We carefully investigate this phenomenon and construct a two-level ensemble scheme, NENet, to
integrate multiple predictions from diverse base predictors. The NENet forms a
more reliable depth estimator, which substantially boosts the performance over
base predictors. Notably, to the best of our knowledge, this is the first attempt to introduce ensemble learning and evaluate its utility for monocular depth estimation. Extensive experiments demonstrate that the proposed NENet
achieves better results than previous state-of-the-art approaches on the
NYU-Depth-v2 and KITTI datasets. In particular, our method improves the RMSE of the previous state of the art on the NYU dataset from 0.365 to 0.349. To validate generalizability across cameras, we directly apply the models trained on the NYU dataset to the SUN RGB-D dataset without any fine-tuning and achieve superior results, indicating strong generalizability. The source code and trained models will be publicly available upon acceptance.
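The abstract does not spell out the ensemble rules, so the sketch below only illustrates the general shape of a two-level ensemble over depth maps: predictions are first combined within small groups, and the group outputs are then combined robustly. The mean/median operators and the group size are assumptions, not NENet's actual scheme:

```python
import torch

def two_level_ensemble(preds, group_size=2):
    # preds: list of (B, 1, H, W) depth maps from diverse base predictors.
    stack = torch.stack(preds)                         # (N, B, 1, H, W)
    # Level 1: average within small groups of predictors.
    groups = stack.split(group_size, dim=0)
    level1 = torch.stack([g.mean(dim=0) for g in groups])
    # Level 2: a robust (median) combination across group outputs, which can
    # cancel base predictors' asymmetric over- and under-estimation errors.
    return level1.median(dim=0).values

preds = [torch.rand(2, 1, 48, 64) + 1.0 for _ in range(4)]
depth = two_level_ensemble(preds)                      # (2, 1, 48, 64)
```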
NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion
Over the past few years, monocular depth estimation and completion have received increasing attention from the computer vision community because of their widespread applications. In this paper, we introduce novel physics
(geometry)-driven deep learning frameworks for these two tasks by assuming that 3D scenes are composed of piece-wise planes. Instead of directly
estimating the depth map or completing the sparse depth map, we propose to
estimate the surface normal and plane-to-origin distance maps or complete the
sparse surface normal and distance maps as intermediate outputs. To this end,
we develop a normal-distance head that outputs pixel-level surface normal and
distance. Meanwhile, the surface normal and distance maps are regularized by a plane-aware consistency constraint and are then transformed into depth maps. Furthermore, we integrate an additional depth head to strengthen
the robustness of the proposed frameworks. Extensive experiments on the
NYU-Depth-v2, KITTI and SUN RGB-D datasets demonstrate that our method outperforms prior state-of-the-art monocular depth estimation and completion methods. The source code will be available at https://github.com/ShuweiShao/NDDepth.
Comment: Extension of previous work arXiv:2309.1059
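The normal-distance parameterization rests on standard plane geometry: a pixel p on a plane n·X = d in camera coordinates, with X = z K^{-1} p in homogeneous pixel coordinates, has depth z = d / (n · K^{-1} p). A minimal NumPy sketch of this conversion (sign conventions and the denominator clamp are simplifications):

```python
import numpy as np

def depth_from_normal_distance(normal, dist, K):
    # normal: (H, W, 3) unit surface normals in camera coordinates
    # dist:   (H, W) plane-to-origin distances; K: (3, 3) camera intrinsics
    H, W = dist.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                    # back-projected rays K^-1 p
    denom = (normal * rays).sum(axis=-1)               # n . (K^-1 p) per pixel
    # Clamp |denominator| to avoid division blow-ups near grazing angles;
    # taking abs() sidesteps normal-orientation sign conventions.
    return dist / np.clip(np.abs(denom), 1e-6, None)   # depth along each ray

# Fronto-parallel plane 5 m from the camera recovers ~5 m depth everywhere.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
normal = np.zeros((480, 640, 3)); normal[..., 2] = -1.0
depth = depth_from_normal_distance(normal, np.full((480, 640), 5.0), K)
```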