
    RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving

    Although recent image-based 3D object detection methods using the Pseudo-LiDAR representation have shown great capability, a notable gap in efficiency and accuracy still exists compared with LiDAR-based methods. Moreover, over-reliance on a stand-alone depth estimator, which requires a large number of pixel-wise annotations during training and extra computation at inference, limits real-world deployment at scale. In this paper, we propose an efficient and accurate 3D object detection method from stereo images, named RTS3D. Unlike the 3D occupancy space used in Pseudo-LiDAR-like methods, we design a novel 4D feature-consistency embedding (FCE) space as the intermediate representation of the 3D scene, without depth supervision. The FCE space encodes the object's structural and semantic information by exploiting the consistency of multi-scale features warped from the stereo pair. Furthermore, a semantic-guided RBF (Radial Basis Function) and a structure-aware attention module are devised to reduce the influence of noise in the FCE space, without instance-mask supervision. Experiments on the KITTI benchmark show that RTS3D is the first true real-time system (FPS > 24) for stereo-image 3D detection, while achieving a 10% improvement in average precision over the previous state-of-the-art method. The code will be available at https://github.com/Banconxuan/RTS3D
    Comment: 9 pages, 6 figures
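    A minimal sketch of the core idea the abstract describes: sample left/right features at candidate 3D points and weight the paired features by an RBF consistency score, so off-surface points are suppressed without depth supervision. This is an illustrative reading, not the authors' implementation; the projection setup, the sampling helpers, and the `sigma` bandwidth are assumptions.

```python
import torch
import torch.nn.functional as F

def project(points, P):
    """Project Nx3 camera-space points with a 3x4 matrix to pixel coords."""
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # Nx4
    uvw = homo @ P.T                                                   # Nx3
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                    # Nx2

def sample_features(feat, uv, img_size):
    """Bilinearly sample CxHxW features at pixel coordinates uv (Nx2)."""
    h, w = img_size
    grid = uv.clone()
    grid[:, 0] = 2 * uv[:, 0] / (w - 1) - 1   # normalise to [-1, 1]
    grid[:, 1] = 2 * uv[:, 1] / (h - 1) - 1
    grid = grid.view(1, 1, -1, 2)
    out = F.grid_sample(feat[None], grid, align_corners=True)
    return out[0, :, 0].T                      # NxC

def fce_embedding(feat_l, feat_r, points, P_l, P_r, img_size, sigma=1.0):
    """Pair left/right features at each 3D point; an RBF over their
    difference down-weights points whose warped features disagree."""
    f_l = sample_features(feat_l, project(points, P_l), img_size)
    f_r = sample_features(feat_r, project(points, P_r), img_size)
    consistency = torch.exp(-((f_l - f_r) ** 2).sum(1) / (2 * sigma ** 2))
    return torch.cat([f_l, f_r], dim=1) * consistency[:, None]
```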

    Guided Stereo Matching

    Stereo is a prominent technique to infer dense depth maps from images, and deep learning has pushed the state of the art further forward, making end-to-end architectures unrivaled when enough training data is available. However, deep networks suffer significant drops in accuracy when dealing with new environments. In this paper, we therefore introduce Guided Stereo Matching, a novel paradigm that mitigates this weakness by leveraging a small amount of sparse yet reliable depth measurements retrieved from an external source. The additional sparse cues required by our method can be obtained with any strategy (e.g., a LiDAR) and are used to enhance the features linked to the corresponding disparity hypotheses. Our formulation is general and fully differentiable, allowing the additional sparse inputs to be exploited both in pre-trained deep stereo networks and when training a new instance from scratch. Extensive experiments on three standard datasets and two state-of-the-art deep architectures show that, even with a small set of sparse input cues, i) the proposed paradigm enables significant improvements to pre-trained networks; ii) training from scratch notably increases accuracy and robustness to domain shifts; and iii) the paradigm is suitable and effective even with traditional stereo algorithms such as SGM.
    Comment: CVPR 2019
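    A minimal sketch of the enhancement the abstract describes: where a sparse depth hint exists, the features along the disparity dimension of a cost volume are amplified by a Gaussian centred on the hinted disparity, leaving unhinted pixels untouched. Tensor shapes and the gain/width constants `k` and `c` are illustrative assumptions, not the paper's exact values.

```python
import torch

def guide_cost_volume(volume, hints, valid, k=10.0, c=1.0):
    """volume: BxCxDxHxW cost-volume features.
    hints:  Bx1xHxW sparse disparity measurements (e.g. from LiDAR).
    valid:  Bx1xHxW mask, 1 where a hint is available."""
    b, _, d, h, w = volume.shape
    disp = torch.arange(d, device=volume.device, dtype=volume.dtype)
    disp = disp.view(1, 1, d, 1, 1)     # candidate disparity hypotheses
    hints = hints.unsqueeze(2)          # Bx1x1xHxW, broadcasts over D
    valid = valid.unsqueeze(2)
    gauss = k * torch.exp(-((disp - hints) ** 2) / (2 * c ** 2))
    # leave features unchanged where no hint exists; scale where one does
    return volume * ((1 - valid) + valid * gauss)
```

    Because the modulation is a differentiable elementwise operation, it can be inserted into a pre-trained network at test time or kept in place while training from scratch, which is what makes both usages in the abstract possible.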

    SGM3D: Stereo Guided Monocular 3D Object Detection

    Monocular 3D object detection aims to predict an object's location, dimensions, and orientation in 3D space, along with its category, given only a monocular image. It poses a great challenge because the task is ill-posed: depth information is critically lacking in the 2D image plane. While there exist approaches that leverage off-the-shelf depth estimation or rely on LiDAR sensors to mitigate this problem, the dependence on an additional depth model or expensive equipment severely limits their scalability to generic 3D perception. In this paper, we propose a stereo-guided monocular 3D object detection framework, dubbed SGM3D, which adapts the robust 3D features learned from stereo inputs to enhance the features used for monocular detection. We present a multi-granularity domain adaptation (MG-DA) mechanism that exploits the network's ability to generate stereo-mimicking features given only monocular cues; both coarse BEV feature-level and fine anchor-level domain adaptation are leveraged for guidance in the monocular domain. In addition, we introduce an IoU matching-based alignment (IoU-MA) method for object-level domain adaptation between the stereo and monocular predictions, alleviating mismatches that arise when adopting MG-DA. Extensive experiments demonstrate state-of-the-art results on the KITTI and Lyft datasets.
    Comment: 8 pages, 5 figures
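    A minimal sketch of the two guidance signals the abstract names: a feature-level loss pulling monocular BEV features toward the (detached) stereo-teacher features, and an IoU-based matching that pairs stereo and monocular box predictions before an object-level loss is applied. This is an illustrative reading of MG-DA/IoU-MA, not the authors' code; `iou3d` is a placeholder for a real 3D-IoU routine, and the threshold is an assumption.

```python
import torch
import torch.nn.functional as F

def bev_adaptation_loss(mono_bev, stereo_bev):
    """Coarse feature-level guidance: push monocular BEV features toward
    the stereo features, which act as a fixed teacher (hence detach)."""
    return F.mse_loss(mono_bev, stereo_bev.detach())

def iou_match(mono_boxes, stereo_boxes, iou3d, thresh=0.5):
    """Pair each monocular box with the stereo box of highest 3D IoU,
    keeping only pairs above `thresh`, so object-level losses compare
    predictions that actually refer to the same object."""
    ious = iou3d(mono_boxes, stereo_boxes)   # NxM IoU matrix
    best_iou, best_idx = ious.max(dim=1)
    keep = best_iou > thresh
    return mono_boxes[keep], stereo_boxes[best_idx[keep]]
```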