Camera-based 3D object detectors are attractive owing to their broader deployability
and lower cost compared with LiDAR sensors. We revisit the prior stereo detector DSGN,
in particular its stereo volume construction for representing both 3D geometry and
semantics. We refine this stereo modeling and propose our approach, DSGN++,
which aims to improve information flow throughout the 2D-to-3D pipeline in the
following three main aspects. First, to effectively lift 2D information into the
stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser
connections and extracts depth-guided features. Second, to better aggregate
differently spaced features, we present a novel stereo volume -- the Dual-view
Stereo Volume (DSV) -- which integrates front-view and top-view features and
reconstructs sub-voxel depth in the camera frustum. Third, as the foreground
region becomes less dominant in 3D space, we are the first to propose a multi-modal
data-editing strategy -- Stereo-LiDAR Copy-Paste -- which ensures cross-modal
alignment and improves data efficiency. Without bells and whistles, extensive
experiments under various modality setups on the popular KITTI benchmark show that
our method consistently outperforms other camera-based 3D detectors across all
categories. Code will be released at https://github.com/chenyilun95/DSGN2.