Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation
Monocular 3D object detection task aims to predict the 3D bounding boxes of
objects from monocular RGB images. Since recovering object locations in 3D
space is difficult owing to the absence of depth information, this paper
proposes a novel unified framework that decomposes the detection problem into
a structured polygon prediction task and a depth recovery task. Different from
the widely studied 2D bounding boxes, the proposed novel structured polygon in
the 2D image consists of several projected surfaces of the target object.
Compared to the widely-used 3D bounding box proposals, it is shown to be a
better representation for 3D detection. In order to inversely project the
predicted 2D structured polygon to a cuboid in the 3D physical world, the
following depth recovery task uses an object-height prior to complete the
inverse projection transformation with the given camera projection matrix.
Moreover, a fine-grained 3D box refinement scheme is proposed to further
rectify the 3D detection results. Experiments are conducted on the challenging
KITTI benchmark, in which our method achieves state-of-the-art detection
accuracy.
Comment: 11 pages, 8 figures, AAAI 2020
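The height-guided depth recovery described above follows from similar triangles in the standard pinhole camera model: given a physical height prior H (meters), the projected height h (pixels) of the object in the image, and the vertical focal length f_y from the camera projection matrix, depth is z = f_y * H / h. A minimal sketch, with illustrative numbers that are not taken from the paper:

```python
def height_guided_depth(f_y: float, height_prior_m: float,
                        pixel_height: float) -> float:
    """Recover object depth (meters) from a physical height prior via
    similar triangles in the pinhole camera model: z = f_y * H / h."""
    if pixel_height <= 0:
        raise ValueError("projected height must be positive")
    return f_y * height_prior_m / pixel_height

# Illustrative values only: a KITTI-like focal length and a ~1.5 m tall
# car whose projection spans 45 pixels.
depth = height_guided_depth(f_y=721.5, height_prior_m=1.5, pixel_height=45.0)
```

With these assumed values the car would be recovered at roughly 24 m, which shows why the height prior alone pins down the otherwise ambiguous depth coordinate.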
M3DSSD: Monocular 3D Single Stage Object Detector
In this paper, we propose a Monocular 3D Single Stage object Detector
(M3DSSD) with feature alignment and asymmetric non-local attention. Current
anchor-based monocular 3D object detection methods suffer from feature
mismatching. To overcome this, we propose a two-step feature alignment
approach. In the first step, the shape alignment is performed to enable the
receptive field of the feature map to focus on the pre-defined anchors with
high confidence scores. In the second step, the center alignment is used to
align the features at 2D/3D centers. Further, it is often difficult to learn
global information and capture long-range relationships, which are important
for the depth prediction of objects. Therefore, we propose a novel asymmetric
non-local attention block with multi-scale sampling to extract depth-wise
features. The proposed M3DSSD achieves significantly better performance than
existing monocular 3D object detection methods on the KITTI dataset, in both
the 3D object detection and bird's eye view tasks.
Comment: Accepted to CVPR 2021
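The asymmetric non-local block with multi-scale sampling is specific to the paper; as background for the long-range-relationship argument above, a plain non-local (self-attention) operation over flattened features can be sketched as follows. All weight matrices and shapes here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def nonlocal_attention(x: np.ndarray, w_theta: np.ndarray,
                       w_phi: np.ndarray, w_g: np.ndarray) -> np.ndarray:
    """Generic non-local block over an (N, C) set of spatial features:
    each output is a weighted sum over ALL positions, so the effective
    receptive field is global, capturing long-range relationships."""
    theta = x @ w_theta                 # queries, shape (N, d)
    phi = x @ w_phi                     # keys,    shape (N, d)
    g = x @ w_g                         # values,  shape (N, C)
    logits = theta @ phi.T              # pairwise affinities (N, N)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return x + attn @ g                 # residual connection
```

The paper's variant makes this block asymmetric and samples keys/values at multiple scales; the sketch above only illustrates the basic global-aggregation mechanism.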
Rethinking Pseudo-LiDAR Representation
The recently proposed pseudo-LiDAR based 3D detectors greatly improve the
benchmark results of the monocular/stereo 3D detection task. However, the underlying
mechanism remains obscure to the research community. In this paper, we perform
an in-depth investigation and observe that the efficacy of pseudo-LiDAR
representation comes from the coordinate transformation, instead of data
representation itself. Based on this observation, we design an image based CNN
detector named PatchNet, which is more generalized and can be instantiated as
pseudo-LiDAR based 3D detectors. Moreover, the pseudo-LiDAR data in our
PatchNet is organized as the image representation, which means existing 2D CNN
designs can be easily utilized for extracting deep features from input data and
boosting 3D detection performance. We conduct extensive experiments on the
challenging KITTI dataset, where the proposed PatchNet outperforms all existing
pseudo-LiDAR based counterparts. Code has been made available at:
https://github.com/xinzhuma/patchnet.
Comment: ECCV 2020. Supplemental material attached
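The coordinate transformation the paper credits for pseudo-LiDAR's gains is the standard back-projection of a depth map into 3D camera coordinates. A minimal sketch, assuming generic pinhole intrinsics (f_x, f_y, c_x, c_y are illustrative parameters, not values from the paper):

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an HxW depth map (meters) into an (H*W, 3) array of
    3D points in camera coordinates via the pinhole model:
    x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]           # per-pixel row (v) / column (u) indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Note that PatchNet keeps these same (x, y, z) values but leaves them laid out as an image-shaped map instead of a point cloud, which is what lets ordinary 2D CNN designs consume them.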