See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain Adaptation
Sampling discrepancies between different manufacturers and models of lidar
sensors result in inconsistent representations of objects. This leads to
performance degradation when 3D detectors trained for one lidar are tested on
other types of lidars. Remarkable progress in lidar manufacturing has brought
about advances in mechanical, solid-state, and recently, adjustable scan
pattern lidars. For the latter, existing works often require fine-tuning the
model each time scan patterns are adjusted, which is infeasible. We explicitly
deal with the sampling discrepancy by proposing a novel unsupervised
multi-target domain adaptation framework, SEE, for transferring the performance
of state-of-the-art 3D detectors across both fixed and flexible scan pattern
lidars without requiring fine-tuning of models by end-users. Our approach
interpolates the underlying geometry and normalizes the scan pattern of objects
from different lidars before passing them to the detection network. We
demonstrate the effectiveness of SEE on public datasets, achieving
state-of-the-art results, and additionally provide quantitative results on a
novel high-resolution lidar to show the industrial applicability of our
framework. The dataset and our code will be made publicly available.
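The scan-pattern normalization described here can be pictured as resampling each object onto a canonical angular grid after interpolating the surface it samples. Below is a minimal sketch of that idea, assuming the object's points are already isolated (e.g. by clustering); the grid resolution, function name, and interpolation choice are illustrative assumptions, not details from the SEE paper.

```python
import numpy as np
from scipy.interpolate import griddata

def normalize_scan_pattern(obj_points, az_res=0.2, el_res=0.2):
    """Resample an object's points onto a fixed angular grid so that
    objects captured by different lidars share one sampling pattern.
    (Illustrative sketch, not the SEE implementation.)"""
    x, y, z = obj_points.T
    rng = np.linalg.norm(obj_points, axis=1)
    az = np.degrees(np.arctan2(y, x))        # azimuth of each return
    el = np.degrees(np.arcsin(z / rng))      # elevation of each return

    # Canonical angular grid spanning the object's extent.
    az_grid = np.arange(az.min(), az.max(), az_res)
    el_grid = np.arange(el.min(), el.max(), el_res)
    AZ, EL = np.meshgrid(az_grid, el_grid)

    # Linear interpolation of range approximates the underlying surface.
    R = griddata((az, el), rng, (AZ, EL), method="linear")
    valid = ~np.isnan(R)
    az_s, el_s, r_s = np.radians(AZ[valid]), np.radians(EL[valid]), R[valid]
    return np.stack([r_s * np.cos(el_s) * np.cos(az_s),
                     r_s * np.cos(el_s) * np.sin(az_s),
                     r_s * np.sin(el_s)], axis=-1)
```

Resampling in angular (range-image) space rather than Cartesian space is what makes the output independent of the source lidar's beam layout.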
GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving
because of its powerful spatial representation ability. Estimating BEV semantic
maps from monocular images is challenging due to the spatial gap between the
two views: the network must implicitly perform both the perspective-to-BEV
transformation and the segmentation. We present a novel two-stage Geometry
Prior-based Transformation
framework named GitNet, consisting of (i) the geometry-guided pre-alignment and
(ii) the ray-based transformer. In the first stage, we decouple BEV
segmentation into perspective-image segmentation and geometric prior-based
mapping: BEV semantic labels are projected onto the image plane for explicit
supervision, so the network learns visibility-aware features, and a learnable
geometry then translates them into BEV space. In the second stage, the
pre-aligned coarse BEV features are further deformed by ray-based transformers
to take visibility knowledge into account. GitNet achieves leading performance
on the challenging nuScenes and Argoverse datasets. The code will be made
publicly available.
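The geometric prior-based mapping can be illustrated with a flat-ground projection: each BEV cell is mapped into the image by the pinhole model and perspective features are sampled at that pixel. The sketch below is a simplified stand-in where a fixed flat-ground prior replaces GitNet's learnable geometry; the intrinsics K, camera height, and grid extents are assumed inputs.

```python
import torch
import torch.nn.functional as F

def bev_prealign(img_feats, K, cam_height=1.5,
                 x_range=(-25.0, 25.0), z_range=(1.0, 51.0), res=0.5):
    """img_feats: (1, C, H, W) perspective features -> coarse BEV features.
    (Illustrative flat-ground sketch, not GitNet's learnable mapping.)"""
    xs = torch.arange(*x_range, res)
    zs = torch.arange(*z_range, res)
    X, Z = torch.meshgrid(xs, zs, indexing="ij")
    Y = torch.full_like(X, cam_height)                  # flat-ground assumption

    pts = torch.stack([X, Y, Z], dim=-1).reshape(-1, 3)  # camera-frame 3D points
    uvw = pts @ K.T                                      # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    _, _, H, W = img_feats.shape
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, X.shape[0], X.shape[1], 2)
    return F.grid_sample(img_feats, grid, align_corners=True)
```

Making the geometry learnable, as the abstract describes, lets the network correct for non-flat ground and calibration error that a fixed prior like this cannot.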
Multimodal Panoptic Segmentation of 3D Point Clouds
The understanding and interpretation of complex 3D environments is a key challenge of autonomous driving. Lidar sensors and their recorded point clouds are particularly interesting for this challenge since they provide accurate 3D information about the environment. This work presents a multimodal deep-learning approach for panoptic segmentation of 3D point clouds. It builds upon and combines three key aspects: a multi-view architecture, temporal feature fusion, and deep sensor fusion.
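The deep sensor fusion aspect hinges on pairing each lidar point with the image pixel it projects to, so image features can decorate the points. A minimal sketch of that projection follows; the calibration inputs T_cam_lidar and K and the function name are illustrative assumptions, not details from this paper.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K, img_hw):
    """Return pixel coordinates and a visibility mask for lidar points.
    T_cam_lidar: 4x4 extrinsic matrix, K: 3x3 intrinsic matrix (assumed)."""
    pts_h = np.concatenate([points_lidar, np.ones((len(points_lidar), 1))], axis=1)
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]      # lidar frame -> camera frame
    in_front = pts_cam[:, 2] > 0.1                  # keep points ahead of the camera
    uvw = (K @ pts_cam.T).T                         # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]
    h, w = img_hw
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, in_front & inside
```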
Moby: Empowering 2D Models for Efficient Point Cloud Analytics on the Edge
3D object detection plays a pivotal role in many applications, most notably
autonomous driving and robotics. These applications are commonly deployed on
edge devices to promptly interact with the environment, and often require near
real-time response. With limited computation power, it is challenging to
execute 3D detection on the edge using highly complex neural networks. Common
approaches such as offloading to the cloud induce significant latency overheads
due to the large amount of point cloud data during transmission. To resolve the
tension between wimpy edge devices and compute-intensive inference workloads,
we explore the possibility of empowering fast 2D detection to extrapolate 3D
bounding boxes. To this end, we present Moby, a novel system that demonstrates
the feasibility and potential of our approach. We design a transformation
pipeline for Moby that generates 3D bounding boxes efficiently and accurately
based on 2D detection results without running 3D detectors. Further, we devise
a frame offloading scheduler that judiciously decides when to launch the 3D
detector in the cloud to keep errors from accumulating. Extensive evaluations
on NVIDIA Jetson TX2 with real-world autonomous driving datasets demonstrate
that Moby offers up to 91.9% latency improvement with modest accuracy loss over
the state of the art.
Comment: Accepted to ACM International Conference on Multimedia (MM) 202
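The core idea of lifting 2D detections to 3D can be sketched as a frustum lookup: collect the lidar points whose image projection falls inside the 2D box, reject background, and fit a box around what remains. The sketch below is a simplified illustration under a median-depth heuristic; Moby's actual transformation pipeline is more elaborate, and all names here are hypothetical.

```python
import numpy as np

def lift_2d_box(points_lidar, uv, box2d):
    """points_lidar: (N, 3) points; uv: (N, 2) pixel coords of the same
    points; box2d: (x1, y1, x2, y2). Returns (center, extents) of a crude
    axis-aligned 3D box. (Illustrative sketch, not Moby's pipeline.)"""
    x1, y1, x2, y2 = box2d
    in_box = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    frustum = points_lidar[in_box]
    # Crude background rejection: keep points near the median depth.
    # (Guards for empty frustums are omitted for brevity.)
    depth = np.linalg.norm(frustum, axis=1)
    obj = frustum[np.abs(depth - np.median(depth)) < 2.0]
    lo, hi = obj.min(axis=0), obj.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

Because this path never runs a 3D network, its cost is dominated by the 2D detector, which is what makes the approach attractive on weak edge hardware.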
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
In this paper, we focus on category-level 6D pose and size estimation from a
monocular RGB-D image. Previous methods suffer from inefficient category-level
pose feature extraction which leads to low accuracy and inference speed. To
tackle this problem, we propose a fast shape-based network (FS-Net) with
efficient category-level feature extraction for 6D pose estimation. First, we
design an orientation-aware autoencoder with 3D graph convolution for latent
feature extraction. The learned latent feature is insensitive to point shift
and object size thanks to the shift- and scale-invariance properties of the 3D
graph convolution. Then, to efficiently decode category-level rotation
information from the latent feature, we propose a novel decoupled rotation
mechanism that employs two decoders to complementarily access the rotation
information. Meanwhile, we estimate translation and size via two residuals:
the difference between the mean of the object points and the ground-truth
translation, and the difference between the category's mean size and the
ground-truth size, respectively. Finally, to increase the generalization
ability of FS-Net, we propose an online box-cage based 3D deformation mechanism
to augment the training data. Extensive experiments on two benchmark datasets
show that the proposed method achieves state-of-the-art performance in both
category- and instance-level 6D object pose estimation. Especially in
category-level pose estimation, without extra synthetic data, our method
outperforms existing methods by 6.3% on the NOCS-REAL dataset.
Comment: accepted by CVPR 2021, oral
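The residual parameterization described in the abstract reduces to two simple equations: translation is the mean of the observed points plus a predicted offset, and size is the category's mean size plus a predicted offset. A minimal sketch with illustrative names follows; the residuals are what the network regresses.

```python
import numpy as np

def recover_translation_size(obj_points, delta_t, delta_s, category_mean_size):
    """Recover absolute translation and size from predicted residuals.
    delta_t, delta_s: network outputs; names are illustrative."""
    t = obj_points.mean(axis=0) + delta_t      # translation = point mean + residual
    s = category_mean_size + delta_s           # size = category mean + residual
    return t, s
```

Regressing small offsets around strong statistics (the point mean, the category mean size) is an easier learning target than regressing absolute values, which is the stated motivation for this design.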