See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain Adaptation
Sampling discrepancies between different manufacturers and models of lidar
sensors result in inconsistent representations of objects. This leads to
performance degradation when 3D detectors trained for one lidar are tested on
other types of lidars. Remarkable progress in lidar manufacturing has brought
about advances in mechanical, solid-state, and recently, adjustable scan
pattern lidars. For the latter, existing works often require fine-tuning the
model each time scan patterns are adjusted, which is infeasible. We explicitly
deal with the sampling discrepancy by proposing a novel unsupervised
multi-target domain adaptation framework, SEE, for transferring the performance
of state-of-the-art 3D detectors across both fixed and flexible scan pattern
lidars without requiring fine-tuning of models by end-users. Our approach
interpolates the underlying geometry and normalizes the scan pattern of objects
from different lidars before passing them to the detection network. We
demonstrate the effectiveness of SEE on public datasets, achieving
state-of-the-art results, and additionally provide quantitative results on a
novel high-resolution lidar to show the industrial applicability of our
framework. The dataset and our code will be made publicly available.
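The scan-pattern normalization described here can be pictured as resampling each object onto a canonical angular grid after interpolating the surface it samples. Below is a minimal sketch of that idea, assuming the object's points are already isolated (e.g. by clustering); the grid resolution, function name, and interpolation choice are illustrative assumptions, not details from the SEE paper.

```python
import numpy as np
from scipy.interpolate import griddata

def normalize_scan_pattern(obj_points, az_res=0.2, el_res=0.2):
    """Resample an object's points onto a fixed angular grid so that
    objects captured by different lidars share one sampling pattern.
    (Illustrative sketch, not the SEE implementation.)"""
    x, y, z = obj_points.T
    rng = np.linalg.norm(obj_points, axis=1)
    az = np.degrees(np.arctan2(y, x))        # azimuth of each return
    el = np.degrees(np.arcsin(z / rng))      # elevation of each return

    # Canonical angular grid spanning the object's extent.
    az_grid = np.arange(az.min(), az.max(), az_res)
    el_grid = np.arange(el.min(), el.max(), el_res)
    AZ, EL = np.meshgrid(az_grid, el_grid)

    # Linear interpolation of range approximates the underlying surface.
    R = griddata((az, el), rng, (AZ, EL), method="linear")
    valid = ~np.isnan(R)
    az_s, el_s, r_s = np.radians(AZ[valid]), np.radians(EL[valid]), R[valid]
    return np.stack([r_s * np.cos(el_s) * np.cos(az_s),
                     r_s * np.cos(el_s) * np.sin(az_s),
                     r_s * np.sin(el_s)], axis=-1)
```

Resampling in angular (range-image) space rather than Cartesian space is what makes the output independent of the source lidar's beam layout.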
GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving
because of its powerful spatial representation ability. Estimating BEV semantic
maps from monocular images is challenging due to the spatial gap between the
two views: the network must implicitly perform both the perspective-to-BEV
transformation and the segmentation. We present a novel two-stage Geometry
Prior-based Transformation
framework named GitNet, consisting of (i) the geometry-guided pre-alignment and
(ii) the ray-based transformer. In the first stage, we decouple BEV
segmentation into perspective-image segmentation and geometric prior-based
mapping: BEV semantic labels are projected onto the image plane for explicit
supervision, so the network learns visibility-aware features, and a learnable
geometry then translates them into BEV space. In the second stage, the
pre-aligned coarse BEV features are further deformed by ray-based transformers
to take visibility knowledge into account. GitNet achieves leading performance
on the challenging nuScenes and Argoverse datasets. The code will be made
publicly available.
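The geometric prior-based mapping can be illustrated with a flat-ground projection: each BEV cell is mapped into the image by the pinhole model and perspective features are sampled at that pixel. The sketch below is a simplified stand-in where a fixed flat-ground prior replaces GitNet's learnable geometry; the intrinsics K, camera height, and grid extents are assumed inputs.

```python
import torch
import torch.nn.functional as F

def bev_prealign(img_feats, K, cam_height=1.5,
                 x_range=(-25.0, 25.0), z_range=(1.0, 51.0), res=0.5):
    """img_feats: (1, C, H, W) perspective features -> coarse BEV features.
    (Illustrative flat-ground sketch, not GitNet's learnable mapping.)"""
    xs = torch.arange(*x_range, res)
    zs = torch.arange(*z_range, res)
    X, Z = torch.meshgrid(xs, zs, indexing="ij")
    Y = torch.full_like(X, cam_height)                  # flat-ground assumption

    pts = torch.stack([X, Y, Z], dim=-1).reshape(-1, 3)  # camera-frame 3D points
    uvw = pts @ K.T                                      # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    _, _, H, W = img_feats.shape
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, X.shape[0], X.shape[1], 2)
    return F.grid_sample(img_feats, grid, align_corners=True)
```

Making the geometry learnable, as the abstract describes, lets the network correct for non-flat ground and calibration error that a fixed prior like this cannot.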
Multimodal Panoptic Segmentation of 3D Point Clouds
The understanding and interpretation of complex 3D environments is a key challenge of autonomous driving. Lidar sensors and their recorded point clouds are particularly interesting for this challenge since they provide accurate 3D information about the environment. This work presents a multimodal deep-learning approach for panoptic segmentation of 3D point clouds. It builds upon and combines three key aspects: a multi-view architecture, temporal feature fusion, and deep sensor fusion.
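The deep sensor fusion aspect hinges on pairing each lidar point with the image pixel it projects to, so image features can decorate the points. A minimal sketch of that projection follows; the calibration inputs T_cam_lidar and K and the function name are illustrative assumptions, not details from this paper.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K, img_hw):
    """Return pixel coordinates and a visibility mask for lidar points.
    T_cam_lidar: 4x4 extrinsic matrix, K: 3x3 intrinsic matrix (assumed)."""
    pts_h = np.concatenate([points_lidar, np.ones((len(points_lidar), 1))], axis=1)
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]      # lidar frame -> camera frame
    in_front = pts_cam[:, 2] > 0.1                  # keep points ahead of the camera
    uvw = (K @ pts_cam.T).T                         # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]
    h, w = img_hw
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, in_front & inside
```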
Moby: Empowering 2D Models for Efficient Point Cloud Analytics on the Edge
3D object detection plays a pivotal role in many applications, most notably
autonomous driving and robotics. These applications are commonly deployed on
edge devices to promptly interact with the environment, and often require near
real-time response. With limited computation power, it is challenging to
execute 3D detection on the edge using highly complex neural networks. Common
approaches such as offloading to the cloud induce significant latency overheads
due to the large amount of point cloud data during transmission. To resolve the
tension between wimpy edge devices and compute-intensive inference workloads,
we explore the possibility of empowering fast 2D detection to extrapolate 3D
bounding boxes. To this end, we present Moby, a novel system that demonstrates
the feasibility and potential of our approach. We design a transformation
pipeline for Moby that generates 3D bounding boxes efficiently and accurately
based on 2D detection results without running 3D detectors. Further, we devise
a frame offloading scheduler that judiciously decides when to launch the 3D
detector in the cloud to keep errors from accumulating. Extensive evaluations
on NVIDIA Jetson TX2 with real-world autonomous driving datasets demonstrate
that Moby offers up to 91.9% latency improvement with modest accuracy loss over
the state of the art.
Comment: Accepted to ACM International Conference on Multimedia (MM) 202
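The core idea of lifting 2D detections to 3D can be sketched as a frustum lookup: collect the lidar points whose image projection falls inside the 2D box, reject background, and fit a box around what remains. The sketch below is a simplified illustration under a median-depth heuristic; Moby's actual transformation pipeline is more elaborate, and all names here are hypothetical.

```python
import numpy as np

def lift_2d_box(points_lidar, uv, box2d):
    """points_lidar: (N, 3) points; uv: (N, 2) pixel coords of the same
    points; box2d: (x1, y1, x2, y2). Returns (center, extents) of a crude
    axis-aligned 3D box. (Illustrative sketch, not Moby's pipeline.)"""
    x1, y1, x2, y2 = box2d
    in_box = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    frustum = points_lidar[in_box]
    # Crude background rejection: keep points near the median depth.
    # (Guards for empty frustums are omitted for brevity.)
    depth = np.linalg.norm(frustum, axis=1)
    obj = frustum[np.abs(depth - np.median(depth)) < 2.0]
    lo, hi = obj.min(axis=0), obj.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

Because this path never runs a 3D network, its cost is dominated by the 2D detector, which is what makes the approach attractive on weak edge hardware.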
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
In this paper, we focus on category-level 6D pose and size estimation from a
monocular RGB-D image. Previous methods suffer from inefficient category-level
pose feature extraction which leads to low accuracy and inference speed. To
tackle this problem, we propose a fast shape-based network (FS-Net) with
efficient category-level feature extraction for 6D pose estimation. First, we
design an orientation-aware autoencoder with 3D graph convolution for latent
feature extraction. The learned latent feature is insensitive to point shift
and object size thanks to the shift- and scale-invariance properties of the 3D
graph convolution. Then, to efficiently decode category-level rotation
information from the latent feature, we propose a novel decoupled rotation
mechanism that employs two decoders to complementarily access the rotation
information. Meanwhile, we estimate translation and size via two residuals:
the difference between the mean of the object points and the ground-truth
translation, and the difference between the category's mean size and the
ground-truth size, respectively. Finally, to increase the generalization
ability of FS-Net, we propose an online box-cage based 3D deformation mechanism
to augment the training data. Extensive experiments on two benchmark datasets
show that the proposed method achieves state-of-the-art performance in both
category- and instance-level 6D object pose estimation. Especially in
category-level pose estimation, without extra synthetic data, our method
outperforms existing methods by 6.3% on the NOCS-REAL dataset.
Comment: accepted by CVPR 2021, oral
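The residual parameterization described in the abstract reduces to two simple equations: translation is the mean of the observed points plus a predicted offset, and size is the category's mean size plus a predicted offset. A minimal sketch with illustrative names follows; the residuals are what the network regresses.

```python
import numpy as np

def recover_translation_size(obj_points, delta_t, delta_s, category_mean_size):
    """Recover absolute translation and size from predicted residuals.
    delta_t, delta_s: network outputs; names are illustrative."""
    t = obj_points.mean(axis=0) + delta_t      # translation = point mean + residual
    s = category_mean_size + delta_s           # size = category mean + residual
    return t, s
```

Regressing small offsets around strong statistics (the point mean, the category mean size) is an easier learning target than regressing absolute values, which is the stated motivation for this design.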