Masked Autoencoders (MAE) play a pivotal role in learning potent
representations, delivering outstanding results across various 3D perception
tasks essential for autonomous driving. In real-world driving scenarios, it is
common to deploy multiple sensors for comprehensive environment perception.
While fusing features from these sensors can produce rich and powerful
representations, existing MAE methods rarely address this multi-modal
integration. This research delves into multi-modal Masked
Autoencoders tailored for a unified representation space in autonomous driving,
aiming to pioneer a more efficient fusion of the two distinct modalities. To
marry the semantics inherent in images with the geometric structure of LiDAR
point clouds, we propose UniM2AE, a simple yet effective multi-modal
self-supervised pre-training framework consisting of two main designs. First,
it projects the features from both modalities into a unified 3D volume space,
expanded from the bird's eye view (BEV) to include the height dimension. This
extension makes it possible to back-project the fused multi-modal features
into their native modalities to reconstruct the multiple masked inputs.
Second, a Multi-modal 3D Interactive Module (MMIM) enables efficient
inter-modal interaction during fusion (see the sketches below). Extensive
experiments conducted on the nuScenes
dataset attest to the efficacy of UniM2AE, demonstrating improvements of
1.2\% (NDS) in 3D object detection and 6.5\% (mIoU) in BEV map segmentation,
respectively. Code is available at https://github.com/hollow-503/UniM2AE.
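
A minimal sketch of the first design, fusing both modalities in a shared 3D volume (a BEV grid extended with a height axis). The tensor shapes, module names, and the Conv3d fusion operator are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class UnifiedVolumeFusion(nn.Module):
    """Fuse image and LiDAR features in a shared 3D volume.

    The volume is a BEV grid extended with a height (Z) axis, so both
    modalities can be expressed on the same (Z, Y, X) lattice before fusion.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical fusion operator; the paper's design may differ.
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, img_vol: torch.Tensor, lidar_vol: torch.Tensor):
        # img_vol, lidar_vol: (B, C, Z, Y, X) features already lifted
        # (camera) or voxelized (LiDAR) into the shared volume.
        fused = self.fuse(torch.cat([img_vol, lidar_vol], dim=1))
        # During pre-training, `fused` would be back-projected into each
        # native modality to reconstruct the masked image patches and
        # masked LiDAR voxels; that inverse projection is omitted here.
        return fused

# Usage with toy shapes: batch 2, 64 channels, a 4x32x32 volume.
vol_img = torch.randn(2, 64, 4, 32, 32)
vol_lidar = torch.randn(2, 64, 4, 32, 32)
fused = UnifiedVolumeFusion(64)(vol_img, vol_lidar)
print(fused.shape)  # torch.Size([2, 64, 4, 32, 32])
```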
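And a sketch of the second design, inter-modal interaction in the spirit of MMIM. Bidirectional cross-attention is an assumed stand-in for the module's actual internals:

```python
import torch
import torch.nn as nn

class MMIMSketch(nn.Module):
    """Hypothetical stand-in for the Multi-modal 3D Interactive Module.

    Each modality's volume cells are flattened to tokens, and each
    modality attends to the other (bidirectional cross-attention).
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.img_from_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_lidar = nn.LayerNorm(dim)

    def forward(self, img_tok: torch.Tensor, lidar_tok: torch.Tensor):
        # img_tok, lidar_tok: (B, N, C) tokens from the shared 3D volume.
        img_upd, _ = self.img_from_lidar(self.norm_img(img_tok),
                                         lidar_tok, lidar_tok)
        lidar_upd, _ = self.lidar_from_img(self.norm_lidar(lidar_tok),
                                           img_tok, img_tok)
        # Residual connections keep each modality's original content.
        return img_tok + img_upd, lidar_tok + lidar_upd

# Usage with toy shapes: 2 samples, 4096 volume tokens, 64 channels.
img_tok, lidar_tok = torch.randn(2, 4096, 64), torch.randn(2, 4096, 64)
img_out, lidar_out = MMIMSketch(64)(img_tok, lidar_tok)
```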