Multimodal Panoptic Segmentation of 3D Point Clouds
The understanding and interpretation of complex 3D environments is a key challenge of autonomous driving. Lidar sensors and their recorded point clouds are particularly interesting for this challenge since they provide accurate 3D information about the environment. This work presents a multimodal deep-learning approach for panoptic segmentation of 3D point clouds. It builds upon and combines three key aspects: a multi-view architecture, temporal feature fusion, and deep sensor fusion.
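The abstract does not detail the fusion mechanism itself. As a rough illustration of the deep sensor fusion idea, one common pattern concatenates per-point lidar features with camera features sampled at each point's projected pixel location and mixes them with a small network. A minimal PyTorch-style sketch; the module name, dimensions, and design are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DeepSensorFusion(nn.Module):
    """Illustrative fusion of per-point lidar and camera features.

    Hypothetical sketch only: the paper's actual fusion design is not
    specified in the abstract.
    """

    def __init__(self, lidar_dim=64, cam_dim=64, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(lidar_dim + cam_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, lidar_feats, cam_feats):
        # lidar_feats: (N, lidar_dim) per-point lidar features
        # cam_feats:   (N, cam_dim) image features sampled at each
        #              point's projected pixel location
        fused = torch.cat([lidar_feats, cam_feats], dim=-1)
        return self.mlp(fused)  # (N, out_dim) fused per-point features

# Usage with random stand-in features for 1000 points:
# feats = DeepSensorFusion()(torch.rand(1000, 64), torch.rand(1000, 64))
```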
Improved Multi-Scale Grid Rendering of Point Clouds for Radar Object Detection Networks
Architectures that first convert point clouds to a grid representation and
then apply convolutional neural networks achieve good performance for
radar-based object detection. However, the transfer from irregular point cloud
data to a dense grid structure is often associated with a loss of information,
due to the discretization and aggregation of points. In this paper, we propose
a novel architecture, multi-scale KPPillarsBEV, that aims to mitigate the
negative effects of grid rendering. Specifically, we propose a novel grid
rendering method, KPBEV, which leverages the descriptive power of kernel point
convolutions to improve the encoding of local point cloud contexts during grid
rendering. In addition, we propose a general multi-scale grid rendering
formulation to incorporate multi-scale feature maps into convolutional
backbones of detection networks with arbitrary grid rendering methods. We
perform extensive experiments on the nuScenes dataset and evaluate the methods
in terms of detection performance and computational complexity. The proposed
multi-scale KPPillarsBEV architecture outperforms the baseline by 5.37% and the
previous state of the art by 2.88% in Car AP4.0 (average precision for a
matching threshold of 4 meters) on the nuScenes validation set. Moreover, the
proposed single-scale KPBEV grid rendering improves the Car AP4.0 by 2.90% over
the baseline while maintaining the same inference speed.
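For context on the step KPBEV improves, a baseline grid rendering typically scatters point features into BEV cells and aggregates them, e.g. by max-pooling. A naive NumPy sketch, with grid extents and aggregation chosen as illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def render_bev_grid(points, feats, x_range=(-50.0, 50.0),
                    y_range=(-50.0, 50.0), cell=0.5):
    """Naive BEV grid rendering: max-pool point features per cell.

    Illustrative baseline only; KPBEV instead encodes local point
    neighborhoods with kernel point convolutions before aggregation.

    points: (N, 2) x/y coordinates; feats: (N, C) per-point features.
    Returns a dense (H, W, C) feature map.
    """
    w = int((x_range[1] - x_range[0]) / cell)
    h = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((h, w, feats.shape[1]), dtype=feats.dtype)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    for i in np.flatnonzero(valid):
        # Per-cell aggregation discards per-point detail -- the
        # information loss the paper aims to mitigate.
        grid[iy[i], ix[i]] = np.maximum(grid[iy[i], ix[i]], feats[i])
    return grid
```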
Multimodal Content Delivery for Geo-services
This thesis describes a body of work carried out over several research projects in the area of multimodal interaction for location-based services. Research in this area has progressed from using simulated mobile environments to demonstrate the visual modality, to the ubiquitous delivery of rich media using multimodal interfaces (geo-services). To deliver these services effectively, the research focused on innovative solutions to real-world problems in a number of disciplines, including geo-location, mobile spatial interaction, location-based services, rich media interfaces, and auditory user interfaces. My original contributions to knowledge are made in the area of multimodal interaction, underpinned by advances in geo-location technology and supported by the proliferation of mobile device technology into modern life. Accurate positioning is a known problem for location-based services; contributions in the area of mobile positioning demonstrate a hybrid positioning technology for mobile devices that uses terrestrial beacons to trilaterate position. Information overload is an active concern for location-based applications that struggle to manage large amounts of data; contributions in the area of egocentric visibility, which filters data based on field-of-view, demonstrate novel forms of multimodal input. One of the more pertinent characteristics of these applications is the delivery or output modality employed (auditory, visual, or tactile). Further contributions are made in the area of multimodal content delivery, where multiple modalities are used to deliver information using graphical user interfaces, tactile interfaces, and, more notably, auditory user interfaces. It is demonstrated how a combination of these interfaces can be used to synergistically deliver context-sensitive rich media to users in a responsive way, based on usage scenarios that consider the affordance of the device, the geographical position and bearing of the device, and the location of the device.
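The abstract mentions trilateration from terrestrial beacons without giving the method. As a generic textbook illustration (not the thesis's hybrid technique), a 2D position can be recovered from three beacons by subtracting one range-circle equation from the other two, which yields a linear system. A small sketch with made-up beacon coordinates and ranges:

```python
import numpy as np

def trilaterate_2d(beacons, ranges):
    """Estimate a 2D position from three beacon positions and ranges.

    Standard trilateration by linearizing the circle equations
    (x - xi)^2 + (y - yi)^2 = ri^2; shown only to illustrate the idea
    referenced in the abstract.

    beacons: (3, 2) known beacon coordinates; ranges: (3,) distances.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = ranges
    # Subtracting the first circle equation from the other two gives
    # a linear system A @ [x, y] = b.
    A = np.array([[2 * (x2 - x1), 2 * (y2 - y1)],
                  [2 * (x3 - x1), 2 * (y3 - y1)]])
    b = np.array([r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2,
                  r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2])
    return np.linalg.solve(A, b)

# Example with made-up beacons; the estimate should be near (1.0, 2.0).
print(trilaterate_2d(
    np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]),
    np.array([np.hypot(1, 2), np.hypot(9, 2), np.hypot(1, 8)])))
```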
Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection
LiDAR-based 3D object detectors have achieved impressive performance on many benchmarks; however, multi-sensor fusion techniques are promising for improving the results further. PointPainting, a recently proposed framework, adds semantic information from 2D images to the 3D LiDAR points through a painting operation to boost detection performance. However, due to the limited resolution of 2D feature maps, a severe boundary-blurring effect occurs when 2D semantic segmentation results are re-projected onto the 3D point cloud. To address this limitation, we propose a general multimodal fusion framework, MSF, which fuses semantic information from both 2D image and 3D point cloud scene parsing results. Specifically, MSF consists of three main modules. First, state-of-the-art off-the-shelf 2D/3D semantic segmentation approaches are employed to generate parsing results for the 2D images and 3D point clouds, and the 2D semantic information is then re-projected onto the 3D point cloud using the calibration parameters. To handle the misalignment between the 2D and 3D parsing results, an AAF module is proposed to fuse them by learning an adaptive fusion score. The point cloud with the fused semantic labels is then passed to the subsequent 3D object detectors. Furthermore, we propose a DFF module to aggregate deep features at different levels to boost the final detection performance. The effectiveness of the framework has been verified on two public large-scale 3D object detection benchmarks by comparison with different baselines. The experimental results show that the proposed fusion strategies significantly improve detection performance compared to methods using only point clouds or only 2D semantic information. Most importantly, the proposed approach significantly outperforms other approaches and sets new state-of-the-art results on the nuScenes test benchmark.
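To make the painting and fusion steps concrete, the sketch below projects points into the image with a calibration matrix, samples 2D semantic scores, and blends them with 3D scores using a learned per-point weight. All shapes and the simple gating network are illustrative assumptions, not the paper's AAF design:

```python
import torch
import torch.nn as nn

def paint_points(points, sem_map, proj):
    """Project 3D points into the image and sample 2D semantic scores.

    points: (N, 3) LiDAR points; sem_map: (C, H, W) 2D semantic scores;
    proj: (3, 4) camera projection (calibration) matrix.
    Schematic only: a real pipeline must also handle points that fall
    outside the image and occluded points.
    """
    homo = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1)
    uvw = homo @ proj.T                      # (N, 3) homogeneous pixels
    uv = (uvw[:, :2] / uvw[:, 2:3]).long()   # (N, 2) integer pixel coords
    c, h, w = sem_map.shape
    uv[:, 0] = uv[:, 0].clamp(0, w - 1)
    uv[:, 1] = uv[:, 1].clamp(0, h - 1)
    return sem_map[:, uv[:, 1], uv[:, 0]].T  # (N, C) sampled 2D scores

class AdaptiveFusion(nn.Module):
    """Hypothetical stand-in for the AAF idea: learn a per-point score
    that weights the 2D against the 3D semantic predictions."""

    def __init__(self, num_classes):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * num_classes, 1),
                                  nn.Sigmoid())

    def forward(self, sem2d, sem3d):
        # sem2d, sem3d: (N, C) per-point class scores from each modality.
        a = self.gate(torch.cat([sem2d, sem3d], dim=-1))  # (N, 1) weight
        return a * sem2d + (1 - a) * sem3d
```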