23,658 research outputs found
PointPillars Backbone Type Selection For Fast and Accurate LiDAR Object Detection
3D object detection from LiDAR sensor data is an important topic in the
context of autonomous cars and drones. In this paper, we present the results of
experiments on the impact of backbone selection of a deep convolutional neural
network on detection accuracy and computation speed. We chose the PointPillars
network, which is characterised by a simple architecture, high speed, and
modularity that allows for easy expansion. During the experiments, we paid
particular attention to the change in detection efficiency (measured by the mAP
metric) and the total number of multiply-addition operations needed to process
one point cloud. We tested 10 different convolutional neural network
architectures that are widely used in image-based detection problems. For a
backbone like MobilenetV1, we obtained an almost 4x speedup at the cost of a
1.13% decrease in mAP. On the other hand, for CSPDarknet we obtained a speedup
of more than 1.5x together with a 0.33% increase in mAP. We have thus demonstrated
that it is possible to significantly speed up a 3D object detector in LiDAR
point clouds with a small decrease in detection efficiency. This result can be
used when PointPillars or similar algorithms are implemented in embedded
systems, including SoC FPGAs. The code is available at
https://github.com/vision-agh/pointpillars_backbone
Comment: Accepted for the ICCVG 2022 conference.
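To make the backbone comparison concrete, here is a minimal PyTorch sketch (not the authors' released code) that counts multiply-add operations of interchangeable 2D backbones with forward hooks. The BEV-sized input and the torchvision backbones are illustrative assumptions (torchvision ships no MobilenetV1, so mobilenet_v2 stands in):

    import torch
    import torch.nn as nn
    import torchvision.models as tvm

    def count_macs(model: nn.Module, x: torch.Tensor) -> int:
        """Count multiply-add operations of Conv2d/Linear layers via forward hooks."""
        total = 0
        hooks = []

        def conv_hook(m, _inp, out):
            nonlocal total
            # MACs of a convolution = output elements * (in_channels/groups) * kernel area
            kh, kw = m.kernel_size
            total += out.numel() * (m.in_channels // m.groups) * kh * kw

        def linear_hook(m, _inp, out):
            nonlocal total
            total += out.numel() * m.in_features

        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                hooks.append(m.register_forward_hook(conv_hook))
            elif isinstance(m, nn.Linear):
                hooks.append(m.register_forward_hook(linear_hook))
        with torch.no_grad():
            model(x)
        for h in hooks:
            h.remove()
        return total

    # A BEV-sized input stands in for the PointPillars pseudo-image (assumption:
    # KITTI-style 496x432 grid; 3 channels so stock torchvision models run unmodified).
    x = torch.randn(1, 3, 496, 432)
    for name, model in [("resnet18", tvm.resnet18()),
                        ("mobilenet_v2", tvm.mobilenet_v2())]:
        model.eval()
        print(f"{name}: {count_macs(model, x) / 1e9:.2f} GMACs")

Comparing such MAC counts against mAP on a validation split reproduces the accuracy-versus-compute trade-off the paper reports, up to implementation details of the detection heads.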
Improving LiDAR-based 3D Object Detection Using Part-Aware Data Augmentation and Mixture Density Networks
Ph.D. thesis -- Seoul National University, Graduate School of Convergence Science and Technology, February 2023. Advisor: Nojun Kwak.
LiDAR (Light Detection And Ranging), which is widely used as a sensing device for autonomous vehicles and robots, emits laser pulses and calculates the return time to sense the surrounding environment in the form of a point cloud. When recognizing the surrounding environment, the most important task is recognizing which objects are nearby and where they are located, and 3D object detection methods using point clouds have been actively studied to perform this task.
Various backbone networks for point-cloud-based 3D object detection have been proposed, depending on how the point cloud data are preprocessed. Although advanced backbone networks have greatly improved detection performance, their structures differ substantially, so they are largely incompatible with one another. The problem addressed in this dissertation is: how can the performance of 3D object detectors be improved regardless of their diverse backbone network structures? To this end, this dissertation proposes two general methods to improve point-cloud-based 3D object detectors.
First, we propose a part-aware data augmentation (PA-AUG) method that maximizes the utilization of the structural information of 3D bounding boxes. Since 3D bounding box labels fit the objects' boundaries and include the orientation value, they contain the structural information of the object in the box. To fully utilize this intra-object structural information, we propose a novel part-aware partitioning method that separates 3D bounding boxes into characteristic sub-parts. PA-AUG then applies newly proposed data augmentation methods at the partition level. It makes various types of 3D object detectors robust and brings an effect equivalent to increasing the training data by about 2.5×.
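As a rough illustration of the partitioning idea (a simplified sketch, not the released PA-AUG code), the following splits a box into slices along its heading axis and randomly drops points per partition; the number of partitions, dropout probability, and drop ratio are assumptions chosen for the example:

    import numpy as np

    def part_aware_dropout(points, box, num_parts=4, drop_prob=0.5,
                           drop_ratio=0.3, seed=None):
        """Simplified part-aware augmentation sketch (not the official PA-AUG).

        points: (N, 3) LiDAR points; box: (cx, cy, cz, l, w, h, yaw).
        The box is split into num_parts slices along its length axis, and each
        slice independently loses a fraction of its points with probability drop_prob.
        """
        rng = np.random.default_rng(seed)
        cx, cy, cz, l, w, h, yaw = box
        # Rotate points into the box's canonical frame (x axis along the box length).
        c, s = np.cos(-yaw), np.sin(-yaw)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        local = (points - np.array([cx, cy, cz])) @ rot.T
        inside = ((np.abs(local[:, 0]) <= l / 2) & (np.abs(local[:, 1]) <= w / 2)
                  & (np.abs(local[:, 2]) <= h / 2))
        keep = np.ones(len(points), dtype=bool)
        edges = np.linspace(-l / 2, l / 2, num_parts + 1)
        for lo, hi in zip(edges[:-1], edges[1:]):
            part = inside & (local[:, 0] >= lo) & (local[:, 0] < hi)
            idx = np.flatnonzero(part)
            if len(idx) and rng.random() < drop_prob:
                drop = rng.choice(idx, size=int(len(idx) * drop_ratio), replace=False)
                keep[drop] = False
        return points[keep]

Because each partition is perturbed independently, the augmentation preserves the overall box geometry while forcing the detector to rely less on any single object part.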
Second, we propose a mixture-density-based 3D object detection (MD3D) method. MD3D predicts the distribution of 3D bounding boxes using a Gaussian mixture model (GMM), reformulating conventional regression as a density estimation problem. Thus, unlike conventional target-assignment methods, it can be applied to any 3D object detector regardless of the point cloud preprocessing method. In addition, as it requires significantly fewer hyper-parameters than existing methods, the detection performance is easy to optimize. MD3D also increases the detection speed owing to its simple structure.
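The density-estimation view can be made concrete with a small PyTorch sketch (an illustration under assumed head dimensions, not the MD3D release): the head emits mixture weights, means, and scales for the 7 box parameters, and training minimizes the negative log-likelihood of ground-truth boxes under the resulting GMM.

    import torch
    import torch.nn as nn
    import torch.distributions as D

    class MixtureDensityBoxHead(nn.Module):
        """Illustrative mixture-density head for 3D boxes (not the MD3D release)."""
        def __init__(self, in_dim=256, num_components=8, box_dim=7):
            super().__init__()
            self.k, self.d = num_components, box_dim
            self.pi = nn.Linear(in_dim, self.k)                  # mixture logits
            self.mu = nn.Linear(in_dim, self.k * self.d)         # component means
            self.log_sigma = nn.Linear(in_dim, self.k * self.d)  # component log-scales

        def forward(self, feats):
            B = feats.shape[0]
            mix = D.Categorical(logits=self.pi(feats))
            comp = D.Independent(
                D.Normal(self.mu(feats).view(B, self.k, self.d),
                         self.log_sigma(feats).view(B, self.k, self.d).exp()), 1)
            return D.MixtureSameFamily(mix, comp)

    # One training step: maximize the likelihood of ground-truth boxes
    # (x, y, z, l, w, h, yaw) under the predicted mixture -- no per-anchor
    # target assignment is needed.
    head = MixtureDensityBoxHead()
    feats = torch.randn(16, 256)   # assumed per-scene feature vectors
    gt_boxes = torch.randn(16, 7)  # random stand-ins for ground-truth boxes
    loss = -head(feats).log_prob(gt_boxes).mean()
    loss.backward()

Replacing per-anchor regression targets with a single NLL objective is what removes the assignment-specific hyper-parameters the abstract refers to.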
Both PA-AUG and MD3D can be applied to any 3D object detector and show an impressive increase in detection performance. The two proposed methods cover different stages of the object detection pipeline; thus, they can be used simultaneously, and the experimental results show a synergy effect when they are applied together.

1 Introduction
1.1 Problem Definition
1.2 Challenges
1.3 Contributions
1.3.1 Part-Aware Data Augmentation (PA-AUG)
1.3.2 Mixture-Density-based 3D Object Detection (MD3D)
1.3.3 Combination of PA-AUG and MD3D
1.4 Outline
2 Related Works
2.1 Data augmentation for Object Detection
2.1.1 2D Data augmentation
2.1.2 3D Data augmentation
2.2 LiDAR-based 3D Object Detection
2.3 Mixture Density Networks in Computer Vision
2.4 Datasets
2.4.1 KITTI Dataset
2.4.2 Waymo Open Dataset
2.5 Evaluation metric
2.5.1 Average Precision (AP)
2.5.2 Average Orientation Similarity (AOS)
2.5.3 Average Precision weighted by Heading (APH)
3 Part-Aware Data Augmentation (PA-AUG)
3.1 Introduction
3.2 Methods
3.2.1 Part-Aware Partitioning
3.2.2 Part-Aware Data Augmentation
3.3 Experiments
3.3.1 Results on the KITTI Dataset
3.3.2 Robustness Test
3.3.3 Data Efficiency Test
3.3.4 Ablation Study
3.4 Discussion
3.5 Conclusion
4 Mixture-Density-based 3D Object Detection (MD3D)
4.1 Introduction
4.2 Methods
4.2.1 Modeling Point-cloud-based 3D Object Detection with Mixture Density Network
4.2.2 Network Architecture
4.2.3 Loss function
4.3 Experiments
4.3.1 Datasets
4.3.2 Experiment Settings
4.3.3 Results on the KITTI Dataset
4.3.4 Latency of Each Module
4.3.5 Results on the Waymo Open Dataset
4.3.6 Analyzing Recall by object size
4.3.7 Ablation Study
4.3.8 Discussion
4.4 Conclusion
5 Combination of PA-AUG and MD3D
5.1 Methods
5.2 Experiments
5.2.1 Settings
5.2.2 Results on the KITTI Dataset
5.3 Discussion
6 Conclusion
6.1 Summary
6.2 Limitations and Future works
6.2.1 Hyper-parameter-free PA-AUG
6.2.2 Redefinition of Part-aware Partitioning
6.2.3 Application to other tasks
Abstract (In Korean)
Acknowledgements (In Korean)
MVFAN: Multi-View Feature Assisted Network for 4D Radar Object Detection
4D radar is recognized for its resilience and cost-effectiveness under
adverse weather conditions, thus playing a pivotal role in autonomous driving.
While cameras and LiDAR are typically the primary sensors used in perception
modules for autonomous vehicles, radar serves as a valuable supplementary
sensor. Unlike LiDAR and cameras, radar remains unimpaired by harsh weather
conditions, thereby offering a dependable alternative in challenging
environments. Developing radar-based 3D object detection not only augments the
competency of autonomous vehicles but also provides economic benefits. In
response, we propose the Multi-View Feature Assisted Network (MVFAN),
an end-to-end, anchor-free, and single-stage framework for 4D-radar-based 3D
object detection for autonomous vehicles. We tackle the issue of insufficient
feature utilization by introducing a novel Position Map Generation module to
enhance feature learning by reweighing foreground and background points, and
their features, considering the irregular distribution of radar point clouds.
Additionally, we propose a pioneering backbone, the Radar Feature Assisted
backbone, explicitly crafted to fully exploit the valuable Doppler velocity and
reflectivity data provided by the 4D radar sensor. Comprehensive experiments
and ablation studies carried out on Astyx and VoD datasets attest to the
efficacy of our framework. The incorporation of Doppler velocity and RCS
reflectivity dramatically improves the detection performance for small moving
objects such as pedestrians and cyclists. Consequently, our approach culminates
in a highly optimized 4D-radar-based 3D object detection capability for
autonomous driving systems, setting a new standard in the field.
Comment: 19 pages, 7 figures, Accepted by ICONIP 2023.
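As a loose illustration of exploiting per-point Doppler velocity and RCS while reweighing foreground points (a simplified sketch, not the MVFAN implementation; the 5-channel input layout and network sizes are assumptions):

    import torch
    import torch.nn as nn

    class RadarPointEncoder(nn.Module):
        """Sketch: embed 4D-radar points whose per-point features include
        Doppler velocity and RCS, then reweight points by a learned
        foreground score (a simplification of position-map-style reweighing)."""
        def __init__(self, hidden=64):
            super().__init__()
            # Assumed per-point layout: x, y, z, Doppler velocity, RCS.
            self.mlp = nn.Sequential(nn.Linear(5, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
            self.score = nn.Linear(hidden, 1)  # per-point foreground score

        def forward(self, pts):                 # pts: (B, N, 5)
            f = self.mlp(pts)                   # (B, N, hidden)
            w = torch.sigmoid(self.score(f))    # (B, N, 1)
            return f * w                        # reweighted point features

    enc = RadarPointEncoder()
    radar_pts = torch.rand(2, 1024, 5)          # synthetic radar point cloud
    print(enc(radar_pts).shape)                 # torch.Size([2, 1024, 64])

Feeding velocity and reflectivity as first-class input channels, rather than discarding them, is what lets the downstream backbone separate small moving objects from clutter.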
3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection
3D visual perception tasks based on multi-camera images are essential for
autonomous driving systems. Latest work in this field performs 3D object
detection by leveraging multi-view images as an input and iteratively enhancing
object queries (object proposals) by cross-attending multi-view features.
However, individual backbone features are not updated with multi-view features
and remain a mere collection of the outputs of the single-image backbone
network. We therefore propose 3M3D: A Multi-view, Multi-path,
Multi-representation for 3D Object Detection where we update both multi-view
features and query features to enhance the representation of the scene in both
fine panoramic view and coarse global view. Firstly, we update multi-view
features by multi-view axis self-attention. It will incorporate panoramic
information in the multi-view features and enhance understanding of the global
scene. Secondly, we update multi-view features by self-attention of the ROI
(Region of Interest) windows which encodes local finer details in the features.
It will help exchange the information not only along the multi-view axis but
also along the other spatial dimension. Lastly, we leverage the fact of
multi-representation of queries in different domains to further boost the
performance. Here we use sparse floating queries along with dense BEV (Bird's
Eye View) queries, which are later post-processed to filter duplicate
detections. Moreover, we show performance improvements on the nuScenes
benchmark dataset on top of our baselines.
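The multi-view axis self-attention step can be sketched in a few lines (an illustration, not the 3M3D code; the number of views, channels, and feature-map size are assumptions): each spatial cell attends across the camera views, mixing panoramic context into the backbone features.

    import torch
    import torch.nn as nn

    class MultiViewAxisAttention(nn.Module):
        """Sketch: self-attention along the camera-view axis so each spatial
        location mixes information across views (illustration, not 3M3D code)."""
        def __init__(self, channels=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.norm = nn.LayerNorm(channels)

        def forward(self, feats):                    # feats: (V, C, H, W)
            V, C, H, W = feats.shape
            # Treat each (h, w) cell as a batch item and the V views as a sequence.
            x = feats.permute(2, 3, 0, 1).reshape(H * W, V, C)
            out, _ = self.attn(x, x, x)
            x = self.norm(x + out)                   # residual + layer norm
            return x.reshape(H, W, V, C).permute(2, 3, 0, 1)

    mva = MultiViewAxisAttention()
    multi_view = torch.randn(6, 64, 16, 44)          # e.g. 6 surround cameras
    print(mva(multi_view).shape)                     # torch.Size([6, 64, 16, 44])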
- β¦