270 research outputs found
Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception
Vehicle-to-everything (V2X) autonomous driving opens up a promising direction
for developing a new generation of intelligent transportation systems.
Collaborative perception (CP) as an essential component to achieve V2X can
overcome the inherent limitations of individual perception, including occlusion
and long-range perception. In this survey, we provide a comprehensive review of
CP methods for V2X scenarios, offering the community an in-depth understanding of
the field. Specifically, we first introduce the architecture and workflow
of typical V2X systems, which affords a broader perspective to understand the
entire V2X system and the role of CP within it. Then, we thoroughly summarize
and analyze existing V2X perception datasets and CP methods. Particularly, we
introduce numerous CP methods from various crucial perspectives, including
collaboration stages, roadside sensor placement, latency compensation,
the performance-bandwidth trade-off, attack/defense, pose alignment, etc. Moreover,
we conduct extensive experimental analyses to compare and examine current CP
methods, revealing some essential and previously unexplored insights. In particular, we
analyze the performance changes of different methods under different
bandwidths, providing a deep insight into the performance-bandwidth trade-off
issue. Also, we examine methods under different LiDAR ranges. To study the
model robustness, we further investigate the effects of various simulated
real-world noises on the performance of different CP methods, covering
communication latency, lossy communication, localization errors, and mixed
noises. In addition, we look into the sim-to-real generalization ability of
existing CP methods. At last, we thoroughly discuss issues and challenges,
highlighting promising directions for future efforts. Our codes for
experimental analysis will be public at
https://github.com/memberRE/Collaborative-Perception.Comment: 19 page
Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges
Collaborative perception is essential to address occlusion and sensor failure
issues in autonomous driving. In recent years, theoretical and experimental
investigations of collaborative perception have increased tremendously. So far,
however, few reviews have focused on systematic
collaboration modules and large-scale collaborative perception datasets. This
work reviews recent achievements in this field to bridge this gap and motivate
future research. We start with a brief overview of collaboration schemes. After
that, we systematically summarize the collaborative perception methods for
ideal scenarios and real-world issues. The former focuses on collaboration
modules and efficiency, and the latter is devoted to addressing the problems in
real-world applications. Furthermore, we present large-scale public datasets and
summarize quantitative results on these benchmarks. Finally, we highlight gaps
and overlooked challenges between current academic research and real-world
applications. The project page is
https://github.com/CatOneTwo/Collaborative-Perception-in-Autonomous-Driving.
Accepted by IEEE Intelligent Transportation Systems Magazine; 18 pages, 6 figures.
Bridging the Domain Gap for Multi-Agent Perception
Existing multi-agent perception algorithms usually share deep neural features
extracted from raw sensing data between agents, striking a trade-off between
accuracy and communication bandwidth. However, these
methods assume all agents have identical neural networks, which might not be
practical in the real world. The transmitted features can have a large domain
gap when the models differ, leading to a dramatic performance drop in
multi-agent perception. In this paper, we propose the first lightweight
framework to bridge such domain gaps for multi-agent perception, which can be a
plug-in module for most existing systems while maintaining confidentiality. Our
framework consists of a learnable feature resizer to align features in multiple
dimensions and a sparse cross-domain transformer for domain adaptation. Extensive
experiments on the public multi-agent perception dataset V2XSet have
demonstrated that our method can effectively bridge the gap for features from
different domains and outperform other baseline methods significantly by at
least 8% for point-cloud-based 3D object detection.
Accepted by ICRA 2023. Code: https://github.com/DerrickXuNu/MPD
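As an illustration of the feature-resizer idea (a simplified sketch, not the authors' released implementation), the PyTorch module below aligns a sender's feature map to the receiver's channel count and spatial resolution with a 1x1 projection, bilinear resampling, and a light refinement layer; the sparse cross-domain transformer is omitted.

```python
# Minimal learnable feature resizer sketch for heterogeneous multi-agent features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureResizer(nn.Module):
    def __init__(self, in_ch, out_ch, out_hw):
        super().__init__()
        self.out_hw = out_hw                      # target (H, W) of the receiver
        self.proj = nn.Conv2d(in_ch, out_ch, 1)   # 1x1 conv aligns the channel dimension
        self.refine = nn.Sequential(              # light refinement after resampling
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.proj(x)
        x = F.interpolate(x, size=self.out_hw, mode="bilinear", align_corners=False)
        return self.refine(x)

# Example: a sender map with 256 channels at 88x88 resized to the receiver's
# 64-channel 100x100 grid before fusion.
resizer = FeatureResizer(in_ch=256, out_ch=64, out_hw=(100, 100))
aligned = resizer(torch.randn(1, 256, 88, 88))    # -> (1, 64, 100, 100)
```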
Collaboration Helps Camera Overtake LiDAR in 3D Detection
Camera-only 3D detection provides an economical solution with a simple
configuration for localizing objects in 3D space compared to LiDAR-based
detection systems. However, a major challenge lies in precise depth estimation
due to the lack of direct 3D measurements in the input. Many previous methods
attempt to improve depth estimation through network designs, e.g., deformable
layers and larger receptive fields. This work proposes an orthogonal direction,
improving the camera-only 3D detection by introducing multi-agent
collaborations. Our proposed collaborative camera-only 3D detection (CoCa3D)
enables agents to share complementary information with each other through
communication. Meanwhile, we optimize communication efficiency by selecting the
most informative cues. The shared messages from multiple viewpoints
disambiguate the single-agent estimated depth and complement the occluded and
long-range regions in the single-agent view. We evaluate CoCa3D on one
real-world dataset and two new simulation datasets. Results show that CoCa3D
improves the previous SOTA performance by 44.21% on DAIR-V2X, 30.60% on OPV2V+,
and 12.59% on CoPerception-UAVs+ in AP@70. Our preliminary results suggest that,
with sufficient collaboration, cameras might overtake LiDAR in some practical
scenarios. We released the dataset and code at
https://siheng-chen.github.io/dataset/CoPerception+ and
https://github.com/MediaBrain-SJTU/CoCa3D. Accepted by CVPR.
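A rough sketch of the confidence-based cue selection described above, assuming a simple index-plus-feature message format rather than the released CoCa3D code: each agent transmits only its top-k most informative BEV cells, and the receiver scatters them back into a sparse map for fusion.

```python
# Sparse message selection and reconstruction for communication-efficient collaboration.
import torch

def select_messages(bev_feat, confidence, k=512):
    """bev_feat: (C, H, W) features; confidence: (H, W) per-cell scores.
    Returns flat indices and the corresponding feature vectors to transmit."""
    c, h, w = bev_feat.shape
    flat_conf = confidence.reshape(-1)
    k = min(k, flat_conf.numel())
    idx = torch.topk(flat_conf, k).indices           # most informative cells
    msgs = bev_feat.reshape(c, -1)[:, idx].t()       # (k, C) payload
    return idx, msgs

def scatter_messages(idx, msgs, shape):
    """Rebuild a sparse (C, H, W) map from a received message on the ego side."""
    c = msgs.shape[1]
    out = torch.zeros(c, shape[0] * shape[1])
    out[:, idx] = msgs.t()
    return out.reshape(c, *shape)

idx, msgs = select_messages(torch.randn(64, 100, 100), torch.rand(100, 100), k=256)
recovered = scatter_messages(idx, msgs, (100, 100))   # sparse map ready for fusion
```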
CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion
Autonomous Vehicles (AVs) use multiple sensors to gather information about
their surroundings. By sharing sensor data between Connected Autonomous
Vehicles (CAVs), the safety and reliability of these vehicles can be improved
through a concept known as cooperative perception. However, recent approaches
in cooperative perception only share single sensor information such as cameras
or LiDAR. In this research, we explore the fusion of multiple sensor data
sources and present a framework, called CoBEVFusion, that fuses LiDAR and
camera data to create a Bird's-Eye View (BEV) representation. The CAVs process
the multi-modal data locally and utilize a Dual Window-based Cross-Attention
(DWCA) module to fuse the LiDAR and camera features into a unified BEV
representation. The fused BEV feature maps are shared among the CAVs, and a 3D
Convolutional Neural Network is applied to aggregate the features from the
CAVs. Our CoBEVFusion framework was evaluated on the cooperative perception
dataset OPV2V for two perception tasks: BEV semantic segmentation and 3D object
detection. The results show that our DWCA LiDAR-camera fusion model outperforms
perception models with single-modal data and state-of-the-art BEV fusion
models. Our overall cooperative perception architecture, CoBEVFusion, also
achieves performance comparable to other cooperative perception models.
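The sketch below illustrates the general LiDAR-camera BEV fusion pattern with a plain cross-attention layer; it is a generic stand-in, not the paper's Dual Window-based Cross-Attention (DWCA) module.

```python
# Generic cross-attention fusion of camera and LiDAR BEV feature maps.
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_bev, lidar_bev):
        # Both inputs: (B, C, H, W). Flatten BEV cells into a token sequence.
        b, c, h, w = cam_bev.shape
        q = cam_bev.flatten(2).transpose(1, 2)      # camera tokens query...
        kv = lidar_bev.flatten(2).transpose(1, 2)   # ...the LiDAR tokens
        fused, _ = self.attn(q, kv, kv)
        fused = self.norm(fused + q)                # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)

fuser = BEVCrossAttention(dim=64, heads=4)
out = fuser(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))  # (1, 64, 32, 32)
```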
Analyzing Infrastructure LiDAR Placement with Realistic LiDAR Simulation Library
Recently, Vehicle-to-Everything (V2X) cooperative perception has attracted
increasing attention. Infrastructure sensors play a critical role in this
research field; however, how to find the optimal placement of infrastructure
sensors is rarely studied. In this paper, we investigate the problem of
infrastructure sensor placement and propose a pipeline that can efficiently and
effectively find optimal installation positions for infrastructure sensors in a
realistic simulated environment. To better simulate and evaluate LiDAR
placement, we establish a Realistic LiDAR Simulation library that can simulate
the unique characteristics of different popular LiDARs and produce
high-fidelity LiDAR point clouds in the CARLA simulator. Through simulating
point cloud data in different LiDAR placements, we can evaluate the perception
accuracy of these placements using multiple detection models. Then, we analyze
the correlation between the point cloud distribution and perception accuracy by
calculating the density and uniformity of regions of interest. Experiments show
that when using the same number and type of LiDARs, the placement scheme
optimized by our proposed method improves the average precision by 15%,
compared with the conventional placement scheme in the standard lane scene. We
also analyze the correlation between perception performance in the region of
interest and LiDAR point cloud distribution and validate that density and
uniformity can be indicators of performance. Both the RLS Library and related
code will be released at
https://github.com/PJLab-ADG/LiDARSimLib-and-Placement-Evaluation.
Accepted to the IEEE International Conference on Robotics and Automation
(ICRA'23); 7 pages, 6 figures.
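For intuition, the snippet below computes a point density and a simple uniformity score over a region of interest; the exact definitions (points per square meter, an inverse coefficient-of-variation score over grid cells) are assumptions for illustration, not necessarily the paper's formulas.

```python
# Density and uniformity statistics of a LiDAR point cloud inside a region of interest.
import numpy as np

def roi_density_uniformity(points_xy, roi, cell=2.0):
    """points_xy: (N, 2) LiDAR points projected to the ground plane.
    roi: (xmin, xmax, ymin, ymax) region of interest in meters."""
    xmin, xmax, ymin, ymax = roi
    inside = points_xy[
        (points_xy[:, 0] >= xmin) & (points_xy[:, 0] < xmax)
        & (points_xy[:, 1] >= ymin) & (points_xy[:, 1] < ymax)
    ]
    area = (xmax - xmin) * (ymax - ymin)
    density = len(inside) / area                       # points per square meter
    xbins = np.arange(xmin, xmax + cell, cell)
    ybins = np.arange(ymin, ymax + cell, cell)
    counts, _, _ = np.histogram2d(inside[:, 0], inside[:, 1], bins=[xbins, ybins])
    cv = counts.std() / (counts.mean() + 1e-6)         # spread of per-cell counts
    uniformity = 1.0 / (1.0 + cv)                      # 1.0 means perfectly even
    return density, uniformity

pts = np.random.uniform(-40, 40, size=(20000, 2))
print(roi_density_uniformity(pts, roi=(-20, 20, -10, 10)))
```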
LiDAR aided simulation pipeline for wireless communication in vehicular traffic scenarios
Integrated Sensing and Communication (ISAC) is a modern technology under development for Sixth Generation (6G) systems. This thesis focuses on creating a simulation pipeline for dynamic vehicular traffic scenarios and a novel approach to reducing wireless communication overhead with a Light Detection and Ranging (LiDAR) based system. The simulation pipeline can be used to generate datasets for numerous problems. Additionally, the developed error model for vehicle detection algorithms can be used to characterize LiDAR performance with respect to parameters such as LiDAR height, range, and laser point density. LiDAR behavior in traffic environments is reported as part of the results of this study. A periodic beam index map is developed by capturing the antenna azimuth and elevation angles that yield the maximum Reference Signal Received Power (RSRP) for a simulated receiver grid on the road, and by classifying road areas with a Support Vector Machine (SVM) to reduce the number of Synchronization Signal Blocks (SSBs) that need to be sent in Vehicle-to-Infrastructure (V2I) communication. This approach effectively reduces the wireless communication overhead in V2I communication.
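A hedged sketch of the beam-index-map idea with synthetic data: grid positions are labeled with the beam index that gave the highest RSRP, and an SVM learns the position-to-beam mapping so that only a small candidate beam set needs to be swept per region. In the thesis the labels come from the LiDAR-aided simulation pipeline; here they are fabricated placeholders.

```python
# SVM-based beam index map over a receiver grid (synthetic stand-in data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(1000, 2))      # receiver grid points on the road (m)
best_beam = (positions[:, 0] // 25).astype(int)      # placeholder label: best SSB index per point

clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(positions, best_beam)

# At run time, a vehicle's position maps to a predicted beam instead of a full SSB sweep.
print(clf.predict([[12.0, 40.0], [80.0, 5.0]]))
```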
Point Cloud Processing Algorithms for Environment Understanding in Intelligent Vehicle Applications
Understanding the surrounding environment, including both still and moving objects, is crucial to the design and optimization of intelligent vehicles. In particular, acquiring knowledge about the vehicle environment can facilitate reliable detection of moving objects for the purpose of avoiding collisions. In this thesis, we focus on developing point cloud processing algorithms to support intelligent vehicle applications. The contributions of this thesis are three-fold.
First, inspired by the analogy between point cloud and video data, we formulate the problem of reconstructing the vehicle environment (e.g., terrains and buildings) from a sequence of point cloud sets. Built upon existing point cloud registration tools such as iterative closest point (ICP), we develop an expectation-maximization (EM)-like technique that can automatically mosaic multiple point cloud sets into a larger one characterizing the still environment surrounding the vehicle.
Second, we utilize color information (from images captured by an RGB camera) as a supplementary source to the three-dimensional point cloud data. Such a joint color and depth representation has the potential to better characterize the surrounding environment of a vehicle. Based on this joint RGB-D representation, we train a convolutional neural network on color images and depth maps generated from the point cloud data.
Finally, we explore a sensor fusion method that combines the results of a LiDAR-based detection algorithm with vehicle-to-everything (V2X) communicated data. Since LiDAR and V2X characterize the environment from complementary sources, we obtain better localization of surrounding vehicles through a linear sensor fusion method. The effectiveness of the proposed sensor fusion method is verified by comparing detection error profiles.
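The linear fusion step can be illustrated with an inverse-variance weighted average of the LiDAR-based and V2X-reported positions; the variance values below are illustrative assumptions, not numbers from the thesis.

```python
# Inverse-variance weighted fusion of LiDAR and V2X position estimates.
import numpy as np

def fuse_positions(p_lidar, var_lidar, p_v2x, var_v2x):
    """Linear fusion of two 2D position estimates weighted by assumed measurement variances."""
    p_lidar, p_v2x = np.asarray(p_lidar, float), np.asarray(p_v2x, float)
    w_lidar = 1.0 / var_lidar
    w_v2x = 1.0 / var_v2x
    fused = (w_lidar * p_lidar + w_v2x * p_v2x) / (w_lidar + w_v2x)
    fused_var = 1.0 / (w_lidar + w_v2x)
    return fused, fused_var

fused, var = fuse_positions([10.2, 5.1], 0.25, [10.6, 4.8], 0.6)
print(fused, var)   # estimate lies closer to the lower-variance LiDAR measurement
```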
DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving
Safety is the primary priority of autonomous driving. Nevertheless, no
published dataset currently supports the direct and explainable safety
evaluation for autonomous driving. In this work, we propose DeepAccident, a
large-scale dataset generated via a realistic simulator containing diverse
accident scenarios that frequently occur in real-world driving. The proposed
DeepAccident dataset contains 57K annotated frames and 285K annotated samples,
approximately 7 times more than the large-scale nuScenes dataset with 40K
annotated samples. In addition, we propose a new task, end-to-end motion and
accident prediction, based on the proposed dataset, which can be used to
directly evaluate the accident prediction ability for different autonomous
driving algorithms. Furthermore, for each scenario, we deploy four vehicles
along with one infrastructure unit to record data, thus providing diverse
viewpoints for
accident scenarios and enabling V2X (vehicle-to-everything) research on
perception and prediction tasks. Finally, we present a baseline V2X model named
V2XFormer that demonstrates superior performance for motion and accident
prediction and 3D object detection compared to the single-vehicle model.
- …