Dense Voxel Fusion for 3D Object Detection
Camera and LiDAR sensor modalities provide complementary appearance and
geometric information useful for detecting 3D objects for autonomous vehicle
applications. However, current end-to-end fusion methods are challenging to
train and underperform state-of-the-art LiDAR-only detectors. Sequential fusion
methods suffer from a limited number of pixel and point correspondences due to
point cloud sparsity, or their performance is strictly capped by the detections
of one of the modalities. Our proposed solution, Dense Voxel Fusion (DVF), is a
sequential fusion method that generates multi-scale dense voxel feature
representations, improving expressiveness in low point density regions. To
enhance multi-modal learning, we train directly with projected ground truth 3D
bounding box labels, avoiding noisy, detector-specific 2D predictions. Both DVF
and the multi-modal training approach can be applied to any voxel-based LiDAR
backbone. DVF ranks 3rd among published fusion methods on KITTI 3D car
detection benchmark without introducing additional trainable parameters, nor
requiring stereo images or dense depth labels. In addition, DVF significantly
improves 3D vehicle detection performance of voxel-based methods on the Waymo
Open Dataset. Comment: Accepted in WACV 202
Point Cloud Processing Algorithms for Environment Understanding in Intelligent Vehicle Applications
Understanding the surrounding environment, including both still and moving objects, is crucial to the design and optimization of intelligent vehicles. In particular, knowledge of the vehicle environment can facilitate reliable detection of moving objects for collision avoidance. In this thesis, we focus on developing point cloud processing algorithms to support intelligent vehicle applications. The contributions of this thesis are three-fold.
First, inspired by the analogy between point cloud and video data, we formulate the problem of reconstructing the vehicle environment (e.g., terrain and buildings) from a sequence of point cloud sets. Building on an existing point cloud registration tool such as iterative closest point (ICP), we have developed an expectation-maximization (EM)-like technique that can automatically mosaic multiple point cloud sets into a larger one characterizing the still environment surrounding the vehicle.
Second, we propose to use color information (from color images captured by an RGB camera) as a supplementary source to the three-dimensional point cloud data. Such a joint color and depth representation has the potential to better characterize the surrounding environment of a vehicle. Based on this joint RGBD representation, we propose training a convolutional neural network on color images and depth maps generated from the point cloud data.
Finally, we explore a sensor fusion method that combines the results of a Lidar-based detection algorithm with vehicle-to-everything (V2X) communicated data. Since Lidar and V2X characterize the environment from complementary sources, we propose a linear sensor fusion method to better localize the surrounding vehicles. The effectiveness of the proposed sensor fusion method is verified by comparing detection error profiles.
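As a rough illustration of what a linear fusion of Lidar and V2X position estimates could look like, the sketch below combines two 2D estimates with inverse-variance weights. The weighting scheme, the function name `fuse_linear`, and the variance inputs are assumptions made for this example; the abstract does not state how the thesis chooses its weights.

```python
def fuse_linear(lidar_xy, v2x_xy, lidar_var, v2x_var):
    """Linearly fuse two 2D position estimates.

    Assumption (not from the abstract): each source reports a scalar
    variance, and weights are proportional to the inverse variance.
    """
    inv_sum = 1.0 / lidar_var + 1.0 / v2x_var
    w_lidar = (1.0 / lidar_var) / inv_sum
    w_v2x = 1.0 - w_lidar
    # Weighted average, applied independently to x and y.
    fused = tuple(w_lidar * a + w_v2x * b for a, b in zip(lidar_xy, v2x_xy))
    # Variance of the fused estimate under independent errors.
    fused_var = 1.0 / inv_sum
    return fused, fused_var


if __name__ == "__main__":
    # Equal confidence in both sources gives the midpoint.
    print(fuse_linear((10.0, 2.0), (12.0, 2.0), 1.0, 1.0))
```

With equal variances the fused position is the midpoint of the two estimates; a noisier source is pulled toward the more reliable one.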
LiDAR and Camera Detection Fusion in a Real Time Industrial Multi-Sensor Collision Avoidance System
Collision avoidance is a critical task in many applications, such as ADAS
(advanced driver-assistance systems), industrial automation and robotics. In an
industrial automation setting, certain areas should be off limits to an
automated vehicle for protection of people and high-valued assets. These areas
can be quarantined by mapping (e.g., GPS) or via beacons that delineate a
no-entry area. We propose a delineation method where the industrial vehicle
utilizes a LiDAR (Light Detection and Ranging) and a single color camera to
detect passive beacons and model-predictive control to stop the vehicle from
entering a restricted space. The beacons are standard orange traffic cones with
a highly reflective vertical pole attached. The LiDAR can readily detect these
beacons, but suffers from false positives due to other reflective surfaces such
as worker safety vests. Herein, we put forth a method for reducing false
positive detection from the LiDAR by projecting the beacons in the camera
imagery via a deep learning method and validating the detection using a neural
network-learned projection from the camera to the LiDAR space. Experimental
data collected at Mississippi State University's Center for Advanced Vehicular
Systems (CAVS) shows the effectiveness of the proposed system in retaining
true detections while mitigating false positives. Comment: 34 page
LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks
In this work, a deep learning approach has been developed to carry out road
detection by fusing LIDAR point clouds and camera images. An unstructured and
sparse point cloud is first projected onto the camera image plane and then
upsampled to obtain a set of dense 2D images encoding spatial information.
Several fully convolutional neural networks (FCNs) are then trained to carry
out road detection, either by using data from a single sensor, or by using
three fusion strategies: early, late, and the newly proposed cross fusion.
Whereas in the former two fusion approaches, the integration of multimodal
information is carried out at a predefined depth level, the cross fusion FCN is
designed to directly learn from data where to integrate information; this is
accomplished by using trainable cross connections between the LIDAR and the
camera processing branches.
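The projection step described above can be sketched with a pinhole camera model. This is a minimal example under stated assumptions: the points are already expressed in the camera frame (x right, y down, z forward), and the intrinsics `fx`, `fy`, `cx`, `cy` and the function name `project_to_image` are hypothetical. The upsampling to dense images and the FCNs themselves are not shown.

```python
def project_to_image(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame) to a sparse depth map.

    Returns a dict mapping (u, v) pixel coordinates to depth z.
    When several points land on one pixel, the nearest one is kept.
    """
    depth = {}
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point is behind the camera
        # Pinhole projection followed by rounding to the nearest pixel.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth


if __name__ == "__main__":
    pts = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 0.0, -5.0), (0.0, 0.0, 5.0)]
    print(project_to_image(pts, 100.0, 100.0, 50.0, 40.0, 100, 80))
```

Most pixels receive no point at all, which is why the sparse result has to be upsampled before it can be fed to a convolutional network as a dense image.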
To further highlight the benefits of using a multimodal system for road
detection, a data set consisting of visually challenging scenes was extracted
from driving sequences of the KITTI raw data set. It was then demonstrated
that, as expected, a purely camera-based FCN severely underperforms on this
data set. A multimodal system, on the other hand, is still able to provide high
accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI
road benchmark where it achieved excellent performance, with a MaxF score of
96.03%, ranking it among the top-performing approaches.
Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking
The accurate detection and classification of moving objects is a critical aspect of Advanced Driver Assistance Systems (ADAS). We believe that by including object classification from multiple sensor detections as a key component of the object's representation and the perception process, we can improve the perceived model of the environment. First, we define a composite object representation to include class information in the core object description. Second, we propose a complete perception fusion architecture based on the evidential framework to solve the Detection and Tracking of Moving Objects (DATMO) problem by integrating the composite representation and uncertainty management. Finally, we integrate our fusion approach into a real-time application inside a vehicle demonstrator from the interactIVe IP European project, which includes three main sensors: radar, lidar and camera. We test our fusion approach using real data from different driving scenarios, focusing on four objects of interest: pedestrian, bike, car and truck.
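To make the idea of evidential class fusion concrete, the sketch below combines two class mass functions over the four object classes using Dempster's rule of combination. This is an assumption for illustration only: the abstract names the evidential framework but not the combination operator, and the function name `combine_dempster` and the example masses are hypothetical.

```python
def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of class labels to a mass,
    with the masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


if __name__ == "__main__":
    omega = frozenset({"pedestrian", "bike", "car", "truck"})
    # Hypothetical evidence: lidar sees a large object, camera suggests a car.
    lidar = {frozenset({"car", "truck"}): 0.6, omega: 0.4}
    camera = {frozenset({"car"}): 0.7, omega: 0.3}
    for hypothesis, mass in combine_dempster(lidar, camera).items():
        print(sorted(hypothesis), round(mass, 2))
```

Mass left on the full set of classes represents ignorance, which is how this kind of representation keeps uncertainty explicit rather than forcing each sensor to commit to a single class.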