
    Dense Voxel Fusion for 3D Object Detection

    Camera and LiDAR sensor modalities provide complementary appearance and geometric information useful for detecting 3D objects in autonomous vehicle applications. However, current end-to-end fusion methods are challenging to train and underperform state-of-the-art LiDAR-only detectors. Sequential fusion methods either suffer from a limited number of pixel and point correspondences due to point cloud sparsity, or have their performance strictly capped by the detections of one of the modalities. Our proposed solution, Dense Voxel Fusion (DVF), is a sequential fusion method that generates multi-scale dense voxel feature representations, improving expressiveness in low point density regions. To enhance multi-modal learning, we train directly with projected ground truth 3D bounding box labels, avoiding noisy, detector-specific 2D predictions. Both DVF and the multi-modal training approach can be applied to any voxel-based LiDAR backbone. DVF ranks 3rd among published fusion methods on the KITTI 3D car detection benchmark without introducing additional trainable parameters or requiring stereo images or dense depth labels. In addition, DVF significantly improves the 3D vehicle detection performance of voxel-based methods on the Waymo Open Dataset. Comment: Accepted in WACV 202
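    Sequential fusion methods like DVF hinge on associating LiDAR geometry with image evidence via projection into the camera frame. The paper's actual pipeline is more involved, but the basic LiDAR-to-image projection step can be sketched as follows; the function name and the plain pinhole model here are illustrative assumptions, not the authors' code:

    ```python
    import numpy as np

    def project_points_to_image(points, K, T_cam_lidar):
        """Project LiDAR points (N, 3) into the image plane.

        K: 3x3 camera intrinsic matrix; T_cam_lidar: 4x4 LiDAR-to-camera extrinsics.
        Returns pixel coordinates (N, 2) and a mask of points in front of the camera.
        """
        pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous coords
        pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                  # into camera frame
        in_front = pts_cam[:, 2] > 0                                # drop points behind camera
        uv = (K @ pts_cam.T).T
        uv = uv[:, :2] / uv[:, 2:3]                                 # perspective divide
        return uv, in_front
    ```

    Each projected point can then be used to sample image features (or, as in DVF's training scheme, to test membership in a projected 3D ground-truth box) for the voxel that contains it.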

    Point Cloud Processing Algorithms for Environment Understanding in Intelligent Vehicle Applications

    Understanding the surrounding environment, including both still and moving objects, is crucial to the design and optimization of intelligent vehicles. In particular, acquiring knowledge about the vehicle environment could facilitate reliable detection of moving objects for the purpose of avoiding collisions. In this thesis, we focus on developing point cloud processing algorithms to support intelligent vehicle applications. The contributions of this thesis are three-fold. First, inspired by the analogy between point cloud and video data, we formulate the problem of reconstructing the vehicle environment (e.g., terrains and buildings) from a sequence of point cloud sets. Built upon existing point cloud registration tools such as iterated closest point (ICP), we have developed an expectation-maximization (EM)-like technique that can automatically mosaic multiple point cloud sets into a larger one characterizing the still environment surrounding the vehicle. Second, we propose to utilize color information (from color images captured by an RGB camera) as a supplementary source to the three-dimensional point cloud data. Such a joint color and depth representation has the potential to better characterize the surrounding environment of a vehicle. Based on this joint RGB-D representation, we propose training a convolutional neural network on color images and depth maps generated from the point cloud data. Finally, we explore a sensor fusion method that combines the results given by a LiDAR-based detection algorithm and vehicle-to-everything (V2X) communicated data. Since LiDAR and V2X characterize the environmental information from complementary sources, we propose to obtain better localization of surrounding vehicles through a linear sensor fusion method. The effectiveness of the proposed sensor fusion method is verified by comparing detection error profiles.
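    The ICP-style registration underlying the mosaicking step alternates between matching points across two clouds and solving for the rigid transform that best aligns the matches. A minimal sketch of the closed-form alignment step (the Kabsch/SVD solution, shown here under the simplifying assumption that correspondences are already known; the thesis's EM-like mosaicking wraps iterations of this kind):

    ```python
    import numpy as np

    def rigid_align(src, dst):
        """Least-squares rigid transform (R, t) with dst ≈ src @ R.T + t.

        src, dst: corresponding point sets, each of shape (N, 3).
        """
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_d - R @ mu_s
        return R, t
    ```

    In a full ICP loop, nearest-neighbor matching and this alignment step are repeated until the transform converges, and the aligned clouds are merged into the growing mosaic.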

    LiDAR and Camera Detection Fusion in a Real Time Industrial Multi-Sensor Collision Avoidance System

    Collision avoidance is a critical task in many applications, such as ADAS (advanced driver-assistance systems), industrial automation, and robotics. In an industrial automation setting, certain areas should be off limits to an automated vehicle for the protection of people and high-valued assets. These areas can be quarantined by mapping (e.g., GPS) or via beacons that delineate a no-entry area. We propose a delineation method where the industrial vehicle utilizes a LiDAR (Light Detection and Ranging) sensor and a single color camera to detect passive beacons, and model-predictive control to stop the vehicle from entering a restricted space. The beacons are standard orange traffic cones with a highly reflective vertical pole attached. The LiDAR can readily detect these beacons, but suffers from false positives due to other reflective surfaces such as worker safety vests. Herein, we put forth a method for reducing false positive detections from the LiDAR by projecting the beacons into the camera imagery via a deep learning method and validating the detections using a neural network-learned projection from the camera to the LiDAR space. Experimental data collected at Mississippi State University's Center for Advanced Vehicular Systems (CAVS) show the effectiveness of the proposed system in retaining true detections while mitigating false positives. Comment: 34 page
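    The gating idea, that a LiDAR return only counts as a beacon if the camera corroborates it, can be sketched as a simple containment check: keep a LiDAR candidate only if its projection into the image falls inside a camera-detected beacon box. This is an illustration only; the paper's neural-network-learned camera-to-LiDAR projection is replaced here by precomputed image coordinates and detector boxes:

    ```python
    def validate_lidar_detections(lidar_uv, camera_boxes):
        """Filter LiDAR candidates against camera detections.

        lidar_uv:     list of (u, v) image projections of LiDAR candidates.
        camera_boxes: list of (x1, y1, x2, y2) beacon boxes from the camera detector.
        Returns indices of candidates corroborated by at least one camera box.
        """
        keep = []
        for i, (u, v) in enumerate(lidar_uv):
            if any(x1 <= u <= x2 and y1 <= v <= y2
                   for (x1, y1, x2, y2) in camera_boxes):
                keep.append(i)
        return keep
    ```

    A reflective safety vest that triggers the LiDAR but is not detected as a traffic-cone beacon in the image would fail this check and be discarded.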

    LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks

    In this work, a deep learning approach has been developed to carry out road detection by fusing LIDAR point clouds and camera images. An unstructured and sparse point cloud is first projected onto the camera image plane and then upsampled to obtain a set of dense 2D images encoding spatial information. Several fully convolutional neural networks (FCNs) are then trained to carry out road detection, either by using data from a single sensor, or by using three fusion strategies: early, late, and the newly proposed cross fusion. Whereas in the former two fusion approaches the integration of multimodal information is carried out at a predefined depth level, the cross fusion FCN is designed to directly learn from data where to integrate information; this is accomplished by using trainable cross connections between the LIDAR and the camera processing branches. To further highlight the benefits of using a multimodal system for road detection, a data set consisting of visually challenging scenes was extracted from driving sequences of the KITTI raw data set. It was then demonstrated that, as expected, a purely camera-based FCN severely underperforms on this data set, whereas a multimodal system is still able to provide high accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI road benchmark, where it achieved excellent performance, with a MaxF score of 96.03%, ranking it among the top-performing approaches.
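    At a single depth level, the cross-connection idea reduces to adding a learnably weighted copy of each branch's features to the other branch, so that training decides how much (and effectively where) to mix. A minimal numpy sketch; the scalar cross-connection weights and function name are illustrative simplifications of the paper's trainable connections:

    ```python
    import numpy as np

    def cross_fuse(lidar_feat, cam_feat, w_lc, w_cl):
        """One cross-fusion step between two feature maps of equal shape.

        w_lc: trainable weight on the LiDAR->camera connection.
        w_cl: trainable weight on the camera->LiDAR connection.
        Initializing both near 0 starts training close to a no-fusion baseline.
        """
        lidar_out = lidar_feat + w_cl * cam_feat   # camera evidence into LiDAR branch
        cam_out = cam_feat + w_lc * lidar_feat     # LiDAR evidence into camera branch
        return lidar_out, cam_out
    ```

    Early fusion would instead concatenate the raw inputs once at the bottom of the network, and late fusion would merge only the final per-branch outputs; the cross connections let gradient descent place the mixing anywhere in between.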

    Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking

    The accurate detection and classification of moving objects is a critical aspect of Advanced Driver Assistance Systems (ADAS). We believe that by including the classification of objects from multiple sensor detections as a key component of the object's representation and the perception process, we can improve the perceived model of the environment. First, we define a composite object representation to include class information in the core object's description. Second, we propose a complete perception fusion architecture based on the Evidential framework to solve the Detection and Tracking of Moving Objects (DATMO) problem by integrating the composite representation and uncertainty management. Finally, we integrate our fusion approach in a real-time application inside a vehicle demonstrator from the interactIVe IP European project, which includes three main sensors: radar, LiDAR, and camera. We test our fusion approach using real data from different driving scenarios, focusing on four objects of interest: pedestrian, bike, car, and truck.
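    The Evidential (Dempster-Shafer) framework at the core of this architecture combines per-sensor basic belief assignments, which can place mass on sets of classes (e.g., "car or truck") rather than single labels, using Dempster's rule of combination. A self-contained sketch over a small frame of classes; the mass values in the usage example are made up for illustration:

    ```python
    def dempster_combine(m1, m2):
        """Combine two basic belief assignments with Dempster's rule.

        m1, m2: dicts mapping frozenset-of-classes -> mass (each summing to 1).
        Mass assigned to disjoint hypotheses is conflict, normalized out.
        """
        combined = {}
        conflict = 0.0
        for a, ma in m1.items():
            for b, mb in m2.items():
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + ma * mb
                else:
                    conflict += ma * mb
        if conflict >= 1.0:
            raise ValueError("sources are in total conflict")
        return {k: v / (1.0 - conflict) for k, v in combined.items()}
    ```

    For example, a LiDAR source assigning mass 0.6 to {car} and 0.4 to {car, truck}, combined with a camera source assigning 0.5 to each of the same sets, concentrates 0.8 of the fused mass on {car}, reflecting the agreement between the sensors.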