LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks
In this work, a deep learning approach has been developed to carry out road
detection by fusing LIDAR point clouds and camera images. An unstructured and
sparse point cloud is first projected onto the camera image plane and then
upsampled to obtain a set of dense 2D images encoding spatial information.
Several fully convolutional neural networks (FCNs) are then trained to carry
out road detection, either using data from a single sensor or using one of
three fusion strategies: early, late, and the newly proposed cross fusion.
Whereas the first two fusion approaches integrate multimodal information at a
predefined depth level, the cross fusion FCN learns directly from the data
where to integrate information; this is accomplished through trainable cross
connections between the LIDAR and camera processing branches.
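As an illustration of the trainable cross connections described above, here is a minimal PyTorch sketch; channel sizes, layer choices, and the zero initialisation are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class CrossFusionBlock(nn.Module):
    """One depth level of a cross-fusion FCN: each branch receives the
    other branch's features scaled by a trainable scalar, so the network
    can learn where (and how strongly) to integrate the two modalities."""

    def __init__(self, channels):
        super().__init__()
        self.cam_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.lid_conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Trainable cross-connection weights (assumed initialised at zero,
        # so training starts from two independent branches).
        self.lid_to_cam = nn.Parameter(torch.zeros(1))
        self.cam_to_lid = nn.Parameter(torch.zeros(1))

    def forward(self, cam, lid):
        # Feature maps are assumed to share spatial size and channel count.
        cam_out = torch.relu(self.cam_conv(cam + self.lid_to_cam * lid))
        lid_out = torch.relu(self.lid_conv(lid + self.cam_to_lid * cam))
        return cam_out, lid_out
```

Stacking such blocks lets gradient descent decide, per depth level, how much of each modality flows across.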
To further highlight the benefits of using a multimodal system for road
detection, a data set consisting of visually challenging scenes was extracted
from driving sequences of the KITTI raw data set. It was then demonstrated
that, as expected, a purely camera-based FCN severely underperforms on this
data set. A multimodal system, on the other hand, is still able to provide high
accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI
road benchmark where it achieved excellent performance, with a MaxF score of
96.03%, ranking it among the top-performing approaches.
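For reference, MaxF is the maximum F-measure over all confidence thresholds of the pixel-wise road classifier. A small NumPy sketch of the metric (array and function names are illustrative):

```python
import numpy as np

def max_f(scores, labels):
    """Maximum F1 measure over all confidence thresholds, the style of
    metric used by the KITTI road benchmark.
    `scores`: per-pixel road confidences; `labels`: 0/1 ground truth."""
    order = np.argsort(-scores)              # sort pixels by confidence
    tp = np.cumsum(labels[order])            # true positives at each cut
    fp = np.cumsum(1 - labels[order])        # false positives at each cut
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return f1.max()
```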
Enhanced free space detection in multiple lanes based on single CNN with scene identification
Many systems for autonomous vehicles' navigation rely on lane detection.
Traditional algorithms usually estimate only the position of the lanes on the
road, but an autonomous control system may also need to know if a lane marking
can be crossed or not, and what portion of space inside the lane is free from
obstacles, to make safer control decisions. On the other hand, free space
detection algorithms only detect navigable areas, without information about
lanes. State-of-the-art algorithms use CNNs for both tasks, with significant
consumption of computing resources. We propose a novel approach that estimates
the free space inside each lane with a single CNN. Additionally, at the cost of
only a small amount of extra GPU RAM, we infer the road type, which is useful
for path planning. To achieve this, we train a multi-task CNN.
Then, we further elaborate the output of the network, to extract polygons that
can be effectively used in navigation control. Finally, we provide a
computationally efficient implementation, based on ROS, that can be executed in
real time. Our code and trained models are available online. (To appear in the 2019 IEEE Intelligent Vehicles Symposium, IV 2019.)
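A multi-task network of the kind described above can be sketched as a shared encoder feeding a dense segmentation head and a cheap global classification head. The following PyTorch snippet is a minimal illustration with assumed layer sizes and class counts, not the authors' architecture:

```python
import torch
import torch.nn as nn

class LaneFreeSpaceNet(nn.Module):
    """Minimal multi-task layout: a shared encoder feeding (1) a dense
    free-space-per-lane segmentation head and (2) a global road-type
    classifier. All layer sizes and class counts are assumptions."""

    def __init__(self, n_lane_classes=4, n_road_types=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dense head: per-pixel logits, upsampled back to input resolution.
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, n_lane_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        # Global head: pooling + one linear layer, hence the small extra
        # GPU RAM footprint mentioned in the abstract.
        self.scene_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_road_types),
        )

    def forward(self, x):
        features = self.encoder(x)
        return self.seg_head(features), self.scene_head(features)
```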
Instance Segmentation and Object Detection in Road Scenes using Inverse Perspective Mapping of 3D Point Clouds and 2D Images
Instance segmentation and object detection are important tasks in smart car applications. Recently, a variety of neural-network-based approaches have been proposed. One challenge is that objects appear at various scales in a scene, which requires the neural network to have a large receptive field to deal with the scale variations; in other words, the network must have a deep architecture, which slows down computation. In smart car applications, the accuracy of vehicle and pedestrian detection and segmentation is critical. Moreover, 2D images lack distance information but offer rich visual appearance, whereas 3D point clouds provide strong evidence of the existence of objects. Fusing 2D images and 3D point clouds can therefore provide more information for finding objects in a scene. This paper proposes a series of fronto-parallel virtual planes and an inverse perspective mapping of the input image onto these planes to deal with scale variations. I use 3D point clouds obtained from a LiDAR sensor and 2D images obtained from stereo cameras mounted on top of a vehicle to estimate the ground area of the scene and to define the virtual planes. A region up to a certain height above the ground area is cropped from the 2D images to focus on objects on flat roads. Then, the point cloud is used to filter out false alarms among the over-detection results generated by an off-the-shelf deep neural network, Mask R-CNN. Experimental results show that the proposed approach outperforms Mask R-CNN without the proposed pre-processing on the KITTI benchmark dataset [9].
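The point-cloud-based false-alarm filtering step could, in a simplified form, amount to keeping only those 2D detections that contain enough projected LiDAR points. A hedged NumPy sketch (the function name, calibration handling, and the `min_points` threshold are assumptions):

```python
import numpy as np

def filter_by_lidar_support(boxes, points_cam, K, min_points=20):
    """Drop 2D detections that lack 3D evidence: project the LiDAR points
    into the image and keep a box only if enough points fall inside it.
    `boxes`: (x1, y1, x2, y2) tuples; `points_cam`: Nx3 LiDAR points
    already transformed into the camera frame; `K`: 3x3 intrinsics."""
    pts = points_cam[points_cam[:, 2] > 0]   # points in front of the camera
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]              # perspective division
    kept = []
    for x1, y1, x2, y2 in boxes:
        inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
                  (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
        if inside.sum() >= min_points:       # enough 3D support: keep it
            kept.append((x1, y1, x2, y2))
    return kept
```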
A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection
Object detection in camera images using deep learning has proven successful in
recent years. Rising detection rates and computationally efficient network
structures are pushing this technique towards application in production
vehicles. Nevertheless, camera sensor quality is limited in severe weather
conditions and by increased sensor noise in sparsely lit areas and at night.
Our approach enhances current 2D object detection networks
by fusing camera data and projected sparse radar data in the network layers.
The proposed CameraRadarFusionNet (CRF-Net) automatically learns at which level
the fusion of the sensor data is most beneficial for the detection result.
Additionally, we introduce BlackIn, a training strategy inspired by Dropout,
which focuses the learning on a specific sensor type. We show that the fusion
network is able to outperform a state-of-the-art image-only network for two
different datasets. The code for this research will be made available to the
public at: https://github.com/TUMFTM/CameraRadarFusionNet. (Accepted at 2019
Sensor Data Fusion: Trends, Solutions, Applications (SDF).)
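BlackIn, as described, blanks one sensor's input during training so the network learns to exploit the other. A minimal sketch of that idea (the blanking probability and the choice of which sensor is blanked are illustrative assumptions):

```python
import torch

def blackin(camera, radar, p_blank=0.2):
    """Dropout-inspired input blanking: with (assumed) probability
    `p_blank`, zero out the camera tensor for this training sample so
    the network is forced to learn from the radar data alone."""
    if torch.rand(()).item() < p_blank:
        camera = torch.zeros_like(camera)
    return camera, radar
```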
SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection
Freespace detection is an essential component of visual perception for
self-driving cars. The recent efforts made in data-fusion convolutional neural
networks (CNNs) have significantly improved semantic driving scene
segmentation. Freespace can be hypothesized as a ground plane, on which the
points have similar surface normals. Hence, in this paper, we first introduce a
novel module, named surface normal estimator (SNE), which can infer surface
normal information from dense depth/disparity images with high accuracy and
efficiency. Furthermore, we propose a data-fusion CNN architecture, referred to
as RoadSeg, which can extract and fuse features from both RGB images and the
inferred surface normal information for accurate freespace detection. For
research purposes, we publish a large-scale synthetic freespace detection
dataset, named Ready-to-Drive (R2D) road dataset, collected under different
illumination and weather conditions. The experimental results demonstrate that
our proposed SNE module can benefit all the state-of-the-art CNNs for freespace
detection, and our SNE-RoadSeg achieves the best overall performance across
different datasets. (ECCV 2020.)
Lidar–camera semi-supervised learning for semantic segmentation
In this work, we investigated two issues: (1) how the fusion of lidar and camera data can improve semantic segmentation performance compared with the individual sensor modalities in a supervised learning context; and (2) how fusion can also be leveraged for semi-supervised learning in order to further improve performance and to adapt to new domains without requiring any additional labelled data. A comparative study was carried out through an experimental evaluation of networks trained in different setups using various scenarios, from sunny days to rainy night scenes. The networks were tested on challenging, less common scenarios where cameras or lidars individually would not provide a reliable prediction. Our results suggest that semi-supervised learning and fusion techniques increase the overall performance of the network in challenging scenarios while using fewer data annotations.
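One common way fusion can be leveraged for semi-supervised learning, consistent with the setup described above, is pseudo-labelling: a fusion teacher labels unlabelled lidar-camera pairs and only confident pixels supervise the student. A hedged PyTorch sketch (model signatures and the confidence threshold are assumptions, not the paper's exact method):

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(teacher, student, optimizer, lidar, camera,
                      conf_thresh=0.9):
    """One semi-supervised step: a frozen fusion teacher pseudo-labels an
    unlabelled lidar+camera pair, and only confidently labelled pixels
    supervise the student."""
    with torch.no_grad():
        probs = torch.softmax(teacher(lidar, camera), dim=1)
        conf, pseudo = probs.max(dim=1)      # per-pixel confidence and label
        mask = conf > conf_thresh
    if not mask.any():
        return 0.0                           # no confident pixels this batch
    logits = student(lidar, camera)
    loss = F.cross_entropy(logits, pseudo, reduction="none")[mask].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```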
LOCATOR: Low-power ORB accelerator for autonomous cars
Simultaneous Localization And Mapping (SLAM) is crucial for autonomous navigation. ORB-SLAM is a state-of-the-art visual SLAM system based on cameras, used for self-driving cars. In this paper, we propose a high-performance, energy-efficient, and functionally accurate hardware accelerator for ORB-SLAM, focusing on its most time-consuming stage: Oriented FAST and Rotated BRIEF (ORB) feature extraction. The Rotated BRIEF (rBRIEF) descriptor generation is the main bottleneck in ORB computation, as it exhibits highly irregular access patterns to local on-chip memories, causing a high performance penalty due to bank conflicts. We introduce a technique based on a genetic algorithm to find an optimal static pattern for performing parallel accesses to the banks. Furthermore, we propose the combination of an rBRIEF pixel duplication cache, selective port replication, and pipelining to reduce latency without compromising cost. The accelerator achieves reductions in energy consumption of 14597× and 9609× with respect to high-end CPU and GPU platforms, respectively.
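A genetic algorithm for finding a static, conflict-minimising access order can be sketched as evolving permutations of the descriptor's memory accesses, with same-bank collisions per parallel slot as the fitness. The following Python sketch uses illustrative sizes and a plain swap-mutation GA, not the paper's exact formulation:

```python
import random

N_ACCESSES, N_BANKS, SLOT = 256, 32, 8   # illustrative sizes

def conflicts(order, bank_of):
    """Fitness: same-bank collisions when accesses issue SLOT at a time."""
    total = 0
    for i in range(0, len(order), SLOT):
        banks = [bank_of[a] for a in order[i:i + SLOT]]
        total += len(banks) - len(set(banks))
    return total

def evolve(bank_of, pop_size=50, generations=200):
    """Plain GA over access-order permutations with swap mutation."""
    pop = [random.sample(range(N_ACCESSES), N_ACCESSES)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: conflicts(o, bank_of))
        survivors = pop[:pop_size // 2]      # keep the least-conflicting half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = random.randrange(N_ACCESSES), random.randrange(N_ACCESSES)
            child[i], child[j] = child[j], child[i]  # swap keeps a permutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda o: conflicts(o, bank_of))

# Example: accesses striped across banks by index.
best_order = evolve([a % N_BANKS for a in range(N_ACCESSES)])
```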