Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks
Detailed environment perception is a crucial component of automated vehicles. However, to handle the amount of perceived information, we also
require segmentation strategies. Based on a grid map environment
representation, well-suited for sensor fusion, free-space estimation and
machine learning, we detect and classify objects using deep convolutional
neural networks. As input for our networks we use a multi-layer grid map
efficiently encoding 3D range sensor information. The inference output consists
of a list of rotated bounding boxes with associated semantic classes. We
conduct extensive ablation studies, highlight important design considerations
when using grid maps and evaluate our models on the KITTI Bird's Eye View
benchmark. Qualitative and quantitative benchmark results show that we achieve
robust detection and state-of-the-art accuracy solely using top-view grid maps from range sensor data.
Comment: 6 pages, 4 tables, 4 figures
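As an illustration of the input encoding, the following is a minimal sketch (not the authors' code) of turning a LiDAR point cloud into a multi-layer top-view grid map. The layer choices (point density, maximum height, mean intensity), the grid extent, and the cell size are illustrative assumptions.

```python
# Minimal sketch: encode a LiDAR point cloud as a multi-layer top-view grid
# map. Layers and parameters are illustrative, not the paper's configuration.
import numpy as np

def encode_grid_map(points, x_range=(0.0, 60.0), y_range=(-30.0, 30.0),
                    cell_size=0.1):
    """points: (N, 4) array of [x, y, z, intensity] in the sensor frame."""
    nx = int((x_range[1] - x_range[0]) / cell_size)
    ny = int((y_range[1] - y_range[0]) / cell_size)

    # Keep only points inside the mapped area.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # Discretize metric coordinates to flat cell indices.
    ix = ((pts[:, 0] - x_range[0]) / cell_size).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) / cell_size).astype(np.int64)
    flat = ix * ny + iy

    density = np.zeros(nx * ny)
    max_height = np.full(nx * ny, -np.inf)
    intensity_sum = np.zeros(nx * ny)

    np.add.at(density, flat, 1.0)                 # points per cell
    np.maximum.at(max_height, flat, pts[:, 2])    # tallest return per cell
    np.add.at(intensity_sum, flat, pts[:, 3])

    mean_intensity = np.where(density > 0,
                              intensity_sum / np.maximum(density, 1), 0.0)
    max_height[density == 0] = 0.0                # clean empty cells

    # Stack layers channel-wise into a (3, nx, ny) tensor, the CNN input.
    return np.stack([np.log1p(density),
                     max_height,
                     mean_intensity]).reshape(3, nx, ny)
```

Stacking the layers channel-wise yields an image-like tensor, which is what allows standard 2D convolutional detectors to operate directly on range sensor data.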
Learned Enrichment of Top-View Grid Maps Improves Object Detection
We propose an object detector for top-view grid maps that is additionally trained to generate an enriched version of its input. The goal of the joint model is to improve generalization by regularizing towards structural knowledge in the form of a map fused from multiple adjacent range sensor measurements. This training data can be generated automatically and thus requires no manual annotations. We present an evidential framework to generate the training data, investigate different model architectures, and show that predicting enriched inputs as an additional task can improve object detection performance.
Comment: 6 pages, 6 figures, 4 tables
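A minimal PyTorch sketch of the joint-training idea: a shared encoder feeds both a detection head and an auxiliary decoder that reconstructs the enriched (multi-scan) grid map. The architecture, output parameterization, and loss weighting are illustrative assumptions, not the paper's configuration.

```python
# Sketch of joint training: object detection plus auxiliary reconstruction
# of an enriched grid map fused from multiple scans. All sizes are assumed.
import torch
import torch.nn as nn

class JointGridDetector(nn.Module):
    def __init__(self, in_layers=3, n_anchors=2, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(                       # shared backbone
            nn.Conv2d(in_layers, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # Detection head: per-cell class scores and box regression
        # (x, y, w, l, sin yaw, cos yaw) per anchor.
        self.det_head = nn.Conv2d(64, n_anchors * (n_classes + 6), 1)
        # Auxiliary head: predict the enriched (multi-scan) grid map.
        self.enrich_head = nn.Conv2d(64, in_layers, 1)

    def forward(self, grid):
        feat = self.encoder(grid)
        return self.det_head(feat), self.enrich_head(feat)

def joint_loss(det_out, det_target, enriched_pred, enriched_target,
               det_loss_fn, aux_weight=0.5):
    # The reconstruction term regularizes the shared features towards the
    # structure visible in the fused multi-scan map.
    aux = nn.functional.l1_loss(enriched_pred, enriched_target)
    return det_loss_fn(det_out, det_target) + aux_weight * aux
```

Because the auxiliary decoder is only needed to compute the training loss, it can be dropped at inference time, so the regularization adds no runtime cost to the detector.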
SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation
3D pedestrian detection is a challenging task in automated driving because
pedestrians are relatively small, frequently occluded and easily confused with
narrow vertical objects. LiDAR and camera are two commonly used sensor
modalities for this task, which should provide complementary information.
Unexpectedly, LiDAR-only detection methods tend to outperform multisensor
fusion methods in public benchmarks. Recently, PointPainting has been presented
to eliminate this performance drop by effectively fusing the output of a
semantic segmentation network instead of the raw image information. In this
paper, we propose a generalization of PointPainting that can apply fusion at different levels. After semantically augmenting the point cloud, we encode raw point data in pillars to obtain geometric features and semantic point data in voxels to obtain semantic features, and fuse the two effectively.
Experimental results on the KITTI test set show that SemanticVoxels achieves
state-of-the-art performance in both 3D and bird's eye view pedestrian
detection benchmarks. In particular, our approach demonstrates its strength in
detecting challenging pedestrian cases and outperforms current state-of-the-art
approaches.
Comment: Accepted for presentation at the 2020 IEEE International Conference on Multisensor Fusion and Integration (MFI 2020)
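The semantic augmentation step generalized here follows the PointPainting recipe: project each LiDAR point into the camera image and append the segmentation network's per-class scores at that pixel. The sketch below illustrates this under simplified calibration assumptions; all names are hypothetical.

```python
# Sketch of PointPainting-style semantic augmentation: append per-pixel
# segmentation scores to each projected LiDAR point. Calibration handling
# is simplified and all identifiers are illustrative.
import numpy as np

def paint_points(points, seg_scores, lidar_to_cam, cam_intrinsics):
    """points: (N, 4) [x, y, z, intensity]; seg_scores: (C, H, W) softmax map;
    lidar_to_cam: (4, 4) extrinsics; cam_intrinsics: (3, 3)."""
    C, H, W = seg_scores.shape

    # Transform points into the camera frame (homogeneous coordinates).
    xyz1 = np.concatenate([points[:, :3], np.ones((len(points), 1))], axis=1)
    cam = (lidar_to_cam @ xyz1.T)[:3]

    # Project onto the image plane, guarding against division by zero.
    uvw = cam_intrinsics @ cam
    z = np.where(uvw[2] > 1e-6, uvw[2], 1.0)
    u = (uvw[0] / z).astype(np.int64)
    v = (uvw[1] / z).astype(np.int64)
    valid = (uvw[2] > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Concatenate the C class scores to each visible point.
    painted = np.zeros((len(points), 4 + C), dtype=points.dtype)
    painted[:, :4] = points
    painted[valid, 4:] = seg_scores[:, v[valid], u[valid]].T
    return painted
```

In the fusion scheme described above, the geometric part of each painted point would feed the pillar encoder while the appended semantic scores feed the voxel encoder.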
Semantic evidential grid mapping using monocular and stereo cameras
Accurately estimating the current state of local traffic scenes is one of the key problems in the development of software components for automated vehicles. In addition to free space and drivability, the desired representation may also include static and dynamic traffic participants and semantic information. Multi-layer grid maps allow all of this information to be combined in a common representation. However, most existing grid mapping approaches only process range sensor measurements such as Lidar and Radar, and solely model occupancy without semantic states. To add sensor redundancy and diversity, it is desirable to integrate vision-based sensor setups into a common grid map representation. In this work, we present a semantic evidential grid mapping pipeline, including estimates for eight semantic classes, that is designed for straightforward fusion with range sensor data. Unlike other publications, our representation explicitly models uncertainties in the evidential model. We present results of our grid mapping pipeline based on a monocular vision setup and a stereo vision setup. Our maps are accurate and dense due to the incorporation of a disparity- or depth-based ground surface estimation in the inverse perspective mapping. We conclude with a detailed quantitative evaluation on real traffic scenarios from the KITTI odometry benchmark dataset, demonstrating the advantages over other semantic grid mapping approaches.
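As a worked illustration of the evidential model, the sketch below fuses two per-cell semantic mass vectors with Dempster's rule, assuming singleton class hypotheses plus an "unknown" mass on the full frame of discernment. The class set and the exact update rule are assumptions based on the abstract, not the paper's formulation.

```python
# Sketch of evidential fusion for one grid cell via Dempster's rule,
# assuming singleton semantic classes plus an "unknown" mass on Theta.
import numpy as np

def fuse_cell(m1, m2):
    """m1, m2: (C + 1,) mass vectors; entries 0..C-1 are singleton classes,
    entry C is the mass on the unknown set Theta. Each sums to 1."""
    s1, u1 = m1[:-1], m1[-1]
    s2, u2 = m2[:-1], m2[-1]
    # Conflict: total mass assigned to disjoint singleton pairs.
    conflict = s1.sum() * s2.sum() - (s1 * s2).sum()
    norm = 1.0 - conflict
    fused = np.empty_like(m1)
    # A singleton survives if both sources agree on it, or one source
    # asserts it while the other remains uncertain.
    fused[:-1] = (s1 * s2 + s1 * u2 + u1 * s2) / norm
    fused[-1] = (u1 * u2) / norm
    return fused

# Example: a confident camera measurement combined with a mostly uncertain
# second measurement sharpens the "road" belief and shrinks the unknown mass.
m_cam  = np.array([0.7, 0.1, 0.0, 0.2])   # [road, sidewalk, vehicle, unknown]
m_next = np.array([0.3, 0.0, 0.0, 0.7])
print(fuse_cell(m_cam, m_next))
```

Keeping an explicit unknown mass per cell is what lets the map distinguish "no evidence yet" from "conflicting evidence", which a plain per-class probability grid cannot express.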