SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving
In this paper, we introduce a deep encoder-decoder network, named SalsaNet,
for efficient semantic segmentation of 3D LiDAR point clouds. SalsaNet segments
the road, i.e. drivable free-space, and vehicles in the scene by employing the
Bird's-Eye-View (BEV) image projection of the point cloud. To overcome the lack
of annotated point cloud data, in particular for the road segments, we
introduce an auto-labeling process which transfers automatically generated
labels from the camera to LiDAR. We also explore the role of image-like
projection of LiDAR data in semantic segmentation by comparing BEV with
spherical-front-view projection and show that SalsaNet is projection-agnostic.
We perform quantitative and qualitative evaluations on the KITTI dataset, which
demonstrate that the proposed SalsaNet outperforms other state-of-the-art
semantic segmentation networks in terms of accuracy and computation time. Our
code and data are publicly available at
https://gitlab.com/aksoyeren/salsanet.git
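As a concrete illustration of the BEV projection step this abstract relies on, the sketch below rasterizes a LiDAR cloud into a top-down grid with max-height, mean-intensity, and point-density channels. The crop ranges, cell resolution, and channel set are illustrative assumptions, not SalsaNet's exact input definition.

```python
import numpy as np

def lidar_to_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                 z_range=(-2.0, 1.0), resolution=0.1):
    """Project an (N, 4) LiDAR cloud (x, y, z, intensity) onto a BEV grid.

    Returns an (H, W, 3) image with max-height, mean-intensity, and
    point-density channels -- one common BEV encoding; the channel set
    and ranges are assumptions, not SalsaNet's exact definition.
    """
    # Keep only points inside the crop volume.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    # Discretize x/y coordinates into grid-cell indices.
    xi = ((pts[:, 0] - x_range[0]) / resolution).astype(np.int32)
    yi = ((pts[:, 1] - y_range[0]) / resolution).astype(np.int32)

    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    bev = np.zeros((h, w, 3), dtype=np.float32)

    # Channel 0: maximum height per cell (shifted to be nonnegative).
    np.maximum.at(bev[:, :, 0], (xi, yi), pts[:, 2] - z_range[0])
    # Channel 1: summed intensity, normalized to the mean below.
    np.add.at(bev[:, :, 1], (xi, yi), pts[:, 3])
    # Channel 2: number of points per cell.
    np.add.at(bev[:, :, 2], (xi, yi), 1.0)

    occupied = bev[:, :, 2] > 0
    bev[:, :, 1][occupied] /= bev[:, :, 2][occupied]  # mean intensity
    return bev
```

A spherical-front-view projection, which the abstract compares against, would instead index cells by azimuth and elevation angles; the rest of the encoding pipeline stays the same.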
SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection
LiDAR and camera are two essential sensors for 3D object detection in
autonomous driving. LiDAR provides accurate and reliable 3D geometry
information while the camera provides rich texture with color. Despite the
increasing popularity of fusing these two complementary sensors, the challenge
remains in how to effectively fuse 3D LiDAR point cloud with 2D camera images.
Recent methods focus on point-level fusion which paints the LiDAR point cloud
with camera features in the perspective view or bird's-eye view (BEV)-level
fusion which unifies multi-modality features in the BEV representation. In this
paper, we rethink these previous fusion strategies and analyze their
information loss and influences on geometric and semantic features. We present
SemanticBEVFusion to deeply fuse camera features with LiDAR features in a
unified BEV representation while maintaining per-modality strengths for 3D
object detection. Our method achieves state-of-the-art performance on the
large-scale nuScenes dataset, especially for challenging distant objects. The
code will be made publicly available.

Comment: The first two authors contributed equally to this work.
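The BEV-level fusion strategy this abstract contrasts with point-level painting can be sketched as follows: camera and LiDAR features, both already lifted into a shared BEV grid, are concatenated channel-wise and mixed by a small convolutional block. This is a generic sketch of the idea, not the SemanticBEVFusion architecture; the channel sizes and the view transform that produces `cam_bev` are assumptions.

```python
import torch
import torch.nn as nn

class BEVFusion(nn.Module):
    """Minimal BEV-level fusion: concatenate camera and LiDAR BEV feature
    maps and mix them with a conv block. Illustrative of the generic
    BEV-fusion idea only; channel sizes are assumed."""

    def __init__(self, cam_channels=80, lidar_channels=256, out_channels=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev):
        # Both inputs are assumed already aligned on the same BEV grid,
        # e.g. (B, C, 180, 180); producing that alignment without losing
        # geometric or semantic information is the hard part the paper studies.
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

# Example usage with random stand-in features:
fused = BEVFusion()(torch.randn(1, 80, 180, 180), torch.randn(1, 256, 180, 180))
```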
Multi-View 3D Object Detection Network for Autonomous Driving
This paper aims at high-accuracy 3D object detection in autonomous driving
scenario. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework
that takes both LiDAR point clouds and RGB images as input and predicts oriented
3D bounding boxes. We encode the sparse 3D point cloud with a compact
multi-view representation. The network is composed of two subnetworks: one for
3D object proposal generation and another for multi-view feature fusion. The
proposal network generates 3D candidate boxes efficiently from the bird's eye
view representation of 3D point cloud. We design a deep fusion scheme to
combine region-wise features from multiple views and enable interactions
between intermediate layers of different paths. Experiments on the challenging
KITTI benchmark show that our approach outperforms the state-of-the-art by
around 25% and 30% AP on the tasks of 3D localization and 3D detection. In
addition, for 2D detection, our approach obtains 10.3% higher AP than the
state-of-the-art on the hard data among the LiDAR-based methods.

Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.
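The deep fusion scheme MV3D describes joins region-wise features from the three views (bird's-eye view, front view, image) and inserts per-view layers between joins so intermediate representations can interact. A minimal sketch, assuming an element-wise mean as the join and illustrative layer sizes and step count:

```python
import torch
import torch.nn as nn

class DeepFusion(nn.Module):
    """Sketch of multi-view deep fusion: region-wise features from three
    view branches are merged by element-wise mean, with per-view layers
    between joins so intermediate layers of different paths interact.
    Dimensions and depth are illustrative assumptions."""

    def __init__(self, dim=512, steps=3):
        super().__init__()
        # One small per-view transform for each fusion step.
        self.blocks = nn.ModuleList(
            nn.ModuleList(nn.Linear(dim, dim) for _ in range(3))
            for _ in range(steps)
        )

    def forward(self, bev_feat, front_feat, img_feat):
        feats = [bev_feat, front_feat, img_feat]  # each (num_rois, dim)
        for per_view in self.blocks:
            fused = torch.stack(feats).mean(dim=0)          # element-wise mean join
            feats = [torch.relu(layer(fused)) for layer in per_view]
        return torch.stack(feats).mean(dim=0)
```

Compared with early fusion (one join at the input) or late fusion (one join at the output), the repeated join-then-transform pattern lets each view refine a shared representation several times before the final prediction.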
Deep Semantic Classification for 3D LiDAR Data
Robots are expected to operate autonomously in dynamic environments.
Understanding the underlying dynamic characteristics of objects is a key
enabler for achieving this goal. In this paper, we propose a method for
pointwise semantic classification of 3D LiDAR data into three classes:
non-movable, movable and dynamic. We concentrate on understanding these
specific semantics because they characterize important information required for
an autonomous system. Non-movable points in the scene belong to unchanging
segments of the environment, whereas the remaining classes correspond to the
changing parts of the scene. The difference between the movable and dynamic
class is their motion state. The dynamic points can be perceived as moving,
whereas movable objects can move, but are perceived as static. To learn the
distinction between movable and non-movable points in the environment, we
introduce an approach based on a deep neural network; to detect the
dynamic points, we estimate pointwise motion. We propose a Bayes filter
framework for combining the learned semantic cues with the motion cues to infer
the required semantic classification. In extensive experiments, we compare our
approach with other methods on a standard benchmark dataset and report
competitive results in comparison to the existing state-of-the-art.
Furthermore, we show an improvement in the classification of points by
combining the semantic cues retrieved from the neural network with the motion
cues.

Comment: 8 pages, to be published in IROS 2017.
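The Bayes filter framework described here amounts to a per-point recursive update: a belief over the three classes is multiplied by a measurement likelihood (a semantic cue from the network or a motion cue) and renormalized. A minimal sketch, with made-up observation values; the actual observation models are assumptions not specified by the abstract.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """One recursive Bayes step over per-point class beliefs.

    prior:      (N, 3) beliefs over [non-movable, movable, dynamic]
    likelihood: (N, 3) per-class measurement likelihoods, e.g. from the
                network's semantic scores or from pointwise motion
                estimates (illustrative; exact models are assumptions).
    """
    posterior = prior * likelihood
    return posterior / posterior.sum(axis=1, keepdims=True)

# Example: a point the network believes is movable, observed with
# strong motion evidence (columns: non-movable, movable, dynamic).
prior = np.array([[0.2, 0.7, 0.1]])
motion_likelihood = np.array([[0.1, 0.3, 0.6]])
print(bayes_update(prior, motion_likelihood))  # belief shifts toward dynamic
```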