Rethinking Pseudo-LiDAR Representation
The recently proposed pseudo-LiDAR based 3D detectors have substantially
improved the benchmarks for monocular/stereo 3D detection. However, the underlying
mechanism remains obscure to the research community. In this paper, we perform
an in-depth investigation and observe that the efficacy of the pseudo-LiDAR
representation comes from the coordinate transformation rather than from the data
representation itself. Based on this observation, we design an image-based CNN
detector named PatchNet, which is more general and can be instantiated as the
existing pseudo-LiDAR based 3D detectors. Moreover, the pseudo-LiDAR data in our
PatchNet is organized as the image representation, which means existing 2D CNN
designs can be easily utilized for extracting deep features from input data and
boosting 3D detection performance. We conduct extensive experiments on the
challenging KITTI dataset, where the proposed PatchNet outperforms all existing
pseudo-LiDAR based counterparts. Code has been made available at:
https://github.com/xinzhuma/patchnet. (ECCV 2020; supplemental material attached.)
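The coordinate transformation the abstract credits is the standard pinhole back-projection: each pixel with an estimated depth is lifted into 3D camera coordinates using the camera intrinsics. Below is a minimal NumPy sketch of that transformation, keeping the image layout the abstract describes; the function name and interface are illustrative, not the authors' code.

```python
import numpy as np

def depth_to_xyz(depth: np.ndarray, fx: float, fy: float,
                 cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map into per-pixel 3D camera
    coordinates via the pinhole model:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # (H, W) pixel grids
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    # Keep the (H, W, 3) image layout instead of flattening to an N x 3
    # point cloud, so off-the-shelf 2D CNNs can consume the result.
    return np.stack([x, y, depth], axis=-1)
```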
Vulnerable road users and connected autonomous vehicles interaction: a survey
There is a group of users within the vehicular traffic ecosystem known as Vulnerable Road Users (VRUs); VRUs include pedestrians, cyclists, and motorcyclists, among others. Connected autonomous vehicles (CAVs), in turn, combine communication technologies, which keep the vehicle ubiquitously connected, with automated driving technologies, which assist or replace the human driver. Autonomous vehicles are envisioned as a viable way to reduce road accidents, providing a safe environment for all road users, and especially for the most vulnerable. One of the challenges facing autonomous vehicles is to devise mechanisms that ease their integration not only into the mobility environment but also into road society in a safe and efficient way. In this paper, we analyze and discuss how this integration can take place, review the work developed in recent years at each stage of the vehicle-human interaction, analyze the challenges faced by vulnerable users, and propose solutions that help address these challenges. This work was partially funded by the Ministry of Economy, Industry, and Competitiveness of Spain under the grant "Supervision of drone fleet and optimization of commercial operations flight plans" (PID2020-116377RB-C21).
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection
Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera and lidar or camera and radar - by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we focus on 2D object detection, a fundamental high-level task which is defined on the 2D image domain, and propose HRFuser, a multi-resolution sensor fusion architecture that scales straightforwardly to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. Even though cameras alone provide very informative features for 2D detection, we demonstrate via extensive experiments on the nuScenes and Seeing Through Fog datasets that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art fusion methods for 2D detection both in normal and adverse conditions. The source code will be made publicly available.
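As a rough illustration of the fusion idea described above, the PyTorch sketch below shows a single cross-attention fusion step in which camera tokens attend to tokens from one additional modality. The module name, shapes, and residual design are assumptions for illustration, not the HRFuser implementation (the paper's multi-window cross-attention additionally restricts attention to local windows).

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Camera tokens (queries) attend to tokens from one auxiliary
    modality (keys/values), e.g. lidar or radar features projected onto
    the image plane. Illustrative only, not the HRFuser block."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # cam: (B, N, C) camera tokens; aux: (B, M, C) auxiliary tokens.
        fused, _ = self.attn(query=cam, key=aux, value=aux)
        # The residual keeps the camera stream intact when the auxiliary
        # modality carries little information (e.g. sparse radar).
        return self.norm(cam + fused)

# Instantiating one such block per extra modality and per resolution level
# lets the design scale to an arbitrary number of inputs.
fusion = CrossModalFusion(dim=256)
out = fusion(torch.randn(2, 1024, 256), torch.randn(2, 1024, 256))
```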
Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network
3D object detection from lidar or camera sensors is essential for
autonomous driving. Pioneering attempts at multi-modality fusion complement the
sparse lidar point clouds with rich semantic texture information from images at
the cost of extra network designs and overhead. In this work, we propose a
novel semantic passing framework, named SPNet, to boost the performance of
existing lidar-based 3D detection models with the guidance of rich context
painting, with no extra computation cost during inference. Our key design is to
first exploit the potential instructive semantic knowledge within the
ground-truth labels by training a semantic-painted teacher model and then guide
the pure-lidar network to learn the semantic-painted representation via
knowledge passing modules at different granularities: class-wise passing,
pixel-wise passing and instance-wise passing. Experimental results show that
the proposed SPNet cooperates seamlessly with most existing 3D detection
frameworks, yielding a 1-5% AP gain, and even achieves new state-of-the-art 3D detection
performance on the KITTI test benchmark. Code is available at:
https://github.com/jb892/SPNet. (Accepted by ACM MM 2022.)
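The knowledge-passing idea lends itself to a compact sketch: a teacher trained on semantic-painted inputs supervises a pure-lidar student by feature mimicking, so only the student runs at inference and no extra cost is paid. The loss below is a generic, hypothetical stand-in for the paper's class-, pixel-, and instance-wise passing modules, not SPNet's actual code.

```python
from typing import Optional

import torch
import torch.nn.functional as F

def passing_loss(student_feat: torch.Tensor,
                 teacher_feat: torch.Tensor,
                 fg_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Feature-mimicking loss between the pure-lidar student and the
    semantic-painted teacher. A generic sketch of knowledge passing.

    student_feat, teacher_feat: (B, C, H, W) feature maps.
    fg_mask: optional (B, 1, H, W) weight map; weighting foreground
    cells more heavily loosely mimics instance-wise passing.
    """
    # Detach the teacher so gradients flow only into the student.
    diff = F.mse_loss(student_feat, teacher_feat.detach(), reduction="none")
    if fg_mask is not None:
        diff = diff * fg_mask
    return diff.mean()
```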