42,403 research outputs found
A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View
The bird's-eye-view (BEV) representation allows robust learning of multiple
tasks for autonomous driving including road layout estimation and 3D object
detection. However, contemporary methods for unified road layout estimation and
3D object detection rarely handle the class imbalance of the training dataset
and multi-class learning to reduce the total number of networks required. To
overcome these limitations, we propose a unified model for road layout
estimation and 3D object detection inspired by the transformer architecture and
the CycleGAN learning framework. The proposed model deals with the performance
degradation due to the class imbalance of the dataset utilizing the focal loss
and the proposed dual cycle loss. Moreover, we set up extensive learning
scenarios to study the effect of multi-class learning for road layout
estimation in various situations. To verify the effectiveness of the proposed
model and the learning scheme, we conduct a thorough ablation study and a
comparative study. The experiment results attest the effectiveness of our
model; we achieve state-of-the-art performance in both the road layout
estimation and 3D object detection tasks
PIXOR: Real-time 3D Object Detection from Point Clouds
We address the problem of real-time 3D object detection from point clouds in
the context of autonomous driving. Computation speed is critical as detection
is a necessary component for safety. Existing approaches are, however,
expensive in computation due to high dimensionality of point clouds. We utilize
the 3D data more efficiently by representing the scene from the Bird's Eye View
(BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs
oriented 3D object estimates decoded from pixel-wise neural network
predictions. The input representation, network architecture, and model
optimization are especially designed to balance high accuracy and real-time
efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection
benchmark, and a large-scale 3D vehicle detection benchmark. In both datasets
we show that the proposed detector surpasses other state-of-the-art methods
notably in terms of Average Precision (AP), while still runs at >28 FPS.Comment: Update of CVPR2018 paper: correct timing, fix typos, add
acknowledgemen
J-MOD: Joint Monocular Obstacle Detection and Depth Estimation
In this work, we propose an end-to-end deep architecture that jointly learns
to detect obstacles and estimate their depth for MAV flight applications. Most
of the existing approaches either rely on Visual SLAM systems or on depth
estimation models to build 3D maps and detect obstacles. However, for the task
of avoiding obstacles this level of complexity is not required. Recent works
have proposed multi task architectures to both perform scene understanding and
depth estimation. We follow their track and propose a specific architecture to
jointly estimate depth and obstacles, without the need to compute a global map,
but maintaining compatibility with a global SLAM system if needed. The network
architecture is devised to exploit the joint information of the obstacle
detection task, that produces more reliable bounding boxes, with the depth
estimation one, increasing the robustness of both to scenario changes. We call
this architecture J-MOD. We test the effectiveness of our approach with
experiments on sequences with different appearance and focal lengths and
compare it to SotA multi task methods that jointly perform semantic
segmentation and depth estimation. In addition, we show the integration in a
full system using a set of simulated navigation experiments where a MAV
explores an unknown scenario and plans safe trajectories by using our detection
model
- …