Joint 3D Proposal Generation and Object Detection from View Aggregation
We present AVOD, an Aggregate View Object Detection network for autonomous
driving scenarios. The proposed neural network architecture uses LIDAR point
clouds and RGB images to generate features that are shared by two subnetworks:
a region proposal network (RPN) and a second stage detector network. The
proposed RPN uses a novel architecture capable of performing multimodal feature
fusion on high resolution feature maps to generate reliable 3D object proposals
for multiple object classes in road scenes. Using these proposals, the second
stage detection network performs accurate oriented 3D bounding box regression
and category classification to predict the extents, orientation, and
classification of objects in 3D space. Our proposed architecture is shown to
produce state-of-the-art results on the KITTI 3D object detection benchmark
while running in real time with a low memory footprint, making it a suitable
candidate for deployment on autonomous vehicles. Code is at:
https://github.com/kujason/avod
Comment: For any inquiries contact aharakeh(at)uwaterloo(dot)c
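To make "extents, orientation" concrete, the following is a minimal sketch of how an oriented 3D box can be expanded into its eight corners. It is not taken from the AVOD code: the function name `box_corners_3d`, the (length, width, height) ordering, and the z-up frame with yaw about the vertical axis are all assumptions made for illustration, and the abstract does not say which box encoding the network regresses.

```python
import math


def box_corners_3d(center, extents, yaw):
    """Return the 8 corners of an oriented 3D box as (x, y, z) tuples.

    Assumptions (illustrative, not from the paper): a z-up frame,
    extents given as (length, width, height), and yaw in radians
    measured about the vertical z axis.
    """
    cx, cy, cz = center
    length, width, height = extents
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)

    # Footprint offsets in the box's own frame, before rotation.
    footprint = (
        (length / 2, width / 2),
        (length / 2, -width / 2),
        (-length / 2, -width / 2),
        (-length / 2, width / 2),
    )

    corners = []
    for dz in (-height / 2, height / 2):  # bottom face, then top face
        for dx, dy in footprint:
            # Rotate the offset by yaw, then translate to the box center.
            x = cx + cos_y * dx - sin_y * dy
            y = cy + sin_y * dx + cos_y * dy
            corners.append((x, y, cz + dz))
    return corners
```

For example, `box_corners_3d((10.0, 2.0, 0.8), (4.0, 1.8, 1.6), math.pi / 6)` describes a car-sized box rotated 30 degrees about the vertical axis.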
GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks
In the last decade, supervised deep learning approaches have been extensively
employed in visual odometry (VO) applications, but they are not feasible in
environments where labelled data is scarce. On the other hand,
unsupervised deep learning approaches for localization and mapping in unknown
environments from unlabelled data have received comparatively less attention in
VO research. In this study, we propose a generative unsupervised learning
framework that predicts the 6-DoF camera pose and a monocular depth map of the
scene from unlabelled RGB image sequences, using deep convolutional Generative
Adversarial Networks (GANs). We create a supervisory signal by warping view
sequences and using minimization of the re-projection error as the objective
loss function for the multi-view pose estimation and single-view depth
generation networks. Detailed quantitative and qualitative evaluations of the
proposed framework on the KITTI and Cityscapes datasets show that the proposed
method outperforms both existing traditional and unsupervised deep VO methods,
providing better results for both pose estimation and depth recovery.
Comment: ICRA 2019 - accepte
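The view-warping supervisory signal can be sketched as follows: each target pixel is back-projected with its predicted depth, moved by the predicted relative pose, and projected into a neighbouring view, and the intensity difference at the two locations is the re-projection error. This is only an illustration under stated assumptions, not the GANVO implementation: the names `reproject_pixel` and `photometric_l1`, the pinhole intrinsics tuple `(fx, fy, cx, cy)`, the nearest-neighbour sampling, and the plain L1 error are all hypothetical choices, and the adversarial part of the framework is not shown. Learned systems normally need a differentiable sampler, which this sketch does not provide.

```python
def reproject_pixel(u, v, depth, intrinsics, rotation, translation):
    """Map pixel (u, v) of the target view into the source view.

    intrinsics is (fx, fy, cx, cy) for an assumed pinhole camera;
    rotation is a 3x3 nested list and translation a 3-element list
    describing the assumed target-to-source rigid motion.
    Returns None if the point falls behind the source camera.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the target camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the rigid motion to express the point in the source frame.
    moved = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if moved[2] <= 0:
        return None
    # Project back onto the source image plane.
    return fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy


def photometric_l1(target, source, depth, intrinsics, rotation, translation):
    """Mean absolute intensity difference over pixels that land in bounds.

    target, source and depth are equally sized nested lists (rows of
    grayscale values). Sampling is nearest-neighbour for simplicity.
    """
    rows, cols = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(rows):
        for u in range(cols):
            hit = reproject_pixel(u, v, depth[v][u], intrinsics,
                                  rotation, translation)
            if hit is None:
                continue
            us, vs = int(round(hit[0])), int(round(hit[1]))
            if 0 <= us < cols and 0 <= vs < rows:
                total += abs(target[v][u] - source[vs][us])
                count += 1
    return total / count if count else 0.0
```

When depth and pose are correct and the scene is static, the error approaches zero, which is what makes it usable as a training signal without ground-truth labels.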