PIXOR: Real-time 3D Object Detection from Point Clouds
We address the problem of real-time 3D object detection from point clouds in
the context of autonomous driving. Computation speed is critical as detection
is a necessary component for safety. Existing approaches are, however,
expensive in computation due to high dimensionality of point clouds. We utilize
the 3D data more efficiently by representing the scene from the Bird's Eye View
(BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs
oriented 3D object estimates decoded from pixel-wise neural network
predictions. The input representation, network architecture, and model
optimization are especially designed to balance high accuracy and real-time
efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection
benchmark, and a large-scale 3D vehicle detection benchmark. In both datasets
we show that the proposed detector surpasses other state-of-the-art methods
notably in terms of Average Precision (AP), while still running at >28 FPS.
Comment: Update of CVPR2018 paper: correct timing, fix typos, add acknowledgemen
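To make the Bird's Eye View idea concrete, here is a minimal sketch of projecting a point cloud onto a 2D occupancy grid. It is an illustration only, not the PIXOR input pipeline: the function name `bev_occupancy`, the range defaults, and the cell size are assumptions chosen for the example, and the abstract does not specify them.

```python
def bev_occupancy(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), cell=0.1):
    """Project (x, y, z) points onto a 2D bird's-eye-view occupancy grid.

    All parameter defaults are illustrative assumptions, not values
    taken from the abstract.
    """
    rows = int(round((x_range[1] - x_range[0]) / cell))
    cols = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        # Discard points outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Height is dropped here; a cell is marked occupied if any point falls in it.
        r = min(int((x - x_range[0]) / cell), rows - 1)
        c = min(int((y - y_range[0]) / cell), cols - 1)
        grid[r][c] = 1
    return grid


if __name__ == "__main__":
    cloud = [(0.5, 0.5, 0.0), (100.0, 0.0, 0.0), (2.2, -1.5, 1.0)]
    grid = bev_occupancy(cloud, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell=1.0)
    for row in grid:
        print(row)
```

Collapsing the height axis turns the scene into an image-like grid that ordinary 2D convolutions can process, which is the kind of efficiency gain the abstract attributes to the BEV representation.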
J-MOD: Joint Monocular Obstacle Detection and Depth Estimation
In this work, we propose an end-to-end deep architecture that jointly learns
to detect obstacles and estimate their depth for MAV flight applications. Most
of the existing approaches either rely on Visual SLAM systems or on depth
estimation models to build 3D maps and detect obstacles. However, for the task
of avoiding obstacles this level of complexity is not required. Recent works
have proposed multi task architectures to both perform scene understanding and
depth estimation. We follow their track and propose a specific architecture to
jointly estimate depth and obstacles, without the need to compute a global map,
but maintaining compatibility with a global SLAM system if needed. The network
architecture is devised to exploit the joint information of the obstacle
detection task, which produces more reliable bounding boxes, with the depth
estimation one, increasing the robustness of both to scenario changes. We call
this architecture J-MOD. We test the effectiveness of our approach with
experiments on sequences with different appearance and focal lengths and
compare it to SotA multi task methods that jointly perform semantic
segmentation and depth estimation. In addition, we show the integration in a
full system using a set of simulated navigation experiments where a MAV
explores an unknown scenario and plans safe trajectories by using our detection
model.
3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes
While deep convolutional neural networks (CNN) have been successfully applied
for 2D image analysis, it is still challenging to apply them to 3D anisotropic
volumes, especially when the within-slice resolution is much higher than the
between-slice resolution and when the number of 3D volumes is relatively small.
On one hand, direct learning of CNN with 3D convolution kernels suffers from
the lack of data and likely ends up with poor generalization; insufficient GPU
memory limits the model size or representational power. On the other hand,
applying 2D CNN with generalizable features to 2D slices ignores between-slice
information. Coupling 2D network with LSTM to further handle the between-slice
information is not optimal due to the difficulty in LSTM learning. To overcome
the above challenges, we propose a 3D Anisotropic Hybrid Network (AH-Net) that
transfers convolutional features learned from 2D images to 3D anisotropic
volumes. Such a transfer inherits the desired strong generalization capability
for within-slice information while naturally exploiting between-slice
information for more effective modelling. The focal loss is further utilized
for more effective end-to-end learning. We experiment with the proposed 3D
AH-Net on two different medical image analysis tasks, namely lesion detection
from a Digital Breast Tomosynthesis volume, and liver and liver tumor
segmentation from a Computed Tomography volume, and obtain state-of-the-art
results.
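The focal loss mentioned above can be sketched for the binary case as follows. This is the commonly used formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and not necessarily the exact variant used for AH-Net; the function name and the defaults for `gamma`, `alpha`, and `eps` are assumptions, since the abstract gives no values.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one prediction p (probability of class 1) and label y in {0, 1}.

    gamma, alpha and eps are illustrative defaults, not values from the abstract.
    """
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    print(binary_focal_loss(0.9, 1))  # easy positive: small loss
    print(binary_focal_loss(0.1, 1))  # hard positive: much larger loss
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.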
