17,648 research outputs found
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
Driven to Distraction: Self-Supervised Distractor Learning for Robust Monocular Visual Odometry in Urban Environments
We present a self-supervised approach to ignoring "distractors" in camera
images for the purposes of robustly estimating vehicle motion in cluttered
urban environments. We leverage offline multi-session mapping approaches to
automatically generate a per-pixel ephemerality mask and depth map for each
input image, which we use to train a deep convolutional network. At run-time we
use the predicted ephemerality and depth as an input to a monocular visual
odometry (VO) pipeline, using either sparse features or dense photometric
matching. Our approach yields metric-scale VO using only a single camera and
can recover the correct egomotion even when 90% of the image is obscured by
dynamic, independently moving objects. We evaluate our robust VO methods on
more than 400km of driving from the Oxford RobotCar Dataset and demonstrate
reduced odometry drift and significantly improved egomotion estimation in the
presence of large moving vehicles in urban traffic.Comment: International Conference on Robotics and Automation (ICRA), 2018.
Video summary: http://youtu.be/ebIrBn_nc-
A Deep Primal-Dual Network for Guided Depth Super-Resolution
In this paper we present a novel method to increase the spatial resolution of
depth images. We combine a deep fully convolutional network with a non-local
variational method in a deep primal-dual network. The joint network computes a
noise-free, high-resolution estimate from a noisy, low-resolution input depth
map. Additionally, a high-resolution intensity image is used to guide the
reconstruction in the network. By unrolling the optimization steps of a
first-order primal-dual algorithm and formulating it as a network, we can train
our joint method end-to-end. This not only enables us to learn the weights of
the fully convolutional network, but also to optimize all parameters of the
variational method and its optimization procedure. The training of such a deep
network requires a large dataset for supervision. Therefore, we generate
high-quality depth maps and corresponding color images with a physically based
renderer. In an exhaustive evaluation we show that our method outperforms the
state-of-the-art on multiple benchmarks.Comment: BMVC 201
Frustum PointNets for 3D Object Detection from RGB-D Data
In this work, we study 3D object detection from RGB-D data in both indoor and
outdoor scenes. While previous methods focus on images or 3D voxels, often
obscuring natural 3D patterns and invariances of 3D data, we directly operate
on raw point clouds by popping up RGB-D scans. However, a key challenge of this
approach is how to efficiently localize objects in point clouds of large-scale
scenes (region proposal). Instead of solely relying on 3D proposals, our method
leverages both mature 2D object detectors and advanced 3D deep learning for
object localization, achieving efficiency as well as high recall for even small
objects. Benefited from learning directly in raw point clouds, our method is
also able to precisely estimate 3D bounding boxes even under strong occlusion
or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection
benchmarks, our method outperforms the state of the art by remarkable margins
while having real-time capability.Comment: 15 pages, 12 figures, 14 table
Multi-View 3D Object Detection Network for Autonomous Driving
This paper aims at high-accuracy 3D object detection in autonomous driving
scenario. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework
that takes both LIDAR point cloud and RGB images as input and predicts oriented
3D bounding boxes. We encode the sparse 3D point cloud with a compact
multi-view representation. The network is composed of two subnetworks: one for
3D object proposal generation and another for multi-view feature fusion. The
proposal network generates 3D candidate boxes efficiently from the bird's eye
view representation of 3D point cloud. We design a deep fusion scheme to
combine region-wise features from multiple views and enable interactions
between intermediate layers of different paths. Experiments on the challenging
KITTI benchmark show that our approach outperforms the state-of-the-art by
around 25% and 30% AP on the tasks of 3D localization and 3D detection. In
addition, for 2D detection, our approach obtains 10.3% higher AP than the
state-of-the-art on the hard data among the LIDAR-based methods.Comment: To appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
- …