High Speed and High Dynamic Range Video with an Event Camera
Event cameras are novel sensors that report brightness changes in the form of
a stream of asynchronous "events" instead of intensity frames. They offer
significant advantages with respect to conventional cameras: high temporal
resolution, high dynamic range, and no motion blur. While the stream of events
encodes in principle the complete visual signal, the reconstruction of an
intensity image from a stream of events is an ill-posed problem in practice.
Existing reconstruction approaches are based on hand-crafted priors and strong
assumptions about the imaging process as well as the statistics of natural
images. In this work we propose to learn to reconstruct intensity images from
event streams directly from data instead of relying on any hand-crafted priors.
We propose a novel recurrent network to reconstruct videos from a stream of
events, and train it on a large amount of simulated event data. During training
we propose to use a perceptual loss to encourage reconstructions to follow
natural image statistics. We further extend our approach to synthesize color
images from color event streams. Our network surpasses state-of-the-art
reconstruction methods by a large margin in terms of image quality (> 20%),
while comfortably running in real-time. We show that the network is able to
synthesize high framerate videos (> 5,000 frames per second) of high-speed
phenomena (e.g. a bullet hitting an object) and is able to provide high dynamic
range reconstructions in challenging lighting conditions. We also demonstrate
the effectiveness of our reconstructions as an intermediate representation for
event data. We show that off-the-shelf computer vision algorithms can be
applied to our reconstructions for tasks such as object classification and
visual-inertial odometry and that this strategy consistently outperforms
algorithms that were specifically designed for event data.
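
To make the input side of such a network concrete, below is a minimal sketch of a spatio-temporal voxel grid, a common image-like event representation for recurrent reconstruction networks. The bin count, normalization, and function name are illustrative assumptions, not the paper's exact design:

    import numpy as np

    def events_to_voxel_grid(events, num_bins, H, W):
        # events: (N, 4) array of (t, x, y, polarity), sorted by t,
        # with polarity in {+1, -1}. Returns a (num_bins, H, W) tensor.
        grid = np.zeros((num_bins, H, W), dtype=np.float32)
        t = events[:, 0]
        # Rescale this window's timestamps to [0, num_bins - 1].
        t = (t - t[0]) / max(t[-1] - t[0], 1e-9) * (num_bins - 1)
        x = events[:, 1].astype(int)
        y = events[:, 2].astype(int)
        p = events[:, 3]
        lo = np.floor(t).astype(int)
        frac = t - lo
        # Split each event's polarity bilinearly over the two nearest bins,
        # so sub-bin event timing is preserved in the tensor.
        np.add.at(grid, (lo, y, x), (1.0 - frac) * p)
        np.add.at(grid, (np.clip(lo + 1, 0, num_bins - 1), y, x), frac * p)
        return grid

One such tensor would be fed to the recurrent network per reconstruction step, with the hidden state carrying information across steps.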
Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, thanks to the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
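
As a schematic illustration of this event-by-event update pattern (not the authors' actual Bayesian filter, and simplified from 6-DOF pose to pure image-plane translation), the toy tracker below nudges a velocity estimate so that the predicted brightness change of a known map explains each incoming event:

    import numpy as np

    def track_translation(events, I_map, C=0.2, lr=0.05):
        # Toy per-event tracker: estimate an image-plane velocity v such
        # that the predicted brightness change -grad(I) . v * dt explains
        # each event's polarity. A real tracker filters over the full SE(3)
        # pose against the photometric depth map, but the key property is
        # the same: the state is refreshed on every event, not every frame.
        gy, gx = np.gradient(I_map.astype(float))
        v = np.zeros(2)
        t_prev = {}
        for t, x, y, p in events:            # p in {+1, -1}
            dt = t - t_prev.get((x, y), t)
            t_prev[(x, y)] = t
            if dt <= 0:
                continue
            g = np.array([gx[y, x], gy[y, x]])
            pred = -(g @ v) * dt             # predicted log-brightness change
            resid = p * C - pred             # the event reports a +/-C change
            v -= lr * resid * g * dt         # gradient step on resid**2
        return v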
An Asynchronous Intensity Representation for Framed and Event Video Sources
Neuromorphic "event" cameras, designed to mimic the human vision system with
asynchronous sensing, unlock a new realm of high-speed and high dynamic range
applications. However, researchers often either revert to a framed
representation of event data for applications, or build bespoke applications
for a particular camera's event data type. To usher in the next era of video
systems, accommodate new event camera designs, and explore the benefits to
asynchronous video in classical applications, we argue that there is a need for
an asynchronous, source-agnostic video representation. In this paper, we
introduce a novel, asynchronous intensity representation for both framed and
non-framed data sources. We show that our representation can increase intensity
precision and greatly reduce the number of samples per pixel compared to
grid-based representations. With framed sources, we demonstrate that by
permitting a small amount of loss through the temporal averaging of similar
pixel values, we can reduce our representational sample rate by more than half,
while incurring a drop in VMAF quality score of only 4.5. We also demonstrate
lower latency than the state-of-the-art method for fusing and transcoding
framed and event camera data to an intensity representation, while maintaining
the temporal resolution. We argue that our method provides the
computational efficiency and temporal granularity necessary to build real-time
intensity-based applications for event cameras.
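
The temporal-averaging idea lends itself to a compact sketch. The toy function below (an illustration under assumed data layout, not the authors' codec) merges consecutive per-pixel samples that stay within a tolerance of the running mean; this is the mechanism that trades a small quality loss for a large reduction in sample rate:

    def merge_runs(samples, tol=0.02):
        # samples: per-pixel list of (t, intensity), time-ordered.
        # Returns (t_start, duration, mean_intensity) tuples, one per run
        # of samples whose values stay within tol of the running mean.
        out, run = [], None                  # run = [t0, t_last, sum, n]
        for t, v in samples:
            if run is not None and abs(v - run[2] / run[3]) <= tol:
                run[1], run[2], run[3] = t, run[2] + v, run[3] + 1
            else:
                if run is not None:
                    out.append((run[0], run[1] - run[0], run[2] / run[3]))
                run = [t, t, v, 1]
        if run is not None:
            out.append((run[0], run[1] - run[0], run[2] / run[3]))
        return out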
Semi-Dense 3D Reconstruction with a Stereo Event Camera
Event cameras are bio-inspired sensors that offer several advantages, such as
low latency, high-speed and high dynamic range, to tackle challenging scenarios
in computer vision. This paper presents a solution to the problem of 3D
reconstruction from data captured by a stereo event-camera rig moving in a
static scene, such as in the context of stereo Simultaneous Localization and
Mapping. The proposed method consists of the optimization of an energy function
designed to exploit small-baseline spatio-temporal consistency of events
triggered across both stereo image planes. To improve the density of the
reconstruction and to reduce the uncertainty of the estimation, a probabilistic
depth-fusion strategy is also developed. The resulting method has no special
requirements on either the motion of the stereo event-camera rig or on prior
knowledge about the scene. Experiments demonstrate our method can deal with
both texture-rich scenes as well as sparse scenes, outperforming
state-of-the-art stereo methods based on event data image representations.
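
One plausible instantiation of the spatio-temporal consistency being exploited (a hedged sketch; the paper's actual energy function and fusion strategy differ) is to build exponentially decaying time surfaces for both cameras and match patches along the horizontal epipolar line:

    import numpy as np

    def time_surface(events, shape, t_ref, tau=0.03):
        # Exponentially decaying time surface: each pixel holds
        # exp(-(t_ref - t_last) / tau), an image-like view of recent
        # event activity. events: iterable of (t, x, y, polarity).
        last = np.full(shape, -np.inf)
        for t, x, y, _ in events:
            last[y, x] = t
        S = np.zeros(shape)
        m = np.isfinite(last)
        S[m] = np.exp(-(t_ref - last[m]) / tau)
        return S

    def best_disparity(L, R, x, y, patch=5, d_max=40):
        # Match the left patch against right patches along the epipolar
        # line; events triggered by the same scene point fire with
        # consistent timing, so the best match minimizes the
        # time-surface difference. Assumes (x, y) lies in the interior.
        h = patch // 2
        assert h <= y < L.shape[0] - h and h <= x < L.shape[1] - h
        ref = L[y - h:y + h + 1, x - h:x + h + 1]
        costs = [np.sum((ref - R[y - h:y + h + 1, x - d - h:x - d + h + 1])**2)
                 for d in range(min(d_max, x - h) + 1)]
        return int(np.argmin(costs))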
EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras
Event-based cameras have shown great promise in a variety of situations where
frame based cameras suffer, such as high speed motions and high dynamic range
scenes. However, developing algorithms for event measurements requires a new
class of hand crafted algorithms. Deep learning has shown great success in
providing model free solutions to many problems in the vision community, but
existing networks have been developed with frame based images in mind, and
there does not exist the wealth of labeled data for events as there does for
images for supervised training. To these points, we present EV-FlowNet, a novel
self-supervised deep learning pipeline for optical flow estimation for event
based cameras. In particular, we introduce an image based representation of a
given event stream, which is fed into a self-supervised neural network as the
sole input. The corresponding grayscale images captured from the same camera at
the same time as the events are then used as a supervisory signal to provide a
loss function at training time, given the estimated flow from the network. We
show that the resulting network is able to accurately predict optical flow from
events only in a variety of different scenes, with performance competitive to
image based networks. This method not only allows for accurate estimation of
dense optical flow, but also provides a framework for the transfer of other
self-supervised methods to the event-based domain. (Accompanying video: https://youtu.be/eMHZBSoq0sE; dataset: https://daniilidis-group.github.io/mvsec/; published at Robotics: Science and Systems 2018.)
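
The image-based representation the abstract refers to can be sketched as a four-channel tensor of per-polarity event counts and most-recent timestamps, as described in the EV-FlowNet paper; the normalization detail below is an assumption:

    import numpy as np

    def event_image(events, H, W):
        # Four channels: counts of positive and negative events, plus the
        # most recent timestamp per polarity.
        # events: iterable of (t, x, y, p), p in {+1, -1}.
        img = np.zeros((4, H, W), dtype=np.float32)
        for t, x, y, p in events:
            c = 0 if p > 0 else 1
            img[c, int(y), int(x)] += 1.0        # per-polarity counts
            img[2 + c, int(y), int(x)] = t       # latest timestamp
        # Normalize timestamps to [0, 1] over the window (assumed detail).
        if img[2:].max() > 0:
            img[2:] /= img[2:].max()
        return img

This tensor is the sole network input; the loss comes from photometrically warping the accompanying grayscale frames with the predicted flow, so no flow labels are ever needed.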
CED: Color Event Camera Dataset
Event cameras are novel, bio-inspired visual sensors, whose pixels output
asynchronous and independent timestamped spikes at local intensity changes,
called 'events'. Event cameras offer advantages over conventional frame-based
cameras in terms of latency, high dynamic range (HDR) and temporal resolution.
Until recently, event cameras have been limited to outputting events in the
intensity channel, however, recent advances have resulted in the development of
color event cameras, such as the Color-DAVIS346. In this work, we present and
release the first Color Event Camera Dataset (CED), containing 50 minutes of
footage with both color frames and events. CED features a wide variety of
indoor and outdoor scenes, which we hope will help drive forward event-based
vision research. We also present an extension of the event camera simulator
ESIM that enables simulation of color events. Finally, we present an evaluation
of three state-of-the-art image reconstruction methods that can be used to
convert the Color-DAVIS346 into a continuous-time, HDR, color video camera to
visualise the event stream, and for use in downstream vision applications. (Presented at the Conference on Computer Vision and Pattern Recognition Workshops.)
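
The color extension of event simulation can be illustrated with an idealized per-channel simulator (a sketch only; the real ESIM adaptively interpolates between rendered frames rather than thresholding at fixed frame times):

    import numpy as np

    def simulate_channel_events(frames, timestamps, C=0.15, eps=1e-3):
        # frames: (N, H, W) array in [0, 1] for one color channel. Emit an
        # event whenever a pixel's log intensity moves by the contrast
        # threshold C away from its reference level; run once per channel
        # of the color video to obtain color events.
        ref = np.log(frames[0] + eps)
        events = []
        for k in range(1, len(frames)):
            logI = np.log(frames[k] + eps)
            diff = logI - ref
            ys, xs = np.nonzero(np.abs(diff) >= C)
            for y, x in zip(ys, xs):
                n = int(abs(diff[y, x]) // C)     # several events may fire
                pol = int(np.sign(diff[y, x]))
                events.extend((timestamps[k], x, y, pol) for _ in range(n))
                ref[y, x] += pol * n * C          # advance reference level
        return events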
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in scenarios that are challenging for traditional cameras,
such as those demanding low latency, high speed, and high dynamic range.
However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
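
As a worked example of the dynamic range figures quoted above, decibels convert to intensity ratios via DR_dB = 20 * log10(I_max / I_min):

    event_ratio = 10 ** (140 / 20)    # 10**7 : 1 for an event camera
    frame_ratio = 10 ** (60 / 20)     # 10**3 : 1 for a standard camera
    print(event_ratio / frame_ratio)  # ~10,000x larger usable intensity ratio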