Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz), resulting in
reduced motion blur. Hence, event cameras have large potential for robotics
and computer vision in scenarios that are challenging for traditional cameras,
such as those requiring low latency, high speed, or high dynamic range.
However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle to the actual sensors that
are available and the tasks they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
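To make the event data model described above concrete, here is a minimal Python/NumPy sketch (the record layout and the accumulate_events helper are illustrative, not from the survey): each event carries a timestamp, a pixel location, and a polarity, and summing polarities per pixel over a time window gives one of the simplest image-like views of the stream.

    import numpy as np

    # One record per event: timestamp t (microseconds), pixel coordinates
    # x/y, and polarity p (the +1/-1 sign of the brightness change).
    event_dtype = np.dtype([("t", np.int64), ("x", np.int16),
                            ("y", np.int16), ("p", np.int8)])

    def accumulate_events(events, height, width, t_start, t_end):
        """Sum event polarities per pixel over the window [t_start, t_end)."""
        frame = np.zeros((height, width), dtype=np.float32)
        sel = events[(events["t"] >= t_start) & (events["t"] < t_end)]
        # np.add.at accumulates correctly even when pixel coordinates repeat.
        np.add.at(frame, (sel["y"], sel["x"]), sel["p"].astype(np.float32))
        return frame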
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades
Creating datasets for Neuromorphic Vision is a challenging task. A lack of
available recordings from Neuromorphic Vision sensors means that data must
typically be recorded specifically for dataset creation rather than collected
and labelled from existing data. The task is further complicated by the desire to
simultaneously provide traditional frame-based recordings to allow for direct
comparison with traditional Computer Vision algorithms. Here we propose a
method for converting existing Computer Vision static image datasets into
Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving
the sensor rather than the scene or image is a more biologically realistic
approach to sensing, and it eliminates the timing artifacts introduced by
screen refresh when motion is instead simulated on a monitor. We present conversions of
two popular image datasets (MNIST and Caltech101) which have played important
roles in the development of Computer Vision, and we provide performance metrics
on these datasets using spike-based recognition algorithms. This work
contributes datasets for future use in the field, as well as results from
spike-based algorithms against which future works can compare. Furthermore, by
converting datasets already popular in Computer Vision, we enable more direct
comparison with frame-based approaches.
Comment: 10 pages, 6 figures; in Frontiers in Neuromorphic Engineering, special topic on Benchmarks and Challenges for Neuromorphic Engineering, 2015 (under review)
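As an illustration of consuming such converted recordings, the Python sketch below decodes the 40-bit-per-event binary layout (x address, y address, one polarity bit, 23-bit microsecond timestamp) in which the released N-MNIST and N-Caltech101 files are commonly distributed; this layout is an assumption to verify against the copy you download, and read_atis_events is an illustrative name.

    import numpy as np

    def read_atis_events(path):
        """Decode 5-byte events: x | y | polarity bit + 23-bit timestamp."""
        raw = np.fromfile(path, dtype=np.uint8).reshape(-1, 5).astype(np.uint32)
        x = raw[:, 0]                              # column address
        y = raw[:, 1]                              # row address
        p = (raw[:, 2] >> 7) & 1                   # polarity: 0 = OFF, 1 = ON
        t = ((raw[:, 2] & 0x7F) << 16) | (raw[:, 3] << 8) | raw[:, 4]
        return x, y, p, t                          # timestamps in microseconds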
Optimizing the energy consumption of spiking neural networks for neuromorphic applications
In the last few years, spiking neural networks have been demonstrated to
perform on par with regular convolutional neural networks. Several works have
proposed methods to convert a pre-trained CNN to a Spiking CNN without a
significant sacrifice of performance. We first demonstrate that
quantization-aware training of CNNs leads to better accuracy in the converted SNNs. One of
the benefits of converting CNNs to spiking CNNs is to leverage the sparse
computation of SNNs and consequently perform equivalent computation at a lower
energy consumption. Here we propose an efficient optimization strategy to train
spiking networks at lower energy consumption, while maintaining similar
accuracy levels. We demonstrate results on the MNIST-DVS and CIFAR-10 datasets.
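The paper's exact optimization strategy is not reproduced here, but one common way to formalize the accuracy-versus-energy trade-off it targets is to penalize spike counts during training, since synaptic operations, and hence energy on neuromorphic hardware, scale with the number of spikes. A hedged PyTorch sketch (energy_aware_loss and the assumed per-layer spike outputs are illustrative):

    import torch.nn.functional as F

    def energy_aware_loss(logits, targets, spike_tensors, penalty=1e-4):
        """Cross-entropy plus a spike-count penalty as a proxy for energy."""
        task_loss = F.cross_entropy(logits, targets)
        # Total spikes per sample, summed over layers and time steps:
        # fewer spikes means fewer synaptic operations, hence less energy.
        spike_cost = sum(s.sum() for s in spike_tensors) / logits.shape[0]
        return task_loss + penalty * spike_cost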
End-to-End Learning of Representations for Asynchronous Event-Based Data
Event cameras are vision sensors that record asynchronous streams of
per-pixel brightness changes, referred to as "events". They have appealing
advantages over frame-based cameras for computer vision, including high
temporal resolution, high dynamic range, and no motion blur. Due to the sparse,
non-uniform spatiotemporal layout of the event signal, pattern recognition
algorithms typically aggregate events into a grid-based representation and
subsequently process it by a standard vision pipeline, e.g., Convolutional
Neural Network (CNN). In this work, we introduce a general framework to convert
event streams into grid-based representations through a sequence of
differentiable operations. Our framework comes with two main advantages: (i)
it allows the input event representation to be learned together with the
task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the
majority of extant event representations in the literature and identifies novel
ones. Empirically, we show that our approach to learning the event
representation end-to-end yields an improvement of approximately 12% on optical
flow estimation and object recognition over state-of-the-art methods.
Comment: To appear at ICCV 2019
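One concrete member of the grid-based family discussed above is a temporal voxel grid in which each event spreads its polarity over the two nearest time bins with a triangular kernel. The PyTorch sketch below (events_to_voxel_grid is an illustrative name, not the paper's code) implements that fixed-kernel conversion; in the paper's framework the kernel itself becomes a learned, differentiable operation.

    import torch

    def events_to_voxel_grid(t, x, y, p, num_bins, height, width):
        """t, x, y, p: 1-D event tensors; returns (num_bins, height, width)."""
        grid = torch.zeros(num_bins * height * width)
        # Normalize timestamps onto the bin axis [0, num_bins - 1].
        t_norm = (t - t.min()) / (t.max() - t.min() + 1e-9) * (num_bins - 1)
        left = t_norm.floor().long()
        for offset in (0, 1):                      # two nearest bins per event
            bin_idx = left + offset
            # Triangular kernel: weight falls off linearly with distance.
            weight = (1.0 - (t_norm - bin_idx.float()).abs()).clamp(min=0)
            valid = (bin_idx >= 0) & (bin_idx < num_bins)
            flat = (bin_idx * height * width + y.long() * width + x.long())[valid]
            grid.index_add_(0, flat, (p.float() * weight)[valid])
        return grid.view(num_bins, height, width)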