Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in scenarios that are challenging for traditional cameras, such
as those demanding low latency, high speed, and high dynamic range. However, novel
methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
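To make the data model described above concrete, here is a minimal sketch (our own illustration, not from the survey; the structured-array layout and units are just one common convention) of the canonical event tuple and a naive accumulation of events into a signed frame:

```python
import numpy as np

# An event camera emits a stream of tuples (t, x, y, p): a timestamp t
# (here in microseconds), pixel coordinates (x, y), and a polarity
# p in {-1, +1} giving the sign of the brightness change.
events = np.array(
    [(10, 5, 3, 1), (12, 5, 3, -1), (25, 7, 1, 1)],
    dtype=[("t", "u8"), ("x", "u2"), ("y", "u2"), ("p", "i1")],
)

def accumulate(events, height, width):
    """Sum event polarities per pixel into a signed 'event frame'."""
    frame = np.zeros((height, width), dtype=np.int32)
    np.add.at(frame, (events["y"], events["x"]), events["p"])
    return frame

print(accumulate(events, height=8, width=8))
```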
End-to-End Learning of Representations for Asynchronous Event-Based Data
Event cameras are vision sensors that record asynchronous streams of
per-pixel brightness changes, referred to as "events". They have appealing
advantages over frame-based cameras for computer vision, including high
temporal resolution, high dynamic range, and no motion blur. Due to the sparse,
non-uniform spatiotemporal layout of the event signal, pattern recognition
algorithms typically aggregate events into a grid-based representation and
subsequently process it by a standard vision pipeline, e.g., Convolutional
Neural Network (CNN). In this work, we introduce a general framework to convert
event streams into grid-based representations through a sequence of
differentiable operations. Our framework comes with two main advantages: (i) it
allows learning the input event representation together with the task-dedicated
network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the
majority of extant event representations in the literature and identifies novel
ones. Empirically, we show that our approach to learning the event
representation end-to-end yields an improvement of approximately 12% on optical
flow estimation and object recognition over state-of-the-art methods.
Comment: To appear at ICCV 2019
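As a rough illustration of one member of the taxonomy this framework unifies, the sketch below (our own; the function name and shapes are assumptions) builds a voxel grid with a fixed triangular temporal kernel; the paper's contribution is to make such kernels learnable and differentiable end-to-end, so read this only as the non-learned baseline:

```python
import numpy as np

def voxel_grid(t, x, y, p, bins, height, width):
    """Accumulate events into a (bins, H, W) grid using a fixed
    triangular temporal kernel, so each event spreads its polarity
    over the two nearest time bins."""
    grid = np.zeros((bins, height, width), dtype=np.float32)
    # Normalize timestamps onto the bin axis [0, bins - 1].
    tn = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (bins - 1)
    for b in range(bins):
        w = np.maximum(0.0, 1.0 - np.abs(tn - b))  # triangular weight
        np.add.at(grid[b], (y, x), p * w)
    return grid
```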
Evaluation of deep learning-based classification and object detection algorithms for event cameras
The main objective of this Master's Thesis is to analyze the effect of introducing
noise into event signals used in deep-learning-based computer vision applications.
Specifically, we focus on a deep-learning application that solves two types of
tasks: classification and object detection.
For this, we use event signals captured with event cameras.
Event cameras are a new type of camera, introduced only a few years ago, whose
defining characteristic is that they are inspired by the functioning of the human eye.
They consist of an array of intelligent pixels that detect changes in intensity; when
a change exceeds a certain threshold, an event is generated.
Compared to traditional cameras, event cameras offer lower latency and a higher
dynamic range, and thus avoid motion blur and pixel saturation when the difference
between the maximum and minimum brightness in a scene is very large.
Because these cameras are so new, they are scarce on the market. This leads to a
shortage of datasets of this type of signal, which directly hinders the development
of computer vision applications, especially those based on deep learning. To address
this, a number of simulators exist that can generate an event signal from an RGB
signal, providing a tool for transforming datasets captured with traditional cameras
into event-signal datasets.
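For intuition, a toy two-frame simulator in the spirit of these tools might look as follows (the function name, the single-event-per-pixel simplification, and the midpoint timestamps are our own; real simulators such as ESIM interpolate intensity over time and can emit several events per pixel):

```python
import numpy as np

def frames_to_events(frame0, frame1, t0, t1, threshold=0.2):
    """Toy simulator: emit one event per pixel whose log-intensity
    change between two grayscale frames (floats in [0, 1]) exceeds
    the contrast threshold."""
    eps = 1e-3  # avoid log(0)
    dlog = np.log(frame1 + eps) - np.log(frame0 + eps)
    ys, xs = np.nonzero(np.abs(dlog) >= threshold)
    ts = np.full(len(xs), 0.5 * (t0 + t1))       # crude timestamps
    ps = np.sign(dlog[ys, xs]).astype(np.int8)   # event polarities
    return ts, xs, ys, ps
```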
There is currently no way to measure the distortion these simulators introduce, so
VPULab is designing a set of metrics capable of quantifying it. To verify whether
these metrics work correctly, their correlation with the results of computer vision
tasks must be measured.
In this work, we evaluate how introducing noise into four event-signal datasets
affects the performance of object classification and detection tasks. We work with
four types of noise and, across the experiments, observe that both tasks behave
similarly when noise is introduced and that spatial information is the most relevant
in both cases.
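The abstract does not specify the four noise models, so the sketch below is purely illustrative of one plausible choice: injecting spatially and temporally uniform background-activity events into a recording stored as a NumPy structured array with fields t/x/y/p (the format from the first sketch above):

```python
import numpy as np

def add_background_noise(events, rate_hz, duration_s, height, width, seed=0):
    """Inject uniform 'background activity' noise events and re-sort
    the merged stream by timestamp (microseconds)."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(rate_hz * duration_s)  # number of noise events
    noise = np.empty(n, dtype=events.dtype)
    noise["t"] = rng.uniform(0.0, duration_s * 1e6, n)
    noise["x"] = rng.integers(0, width, n)
    noise["y"] = rng.integers(0, height, n)
    noise["p"] = rng.choice(np.array([-1, 1], dtype=np.int8), n)
    merged = np.concatenate([events, noise])
    return merged[np.argsort(merged["t"], kind="stable")]
```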
Asynchronous spiking neurons, the natural key to exploit temporal sparsity
Inference of deep neural networks for stream-signal (video/audio) processing on edge devices is still challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, our brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain, asynchronous stateful processing, is the key to its excellence in this domain. In this work, we show how asynchronous processing with stateful neurons allows exploitation of the sparsity present in natural signals. This paper explains three different types of sparsity and proposes an inference algorithm that exploits all of them in the execution of already-trained networks. Our experiments in three different applications (handwritten digit recognition, autonomous steering, and hand-gesture recognition) show that this model of inference reduces the number of required operations for sparse input data by one to two orders of magnitude. Additionally, because the processing is fully asynchronous, this type of inference can run on fully distributed and scalable neuromorphic hardware platforms.
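A minimal sketch of the core idea, assuming a plain linear layer and a fixed change threshold (our own illustration, not the paper's exact algorithm): a stateful layer keeps a running pre-activation and updates it only for inputs that changed, so compute scales with input activity rather than input size.

```python
import numpy as np

class DeltaLayer:
    """Stateful linear layer sketch: keep the last input and a running
    pre-activation; on each step add only W[:, changed] @ delta[changed],
    so the work done tracks how much the input changed between steps."""
    def __init__(self, weight):
        self.weight = weight                   # (out_dim, in_dim)
        self.prev = np.zeros(weight.shape[1])  # last seen input
        self.acc = np.zeros(weight.shape[0])   # accumulated pre-activation

    def step(self, x, eps=1e-6):
        delta = x - self.prev
        active = np.abs(delta) > eps           # only changed inputs "fire"
        self.acc += self.weight[:, active] @ delta[active]
        self.prev = x.copy()
        return np.maximum(self.acc, 0.0)       # ReLU on the running state
```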
NU-AIR -- A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles
This paper presents an open-source aerial neuromorphic dataset that captures
pedestrians and vehicles moving in an urban environment. The dataset, titled
NU-AIR, features 70.75 minutes of event footage acquired with a 640 x 480
resolution neuromorphic sensor mounted on a quadrotor operating in an urban
environment. Crowds of pedestrians, different types of vehicles, and street
scenes featuring busy urban environments are captured at different elevations
and illumination conditions. Manual bounding box annotations of vehicles and
pedestrians contained in the recordings are provided at a frequency of 30 Hz,
yielding 93,204 labels in total. The dataset's fidelity is evaluated through a
comprehensive ablation study of three Spiking Neural Networks (SNNs) and by
training ten Deep Neural Networks (DNNs), validating the quality and reliability
of both the dataset and the corresponding annotations. All data, together with
the Python code to voxelize the data and subsequently train SNNs/DNNs, have been
open-sourced.
Comment: 20 pages, 5 figures
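As an illustration of how events might be aligned with the 30 Hz annotations before voxelization (the dataset's released code may do this differently; the function below is our own sketch), one can slice a time-sorted event stream into label-synchronous windows:

```python
import numpy as np

def label_aligned_windows(t_us, label_hz=30):
    """Slice a time-sorted event stream into windows synchronous with
    labels given at label_hz (30 Hz for NU-AIR), returning (start, end)
    index ranges; each window can then be voxelized independently."""
    period_us = 1e6 / label_hz
    window_id = ((t_us - t_us.min()) // period_us).astype(np.int64)
    n_windows = int(window_id.max()) + 1
    starts = np.searchsorted(window_id, np.arange(n_windows))
    ends = np.append(starts[1:], len(t_us))
    return list(zip(starts, ends))
```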
Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras
This paper presents a novel method for labeling real-world neuromorphic
camera sensor data by calculating the likelihood of generating an event at each
pixel within a short time window, which we refer to as "event probability mask"
or EPM. Its applications include (i) objective benchmarking of event denoising
performance, (ii) training convolutional neural networks for noise removal
called "event denoising convolutional neural network" (EDnCNN), and (iii)
estimating internal neuromorphic camera parameters. We provide the first
dataset (DVSNOISE20) of real-world labeled neuromorphic camera events for noise
removal.
Comment: submitted to CVPR 2020
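A toy sketch of the EPM idea (our own simplification: the per-pixel log-intensity change over the window is assumed given, whereas the paper estimates it from camera motion and intensity measurements, and the contrast threshold is modeled as Gaussian):

```python
import numpy as np
from scipy.stats import norm

def event_probability_mask(dlog, theta=0.2, sigma=0.05):
    """Toy EPM: per-pixel probability that the magnitude of the
    log-intensity change over a short window exceeds a contrast
    threshold modeled as N(theta, sigma^2)."""
    return norm.cdf((np.abs(dlog) - theta) / sigma)

def label_events(xs, ys, epm, p_signal=0.5):
    """Mark each event as likely signal (True) or noise (False) by
    thresholding the EPM at the event's pixel location."""
    return epm[ys, xs] >= p_signal
```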