
    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle to the actual sensors that are available and the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
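
    The survey describes an event as a tuple of pixel location, timestamp, and brightness-change sign, emitted whenever the per-pixel (log-)brightness change exceeds a contrast threshold. A minimal sketch of that idealized event model in Python (the function name, Event type, and threshold value are illustrative, not taken from the survey):

    ```python
    from dataclasses import dataclass

    import numpy as np


    @dataclass
    class Event:
        """A single event: pixel location, timestamp (seconds), polarity (+1/-1)."""
        x: int
        y: int
        t: float
        p: int


    def events_from_log_frames(log_prev, log_curr, t, threshold=0.2):
        """Idealized per-pixel event generation: emit an event wherever the
        log-brightness change since the previous reference exceeds the contrast
        threshold. The threshold value is illustrative, not a sensor spec."""
        diff = log_curr - log_prev
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        return [Event(int(x), int(y), t, int(np.sign(diff[y, x])))
                for y, x in zip(ys, xs)]
    ```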

    End-to-End Learning of Representations for Asynchronous Event-Based Data

    Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods. (To appear at ICCV 2019.)
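
    One fixed-kernel member of the grid-based family the paper unifies is the voxel grid with a triangular (bilinear) temporal kernel; the end-to-end approach would replace such a hand-crafted kernel with a learned one. A minimal sketch, with illustrative names, not the paper's implementation:

    ```python
    import numpy as np


    def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
        """Accumulate events into a (num_bins, H, W) grid using a triangular
        (bilinear) temporal kernel, so each event's polarity is shared between
        the two nearest time bins."""
        grid = np.zeros((num_bins, height, width), dtype=np.float32)
        # Normalize timestamps to [0, num_bins - 1].
        t_norm = (ts - ts.min()) / max(ts.max() - ts.min(), 1e-9) * (num_bins - 1)
        bin_lo = np.floor(t_norm).astype(int)
        frac = t_norm - bin_lo
        for b, f, x, y, p in zip(bin_lo, frac, xs, ys, ps):
            grid[b, y, x] += p * (1.0 - f)
            if b + 1 < num_bins:
                grid[b + 1, y, x] += p * f
        return grid
    ```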

    Evaluation of deep learning-based classification and object detection algorithms for event cameras

    The main objective of this Master's Thesis is to analyze the effect of introducing noise into the event signals used in deep-learning-based computer vision applications. Specifically, we focus on a deep-learning application that can solve two types of tasks: classification and object detection. For this, we use event signals captured with event cameras. Event cameras are a new type of camera that appeared a few years ago and whose main characteristic is that they are inspired by the functioning of the human eye. Event cameras have an array of intelligent pixels that detect changes in intensity; when the change at a pixel exceeds a certain threshold, an event is generated. Compared to traditional cameras, event cameras offer lower latency and a higher dynamic range, and thus avoid motion blur and pixel saturation when the difference between the maximum and minimum brightness levels is very large. Due to the novelty of these cameras, few units are available on the market, which leads to a shortage of datasets of this type of signal and directly hinders the development of computer vision applications, especially those based on deep learning. To address this, there are simulators capable of generating an event signal from an RGB signal, providing a tool to transform datasets captured with traditional cameras into event-signal datasets. There is currently no way to measure the distortion that these simulators introduce, so VPULab is designing a set of metrics capable of measuring this distortion. To verify whether these metrics work correctly, it is necessary to measure their correlation with the results of computer vision tasks. In this work, we evaluate how introducing noise into a total of four event-signal datasets affects the performance of object classification and detection. We work with four types of noise and, across the different experiments, show that both tasks behave similarly when noise is introduced and that spatial information is the most relevant in both cases.
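
    As one concrete illustration of the kind of perturbation the thesis studies, a simple "background activity" noise injection can be sketched as follows; the structured-array field layout, rate parameterization, and function name are assumptions, and the thesis's actual noise models are not reproduced here:

    ```python
    import numpy as np


    def add_background_noise(events, height, width, noise_rate_hz, duration_s, seed=0):
        """Inject uniformly distributed noise events into a recording.

        `events` is assumed to be a structured array with fields x, y, t, p;
        `noise_rate_hz` is the total noise rate over the whole sensor."""
        rng = np.random.default_rng(seed)
        n_noise = int(noise_rate_hz * duration_s)
        noise = np.zeros(n_noise, dtype=events.dtype)
        noise["x"] = rng.integers(0, width, n_noise)
        noise["y"] = rng.integers(0, height, n_noise)
        noise["t"] = rng.uniform(0.0, duration_s, n_noise)
        noise["p"] = rng.choice([-1, 1], n_noise)
        merged = np.concatenate([events, noise])
        return merged[np.argsort(merged["t"])]  # keep the stream time-ordered
    ```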

    Asynchronous spiking neurons, the natural key to exploit temporal sparsity

    Inference of deep neural networks on stream signals (video/audio) in edge devices is still challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, our brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain, asynchronous stateful processing, is the key to its excellence in this domain. In this work, we show how asynchronous processing with stateful neurons allows exploitation of the sparsity that exists in natural signals. This paper explains three different types of sparsity and proposes an inference algorithm that exploits all of them in the execution of already-trained networks. Our experiments in three different applications (handwritten digit recognition, autonomous steering, and hand-gesture recognition) show that this model of inference reduces the number of required operations for sparse input data by one to two orders of magnitude. Additionally, due to its fully asynchronous processing, this type of inference can run on fully distributed and scalable neuromorphic hardware platforms.
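
    The idea of stateful, change-driven processing can be sketched with a toy "delta" layer that only recomputes contributions from inputs whose values changed beyond a threshold; this illustrates exploiting temporal sparsity, not the paper's actual inference algorithm:

    ```python
    import numpy as np


    class DeltaLayer:
        """Stateful linear layer that only processes input *changes*."""

        def __init__(self, weight, threshold=1e-3):
            self.weight = weight                      # (out_dim, in_dim)
            self.threshold = threshold                # propagate a delta only above this
            self.last_input = np.zeros(weight.shape[1])
            self.output = np.zeros(weight.shape[0])   # persistent neuron state

        def __call__(self, x):
            delta = x - self.last_input
            active = np.abs(delta) > self.threshold   # which inputs actually changed
            if np.any(active):
                # Cost scales with the number of changed inputs, not the layer size.
                self.output += self.weight[:, active] @ delta[active]
                self.last_input[active] = x[active]
            return self.output
    ```

    For a slowly varying input stream, most entries of `active` are False at each step, so the work per update shrinks with the temporal sparsity of the signal.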

    NU-AIR -- A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles

    This paper presents an open-source aerial neuromorphic dataset that captures pedestrians and vehicles moving in an urban environment. The dataset, titled NU-AIR, features 70.75 minutes of event footage acquired with a 640 x 480 resolution neuromorphic sensor mounted on a quadrotor operating in an urban environment. Crowds of pedestrians, different types of vehicles, and busy street scenes are captured at different elevations and illumination conditions. Manual bounding-box annotations of the vehicles and pedestrians contained in the recordings are provided at a frequency of 30 Hz, yielding 93,204 labels in total. The dataset's fidelity is evaluated through a comprehensive ablation study on three Spiking Neural Networks (SNNs) and by training ten Deep Neural Networks (DNNs), validating the quality and reliability of both the dataset and the corresponding annotations. All data, together with Python code to voxelize the data and subsequently train SNNs/DNNs, has been open-sourced. (20 pages, 5 figures.)
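
    A typical way to pair such an event stream with 30 Hz annotations is to slice the events into fixed windows matching the label rate before voxelization; a minimal sketch (the dataset's actual, open-sourced loading code may differ):

    ```python
    import numpy as np


    def slice_events_to_label_windows(ts, label_rate_hz=30.0):
        """Group event indices into fixed-duration windows matching the label
        rate, so each window can be voxelized and paired with the bounding
        boxes annotated at that time step."""
        window = 1.0 / label_rate_hz
        bins = ((ts - ts[0]) / window).astype(int)      # window index per event
        n_windows = bins.max() + 1
        return [np.flatnonzero(bins == w) for w in range(n_windows)]
    ```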

    Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras

    This paper presents a novel method for labeling real-world neuromorphic camera sensor data by calculating the likelihood of generating an event at each pixel within a short time window, which we refer to as the "event probability mask" (EPM). Its applications include (i) objective benchmarking of event-denoising performance, (ii) training a convolutional neural network for noise removal, called the "event denoising convolutional neural network" (EDnCNN), and (iii) estimating internal neuromorphic camera parameters. We provide the first dataset (DVSNOISE20) of real-world labeled neuromorphic camera events for noise removal. (Submitted to CVPR 2020.)
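
    A rough empirical proxy for a per-pixel event probability within a short time window can be computed by counting, for each pixel, the fraction of windows in which it fired; the paper's EPM is derived from a camera/noise model rather than this simple counting, so the sketch below is illustrative only:

    ```python
    import numpy as np


    def empirical_event_probability(xs, ys, ts, height, width, window_s=0.005):
        """For each pixel, the fraction of fixed-length time windows in which it
        produced at least one event (a counting proxy, not the paper's EPM)."""
        duration = ts.max() - ts.min()
        n_windows = max(int(np.ceil(duration / window_s)), 1)
        win = np.minimum(((ts - ts.min()) / window_s).astype(int), n_windows - 1)
        fired = np.zeros((n_windows, height, width), dtype=bool)
        fired[win, ys, xs] = True
        return fired.mean(axis=0)   # (H, W) probability of >= 1 event per window
    ```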