DDD17: End-To-End DAVIS Driving Dataset
Event cameras, such as the dynamic vision sensor (DVS) and the dynamic and
active-pixel vision sensor (DAVIS), can supplement other autonomous driving
sensors by providing a concurrent stream of standard active pixel sensor (APS)
images and DVS temporal contrast events. The APS stream is a sequence of
standard grayscale global-shutter image sensor frames. The DVS events represent
brightness changes occurring at a particular moment, with a jitter of about a
millisecond under most lighting conditions. They have a dynamic range of >120
dB and effective frame rates >1 kHz at data rates comparable to 30 fps
(frames/second) image sensors. To overcome some of the limitations of current
image acquisition technology, we investigate in this work the use of the
combined DVS and APS streams in end-to-end driving applications. The dataset
DDD17 accompanying this paper is the first open dataset of annotated DAVIS
driving recordings. DDD17 has over 12 h of a 346x260 pixel DAVIS sensor
recording highway and city driving in daytime, evening, night, dry and wet
weather conditions, along with vehicle speed, GPS position, driver steering,
throttle, and brake captured from the car's on-board diagnostics interface. As
an example application, we performed a preliminary end-to-end learning study
using a convolutional neural network trained to predict the instantaneous
steering angle from DVS and APS visual data.
Comment: Presented at the ICML 2017 Workshop on Machine Learning for
Autonomous Vehicles
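To make the data flow concrete, the following minimal sketch (not the paper's pipeline; the event tuple layout, accumulation window, and network are assumptions) collects DVS events into a two-channel ON/OFF count frame at the DAVIS346 resolution and regresses a steering angle with a toy CNN:

```python
# Minimal sketch, not the DDD17 paper's method: accumulate DVS events into a
# fixed-duration frame and regress a steering angle with a small CNN.
# The event field layout (t, x, y, polarity) and the network are assumptions.
import numpy as np
import torch
import torch.nn as nn

W, H = 346, 260  # DAVIS346 resolution used in DDD17

def events_to_frame(events, t0, t1):
    """Accumulate events with timestamps in [t0, t1) into a 2-channel ON/OFF count image."""
    frame = np.zeros((2, H, W), dtype=np.float32)
    for t, x, y, p in events:
        if t0 <= t < t1:
            frame[1 if p > 0 else 0, y, x] += 1.0
    return frame

class SteeringNet(nn.Module):
    """Toy regressor from a 2-channel event frame to a single steering angle."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# Usage with synthetic events: (timestamp_us, x, y, polarity)
events = [(10, 100, 50, 1), (20, 101, 50, -1), (30, 200, 120, 1)]
frame = torch.from_numpy(events_to_frame(events, 0, 50_000)).unsqueeze(0)
steering = SteeringNet()(frame)  # predicted angle, arbitrary units here
```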
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
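As an illustration of the sensing principle the survey describes, the sketch below simulates the standard per-pixel event-generation model: an event is emitted whenever the log-brightness has changed by more than a contrast threshold C since the last event. The threshold value and the toy brightness signal are assumptions:

```python
# Sketch of the standard event-generation model for a single pixel: an event
# (t, sign) fires each time the log-brightness moves by more than C since the
# last event. C = 0.2 and the signal below are illustrative assumptions.
import numpy as np

def simulate_pixel_events(times, log_intensity, C=0.2):
    """Return (t, sign) events for one pixel from a sampled log-intensity trace."""
    events = []
    ref = log_intensity[0]                 # log-brightness at the last event
    for t, L in zip(times, log_intensity):
        while L - ref >= C:                # positive (ON) events
            ref += C
            events.append((t, +1))
        while ref - L >= C:                # negative (OFF) events
            ref -= C
            events.append((t, -1))
    return events

t = np.linspace(0.0, 1.0, 1000)
L = np.log(1.0 + 0.5 * np.sin(2 * np.pi * 3 * t) + 0.6)  # toy brightness signal
print(simulate_pixel_events(t, L)[:5])
```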
PCA-RECT: An Energy-efficient Object Detection Approach for Event Cameras
We present the first purely event-based, energy-efficient approach for object
detection and categorization using an event camera. Compared to traditional
frame-based cameras, choosing event cameras results in high temporal resolution
(order of microseconds), low power consumption (few hundred mW) and wide
dynamic range (120 dB) as attractive properties. However, event-based object
recognition systems are far behind their frame-based counterparts in terms of
accuracy. To this end, this paper presents an event-based feature extraction
method devised by accumulating local activity across the image frame and then
applying principal component analysis (PCA) to the normalized neighborhood
region. Subsequently, we propose a backtracking-free k-d tree mechanism for
efficient feature matching by taking advantage of the low-dimensionality of the
feature representation. Additionally, the proposed k-d tree mechanism allows
for feature selection to obtain a lower-dimensional dictionary representation
when hardware resources are limited to implement dimensionality reduction.
Consequently, the proposed system can be realized on a field-programmable gate
array (FPGA) device leading to high performance over resource ratio. The
proposed system is tested on real-world event-based datasets for object
categorization, showing superior classification performance relative to
state-of-the-art algorithms. Additionally, we verified the object detection
method and real-time FPGA performance in lab settings under non-controlled
illumination conditions with limited training data and ground truth
annotations.
Comment: Accepted in ACCV 2018 Workshops, to appear
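The following sketch illustrates the general idea in software, under assumed patch size, stride, and dictionary dimensionality; it uses ordinary PCA and SciPy's standard k-d tree rather than the paper's backtracking-free variant or its FPGA mapping:

```python
# Illustrative sketch only: PCA on normalized local event-count patches,
# followed by nearest-neighbour matching with a standard k-d tree.
# Patch size, stride, and k are assumptions, not the paper's parameters.
import numpy as np
from scipy.spatial import cKDTree

def extract_patches(count_img, size=9, stride=4):
    """Slide a window over an event-count image and return normalized, flattened patches."""
    H, W = count_img.shape
    patches = []
    for y in range(0, H - size, stride):
        for x in range(0, W - size, stride):
            p = count_img[y:y + size, x:x + size].astype(np.float32).ravel()
            n = np.linalg.norm(p)
            if n > 0:
                patches.append(p / n)      # normalise local activity
    return np.array(patches)

def pca_project(patches, k=8):
    """Project patches onto their top-k principal components."""
    mean = patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    return (patches - mean) @ Vt[:k].T, (mean, Vt[:k])

# Build a dictionary from one event-count image and match features from another.
rng = np.random.default_rng(0)
train_img, test_img = rng.poisson(1.0, (260, 346)), rng.poisson(1.0, (260, 346))
dict_feats, (mean, basis) = pca_project(extract_patches(train_img))
tree = cKDTree(dict_feats)
test_feats = (extract_patches(test_img) - mean) @ basis.T
dists, idx = tree.query(test_feats, k=1)   # nearest dictionary entry per feature
```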
Neuromorphic Event-based Action Recognition
An action can be viewed as spike trains or streams of events when observed and captured by neuromorphic imaging hardware such as the iniLabs DVS128. These streams are unique to each action, enabling them to be used to form descriptors. This paper describes an approach for detecting specific actions based on space-time template matching, by forming such descriptors and using them as comparative tools. The developed approach is used to detect symbols from the popular RoShambo (rock, paper and scissors) game. The results demonstrate that the developed approach can correctly detect the motions involved in producing RoShambo symbols.
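A minimal sketch of this idea, with assumed bin sizes and an assumed Euclidean distance rather than the paper's exact descriptor and matching rule:

```python
# Hedged sketch of space-time template matching on event streams: each action
# is summarised as a coarse (x, y, t) event histogram, and a query stream is
# labelled by its nearest template. Bin sizes and the distance are assumptions.
import numpy as np

def spacetime_descriptor(events, sensor=(128, 128), bins=(8, 8, 4), duration=1.0):
    """events: iterable of (t, x, y); returns a normalised 3D space-time histogram."""
    hist = np.zeros(bins, dtype=np.float32)
    for t, x, y in events:
        xi = min(int(x / sensor[0] * bins[0]), bins[0] - 1)
        yi = min(int(y / sensor[1] * bins[1]), bins[1] - 1)
        ti = min(int(t / duration * bins[2]), bins[2] - 1)
        hist[xi, yi, ti] += 1.0
    return hist / max(hist.sum(), 1.0)

def classify(query_events, templates):
    """templates: dict label -> descriptor; returns the label of the closest template."""
    q = spacetime_descriptor(query_events)
    return min(templates, key=lambda lbl: np.linalg.norm(q - templates[lbl]))
```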
TimeLens: Event-based Video Frame Interpolation
State-of-the-art frame interpolation methods generate intermediate frames by
inferring object motions in the image from consecutive key-frames. In the
absence of additional information, first-order approximations, i.e. optical
flow, must be used, but this choice restricts the types of motions that can be
modeled, leading to errors in highly dynamic scenarios. Event cameras are novel
sensors that address this limitation by providing auxiliary visual information
in the blind-time between frames. They asynchronously measure per-pixel
brightness changes and do this with high temporal resolution and low latency.
Event-based frame interpolation methods typically adopt a synthesis-based
approach, where predicted frame residuals are directly applied to the
key-frames. However, while these approaches can capture non-linear motions they
suffer from ghosting and perform poorly in low-texture regions with few events.
Thus, synthesis-based and flow-based approaches are complementary. In this
work, we introduce Time Lens, a novel method that
leverages the advantages of both. We extensively evaluate our method on three
synthetic and two real benchmarks where we show an up to 5.21 dB improvement in
terms of PSNR over state-of-the-art frame-based and event-based methods.
Finally, we release a new large-scale dataset in highly dynamic scenarios,
aimed at pushing the limits of existing methods.
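The complementarity argument can be summarized in a conceptual sketch (explicitly not the Time Lens architecture): a flow-warped candidate frame and an event-synthesized candidate frame are blended with a per-pixel weight, which in a real system would come from a learned attention map. All tensor names and the blending rule are assumptions:

```python
# Conceptual sketch only, not the Time Lens method: per-pixel blending of a
# flow-warped key-frame (good in textured, smoothly moving regions) with an
# event-based synthesis result (good for non-linear motion and few-texture
# failure cases of warping).
import torch

def fuse_interpolation(warped, synthesized, weight):
    """warped, synthesized: (B, 3, H, W) candidate frames; weight in [0, 1] per pixel."""
    return weight * warped + (1.0 - weight) * synthesized

B, H, W = 1, 64, 64
warped = torch.rand(B, 3, H, W)   # stand-in for a flow-warped frame
synth = torch.rand(B, 3, H, W)    # stand-in for an event-synthesized frame
weight = torch.rand(B, 1, H, W)   # would come from a learned attention map
frame = fuse_interpolation(warped, synth, weight)
```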
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Deep learning is a cutting-edge methodology being applied to many fields. For
vision applications, convolutional neural networks (CNNs) deliver high accuracy
on classification tasks. Numerous hardware accelerators have appeared in recent
years to improve on CPU- or GPU-based solutions. This technology is commonly
prototyped and tested on FPGAs before being considered for ASIC fabrication for
mass production. The use of typical commercial cameras (30 fps) limits the
capabilities of these systems for high-speed applications. Dynamic vision
sensors (DVS), which emulate the behavior of a biological retina, are gaining
importance for such applications: their information is represented as a
continuous stream of spikes, and the frames to be processed by the CNN are
constructed by collecting a fixed number of these spikes (called events). The
faster an object moves, the more events the DVS produces and the higher the
equivalent frame rate. Using a DVS therefore allows frames to be computed at
the maximum speed a CNN accelerator can offer. In this paper we present a
VHDL/HLS description of a pipelined FPGA design that collects events from an
Address-Event-Representation (AER) DVS retina and produces a normalized
histogram for a particular CNN accelerator, called NullHop. VHDL is used to
describe the circuit, and HLS for the computation blocks that perform the frame
normalization needed by the CNN. The results outperform previous
implementations of frame collection and normalization on ARM processors running
at 800 MHz on a Zynq7100 in both latency and power consumption. A measured 67%
speedup is reported for a real-time RoShambo CNN experiment running at a
160 fps peak rate.
Comment: 7 pages
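A software sketch of the frame-construction step that the paper implements in hardware: collect a fixed number of AER events into a per-pixel count histogram and normalize it before handing it to the CNN accelerator. The event count and the 0–255 normalization scheme below are assumptions:

```python
# Software sketch of fixed-event-count frame construction for a DVS/AER stream.
# The hardware pipeline does this in VHDL/HLS; here the normalisation to the
# 0..255 range is an assumed choice, not the paper's exact scheme.
import numpy as np

def events_to_normalized_histogram(events, n_events=2048, shape=(128, 128)):
    """events: iterable of (x, y) addresses; uses the first n_events of them."""
    hist = np.zeros(shape, dtype=np.float32)
    for i, (x, y) in enumerate(events):
        if i >= n_events:
            break
        hist[y, x] += 1.0
    peak = hist.max()
    if peak == 0:
        return hist.astype(np.uint8)
    return (hist / peak * 255.0).astype(np.uint8)  # normalised frame for the CNN
```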
ColibriUAV: An Ultra-Fast, Energy-Efficient Neuromorphic Edge Processing UAV-Platform with Event-Based and Frame-Based Cameras
The interest in dynamic vision sensor (DVS)-powered unmanned aerial vehicles
(UAVs) is rising, especially due to the microsecond-level reaction time of the
bio-inspired event sensor, which increases robustness and reduces the latency
of perception tasks compared to an RGB camera. This work presents ColibriUAV, a
UAV platform with both frame-based and event-based camera interfaces for
efficient perception and near-sensor processing. The proposed platform is
designed around Kraken, a novel low-power RISC-V System on Chip with two
hardware accelerators targeting spiking neural networks and deep ternary neural
networks. Kraken is capable of efficiently processing both event data from a
DVS camera and frame data from an RGB camera. A key feature of Kraken is its
integrated, dedicated interface with a DVS camera. This paper benchmarks the
end-to-end latency and power efficiency of the neuromorphic and event-based UAV
subsystem, demonstrating a state-of-the-art event-data throughput of 7200
frames of events per second at a power consumption of 10.7 mW, which is over
6.6 times faster and a hundred times less power-consuming than the widely used
approach of reading data through the USB interface. The overall sensing and
processing power consumption is below 50 mW, achieving latency in the
milliseconds range and making the platform suitable for low-latency autonomous
nano-drones as well.
DDD20 End-to-End Event Camera Driving Dataset: Fusing Frames and Events with Deep Learning for Improved Steering Prediction
Neuromorphic event cameras are useful for dynamic vision problems under
difficult lighting conditions. To enable studies of using event cameras in
automobile driving applications, this paper reports a new end-to-end driving
dataset called DDD20. The dataset was captured with a DAVIS camera that
concurrently streams both dynamic vision sensor (DVS) brightness change events
and active pixel sensor (APS) intensity frames. DDD20 is the longest event
camera end-to-end driving dataset to date with 51h of DAVIS event+frame camera
and vehicle human control data collected from 4000km of highway and urban
driving under a variety of lighting conditions. Using DDD20, we report the
first study of fusing brightness change events and intensity frame data using a
deep learning approach to predict the instantaneous human steering wheel angle.
Over all day and night conditions, the explained variance for human steering
prediction from a ResNet-32 is significantly better using the fused DVS+APS
frames (0.88) than using either DVS (0.67) or APS (0.77) data alone.
Comment: Accepted at the 23rd IEEE International Conference on Intelligent
Transportation Systems (Special Session: Beyond Traditional Sensing for
Intelligent Transportation)
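The explained-variance metric behind those numbers can be computed as below; the prediction arrays here are synthetic noise, used only to show how the metric behaves, not to reproduce the paper's results:

```python
# Sketch of the explained-variance metric used to compare DVS, APS and fused
# DVS+APS steering predictors. All data below is synthetic and illustrative.
import numpy as np

def explained_variance(y_true, y_pred):
    """1 - Var(residual) / Var(target); 1.0 is perfect, 0.0 is no better than the mean."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

rng = np.random.default_rng(0)
steering = rng.normal(0.0, 10.0, 10_000)               # toy ground-truth wheel angles
pred_good = steering + rng.normal(0.0, 3.5, 10_000)    # lower-noise predictor
pred_weak = steering + rng.normal(0.0, 4.8, 10_000)    # higher-noise predictor
print(explained_variance(steering, pred_good), explained_variance(steering, pred_weak))
```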