Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing
Neuromorphic vision sensing (NVS) devices represent visual information as
sequences of asynchronous discrete events (a.k.a., "spikes") in response to
changes in scene reflectance. Unlike conventional active pixel sensing (APS),
NVS allows for significantly higher event sampling rates at substantially
increased energy efficiency and robustness to illumination changes. However,
feature representation for NVS is far behind its APS-based counterparts,
resulting in lower performance in high-level computer vision tasks. To fully
utilize its sparse and asynchronous nature, we propose a compact graph
representation for NVS, which allows for end-to-end learning with graph
convolution neural networks. We couple this with a novel end-to-end feature
learning framework that accommodates both appearance-based and motion-based
tasks. The core of our framework comprises a spatial feature learning module,
which utilizes residual-graph convolutional neural networks (RG-CNN), for
end-to-end learning of appearance-based features directly from graphs. We
extend this with our proposed Graph2Grid block and temporal feature learning
module for efficiently modelling temporal dependencies over multiple graphs and
a long temporal extent. We show how our framework can be configured for object
classification, action recognition and action similarity labeling. Importantly,
our approach preserves the spatial and temporal coherence of spike events,
while requiring less computation and memory. The experimental validation shows
that our proposed framework outperforms all recent methods on standard
datasets. Finally, to address the absence of large real-world NVS datasets for
complex recognition tasks, we introduce, evaluate and make available the
American Sign Language letters dataset (ASL-DVS), as well as human action
datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS).
Comment: 16 pages, 5 figures. This work is a journal extension of our ICCV'19
paper arXiv:1908.0664
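As a rough sketch of the compact graph representation this abstract describes, one can connect events that are close in space and time into a radius-neighbor graph over their (x, y, t) coordinates. The `time_scale`, `radius`, and `max_neighbors` values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def events_to_graph(events, radius=3.0, time_scale=0.01, max_neighbors=8):
    """Connect events whose spatio-temporal distance is within `radius`.

    `events` is an (N, 4) array of (x, y, t, polarity) rows. Timestamps
    are rescaled by `time_scale` so they become comparable to pixel
    coordinates; all hyperparameters here are illustrative assumptions.
    """
    coords = events[:, :3].astype(float)
    coords[:, 2] *= time_scale                        # rescale the time axis
    edges = []
    for i in range(len(coords)):
        d2 = ((coords - coords[i]) ** 2).sum(axis=1)  # squared distances
        d2[i] = np.inf                                # exclude self-loops
        for j in np.argsort(d2)[:max_neighbors]:      # nearest candidates
            if d2[j] <= radius ** 2:
                edges.append((i, int(j)))
    return coords, edges
```

Nodes carry the event polarity as a feature, and the resulting graph can be fed to a graph convolutional network instead of a dense frame tensor.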
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
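The per-pixel working principle the survey describes can be illustrated with a tiny single-pixel simulator: a signed event fires whenever log intensity moves by more than a contrast threshold from the level at the last event. Timestamps here are sample indices; a real sensor fires asynchronously in continuous time with microsecond resolution.

```python
def simulate_events(pixel_log_intensity, threshold=0.2):
    """Emit DVS-style events from one pixel's log-intensity time series.

    Returns a list of (t, sign) pairs: +1 (ON) when brightness rose by
    `threshold` since the last event, -1 (OFF) when it fell. A toy model
    of the contrast-sensitivity principle, ignoring noise and refractory
    effects of real sensors.
    """
    events = []
    ref = pixel_log_intensity[0]
    for t, value in enumerate(pixel_log_intensity):
        while value - ref >= threshold:   # brightness increased: ON events
            ref += threshold
            events.append((t, +1))
        while ref - value >= threshold:   # brightness decreased: OFF events
            ref -= threshold
            events.append((t, -1))
    return events
```

A large, fast brightness change produces a burst of same-sign events, which is why static scenes generate almost no output while moving edges dominate the stream.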
Graph-Based Object Classification for Neuromorphic Vision Sensing
Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a., "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, object classification with NVS streams cannot leverage state-of-the-art convolutional neural networks (CNNs), since NVS does not produce frame representations. To circumvent this mismatch between sensing and processing with CNNs, we propose a compact graph representation for NVS. We couple this with novel residual graph CNN architectures and show that, when trained on spatio-temporal NVS data for object classification, such residual graph CNNs preserve the spatial and temporal coherence of spike events, while requiring less computation and memory. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we present and make available a 100k dataset of NVS recordings of the American sign language letters, acquired with an iniLabs DAVIS240c device under real-world conditions
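The residual graph CNN idea can be sketched as a single layer that aggregates neighbor features through a normalized adjacency matrix and adds a skip connection. This is a generic stand-in for the residual graph blocks the abstract mentions; the paper's actual graph operator may differ.

```python
import numpy as np

def residual_graph_conv(h, adj, w):
    """One residual graph-convolution step on node features `h` (N x F).

    Aggregates features over a degree-normalized adjacency with
    self-loops, applies a linear map `w` and ReLU, then adds the input
    back (the residual connection). Illustrative, not the paper's exact
    operator.
    """
    a = adj + np.eye(len(adj))               # add self-loops
    a_norm = a / a.sum(axis=1, keepdims=True)  # row-normalize by degree
    out = np.maximum(a_norm @ h @ w, 0.0)    # aggregate + linear + ReLU
    return h + out                           # residual connection
```

Stacking several such layers lets gradients flow through the skip paths, which is what makes deep graph networks on sparse event graphs trainable.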
Event Encryption: Rethinking Privacy Exposure for Neuromorphic Imaging
Bio-inspired neuromorphic cameras sense illumination changes on a per-pixel
basis and generate spatiotemporal streaming events within microseconds in
response, offering visual information with high temporal resolution over a high
dynamic range. Such devices often serve in surveillance systems due to their
applicability and robustness in environments with high dynamics and strong or
weak lighting, where they can still supply clearer recordings than traditional
imaging. In other words, when it comes to privacy-relevant cases, neuromorphic
cameras also expose more sensitive data and thus pose serious security threats.
Therefore, asynchronous event streams also necessitate careful encryption
before transmission and usage. This letter discusses several potential attack
scenarios and approaches event encryption from the perspective of neuromorphic
noise removal, in which we inversely introduce well-crafted noise into raw
events until they are obfuscated. Evaluations show that the encrypted events
can effectively protect information from the attacks of low-level visual
reconstruction and high-level neuromorphic reasoning, and thus feature
dependable privacy-preserving competence. Our solution gives impetus to the
security of event data and paves the way to a highly encrypted technique for
privacy-protective neuromorphic imaging
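A toy illustration of noise-injection encryption: draw noise events from a PRNG seeded with a shared key, so a key holder can regenerate and strip them while an eavesdropper sees an obfuscated stream. This is only a sketch of the general idea; the letter's actual scheme crafts noise specifically to defeat neuromorphic denoising, which this toy does not attempt.

```python
import random

def keyed_noise(key, n, duration, width=128, height=128):
    """Deterministic (x, y, t, polarity) noise events from a keyed PRNG."""
    rng = random.Random(key)
    return [(rng.randrange(width), rng.randrange(height),
             rng.uniform(0, duration), rng.choice((-1, 1)))
            for _ in range(n)]

def encrypt_events(events, key, duration, noise_ratio=2):
    """Interleave `noise_ratio` keyed noise events per real event."""
    noise = keyed_noise(key, noise_ratio * len(events), duration)
    return sorted(events + noise, key=lambda e: e[2])  # merge by timestamp

def decrypt_events(mixed, key, duration, noise_ratio=2):
    """Regenerate the keyed noise and strip it from the mixed stream."""
    n_noise = noise_ratio * len(mixed) // (noise_ratio + 1)
    drop = set(keyed_noise(key, n_noise, duration))
    return [e for e in mixed if e not in drop]
```

Because `random.Random(key)` is deterministic, both sides generate byte-identical noise tuples, so decryption is an exact set difference.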
Event-Driven Tactile Learning with Location Spiking Neurons
The sense of touch is essential for a variety of daily tasks. New advances in
event-based tactile sensors and Spiking Neural Networks (SNNs) spur the
research in event-driven tactile learning. However, SNN-enabled event-driven
tactile learning is still in its infancy due to the limited representative
abilities of existing spiking neurons and high spatio-temporal complexity in
the data. In this paper, to improve the representative capabilities of existing
spiking neurons, we propose a novel neuron model called "location spiking
neuron", which enables us to extract features of event-based data in a novel
way. Moreover, based on the classical Time Spike Response Model (TSRM), we
develop a specific location spiking neuron model - Location Spike Response
Model (LSRM) that serves as a new building block of SNNs. Furthermore, we
propose a hybrid model which combines an SNN with TSRM neurons and an SNN with
LSRM neurons to capture the complex spatio-temporal dependencies in the data.
Extensive experiments demonstrate the significant improvements of our models
over other works on event-driven tactile learning and show the superior energy
efficiency of our models and location spiking neurons, which may unlock their
potential on neuromorphic hardware.
Comment: accepted by IJCNN 2022 (oral); the source code is available at
https://github.com/pkang2017/TactileLocNeuron
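A Spike Response Model neuron of the kind the abstract builds on can be sketched as a membrane potential that sums exponentially decaying kernels, one per input spike, and fires on upward threshold crossings. This is a generic TSRM-style neuron, not the paper's location-spiking LSRM variant, and refractory effects are omitted for brevity.

```python
import math

def srm_potential(t, input_spikes, weights, tau=5.0):
    """Membrane potential at time `t`: weighted sum of exponential
    kernels over all input spikes that have already arrived."""
    u = 0.0
    for w, times in zip(weights, input_spikes):
        for s in times:
            if s <= t:
                u += w * math.exp(-(t - s) / tau)
    return u

def run_neuron(input_spikes, weights, t_end=20.0, dt=1.0,
               threshold=1.0, tau=5.0):
    """Step time forward, recording output spikes at upward threshold
    crossings (no refractory kernel in this sketch)."""
    out, prev, t = [], 0.0, 0.0
    while t <= t_end:
        u = srm_potential(t, input_spikes, weights, tau)
        if u >= threshold > prev:     # crossed threshold from below
            out.append(t)
        prev = u
        t += dt
    return out
```

The location-spiking idea swaps the roles of the two axes: instead of a neuron spiking at different times, a "time channel" spikes at different locations, so the same response-kernel machinery extracts spatial rather than temporal structure.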
AEGNN: Asynchronous Event-based Graph Neural Networks
The best performing learning algorithms devised for event cameras work by first converting events into dense representations that are then processed using standard CNNs. However, these steps discard both the sparsity and high temporal resolution of events, leading to high computational burden and latency. For this reason, recent works have adopted Graph Neural Networks (GNNs), which process events as “static” spatio-temporal graphs, which are inherently “sparse”. We take this trend one step further by introducing Asynchronous, Event-based Graph Neural Networks (AEGNNs), a novel event-processing paradigm that generalizes standard GNNs to process events as “evolving” spatio-temporal graphs. AEGNNs follow efficient update rules that restrict recomputation of network activations only to the nodes affected by each new event, thereby significantly reducing both computation and latency for event-by-event processing. AEGNNs are easily trained on synchronous inputs and can be converted to efficient, “asynchronous” networks at test time. We thoroughly validate our method on object classification and detection tasks, where we show up to a 200-fold reduction in computational complexity (FLOPs), with similar or even better performance than state-of-the-art asynchronous methods. This reduction in computation directly translates to an 8-fold reduction in computational latency when compared to standard GNNs, which opens the door to low-latency event-based processing
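The locality that makes asynchronous updates cheap can be sketched with a breadth-first search: after one graph-conv layer only the 1-hop neighbors of a newly inserted event node change, after L layers only its L-hop neighborhood does, so a cached forward pass needs recomputation only inside that set. This is a sketch of the idea, not the AEGNN implementation.

```python
from collections import deque

def affected_nodes(adjacency, new_node, hops):
    """Return all nodes within `hops` edges of `new_node`.

    `adjacency` maps each node id to a list of neighbor ids. Only these
    nodes need their activations recomputed when `new_node` is inserted
    into the graph of an L-layer GNN with L == hops.
    """
    seen = {new_node}
    frontier = deque([(new_node, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == hops:
            continue                      # receptive field boundary
        for nb in adjacency.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen
```

Since event graphs are sparse, this affected set is typically a tiny fraction of the graph, which is where the reported FLOP reduction comes from.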
RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network
Event-based cameras are inspired by the sparse and asynchronous spike
representation of the biological visual system. However, processing the event
data requires either using expensive feature descriptors to transform spikes
into frames, or using spiking neural networks that are expensive to train. In
this work, we propose a neural network architecture, Reservoir Nodes-enabled
neuromorphic vision sensing Network (RN-Net), based on simple convolution
layers integrated with dynamic temporal encoding reservoirs for local and
global spatiotemporal feature detection with low hardware and training costs.
The RN-Net allows efficient processing of asynchronous temporal features and
achieves the highest accuracy reported to date (99.2%) on DVS128 Gesture, as
well as one of the highest accuracies (67.5%) on the DVS Lip dataset, at a
much smaller network size. By leveraging the internal device and circuit dynamics,
asynchronous temporal feature encoding can be implemented at very low hardware
cost without preprocessing and dedicated memory and arithmetic units. The use
of simple DNN blocks and standard backpropagation-based training rules further
reduces implementation costs.
Comment: 12 pages, 5 figures, 4 tables
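The temporal encoding a reservoir node performs can be sketched as a leaky integrator: each incoming event adds a unit response that decays with a fixed time constant, so the node's state at readout summarizes recent activity without any training. This is a stand-in for the short-term dynamics RN-Net's reservoir layers exploit; the real nodes are physical devices, not software integrators.

```python
import math

def reservoir_encode(event_times, t_read, tau=50.0):
    """State of a leaky reservoir node at readout time `t_read`.

    Each event at time t <= t_read contributes exp(-(t_read - t) / tau),
    so recent events dominate and old ones fade; `tau` is an
    illustrative time constant, not a device parameter from the paper.
    """
    state = 0.0
    for t in event_times:
        if t <= t_read:
            state += math.exp(-(t_read - t) / tau)
    return state
```

Downstream convolutional layers then read these decayed states like an ordinary feature map, which is why standard backpropagation suffices for training.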