A Bio-Inspired Vision Sensor With Dual Operation and Readout Modes
This paper presents a novel event-based vision sensor with two operation modes, intensity mode and spatial contrast detection, which can be combined with two different readout approaches: pulse density modulation and time-to-first-spike. The sensor is conceived as a node of a smart camera network made up of several independent and autonomous nodes that send information to a central one. The user can toggle the operation and readout modes with two control bits. The sensor has low latency (below 1 ms under average illumination conditions), low current consumption (19 mA), and a reduced data flow when detecting spatial contrast. A new approach to computing spatial contrast, based on inter-pixel event communication and less prone to mismatch effects than diffusive networks, is proposed. The sensor was fabricated in the standard AMS 4M2P 0.35 µm process. A detailed system-level description and experimental results are provided.
Funding: Office of Naval Research (USA) N00014-14-1-0355; Ministerio de Economía y Competitividad TEC2012-38921-C02-02, P12-TIC-2338, IPT-2011-1625-43000
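To make the two readout schemes concrete, here is a minimal Python sketch of how pulse-density and time-to-first-spike encodings relate a pixel value to its output, with the readout selected by a control bit as the abstract describes. All function names, windows, and constants are illustrative assumptions, not parameters of the chip.

```python
import numpy as np

# Illustrative control bits, not the chip's actual register layout.
MODE_INTENSITY, MODE_CONTRAST = 0, 1       # operation-mode control bit
READOUT_PDM, READOUT_TTFS = 0, 1           # readout-mode control bit

def time_to_first_spike(value, v_max=1.0, t_max=1e-3):
    """Stronger signals spike earlier: readout time falls with pixel value."""
    value = np.clip(value, 1e-6, v_max)
    return t_max * (1.0 - value / v_max)

def pulse_density(value, window=1e-3, rate_max=10_000):
    """Pixel value encoded as the pulse count within a fixed time window."""
    return int(round(value * rate_max * window))

def read_pixel(value, readout_bit):
    # The operation-mode bit would determine upstream what `value` means
    # (raw intensity vs. spatial contrast); here we only model the readout.
    if readout_bit == READOUT_TTFS:
        return time_to_first_spike(value)
    return pulse_density(value)

print(read_pixel(0.8, READOUT_TTFS))   # bright pixel spikes early (0.2 ms)
print(read_pixel(0.8, READOUT_PDM))    # ~8 pulses in a 1 ms window
```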
An Event-Driven Multi-Kernel Convolution Processor Module for Event-Driven Vision Sensors
Event-driven vision sensing is a new way of sensing visual reality in a frame-free manner. That is, the vision sensor (camera) does not capture a sequence of still frames, as in conventional video and computer vision systems. In event-driven sensors each pixel autonomously and asynchronously decides when to send its address out, so the sensor output is a continuous stream of address events that represents reality dynamically and continuously, without being constrained to frames. In this paper we present an event-driven convolution module for computing 2D convolutions on such event streams. The convolution module has been designed so that many of them can be assembled into modular and hierarchical Convolutional Neural Networks for robust shape- and pose-invariant object recognition. The convolution module has multi-kernel capability: it selects the convolution kernel depending on the origin of the event. A proof-of-concept test prototype has been fabricated in a 0.35 µm CMOS process and extensive experimental results are provided. The convolution processor has also been combined with an event-driven Dynamic Vision Sensor (DVS) for high-speed recognition examples. The chip can discriminate propellers rotating at 2,000 revolutions per second, detect symbols on a 52-card deck when browsing all cards in 410 ms, or detect and follow the center of a phosphor oscilloscope trace rotating at 5 kHz.
Funding: Unión Europea 216777 (NABAB); Ministerio de Ciencia e Innovación TEC2009-10639-C04-0
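The following is a hedged sketch of the event-driven convolution idea described above: each incoming address event stamps a kernel, chosen by the event's origin to mimic the multi-kernel capability, onto an accumulator array, and pixels that cross a threshold emit output events and reset. Array size, threshold, and reset behavior are illustrative assumptions, not the chip's design.

```python
import numpy as np

def event_conv(events, kernels, shape=(128, 128), theta=4.0):
    """events: iterable of (timestamp, x, y, source_id); kernels: odd-sized 2D arrays."""
    acc = np.zeros(shape)
    out = []
    for t, x, y, src in events:
        k = kernels[src]                      # select kernel by event origin
        kh, kw = k.shape[0] // 2, k.shape[1] // 2
        y0, y1 = max(y - kh, 0), min(y + kh + 1, shape[0])
        x0, x1 = max(x - kw, 0), min(x + kw + 1, shape[1])
        # stamp the (border-clipped) kernel centered on the event address
        acc[y0:y1, x0:x1] += k[kh - (y - y0): kh + (y1 - y),
                               kw - (x - x0): kw + (x1 - x)]
        # pixels crossing the threshold emit an output event and reset
        for fy, fx in np.argwhere(acc >= theta):
            out.append((t, fx, fy))
            acc[fy, fx] = 0.0
    return out

kernels = {0: np.ones((3, 3)), 1: -np.ones((3, 3))}   # toy ON/OFF kernels
spikes = event_conv([(0.0, 10, 10, 0), (0.1, 11, 10, 0)], kernels)
```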
Event-based Face Detection and Tracking in the Blink of an Eye
We present the first purely event-based method for face detection, exploiting the high temporal resolution of an event-based camera. We rely on a new feature, never before used for such a task, based on detecting eye blinks. Eye blinks are a unique natural dynamic signature of human faces that is captured well by event-based sensors, which respond to relative changes of luminance. Although an eye blink can be captured with conventional cameras, we show that the dynamics of eye blinks, combined with the fact that the two eyes act simultaneously, allow us to derive a robust methodology for face detection at low computational cost and high temporal resolution. We show that eye blinks have a unique temporal signature that can be easily detected by correlating the acquired local activity with a generic temporal model of eye blinks generated from a wide population of users. We furthermore show that once the face is reliably detected, a probabilistic framework can be applied to track the spatial position of the face for each incoming event while updating the position of the trackers. Results are shown for several indoor and outdoor experiments. We also release an annotated data set that can be used for future work on the topic.
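A minimal sketch of the correlation step described above: the event activity of a local region, binned over time, is matched against a generic temporal blink template. The template shape and detection threshold below are invented placeholders, not the model learned from real users in the paper.

```python
import numpy as np

def blink_score(activity, template):
    """Normalized cross-correlation of a region's event rate with a blink template."""
    a = (activity - activity.mean()) / (activity.std() + 1e-9)
    t = (template - template.mean()) / (template.std() + 1e-9)
    return np.correlate(a, t, mode="valid") / len(t)

# Toy double-bump template roughly mimicking ON/OFF activity of a blink.
template = np.concatenate([np.hanning(20), -0.5 * np.hanning(20)])
activity = np.random.rand(200)          # event counts per time bin, one region
scores = blink_score(activity, template)
detected = scores.max() > 0.5           # arbitrary detection threshold
```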
Comparison between Frame-Constrained Fix-Pixel-Value and Frame-Free Spiking-Dynamic-Pixel ConvNets for Visual Processing
Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search, and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. On standard digital computers, 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations on efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated frame-based or frame-free spiking ConvNet convolution processors, are advancing real-time visual processing. The two approaches share the neural inspiration, but each solves the problem in a different way. Frame-based ConvNets process video information frame by frame in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders); resources are fixed and time-multiplexed by fetching data in and out, so memory bandwidth and size are important for good performance. Spike-based convolution processors, on the other hand, are a frame-free alternative able to convolve a spike-based source of visual information with very low latency, which makes them ideal for very high-speed applications; however, their hardware resources must be available all the time and cannot be time-multiplexed, so the hardware should be modular, reconfigurable, and expandable. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGAs have already been used to demonstrate the performance of these systems. In this paper we present a comparative study of these two neuro-inspired solutions, with a brief description of both systems and a discussion of their differences, pros, and cons.
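To illustrate the contrast the abstract draws, here is a toy Python sketch of the two computation styles: a frame-based stage sweeps a whole frame through shared arithmetic, while a spike-based stage updates dedicated per-pixel state on each event. Neither snippet is code from either system; the kernel and sizes are arbitrary.

```python
import numpy as np
from scipy.signal import convolve2d

kernel = np.ones((3, 3)) / 9.0           # toy 3x3 averaging kernel

def frame_based(frame):
    # One shared arithmetic pipeline sweeps the full frame:
    # resources are time-multiplexed, latency is at least one frame.
    return convolve2d(frame, kernel, mode="same")

def spike_based(acc, x, y):
    # Dedicated per-pixel state updated the instant an event arrives:
    # no time-multiplexing, latency is per event.
    # (Assumes the event lands away from the image border, for brevity.)
    acc[y-1:y+2, x-1:x+2] += kernel
    return acc

frame = np.random.rand(64, 64)
out = frame_based(frame)                 # whole-frame result, frame latency
acc = spike_based(np.zeros((64, 64)), 10, 20)   # state after one event
```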
Object detection and recognition with event driven cameras
This thesis presents the study, analysis, and implementation of algorithms to perform object detection and recognition using an event-based camera. This sensor represents a novel paradigm which opens a wide range of possibilities for future developments of computer vision. In particular, it produces a fast, compressed, illumination-invariant output, which can be exploited for robotic tasks where fast dynamics and significant illumination changes are frequent. The experiments are carried out on the neuromorphic version of the iCub humanoid platform. The robot is equipped with a novel dual camera setup mounted directly in the robot's eyes, used to generate data with a moving camera. The motion causes the presence of background clutter in the event stream.
In such a scenario the detection problem has been addressed with an attention mechanism, specifically designed to respond to the presence of objects while discarding clutter. The proposed implementation takes advantage of the nature of the data to simplify the original proto-object saliency model which inspired this work.
Subsequently, the recognition task was first tackled with a feasibility study, to demonstrate that the event stream carries sufficient information to classify objects, and then with the implementation of a spiking neural network. The feasibility study provides the proof of concept that events are informative enough in the context of object classification, whereas the spiking implementation improves the results by employing an architecture specifically designed to process event data. The spiking network was trained with a three-factor local learning rule which overcomes the weight transport, update locking, and non-locality problems.
The presented results prove that both detection and classification can be carried out in the target application using the event data.
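As a minimal sketch of the kind of three-factor local learning rule the thesis mentions, the weight update below combines a presynaptic trace, a postsynaptic term, and a global modulatory signal, so no weight transport or global backward pass is needed. Variable names and the exact update form are illustrative assumptions, not the thesis's rule.

```python
import numpy as np

def three_factor_update(w, pre_trace, post_trace, modulator, lr=1e-3):
    """dw_ij = lr * modulator * post_i * pre_j -- all factors locally available."""
    return w + lr * modulator * np.outer(post_trace, pre_trace)

# Toy usage: 10 inputs, 4 output neurons, a scalar teaching signal.
w = np.zeros((4, 10))
pre = np.random.rand(10)      # e.g., low-pass filtered presynaptic spikes
post = np.random.rand(4)      # e.g., postsynaptic activity or surrogate derivative
w = three_factor_update(w, pre, post, modulator=+1.0)
```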
Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P2M) for Neuromorphic Vision Sensors
The high volume of data transmission between the edge sensor and the cloud processor leads to energy and throughput bottlenecks for resource-constrained edge devices focused on computer vision. Hence, researchers are investigating different approaches (e.g., near-sensor processing, in-sensor processing, in-pixel processing) that execute computations closer to the sensor to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors (e.g., dynamic vision sensors (DVS)) involves incorporating asynchronous multiply-accumulate (MAC) operations within the pixel array, resulting in improved energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC accumulates charge on a passive capacitor; however, the capacitor's limited charge retention time constrains the algorithmic integration-time choices, impacting the algorithmic accuracy, bandwidth, energy, and training efficiency. Consequently, this results in a design trade-off on the hardware side: a low-leakage compute unit is needed while maintaining the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off imposed by the limited integration time, together with techniques to improve the leakage performance of the in-pixel analog MAC operations.
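A back-of-the-envelope sketch of the retention trade-off described above: charge accumulated on a passive capacitor decays between updates, so longer algorithmic integration windows lose more of the analog MAC result. The component values are illustrative assumptions, not the paper's numbers.

```python
import numpy as np

C = 10e-15            # 10 fF storage capacitor (assumed)
R_leak = 1e12         # 1 TOhm effective leakage path (assumed)
tau = R_leak * C      # retention time constant: here 10 ms

def retained_fraction(t_int):
    """Fraction of accumulated charge left after an integration window t_int."""
    return np.exp(-t_int / tau)

for t_int in (1e-3, 10e-3, 100e-3):
    print(f"{t_int*1e3:6.1f} ms window -> {retained_fraction(t_int):.2%} retained")
```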
Neuromorphic-P2M: Processing-in-Pixel-in-Memory Paradigm for Neuromorphic Image Sensors
Edge devices equipped with computer vision must deal with vast amounts of sensory data with limited computing resources. Hence, researchers have been exploring different energy-efficient solutions, such as near-sensor processing, in-sensor processing, and in-pixel processing, that bring the computation closer to the sensor. In particular, in-pixel processing embeds the computation capabilities inside the pixel array and achieves high energy efficiency by generating low-level features instead of the raw data stream of CMOS image sensors. Many different in-pixel processing techniques and approaches have been demonstrated on conventional frame-based CMOS imagers; however, the processing-in-pixel approach for neuromorphic vision sensors has not been explored so far. In this work we propose, for the first time, an asynchronous non-von-Neumann analog processing-in-pixel paradigm that performs convolution operations by integrating in-situ multi-bit, multi-channel convolution inside the pixel array, performing analog multiply-and-accumulate (MAC) operations that consume significantly less energy than their digital MAC alternative. To make this approach viable, we incorporate the circuit's non-idealities, leakage, and process variations into a novel hardware-algorithm co-design framework that leverages extensive HSpice simulations of our proposed circuit using the GF22nm FD-SOI technology node. We verified our framework on state-of-the-art neuromorphic vision sensor datasets and show that, on the IBM DVS128-Gesture dataset, our solution consumes ~2x lower backend-processor energy than the state-of-the-art while maintaining almost similar front-end (sensor) energy and a high test accuracy of 88.36%.
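One way to picture the co-design framework is to fold a model of the analog MAC's non-ideality into the network's forward pass during training, so the weights adapt to the hardware. The distortion model below is a generic placeholder, not the paper's HSpice-fitted model; every parameter is an assumption.

```python
import numpy as np

def nonideal_mac(ideal_out, t_int=5e-3, tau=10e-3, gain=0.95, sigma=0.01):
    """Apply assumed leakage, gain error, and noise to an ideal MAC result."""
    leak = np.exp(-t_int / tau)                        # charge lost over the window
    noise = np.random.normal(0.0, sigma, ideal_out.shape)
    return gain * leak * ideal_out + noise             # what the pixel would report

# During training, each in-pixel convolution output y = conv(x, w) would be
# replaced by nonideal_mac(y), so accuracy is evaluated under hardware behavior.
y = np.random.rand(4, 4)
print(nonideal_mac(y) - y)    # per-element deviation introduced by the model
```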