121 research outputs found
MorphIC: A 65-nm 738k-Synapse/mm Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning
Recent trends in the field of neural network accelerators investigate weight
quantization as a means to increase the resource- and power-efficiency of
hardware devices. As full on-chip weight storage is necessary to avoid the high
energy cost of off-chip memory accesses, memory reduction requirements for
weight storage pushed toward the use of binary weights, which were demonstrated
to have a limited accuracy reduction on many applications when
quantization-aware training techniques are used. In parallel, spiking neural
network (SNN) architectures are explored to further reduce power when
processing sparse event-based data streams, while on-chip spike-based online
learning appears as a key feature for applications constrained in power and
resources during the training phase. However, designing power- and
area-efficient spiking neural networks still requires the development of
specific techniques in order to leverage on-chip online learning on binary
weights without compromising the synapse density. In this work, we demonstrate
MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a
stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning
rule and a hierarchical routing fabric for large-scale chip interconnection.
The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF)
neurons and more than two million plastic synapses for an active silicon area
of 2.86mm in 65nm CMOS, achieving a high density of 738k synapses/mm.
MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy
tradeoff on the MNIST classification task compared to previously-proposed SNNs,
while having no penalty in the energy-accuracy tradeoff.Comment: This document is the paper as accepted for publication in the IEEE
Transactions on Biomedical Circuits and Systems journal (2019), the
fully-edited paper is available at
https://ieeexplore.ieee.org/document/876400
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Neural Network (NN) accelerators with emerging ReRAM (resistive random access
memory) technologies have been investigated as one of the promising solutions
to address the \textit{memory wall} challenge, due to the unique capability of
\textit{processing-in-memory} within ReRAM-crossbar-based processing elements
(PEs). However, the high efficiency and high density advantages of ReRAM have
not been fully utilized due to the huge communication demands among PEs and the
overhead of peripheral circuits.
In this paper, we propose a full system stack solution, composed of a
reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and
its software system including neural synthesizer, temporal-to-spatial mapper,
and placement & routing. We highly leverage the software system to make the
hardware design compact and efficient. To satisfy the high-performance
communication demand, we optimize it with a reconfigurable routing architecture
and the placement & routing tool. To improve the computational density, we
greatly simplify the PE circuit with the spiking schema and then adopt neural
synthesizer to enable the high density computation-resources to support
different kinds of NN operations. In addition, we provide spiking memory blocks
(SMBs) and configurable logic blocks (CLBs) in hardware and leverage the
temporal-to-spatial mapper to utilize them to balance the storage and
computation requirements of NN. Owing to the end-to-end software system, we can
efficiently deploy existing deep neural networks to FPSA. Evaluations show
that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME,
the computational density of FPSA improves by 31x; for representative NNs, its
inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201
Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning
Cortical plasticity is one of the main features that enable our ability to
learn and adapt in our environment. Indeed, the cerebral cortex self-organizes
itself through structural and synaptic plasticity mechanisms that are very
likely at the basis of an extremely interesting characteristic of the human
brain development: the multimodal association. In spite of the diversity of the
sensory modalities, like sight, sound and touch, the brain arrives at the same
concepts (convergence). Moreover, biological observations show that one
modality can activate the internal representation of another modality when both
are correlated (divergence). In this work, we propose the Reentrant
Self-Organizing Map (ReSOM), a brain-inspired neural system based on the
reentry theory using Self-Organizing Maps and Hebbian-like learning. We propose
and compare different computational methods for unsupervised learning and
inference, then quantify the gain of the ReSOM in a multimodal classification
task. The divergence mechanism is used to label one modality based on the
other, while the convergence mechanism is used to improve the overall accuracy
of the system. We perform our experiments on a constructed written/spoken
digits database and a DVS/EMG hand gestures database. The proposed model is
implemented on a cellular neuromorphic architecture that enables distributed
computing with local connectivity. We show the gain of the so-called hardware
plasticity induced by the ReSOM, where the system's topology is not fixed by
the user but learned along the system's experience through self-organization.Comment: Preprin
ColibriES: A Milliwatts RISC-V Based Embedded System Leveraging Neuromorphic and Neural Networks Hardware Accelerators for Low-Latency Closed-loop Control Applications
End-to-end event-based computation has the potential to push the envelope in
latency and energy efficiency for edge AI applications. Unfortunately,
event-based sensors (e.g., DVS cameras) and neuromorphic spike-based processors
(e.g., Loihi) have been designed in a decoupled fashion, thereby missing major
streamlining opportunities. This paper presents ColibriES, the first-ever
neuromorphic hardware embedded system platform with dedicated event-sensor
interfaces and full processing pipelines. ColibriES includes event and frame
interfaces and data processing, aiming at efficient and long-life embedded
systems in edge scenarios. ColibriES is based on the Kraken system-on-chip and
contains a heterogeneous parallel ultra-low power (PULP) processor, frame-based
and event-based camera interfaces, and two hardware accelerators for the
computation of both event-based spiking neural networks and frame-based ternary
convolutional neural networks. This paper explores and accurately evaluates the
performance of event data processing on the example of gesture recognition on
ColibriES, as the first step of full-system evaluation. In our experiments, we
demonstrate a chip energy consumption of 7.7 \si{\milli\joule} and latency of
164.5 \si{\milli\second} of each inference with the DVS Gesture event data set
as an example for closed-loop data processing, showcasing the potential of
ColibriES for battery-powered applications such as wearable devices and UAVs
that require low-latency closed-loop control
- …