12,442 research outputs found
Dynamic Power Management for Neuromorphic Many-Core Systems
This work presents a dynamic power management architecture for neuromorphic
many core systems such as SpiNNaker. A fast dynamic voltage and frequency
scaling (DVFS) technique is presented which allows the processing elements (PE)
to change their supply voltage and clock frequency individually and
autonomously within less than 100 ns. This is employed by the neuromorphic
simulation software flow, which defines the performance level (PL) of the PE
based on the actual workload within each simulation cycle. A test chip in 28 nm
SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled
from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct
PLs. By measurement of three neuromorphic benchmarks it is shown that the total
PE power consumption can be reduced by 75%, with 80% baseline power reduction
and a 50% reduction of energy per neuron and synapse computation, all while
maintaining temporary peak system performance to achieve biological real-time
operation of the system. A numerical model of this power management model is
derived which allows DVFS architecture exploration for neuromorphics. The
proposed technique is to be used for the second generation SpiNNaker
neuromorphic many core system
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures for minimizing both memory
requirements and latency, while maximizing programming flexibility to support a
wide range of event-based neural network architectures, through parameter
configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement such scheme, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure
Serialized Asynchronous Links for NoC
This paper proposes an asynchronous serialized link for NoC that can achieve the same levels of performance in terms of flits per second as a synchronous link but with a reduced number of wires in the point to point switch links and reduced power consumption. This is achieved by employing serialization in the asynchronous domain as opposed to synchronous to facilitate the removal of global clocking on the serial links. Based on transistor level simulations using 0.12 ?m foundry models it has been shown that it is possible to achieve the same level of performance as synchronous but with 75% reduction in wires and 65% reduction in power for a 300 MFlit/s link with 8 buffers with a switch clock speed of 300 MHz. Furthermore the paper presents the design requirements arising from interfacing switches of synchronous NoC and asynchronous serial links
Scaling Deep Learning on GPU and Knights Landing clusters
The speed of deep neural networks training has become a big bottleneck of
deep learning research and development. For example, training GoogleNet by
ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training
process, the current deep learning systems heavily rely on the hardware
accelerators. However, these accelerators have limited on-chip memory compared
with CPUs. To handle large datasets, they need to fetch data from either CPU
memory or remote processors. We use both self-hosted Intel Knights Landing
(KNL) clusters and multi-GPU clusters as our target platforms. From an
algorithm aspect, current distributed machine learning systems are mainly
designed for cloud systems. These methods are asynchronous because of the slow
network and high fault-tolerance requirement on cloud systems. We focus on
Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original
EASGD used round-robin method for communication and updating. The communication
is ordered by the machine rank ID, which is inefficient on HPC clusters.
First, we redesign four efficient algorithms for HPC systems to improve
EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD
are faster \textcolor{black}{than} their existing counterparts (Async SGD,
Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design
Sync EASGD, which ties for the best performance among all the methods while
being deterministic. In addition to the algorithmic improvements, we use some
system-algorithm codesign techniques to scale up the algorithms. By reducing
the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x
speedup over original EASGD on the same platform. We get 91.5% weak scaling
efficiency on 4253 KNL cores, which is higher than the state-of-the-art
implementation
MorphIC: A 65-nm 738k-Synapse/mm Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning
Recent trends in the field of neural network accelerators investigate weight
quantization as a means to increase the resource- and power-efficiency of
hardware devices. As full on-chip weight storage is necessary to avoid the high
energy cost of off-chip memory accesses, memory reduction requirements for
weight storage pushed toward the use of binary weights, which were demonstrated
to have a limited accuracy reduction on many applications when
quantization-aware training techniques are used. In parallel, spiking neural
network (SNN) architectures are explored to further reduce power when
processing sparse event-based data streams, while on-chip spike-based online
learning appears as a key feature for applications constrained in power and
resources during the training phase. However, designing power- and
area-efficient spiking neural networks still requires the development of
specific techniques in order to leverage on-chip online learning on binary
weights without compromising the synapse density. In this work, we demonstrate
MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a
stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning
rule and a hierarchical routing fabric for large-scale chip interconnection.
The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF)
neurons and more than two million plastic synapses for an active silicon area
of 2.86mm in 65nm CMOS, achieving a high density of 738k synapses/mm.
MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy
tradeoff on the MNIST classification task compared to previously-proposed SNNs,
while having no penalty in the energy-accuracy tradeoff.Comment: This document is the paper as accepted for publication in the IEEE
Transactions on Biomedical Circuits and Systems journal (2019), the
fully-edited paper is available at
https://ieeexplore.ieee.org/document/876400
A CMOS Spiking Neuron for Brain-Inspired Neural Networks with Resistive Synapses and In-Situ Learning
Nanoscale resistive memories are expected to fuel dense integration of
electronic synapses for large-scale neuromorphic system. To realize such a
brain-inspired computing chip, a compact CMOS spiking neuron that performs
in-situ learning and computing while driving a large number of resistive
synapses is desired. This work presents a novel leaky integrate-and-fire neuron
design which implements the dual-mode operation of current integration and
synaptic drive, with a single opamp and enables in-situ learning with crossbar
resistive synapses. The proposed design was implemented in a 0.18 m CMOS
technology. Measurements show neuron's ability to drive a thousand resistive
synapses, and demonstrate an in-situ associative learning. The neuron circuit
occupies a small area of 0.01 mm and has an energy-efficiency of 9.3
pJspikesynapse
- …