123 research outputs found
Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System
Emulating spiking neural networks on analog neuromorphic hardware offers
several advantages over simulating them on conventional computers, particularly
in terms of speed and energy consumption. However, this usually comes at the
cost of reduced control over the dynamics of the emulated networks. In this
paper, we demonstrate how iterative training of a hardware-emulated network can
compensate for anomalies induced by the analog substrate. We first convert a
deep neural network trained in software to a spiking network on the BrainScaleS
wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10
000 compared to the biological time domain. This mapping is followed by the
in-the-loop training, where in each training step, the network activity is
first recorded in hardware and then used to compute the parameter updates in
software via backpropagation. An essential finding is that the parameter
updates do not have to be precise, but only need to approximately follow the
correct gradient, which simplifies the computation of updates. Using this
approach, after only several tens of iterations, the spiking network shows an
accuracy close to the ideal software-emulated prototype. The presented
techniques show that deep spiking networks emulated on analog neuromorphic
devices can attain good computational performance despite the inherent
variations of the analog substrate.Comment: 8 pages, 10 figures, submitted to IJCNN 201
Compact Functional Testing for Neuromorphic Computing Circuits
We address the problem of testing artificial intelligence (AI) hardware accelerators implementing spiking neural networks (SNNs). We define a metric to quickly rank available samples for training and testing based on their fault detection capability. The metric measures the interclass spike count difference of a sample for the fault-free design. In particular, each sample is assigned a score equal to the spike count difference between the first two top classes. The hypothesis is that samples with small scores achieve high fault coverage because they are prone to misclassification, i.e., a small perturbation in the network due to a fault will result in these samples being misclassified with high probability. We show that the proposed metric correlates with the per-sample fault coverage and that retaining a set of high-ranked samples in the order of ten achieves near-perfect fault coverage for critical faults that affect the SNN accuracy. The proposed test generation approach is demonstrated on two SNNs modeled in Python and on actual neuromorphic hardware. We discuss fault modeling and perform an analysis to reduce the fault space so as to speed up test generation time. © 1982-2012 IEEE
Bio-inspired learning and hardware acceleration with emerging memories
Machine Learning has permeated many aspects of engineering, ranging from the Internet of Things (IoT) applications to big data analytics. While computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remains a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neuro-biological studies have established that information processing in the human brain relies on impulse like signals emitted by neurons called action potentials. Motivated by these facts, the Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks have been proposed as an alternative computing paradigm where the timing of spikes generated by artificial neurons is central to its learning and inference capabilities. This dissertation demonstrates the computational power of the SNNs using conventional CMOS and emerging nanoscale hardware platforms.
The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test data-set, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than that of an equivalent SRAM-based (Static Random Access Memory) design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory.
In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and its efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy and memory constrained edge computing applications
Assessment and Improvement of the Pattern Recognition Performance of Memdiode-Based Cross-Point Arrays with Randomly Distributed Stuck-at-Faults
In this work, the effect of randomly distributed stuck-at faults (SAFs) in memristive crosspoint array (CPA)-based single and multi-layer perceptrons (SLPs and MLPs, respectively) intended for pattern recognition tasks is investigated by means of realistic SPICE simulations. The quasi-static memdiode model (QMM) is considered here for the modelling of the synaptic weights implemented with memristors. Following the standard memristive approach, the QMM comprises two coupled equations, one for the electron transport based on the double-diode equation with a single series resistance and a second equation for the internal memory state of the device based on the so-called logistic hysteron. By modifying the state parameter in the current-voltage characteristic, SAFs of different severeness are simulated and the final outcome is analysed. Supervised ex-situ training and two well-known image datasets involving hand-written digits and human faces are employed to assess the inference accuracy of the SLP as a function of the faulty device ratio. The roles played by the memristor’s electrical parameters, line resistance, mapping strategy, image pixelation, and fault type (stuck-at-ON or stuck-at-OFF) on the CPA performance are statistically analysed following a Monte-Carlo approach. Three different re-mapping schemes to help mitigate the effect of the SAFs in the SLP inference phase are thoroughly investigated.In this work, the effect of randomly distributed stuck-at faults (SAFs) in memristive cross-point array (CPA)-based single and multi-layer perceptrons (SLPs and MLPs, respectively) intended for pattern recognition tasks is investigated by means of realistic SPICE simulations. The quasi-static memdiode model (QMM) is considered here for the modelling of the synaptic weights implemented with memristors. Following the standard memristive approach, the QMM comprises two coupled equations, one for the electron transport based on the double-diode equation with a single series resistance and a second equation for the internal memory state of the device based on the so-called logistic hysteron. By modifying the state parameter in the current-voltage characteristic, SAFs of different severeness are simulated and the final outcome is analysed. Supervised ex-situ training and two well-known image datasets involving hand-written digits and human faces are employed to assess the inference accuracy of the SLP as a function of the faulty device ratio. The roles played by the memristor?s electrical parameters, line resistance, mapping strategy, image pixelation, and fault type (stuck-at-ON or stuck-at-OFF) on the CPA performance are statistically analysed following a Monte-Carlo approach. Three different re-mapping schemes to help mitigate the effect of the SAFs in the SLP inference phase are thoroughly investigated.Fil: Aguirre, Fernando Leonel. Universidad Tecnológica Nacional. Facultad Regional Buenos Aires. Unidad de Investigación y Desarrollo de las IngenierÃas; Argentina. Universitat Autònoma de Barcelona; España. Consejo Nacional de Investigaciones CientÃficas y Técnicas; ArgentinaFil: Pazos, Sebastián MatÃas. Universidad Tecnológica Nacional. Facultad Regional Buenos Aires. Unidad de Investigación y Desarrollo de las IngenierÃas; Argentina. Consejo Nacional de Investigaciones CientÃficas y Técnicas; ArgentinaFil: Palumbo, Félix Roberto Mario. Universidad Tecnológica Nacional. Facultad Regional Buenos Aires. Unidad de Investigación y Desarrollo de las IngenierÃas; Argentina. Consejo Nacional de Investigaciones CientÃficas y Técnicas; ArgentinaFil: Morell, Antoni. Universitat Autònoma de Barcelona; EspañaFil: Suñé, Jordi. Universitat Autònoma de Barcelona; EspañaFil: Miranda, Enrique. Universitat Autònoma de Barcelona; Españ
A Method for Image Classification Using Low-Precision Analog Computing Arrays
Computing with analog micro electronics can offer several advantages over standard digital technology, most notably: Low space and power consumption and massive parallelization. On the other hand, analog computation lacks the exactness of digital calculations due to inevitable device variations introduced during the chip production, but also due to electric noise in the analog signals. Artificial neural networks are well suited for parallel analog implementations, first, because of their inherent parallelity and second, because they can adapt to device imperfections by training. This thesis evaluates the feasibility of implementing a convolutional neural network for image classification on a massively parallel low-power hardware system. A particular, mixed analogdigital, hardware model is considered, featuring simple threshold neurons. Appropriate, gradient-free, training algorithms, combining self-organization and supervised learning are developed and tested with two benchmark problems (MNIST hand-written digits and traffic signs). Software simulations evaluate the methods under various defined computation faults. A model-free closed-loop technique is shown to compensate for rather serious computation errors without the need for explicit error quantification. Last but not least, the developed networks and the training techniques are verified on a real prototype chip
RescueSNN: Enabling Reliable Executions on Spiking Neural Network Accelerators under Permanent Faults
To maximize the performance and energy efficiency of Spiking Neural Network
(SNN) processing on resource-constrained embedded systems, specialized hardware
accelerators/chips are employed. However, these SNN chips may suffer from
permanent faults which can affect the functionality of weight memory and neuron
behavior, thereby causing potentially significant accuracy degradation and
system malfunctioning. Such permanent faults may come from manufacturing
defects during the fabrication process, and/or from device/transistor damages
(e.g., due to wear out) during the run-time operation. However, the impact of
permanent faults in SNN chips and the respective mitigation techniques have not
been thoroughly investigated yet. Toward this, we propose RescueSNN, a novel
methodology to mitigate permanent faults in the compute engine of SNN chips
without requiring additional retraining, thereby significantly cutting down the
design time and retraining costs, while maintaining the throughput and quality.
The key ideas of our RescueSNN methodology are (1) analyzing the
characteristics of SNN under permanent faults; (2) leveraging this analysis to
improve the SNN fault-tolerance through effective fault-aware mapping (FAM);
and (3) devising lightweight hardware enhancements to support FAM. Our FAM
technique leverages the fault map of SNN compute engine for (i) minimizing
weight corruption when mapping weight bits on the faulty memory cells, and (ii)
selectively employing faulty neurons that do not cause significant accuracy
degradation to maintain accuracy and throughput, while considering the SNN
operations and processing dataflow. The experimental results show that our
RescueSNN improves accuracy by up to 80% while maintaining the throughput
reduction below 25% in high fault rate (e.g., 0.5 of the potential fault
locations), as compared to running SNNs on the faulty chip without mitigation.Comment: Accepted for publication at Frontiers in Neuroscience - Section
Neuromorphic Engineerin
- …