69 research outputs found
Embracing the Unreliability of Memory Devices for Neuromorphic Computing
The emergence of resistive non-volatile memories opens the way to highly
energy-efficient computation near- or in-memory. However, this type of
computation is not compatible with conventional ECC, and has to deal with
device unreliability. Inspired by the architecture of animal brains, we present
a manufactured differential hybrid CMOS/RRAM memory architecture suitable for
neural network implementation that functions without formal ECC. We also show
that using low-energy but error-prone programming conditions only slightly
reduces network accuracy
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed-signal memristive complementary metal–oxide–semiconductor (CMOS) and field-programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems
Binary Neural Networks in FPGAs: Architectures, Tool Flows and Hardware Comparisons.
Binary neural networks (BNNs) are variations of artificial/deep neural network (ANN/DNN) architectures that constrain the real values of weights to the binary set of numbers {-1,1}. By using binary values, BNNs can convert matrix multiplications into bitwise operations, which accelerates both training and inference and reduces hardware complexity and model sizes for implementation. Compared to traditional deep learning architectures, BNNs are a good choice for implementation in resource-constrained devices like FPGAs and ASICs. However, BNNs have the disadvantage of reduced performance and accuracy because of the tradeoff due to binarization. Over the years, this has attracted the attention of the research community to overcome the performance gap of BNNs, and several architectures have been proposed. In this paper, we provide a comprehensive review of BNNs for implementation in FPGA hardware. The survey covers different aspects, such as BNN architectures and variants, design and tool flows for FPGAs, and various applications for BNNs. The final part of the paper gives some benchmark works and design tools for implementing BNNs in FPGAs based on established datasets used by the research community
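The conversion of multiplications into bitwise operations mentioned in this abstract can be illustrated with a minimal sketch. It assumes weights and activations in {-1, +1} are packed into an integer with bit 1 standing for +1; the helper names `pack_bits` and `binary_dot` are hypothetical and are not taken from the surveyed paper.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 when values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors of length n using XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 where the signs match
    matches = bin(agree).count("1")        # popcount
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))
assert fast == reference
```

On hardware, the XNOR and popcount steps map to simple logic, which is one reason such networks are attractive for FPGAs.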
Spatial-SpinDrop: Spatial Dropout-based Binary Bayesian Neural Network with Spintronics Implementation
Recently, machine learning systems have gained prominence in real-time,
critical decision-making domains, such as autonomous driving and industrial
automation. Their implementations should avoid overconfident predictions
through uncertainty estimation. Bayesian Neural Networks (BayNNs) are
principled methods for estimating predictive uncertainty. However, their
computational costs and power consumption hinder their widespread deployment in
edge AI. Using Dropout as an approximation of the posterior distribution,
binarizing the parameters of BayNNs, and implementing them in
spintronics-based computation-in-memory (CiM) hardware arrays can be a
viable solution. However, designing hardware Dropout modules for convolutional
neural network (CNN) topologies is challenging and expensive, as they may
require numerous Dropout modules and need to use spatial information to drop
certain elements. In this paper, we introduce MC-SpatialDropout, a spatial
dropout-based approximate BayNN built on emerging spintronic devices. Our method
utilizes the inherent stochasticity of spintronic devices for efficient
implementation of the spatial dropout module compared to existing
implementations. Furthermore, the number of dropout modules per network layer
is reduced by a factor of and energy consumption by a factor of
, while still achieving comparable predictive performance and
uncertainty estimates compared to related works
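As a rough software illustration of the idea in this abstract, the sketch below runs several stochastic forward passes with spatial dropout (whole feature-map channels dropped at once) and reports the mean prediction and its entropy as an uncertainty estimate. It is not the authors' implementation: a seeded software random generator stands in for the stochasticity of spintronic devices, and the toy model (global average pooling followed by a binary-weight linear layer) and all function names are assumptions made for the example.

```python
import math
import random


def spatial_dropout(feature_maps, p, rng):
    """Zero whole channels with probability p and rescale survivors by 1/(1-p). Requires p < 1."""
    out = []
    for channel in feature_maps:
        scale = 1.0 / (1.0 - p) if rng.random() >= p else 0.0
        out.append([[v * scale for v in row] for row in channel])
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(feature_maps, weights):
    """Toy model: global average pooling per channel, then a linear layer."""
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def mc_predict(feature_maps, weights, p=0.25, passes=50, seed=0):
    """Average class probabilities over stochastic passes; return (mean, predictive entropy)."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(passes):
        probs = softmax(forward(spatial_dropout(feature_maps, p, rng), weights))
        mean = [m + q / passes for m, q in zip(mean, probs)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0.0)
    return mean, entropy


# Four 2x2 channels and a 3-class layer with binary (+1/-1) weights, both made up.
maps = [[[1.0, 0.5], [0.2, 0.1]], [[0.0, 0.3], [0.4, 0.9]],
        [[0.7, 0.7], [0.1, 0.2]], [[0.6, 0.0], [0.5, 0.5]]]
binary_weights = [[1, -1, 1, -1], [-1, 1, 1, 1], [1, 1, -1, -1]]
mean_probs, uncertainty = mc_predict(maps, binary_weights)
```

Dropping entire channels means one random decision per feature map rather than one per element, which is the property that lets the number of dropout modules shrink.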
Bio-inspired learning and hardware acceleration with emerging memories
Machine Learning has permeated many aspects of engineering, ranging from Internet of Things (IoT) applications to big data analytics. While the computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remain a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neurobiological studies have established that information processing in the human brain relies on impulse-like signals emitted by neurons, called action potentials. Motivated by these facts, Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks, have been proposed as an alternative computing paradigm in which the timing of spikes generated by artificial neurons is central to learning and inference. This dissertation demonstrates the computational power of SNNs using conventional CMOS and emerging nanoscale hardware platforms.
The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test dataset, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than an equivalent SRAM-based design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with an STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory.
In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and their efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy- and memory-constrained edge computing applications
Energy Efficient Learning with Low Resolution Stochastic Domain Wall Synapse Based Deep Neural Networks
We demonstrate that extremely low resolution quantized (nominally 5-state)
synapses with large stochastic variations in Domain Wall (DW) position can be
both energy efficient and achieve reasonably high testing accuracies compared
to Deep Neural Networks (DNNs) of similar sizes using floating-point precision
synaptic weights. Specifically, voltage controlled DW devices demonstrate
stochastic behavior as modeled rigorously with micromagnetic simulations and
can only encode limited states; however, they can be extremely energy efficient
during both training and inference. We show that by implementing suitable
modifications to the learning algorithms, we can address the stochastic
behavior as well as mitigate the effect of their low resolution to achieve high
testing accuracies. In this study, we propose both in-situ and ex-situ training
algorithms, based on modification of the algorithm proposed by Hubara et al.
[1], which works well with quantization of synaptic weights. We train several
5-layer DNNs on the MNIST dataset using 2-, 3- and 5-state DW devices as synapses.
For in-situ training, a separate high precision memory unit is adopted to
preserve and accumulate the weight gradients, which are then quantized to
program the low precision DW devices. Moreover, a sizeable noise tolerance
margin is used during the training to address the intrinsic programming noise.
For ex-situ training, a precursor DNN is first trained based on the
characterized DW device model and a noise tolerance margin, which is similar to
the in-situ training. Remarkably, for in-situ inference the energy dissipation
to program the devices is only 13 pJ per inference given that the training is
performed over the entire MNIST dataset for 10 epochs
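The in-situ scheme described in this abstract (a high-precision memory accumulates gradient updates, and its quantized value is programmed into a noisy low-resolution device) can be sketched on a one-weight toy problem. Everything specific below is an assumption for illustration: evenly spaced states in [-1, 1], Gaussian programming noise, a scalar regression task, and the function names. It does not reproduce the paper's DW device model or its noise tolerance margin.

```python
import random


def quantize(w, n_states):
    """Snap w to the nearest of n_states evenly spaced levels in [-1, 1]."""
    w = max(-1.0, min(1.0, w))
    step = 2.0 / (n_states - 1)
    return -1.0 + step * round((w + 1.0) / step)


def program(level, noise_std, rng):
    """Assumed device model: the stored value is the target level plus Gaussian noise."""
    return level + rng.gauss(0.0, noise_std)


def train(target=0.5, n_states=5, noise_std=0.05, lr=0.1, steps=200, seed=0):
    """Fit y = target * x with one synapse; return (shadow weight, quantized level)."""
    rng = random.Random(seed)
    shadow = 0.0  # high-precision accumulator kept outside the device
    device = program(quantize(shadow, n_states), noise_std, rng)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        # The forward pass uses the noisy device value, not the shadow weight.
        grad = 2.0 * (device * x - target * x) * x
        shadow = max(-1.0, min(1.0, shadow - lr * grad))
        device = program(quantize(shadow, n_states), noise_std, rng)
    return shadow, quantize(shadow, n_states)


shadow_weight, stored_level = train()
```

Keeping the accumulator separate lets many small updates build up until they are large enough to move the device to a different state, which a 5-state device could not register on its own.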