69 research outputs found

    Embracing the Unreliability of Memory Devices for Neuromorphic Computing

    The emergence of resistive non-volatile memories opens the way to highly energy-efficient computation near- or in-memory. However, this type of computation is not compatible with conventional ECC, and has to deal with device unreliability. Inspired by the architecture of animal brains, we present a manufactured differential hybrid CMOS/RRAM memory architecture suitable for neural network implementation that functions without formal ECC. We also show that using low-energy but error-prone programming conditions only slightly reduces network accuracy

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Corey Lammie designed mixed-signal memristive-complementary metal–oxide–semiconductor (CMOS) and field-programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

    Binary Neural Networks in FPGAs: Architectures, Tool Flows and Hardware Comparisons.

    Binary neural networks (BNNs) are variations of artificial/deep neural network (ANN/DNN) architectures that constrain the real values of weights to the binary set of numbers {-1,1}. By using binary values, BNNs can convert matrix multiplications into bitwise operations, which accelerates both training and inference and reduces hardware complexity and model sizes for implementation. Compared to traditional deep learning architectures, BNNs are a good choice for implementation in resource-constrained devices like FPGAs and ASICs. However, BNNs have the disadvantage of reduced performance and accuracy because of the tradeoff due to binarization. Over the years, this has attracted the attention of the research community to overcome the performance gap of BNNs, and several architectures have been proposed. In this paper, we provide a comprehensive review of BNNs for implementation in FPGA hardware. The survey covers different aspects, such as BNN architectures and variants, design and tool flows for FPGAs, and various applications for BNNs. The final part of the paper gives some benchmark works and design tools for implementing BNNs in FPGAs based on established datasets used by the research community
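The conversion of multiplications into bitwise operations mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common XNOR-popcount formulation, in which +1 is stored as bit 1 and -1 as bit 0; the helper names `pack_bits` and `binary_dot` are hypothetical and are not taken from the surveyed paper.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two length-n {-1,+1} vectors given in packed form."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount of the agreeing positions
    # Each match contributes +1 and each mismatch -1: matches - (n - matches).
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))          # ordinary multiply-accumulate
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(result, reference)
```

On hardware, the XNOR and popcount map to simple logic and an adder tree, which is one reason BNNs suit FPGAs; the Python integers here only stand in for fixed-width registers.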

    Spatial-SpinDrop: Spatial Dropout-based Binary Bayesian Neural Network with Spintronics Implementation

    Recently, machine learning systems have gained prominence in real-time, critical decision-making domains, such as autonomous driving and industrial automation. Their implementations should avoid overconfident predictions through uncertainty estimation. Bayesian Neural Networks (BayNNs) are principled methods for estimating predictive uncertainty. However, their computational costs and power consumption hinder their widespread deployment in edge AI. Utilizing Dropout as an approximation of the posterior distribution, binarizing the parameters of BayNNs, and implementing them in spintronics-based computation-in-memory (CiM) hardware arrays can provide a viable solution. However, designing hardware Dropout modules for convolutional neural network (CNN) topologies is challenging and expensive, as they may require numerous Dropout modules and need to use spatial information to drop certain elements. In this paper, we introduce MC-SpatialDropout, a spatial dropout-based approximate BayNN with emerging spintronic devices. Our method utilizes the inherent stochasticity of spintronic devices for efficient implementation of the spatial dropout module compared to existing implementations. Furthermore, the number of dropout modules per network layer is reduced by a factor of 9× and energy consumption by a factor of 94.11×, while still achieving comparable predictive performance and uncertainty estimates compared to related works
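As a rough software illustration of the idea in the abstract above, the sketch below runs several stochastic forward passes with spatial dropout (whole feature maps dropped at once) and reports the mean prediction and its entropy as an uncertainty score. Everything here is an assumption made for illustration: the toy model (global average pooling followed by a linear layer), the function names, the dropout probability, and the use of a software random number generator in place of the stochastic spintronic devices. Weight binarization is not modeled.

```python
import math
import random


def spatial_dropout(feature_maps, p, rng):
    """Drop each whole channel with probability p; rescale kept channels by 1/(1-p)."""
    out = []
    for fmap in feature_maps:
        scale = 1.0 / (1.0 - p) if rng.random() >= p else 0.0
        out.append([[x * scale for x in row] for row in fmap])
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(feature_maps, weights, p, rng):
    """Spatial dropout -> global average pooling -> linear layer -> softmax."""
    dropped = spatial_dropout(feature_maps, p, rng)
    pooled = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
              for fmap in dropped]
    logits = [sum(w * x for w, x in zip(w_row, pooled)) for w_row in weights]
    return softmax(logits)


def mc_predict(feature_maps, weights, p=0.25, passes=50, seed=0):
    """Average the class probabilities over several dropout-perturbed passes."""
    rng = random.Random(seed)
    samples = [forward(feature_maps, weights, p, rng) for _ in range(passes)]
    n_classes = len(weights)
    mean = [sum(s[c] for s in samples) / passes for c in range(n_classes)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0)  # predictive entropy
    return mean, entropy


# Two 2x2 feature maps and a 3-class linear layer (toy values).
maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
weights = [[0.2, -0.1], [0.0, 0.3], [-0.2, 0.1]]
mean_probs, uncertainty = mc_predict(maps, weights)
print(mean_probs, uncertainty)
```

Dropping entire channels means one random decision per feature map rather than one per activation, which is consistent with the abstract's point that fewer dropout modules are needed per layer.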

    Bio-inspired learning and hardware acceleration with emerging memories

    Machine Learning has permeated many aspects of engineering, ranging from Internet of Things (IoT) applications to big data analytics. While the computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remain a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neuro-biological studies have established that information processing in the human brain relies on impulse-like signals emitted by neurons, called action potentials. Motivated by these facts, Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks, have been proposed as an alternative computing paradigm in which the timing of spikes generated by artificial neurons is central to learning and inference. This dissertation demonstrates the computational power of SNNs using conventional CMOS and emerging nanoscale hardware platforms. The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test data-set, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than that of an equivalent SRAM-based design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring 16 bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory. In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and their efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy- and memory-constrained edge computing applications

    Energy Efficient Learning with Low Resolution Stochastic Domain Wall Synapse Based Deep Neural Networks

    We demonstrate that extremely low resolution quantized (nominally 5-state) synapses with large stochastic variations in Domain Wall (DW) position can be both energy efficient and achieve reasonably high testing accuracies compared to Deep Neural Networks (DNNs) of similar sizes using floating-point precision synaptic weights. Specifically, voltage-controlled DW devices demonstrate stochastic behavior, as modeled rigorously with micromagnetic simulations, and can only encode limited states; however, they can be extremely energy efficient during both training and inference. We show that by implementing suitable modifications to the learning algorithms, we can address the stochastic behavior as well as mitigate the effect of their low resolution to achieve high testing accuracies. In this study, we propose both in-situ and ex-situ training algorithms, based on a modification of the algorithm proposed by Hubara et al. [1], which works well with quantization of synaptic weights. We train several 5-layer DNNs on the MNIST dataset using 2-, 3- and 5-state DW devices as synapses. For in-situ training, a separate high-precision memory unit is adopted to preserve and accumulate the weight gradients, which are then quantized to program the low-precision DW devices. Moreover, a sizeable noise tolerance margin is used during the training to address the intrinsic programming noise. For ex-situ training, a precursor DNN is first trained based on the characterized DW device model and a noise tolerance margin, which is similar to the in-situ training. Remarkably, for in-situ inference the energy dissipation to program the devices is only 13 pJ per inference given that the training is performed over the entire MNIST dataset for 10 epochs
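The in-situ scheme described in the abstract above, in which a high-precision memory accumulates updates that are then quantized onto a few device states, might look roughly like the following sketch. The weight range of [-1, 1], the evenly spaced levels, the learning rate, and the names `quantize` and `train_step` are all illustrative assumptions; the device stochasticity and the noise tolerance margin from the abstract are not modeled.

```python
def quantize(value, n_states=5, w_min=-1.0, w_max=1.0):
    """Snap value to the nearest of n_states evenly spaced levels in [w_min, w_max]."""
    clipped = min(max(value, w_min), w_max)
    step = (w_max - w_min) / (n_states - 1)
    index = round((clipped - w_min) / step)
    return w_min + index * step


def train_step(shadow, grads, lr=0.1, n_states=5):
    """Update the high-precision shadow weights, then derive the device states."""
    new_shadow = [w - lr * g for w, g in zip(shadow, grads)]
    device = [quantize(w, n_states) for w in new_shadow]
    return new_shadow, device


# A single weight receiving the same small gradient three times in a row.
shadow = [0.0]
history = []
for _ in range(3):
    shadow, device = train_step(shadow, grads=[-1.0])
    history.append(device[0])
print(history)
```

In this toy run the device state stays at 0.0 for the first two updates and moves to 0.5 only on the third, once the accumulated shadow weight is closer to the next level. Without the high-precision accumulator, each small update would be rounded away and the device would never change state.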