273 research outputs found

    A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

    Full text link
    The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to embedded systems and IoT devices, where low power consumption are required. To address these challenges, spin orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs since SOT-MRAM has the characteristic of near-zero standby power, high density, none-volatile. However, the drawbacks of SOT-MRAM based PIM engines such as high writing latency and requiring low bit-width data decrease its popularity as a favorable energy efficient DNN accelerator. To mitigate these drawbacks, we propose an ultra energy efficient framework by using model compression techniques including weight pruning and quantization from the software level considering the architecture of SOT-MRAM PIM. And we incorporate the alternating direction method of multipliers (ADMM) into the training phase to further guarantee the solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced, while increasing the overall system throughput at the meantime, making our proposed ADMM-based SOT-MRAM PIM more energy efficiency and suitable for embedded systems or IoT devices. Our experimental results show the accuracy and compression rate of our proposed framework is consistently outperforming the reference works, while the efficiency (area \& power) and throughput of SOT-MRAM PIM engine is significantly improved

    MRAM Co-designed Processing-in-Memory CNN Accelerator for Mobile and IoT Applications

    Full text link
    We designed a device for Convolution Neural Network applications with non-volatile MRAM memory and computing-in-memory co-designed architecture. It has been successfully fabricated using 22nm technology node CMOS Si process. More than 40MB MRAM density with 9.9TOPS/W are provided. It enables multiple models within one single chip for mobile and IoT device applications.Comment: 4 pages, 4 figures, 1 table. Accepted by NIPS 2018 MLPCD worksho

    PIMBALL: Binary Neural Networks in Spintronic Memory

    Full text link
    Neural networks span a wide range of applications of industrial and commercial significance. Binary neural networks (BNN) are particularly effective in trading accuracy for performance, energy efficiency or hardware/software complexity. Here, we introduce a spintronic, re-configurable in-memory BNN accelerator, PIMBALL: Processing In Memory BNN AcceL(L)erator, which allows for massively parallel and energy efficient computation. PIMBALL is capable of being used as a standard spintronic memory (STT-MRAM) array and a computational substrate simultaneously. We evaluate PIMBALL using multiple image classifiers and a genomics kernel. Our simulation results show that PIMBALL is more energy efficient than alternative CPU, GPU, and FPGA based implementations while delivering higher throughput

    Transfer and Online Reinforcement Learning in STT-MRAM Based Embedded Systems for Autonomous Drones

    Full text link
    In this paper we present an algorithm-hardware codesign for camera-based autonomous flight in small drones. We show that the large write-latency and write-energy for nonvolatile memory (NVM) based embedded systems makes them unsuitable for real-time reinforcement learning (RL). We address this by performing transfer learning (TL) on metaenvironments and RL on the last few layers of a deep convolutional network. While the NVM stores the meta-model from TL, an on-die SRAM stores the weights of the last few layers. Thus all the real-time updates via RL are carried out on the SRAM arrays. This provides us with a practical platform with comparable performance as end-to-end RL and 83.4% lower energy per image fram

    Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators

    Get PDF
    We propose a lightweight scheme where the formation of a data block is changed in such a way that it can tolerate soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, and this leaves one bit unused in half-precision floating-point representation. By taking advantage of the unused bit, we create a backup for the most significant bit to protect it against the soft errors. Also, considering the fact that in MLC STT-RAMs the cost of memory operations (read and write), and reliability of a cell are content-dependent (some patterns take larger current and longer time, while they are more susceptible to soft error), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy compared to an error-free baseline while improving the read and write energy by 9% and 6%, respectively

    FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks

    Full text link
    Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and increase the parallelism across memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with a State-Of-The-Art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% average sparsity.Comment: 14 page

    NAND-SPIN-Based Processing-in-MRAM Architecture for Convolutional Neural Network Acceleration

    Full text link
    The performance and efficiency of running large-scale datasets on traditional computing systems exhibit critical bottlenecks due to the existing "power wall" and "memory wall" problems. To resolve those problems, processing-in-memory (PIM) architectures are developed to bring computation logic in or near memory to alleviate the bandwidth limitations during data transmission. NAND-like spintronics memory (NAND-SPIN) is one kind of promising magnetoresistive random-access memory (MRAM) with low write energy and high integration density, and it can be employed to perform efficient in-memory computation operations. In this work, we propose a NAND-SPIN-based PIM architecture for efficient convolutional neural network (CNN) acceleration. A straightforward data mapping scheme is exploited to improve the parallelism while reducing data movements. Benefiting from the excellent characteristics of NAND-SPIN and in-memory processing architecture, experimental results show that the proposed approach can achieve ∼\sim2.6×\times speedup and ∼\sim1.4×\times improvement in energy efficiency over state-of-the-art PIM solutions.Comment: 15 pages, accepted by SCIENCE CHINA Information Sciences (SCIS) 202

    Stochastic Computing for Hardware Implementation of Binarized Neural Networks

    Full text link
    Binarized Neural Networks, a recently discovered class of neural networks with minimal memory requirements and no reliance on multiplication, are a fantastic opportunity for the realization of compact and energy efficient inference hardware. However, such neural networks are generally not entirely binarized: their first layer remains with fixed point input. In this work, we propose a stochastic computing version of Binarized Neural Networks, where the input is also binarized. Simulations on the example of the Fashion-MNIST and CIFAR-10 datasets show that such networks can approach the performance of conventional Binarized Neural Networks. We evidence that the training procedure should be adapted for use with stochastic computing. Finally, the ASIC implementation of our scheme is investigated, in a system that closely associates logic and memory, implemented by Spin Torque Magnetoresistive Random Access Memory. This analysis shows that the stochastic computing approach can allow considerable savings with regards to conventional Binarized Neural networks in terms of area (62% area reduction on the Fashion-MNIST task). It can also allow important savings in terms of energy consumption, if we accept reasonable reduction of accuracy: for example a factor 2.1 can be saved, with the cost of 1.4% in Fashion-MNIST test accuracy. These results highlight the high potential of Binarized Neural Networks for hardware implementation, and that adapting them to hardware constrains can provide important benefits

    Thermal Aware Design Automation of the Electronic Control System for Autonomous Vehicles

    Get PDF
    The autonomous vehicle (AV) technology, due to its tremendous social and economical benefits, is transforming the entire world in the coming decades. However, significant technical challenges still need to be overcome until AVs can be safely, reliably, and massively deployed. Temperature plays a key role in the safety and reliability of an AV, not only because a vehicle is subjected to extreme operating temperatures but also because the increasing computations demand more powerful IC chips, which can lead to higher operating temperature and large thermal gradient. In particular, as the underpinning technology for AV, artificial intelligence (AI) requires substantially increased computation and memory resources, which have been growing exponentially through recent years and further exacerbated the thermal problems. High operating temperature and large thermal gradient can reduce the performance, degrade the reliability, and even cause an IC to fail catastrophically. We believe that dealing with thermal issues must be coupled closely in the design phase of the AVs’ electronic control system (ECS). To this end, first, we study how to map vehicle applications to ECS with heterogeneous architecture to satisfy peak temperature constraints and optimize latency and system-level reliability. We present a mathematical programming model to bound the peak temperature for the ECS. We also develop an approach based on the genetic algorithm to bound the peak temperature under varying execution time scenarios and optimize the system-level reliability of the ECS. We present several computationally efficient techniques for system-level mean-time-to-failure (MTTF) computation, which show several orders-of-magnitude speed-up over the state-of-the-art method. Second, we focus on studying the thermal impacts of AI techniques. Specifically, we study how the thermal impacts for the memory bit flipping can affect the prediction accuracy of a deep neural network (DNN). We develop a neuron-level analytical sensitivity estimation framework to quantify this impact and study its effectiveness with popular DNN architectures. Third, we study the problem of incorporating thermal impacts into mapping the parameters for DNN neurons to memory banks to improve prediction accuracy. Based on our developed sensitivity metric, we develop a bin-packing-based approach to map DNN neuron parameters to memory banks with different temperature profiles. We also study the problem of identifying the optimal temperature profiles for memory systems that can minimize the thermal impacts. We show that the thermal aware mapping of DNN neuron parameters on memory banks can significantly improve the prediction accuracy at a high-temperature range than the thermal ignorant for state-of-the-art DNNs

    Electrically-Tunable Stochasticity for Spin-based Neuromorphic Circuits: Self-Adjusting to Variation

    Full text link
    Energy-efficient methods are addressed for leveraging low energy barrier nanomagnetic devices within neuromorphic architectures. Using a Magnetoresistive Random Access Memory (MRAM) probabilistic device (p-bit) as the basis of neuronal structures in Deep Belief Networks (DBNs), the impact of reducing the Magnetic Tunnel Junction's (MTJ's) energy barrier is assessed and optimized for the resulting stochasticity present in the learning system. This can mitigate the process variation sensitivity of stochastic DBNs which encounter a sharp drop-off when energy barriers exceed near-zero kT. As evaluated for the MNIST dataset for energy barriers at near-zero kT to 2.0 kT in increments of 0.5 kT, it is shown that the stability factor changes by 5 orders of magnitude. The self-compensating circuit developed herein provides a compact, and low complexity approach to mitigating process variation impacts towards practical implementation and fabrication
    • …
    corecore