Search CORE

448 research outputs found

Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators

Author: Kim Hyungjun
Kim Jae-Joon
Kim Yulhwa
Publication venue
Publication date: 14/04/2019
Field of study

Recently, RRAM-based Binary Neural Network (BNN) hardware has been gaining interests as it requires 1-bit sense-amp only and eliminates the need for high-resolution ADC and DAC. However, RRAM-based BNN hardware still requires high-resolution ADC for partial sum calculation to implement large-scale neural network using multiple memory arrays. We propose a neural network-hardware co-design approach to split input to fit each split network on a RRAM array so that the reconstructed BNNs calculate 1-bit output neuron in each array. As a result, ADC can be completely eliminated from the design even for large-scale neural network. Simulation results show that the proposed network reconstruction and retraining recovers the inference accuracy of the original BNN. The accuracy loss of the proposed scheme in the CIFAR-10 testcase was less than 1.1% compared to the original network. The code for training and running proposed BNN models is available at: https://github.com/YulhwaKim/RRAMScalable_BNN

arXiv.org e-Print Archive

PIMBALL: Binary Neural Networks in Spintronic Memory

Author: Chowdhury Zamshed Iqbal
Karpuzcu Ulya R.
Khatamifard S. Karen
Resch Salonik
Sapatnekar Sachin S.
Wang Jian-Ping
Zabihi Masoud
Zhao Zhengyang
Publication venue
Publication date: 20/08/2019
Field of study

Neural networks span a wide range of applications of industrial and commercial significance. Binary neural networks (BNN) are particularly effective in trading accuracy for performance, energy efficiency or hardware/software complexity. Here, we introduce a spintronic, re-configurable in-memory BNN accelerator, PIMBALL: Processing In Memory BNN AcceL(L)erator, which allows for massively parallel and energy efficient computation. PIMBALL is capable of being used as a standard spintronic memory (STT-MRAM) array and a computational substrate simultaneously. We evaluate PIMBALL using multiple image classifiers and a genomics kernel. Our simulation results show that PIMBALL is more energy efficient than alternative CPU, GPU, and FPGA based implementations while delivering higher throughput

arXiv.org e-Print Archive

Dendritic-Inspired Processing Enables Bio-Plausible STDP in Compound Binary Synapses

Author: Saxena Vishal
Wu Xinyu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/01/2018
Field of study

Brain-inspired learning mechanisms, e.g. spike timing dependent plasticity (STDP), enable agile and fast on-the-fly adaptation capability in a spiking neural network. When incorporating emerging nanoscale resistive non-volatile memory (NVM) devices, with ultra-low power consumption and high-density integration capability, a spiking neural network hardware would result in several orders of magnitude reduction in energy consumption at a very small form factor and potentially herald autonomous learning machines. However, actual memory devices have shown to be intrinsically binary with stochastic switching, and thus impede the realization of ideal STDP with continuous analog values. In this work, a dendritic-inspired processing architecture is proposed in addition to novel CMOS neuron circuits. The utilization of spike attenuations and delays transforms the traditionally undesired stochastic behavior of binary NVMs into a useful leverage that enables biologically-plausible STDP learning. As a result, this work paves a pathway to adopt practical binary emerging NVM devices in brain-inspired neuromorphic computing

arXiv.org e-Print Archive

Exploring the Connection Between Binary and Spiking Neural Networks

Author: Lu Sen
Sengupta Abhronil
Publication venue
Publication date: 21/05/2020
Field of study

On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks. This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks - both of which are driven by the same motivation and yet synergies between the two have not been fully explored. We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets like CIFAR-

100

and ImageNet. An important implication of this work is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware accelerators catered for Binary Neural Networks without suffering any accuracy degradation due to binarization. We utilize standard training techniques for non-spiking networks to generate our spiking networks by conversion process and also perform an extensive empirical analysis and explore simple design-time and run-time optimization techniques for reducing inference latency of spiking networks (both for binary and full-precision models) by an order of magnitude over prior work

arXiv.org e-Print Archive

Device and System Level Design Considerations for Analog-Non-Volatile-Memory Based Neuromorphic Architectures

Author: Eryilmaz Sukru Burc
Kuzum Duygu
Wong H. -S. Philip
Yu Shimeng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/05/2016
Field of study

This paper gives an overview of recent progress in the brain inspired computing field with a focus on implementation using emerging memories as electronic synapses. Design considerations and challenges such as requirements and design targets on multilevel states, device variability, programming energy, array-level connectivity, fan-in/fanout, wire energy, and IR drop are presented. Wires are increasingly important in design decisions, especially for large systems, and cycle-to-cycle variations have large impact on learning performance.Comment: 4 pages, In Electron Devices Meeting (IEDM), 2015 IEEE International (pp. 4.1). IEEE. Original paper can be found here: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7409622. Abstract can be found here: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7409622&refinements%3D4224410500%26filter%3DAND%28p_IS_Number%3A7409598%2

arXiv.org e-Print Archive

Crossbar-aware neural network pruning

Author: Deng Lei
Hu Xing
Ji Yu
Li Guoqi
Liang Ling
Ma Xin
Xie Yuan
Zeng Yueling
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/12/2018
Field of study

Crossbar architecture based devices have been widely adopted in neural network accelerators by taking advantage of the high efficiency on vector-matrix multiplication (VMM) operations. However, in the case of convolutional neural networks (CNNs), the efficiency is compromised dramatically due to the large amounts of data reuse. Although some mapping methods have been designed to achieve a balance between the execution throughput and resource overhead, the resource consumption cost is still huge while maintaining the throughput. Network pruning is a promising and widely studied leverage to shrink the model size. Whereas, previous work didn`t consider the crossbar architecture and the corresponding mapping method, which cannot be directly utilized by crossbar-based neural network accelerators. Tightly combining the crossbar structure and its mapping, this paper proposes a crossbar-aware pruning framework based on a formulated L0-norm constrained optimization problem. Specifically, we design an L0-norm constrained gradient descent (LGD) with relaxant probabilistic projection (RPP) to solve this problem. Two grains of sparsity are successfully achieved: i) intuitive crossbar-grain sparsity and ii) column-grain sparsity with output recombination, based on which we further propose an input feature maps (FMs) reorder method to improve the model accuracy. We evaluate our crossbar-aware pruning framework on median-scale CIFAR10 dataset and large-scale ImageNet dataset with VGG and ResNet models. Our method is able to reduce the crossbar overhead by 44%-72% with little accuracy degradation. This work greatly saves the resource and the related energy cost, which provides a new co-design solution for mapping CNNs onto various crossbar devices with significantly higher efficiency

arXiv.org e-Print Archive

Recent Advances in Efficient Computation of Deep Convolutional Neural Networks

Author: Cheng Jian
Hu Qinghao
Li Gang
Lu Hanqing
Wang Peisong
Publication venue
Publication date: 11/02/2018
Field of study

Deep neural networks have evolved remarkably over the past few years and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks also continue to increase. This will pose a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a batch of accelerators based on FPGA/ASIC have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design and hardware accelerators. Finally, we will introduce and discuss a few possible future directions.Comment: 14 pages, 3 figure

arXiv.org e-Print Archive

A case for multiple and parallel RRAMs as synaptic model for training SNNs

Author: Ganguly Udayan
Lashkare Sandip
Prasad Sidharth
Shukla Aditya
Publication venue
Publication date: 13/03/2018
Field of study

To enable a dense integration of model synapses in a spiking neural networks hardware, various nano-scale devices are being considered. Such a device, besides exhibiting spike-time dependent plasticity (STDP), needs to be highly scalable, have a large endurance and require low energy for transitioning between states. In this work, we first introduce and empirically determine two new specifications for an synapse in SNNs: number of conductance levels per synapse and maximum learning-rate. To the best of our knowledge, there are no RRAMs that meet the latter specification. As a solution, we propose the use of multiple PCMO-RRAMs in parallel within a synapse. While synaptic reading, all PCMO-RRAMs are simultaneously read and for each synaptic conductance-change event, the mechanism for conductance STDP is initiated for only one RRAM, randomly picked from the set. Second, to validate our solution, we experimentally demonstrate STDP of conductance of a PCMO-RRAM and then show that due to a large learning-rate, a single PCMO-RRAM fails to model a synapse in the training of an SNN. As anticipated, network training improves as more PCMO-RRAMs are added to the synapse. Fourth, we discuss the circuit-requirements for implementing such a scheme, to conclude that the requirements are within bounds. Thus, our work presents specifications for synaptic devices in trainable SNNs, indicates the shortcomings of state-of-art synaptic contenders, and provides a solution to extrinsically meet the specifications and discusses the peripheral circuitry that implements the solution.Comment: 8 pages, 18 figures and 1 tabl

arXiv.org e-Print Archive

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS

Author: Seo Jae-sun
Sun Xiaoyu
Yin Shihui
Yu Shimeng
Publication venue
Publication date: 16/09/2019
Field of study

Deep learning hardware designs have been bottlenecked by conventional memories such as SRAM due to density, leakage and parallel computing challenges. Resistive devices can address the density and volatility issues, but have been limited by peripheral circuit integration. In this work, we demonstrate a scalable RRAM based in-memory computing design, termed XNOR-RRAM, which is fabricated in a 90nm CMOS technology with monolithic integration of RRAM devices between metal 1 and 2. We integrated a 128x64 RRAM array with CMOS peripheral circuits including row/column decoders and flash analog-to-digital converters (ADCs), which collectively become a core component for scalable RRAM-based in-memory computing towards large deep neural networks (DNNs). To maximize the parallelism of in-memory computing, we assert all 128 wordlines of the RRAM array simultaneously, perform analog computing along the bitlines, and digitize the bitline voltages using ADCs. The resistance distribution of low resistance states is tightened by write-verify scheme, and the ADC offset is calibrated. Prototype chip measurements show that the proposed design achieves high binary DNN accuracy of 98.5% for MNIST and 83.5% for CIFAR-10 datasets, respectively, with energy efficiency of 24 TOPS/W and 158 GOPS throughput. This represents 5.6X, 3.2X, 14.1X improvements in throughput, energy-delay product (EDP), and energy-delay-squared product (ED2P), respectively, compared to the state-of-the-art literature. The proposed XNOR-RRAM can enable intelligent functionalities for area-/energy-constrained edge computing devices

arXiv.org e-Print Archive

Compact Device Models for FinFET and Beyond

Author: Chiang Meng-Hsueh
Dunga Mohan V.
Hsu Chun-Hsiang
Hu Chenming
Hung Wei-Chen
Lee Jia-Wei
Liang Fu-Xiang
Lu Darsen D.
Niknejad Ali M.
Publication venue
Publication date: 05/05/2020
Field of study

Compact device models play a significant role in connecting device technology and circuit design. BSIM-CMG and BSIM-IMG are industry standard compact models suited for the FinFET and UTBB technologies, respectively. Its surface potential based modeling framework and symmetry preserving properties make them suitable for both analog/RF and digital design. In the era of artificial intelligence / deep learning, compact models further enhanced our ability to explore RRAM and other NVM-based neuromorphic circuits. We have demonstrated simulation of RRAM neuromorphic circuits with Verilog-A based compact model at NCKU. Further abstraction with macromodels is performed to enable larger scale machine learning simulation.Comment: Invited talk at the Asia-Pacific Workshop on Fundamentals and Applications of Advanced Semiconductor Devices (AWAD), Kitakyushu, Japan, July 201

arXiv.org e-Print Archive