Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators
Recently, RRAM-based Binary Neural Network (BNN) hardware has been gaining
interest because it requires only a 1-bit sense amplifier and eliminates the
need for high-resolution ADCs and DACs. However, RRAM-based BNN hardware still
requires a high-resolution ADC for partial-sum calculation when a large-scale
neural network is implemented across multiple memory arrays. We propose a
neural network-hardware co-design approach that splits the input so that each
split network fits on one RRAM array and the reconstructed BNN calculates a
1-bit output neuron in each array. As a result, the ADC can be completely
eliminated from the design even for large-scale neural networks. Simulation
results show that the proposed network reconstruction and retraining recover
the inference accuracy of the original BNN. The accuracy loss of the proposed
scheme in the CIFAR-10 test case was less than 1.1% compared to the original
network. The code for training and running the proposed BNN models is
available at:
https://github.com/YulhwaKim/RRAMScalable_BNN
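The input-splitting idea in this abstract can be illustrated with a minimal Python sketch. This is not the authors' code (that lives in the repository above): `split_neuron`, `array_rows`, the zero-maps-to-+1 convention, and the toy values are all assumptions made for illustration, and the retraining step is omitted.

```python
def sign(x):
    """Binarize a sum to +1 or -1 (zero maps to +1 by assumption)."""
    return 1 if x >= 0 else -1


def split_neuron(inputs, weights, array_rows):
    """Return one 1-bit output per array-sized slice of the input.

    Each slice is small enough to fit a single array, so its partial sum
    is binarized on the spot and never leaves the array as a multi-bit
    value.
    """
    outputs = []
    for start in range(0, len(inputs), array_rows):
        x = inputs[start:start + array_rows]
        w = weights[start:start + array_rows]
        outputs.append(sign(sum(a * b for a, b in zip(x, w))))
    return outputs


# Toy case: 256 binary inputs mapped onto two hypothetical 128-row arrays.
inputs = [1] * 256
weights = [1] * 128 + [-1] * 128
split_outputs = split_neuron(inputs, weights, array_rows=128)
```

The per-array outputs feed the next layer as ordinary binary activations. Splitting changes what the network computes, which is why the abstract pairs it with retraining.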
PIMBALL: Binary Neural Networks in Spintronic Memory
Neural networks span a wide range of applications of industrial and
commercial significance. Binary neural networks (BNN) are particularly
effective in trading accuracy for performance, energy efficiency or
hardware/software complexity. Here, we introduce a spintronic, re-configurable
in-memory BNN accelerator, PIMBALL: Processing In Memory BNN AcceL(L)erator,
which allows for massively parallel and energy efficient computation. PIMBALL
is capable of being used as a standard spintronic memory (STT-MRAM) array and a
computational substrate simultaneously. We evaluate PIMBALL using multiple
image classifiers and a genomics kernel. Our simulation results show that
PIMBALL is more energy efficient than alternative CPU, GPU, and FPGA based
implementations while delivering higher throughput.
Dendritic-Inspired Processing Enables Bio-Plausible STDP in Compound Binary Synapses
Brain-inspired learning mechanisms, e.g. spike timing dependent plasticity
(STDP), enable agile and fast on-the-fly adaptation capability in a spiking
neural network. When incorporating emerging nanoscale resistive non-volatile
memory (NVM) devices, with ultra-low power consumption and high-density
integration capability, a spiking neural network hardware would result in
several orders of magnitude reduction in energy consumption at a very small
form factor and potentially herald autonomous learning machines. However,
actual memory devices have been shown to be intrinsically binary with stochastic
switching, and thus impede the realization of ideal STDP with continuous analog
values. In this work, a dendritic-inspired processing architecture is proposed
in addition to novel CMOS neuron circuits. The utilization of spike
attenuations and delays transforms the traditionally undesired stochastic
behavior of binary NVMs into a useful mechanism that enables
biologically-plausible STDP learning. As a result, this work paves a pathway to
adopting practical binary emerging NVM devices in brain-inspired neuromorphic
computing.
Exploring the Connection Between Binary and Spiking Neural Networks
On-chip edge intelligence has necessitated the exploration of algorithmic
techniques to reduce the compute requirements of current machine learning
frameworks. This work aims to bridge the recent algorithmic progress in
training Binary Neural Networks and Spiking Neural Networks - both of which are
driven by the same motivation and yet synergies between the two have not been
fully explored. We show that training Spiking Neural Networks in the extreme
quantization regime results in near full precision accuracies on large-scale
datasets like CIFAR- and ImageNet. An important implication of this work
is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware
accelerators catered for Binary Neural Networks without suffering any accuracy
degradation due to binarization. We utilize standard training techniques for
non-spiking networks to generate our spiking networks by a conversion process.
We also perform an extensive empirical analysis and explore simple design-time
and run-time optimization techniques for reducing the inference latency of
spiking networks (both for binary and full-precision models) by an order of
magnitude over prior work.
Device and System Level Design Considerations for Analog-Non-Volatile-Memory Based Neuromorphic Architectures
This paper gives an overview of recent progress in the brain-inspired
computing field with a focus on implementation using emerging memories as
electronic synapses. Design considerations and challenges such as requirements
and design targets on multilevel states, device variability, programming
energy, array-level connectivity, fan-in/fanout, wire energy, and IR drop are
presented. Wires are increasingly important in design decisions, especially for
large systems, and cycle-to-cycle variations have a large impact on learning
performance.
Comment: 4 pages, In Electron Devices Meeting (IEDM), 2015 IEEE International
(pp. 4.1). IEEE. Original paper can be found here:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7409622. Abstract can
be found here:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7409622&refinements%3D4224410500%26filter%3DAND%28p_IS_Number%3A7409598%2
Crossbar-aware neural network pruning
Crossbar architecture based devices have been widely adopted in neural
network accelerators by taking advantage of the high efficiency on
vector-matrix multiplication (VMM) operations. However, in the case of
convolutional neural networks (CNNs), the efficiency is compromised
dramatically due to the large amounts of data reuse. Although some mapping
methods have been designed to achieve a balance between the execution
throughput and resource overhead, the resource consumption cost is still huge
while maintaining the throughput.
Network pruning is a promising and widely studied approach to shrinking the
model size. However, previous work did not consider the crossbar architecture
and the corresponding mapping method, so it cannot be directly utilized by
crossbar-based neural network accelerators. Tightly combining the crossbar
structure and its mapping, this paper proposes a crossbar-aware pruning
framework based on a formulated L0-norm constrained optimization problem.
Specifically, we design an L0-norm constrained gradient descent (LGD) with
relaxant probabilistic projection (RPP) to solve this problem. Two grains of
sparsity are successfully achieved: i) intuitive crossbar-grain sparsity and
ii) column-grain sparsity with output recombination, based on which we further
propose an input feature maps (FMs) reorder method to improve the model
accuracy. We evaluate our crossbar-aware pruning framework on the medium-scale
CIFAR10 dataset and the large-scale ImageNet dataset with VGG and ResNet models.
Our method is able to reduce the crossbar overhead by 44%-72% with little
accuracy degradation. This work greatly reduces the resource and related
energy costs, providing a new co-design solution for mapping CNNs onto
various crossbar devices with significantly higher efficiency.
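Crossbar-grain sparsity, as described in this abstract, means removing whole crossbar-sized tiles of a weight matrix. The Python sketch below illustrates only that granularity. It uses a plain magnitude-based top-k tile selection in place of the paper's LGD with RPP, and the names `tile_norm`, `prune_crossbars`, `xr`, `xc`, and `keep` are hypothetical.

```python
def tile_norm(w, r0, c0, xr, xc):
    """L1 norm of the tile whose top-left corner is (r0, c0)."""
    return sum(abs(v) for row in w[r0:r0 + xr] for v in row[c0:c0 + xc])


def prune_crossbars(w, xr, xc, keep):
    """Zero every crossbar-sized tile except the `keep` largest by L1 norm.

    Limiting the number of surviving tiles is a simple stand-in for an
    L0-style constraint at crossbar granularity.
    """
    n_rows, n_cols = len(w), len(w[0])
    tiles = [(r, c) for r in range(0, n_rows, xr) for c in range(0, n_cols, xc)]
    ranked = sorted(tiles, key=lambda t: tile_norm(w, t[0], t[1], xr, xc),
                    reverse=True)
    kept = set(ranked[:keep])
    pruned = [row[:] for row in w]  # copy so the input stays untouched
    for r0, c0 in tiles:
        if (r0, c0) in kept:
            continue
        for r in range(r0, min(r0 + xr, n_rows)):
            for c in range(c0, min(c0 + xc, n_cols)):
                pruned[r][c] = 0.0
    return pruned, sorted(kept)


# Toy 4x4 weight matrix mapped onto hypothetical 2x2 crossbars.
weights = [
    [5.0, 5.0, 0.1, 0.1],
    [5.0, 5.0, 0.1, 0.1],
    [0.2, 0.2, 3.0, 3.0],
    [0.2, 0.2, 3.0, 3.0],
]
pruned, kept_tiles = prune_crossbars(weights, xr=2, xc=2, keep=2)
```

A tile that is entirely zero needs no physical crossbar, which is the kind of overhead the abstract counts. The column-grain variant and the feature-map reordering are not covered here.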
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figures
A case for multiple and parallel RRAMs as synaptic model for training SNNs
To enable dense integration of model synapses in spiking neural network
hardware, various nano-scale devices are being considered. Such a device,
besides exhibiting spike-time dependent plasticity (STDP), needs to be highly
scalable, have high endurance and require low energy for transitioning
between states. In this work, we first introduce and empirically determine two
new specifications for a synapse in SNNs: number of conductance levels per
synapse and maximum learning-rate. To the best of our knowledge, there are no
RRAMs that meet the latter specification. As a solution, we propose the use of
multiple PCMO-RRAMs in parallel within a synapse. During synaptic reading, all
PCMO-RRAMs are simultaneously read and for each synaptic conductance-change
event, the mechanism for conductance STDP is initiated for only one RRAM,
randomly picked from the set. Second, to validate our solution, we
experimentally demonstrate STDP of conductance of a PCMO-RRAM and then show
that due to a large learning-rate, a single PCMO-RRAM fails to model a synapse
in the training of an SNN. As anticipated, network training improves as more
PCMO-RRAMs are added to the synapse. Third, we discuss the
circuit-requirements for implementing such a scheme, to conclude that the
requirements are within bounds. Thus, our work presents specifications for
synaptic devices in trainable SNNs, indicates the shortcomings of state-of-art
synaptic contenders, provides a solution to extrinsically meet the
specifications, and discusses the peripheral circuitry that implements the
solution.
Comment: 8 pages, 18 figures and 1 table
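The read and update rule stated in this abstract (read all devices together, change only one randomly chosen device per event) can be sketched in Python as follows. The class name `ParallelSynapse`, the normalized conductance range of 0 to 1, and the fixed step `delta` are assumptions for illustration; real PCMO-RRAM conductance changes are not modeled.

```python
import random


class ParallelSynapse:
    """A synapse built from several devices connected in parallel."""

    def __init__(self, n_devices, g_init=0.5, rng=None):
        self.g = [g_init] * n_devices        # per-device conductance
        self.rng = rng or random.Random(0)   # seeded for repeatability

    def read(self):
        """Parallel devices add up, so the synaptic weight is the sum."""
        return sum(self.g)

    def update(self, delta):
        """Apply one conductance-change event to a single random device."""
        i = self.rng.randrange(len(self.g))
        self.g[i] = min(1.0, max(0.0, self.g[i] + delta))
        return i


synapse = ParallelSynapse(n_devices=4)
weight_before = synapse.read()
synapse.update(0.1)
relative_step = (synapse.read() - weight_before) / weight_before
```

With four devices, the same per-device step moves the total weight by a quarter as much in relative terms as it would for a single device, which is one way to see how adding devices lowers the effective learning rate.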
High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS
Deep learning hardware designs have been bottlenecked by conventional
memories such as SRAM due to density, leakage and parallel computing
challenges. Resistive devices can address the density and volatility issues,
but have been limited by peripheral circuit integration. In this work, we
demonstrate a scalable RRAM based in-memory computing design, termed XNOR-RRAM,
which is fabricated in a 90nm CMOS technology with monolithic integration of
RRAM devices between metal 1 and 2. We integrated a 128x64 RRAM array with CMOS
peripheral circuits including row/column decoders and flash analog-to-digital
converters (ADCs), which collectively become a core component for scalable
RRAM-based in-memory computing towards large deep neural networks (DNNs). To
maximize the parallelism of in-memory computing, we assert all 128 wordlines of
the RRAM array simultaneously, perform analog computing along the bitlines, and
digitize the bitline voltages using ADCs. The resistance distribution of low
resistance states is tightened by a write-verify scheme, and the ADC offset is
calibrated. Prototype chip measurements show that the proposed design achieves
high binary DNN accuracy of 98.5% for MNIST and 83.5% for CIFAR-10 datasets,
respectively, with energy efficiency of 24 TOPS/W and 158 GOPS throughput. This
represents 5.6X, 3.2X, 14.1X improvements in throughput, energy-delay product
(EDP), and energy-delay-squared product (ED2P), respectively, compared to the
state-of-the-art literature. The proposed XNOR-RRAM can enable intelligent
functionalities for area-/energy-constrained edge computing devices.
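The readout described in this abstract (all wordlines asserted at once, a sum formed along each bitline, then digitized by an ADC) can be sketched in idealized form in Python. The abstract does not state the ADC resolution, so `levels=8` is an assumption, as are the names `xnor_count`, `adc_code`, and `read_bitlines`. The sketch counts matching bits exactly, whereas the chip senses analog bitline voltages.

```python
def xnor_count(x_bits, w_bits):
    """Number of rows where the binary input and stored weight agree."""
    return sum(1 for x, w in zip(x_bits, w_bits) if x == w)


def adc_code(count, rows, levels):
    """Map a count in [0, rows] onto `levels` uniform output codes."""
    return count * levels // (rows + 1)


def read_bitlines(x_bits, columns, levels=8):
    """Drive every row at once and digitize each column's agreement count."""
    rows = len(x_bits)
    return [adc_code(xnor_count(x_bits, col), rows, levels) for col in columns]


# Toy case: 128 rows (one per wordline) and three stored weight columns.
x_bits = [1] * 128
columns = [
    [1] * 128,             # every weight matches the input
    [0] * 128,             # no weight matches
    [1] * 64 + [0] * 64,   # half of the weights match
]
codes = read_bitlines(x_bits, columns)
```

Quantizing the column sum loses information relative to the exact count, which is the usual trade-off when a low-resolution ADC is used to read many rows in parallel.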
Compact Device Models for FinFET and Beyond
Compact device models play a significant role in connecting device technology
and circuit design. BSIM-CMG and BSIM-IMG are industry standard compact models
suited for the FinFET and UTBB technologies, respectively. Their surface
potential based modeling framework and symmetry preserving properties make them
suitable for both analog/RF and digital design. In the era of artificial
intelligence / deep learning, compact models have further enhanced our ability to
explore RRAM and other NVM-based neuromorphic circuits. We have demonstrated
simulation of RRAM neuromorphic circuits with a Verilog-A based compact model at
NCKU. Further abstraction with macromodels is performed to enable larger scale
machine learning simulation.
Comment: Invited talk at the Asia-Pacific Workshop on Fundamentals and
Applications of Advanced Semiconductor Devices (AWAD), Kitakyushu, Japan,
July 201