A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation
The computing wall and data-movement challenges of deep neural networks
(DNNs) have exposed the limitations of conventional CMOS-based DNN
accelerators. Furthermore, deep structures and large model sizes make DNNs
prohibitive for embedded systems and IoT devices, where low power
consumption is required. To address these challenges, spin-orbit-torque
magnetic random-access memory (SOT-MRAM) and SOT-MRAM-based
Processing-In-Memory (PIM) engines have been used to reduce the power
consumption of DNNs, since SOT-MRAM offers near-zero standby power, high
density, and non-volatility. However, the drawbacks of SOT-MRAM-based PIM
engines, such as high write latency and the need for low-bit-width data,
limit their appeal as energy-efficient DNN accelerators. To mitigate these
drawbacks, we propose an ultra-energy-efficient framework that applies model
compression techniques, including weight pruning and quantization, at the
software level while accounting for the architecture of the SOT-MRAM PIM. We
incorporate the alternating direction method of multipliers (ADMM) into the
training phase to further guarantee solution feasibility and satisfy the
SOT-MRAM hardware constraints. The footprint and power consumption of the
SOT-MRAM PIM are thus reduced while overall system throughput increases,
making our proposed ADMM-based SOT-MRAM PIM more energy efficient and better
suited to embedded systems and IoT devices. Our experimental results show
that the accuracy and compression rate of the proposed framework
consistently outperform the reference works, while the efficiency (area &
power) and throughput of the SOT-MRAM PIM engine are significantly improved.
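The ADMM-based compression described above alternates a gradient step on the loss with a Euclidean projection onto the sparsity constraint set, plus a dual update. A minimal NumPy sketch of that pattern (function names and hyperparameters are ours for illustration, not the paper's code):

```python
import numpy as np

def project_sparse(w, keep_ratio):
    """Euclidean projection onto the set of arrays with at most a
    keep_ratio fraction of nonzeros: keep the largest-magnitude entries."""
    flat = np.abs(w).ravel()
    k = max(1, int(keep_ratio * flat.size))
    thresh = np.partition(flat, -k)[-k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def admm_prune(w, grad_fn, keep_ratio=0.1, rho=1e-2, lr=1e-2, steps=100):
    """Illustrative ADMM pruning loop. grad_fn(w) returns the loss gradient."""
    z = project_sparse(w, keep_ratio)   # auxiliary sparse copy
    u = np.zeros_like(w)                # scaled dual variable
    for _ in range(steps):
        # W-update: gradient step on loss + (rho/2)||w - z + u||^2
        w = w - lr * (grad_fn(w) + rho * (w - z + u))
        # Z-update: project w + u back onto the sparsity constraint set
        z = project_sparse(w + u, keep_ratio)
        # dual update
        u = u + w - z
    # hard-prune at the end so the result satisfies the constraint exactly
    return project_sparse(w, keep_ratio)
```

The projection step is where a hardware constraint (here, a global sparsity budget) would be encoded; the paper's actual constraint set follows the SOT-MRAM PIM architecture.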
MRAM Co-designed Processing-in-Memory CNN Accelerator for Mobile and IoT Applications
We designed a device for Convolutional Neural Network applications with
non-volatile MRAM memory and a computing-in-memory co-designed architecture.
It has been successfully fabricated in a 22nm CMOS silicon process. It
provides more than 40MB of MRAM density at 9.9TOPS/W, enabling multiple
models within a single chip for mobile and IoT device applications.
Comment: 4 pages, 4 figures, 1 table. Accepted by the NIPS 2018 MLPCD workshop
PIMBALL: Binary Neural Networks in Spintronic Memory
Neural networks span a wide range of applications of industrial and
commercial significance. Binary neural networks (BNN) are particularly
effective in trading accuracy for performance, energy efficiency or
hardware/software complexity. Here, we introduce a spintronic, re-configurable
in-memory BNN accelerator, PIMBALL: Processing In Memory BNN AcceL(L)erator,
which allows for massively parallel and energy efficient computation. PIMBALL
is capable of being used as a standard spintronic memory (STT-MRAM) array and a
computational substrate simultaneously. We evaluate PIMBALL using multiple
image classifiers and a genomics kernel. Our simulation results show that
PIMBALL is more energy efficient than alternative CPU-, GPU-, and FPGA-based
implementations while delivering higher throughput.
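The core operation an in-memory BNN accelerator such as PIMBALL evaluates is a binary dot product, which reduces to XNOR followed by a popcount. A software analogue of that operation (an illustration of the arithmetic, not PIMBALL's circuit):

```python
def bnn_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors packed as n-bit integers
    (bit value 1 encodes +1, bit value 0 encodes -1)."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 wherever signs agree
    matches = bin(xnor).count("1")              # popcount of agreements
    return 2 * matches - n                      # agreements minus disagreements
```

For example, two identical 4-bit vectors give the maximum dot product of 4, and two complementary ones give -4.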
Transfer and Online Reinforcement Learning in STT-MRAM Based Embedded Systems for Autonomous Drones
In this paper, we present an algorithm-hardware co-design for camera-based
autonomous flight in small drones. We show that the large write latency and
write energy of nonvolatile-memory (NVM) based embedded systems make them
unsuitable for real-time reinforcement learning (RL). We address this by
performing transfer learning (TL) on meta-environments and RL on the last few
layers of a deep convolutional network. While the NVM stores the meta-model
from TL, an on-die SRAM stores the weights of the last few layers, so all
real-time RL updates are carried out on the SRAM arrays. This provides a
practical platform with performance comparable to end-to-end RL and
83.4% lower energy per image frame.
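The split described above, a frozen backbone in NVM and a small trainable head in SRAM, can be sketched as follows (class and method names are ours for illustration; the paper's network and RL rule are more involved):

```python
import numpy as np

class SplitPolicy:
    """Illustrative sketch: the meta-model backbone is read-only (NVM),
    while only the small head is written during online RL (SRAM)."""

    def __init__(self, backbone_w, n_actions, rng):
        self.backbone_w = backbone_w              # frozen: only ever read
        feat_dim = backbone_w.shape[1]
        self.head_w = 0.01 * rng.normal(size=(feat_dim, n_actions))

    def features(self, x):
        # frozen feature extractor (one ReLU layer stands in for the CNN)
        return np.maximum(x @ self.backbone_w, 0.0)

    def q_values(self, x):
        return self.features(x) @ self.head_w

    def rl_update(self, x, action, td_error, lr=1e-2):
        # Only the SRAM-resident head is updated in real time.
        self.head_w[:, action] += lr * td_error * self.features(x)
```

The point of the split is visible in `rl_update`: every online write lands in `head_w`, so the high write-cost NVM array is never touched during flight.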
Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators
We propose a lightweight scheme in which the formation of a data block is changed so that it tolerates soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, which leaves one bit unused in the half-precision floating-point representation. By taking advantage of the unused bit, we create a backup of the most significant bit to protect it against soft errors. Also, since in MLC STT-RAMs the cost of memory operations (read and write) and the reliability of a cell are content-dependent (some patterns require larger currents and longer times while being more susceptible to soft errors), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy as an error-free baseline while improving read and write energy by 9% and 6%, respectively.
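The unused-bit observation can be made concrete in IEEE half precision (1 sign, 5 exponent, 10 mantissa bits): for |w| < 2 the top exponent bit (bit 14) is always 0, so it can hold a copy of the sign bit (bit 15, the MSB). A sketch of that backup idea, reflecting our reading of the scheme rather than the paper's exact encoding:

```python
import numpy as np

def encode(w) -> int:
    """Copy the MSB (sign bit) into the unused top exponent bit.
    Assumes |w| < 2, so bit 14 of the fp16 pattern is free."""
    bits = int(np.float16(w).view(np.uint16))
    msb = (bits >> 15) & 1
    return bits | (msb << 14)

def decode(bits: int):
    """Restore the MSB from its backup copy, surviving a flip of bit 15."""
    msb = (bits >> 14) & 1
    bits = (bits & 0x3FFF) | (msb << 15)   # clear backup, rewrite MSB
    return np.uint16(bits).view(np.float16)
```

With this encoding, a soft error that flips the stored sign bit is corrected on read from the backup copy.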
FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Convolutional Neural Networks (CNNs) demonstrate excellent performance in
various applications but have high computational complexity. Quantization is
applied to reduce the latency and storage cost of CNNs. Among the quantization
methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique
advantage over 8-bit and 4-bit quantization. They replace the multiplication
operations in CNNs with additions, which are favoured on In-Memory-Computing
(IMC) devices. IMC acceleration for BWNs has been widely studied. However,
although TWNs offer higher accuracy and better sparsity than BWNs, IMC
acceleration for TWNs has received little research attention. TWNs are
inefficient on existing IMC devices because their sparsity is not well
utilized and the addition operation is inefficient.
In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we
propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to
skip the null operations on zero weights. Second, we propose a fast addition
scheme based on the memory Sense Amplifier to avoid the time overhead of both
carry propagation and writing back the carry to memory cells. Third, we further
propose a Combined-Stationary data mapping to reduce the data movement of
activations and weights and increase the parallelism across memory columns.
Simulation results show that for addition operations at the Sense Amplifier
level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area
efficiency compared with ParaPIM, a state-of-the-art IMC accelerator. On
networks with 80% average sparsity, FAT achieves 10.02X speedup and 12.19X
energy efficiency compared with ParaPIM.
Comment: 14 pages
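The zero-skipping idea behind the Sparse Addition Control Unit can be sketched in software (an analogue of the dataflow, not the in-memory circuit): ternary weights in {-1, 0, +1} turn multiplication into signed addition, and zero weights are skipped outright instead of being issued as null operations.

```python
def ternary_dot(activations, weights):
    """Dot product with ternary weights in {-1, 0, +1}."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 0:
            continue              # skip: no cycle spent on a zero weight
        acc += a if w > 0 else -a  # multiplication reduced to signed addition
    return acc
```

At 80% sparsity, four out of five weight positions take the `continue` branch, which is where the reported speedup over dense-issue IMC designs comes from.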
NAND-SPIN-Based Processing-in-MRAM Architecture for Convolutional Neural Network Acceleration
The performance and efficiency of running large-scale datasets on traditional
computing systems exhibit critical bottlenecks due to the existing "power wall"
and "memory wall" problems. To resolve those problems, processing-in-memory
(PIM) architectures are developed to bring computation logic in or near memory
to alleviate the bandwidth limitations during data transmission. NAND-like
spintronics memory (NAND-SPIN) is one kind of promising magnetoresistive
random-access memory (MRAM) with low write energy and high integration density,
and it can be employed to perform efficient in-memory computation operations.
In this work, we propose a NAND-SPIN-based PIM architecture for efficient
convolutional neural network (CNN) acceleration. A straightforward data mapping
scheme is exploited to improve the parallelism while reducing data movements.
Benefiting from the excellent characteristics of NAND-SPIN and the in-memory
processing architecture, experimental results show that the proposed approach
achieves a 2.6x speedup and a 1.4x improvement in energy efficiency over
state-of-the-art PIM solutions.
Comment: 15 pages, accepted by SCIENCE CHINA Information Sciences (SCIS) 202
Stochastic Computing for Hardware Implementation of Binarized Neural Networks
Binarized Neural Networks, a recently discovered class of neural networks
with minimal memory requirements and no reliance on multiplication, are a
fantastic opportunity for the realization of compact and energy efficient
inference hardware. However, such neural networks are generally not entirely
binarized: their first layer still operates on fixed-point inputs. In this work, we
propose a stochastic computing version of Binarized Neural Networks, where the
input is also binarized. Simulations on the example of the Fashion-MNIST and
CIFAR-10 datasets show that such networks can approach the performance of
conventional Binarized Neural Networks. We show that the training procedure
should be adapted for use with stochastic computing. Finally, we investigate
an ASIC implementation of our scheme in a system that closely associates
logic and memory, implemented with Spin Torque Magnetoresistive Random
Access Memory. This analysis shows that the stochastic computing approach
allows considerable savings over conventional Binarized Neural Networks in
terms of area (a 62% area reduction on the Fashion-MNIST task). It can also
yield important energy savings if a reasonable reduction in accuracy is
accepted: for example, a factor of 2.1 can be saved at the cost of 1.4% in
Fashion-MNIST test accuracy. These results highlight the high potential of
Binarized Neural Networks for hardware implementation, and show that adapting
them to hardware constraints can provide important benefits.
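Stochastic-computing binarization of the input layer can be sketched as follows (our illustration of the general technique, not the paper's encoder): a fixed-point input in [0, 1] becomes a random bitstream whose mean equals the input value, so the first layer, like all the others, sees only binary data.

```python
import numpy as np

def to_bitstream(x, length, rng):
    """Encode values in [0, 1] as {0,1} streams of shape (*x.shape, length):
    each bit is 1 with probability equal to the encoded value."""
    return (rng.random(x.shape + (length,)) < x[..., None]).astype(np.uint8)

def stream_mean(stream):
    """Decode a stochastic bitstream back to a value by averaging its bits."""
    return stream.mean(axis=-1)
```

The stream length trades accuracy for latency and energy, which is the knob behind the area/accuracy/energy trade-offs reported above.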
Thermal Aware Design Automation of the Electronic Control System for Autonomous Vehicles
Autonomous vehicle (AV) technology, due to its tremendous social and economic benefits, is expected to transform the world in the coming decades. However, significant technical challenges must still be overcome before AVs can be safely, reliably, and massively deployed. Temperature plays a key role in the safety and reliability of an AV, not only because a vehicle is subjected to extreme operating temperatures but also because increasing computation demands more powerful IC chips, which can lead to higher operating temperatures and large thermal gradients. In particular, artificial intelligence (AI), the underpinning technology for AVs, requires substantially increased computation and memory resources, which have grown exponentially in recent years and further exacerbate the thermal problems. High operating temperatures and large thermal gradients can reduce performance, degrade reliability, and even cause an IC to fail catastrophically. We believe that thermal issues must be addressed closely in the design phase of the AV's electronic control system (ECS).

To this end, first, we study how to map vehicle applications onto an ECS with a heterogeneous architecture to satisfy peak-temperature constraints and optimize latency and system-level reliability. We present a mathematical programming model to bound the peak temperature of the ECS. We also develop a genetic-algorithm-based approach to bound the peak temperature under varying execution-time scenarios and optimize the system-level reliability of the ECS, and we present several computationally efficient techniques for system-level mean-time-to-failure (MTTF) computation, which show several orders of magnitude speedup over the state-of-the-art method. Second, we study the thermal impacts of AI techniques; specifically, how memory bit flips induced by temperature affect the prediction accuracy of a deep neural network (DNN). We develop a neuron-level analytical sensitivity-estimation framework to quantify this impact and study its effectiveness on popular DNN architectures. Third, we study the problem of incorporating thermal impacts into the mapping of DNN neuron parameters to memory banks to improve prediction accuracy. Based on our sensitivity metric, we develop a bin-packing-based approach that maps DNN neuron parameters to memory banks with different temperature profiles, and we study the problem of identifying the optimal temperature profiles for memory systems that minimize thermal impacts. We show that thermal-aware mapping of DNN neuron parameters onto memory banks can significantly improve prediction accuracy in high-temperature ranges compared with thermal-ignorant mapping for state-of-the-art DNNs.
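The bin-packing-based mapping can be illustrated with a simple greedy variant (names and the greedy rule are ours; the paper's formulation is more general): neurons with the highest bit-flip sensitivity are packed into the coolest banks, where flips are least likely.

```python
def map_neurons_to_banks(sensitivity, bank_temps, bank_capacity):
    """Greedy thermal-aware mapping sketch.
    sensitivity: per-neuron sensitivity scores.
    bank_temps: per-bank temperatures.
    Returns the bank index assigned to each neuron."""
    # most sensitive neurons first, coolest banks first
    neurons = sorted(range(len(sensitivity)),
                     key=lambda i: sensitivity[i], reverse=True)
    banks = sorted(range(len(bank_temps)), key=lambda b: bank_temps[b])
    assignment = [None] * len(sensitivity)
    load = {b: 0 for b in banks}
    bi = 0
    for n in neurons:
        while load[banks[bi]] >= bank_capacity:
            bi += 1              # current bank full: move to next-coolest
        assignment[n] = banks[bi]
        load[banks[bi]] += 1
    return assignment
```

For instance, with two banks at 70°C and 40°C and capacity two, the two most sensitive neurons land in the 40°C bank.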
Electrically-Tunable Stochasticity for Spin-based Neuromorphic Circuits: Self-Adjusting to Variation
Energy-efficient methods are addressed for leveraging low energy barrier
nanomagnetic devices within neuromorphic architectures. Using a
Magnetoresistive Random Access Memory (MRAM) probabilistic device (p-bit) as
the basis of neuronal structures in Deep Belief Networks (DBNs), the impact of
reducing the Magnetic Tunnel Junction's (MTJ's) energy barrier is assessed and
optimized for the resulting stochasticity present in the learning system. This
can mitigate the process variation sensitivity of stochastic DBNs which
encounter a sharp drop-off when energy barriers exceed near-zero kT. As
evaluated for the MNIST dataset for energy barriers at near-zero kT to 2.0 kT
in increments of 0.5 kT, it is shown that the stability factor changes by 5
orders of magnitude. The self-compensating circuit developed herein provides a
compact, low-complexity approach to mitigating process-variation impacts
toward practical implementation and fabrication.
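The p-bit behavior exploited here is often modeled as a random ±1 output whose probability of being +1 follows a sigmoid of the input; a sketch of that idealized model (not the paper's circuit or its variation-compensation scheme):

```python
import numpy as np

def p_bit(inputs, rng):
    """Idealized low-barrier MTJ p-bit: returns a random +/-1 sample per
    input, with P(+1) = sigmoid(input)."""
    p_up = 1.0 / (1.0 + np.exp(-inputs))                # sigmoid activation
    return np.where(rng.random(np.shape(inputs)) < p_up, 1, -1)
```

Lowering the MTJ energy barrier makes the physical device fluctuate fast enough to act as this built-in random number generator, which is what the DBN neurons in the paper rely on.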