A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision
The excellent performance of modern deep neural networks (DNNs) comes at an
often prohibitive training cost, limiting the rapid development of DNN
innovations and raising various environmental concerns. To reduce the dominant
data movement cost of training, process in-memory (PIM) has emerged as a
promising solution as it alleviates the need to access DNN weights. However,
state-of-the-art PIM DNN training accelerators employ either analog/mixed
signal computing which has limited precision or digital computing based on a
memory technology that supports limited logic functions and thus requires
complicated procedure to realize floating point computation. In this paper, we
propose a spin orbit torque magnetic random access memory (SOT-MRAM) based
digital PIM accelerator that supports floating point precision. Specifically,
this new accelerator features an innovative (1) SOT-MRAM cell, (2) full
addition design, and (3) floating point computation. Experiment results show
that the proposed SOT-MRAM PIM based DNN training accelerator can achieve
3.3×, 1.8×, and 2.5× improvement in terms of energy,
latency, and area, respectively, compared with a state-of-the-art PIM based DNN
training accelerator.

Comment: Accepted by the IEEE International Symposium on Circuits and Systems
2020 (ISCAS'2020).
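The precision argument above rests on a standard fact: floating-point addition decomposes into shifts plus the integer additions that a memory array with native full-adder support can perform digitally. A minimal Python sketch of that decomposition (an illustration under our own assumptions, not the paper's circuit; the `(exponent, mantissa)` encoding and `man_bits` width are hypothetical choices):

```python
def fp_add(exp_a, man_a, exp_b, man_b, man_bits=8):
    """Add two positive floats given as (exponent, mantissa).

    The mantissa includes the hidden leading 1, so a normalized value is
    man * 2**(exp - man_bits) with man in [2**man_bits, 2**(man_bits+1)).
    """
    # Step 1: ensure operand A has the larger exponent.
    if exp_a < exp_b:
        exp_a, man_a, exp_b, man_b = exp_b, man_b, exp_a, man_a
    # Step 2: align the smaller mantissa by right-shifting it.
    man_b >>= (exp_a - exp_b)
    # Step 3: integer mantissa addition -- the part a full-adder-capable
    # PIM array computes in place.
    man = man_a + man_b
    exp = exp_a
    # Step 4: renormalize if the addition produced a carry-out.
    if man >> (man_bits + 1):
        man >>= 1
        exp += 1
    return exp, man

# 1.5 = 384 * 2**(0-8), 2.5 = 320 * 2**(1-8); their sum should be 4.0.
exp, man = fp_add(0, 384, 1, 320)
```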
TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain
Resistive-random-access-memory (ReRAM) based processing-in-memory (RPIM)
accelerators show promise in bridging the gap between Internet of Things
devices' constrained resources and Convolutional/Deep Neural Networks'
(CNNs/DNNs') prohibitive energy cost. Specifically, RPIM accelerators
enhance energy efficiency by eliminating the cost of weight movements and
improving the computational density through ReRAM's high density. However, the
energy efficiency is still limited by the dominant energy cost of input and
partial sum (Psum) movements and the cost of digital-to-analog (D/A) and
analog-to-digital (A/D) interfaces. In this work, we identify three
energy-saving opportunities in RPIM accelerators: analog data locality,
time-domain interfacing, and input access reduction, and propose an innovative
RPIM accelerator called TIMELY, with three key contributions: (1) TIMELY
adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance
the data locality, minimizing the energy overheads of both input and Psum
movements; (2) TIMELY largely reduces the energy of each single D/A (and A/D)
conversion and the total number of conversions by using time-domain interfaces
(TDIs) and the employed ALBs, respectively; (3) we develop an only-once input
read (OIR) mapping method to further decrease the energy of input accesses
and the number of D/A conversions. The evaluation with more than 10 CNN/DNN
models and various chip configurations shows that, TIMELY outperforms the
baseline RPIM accelerator, PRIME, by one order of magnitude in energy
efficiency while maintaining better computational density (up to 31.2×)
and throughput (up to 736.6×). Furthermore, comprehensive studies are
performed to evaluate the effectiveness of the proposed ALB, TDI, and OIR
innovations in terms of energy savings and area reduction.

Comment: Accepted by the 47th International Symposium on Computer Architecture
(ISCA'2020).
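The intuition behind only-once input read (OIR) can be sanity-checked with back-of-envelope arithmetic: in a K×K convolution each input activation is consumed roughly K·K times, so reading each input exactly once, rather than once per use, cuts input accesses (and the D/A conversions they trigger) by about a factor of K·K. A toy Python model of that count (our simplification, not TIMELY's mapping; it ignores stride, padding, and channel counts):

```python
def input_reads(h, w, k, once=False):
    """Approximate input-read count for one KxK conv layer on an HxW map.

    once=False: the input is re-read on every use (k*k uses per pixel).
    once=True:  each input pixel is read exactly once (OIR-style mapping).
    """
    uses_per_pixel = 1 if once else k * k
    return h * w * uses_per_pixel

naive = input_reads(224, 224, 3)              # read on every use
oir = input_reads(224, 224, 3, once=True)     # only-once input read
savings = naive // oir                        # ~ k*k = 9 for a 3x3 kernel
```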