
954 research outputs found

ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

Author: González Antonio
Riera Marc
Sabri Mohammad
Publication date: 28/06/2023

The primary operation in DNNs is the dot product of quantized input activations and weights. Prior works have proposed the design of memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density to store weights, low leakage energy, low read latency, and high performance capabilities to perform the DNN dot-products massively in parallel within the ReRAM crossbars. However, the main bottleneck of these architectures is the energy-hungry analog-to-digital conversions (ADCs) required to perform analog computations in-ReRAM, which penalizes the efficiency and performance benefits of PIM. To improve the energy efficiency of in-ReRAM analog dot-product computations, we present ReDy, a hardware accelerator that implements a ReRAM-centric Dynamic quantization scheme to take advantage of the bit-serial streaming and processing of activations. The energy consumption of ReRAM-based DNN accelerators is directly proportional to the numerical precision of the input activations of each DNN layer. In particular, ReDy exploits the fact that activations of CONV layers from Convolutional Neural Networks (CNNs), a subset of DNNs, are commonly grouped according to the size of their filters and the size of the ReRAM crossbars. Then, ReDy quantizes each group of activations on the fly with a different numerical precision, based on a novel heuristic that takes into account the statistical distribution of each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars and the number of A/D conversions compared to a static 8-bit uniform quantization. We evaluate ReDy on a popular set of modern CNNs. On average, ReDy provides 13% energy savings over an ISAAC-like accelerator with negligible accuracy loss and area overhead. Comment: 13 pages, 16 figures, 4 Table
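The per-group precision idea in this abstract can be illustrated with a minimal Python sketch. It is not ReDy's heuristic, which the abstract does not specify. As a stand-in, each group simply gets the smallest bit width that can represent its largest value, and activations are assumed to be unsigned 8-bit integers. The names `group_activations`, `bits_for_group` and `bit_serial_cycles` are hypothetical, and the "cycle" count is only a rough proxy for bit-serial crossbar activity.

```python
from typing import List, Tuple


def group_activations(acts: List[int], group_size: int) -> List[List[int]]:
    """Split a flat list of activations into consecutive groups."""
    return [acts[i:i + group_size] for i in range(0, len(acts), group_size)]


def bits_for_group(group: List[int], max_bits: int = 8) -> int:
    """Stand-in heuristic (an assumption, not ReDy's): the smallest bit
    width that represents the largest value in the group, at least 1."""
    peak = max(group)
    return min(max_bits, max(1, peak.bit_length()))


def bit_serial_cycles(acts: List[int], group_size: int,
                      max_bits: int = 8) -> Tuple[int, int]:
    """Return (dynamic, static) bit-serial cycle counts, where each group
    is streamed one bit per cycle."""
    groups = group_activations(acts, group_size)
    dynamic = sum(bits_for_group(g, max_bits) for g in groups)
    static = max_bits * len(groups)
    return dynamic, static


# Three groups of four unsigned activations each.
acts = [3, 0, 1, 2, 200, 17, 5, 90, 0, 0, 1, 0]
dynamic, static = bit_serial_cycles(acts, group_size=4)
print(dynamic, static)  # 11 24
```

In this toy input the groups need 2, 8 and 1 bits respectively, so 11 bit slices are streamed instead of 24. A distribution-aware heuristic could also drop bits lossily, which this sketch does not attempt.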

arXiv.org e-Print Archive

GraphR: Accelerating Graph Processing Using ReRAM

Author: Chen Yiran
Li Hai
Qian Xuehai
Song Linghao
Zhuo Youwei
Publication date: 08/12/2017

This paper presents GRAPHR, the first ReRAM-based graph processing accelerator. GRAPHR follows the principle of near-data processing and explores the opportunity of performing massively parallel analog operations with low hardware and energy cost. The analog computation is suitable for graph processing because: 1) The algorithms are iterative and could inherently tolerate the imprecision; 2) Both probability calculation (e.g., PageRank and Collaborative Filtering) and typical graph algorithms involving integers (e.g., BFS/SSSP) are resilient to errors. The key insight of GRAPHR is that if a vertex program of a graph algorithm can be expressed as sparse matrix-vector multiplication (SpMV), it can be efficiently performed by a ReRAM crossbar. We show that this assumption is generally true for a large set of graph algorithms. GRAPHR is a novel accelerator architecture consisting of two components: memory ReRAM and graph engine (GE). The core graph computations are performed in sparse matrix format in GEs (ReRAM crossbars). The vector/matrix-based graph computation is not new, but ReRAM offers the unique opportunity to realize the massive parallelism with unprecedented energy efficiency and low hardware cost. With small subgraphs processed by GEs, the gain of performing parallel operations overshadows the waste due to sparsity. The experimental results show that GRAPHR achieves a 16.01x (up to 132.67x) speedup and a 33.82x energy saving on geometric mean compared to a CPU baseline system. Compared to a GPU, GRAPHR achieves 1.69x to 2.19x speedup and consumes 4.77x to 8.91x less energy. GRAPHR gains a speedup of 1.16x to 4.12x, and is 3.67x to 10.96x more energy efficient compared to a PIM-based architecture. Comment: Accepted to HPCA 201
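To make the SpMV formulation mentioned in this abstract concrete, here is a small software-only Python sketch of PageRank written as repeated sparse matrix-vector products. It does not model GRAPHR's hardware or its subgraph tiling. The function names, the dict-of-rows sparse layout, the damping factor of 0.85 and the assumption that every vertex has at least one outgoing edge are all choices made for this illustration.

```python
from typing import Dict, List, Tuple


def build_transition(edges: List[Tuple[int, int]], n: int) -> List[Dict[int, float]]:
    """Build a sparse column-stochastic matrix as rows[dst][src] = 1/outdeg(src)."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    rows: List[Dict[int, float]] = [dict() for _ in range(n)]
    for src, dst in edges:
        rows[dst][src] = rows[dst].get(src, 0.0) + 1.0 / out_deg[src]
    return rows


def spmv(rows: List[Dict[int, float]], vec: List[float]) -> List[float]:
    """Sparse matrix-vector product: only stored (nonzero) entries are visited."""
    return [sum(w * vec[col] for col, w in row.items()) for row in rows]


def pagerank(edges: List[Tuple[int, int]], n: int,
             damping: float = 0.85, iters: int = 50) -> List[float]:
    """Each iteration is one SpMV followed by an element-wise update.
    Assumes no dangling vertices (every vertex has an outgoing edge)."""
    rows = build_transition(edges, n)
    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = spmv(rows, rank)
        rank = [(1.0 - damping) / n + damping * c for c in contrib]
    return rank


# A directed 3-cycle: by symmetry every vertex ends up with rank 1/3.
ranks = pagerank([(0, 1), (1, 2), (2, 0)], n=3)
print(ranks)
```

The `spmv` call is the step that an analog crossbar would evaluate in parallel according to the abstract. Everything else here is ordinary bookkeeping.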

arXiv.org e-Print Archive

Crossref

FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

Author: Hu Xing
Ji Yu
Li Shuangchen
Wang Peiqi
Xie Xinfeng
Xie Yuan
Zhang Youhui
Zhang Youyang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/01/2019

Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the memory wall challenge, due to the unique capability of processing-in-memory within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA), and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We heavily leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt the neural synthesizer to enable the high-density computation resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of the state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup. Comment: Accepted by ASPLOS 201

arXiv.org e-Print Archive

Crossref

Neuro-memristive Circuits for Edge Computing: A review

Author: Chua Leon O.
James Alex Pappachen
Krestinskaya Olga
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

The volume, veracity, variability, and velocity of data produced from the ever-increasing network of sensors connected to the Internet pose challenges for power management, scalability, and sustainability of cloud computing infrastructure. Increasing the data processing capability of edge computing devices at lower power requirements can reduce several overheads for cloud computing solutions. This paper provides a review of neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices. We discuss why neuromorphic architectures are useful for edge devices and show the advantages, drawbacks and open problems in the field of neuro-memristive circuits for edge computing.

arXiv.org e-Print Archive

Nazarbayev University Repository

A Complementary Resistive Switch-based Crossbar Array Adder

Author: Linn E.
Menzel S.
Siemon A.
Waser R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Redox-based resistive switching devices (ReRAM) are an emerging class of non-volatile storage elements suited for nanoscale memory applications. In terms of logic operations, ReRAM devices were suggested to be used as programmable interconnects, large-scale look-up tables or for sequential logic operations. However, without additional selector devices these approaches are not suited for use in large-scale nanocrossbar memory arrays, which is the preferred architecture for ReRAM devices due to the minimum area consumption. To overcome this issue for the sequential logic approach, we recently introduced a novel concept, which is suited for passive crossbar arrays using complementary resistive switches (CRSs). CRS cells offer two high resistive storage states, and thus, parasitic sneak currents are efficiently avoided. However, until now the CRS-based logic-in-memory approach was only shown to be able to perform basic Boolean logic operations using a single CRS cell. In this paper, we introduce two multi-bit adder schemes using the CRS-based logic-in-memory approach. We prove the concepts by means of SPICE simulations using a dynamical memristive device model of a ReRAM cell. Finally, we show the advantages of our novel adder concept in terms of step count and number of devices in comparison to a recently published adder approach, which applies the conventional ReRAM-based sequential logic concept introduced by Borghetti et al. Comment: 12 pages, accepted for IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), issue on Computing in Emerging Technologie

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources