
    Architecture and Circuit Design Optimization for Compute-In-Memory

    The objective of the proposed research is to optimize computing-in-memory (CIM) design for accelerating deep neural network (DNN) algorithms. Because compute peripheral circuits such as analog-to-digital converters (ADCs) introduce significant overhead in CIM inference designs, the research first focuses on circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM) based ADC-free in-memory compute scheme. We comprehensively explore the trade-offs among different types of ADCs and investigate a new ADC design especially suited for CIM, which performs analog shift-add across multiple weight significance bits, improving throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip with fully analog data processing between sub-arrays, which significantly improves hardware performance over conventional CIM designs and achieves near-software classification accuracy on the ImageNet and CIFAR-10/-100 datasets. Second, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of the CIM weight-stationary dataflow, we propose CIM training architectures with a transpose weight mapping strategy. The cell design and peripheral circuitry are modified to efficiently support bi-directional compute. A novel signed-number multiplication scheme is also proposed to handle negative inputs in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore the system-level hardware performance of DNN on-chip training based on silicon measurement results.
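    To make the analog shift-add idea concrete, the sketch below (NumPy, illustrative only) splits a multi-bit weight matrix into binary significance slices, computes one crossbar-style partial product per slice, and merges the partial sums by shift-add. It is a simplified digital model of the operation described in the abstract, not the authors' circuit; the 4-bit weight width, matrix sizes, and function names are assumptions.

    import numpy as np

    def bit_slice_weights(W, n_bits=4):
        """Split an unsigned integer weight matrix into per-bit binary slices."""
        return [((W >> b) & 1).astype(np.float32) for b in range(n_bits)]

    def cim_matvec(W, x, n_bits=4):
        """Matrix-vector product computed slice by slice with shift-add merging."""
        acc = np.zeros(W.shape[0], dtype=np.float32)
        for b, Wb in enumerate(bit_slice_weights(W, n_bits)):
            partial = Wb @ x           # one crossbar read per significance bit
            acc += partial * (1 << b)  # shift-add restores each bit's significance
        return acc

    rng = np.random.default_rng(0)
    W = rng.integers(0, 16, size=(8, 16))        # 4-bit unsigned weights
    x = rng.random(16).astype(np.float32)
    assert np.allclose(cim_matvec(W, x), W @ x)  # matches the ideal product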

    Compute-in-Memory with Emerging Non-Volatile Memories for Accelerating Deep Neural Networks

    The objective of this research is to accelerate deep neural networks (DNNs) with a compute-in-memory (CIM) architecture based on emerging non-volatile memories (eNVMs). The research first focuses on inference acceleration and proposes a resistive random access memory (RRAM) based CIM architecture. Two generations of RRAM testchips, which monolithically integrate the RRAM array and CMOS peripheral circuits, are designed and fabricated in the Winbond 90 nm and TSMC 40 nm commercial embedded RRAM processes, respectively. The first-generation testchip, named XNOR-RRAM, is dedicated to binary neural networks (BNNs), and the second generation, named Flex-RRAM, features 1-bit to 8-bit run-time configurable precision and leverages the input sparsity of the DNN model to improve throughput and energy efficiency. However, the non-ideal characteristics of eNVM devices, especially when utilized as multi-level analog synaptic weights, may incur notable accuracy degradation for both training and inference. This research develops a PyTorch-based framework that incorporates the device characteristics into the DNN model to evaluate the impact of eNVM nonidealities on training/inference accuracy. The results suggest that it is challenging to directly use eNVMs for in-situ training, and that resistance drift remains a critical challenge for maintaining high inference accuracy. Furthermore, to overcome the asymmetric conductance tuning behavior of typical eNVMs, which is found to be the most critical nonideality preventing the model from achieving software-equivalent training accuracy, this research proposes a novel 2-transistor-1-FeFET (ferroelectric field-effect transistor) synaptic weight cell that exploits hybrid precision for in-situ training and inference, achieving near-software classification accuracy on the MNIST and CIFAR-10 datasets.
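    As a rough illustration of how device characteristics can be folded into a DNN model for accuracy evaluation, the following PyTorch sketch quantizes weights to a small number of conductance levels and perturbs them with device-to-device variation and a simple drift factor before inference. It is a minimal stand-in for the framework described above, not that framework itself; the level count, noise magnitude, drift model, and toy network are all assumptions.

    import torch

    def apply_envm_nonidealities(weight, n_levels=16, sigma=0.02, drift=0.05):
        w_max = weight.abs().max()
        step = 2 * w_max / (n_levels - 1)
        w = torch.round(weight / step) * step        # map to discrete conductance levels
        w = w + torch.randn_like(w) * sigma * w_max  # device-to-device variation
        return w * (1.0 - drift)                     # crude resistance-drift shrinkage

    # Toy network standing in for a trained DNN whose inference accuracy is evaluated
    model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))

    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
                m.weight.copy_(apply_envm_nonidealities(m.weight))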

    Hardware-Friendly Model Compression Techniques for Deep Learning Accelerators

    The objective of the proposed research is to make energy-efficient deep neural network (DNN) accelerators deployable on edge devices by developing hardware-aware DNN compression methods. The rising popularity of intelligent mobile devices and the computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. In particular, we propose four compression techniques for energy- and memory-efficient DNN computing. The first method, LGPS, is a hardware-aware pruning method in which the locations of non-zero weights are derived in real time from a linear-feedback shift register (LFSR). Using the proposed method, we demonstrate total energy and area savings of up to 63.96% and 64.23%, respectively, for the VGG-16 network on down-sampled ImageNet under iso-compression-rate and iso-accuracy conditions. Second, we achieve ultra-low bit-precision deep learning models by developing a quantization scheme based on knowledge distillation and gradual quantization of the pruned network. Third, we propose a novel model compression scheme that allows inference to be carried out using bit-level sparsity, which can be efficiently implemented with in-memory computing macros. The method, called BitS-Net, leverages bit sparsity (where zeros outnumber ones in the binary representation of weight/activation values) in compute-in-memory (CIM) with resistive random access memory (RRAM) to develop energy-efficient DNN accelerators operating in inference mode. We demonstrate that BitS-Net improves energy efficiency by up to 5x for ResNet models on the ImageNet dataset. Finally, to achieve highly energy-efficient DNNs, we introduce a twofold sparsity method (Twofold Sparsity, twofoldS-Net) that sparsifies DNN models at the bit and network levels simultaneously. Two separate regularization terms are added to the loss function to achieve bit- and network-level sparsity at the same time: the model is sparsified at the network level by applying an LFSR-generated mask, and for bit-level sparsity the network is quantized to an 8-bit two's complement representation. During inference we take advantage of the CIM architecture and LFSR indexing. We show that the proposed method sparsifies the network and enables the design of a highly energy-efficient deep learning accelerator, helping to eventually bring AI to our daily lives.
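    A hedged sketch of the LFSR-derived pruning idea follows: the positions of the weights that remain non-zero are regenerated on the fly from a linear-feedback shift register, so no explicit index storage is needed. The 16-bit taps, seed, target density, and helper names below are illustrative choices, not the parameters used in the thesis.

    import numpy as np

    def lfsr_stream(n, seed=0xACE1, taps=(15, 13, 12, 10)):
        """Yield n states of a 16-bit Fibonacci LFSR."""
        state = seed
        for _ in range(n):
            bit = 0
            for t in taps:
                bit ^= (state >> t) & 1
            state = ((state << 1) | bit) & 0xFFFF
            yield state

    def lfsr_mask(shape, density=0.25):
        """Binary keep-mask whose non-zero positions come from the LFSR sequence."""
        size = int(np.prod(shape))
        mask = np.zeros(size, dtype=np.float32)
        for state in lfsr_stream(int(size * density)):
            mask[state % size] = 1.0   # collisions simply land on an already-kept position
        return mask.reshape(shape)

    W = np.random.randn(64, 64).astype(np.float32)
    W_pruned = W * lfsr_mask(W.shape)  # pruning pattern is reproducible from the LFSR alone
    print("kept fraction:", float((W_pruned != 0).mean()))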

    2022 roadmap on neuromorphic computing and engineering

    Modern computation based on the von Neumann architecture is now a mature, cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation of computer technology is expected to solve problems at the exascale, with 10^18 calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures they will consume between 20 and 30 megawatts of power and will not have intrinsic, physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems, which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view of the current state and the future challenges for each research area. We hope that this roadmap will be a useful resource, providing a concise yet comprehensive introduction for readers outside this field and for those who are just entering it, as well as future perspectives for those who are well established in the neuromorphic computing community.