Search CORE

46 research outputs found

Reconfigurable writing architecture for reliable RRAM operation in wide temperature ranges

Author: Aparicio Cerqueira Hernán
García Redondo Fernando
Ituero Herrero Pablo
López Barrio Carlos Alberto
López Vallejo Marisa
Royer del Barrio Pablo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Resistive switching memories [resistive RAM (RRAM)] are an attractive alternative to nonvolatile storage and nonconventional computing systems, but their behavior strongly depends on the cell features, driver circuit, and working conditions. In particular, the circuit temperature and writing voltage schemes become critical issues, determining resistive switching memories performance. These dependencies usually force a design time tradeoff among reliability, device endurance, and power consumption, thereby imposing nonflexible functioning schemes and limiting the system performance. In this paper, we present a writing architecture that ensures the correct operation no matter the working temperature and allows the dynamic load of application-oriented writing profiles. Thus, taking advantage of more efficient configurations, the system can be dynamically adapted to overcome RRAM intrinsic challenges. Several profiles are analyzed regarding power consumption, temperature-variations protection, and operation speed, showing speedups near 700x compared with other published drivers

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Improving Performance and Endurance for Crossbar Resistive Memory

Author: Wen Wen
Publication venue
Publication date: 28/09/2020
Field of study

Resistive Memory (ReRAM) has emerged as a promising non-volatile memory technology that may replace a significant portion of DRAM in future computer systems. When adopting crossbar architecture, ReRAM cell can achieve the smallest theoretical size in fabrication, ideally for constructing dense memory with large capacity. However, crossbar cell structure suffers from severe performance and endurance degradations, which come from large voltage drops on long wires. In this dissertation, I first study the correlation between the ReRAM cell switching latency and the number of cells in low resistant state (LRS) along bitlines, and propose to dynamically speed up write operations based on bitline data patterns. By leveraging the intrinsic in-memory processing capability of ReRAM crossbars, a low overhead runtime profiler that effectively tracks the data patterns in different bitlines is proposed. To achieve further write latency reduction, data compression and row address dependent memory data layout are employed to reduce the numbers of LRS cells on bitlines. Moreover, two optimization techniques are presented to mitigate energy overhead brought by bitline data patterns tracking. Second, I propose XWL, a novel table-based wear leveling scheme for ReRAM crossbars and study the correlation between write endurance and voltage stress in ReRAM crossbars. By estimating and tracking the effective write stress to different rows at runtime, XWL chooses the ones that are stressed the most to mitigate. Additionally, two extended scenarios are further examined for the performance and endurance issues in neural network accelerators as well as 3D vertical ReRAM (3D-VRAM) arrays. For the ReRAM crossbar-based accelerators, by exploiting the wearing out mechanism of ReRAM cell, a novel comprehensive framework, ReNEW, is proposed to enhance the lifetime of the ReRAM crossbar-based accelerators, particularly for neural network training. To reduce the write latency in 3D-VRAM arrays, a collection of techniques, including an in-memory data encoding scheme, a data pattern estimator for assessing cell resistance distributions, and a write time reduction scheme that opportunistically reduces RESET latency with runtime data patterns, are devised

D-Scholarship@Pitt

Thermal Heating in ReRAM Crossbar Arrays: Challenges and Solutions

Author: Eltawil Ahmed
Fouda Mohammed E.
Smagulova Kamilya
Publication venue
Publication date: 28/12/2022
Field of study

Increasing popularity of deep-learning-powered applications raises the issue of vulnerability of neural networks to adversarial attacks. In other words, hardly perceptible changes in input data lead to the output error in neural network hindering their utilization in applications that involve decisions with security risks. A number of previous works have already thoroughly evaluated the most commonly used configuration - Convolutional Neural Networks (CNNs) against different types of adversarial attacks. Moreover, recent works demonstrated transferability of the some adversarial examples across different neural network models. This paper studied robustness of the new emerging models such as SpinalNet-based neural networks and Compact Convolutional Transformers (CCT) on image classification problem of CIFAR-10 dataset. Each architecture was tested against four White-box attacks and three Black-box attacks. Unlike VGG and SpinalNet models, attention-based CCT configuration demonstrated large span between strong robustness and vulnerability to adversarial examples. Eventually, the study of transferability between VGG, VGG-inspired SpinalNet and pretrained CCT 7/3x1 models was conducted. It was shown that despite high effectiveness of the attack on the certain individual model, this does not guarantee the transferability to other models.Comment: 18 page

arXiv.org e-Print Archive

에너지 효율적 인공신경망 설계

Author: 김재현
Publication venue: 서울대학교 대학원
Publication date: 01/02/2019
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2019. 2. 최기영.최근 심층 학습은 이미지 분류, 음성 인식 및 강화 학습과 같은 영역에서 놀라운 성과를 거두고 있다. 최첨단 심층 인공신경망 중 일부는 이미 인간의 능력을 넘어선 성능을 보여주고 있다. 그러나 인공신경망은 엄청난 수의 고정밀 계산과 수백만개의 매개 변수를 이용하기 위한 빈번한 메모리 액세스를 수반한다. 이는 엄청난 칩 공간과 에너지 소모 문제를 야기하여 임베디드 시스템에서 인공신경망이 사용되는 것을 제한하게 된다. 이 문제를 해결하기 위해 인공신경망을 높은 에너지 효율성을 갖도록 설계하는 방법을 제안한다. 첫번째 파트에서는 가중 스파이크를 이용하여 짧은 추론 시간과 적은 에너지 소모의 장점을 갖는 스파이킹 인공신경망 설계 방법을 다룬다. 스파이킹 인공신경망은 인공신경망의 높은 에너지 소비 문제를 극복하기 위한 유망한 대안 중 하나이다. 기존 연구에서 심층 인공신경망을 정확도 손실없이 스파이킹 인공신경망으로 변환하는 방법이 발표되었다. 그러나 기존의 방법들은 rate coding을 사용하기 때문에 긴 추론 시간을 갖게 되고 이것이 많은 에너지 소모를 야기하게 되는 단점이 있다. 이 파트에서는 페이즈에 따라 다른 스파이크 가중치를 부여하는 방법으로 추론 시간을 크게 줄이는 방법을 제안한다. MNIST, SVHN, CIFAR-10, CIFAR-100 데이터셋에서의 실험 결과는 제안된 방법을 이용한 스파이킹 인공신경망이 기존 방법에 비해 큰 폭으로 추론 시간과 스파이크 발생 빈도를 줄여서 보다 에너지 효율적으로 동작함을 보여준다. 두번째 파트에서는 공정 변이가 있는 상황에서 동작하는 고에너지효율 아날로그 인공신경망 설계 방법을 다루고 있다. 인공신경망을 아날로그 회로를 사용하여 구현하면 높은 병렬성과 에너지 효율성을 얻을 수 있는 장점이 있다. 하지만, 아날로그 시스템은 노이즈에 취약한 중대한 결점을 가지고 있다. 이러한 노이즈 중 하나로 공정 변이를 들 수 있는데, 이는 아날로그 회로의 적정 동작 지점을 변화시켜 심각한 성능 저하 또는 오동작을 유발하는 원인이다. 이 파트에서는 ReRAM에 기반한 고에너지 효율 아날로그 이진 인공신경망을 구현하고, 공정 변이 문제를 해결하기 위해 활성도 일치 방법을 사용한 공정 변이 보상 기법을 제안한다. 제안된 인공신경망은 1T1R 구조의 ReRAM 배열과 차동증폭기를 이용한 뉴런을 이용하여 고밀도 집적과 고에너지 효율 동작이 가능하게 구성되었다. 또한, 아날로그 뉴런 회로의 공정 변이 취약성 문제를 해결하기 위해 이상적인 뉴런의 활성도와 동일한 활성도를 갖도록 뉴런의 바이어스를 조절하는 방법을 소개한다. 제안된 방법을 사용하여 32nm 공정에서 구현된 인공신경망은 3-sigma 지점에서 50% 문턱 전압 변이와 15%의 저항값 변이가 있는 상황에서도 MNIST에서 98.55%, CIFAR-10에서 89.63%의 정확도를 달성하였으며, 970 TOPS/W에 달하는 매우 높은 에너지 효율성을 달성하였다.Recently, deep learning has shown astounding performances on specific tasks such as image classification, speech recognition, and reinforcement learning. Some of the state-of-the-art deep neural networks have already gone over humans ability. However, neural networks involve tremendous number of high precision computations and frequent off-chip memory accesses with millions of parameters. It incurs problems of large area and exploding energy consumption, which hinder neural networks from being exploited in embedded systems. To cope with the problem, techniques for designing energy efficient neural networks are proposed. The first part of this dissertation addresses the design of spiking neural networks with weighted spikes which has advantages of shorter inference latency and smaller energy consumption compared to the conventional spiking neural networks. Spiking neural networks are being regarded as one of the promising alternative techniques to overcome the high energy costs of artificial neural networks. It is supported by many researches showing that a deep convolutional neural network can be converted into a spiking neural network with near zero accuracy loss. However, the advantage on energy consumption of spiking neural networks comes at a cost of long classification latency due to the use of Poisson-distributed spike trains (rate coding), especially in deep networks. We propose to use weighted spikes, which can greatly reduce the latency by assigning a different weight to a spike depending on which time phase it belongs. Experimental results on MNIST, SVHN, CIFAR-10, and CIFAR-100 show that the proposed spiking neural networks with weighted spikes achieve significant reduction in classification latency and number of spikes, which leads to faster and more energy-efficient spiking neural networks than the conventional spiking neural networks with rate coding. We also show that one of the state-of-the-art networks the deep residual network can be converted into spiking neural network without accuracy loss. The second part of this dissertation focuses on the design of highly energy-efficient analog neural networks in the presence of variations. Analog hardware accelerators for deep neural networks have taken center stage in the aspect of high parallelism and energy efficiency. However, a critical weakness of the analog hardware systems is vulnerability to noise. One of the biggest noise sources is a process variation. It is a big obstacle to using analog circuits since the variation shifts various parameters of analog circuits from the correct operating points, which causes severe performance degradation or even malfunction. To achieve high energy efficiency with analog neural networks, we propose resistive random access memory (ReRAM) based analog implementation of binarized neural networks (BNNs) with a novel variation compensation technique through activation matching (VCAM). The proposed architecture consists of 1-transistor-1-resistor (1T1R) structured ReRAM synaptic arrays and differential amplifier based neurons, which leads to high-density integration and energy efficiency. To cope with the vulnerability of analog neurons due to process variation, the biases of all neurons are adjusted in the direction that matches average output activation of ideal neurons without variation. The technique effectively restores the classification accuracy degraded by the variation. Experimental results on 32nm technology show that the proposed architecture achieves the classification accuracy of 98.55% on MNIST and 89.63% on CIFAR-10 in the presence of 50% threshold voltage variation and 15% resistance variation at 3-sigma point. It also achieves 970 TOPS/W energy efficiency with MLP on MNIST.1 Introduction 1 1.1 Deep Neural Networks with Weighted Spikes . . . . . . . . . . . . . 2 1.2 VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks . . . . . . . . . . . . . . . . . . . . . 5 2 Background 8 2.1 Spiking neural network . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Spiking neuron model . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Rate coding in SNNs . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.4 Binarized neural networks . . . . . . . . . . . . . . . . . . . . . . . 13 2.5 Resistive random access memory . . . . . . . . . . . . . . . . . . . . 18 3 RelatedWork 22 3.1 Training SNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 SNNs with various spike coding schemes . . . . . . . . . . . . . . . 25 3.3 BNN implementations . . . . . . . . . . . . . . . . . . . . . . . . . 28 4 Deep Neural Networks withWeighted Spikes 33 4.1 SNN with weighted spikes . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.1 Weighted spikes . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.2 Spiking neuron model for weighted spikes . . . . . . . . . . . 35 4.1.3 Noise spike . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.1.4 Approximation of the ReLU activation . . . . . . . . . . . . 39 4.1.5 ANN-to-SNN conversion . . . . . . . . . . . . . . . . . . . . 41 4.2 Optimization techniques . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2.1 Skipping initial input currents in the output layer . . . . . . . 45 4.2.2 The number of phases in a period . . . . . . . . . . . . . . . 47 4.2.3 Accuracy-energy trade-off by early decision . . . . . . . . . . 50 4.2.4 Consideration on hardware implementation . . . . . . . . . . 52 4.3 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.4.1 Comparison between SNN-RC and SNN-WS . . . . . . . . . 56 4.4.2 Trade-off by early decision . . . . . . . . . . . . . . . . . . . 64 4.4.3 Comparison with other algorithms . . . . . . . . . . . . . . . 67 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5 VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks 71 5.1 Modification of Binarized Neural Network . . . . . . . . . . . . . . . 72 5.1.1 Binarized Neural Network . . . . . . . . . . . . . . . . . . . 72 5.1.2 Use of 0 and 1 Activations . . . . . . . . . . . . . . . . . . . 72 5.1.3 Removal of Batch Normalization Layer . . . . . . . . . . . . 73 5.2 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.2.1 ReRAM Synaptic Array . . . . . . . . . . . . . . . . . . . . 75 5.2.2 Neuron Circuit . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.2.3 Issues with Neuron Circuit . . . . . . . . . . . . . . . . . . . 82 5.3 Variation Compensation . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.3.1 Variation Modeling . . . . . . . . . . . . . . . . . . . . . . . 85 5.3.2 Impact of VT Variation . . . . . . . . . . . . . . . . . . . . . 87 5.3.3 Variation Compensation Techniques . . . . . . . . . . . . . . 88 5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 93 5.4.2 Accuracy of the Modified BNN Algorithm . . . . . . . . . . 94 5.4.3 Variation Compensation . . . . . . . . . . . . . . . . . . . . 95 5.4.4 Performance Comparison . . . . . . . . . . . . . . . . . . . . 99 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6 Conclusion 102Docto

SNU Open Repository and Archive

ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

Author: González Antonio
Riera Marc
Sabri Mohammad
Publication venue
Publication date: 28/06/2023
Field of study

The primary operation in DNNs is the dot product of quantized input activations and weights. Prior works have proposed the design of memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density to store weights, low leakage energy, low read latency, and high performance capabilities to perform the DNN dot-products massively in parallel within the ReRAM crossbars. However, the main bottleneck of these architectures is the energy-hungry analog-to-digital conversions (ADCs) required to perform analog computations in-ReRAM, which penalizes the efficiency and performance benefits of PIM. To improve energy-efficiency of in-ReRAM analog dot-product computations we present ReDy, a hardware accelerator that implements a ReRAM-centric Dynamic quantization scheme to take advantage of the bit serial streaming and processing of activations. The energy consumption of ReRAM-based DNN accelerators is directly proportional to the numerical precision of the input activations of each DNN layer. In particular, ReDy exploits that activations of CONV layers from Convolutional Neural Networks (CNNs), a subset of DNNs, are commonly grouped according to the size of their filters and the size of the ReRAM crossbars. Then, ReDy quantizes on-the-fly each group of activations with a different numerical precision based on a novel heuristic that takes into account the statistical distribution of each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars and the number of A/D conversions compared to an static 8-bit uniform quantization. We evaluate ReDy on a popular set of modern CNNs. On average, ReDy provides 13\% energy savings over an ISAAC-like accelerator with negligible accuracy loss and area overhead.Comment: 13 pages, 16 figures, 4 Table

arXiv.org e-Print Archive

Enabling Neuromorphic Computing for Artificial Intelligence with Hardware-Software Co-Design

Author: Chen Xiang
Li Bojing
Liu Chenchen
Zhong Duo
Publication venue: IntechOpen
Publication date: 15/06/2023
Field of study

In the last decade, neuromorphic computing was rebirthed with the emergence of novel nano-devices and hardware-software co-design approaches. With the fast advancement in algorithms for today’s artificial intelligence (AI) applications, deep neural networks (DNNs) have become the mainstream technology. It has been a new research trend to enable neuromorphic designs for DNNs computing with high computing efficiency in speed and energy. In this chapter, we will summarize the recent advances in neuromorphic computing hardware and system designs with non-volatile resistive access memory (ReRAM) devices. More specifically, we will discuss the ReRAM-based neuromorphic computing hardware and system implementations, hardware-software co-design approaches for quantized and sparse DNNs, and architecture designs

IntechOpen

Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

Author: Lammie Corey
Publication venue
Publication date: 01/01/2022
Field of study

Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

ResearchOnline at James Cook University

Facilitating Emerging Non-volatile Memories in Next-Generation Memory System Design: Architecture-Level and Application-Level Perspectives

Author: Chi Ping
Publication venue: eScholarship, University of California
Publication date: 01/01/2016
Field of study

This dissertation focuses on three types of emerging NVMs, spin-transfer torque RAM (STT-RAM), phase change memory (PCM), and metal-oxide resistive RAM (ReRAM). STT-RAM has been identified as the best replacement of SRAM to build large-scale and low-power on-chip caches and also an energy-efficient alternative to DRAM as main memory. PCM and ReRAM have been considered to be promising technologies for building future large-scale and low-power main memory systems. This dissertation investigates two aspects to facilitate them in next-generation memory system design, architecture-level and application-level perspectives. First, multi-level cell (MLC) STT-RAM based cache design is optimized by using data encoding and data compression. Second, MLC STT-RAM is utilized as persistent main memory for fast and energy-efficient local checkpointing. Third, the commonly used database indexing algorithm, B+tree, is redesigned to be NVM-friendly. Forth, a novel processing-in-memory architecture built on ReRAM based main memory is proposed to accelerate neural network applications

Ezid

eScholarship - University of California

Design of Resistive Synaptic Devices and Array Architectures for Neuromorphic Computing

Author
Publication venue
Publication date: 01/01/2018
Field of study

abstract: Over the past few decades, the silicon complementary-metal-oxide-semiconductor (CMOS) technology has been greatly scaled down to achieve higher performance, density and lower power consumption. As the device dimension is approaching its fundamental physical limit, there is an increasing demand for exploration of emerging devices with distinct operating principles from conventional CMOS. In recent years, many efforts have been devoted in the research of next-generation emerging non-volatile memory (eNVM) technologies, such as resistive random access memory (RRAM) and phase change memory (PCM), to replace conventional digital memories (e.g. SRAM) for implementation of synapses in large-scale neuromorphic computing systems. Essentially being compact and “analog”, these eNVM devices in a crossbar array can compute vector-matrix multiplication in parallel, significantly speeding up the machine/deep learning algorithms. However, non-ideal eNVM device and array properties may hamper the learning accuracy. To quantify their impact, the sparse coding algorithm was used as a starting point, where the strategies to remedy the accuracy loss were proposed, and the circuit-level design trade-offs were also analyzed. At architecture level, the parallel “pseudo-crossbar” array to prevent the write disturbance issue was presented. The peripheral circuits to support various parallel array architectures were also designed. One key component is the read circuit that employs the principle of integrate-and-fire neuron model to convert the analog column current to digital output. However, the read circuit is not area-efficient, which was proposed to be replaced with a compact two-terminal oscillation neuron device that exhibits metal-insulator-transition phenomenon. To facilitate the design exploration, a circuit-level macro simulator “NeuroSim” was developed in C++ to estimate the area, latency, energy and leakage power of various neuromorphic architectures. NeuroSim provides a wide variety of design options at the circuit/device level. NeuroSim can be used alone or as a supporting module to provide circuit-level performance estimation in neural network algorithms. A 2-layer multilayer perceptron (MLP) simulator with integration of NeuroSim was demonstrated to evaluate both the learning accuracy and circuit-level performance metrics for the online learning and offline classification, as well as to study the impact of eNVM reliability issues such as data retention and write endurance on the learning performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository