46 research outputs found

    Reconfigurable writing architecture for reliable RRAM operation in wide temperature ranges

    Get PDF
    Resistive switching memories [resistive RAM (RRAM)] are an attractive alternative to nonvolatile storage and nonconventional computing systems, but their behavior strongly depends on the cell features, driver circuit, and working conditions. In particular, the circuit temperature and writing voltage schemes become critical issues, determining resistive switching memories performance. These dependencies usually force a design time tradeoff among reliability, device endurance, and power consumption, thereby imposing nonflexible functioning schemes and limiting the system performance. In this paper, we present a writing architecture that ensures the correct operation no matter the working temperature and allows the dynamic load of application-oriented writing profiles. Thus, taking advantage of more efficient configurations, the system can be dynamically adapted to overcome RRAM intrinsic challenges. Several profiles are analyzed regarding power consumption, temperature-variations protection, and operation speed, showing speedups near 700x compared with other published drivers

    Improving Performance and Endurance for Crossbar Resistive Memory

    Get PDF
    Resistive Memory (ReRAM) has emerged as a promising non-volatile memory technology that may replace a significant portion of DRAM in future computer systems. When adopting crossbar architecture, ReRAM cell can achieve the smallest theoretical size in fabrication, ideally for constructing dense memory with large capacity. However, crossbar cell structure suffers from severe performance and endurance degradations, which come from large voltage drops on long wires. In this dissertation, I first study the correlation between the ReRAM cell switching latency and the number of cells in low resistant state (LRS) along bitlines, and propose to dynamically speed up write operations based on bitline data patterns. By leveraging the intrinsic in-memory processing capability of ReRAM crossbars, a low overhead runtime profiler that effectively tracks the data patterns in different bitlines is proposed. To achieve further write latency reduction, data compression and row address dependent memory data layout are employed to reduce the numbers of LRS cells on bitlines. Moreover, two optimization techniques are presented to mitigate energy overhead brought by bitline data patterns tracking. Second, I propose XWL, a novel table-based wear leveling scheme for ReRAM crossbars and study the correlation between write endurance and voltage stress in ReRAM crossbars. By estimating and tracking the effective write stress to different rows at runtime, XWL chooses the ones that are stressed the most to mitigate. Additionally, two extended scenarios are further examined for the performance and endurance issues in neural network accelerators as well as 3D vertical ReRAM (3D-VRAM) arrays. For the ReRAM crossbar-based accelerators, by exploiting the wearing out mechanism of ReRAM cell, a novel comprehensive framework, ReNEW, is proposed to enhance the lifetime of the ReRAM crossbar-based accelerators, particularly for neural network training. To reduce the write latency in 3D-VRAM arrays, a collection of techniques, including an in-memory data encoding scheme, a data pattern estimator for assessing cell resistance distributions, and a write time reduction scheme that opportunistically reduces RESET latency with runtime data patterns, are devised

    Thermal Heating in ReRAM Crossbar Arrays: Challenges and Solutions

    Full text link
    Increasing popularity of deep-learning-powered applications raises the issue of vulnerability of neural networks to adversarial attacks. In other words, hardly perceptible changes in input data lead to the output error in neural network hindering their utilization in applications that involve decisions with security risks. A number of previous works have already thoroughly evaluated the most commonly used configuration - Convolutional Neural Networks (CNNs) against different types of adversarial attacks. Moreover, recent works demonstrated transferability of the some adversarial examples across different neural network models. This paper studied robustness of the new emerging models such as SpinalNet-based neural networks and Compact Convolutional Transformers (CCT) on image classification problem of CIFAR-10 dataset. Each architecture was tested against four White-box attacks and three Black-box attacks. Unlike VGG and SpinalNet models, attention-based CCT configuration demonstrated large span between strong robustness and vulnerability to adversarial examples. Eventually, the study of transferability between VGG, VGG-inspired SpinalNet and pretrained CCT 7/3x1 models was conducted. It was shown that despite high effectiveness of the attack on the certain individual model, this does not guarantee the transferability to other models.Comment: 18 page

    ์—๋„ˆ์ง€ ํšจ์œจ์  ์ธ๊ณต์‹ ๊ฒฝ๋ง ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2019. 2. ์ตœ๊ธฐ์˜.์ตœ๊ทผ ์‹ฌ์ธต ํ•™์Šต์€ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ์Œ์„ฑ ์ธ์‹ ๋ฐ ๊ฐ•ํ™” ํ•™์Šต๊ณผ ๊ฐ™์€ ์˜์—ญ์—์„œ ๋†€๋ผ์šด ์„ฑ๊ณผ๋ฅผ ๊ฑฐ๋‘๊ณ  ์žˆ๋‹ค. ์ตœ์ฒจ๋‹จ ์‹ฌ์ธต ์ธ๊ณต์‹ ๊ฒฝ๋ง ์ค‘ ์ผ๋ถ€๋Š” ์ด๋ฏธ ์ธ๊ฐ„์˜ ๋Šฅ๋ ฅ์„ ๋„˜์–ด์„  ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ธ๊ณต์‹ ๊ฒฝ๋ง์€ ์—„์ฒญ๋‚œ ์ˆ˜์˜ ๊ณ ์ •๋ฐ€ ๊ณ„์‚ฐ๊ณผ ์ˆ˜๋ฐฑ๋งŒ๊ฐœ์˜ ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ์ด์šฉํ•˜๊ธฐ ์œ„ํ•œ ๋นˆ๋ฒˆํ•œ ๋ฉ”๋ชจ๋ฆฌ ์•ก์„ธ์Šค๋ฅผ ์ˆ˜๋ฐ˜ํ•œ๋‹ค. ์ด๋Š” ์—„์ฒญ๋‚œ ์นฉ ๊ณต๊ฐ„๊ณผ ์—๋„ˆ์ง€ ์†Œ๋ชจ ๋ฌธ์ œ๋ฅผ ์•ผ๊ธฐํ•˜์—ฌ ์ž„๋ฒ ๋””๋“œ ์‹œ์Šคํ…œ์—์„œ ์ธ๊ณต์‹ ๊ฒฝ๋ง์ด ์‚ฌ์šฉ๋˜๋Š” ๊ฒƒ์„ ์ œํ•œํ•˜๊ฒŒ ๋œ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์ธ๊ณต์‹ ๊ฒฝ๋ง์„ ๋†’์€ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ๊ฐ–๋„๋ก ์„ค๊ณ„ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ๋ฒˆ์งธ ํŒŒํŠธ์—์„œ๋Š” ๊ฐ€์ค‘ ์ŠคํŒŒ์ดํฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ์งง์€ ์ถ”๋ก  ์‹œ๊ฐ„๊ณผ ์ ์€ ์—๋„ˆ์ง€ ์†Œ๋ชจ์˜ ์žฅ์ ์„ ๊ฐ–๋Š” ์ŠคํŒŒ์ดํ‚น ์ธ๊ณต์‹ ๊ฒฝ๋ง ์„ค๊ณ„ ๋ฐฉ๋ฒ•์„ ๋‹ค๋ฃฌ๋‹ค. ์ŠคํŒŒ์ดํ‚น ์ธ๊ณต์‹ ๊ฒฝ๋ง์€ ์ธ๊ณต์‹ ๊ฒฝ๋ง์˜ ๋†’์€ ์—๋„ˆ์ง€ ์†Œ๋น„ ๋ฌธ์ œ๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•œ ์œ ๋งํ•œ ๋Œ€์•ˆ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๊ธฐ์กด ์—ฐ๊ตฌ์—์„œ ์‹ฌ์ธต ์ธ๊ณต์‹ ๊ฒฝ๋ง์„ ์ •ํ™•๋„ ์†์‹ค์—†์ด ์ŠคํŒŒ์ดํ‚น ์ธ๊ณต์‹ ๊ฒฝ๋ง์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ๋ฐœํ‘œ๋˜์—ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋“ค์€ rate coding์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ธด ์ถ”๋ก  ์‹œ๊ฐ„์„ ๊ฐ–๊ฒŒ ๋˜๊ณ  ์ด๊ฒƒ์ด ๋งŽ์€ ์—๋„ˆ์ง€ ์†Œ๋ชจ๋ฅผ ์•ผ๊ธฐํ•˜๊ฒŒ ๋˜๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค. ์ด ํŒŒํŠธ์—์„œ๋Š” ํŽ˜์ด์ฆˆ์— ๋”ฐ๋ผ ๋‹ค๋ฅธ ์ŠคํŒŒ์ดํฌ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ์ถ”๋ก  ์‹œ๊ฐ„์„ ํฌ๊ฒŒ ์ค„์ด๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. MNIST, SVHN, CIFAR-10, CIFAR-100 ๋ฐ์ดํ„ฐ์…‹์—์„œ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•œ ์ŠคํŒŒ์ดํ‚น ์ธ๊ณต์‹ ๊ฒฝ๋ง์ด ๊ธฐ์กด ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ํฐ ํญ์œผ๋กœ ์ถ”๋ก  ์‹œ๊ฐ„๊ณผ ์ŠคํŒŒ์ดํฌ ๋ฐœ์ƒ ๋นˆ๋„๋ฅผ ์ค„์—ฌ์„œ ๋ณด๋‹ค ์—๋„ˆ์ง€ ํšจ์œจ์ ์œผ๋กœ ๋™์ž‘ํ•จ์„ ๋ณด์—ฌ์ค€๋‹ค. ๋‘๋ฒˆ์งธ ํŒŒํŠธ์—์„œ๋Š” ๊ณต์ • ๋ณ€์ด๊ฐ€ ์žˆ๋Š” ์ƒํ™ฉ์—์„œ ๋™์ž‘ํ•˜๋Š” ๊ณ ์—๋„ˆ์ง€ํšจ์œจ ์•„๋‚ ๋กœ๊ทธ ์ธ๊ณต์‹ ๊ฒฝ๋ง ์„ค๊ณ„ ๋ฐฉ๋ฒ•์„ ๋‹ค๋ฃจ๊ณ  ์žˆ๋‹ค. ์ธ๊ณต์‹ ๊ฒฝ๋ง์„ ์•„๋‚ ๋กœ๊ทธ ํšŒ๋กœ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ตฌํ˜„ํ•˜๋ฉด ๋†’์€ ๋ณ‘๋ ฌ์„ฑ๊ณผ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ์–ป์„ ์ˆ˜ ์žˆ๋Š” ์žฅ์ ์ด ์žˆ๋‹ค. ํ•˜์ง€๋งŒ, ์•„๋‚ ๋กœ๊ทธ ์‹œ์Šคํ…œ์€ ๋…ธ์ด์ฆˆ์— ์ทจ์•ฝํ•œ ์ค‘๋Œ€ํ•œ ๊ฒฐ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋…ธ์ด์ฆˆ ์ค‘ ํ•˜๋‚˜๋กœ ๊ณต์ • ๋ณ€์ด๋ฅผ ๋“ค ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋Š” ์•„๋‚ ๋กœ๊ทธ ํšŒ๋กœ์˜ ์ ์ • ๋™์ž‘ ์ง€์ ์„ ๋ณ€ํ™”์‹œ์ผœ ์‹ฌ๊ฐํ•œ ์„ฑ๋Šฅ ์ €ํ•˜ ๋˜๋Š” ์˜ค๋™์ž‘์„ ์œ ๋ฐœํ•˜๋Š” ์›์ธ์ด๋‹ค. ์ด ํŒŒํŠธ์—์„œ๋Š” ReRAM์— ๊ธฐ๋ฐ˜ํ•œ ๊ณ ์—๋„ˆ์ง€ ํšจ์œจ ์•„๋‚ ๋กœ๊ทธ ์ด์ง„ ์ธ๊ณต์‹ ๊ฒฝ๋ง์„ ๊ตฌํ˜„ํ•˜๊ณ , ๊ณต์ • ๋ณ€์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ํ™œ์„ฑ๋„ ์ผ์น˜ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•œ ๊ณต์ • ๋ณ€์ด ๋ณด์ƒ ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ๋œ ์ธ๊ณต์‹ ๊ฒฝ๋ง์€ 1T1R ๊ตฌ์กฐ์˜ ReRAM ๋ฐฐ์—ด๊ณผ ์ฐจ๋™์ฆํญ๊ธฐ๋ฅผ ์ด์šฉํ•œ ๋‰ด๋Ÿฐ์„ ์ด์šฉํ•˜์—ฌ ๊ณ ๋ฐ€๋„ ์ง‘์ ๊ณผ ๊ณ ์—๋„ˆ์ง€ ํšจ์œจ ๋™์ž‘์ด ๊ฐ€๋Šฅํ•˜๊ฒŒ ๊ตฌ์„ฑ๋˜์—ˆ๋‹ค. ๋˜ํ•œ, ์•„๋‚ ๋กœ๊ทธ ๋‰ด๋Ÿฐ ํšŒ๋กœ์˜ ๊ณต์ • ๋ณ€์ด ์ทจ์•ฝ์„ฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์ด์ƒ์ ์ธ ๋‰ด๋Ÿฐ์˜ ํ™œ์„ฑ๋„์™€ ๋™์ผํ•œ ํ™œ์„ฑ๋„๋ฅผ ๊ฐ–๋„๋ก ๋‰ด๋Ÿฐ์˜ ๋ฐ”์ด์–ด์Šค๋ฅผ ์กฐ์ ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœํ•œ๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ 32nm ๊ณต์ •์—์„œ ๊ตฌํ˜„๋œ ์ธ๊ณต์‹ ๊ฒฝ๋ง์€ 3-sigma ์ง€์ ์—์„œ 50% ๋ฌธํ„ฑ ์ „์•• ๋ณ€์ด์™€ 15%์˜ ์ €ํ•ญ๊ฐ’ ๋ณ€์ด๊ฐ€ ์žˆ๋Š” ์ƒํ™ฉ์—์„œ๋„ MNIST์—์„œ 98.55%, CIFAR-10์—์„œ 89.63%์˜ ์ •ํ™•๋„๋ฅผ ๋‹ฌ์„ฑํ•˜์˜€์œผ๋ฉฐ, 970 TOPS/W์— ๋‹ฌํ•˜๋Š” ๋งค์šฐ ๋†’์€ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ๋‹ฌ์„ฑํ•˜์˜€๋‹ค.Recently, deep learning has shown astounding performances on specific tasks such as image classification, speech recognition, and reinforcement learning. Some of the state-of-the-art deep neural networks have already gone over humans ability. However, neural networks involve tremendous number of high precision computations and frequent off-chip memory accesses with millions of parameters. It incurs problems of large area and exploding energy consumption, which hinder neural networks from being exploited in embedded systems. To cope with the problem, techniques for designing energy efficient neural networks are proposed. The first part of this dissertation addresses the design of spiking neural networks with weighted spikes which has advantages of shorter inference latency and smaller energy consumption compared to the conventional spiking neural networks. Spiking neural networks are being regarded as one of the promising alternative techniques to overcome the high energy costs of artificial neural networks. It is supported by many researches showing that a deep convolutional neural network can be converted into a spiking neural network with near zero accuracy loss. However, the advantage on energy consumption of spiking neural networks comes at a cost of long classification latency due to the use of Poisson-distributed spike trains (rate coding), especially in deep networks. We propose to use weighted spikes, which can greatly reduce the latency by assigning a different weight to a spike depending on which time phase it belongs. Experimental results on MNIST, SVHN, CIFAR-10, and CIFAR-100 show that the proposed spiking neural networks with weighted spikes achieve significant reduction in classification latency and number of spikes, which leads to faster and more energy-efficient spiking neural networks than the conventional spiking neural networks with rate coding. We also show that one of the state-of-the-art networks the deep residual network can be converted into spiking neural network without accuracy loss. The second part of this dissertation focuses on the design of highly energy-efficient analog neural networks in the presence of variations. Analog hardware accelerators for deep neural networks have taken center stage in the aspect of high parallelism and energy efficiency. However, a critical weakness of the analog hardware systems is vulnerability to noise. One of the biggest noise sources is a process variation. It is a big obstacle to using analog circuits since the variation shifts various parameters of analog circuits from the correct operating points, which causes severe performance degradation or even malfunction. To achieve high energy efficiency with analog neural networks, we propose resistive random access memory (ReRAM) based analog implementation of binarized neural networks (BNNs) with a novel variation compensation technique through activation matching (VCAM). The proposed architecture consists of 1-transistor-1-resistor (1T1R) structured ReRAM synaptic arrays and differential amplifier based neurons, which leads to high-density integration and energy efficiency. To cope with the vulnerability of analog neurons due to process variation, the biases of all neurons are adjusted in the direction that matches average output activation of ideal neurons without variation. The technique effectively restores the classification accuracy degraded by the variation. Experimental results on 32nm technology show that the proposed architecture achieves the classification accuracy of 98.55% on MNIST and 89.63% on CIFAR-10 in the presence of 50% threshold voltage variation and 15% resistance variation at 3-sigma point. It also achieves 970 TOPS/W energy efficiency with MLP on MNIST.1 Introduction 1 1.1 Deep Neural Networks with Weighted Spikes . . . . . . . . . . . . . 2 1.2 VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks . . . . . . . . . . . . . . . . . . . . . 5 2 Background 8 2.1 Spiking neural network . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Spiking neuron model . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Rate coding in SNNs . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.4 Binarized neural networks . . . . . . . . . . . . . . . . . . . . . . . 13 2.5 Resistive random access memory . . . . . . . . . . . . . . . . . . . . 18 3 RelatedWork 22 3.1 Training SNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 SNNs with various spike coding schemes . . . . . . . . . . . . . . . 25 3.3 BNN implementations . . . . . . . . . . . . . . . . . . . . . . . . . 28 4 Deep Neural Networks withWeighted Spikes 33 4.1 SNN with weighted spikes . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.1 Weighted spikes . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.2 Spiking neuron model for weighted spikes . . . . . . . . . . . 35 4.1.3 Noise spike . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.1.4 Approximation of the ReLU activation . . . . . . . . . . . . 39 4.1.5 ANN-to-SNN conversion . . . . . . . . . . . . . . . . . . . . 41 4.2 Optimization techniques . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2.1 Skipping initial input currents in the output layer . . . . . . . 45 4.2.2 The number of phases in a period . . . . . . . . . . . . . . . 47 4.2.3 Accuracy-energy trade-off by early decision . . . . . . . . . . 50 4.2.4 Consideration on hardware implementation . . . . . . . . . . 52 4.3 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.4.1 Comparison between SNN-RC and SNN-WS . . . . . . . . . 56 4.4.2 Trade-off by early decision . . . . . . . . . . . . . . . . . . . 64 4.4.3 Comparison with other algorithms . . . . . . . . . . . . . . . 67 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5 VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks 71 5.1 Modification of Binarized Neural Network . . . . . . . . . . . . . . . 72 5.1.1 Binarized Neural Network . . . . . . . . . . . . . . . . . . . 72 5.1.2 Use of 0 and 1 Activations . . . . . . . . . . . . . . . . . . . 72 5.1.3 Removal of Batch Normalization Layer . . . . . . . . . . . . 73 5.2 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.2.1 ReRAM Synaptic Array . . . . . . . . . . . . . . . . . . . . 75 5.2.2 Neuron Circuit . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.2.3 Issues with Neuron Circuit . . . . . . . . . . . . . . . . . . . 82 5.3 Variation Compensation . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.3.1 Variation Modeling . . . . . . . . . . . . . . . . . . . . . . . 85 5.3.2 Impact of VT Variation . . . . . . . . . . . . . . . . . . . . . 87 5.3.3 Variation Compensation Techniques . . . . . . . . . . . . . . 88 5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 93 5.4.2 Accuracy of the Modified BNN Algorithm . . . . . . . . . . 94 5.4.3 Variation Compensation . . . . . . . . . . . . . . . . . . . . 95 5.4.4 Performance Comparison . . . . . . . . . . . . . . . . . . . . 99 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6 Conclusion 102Docto

    ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

    Full text link
    The primary operation in DNNs is the dot product of quantized input activations and weights. Prior works have proposed the design of memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density to store weights, low leakage energy, low read latency, and high performance capabilities to perform the DNN dot-products massively in parallel within the ReRAM crossbars. However, the main bottleneck of these architectures is the energy-hungry analog-to-digital conversions (ADCs) required to perform analog computations in-ReRAM, which penalizes the efficiency and performance benefits of PIM. To improve energy-efficiency of in-ReRAM analog dot-product computations we present ReDy, a hardware accelerator that implements a ReRAM-centric Dynamic quantization scheme to take advantage of the bit serial streaming and processing of activations. The energy consumption of ReRAM-based DNN accelerators is directly proportional to the numerical precision of the input activations of each DNN layer. In particular, ReDy exploits that activations of CONV layers from Convolutional Neural Networks (CNNs), a subset of DNNs, are commonly grouped according to the size of their filters and the size of the ReRAM crossbars. Then, ReDy quantizes on-the-fly each group of activations with a different numerical precision based on a novel heuristic that takes into account the statistical distribution of each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars and the number of A/D conversions compared to an static 8-bit uniform quantization. We evaluate ReDy on a popular set of modern CNNs. On average, ReDy provides 13\% energy savings over an ISAAC-like accelerator with negligible accuracy loss and area overhead.Comment: 13 pages, 16 figures, 4 Table

    Enabling Neuromorphic Computing for Artificial Intelligence with Hardware-Software Co-Design

    Get PDF
    In the last decade, neuromorphic computing was rebirthed with the emergence of novel nano-devices and hardware-software co-design approaches. With the fast advancement in algorithms for todayโ€™s artificial intelligence (AI) applications, deep neural networks (DNNs) have become the mainstream technology. It has been a new research trend to enable neuromorphic designs for DNNs computing with high computing efficiency in speed and energy. In this chapter, we will summarize the recent advances in neuromorphic computing hardware and system designs with non-volatile resistive access memory (ReRAM) devices. More specifically, we will discuss the ReRAM-based neuromorphic computing hardware and system implementations, hardware-software co-design approaches for quantized and sparse DNNs, and architecture designs

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Get PDF
    Corey Lammie designed mixed signal memristive-complementary metalโ€“oxideโ€“semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

    Facilitating Emerging Non-volatile Memories in Next-Generation Memory System Design: Architecture-Level and Application-Level Perspectives

    Get PDF
    This dissertation focuses on three types of emerging NVMs, spin-transfer torque RAM (STT-RAM), phase change memory (PCM), and metal-oxide resistive RAM (ReRAM). STT-RAM has been identified as the best replacement of SRAM to build large-scale and low-power on-chip caches and also an energy-efficient alternative to DRAM as main memory. PCM and ReRAM have been considered to be promising technologies for building future large-scale and low-power main memory systems. This dissertation investigates two aspects to facilitate them in next-generation memory system design, architecture-level and application-level perspectives. First, multi-level cell (MLC) STT-RAM based cache design is optimized by using data encoding and data compression. Second, MLC STT-RAM is utilized as persistent main memory for fast and energy-efficient local checkpointing. Third, the commonly used database indexing algorithm, B+tree, is redesigned to be NVM-friendly. Forth, a novel processing-in-memory architecture built on ReRAM based main memory is proposed to accelerate neural network applications

    Design of Resistive Synaptic Devices and Array Architectures for Neuromorphic Computing

    Get PDF
    abstract: Over the past few decades, the silicon complementary-metal-oxide-semiconductor (CMOS) technology has been greatly scaled down to achieve higher performance, density and lower power consumption. As the device dimension is approaching its fundamental physical limit, there is an increasing demand for exploration of emerging devices with distinct operating principles from conventional CMOS. In recent years, many efforts have been devoted in the research of next-generation emerging non-volatile memory (eNVM) technologies, such as resistive random access memory (RRAM) and phase change memory (PCM), to replace conventional digital memories (e.g. SRAM) for implementation of synapses in large-scale neuromorphic computing systems. Essentially being compact and โ€œanalogโ€, these eNVM devices in a crossbar array can compute vector-matrix multiplication in parallel, significantly speeding up the machine/deep learning algorithms. However, non-ideal eNVM device and array properties may hamper the learning accuracy. To quantify their impact, the sparse coding algorithm was used as a starting point, where the strategies to remedy the accuracy loss were proposed, and the circuit-level design trade-offs were also analyzed. At architecture level, the parallel โ€œpseudo-crossbarโ€ array to prevent the write disturbance issue was presented. The peripheral circuits to support various parallel array architectures were also designed. One key component is the read circuit that employs the principle of integrate-and-fire neuron model to convert the analog column current to digital output. However, the read circuit is not area-efficient, which was proposed to be replaced with a compact two-terminal oscillation neuron device that exhibits metal-insulator-transition phenomenon. To facilitate the design exploration, a circuit-level macro simulator โ€œNeuroSimโ€ was developed in C++ to estimate the area, latency, energy and leakage power of various neuromorphic architectures. NeuroSim provides a wide variety of design options at the circuit/device level. NeuroSim can be used alone or as a supporting module to provide circuit-level performance estimation in neural network algorithms. A 2-layer multilayer perceptron (MLP) simulator with integration of NeuroSim was demonstrated to evaluate both the learning accuracy and circuit-level performance metrics for the online learning and offline classification, as well as to study the impact of eNVM reliability issues such as data retention and write endurance on the learning performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
    corecore