90 research outputs found

    High-Performance Ternary (4:2) Compressor Based on Capacitive Threshold Logic

    This paper presents a ternary (4:2) compressor, an important component in multiplication. Its structure differs from the binary counterpart because the ternary model does not require carry signals. Capacitive threshold logic (CTL) is used to generate the output signals directly. Unlike a previously presented structure of this kind, the capacitor network is divided into two parts, which improves reliability and robustness against unwanted process, voltage, and temperature (PVT) variations. Simulations were performed with HSPICE using 32 nm CNFET technology. The results show about 94% higher performance in terms of power-delay product (PDP) for the new design over the previous one.
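    A behavioral sketch, not the paper's circuit: assuming unbalanced ternary digits {0, 1, 2}, four inputs sum to at most 8 = 2·3 + 2, so one carry trit plus one sum trit always suffice, which is one way to see why this model needs no carry chain between neighboring compressors. The capacitor network is idealized as an equal-weight sum compared against thresholds at 3 and 6; the weights, thresholds, and function names are illustrative assumptions.

```python
from itertools import product


def threshold_gate(weighted_sum: float, threshold: float) -> int:
    """Idealized threshold element: outputs 1 once the summed input reaches the threshold."""
    return 1 if weighted_sum >= threshold else 0


def ternary_4_2_compressor(x1: int, x2: int, x3: int, x4: int) -> tuple[int, int]:
    """Compress four unbalanced ternary digits into a (carry, sum) pair of trits.

    Assumption: every input has equal weight, as if each drove an equal coupling
    capacitor onto one shared node; thresholds at 3 and 6 detect the carry value.
    """
    for x in (x1, x2, x3, x4):
        if x not in (0, 1, 2):
            raise ValueError("inputs must be trits in {0, 1, 2}")
    total = x1 + x2 + x3 + x4  # 0..8, the idealized "charge" on the shared node
    carry = threshold_gate(total, 3) + threshold_gate(total, 6)  # 0, 1 or 2
    digit = total - 3 * carry  # what remains is always 0..2
    return carry, digit


# Exhaustive self-check over all 81 input combinations.
for xs in product(range(3), repeat=4):
    c, s = ternary_4_2_compressor(*xs)
    assert 3 * c + s == sum(xs) and 0 <= s <= 2
```

    In a real CTL gate the thresholds are set by capacitor ratios and a threshold inverter rather than by integer comparisons; the sketch only checks the arithmetic the network has to realize.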

    State of the Art Computational Ternary Logic Current-Mode Circuits Based on CNTFET Technology

    Computational operations are considered time-consuming and important operations in the ALU, and these circuits play a major role in the computational operations of a processing unit. This paper presents new computational ternary current-mode circuits, including a comparator, multiplexer, decoder, and exclusive-OR, built with carbon nanotube field-effect transistors (CNTFETs). The new designs rely on three major parts: 1) conversion of the input currents to voltages; 2) threshold detectors; and 3) output current paths that generate the outputs. The designs were simulated with 32 nm CNTFET models using the Synopsys HSPICE simulator.
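    The three-stage structure above (input current to voltage, threshold detection, output current path) can be mimicked behaviorally. This sketch assumes ternary values are carried as current levels 0, I_UNIT and 2·I_UNIT, with detectors placed midway between levels; the 10 µA unit current, the thresholds, and all function names are hypothetical, not values from the paper. It builds a decoder, a comparator, and a multiplexer on the same two detectors.

```python
I_UNIT = 10e-6  # hypothetical unit current for logic level 1 (10 uA)


def detect(current: float) -> tuple[bool, bool]:
    """Two threshold detectors at 0.5*I_UNIT and 1.5*I_UNIT (midpoints between levels)."""
    return current >= 0.5 * I_UNIT, current >= 1.5 * I_UNIT


def to_trit(current: float) -> int:
    """Map a (possibly noisy) input current to the nearest ternary level 0, 1 or 2."""
    low, high = detect(current)
    return int(low) + int(high)


def decoder(current: float) -> tuple[int, int, int]:
    """One-hot outputs (is0, is1, is2) derived directly from the detector outputs."""
    low, high = detect(current)
    return int(not low), int(low and not high), int(high)


def comparator(current_a: float, current_b: float) -> tuple[int, int, int]:
    """Return (a_greater, equal, a_less) for two current-coded trits."""
    a, b = to_trit(current_a), to_trit(current_b)
    return int(a > b), int(a == b), int(a < b)


def mux(select_current: float, inputs: list):
    """Route the input chosen by a current-coded select trit."""
    return inputs[to_trit(select_current)]


def output_current(trit: int) -> float:
    """Output path: drive the current level that represents a trit."""
    return trit * I_UNIT
```

    Placing thresholds at the midpoints gives each level a noise margin of half a unit current, so for example an input of 1.1·I_UNIT still decodes as 1.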

    Cutting Edge Nanotechnology

    The main purpose of this book is to describe important issues in various types of devices, ranging from conventional transistors (the opening chapters of the book) to molecular electronic devices, whose fabrication and operation are discussed in the last few chapters. As such, this book can serve as a guide for identifying important areas of research in micro-, nano-, and molecular electronics. We deeply acknowledge the valuable contributions that each of the authors made in writing these excellent chapters.

    Cryptography for Ultra-Low Power Devices

    Ubiquitous computing describes the notion that computing devices will be everywhere: in clothing, the walls and floors of buildings, cars, forests, deserts, etc. Ubiquitous computing is becoming a reality: RFIDs are currently being introduced into the supply chain, and wireless distributed sensor networks (WSNs) are already being used to monitor wildlife and to track military targets. Many more applications are being envisioned. For most of these applications, some level of security is of utmost importance. WSNs and RFIDs share severely limited power resources, which classifies them as ultra-low power devices. Early sensor nodes used simple 8-bit microprocessors to implement basic communication, sensing, and computing services; security was an afterthought. The main power consumer is the RF transceiver, or radio for short. In recent years, specialized hardware for low-data-rate, low-power radios has been developed. The new bottleneck is security services, which employ computationally intensive cryptographic operations. Customized hardware implementations hold the promise of enabling security for severely power-constrained devices. Most research groups focus on developing secure wireless communication protocols, and others on designing efficient software implementations of cryptographic algorithms, but there has been no comprehensive study of hardware implementations of cryptographic algorithms tailored to ultra-low power applications. The goal of this dissertation is to develop a suite of cryptographic functions for authentication, encryption, and integrity that is specifically fashioned to the needs of ultra-low power devices. This dissertation introduces the specific problems that security engineers face when they try to solve the seemingly contradictory challenge of providing lightweight cryptographic services on ultra-low power devices, and gives an overview of our current work and its future direction.

    Low Power Memory/Memristor Devices and Systems

    This reprint focuses on achieving low-power computation using memristive devices. The topic was designed as a convenient reference point: it contains a mix of techniques, from the fundamental manufacturing of memristive devices all the way to applications such as physically unclonable functions, and it also covers perspectives on topics such as in-memory computing, which is inextricably linked with emerging memory devices such as memristors. Finally, the reprint contains a few articles showing how other communities (from typical CMOS design to photonics) are fighting on their own fronts in the quest for low-power computation, as a comparison with the memristor literature. We hope that readers will enjoy discovering the articles within.

    Hybrid MOS and Single-Electron Transistor Architectures towards Arithmetic Applications

    Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) and Single-Electron Transistor (SET) hybrid architectures, which combine the merits of both devices, promise to be a practical implementation for nanometer-scale circuit design. In this thesis, we design arithmetic circuits, including adders and multipliers, using SET/MOS hybrid architectures with the goals of reducing circuit area and power dissipation and improving circuit reliability. Thanks to the Coulomb blockade oscillation characteristic of the SET, SET/MOS hybrid adders become very simple and require only a few transistors when the proposed schemes of multiple-valued logic (MVL), phase modulation, and frequency modulation are used. The phase and frequency modulation schemes are also used to design multipliers. Two types of SET/MOS hybrid multipliers are presented in this thesis. The first is a binary tree multiplier that adopts conventional tree structures with multi-input counters (or compressors) implemented with the phase modulation scheme; compared to conventional CMOS tree multipliers, its area and power dissipation are reduced by half. The second is a frequency-modulated multiplier that follows a novel design methodology in which information is processed in the frequency domain. In this context, we explore the implicit frequency properties of the SET, including both frequency gain and frequency mixing. The major merits of this type of multiplier are a) a simple circuit structure and b) high immunity to background charges within SET islands. Background charges are mainly induced by defects or impurities located within the oxide barriers and cannot be entirely removed by today's technology. Since these random charges degrade circuit reliability, we investigate different circuit solutions, such as a feedback structure and frequency modulation, to counteract this problem.
The feedback acts as an error detection and correction mechanism that offsets the background charge effect by applying an appropriate voltage through an additional gate of the SET. Frequency modulation, on the other hand, exploits the fact that background charges only shift the phase of the Coulomb blockade oscillation without changing its amplitude or periodicity. Therefore, SET/MOS hybrid adders and multipliers using the frequency modulation scheme exhibit high immunity to these undesired charges.
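    A normalized behavioral sketch, not the thesis's device model: the SET drain current is idealized as sin²(π(Q_g + Q_bg)/e), periodic in gate charge with period e, and a background charge Q_bg enters only as a phase offset, which is the property the abstract relies on. The full adder assumes each input bit couples e/2 onto the island, so the periodic response is high exactly when the number of 1 inputs is odd (the sum bit), while the carry is a plain threshold; the frequency readout ramps the gate charge and takes the dominant FFT bin. The coupling values, sweep parameters, and function names are assumptions.

```python
import math

import numpy as np


def set_current(gate_charge: float, background_charge: float = 0.0) -> float:
    """Idealized Coulomb blockade oscillation in normalized units (e = 1):
    zero at integer gate charge, peak at half-integer gate charge."""
    return math.sin(math.pi * (gate_charge + background_charge)) ** 2


def set_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Sum bit from the periodic SET response, carry from a simple threshold.
    Assumes each input bit contributes e/2 of gate charge to the island."""
    n = a + b + cin
    s = int(set_current(n / 2) > 0.5)  # high only when n is odd
    cout = int(n >= 2)                 # e.g. realized on the MOS side
    return s, cout


def oscillation_frequency(background_charge: float,
                          samples: int = 4096, periods: int = 16) -> int:
    """Ramp the gate charge linearly through `periods` electrons and return the
    dominant FFT bin, i.e. the number of current oscillations per sweep."""
    q = periods * np.arange(samples) / samples
    current = np.sin(np.pi * (q + background_charge)) ** 2
    spectrum = np.abs(np.fft.rfft(current - current.mean()))
    return int(np.argmax(spectrum))
```

    By contrast, a fixed-bias readout such as `set_current(0.5, bg)` falls from 1.0 to 0.5 as `bg` goes from 0 to 0.25 e, whereas `oscillation_frequency(bg)` stays at 16 for any `bg`: a level-based readout sees the background charge, a frequency-based one does not.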

    Design of Energy-Efficient Artificial Neural Networks (에너지 효율적 인공신경망 설계)

    Ph.D. thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: 최기영 (Kiyoung Choi). Recently, deep learning has shown astounding performance on specific tasks such as image classification, speech recognition, and reinforcement learning. Some state-of-the-art deep neural networks have already surpassed human ability. However, neural networks involve a tremendous number of high-precision computations and frequent off-chip memory accesses to millions of parameters.
This results in large chip area and exploding energy consumption, which hinder the use of neural networks in embedded systems. To cope with this problem, this dissertation proposes techniques for designing energy-efficient neural networks. The first part addresses the design of spiking neural networks with weighted spikes, which offer shorter inference latency and lower energy consumption than conventional spiking neural networks. Spiking neural networks are regarded as one of the promising alternatives for overcoming the high energy cost of artificial neural networks, as supported by many studies showing that a deep convolutional neural network can be converted into a spiking neural network with near-zero accuracy loss. However, the energy advantage of spiking neural networks comes at the cost of long classification latency due to the use of Poisson-distributed spike trains (rate coding), especially in deep networks. We propose weighted spikes, which greatly reduce the latency by assigning a different weight to a spike depending on the time phase to which it belongs. Experimental results on MNIST, SVHN, CIFAR-10, and CIFAR-100 show that the proposed spiking neural networks with weighted spikes achieve a significant reduction in classification latency and number of spikes, making them faster and more energy-efficient than conventional spiking neural networks with rate coding. We also show that one of the state-of-the-art networks, the deep residual network, can be converted into a spiking neural network without accuracy loss. The second part of this dissertation focuses on the design of highly energy-efficient analog neural networks in the presence of variations. Analog hardware accelerators for deep neural networks have taken center stage because of their high parallelism and energy efficiency.
However, a critical weakness of analog hardware systems is their vulnerability to noise. One of the biggest noise sources is process variation, a major obstacle to using analog circuits because it shifts various circuit parameters away from their correct operating points, causing severe performance degradation or even malfunction. To achieve high energy efficiency with analog neural networks, we propose a resistive random access memory (ReRAM) based analog implementation of binarized neural networks (BNNs) with a novel variation compensation technique through activation matching (VCAM). The proposed architecture consists of 1-transistor-1-resistor (1T1R) ReRAM synaptic arrays and differential-amplifier-based neurons, which enable high-density integration and energy efficiency. To cope with the vulnerability of analog neurons to process variation, the biases of all neurons are adjusted so that their average output activation matches that of ideal neurons without variation. The technique effectively restores the classification accuracy degraded by the variation. Experimental results on 32 nm technology show that the proposed architecture achieves classification accuracies of 98.55% on MNIST and 89.63% on CIFAR-10 in the presence of 50% threshold voltage variation and 15% resistance variation at the 3-sigma point. It also achieves an energy efficiency of 970 TOPS/W with an MLP on MNIST.
    Contents:
    1 Introduction
      1.1 Deep Neural Networks with Weighted Spikes
      1.2 VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks
    2 Background
      2.1 Spiking neural network
      2.2 Spiking neuron model
      2.3 Rate coding in SNNs
      2.4 Binarized neural networks
      2.5 Resistive random access memory
    3 Related Work
      3.1 Training SNNs
      3.2 SNNs with various spike coding schemes
      3.3 BNN implementations
    4 Deep Neural Networks with Weighted Spikes
      4.1 SNN with weighted spikes
        4.1.1 Weighted spikes
        4.1.2 Spiking neuron model for weighted spikes
        4.1.3 Noise spike
        4.1.4 Approximation of the ReLU activation
        4.1.5 ANN-to-SNN conversion
      4.2 Optimization techniques
        4.2.1 Skipping initial input currents in the output layer
        4.2.2 The number of phases in a period
        4.2.3 Accuracy-energy trade-off by early decision
        4.2.4 Consideration on hardware implementation
      4.3 Experimental setup
      4.4 Results
        4.4.1 Comparison between SNN-RC and SNN-WS
        4.4.2 Trade-off by early decision
        4.4.3 Comparison with other algorithms
      4.5 Summary
    5 VCAM: Variation Compensation through Activation Matching for Analog Binarized Neural Networks
      5.1 Modification of Binarized Neural Network
        5.1.1 Binarized Neural Network
        5.1.2 Use of 0 and 1 Activations
        5.1.3 Removal of Batch Normalization Layer
      5.2 Hardware Architecture
        5.2.1 ReRAM Synaptic Array
        5.2.2 Neuron Circuit
        5.2.3 Issues with Neuron Circuit
      5.3 Variation Compensation
        5.3.1 Variation Modeling
        5.3.2 Impact of VT Variation
        5.3.3 Variation Compensation Techniques
      5.4 Experimental Results
        5.4.1 Experimental Setup
        5.4.2 Accuracy of the Modified BNN Algorithm
        5.4.3 Variation Compensation
        5.4.4 Performance Comparison
      5.5 Summary
    6 Conclusion
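    The two ideas in the abstract above, phase-dependent spike weights and bias adjustment by activation matching, can be illustrated with simplified models. This is a sketch under stated assumptions, not the dissertation's method: the weight schedule (a spike in phase k carries 2^-k), the lumped per-neuron offset standing in for threshold-voltage variation, and the binary-search bias update are all hypothetical choices, as are the function names. With 8 phases a value is resolved to within 2^-8 after 8 time steps, whereas a deterministic rate code needs on the order of 2^8 steps for the same resolution.

```python
import numpy as np

# Part 1: phase-weighted spikes (hypothetical weight schedule 2**-k for phase k).


def weighted_spike_train(a: float, phases: int) -> list[int]:
    """Encode an activation a in [0, 1) as one spike/no-spike per phase.
    A spike in phase k (k = 1..phases) is assumed to carry weight 2**-k."""
    spikes, residual = [], a
    for k in range(1, phases + 1):
        weight = 2.0 ** -k
        fire = residual >= weight
        spikes.append(int(fire))
        if fire:
            residual -= weight  # the remainder is left for later phases
    return spikes


def decode_weighted(spikes: list[int]) -> float:
    """Value seen downstream: the sum of the weights of the phases that fired."""
    return sum(s * 2.0 ** -k for k, s in enumerate(spikes, start=1))


# Part 2: bias adjustment by activation matching (simplified 0/1 neurons).


def binary_neuron(pre_activation: np.ndarray, bias: float, offset: float) -> np.ndarray:
    """0/1 activation; `offset` lumps a variation-induced shift of the neuron's
    switching point (modeling assumption)."""
    return (pre_activation + bias + offset >= 0).astype(float)


def match_activation(pre_activation: np.ndarray, bias: float, offset: float,
                     search: float = 10.0, iters: int = 50) -> float:
    """Binary-search a bias for the varied neuron so that its average activation
    on calibration inputs matches that of the ideal, offset-free neuron."""
    target = binary_neuron(pre_activation, bias, 0.0).mean()
    lo, hi = bias - search, bias + search  # assumes |offset| < search
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binary_neuron(pre_activation, mid, offset).mean() < target:
            lo = mid  # too few outputs are 1: raise the bias
        else:
            hi = mid  # enough outputs are 1: try a lower bias
    return hi
```

    For example, with Gaussian calibration pre-activations, an ideal bias of 0.2, and an offset of 0.8, `match_activation` returns a bias close to -0.6, which cancels the offset. Matching only the average activation needs no knowledge of the offset itself, which is what makes this kind of calibration attractive for analog neurons.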

    Design of Special Function Units in Modern Microprocessors

    Today's computing systems demand high performance for applications such as cloud computing, web-based search engines, network applications, and social media tasks. Such applications make extensive use of hashing and arithmetic operations. In this thesis, we explore new special function units (SFUs) that accelerate such workloads in modern microprocessors. First, we design an SFU for hashing. Hashing can reduce the complexity of search and lookup from O(p) to O(p/n), where n bins are used and p items are processed. In modern microprocessors, hashing is done in software; we propose a novel hardware hash unit for use in modern microprocessors instead. Designing the hash unit at the hardware level yields several advantages. First, a hardware hash unit performs a hash operation with a single hash instruction, whereas software hashing compiles each hash operation into multiple instructions, degrading performance. Second, software hashing stores hash data in DRAM (hash entries may also reside in one of the cache levels), whereas a hardware hash unit stores hash data in a dedicated memory module (a hardware hash table), which improves performance. Third, today's operating systems execute multiple applications (processes) in parallel, which entails high memory utilization and frequent context switches between processes, resulting in many cache misses. A hardware hash unit significantly reduces these cache misses by using its dedicated memory module (hash table). Together, these advantages reduce power consumption and significantly increase overall system performance with a minimal increase in the microprocessor's die area. We evaluate our hardware hash unit and compare its performance with software-based hashing.
We first evaluate our design at the microarchitecture level in terms of system performance. We then implement it at the circuit level to obtain the area overhead, and we analyze its power and delay for each hash operation. These results are compared with a traditional hashing implementation. We also present an FPGA-based coprocessor for hash-unit acceleration, applied to a virus-checking application. Second, we present an SFU that speeds up arithmetic operations, which we call a programmable arithmetic unit (PAU). In modern microprocessors, applications that require heavy arithmetic computation are executed in software. To improve their performance, the PAU uses a partially reconfigurable methodology for arithmetic applications. It consists of a set of IP blocks connected to a reconfigurable FPGA controller via a fast mesh-based interconnect. The IP blocks can be any blocks such as adders, subtractors, multipliers, comparators, and sign-extension units, and the PAU can contain one or more copies of the same block (for example, 5 adders and 7 multipliers). The FPGA controller is an on-chip, FPGA-based reconfigurable control fabric built from LUT-based logic like a traditional FPGA; it is programmed for each application, which allows different arithmetic applications to be embedded on the PAU. The FPGA controller and the IP blocks in the PAU communicate via a high-speed ring data fabric. In our work, we use the PAU as an SFU in modern microprocessors and compare the performance of different hardware-based arithmetic applications on the PAU with software-based implementations.
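    To make the O(p) to O(p/n) claim concrete, the sketch below models a binned hash table in software: with n bins and p well-distributed items, a lookup scans only one bin, about p/n entries on average, instead of all p. The class name, the use of Python's built-in `hash`, and the chaining scheme are illustrative assumptions; the thesis realizes the table as a dedicated hardware memory module, not as Python lists.

```python
class BinnedHashTable:
    """Chained hash table with a fixed number of bins (illustrative software model)."""

    def __init__(self, n_bins: int) -> None:
        self.bins: list[list[tuple[object, object]]] = [[] for _ in range(n_bins)]

    def _index(self, key) -> int:
        """Pick the bin for a key; only this bin is ever searched."""
        return hash(key) % len(self.bins)

    def insert(self, key, value) -> None:
        bucket = self.bins[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def lookup(self, key):
        """Return (value or None, number of key comparisons performed)."""
        comparisons = 0
        for k, v in self.bins[self._index(key)]:
            comparisons += 1
            if k == key:
                return v, comparisons
        return None, comparisons
```

    With 1000 integer keys spread over 100 bins, each bin holds 10 entries in this model, so any lookup costs at most 10 comparisons, versus up to 1000 for a linear scan.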