Search CORE

4,324 research outputs found

Robust low-power digital circuit design in nano-CMOS technologies

Author: Azam Touqeer
Publication venue
Publication date: 01/01/2011
Field of study

Device scaling has resulted in large scale integrated, high performance, low-power, and low cost systems. However the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications on digital circuit design by causing timing uncertainties in combinational circuits, degrading yield and reliability of memory elements, and increasing power density due to slow scaling of supply voltage. Conventional design methods add large pessimistic safety margins to mitigate increased variability, however, they incur large power and performance loss as the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability that can significantly minimize power and performance losses. We demonstrated by simulations two delay sensor designs to detect timing failures in advance that can be coupled with different compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations that requires handcrafted design to meet area, power, and performance requirements. We have proposed novel 6 transistors (6T), 7 transistors (7T), and 8 transistors (8T)-SRAM cells that enable variability tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage due to device mismatch arising from high variability increases delay and power consumption of SRAM design. We have proposed two novel design techniques to reduce offset voltage dependent delays providing a high speed low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to a low-power reliable design. We have investigated novel segmented supply voltage architecture to reduce leakage power of the SRAM caches since they occupy bulk of the total chip area and power. Future work involves developing leakage reduction methods for the combination logic designs including SRAM peripherals

Glasgow Theses Service

OpenGrey Repository

근사 컴퓨팅을 이용한 회로 노화 보상과 에너지 효율적인 신경망 구현

Author: 김희수
Publication venue: 서울대학교 대학원
Publication date: 01/08/2020
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 이혁재.Approximate computing reduces the cost (energy and/or latency) of computations by relaxing the correctness (i.e., precision) of computations up to the level, which is dependent on types of applications. Moreover, it can be realized in various hierarchies of computing system design from circuit level to application level. This dissertation presents the methodologies applying approximate computing across such hierarchies; compensating aging-induced delay in logic circuit by dynamic computation approximation (Chapter 1), designing energy-efficient neural network by combining low-power and low-latency approximate neuron models (Chapter 2), and co-designing in-memory gradient descent module with neural processing unit so as to address a memory bottleneck incurred by memory I/O for high-precision data (Chapter 3). The first chapter of this dissertation presents a novel design methodology to turn the timing violation caused by aging into computation approximation error without the reliability guardband or increasing the supply voltage. It can be realized by accurately monitoring the critical path delay at run-time. The proposal is evaluated at two levels: RTL component level and system level. The experimental results at the RTL component level show a significant improvement in terms of (normalized) mean squared error caused by the timing violation and, at the system level, show that the proposed approach successfully transforms the aging-induced timing violation errors into much less harmful computation approximation errors, therefore it recovers image quality up to perceptually acceptable levels. It reduces the dynamic and static power consumption by 21.45% and 10.78%, respectively, with 0.8% area overhead compared to the conventional approach. The second chapter of this dissertation presents an energy-efficient neural network consisting of alternative neuron models; Stochastic-Computing (SC) and Spiking (SP) neuron models. SC has been adopted in various fields to improve the power efficiency of systems by performing arithmetic computations stochastically, which approximates binary computation in conventional computing systems. Moreover, a recent work showed that deep neural network (DNN) can be implemented in the manner of stochastic computing and it greatly reduces power consumption. However, Stochastic DNN (SC-DNN) suffers from problem of high latency as it processes only a bit per cycle. To address such problem, it is proposed to adopt Spiking DNN (SP-DNN) as an input interface for SC-DNN since SP effectively processes more bits per cycle than SC-DNN. Moreover, this chapter resolves the encoding mismatch problem, between two different neuron models, without hardware cost by compensating the encoding mismatch with synapse weight calibration. A resultant hybrid DNN (SPSC-DNN) consists of SP-DNN as bottom layers and SC-DNN as top layers. Exploiting the reduced latency from SP-DNN and low-power consumption from SC-DNN, the proposed SPSC-DNN achieves improved energy-efficiency with lower error-rate compared to SC-DNN and SP-DNN in same network configuration. The third chapter of this dissertation proposes GradPim architecture, which accelerates the parameter updates by in-memory processing which is codesigned with 8-bit floating-point training in Neural Processing Unit (NPU) for deep neural networks. By keeping the high precision processing algorithms in memory, such as the parameter update incorporating high-precision weights in its computation, the GradPim architecture can achieve high computational efficiency using 8-bit floating point in NPU and also gain power efficiency by eliminating massive high-precision data transfers between NPU and off-chip memory. A simple extension of DDR4 SDRAM utilizing bank-group parallelism makes the operation designs in processing-in-memory (PIM) module efficient in terms of hardware cost and performance. The experimental results show that the proposed architecture can improve the performance of the parameter update phase in the training by up to 40% and greatly reduce the memory bandwidth requirement while posing only a minimal amount of overhead to the protocol and the DRAM area.근사 컴퓨팅은 연산의 정확도의 손실을 어플리케이션 별 적절한 수준까지 허용함으로써 연산에 필요한 비용 (에너지나 지연시간)을 줄인다. 게다가, 근사 컴퓨팅은 컴퓨팅 시스템 설계의 회로 계층부터 어플리케이션 계층까지 다양한 계층에 적용될 수 있다. 본 논문에서는 근사 컴퓨팅 방법론을 다양한 시스템 설계의 계층에 적용하여 전력과 에너지 측면에서 이득을 얻을 수 있는 방법들을 제안하였다. 이는, 연산 근사화 (computation Approximation)를 통해 회로의 노화로 인해 증가된 지연시간을 추가적인 전력소모 없이 보상하는 방법과 (챕터 1), 근사 뉴런모델 (approximate neuron model)을 이용해 에너지 효율이 높은 신경망을 구성하는 방법 (챕터 2), 그리고 메모리 대역폭으로 인한 병목현상 문제를 높은 정확도 데이터를 활용한 연산을 메모리 내에서 수행함으로써 완화시키는 방법을 (챕터3) 제안하였다. 첫 번째 챕터는 회로의 노화로 인한 지연시간위반을 (timing violation) 설계마진이나 (reliability guardband) 공급전력의 증가 없이 연산오차 (computation approximation error)를 통해 보상하는 설계방법론 (design methodology)를 제안하였다. 이를 위해 주요경로의 (critical path) 지연시간을 동작시간에 정확하게 측정할 필요가 있다. 여기서 제안하는 방법론은 RTL component와 system 단계에서 평가되었다. RTL component 단계의 실험결과를 통해 제안한 방식이 표준화된 평균제곱오차를 (normalized mean squared error) 상당히 줄였음을 볼 수 있다. 그리고 system 단계에서는 이미지처리 시스템에서 이미지의 품질이 인지적으로 충분히 회복되는 것을 보임으로써 회로노화로 인해 발생한 지연시간위반 오차가 에러의 크기가 작은 연산오차로 변경되는 것을 확인 할 수 있었다. 결론적으로, 제안된 방법론을 따랐을 때 0.8%의 공간을 (area) 더 사용하는 비용을 지불하고 21.45%의 동적전력소모와 (dynamic power consumption) 10.78%의 정적전력소모의 (static power consumption) 감소를 달성할 수 있었다. 두 번째 챕터는 근사 뉴런모델을 활용하는 고-에너지효율의 신경망을 (neural network) 제안하였다. 본 논문에서 사용한 두 가지의 근사 뉴런모델은 확률컴퓨팅과 (stochastic computing) 스파이킹뉴런 (spiking neuron) 이론들을 기반으로 모델링되었다. 확률컴퓨팅은 산술연산들을 확률적으로 수행함으로써 이진연산을 낮은 전력소모로 수행한다. 최근에 확률컴퓨팅 뉴런모델을 이용하여 심층 신경망 (deep neural network)를 구현할 수 있다는 연구가 진행되었다. 그러나, 확률컴퓨팅을 뉴런모델링에 활용할 경우 심층신경망이 매 클락사이클마다 (clock cycle) 하나의 비트만을 (bit) 처리하므로, 지연시간 측면에서 매우 나쁠 수 밖에 없는 문제가 있다. 따라서 본 논문에서는 이러한 문제를 해결하기 위하여 스파이킹 뉴런모델로 구성된 스파이킹 심층신경망을 확률컴퓨팅을 활용한 심층신경망 구조와 결합하였다. 스파이킹 뉴런모델의 경우 매 클락사이클마다 여러 비트를 처리할 수 있으므로 심층신경망의 입력 인터페이스로 사용될 경우 지연시간을 줄일 수 있다. 하지만, 확률컴퓨팅 뉴런모델과 스파이킹 뉴런모델의 경우 부호화 (encoding) 방식이 다른 문제가 있다. 따라서 본 논문에서는 해당 부호화 불일치 문제를 모델의 파라미터를 학습할 때 고려함으로써, 파라미터들의 값이 부호화 불일치를 고려하여 조절 (calibration) 될 수 있도록 하여 문제를 해결하였다. 이러한 분석의 결과로, 앞 쪽에는 스파이킹 심층신경망을 배치하고 뒷 쪽애는 확률컴퓨팅 심층신경망을 배치하는 혼성신경망을 제안하였다. 혼성신경망은 스파이킹 심층신경망을 통해 매 클락사이클마다 처리되는 비트 양의 증가로 인한 지연시간 감소 효과와 확률컴퓨팅 심층신경망의 저전력 소모 특성을 모두 활용함으로써 각 심층신경망을 따로 사용하는 경우 대비 우수한 에너지 효율성을 비슷하거나 더 나은 정확도 결과를 내면서 달성한다. 세 번째 챕터는 심층신경망을 8비트 부동소숫점 연산으로 학습하는 신경망처리유닛의 (neural processing unit) 파라미터 갱신을 (parameter update) 메모리-내-연산으로 (in-memory processing) 가속하는 GradPIM 아키텍쳐를 제안하였다. GradPIM은 8비트의 낮은 정확도 연산은 신경망처리유닛에 남기고, 높은 정확도를 가지는 데이터를 활용하는 연산은 (파라미터 갱신) 메모리 내부에 둠으로써 신경망처리유닛과 메모리간의 데이터통신의 양을 줄여, 높은 연산효율과 전력효율을 달성하였다. 또한, GradPIM은 bank-group 수준의 병렬화를 이루어 내 높은 내부 대역폭을 활용함으로써 메모리 대역폭을 크게 확장시킬 수 있게 되었다. 또한 이러한 메모리 구조의 변경이 최소화되었기 때문에 추가적인 하드웨어 비용도 최소화되었다. 실험 결과를 통해 GradPIM이 최소한의 DRAM 프로토콜 변화와 DRAM칩 내의 공간사용을 통해 심층신경망 학습과정 중 파라미터 갱신에 필요한 시간을 40%만큼 향상시켰음을 보였다.Chapter I: Dynamic Computation Approximation for Aging Compensation 1 1.1 Introduction 1 1.1.1 Chip Reliability 1 1.1.2 Reliability Guardband 2 1.1.3 Approximate Computing in Logic Circuits 2 1.1.4 Computation approximation for Aging Compensation 3 1.1.5 Motivational Case Study 4 1.2 Previous Work 5 1.2.1 Aging-induced Delay 5 1.2.2 Delay-Configurable Circuits 6 1.3 Proposed System 8 1.3.1 Overview of the Proposed System 8 1.3.2 Proposed Adder 9 1.3.3 Proposed Multiplier 11 1.3.4 Proposed Monitoring Circuit 16 1.3.5 Aging Compensation Scheme 19 1.4 Design Methodology 20 1.5 Evaluation 24 1.5.1 Experimental setup 24 1.5.2 RTL component level Adder/Multiplier 27 1.5.3 RTL component level Monitoring circuit 30 1.5.4 System level 31 1.6 Summary 38 Chapter II: Energy-Efficient Neural Network by Combining Approximate Neuron Models 40 2.1 Introduction 40 2.1.1 Deep Neural Network (DNN) 40 2.1.2 Low-power designs for DNN 41 2.1.3 Stochastic-Computing Deep Neural Network 41 2.1.4 Spiking Deep Neural Network 43 2.2 Hybrid of Stochastic and Spiking DNNs 44 2.2.1 Stochastic-Computing vs Spiking Deep Neural Network 44 2.2.2 Combining Spiking Layers and Stochastic Layers 46 2.2.3 Encoding Mismatch 47 2.3 Evaluation 49 2.3.1 Latency and Test Error 49 2.3.2 Energy Efficiency 51 2.4 Summary 54 Chapter III: GradPIM: In-memory Gradient Descent in Mixed-Precision DNN Training 55 3.1 Introduction 55 3.1.1 Neural Processing Unit 55 3.1.2 Mixed-precision Training 56 3.1.3 Mixed-precision Training with In-memory Gradient Descent 57 3.1.4 DNN Parameter Update Algorithms 59 3.1.5 Modern DRAM Architecture 61 3.1.6 Motivation 63 3.2 Previous Work 65 3.2.1 Processing-In-Memory 65 3.2.2 Co-design Neural Processing Unit and Processing-In-Memory 66 3.2.3 Low-precision Computation in NPU 67 3.3 GradPIM 68 3.3.1 GradPIM Architecture 68 3.3.2 GradPIM Operations 69 3.3.3 Timing Considerations 70 3.3.4 Update Phase Procedure 73 3.3.5 Commanding GradPIM 75 3.4 NPU Co-design with GradPIM 76 3.4.1 NPU Architecture 76 3.4.2 Data Placement 79 3.5 Evaluation 82 3.5.1 Evaluation Methodology 82 3.5.2 Experimental Results 83 3.5.3 Sensitivity Analysis 88 3.5.4 Layer Characterizations 90 3.5.5 Distributed Data Parallelism 90 3.6 Summary 92 3.6.1 Discussion 92 Bibliography 113 요약 114Docto

SNU Open Repository and Archive

Adaptive voltage scaling in a heterogeneous FPGA device with memory and logic in-situ detectors

Author: Nunez-Yanez Jose
Publication venue: 'Elsevier BV'
Publication date: 01/06/2017
Field of study

Explore Bristol Research

Designing energy-efficient computing systems using equalization and machine learning

Author: Takhirov Zafar
Publication venue
Publication date: 20/02/2018
Field of study

As technology scaling slows down in the nanometer CMOS regime and mobile computing becomes more ubiquitous, designing energy-efficient hardware for mobile systems is becoming increasingly critical and challenging. Although various approaches like near-threshold computing (NTC), aggressive voltage scaling with shadow latches, etc. have been proposed to get the most out of limited battery life, there is still no “silver bullet” to increasing power-performance demands of the mobile systems. Moreover, given that a mobile system could operate in a variety of environmental conditions, like different temperatures, have varying performance requirements, etc., there is a growing need for designing tunable/reconfigurable systems in order to achieve energy-efficient operation. In this work we propose to address the energy- efficiency problem of mobile systems using two different approaches: circuit tunability and distributed adaptive algorithms. Inspired by the communication systems, we developed feedback equalization based digital logic that changes the threshold of its gates based on the input pattern. We showed that feedback equalization in static complementary CMOS logic enabled up to 20% reduction in energy dissipation while maintaining the performance metrics. We also achieved 30% reduction in energy dissipation for pass-transistor digital logic (PTL) with equalization while maintaining performance. In addition, we proposed a mechanism that leverages feedback equalization techniques to achieve near optimal operation of static complementary CMOS logic blocks over the entire voltage range from near threshold supply voltage to nominal supply voltage. Using energy-delay product (EDP) as a metric we analyzed the use of the feedback equalizer as part of various sequential computational blocks. Our analysis shows that for near-threshold voltage operation, when equalization was used, we can improve the operating frequency by up to 30%, while the energy increase was less than 15%, with an overall EDP reduction of ≈10%. We also observe an EDP reduction of close to 5% across entire above-threshold voltage range. On the distributed adaptive algorithm front, we explored energy-efficient hardware implementation of machine learning algorithms. We proposed an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification operations for mobile systems. Our approach takes advantage of varying classification hardness across data to dynamically allocate resources and improve energy efficiency. On average, our adaptive classifier is ≈100× more energy efficient but has ≈1% higher error rate than a complex radial basis function classifier and is ≈10× less energy efficient but has ≈40% lower error rate than a simple linear classifier across a wide range of classification data sets. We also developed a field of groves (FoG) implementation of random forests (RF) that achieves an accuracy comparable to Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) under tight energy budgets. The FoG architecture takes advantage of the fact that in random forests a small portion of the weak classifiers (decision trees) might be sufficient to achieve high statistical performance. By dividing the random forest into smaller forests (Groves), and conditionally executing the rest of the forest, FoG is able to achieve much higher energy efficiency levels for comparable error rates. We also take advantage of the distributed nature of the FoG to achieve high level of parallelism. Our evaluation shows that at maximum achievable accuracies FoG consumes ≈1.48×, ≈24×, ≈2.5×, and ≈34.7× lower energy per classification compared to conventional RF, SVM-RBF , Multi-Layer Perceptron Network (MLP), and CNN, respectively. FoG is 6.5× less energy efficient than SVM-LR, but achieves 18% higher accuracy on average across all considered datasets

Boston University Institutional Repository (OpenBU)

Building Reservoir Computing Hardware Using Low Energy-Barrier Magnetics

Author: Abeed M. A.
Cai J.
Daniels M. W.
Ganguly S.
Ganguly S.
Jaeger
Jalalvand A.
Sengupta A.
Yamazaki
Zhang D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/07/2020
Field of study

Biologically inspired recurrent neural networks, such as reservoir computers are of interest in designing spatio-temporal data processors from a hardware point of view due to the simple learning scheme and deep connections to Kalman filters. In this work we discuss using in-depth simulation studies a way to construct hardware reservoir computers using an analog stochastic neuron cell built from a low energy-barrier magnet based magnetic tunnel junction and a few transistors. This allows us to implement a physical embodiment of the mathematical model of reservoir computers. Compact implementation of reservoir computers using such devices may enable building compact, energy-efficient signal processors for standalone or in-situ machine cognition in edge devices.Comment: To be presented at International Conference on Neuromorphic Systems 202

arXiv.org e-Print Archive

Crossref

Timing-Error Tolerance Techniques for Low-Power DSP: Filters and Transforms

Author: Whatmough PN
Publication venue: UCL (University College London)
Publication date: 28/09/2012
Field of study

Low-power Digital Signal Processing (DSP) circuits are critical to commercial System-on-Chip design for battery powered devices. Dynamic Voltage Scaling (DVS) of digital circuits can reclaim worst-case supply voltage margins for delay variation, reducing power consumption. However, removing static margins without compromising robustness is tremendously challenging, especially in an era of escalating reliability concerns due to continued process scaling. The Razor DVS scheme addresses these concerns, by ensuring robustness using explicit timing-error detection and correction circuits. Nonetheless, the design of low-complexity and low-power error correction is often challenging. In this thesis, the Razor framework is applied to fixed-precision DSP filters and transforms. The inherent error tolerance of many DSP algorithms is exploited to achieve very low-overhead error correction. Novel error correction schemes for DSP datapaths are proposed, with very low-overhead circuit realisations. Two new approximate error correction approaches are proposed. The first is based on an adapted sum-of-products form that prevents errors in intermediate results reaching the output, while the second approach forces errors to occur only in less significant bits of each result by shaping the critical path distribution. A third approach is described that achieves exact error correction using time borrowing techniques on critical paths. Unlike previously published approaches, all three proposed are suitable for high clock frequency implementations, as demonstrated with fully placed and routed FIR, FFT and DCT implementations in 90nm and 32nm CMOS. Design issues and theoretical modelling are presented for each approach, along with SPICE simulation results demonstrating power savings of 21 – 29%. Finally, the design of a baseband transmitter in 32nm CMOS for the Spectrally Efficient FDM (SEFDM) system is presented. SEFDM systems offer bandwidth savings compared to Orthogonal FDM (OFDM), at the cost of increased complexity and power consumption, which is quantified with the first VLSI architecture

UCL Discovery

Energy Efficient Neocortex-Inspired Systems with On-Device Learning

Author: Zyarah Abdullah M
Publication venue: RIT Scholar Works
Publication date: 01/09/2020
Field of study

Shifting the compute workloads from cloud toward edge devices can significantly improve the overall latency for inference and learning. On the contrary this paradigm shift exacerbates the resource constraints on the edge devices. Neuromorphic computing architectures, inspired by the neural processes, are natural substrates for edge devices. They offer co-located memory, in-situ training, energy efficiency, high memory density, and compute capacity in a small form factor. Owing to these features, in the recent past, there has been a rapid proliferation of hybrid CMOS/Memristor neuromorphic computing systems. However, most of these systems offer limited plasticity, target either spatial or temporal input streams, and are not demonstrated on large scale heterogeneous tasks. There is a critical knowledge gap in designing scalable neuromorphic systems that can support hybrid plasticity for spatio-temporal input streams on edge devices. This research proposes Pyragrid, a low latency and energy efficient neuromorphic computing system for processing spatio-temporal information natively on the edge. Pyragrid is a full-scale custom hybrid CMOS/Memristor architecture with analog computational modules and an underlying digital communication scheme. Pyragrid is designed for hierarchical temporal memory, a biomimetic sequence memory algorithm inspired by the neocortex. It features a novel synthetic synapses representation that enables dynamic synaptic pathways with reduced memory usage and interconnects. The dynamic growth in the synaptic pathways is emulated in the memristor device physical behavior, while the synaptic modulation is enabled through a custom training scheme optimized for area and power. Pyragrid features data reuse, in-memory computing, and event-driven sparse local computing to reduce data movement by ~44x and maximize system throughput and power efficiency by ~3x and ~161x over custom CMOS digital design. The innate sparsity in Pyragrid results in overall robustness to noise and device failure, particularly when processing visual input and predicting time series sequences. Porting the proposed system on edge devices can enhance their computational capability, response time, and battery life

RIT Scholar Works

Integrated Circuit Design for Radiation Sensing and Hardening.

Author: Kwon Inyong
Publication venue
Publication date
Field of study

Beyond the 1950s, integrated circuits have been widely used in a number of electronic devices surrounding people’s lives. In addition to computing electronics, scientific and medical equipment have also been undergone a metamorphosis, especially in radiation related fields where compact and precision radiation detection systems for nuclear power plants, positron emission tomography (PET), and radiation hardened by design (RHBD) circuits for space applications fabricated in advanced manufacturing technologies are exposed to the non-negligible probability of soft errors by radiation impact events. The integrated circuit design for radiation measurement equipment not only leads to numerous advantages on size and power consumption, but also raises many challenges regarding the speed and noise to replace conventional design modalities. This thesis presents solutions to front-end receiver designs for radiation sensors as well as an error detection and correction method to microprocessor designs under the condition of soft error occurrence. For the first preamplifier design, a novel technique that enhances the bandwidth and suppresses the input current noise by using two inductors is discussed. With the dual-inductor TIA signal processing configuration, one can reduce the fabrication cost, the area overhead, and the power consumption in a fast readout package. The second front-end receiver is a novel detector capacitance compensation technique by using the Miller effect. The fabricated CSA exhibits minimal variation in the pulse shape as the detector capacitance is increased. Lastly, a modified D flip-flop is discussed that is called Razor-Lite using charge-sharing at internal nodes to provide a compact EDAC design for modern well-balanced processors and RHBD against soft errors by SEE.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/111548/1/iykwon_1.pd

Deep Blue Documents at the University of Michigan

Health monitoring and life-time prognostics to enable dependable many-processor S0Cs

Author: Zhao Yong
Publication venue: University of Twente
Publication date: 12/12/2019
Field of study

University of Twente Research Information