Search CORE

172 research outputs found

The Logic of Random Pulses: Stochastic Computing.

Author: Alaghi Armin
Publication venue
Publication date
Field of study

Recent developments in the field of electronics have produced nano-scale devices whose operation can only be described in probabilistic terms. In contrast with the conventional deterministic computing that has dominated the digital world for decades, we investigate a fundamentally different technique that is probabilistic by nature, namely, stochastic computing (SC). In SC, numbers are represented by bit-streams of 0's and 1's, in which the probability of seeing a 1 denotes the value of the number. The main benefit of SC is that complicated arithmetic computation can be performed by simple logic circuits. For example, a single (logic) AND gate performs multiplication. The dissertation begins with a comprehensive survey of SC and its applications. We highlight its main challenges, which include long computation time and low accuracy, as well as the lack of general design methods. We then address some of the more important challenges. We introduce a new SC design method, called STRAUSS, that generates efficient SC circuits for arbitrary target functions. We then address the problems arising from correlation among stochastic numbers (SNs). In particular, we show that, contrary to general belief, correlation can sometimes serve as a resource in SC design. We also show that unlike conventional circuits, SC circuits can tolerate high error rates and are hence useful in some new applications that involve nondeterministic behavior in the underlying circuitry. Finally, we show how SC's properties can be exploited in the design of an efficient vision chip that is suitable for retinal implants. In particular, we show that SC circuits can directly operate on signals with neural encoding, which eliminates the need for data conversion.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113561/1/alaghi_1.pd

Deep Blue Documents at the University of Michigan

근사 컴퓨팅을 이용한 회로 노화 보상과 에너지 효율적인 신경망 구현

Author: 김희수
Publication venue: 서울대학교 대학원
Publication date: 01/08/2020
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 이혁재.Approximate computing reduces the cost (energy and/or latency) of computations by relaxing the correctness (i.e., precision) of computations up to the level, which is dependent on types of applications. Moreover, it can be realized in various hierarchies of computing system design from circuit level to application level. This dissertation presents the methodologies applying approximate computing across such hierarchies; compensating aging-induced delay in logic circuit by dynamic computation approximation (Chapter 1), designing energy-efficient neural network by combining low-power and low-latency approximate neuron models (Chapter 2), and co-designing in-memory gradient descent module with neural processing unit so as to address a memory bottleneck incurred by memory I/O for high-precision data (Chapter 3). The first chapter of this dissertation presents a novel design methodology to turn the timing violation caused by aging into computation approximation error without the reliability guardband or increasing the supply voltage. It can be realized by accurately monitoring the critical path delay at run-time. The proposal is evaluated at two levels: RTL component level and system level. The experimental results at the RTL component level show a significant improvement in terms of (normalized) mean squared error caused by the timing violation and, at the system level, show that the proposed approach successfully transforms the aging-induced timing violation errors into much less harmful computation approximation errors, therefore it recovers image quality up to perceptually acceptable levels. It reduces the dynamic and static power consumption by 21.45% and 10.78%, respectively, with 0.8% area overhead compared to the conventional approach. The second chapter of this dissertation presents an energy-efficient neural network consisting of alternative neuron models; Stochastic-Computing (SC) and Spiking (SP) neuron models. SC has been adopted in various fields to improve the power efficiency of systems by performing arithmetic computations stochastically, which approximates binary computation in conventional computing systems. Moreover, a recent work showed that deep neural network (DNN) can be implemented in the manner of stochastic computing and it greatly reduces power consumption. However, Stochastic DNN (SC-DNN) suffers from problem of high latency as it processes only a bit per cycle. To address such problem, it is proposed to adopt Spiking DNN (SP-DNN) as an input interface for SC-DNN since SP effectively processes more bits per cycle than SC-DNN. Moreover, this chapter resolves the encoding mismatch problem, between two different neuron models, without hardware cost by compensating the encoding mismatch with synapse weight calibration. A resultant hybrid DNN (SPSC-DNN) consists of SP-DNN as bottom layers and SC-DNN as top layers. Exploiting the reduced latency from SP-DNN and low-power consumption from SC-DNN, the proposed SPSC-DNN achieves improved energy-efficiency with lower error-rate compared to SC-DNN and SP-DNN in same network configuration. The third chapter of this dissertation proposes GradPim architecture, which accelerates the parameter updates by in-memory processing which is codesigned with 8-bit floating-point training in Neural Processing Unit (NPU) for deep neural networks. By keeping the high precision processing algorithms in memory, such as the parameter update incorporating high-precision weights in its computation, the GradPim architecture can achieve high computational efficiency using 8-bit floating point in NPU and also gain power efficiency by eliminating massive high-precision data transfers between NPU and off-chip memory. A simple extension of DDR4 SDRAM utilizing bank-group parallelism makes the operation designs in processing-in-memory (PIM) module efficient in terms of hardware cost and performance. The experimental results show that the proposed architecture can improve the performance of the parameter update phase in the training by up to 40% and greatly reduce the memory bandwidth requirement while posing only a minimal amount of overhead to the protocol and the DRAM area.근사 컴퓨팅은 연산의 정확도의 손실을 어플리케이션 별 적절한 수준까지 허용함으로써 연산에 필요한 비용 (에너지나 지연시간)을 줄인다. 게다가, 근사 컴퓨팅은 컴퓨팅 시스템 설계의 회로 계층부터 어플리케이션 계층까지 다양한 계층에 적용될 수 있다. 본 논문에서는 근사 컴퓨팅 방법론을 다양한 시스템 설계의 계층에 적용하여 전력과 에너지 측면에서 이득을 얻을 수 있는 방법들을 제안하였다. 이는, 연산 근사화 (computation Approximation)를 통해 회로의 노화로 인해 증가된 지연시간을 추가적인 전력소모 없이 보상하는 방법과 (챕터 1), 근사 뉴런모델 (approximate neuron model)을 이용해 에너지 효율이 높은 신경망을 구성하는 방법 (챕터 2), 그리고 메모리 대역폭으로 인한 병목현상 문제를 높은 정확도 데이터를 활용한 연산을 메모리 내에서 수행함으로써 완화시키는 방법을 (챕터3) 제안하였다. 첫 번째 챕터는 회로의 노화로 인한 지연시간위반을 (timing violation) 설계마진이나 (reliability guardband) 공급전력의 증가 없이 연산오차 (computation approximation error)를 통해 보상하는 설계방법론 (design methodology)를 제안하였다. 이를 위해 주요경로의 (critical path) 지연시간을 동작시간에 정확하게 측정할 필요가 있다. 여기서 제안하는 방법론은 RTL component와 system 단계에서 평가되었다. RTL component 단계의 실험결과를 통해 제안한 방식이 표준화된 평균제곱오차를 (normalized mean squared error) 상당히 줄였음을 볼 수 있다. 그리고 system 단계에서는 이미지처리 시스템에서 이미지의 품질이 인지적으로 충분히 회복되는 것을 보임으로써 회로노화로 인해 발생한 지연시간위반 오차가 에러의 크기가 작은 연산오차로 변경되는 것을 확인 할 수 있었다. 결론적으로, 제안된 방법론을 따랐을 때 0.8%의 공간을 (area) 더 사용하는 비용을 지불하고 21.45%의 동적전력소모와 (dynamic power consumption) 10.78%의 정적전력소모의 (static power consumption) 감소를 달성할 수 있었다. 두 번째 챕터는 근사 뉴런모델을 활용하는 고-에너지효율의 신경망을 (neural network) 제안하였다. 본 논문에서 사용한 두 가지의 근사 뉴런모델은 확률컴퓨팅과 (stochastic computing) 스파이킹뉴런 (spiking neuron) 이론들을 기반으로 모델링되었다. 확률컴퓨팅은 산술연산들을 확률적으로 수행함으로써 이진연산을 낮은 전력소모로 수행한다. 최근에 확률컴퓨팅 뉴런모델을 이용하여 심층 신경망 (deep neural network)를 구현할 수 있다는 연구가 진행되었다. 그러나, 확률컴퓨팅을 뉴런모델링에 활용할 경우 심층신경망이 매 클락사이클마다 (clock cycle) 하나의 비트만을 (bit) 처리하므로, 지연시간 측면에서 매우 나쁠 수 밖에 없는 문제가 있다. 따라서 본 논문에서는 이러한 문제를 해결하기 위하여 스파이킹 뉴런모델로 구성된 스파이킹 심층신경망을 확률컴퓨팅을 활용한 심층신경망 구조와 결합하였다. 스파이킹 뉴런모델의 경우 매 클락사이클마다 여러 비트를 처리할 수 있으므로 심층신경망의 입력 인터페이스로 사용될 경우 지연시간을 줄일 수 있다. 하지만, 확률컴퓨팅 뉴런모델과 스파이킹 뉴런모델의 경우 부호화 (encoding) 방식이 다른 문제가 있다. 따라서 본 논문에서는 해당 부호화 불일치 문제를 모델의 파라미터를 학습할 때 고려함으로써, 파라미터들의 값이 부호화 불일치를 고려하여 조절 (calibration) 될 수 있도록 하여 문제를 해결하였다. 이러한 분석의 결과로, 앞 쪽에는 스파이킹 심층신경망을 배치하고 뒷 쪽애는 확률컴퓨팅 심층신경망을 배치하는 혼성신경망을 제안하였다. 혼성신경망은 스파이킹 심층신경망을 통해 매 클락사이클마다 처리되는 비트 양의 증가로 인한 지연시간 감소 효과와 확률컴퓨팅 심층신경망의 저전력 소모 특성을 모두 활용함으로써 각 심층신경망을 따로 사용하는 경우 대비 우수한 에너지 효율성을 비슷하거나 더 나은 정확도 결과를 내면서 달성한다. 세 번째 챕터는 심층신경망을 8비트 부동소숫점 연산으로 학습하는 신경망처리유닛의 (neural processing unit) 파라미터 갱신을 (parameter update) 메모리-내-연산으로 (in-memory processing) 가속하는 GradPIM 아키텍쳐를 제안하였다. GradPIM은 8비트의 낮은 정확도 연산은 신경망처리유닛에 남기고, 높은 정확도를 가지는 데이터를 활용하는 연산은 (파라미터 갱신) 메모리 내부에 둠으로써 신경망처리유닛과 메모리간의 데이터통신의 양을 줄여, 높은 연산효율과 전력효율을 달성하였다. 또한, GradPIM은 bank-group 수준의 병렬화를 이루어 내 높은 내부 대역폭을 활용함으로써 메모리 대역폭을 크게 확장시킬 수 있게 되었다. 또한 이러한 메모리 구조의 변경이 최소화되었기 때문에 추가적인 하드웨어 비용도 최소화되었다. 실험 결과를 통해 GradPIM이 최소한의 DRAM 프로토콜 변화와 DRAM칩 내의 공간사용을 통해 심층신경망 학습과정 중 파라미터 갱신에 필요한 시간을 40%만큼 향상시켰음을 보였다.Chapter I: Dynamic Computation Approximation for Aging Compensation 1 1.1 Introduction 1 1.1.1 Chip Reliability 1 1.1.2 Reliability Guardband 2 1.1.3 Approximate Computing in Logic Circuits 2 1.1.4 Computation approximation for Aging Compensation 3 1.1.5 Motivational Case Study 4 1.2 Previous Work 5 1.2.1 Aging-induced Delay 5 1.2.2 Delay-Configurable Circuits 6 1.3 Proposed System 8 1.3.1 Overview of the Proposed System 8 1.3.2 Proposed Adder 9 1.3.3 Proposed Multiplier 11 1.3.4 Proposed Monitoring Circuit 16 1.3.5 Aging Compensation Scheme 19 1.4 Design Methodology 20 1.5 Evaluation 24 1.5.1 Experimental setup 24 1.5.2 RTL component level Adder/Multiplier 27 1.5.3 RTL component level Monitoring circuit 30 1.5.4 System level 31 1.6 Summary 38 Chapter II: Energy-Efficient Neural Network by Combining Approximate Neuron Models 40 2.1 Introduction 40 2.1.1 Deep Neural Network (DNN) 40 2.1.2 Low-power designs for DNN 41 2.1.3 Stochastic-Computing Deep Neural Network 41 2.1.4 Spiking Deep Neural Network 43 2.2 Hybrid of Stochastic and Spiking DNNs 44 2.2.1 Stochastic-Computing vs Spiking Deep Neural Network 44 2.2.2 Combining Spiking Layers and Stochastic Layers 46 2.2.3 Encoding Mismatch 47 2.3 Evaluation 49 2.3.1 Latency and Test Error 49 2.3.2 Energy Efficiency 51 2.4 Summary 54 Chapter III: GradPIM: In-memory Gradient Descent in Mixed-Precision DNN Training 55 3.1 Introduction 55 3.1.1 Neural Processing Unit 55 3.1.2 Mixed-precision Training 56 3.1.3 Mixed-precision Training with In-memory Gradient Descent 57 3.1.4 DNN Parameter Update Algorithms 59 3.1.5 Modern DRAM Architecture 61 3.1.6 Motivation 63 3.2 Previous Work 65 3.2.1 Processing-In-Memory 65 3.2.2 Co-design Neural Processing Unit and Processing-In-Memory 66 3.2.3 Low-precision Computation in NPU 67 3.3 GradPIM 68 3.3.1 GradPIM Architecture 68 3.3.2 GradPIM Operations 69 3.3.3 Timing Considerations 70 3.3.4 Update Phase Procedure 73 3.3.5 Commanding GradPIM 75 3.4 NPU Co-design with GradPIM 76 3.4.1 NPU Architecture 76 3.4.2 Data Placement 79 3.5 Evaluation 82 3.5.1 Evaluation Methodology 82 3.5.2 Experimental Results 83 3.5.3 Sensitivity Analysis 88 3.5.4 Layer Characterizations 90 3.5.5 Distributed Data Parallelism 90 3.6 Summary 92 3.6.1 Discussion 92 Bibliography 113 요약 114Docto

SNU Open Repository and Archive

Removing constant‐induced errors in stochastic circuits

Author: Hayes John P.
Ting Paishun
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/05/2019
Field of study

Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/163790/1/cdt2bf00226.pd

Deep Blue Documents at the University of Michigan

The hardware implementation of an artificial neural network using stochastic pulse rate encoding principles

Author: Glover John Sigsworth
Publication venue
Publication date: 01/01/1995
Field of study

In this thesis the development of a hardware artificial neuron device and artificial neural network using stochastic pulse rate encoding principles is considered. After a review of neural network architectures and algorithmic approaches suitable for hardware implementation, a critical review of hardware techniques which have been considered in analogue and digital systems is presented. New results are presented demonstrating the potential of two learning schemes which adapt by the use of a single reinforcement signal. The techniques for computation using stochastic pulse rate encoding are presented and extended with new novel circuits relevant to the hardware implementation of an artificial neural network. The generation of random numbers is the key to the encoding of data into the stochastic pulse rate domain. The formation of random numbers and multiple random bit sequences from a single PRBS generator have been investigated. Two techniques, Simulated Annealing and Genetic Algorithms, have been applied successfully to the problem of optimising the configuration of a PRBS random number generator for the formation of multiple random bit sequences and hence random numbers. A complete hardware design for an artificial neuron using stochastic pulse rate encoded signals has been described, designed, simulated, fabricated and tested before configuration of the device into a network to perform simple test problems. The implementation has shown that the processing elements of the artificial neuron are small and simple, but that there can be a significant overhead for the encoding of information into the stochastic pulse rate domain. The stochastic artificial neuron has the capability of on-line weight adaption. The implementation of reinforcement schemes using the stochastic neuron as a basic element are discussed

Durham e-Theses

Metal oxides of resistive memories investigated by electron and ion backscattering

Author: Marmitt Gabriel Guterres
Publication venue
Publication date: 01/01/2017
Field of study

The memristor is one of the most promising devices being studied for multiple uses in future electronic systems, with applications ranging from nonvolatile memories to artificial neural networks. Its working is based on the forming and rupturing of nano-scaled conductive filaments, which drastically alters the device’s resistance. These filaments are formed by oxygen vacancy accumulation, hence a deep understanding of the self-diffusion of oxygen in these systems is necessary. Accurate measurements of oxygen self-diffusion on metal oxides was achieved with the development of a quantitative analysis of the energy spectrum of the backscattering of electrons. The novel technique called Electron Rutherford Backscattering Spectroscopy (ERBS) uses the scattering of high energy electrons ( 40 keV) to probe the sample’s near surface (10–100 nm). Measurements of the high energy loss region – called Reflection High-Energy Electron Loss Spectroscopy (RHEELS) – also exhibit characteristics of the material’s electronic structure. A careful procedure was developed for the fitting of ERBS spectra, which was then applied on the analysis of multi-layered samples of Si3N4/TiO2, and measurements of the band gap of common oxides, such as SiO2, CaCO3 and Li2CO3. Monte Carlo simulations were employed to study the effects of multiple elastic scatterings in ERBS spectra, and a dielectric function description of inelastic scatterings extended the simulation to also consider the plasmon excitation peaks observed in RHEELS. These analysis tools were integrated into a package named PowerInteraction. With its use, a series of measurements of oxygen self-diffusion in TiO2 were conducted. The samples were composed of two sputtered deposited TiO2 layers, one of which was enriched with the 18 mass oxygen isotope. After thermal annealing, diffusion profiles were obtained by tracking the relative concentration of oxygen isotopes in both films. From the logarithmic temperature dependence of the diffusion coefficients, an activation energy of 1.05 eV for oxygen self-diffusion in TiO2 was obtained. Common ion beam analysis, such as RBS and NRA/NRP (Nuclear Reaction Analysis/Profiling), were also used to provide complementary information.O memristor é um dos dispositivos mais promissores sendo estudados para múltiplos usos em sistemas eletrônicos, com aplicações desde memórias não voláteis a redes neurais artificiais. Seu funcionamento é baseado na formação e ruptura de filamentos condutores nanométricos, o que altera drasticamente a resistência do dispositivo. Estes filamentos são formados pela acumulação de vacâncias de oxigênio, portanto um profundo entendimento da autodifusão de oxigênio nestes sistemas é necessário. Medidas acuradas da difusão em óxidos metálicos foi obtida com o desenvolvimento de uma análise quantitativa do espectro em energia de elétrons retroespalhados. A inovadora técnica de RBS de elétrons (ERBS) utiliza elétrons de alta energia ( 40 keV) para investigar a região próxima a superfície (10–100 nm). Medidas da região de alta perda de energia – chamada de Spectroscopia de Perda de Alta-Energia de Elétrons Refletidos (RHEELS) – também exibe características da estrutura eletrônica dos materiais. Um procedimento cuidadoso para o ajuste de espectros de ERBS foi desenvolvido, e então aplicado na análise de amostras multi camada de Si3N4/TiO2, e medidas de band gap de alguns óxidos, como SiO2, CaCO3 e Li2CO3. Simulações de Monte Carlo foram empregadas no estudo dos efeitos de espalhamento múltiplo nos espectros de ERBS, e uma descrição dielétrica dos espalhamentos inelásticos extendeu as simulação para também considerarem os picos de exitação plasmônica observados em RHEELS. Estas ferramentas de análise foram integradas em um pacote chamado PowerInteraction. Com o uso deste, uma série de medidas de autodifusão de oxigênio em TiO2 foram conduzidas. As amostras eram compostas por dois filmes de TiO2 depositados por sputtering, um dos quais enriquecido com isótopo 18 de oxigênio. Após tratamentos térmicos, perfis de difusão foram obtidos pelo rastreio das concentrações relativas dos isótopos de oxigênio nos dois filmes. Do comportamento logarítmico dos coeficientes de difusão em relação à temperatura, uma energia de ativação de 1.05 eV para a autodifusão de oxigênio em TiO2 foi obtida. Análises por feixes de íons, como RBS e NRA/NRP (Análise/Perfilometria por Reação Nuclear), também forneceram informações complementares

Lume 5.8

MOCAST 2021

Author
Publication venue: 'MDPI AG'
Publication date: 02/02/2023
Field of study

The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers in the conference. The topics of MOCAST include:Analog/RF and mixed signal circuits;Digital circuits and systems design;Nonlinear circuits and systems;Device and circuit modeling;High-performance embedded systems;Systems and applications;Sensors and systems;Machine learning and AI applications;Communication; Network systems;Power management;Imagers, MEMS, medical, and displays;Radiation front ends (nuclear and space application);Education in circuits, systems, and communications

Directory of Open Access Books (DOAB)

VLSI Design

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

This book provides some recent advances in design nanometer VLSI chips. The selected topics try to present some open problems and challenges with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, emerging 3D integration, and antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc

Directory of Open Access Books (DOAB)