235 research outputs found

    Hierarchical Temporal Memory using Memristor Networks: A Survey

    Full text link
    This paper presents a survey of currently available hardware designs for implementing the human-cortex-inspired algorithm Hierarchical Temporal Memory (HTM). The review focuses on state-of-the-art advances in memristive HTM implementations and related HTM applications. With the advent of edge computing, HTM is a potential algorithm for on-chip near-sensor data processing. A comparison of analog memristive circuit implementations with digital and mixed-signal solutions is provided, and the advantages of memristive HTM over digital implementations are discussed against performance metrics such as processing speed, on-chip area, and power dissipation. The limitations and open problems of memristive HTM are also discussed, including design scalability, sneak currents, leakage, parasitic effects, the lack of analog learning circuit implementations, and the unreliability of memristive devices integrated with CMOS circuits.
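
    To make the analog advantage concrete, here is a minimal software sketch, not taken from any surveyed design, of the operation a memristive crossbar naturally accelerates for HTM: the spatial pooler's overlap computation as a single vector-matrix multiply over device conductances. All names and sizes are illustrative.

```python
# A minimal sketch (illustrative, not from the survey) of the computation a
# memristive crossbar offloads for HTM's spatial pooler: overlap scores
# become one analog vector-matrix multiply over device conductances.
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_columns = 64, 32
permanence = rng.random((n_columns, n_inputs))   # synapse permanences in [0, 1]
connected = (permanence > 0.5).astype(float)     # binary "connected synapse" matrix

# In hardware, `connected` maps to high/low memristor conductances and the
# multiply is Ohm's-law current summation along each column wire.
x = (rng.random(n_inputs) > 0.8).astype(float)   # sparse binary input (SDR)
overlap = connected @ x                          # one crossbar read

# k-winners-take-all then selects the sparse set of active columns.
k = 4
active_columns = np.argsort(overlap)[-k:]
print(sorted(active_columns.tolist()))
```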

    Validating silicon polytrodes with paired juxtacellular recordings: method and dataset

    Get PDF
    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo "paired recordings" such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired recordings, which is available online. We propose that our novel targeting system, and the ever-expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single units, to the characterization of new electrode materials/designs, and to resolving nagging questions regarding the origin and nature of extracellular neural signals.
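
    As one illustration of what such a paired dataset enables, the following hedged sketch scores a spike detector against juxtacellular ground truth by matching spike times within a tolerance window; the function name and the 1 ms tolerance are our assumptions, not details from the paper.

```python
# Hedged sketch: match ground-truth juxtacellular spike times to polytrode
# detections within +/- tol_s; unmatched detections count as false positives.
import numpy as np

def match_spikes(truth_s, detected_s, tol_s=0.001):
    """Match each ground-truth spike (seconds) to an unused nearby detection
    within +/- tol_s; return (hits, misses, false_positives)."""
    detected = np.sort(np.asarray(detected_s))
    used = np.zeros(detected.size, dtype=bool)
    hits = 0
    for t in np.sort(np.asarray(truth_s)):
        i = np.searchsorted(detected, t)
        for j in (i - 1, i):                      # check both neighbors
            if 0 <= j < detected.size and not used[j] \
                    and abs(detected[j] - t) <= tol_s:
                used[j] = True
                hits += 1
                break
    return hits, len(truth_s) - hits, int((~used).sum())

hits, misses, false_pos = match_spikes([0.010, 0.052, 0.090],
                                       [0.0102, 0.0525, 0.0901, 0.2000])
print(hits, misses, false_pos)   # -> 3 0 1
```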

    A versatile circuit for emulating active biological dendrites applied to sound localisation and neuron imitation

    Full text link
    Sophisticated machine learning struggles to transition onto battery-operated devices due to the high power consumption of neural networks. Researchers have turned to neuromorphic engineering, inspired by biological neural networks, for more efficient solutions. While previous research has focused on artificial neurons and synapses, an essential component has been overlooked: dendrites. Dendrites transmit inputs from synapses to the neuron's soma, applying both passive and active transformations. However, neuromorphic circuits replace these sophisticated computational channels with metallic interconnects. In this study, we introduce a versatile circuit that emulates a segment of a dendrite and exhibits gain, introduces delays, and performs integration. We show how sound localisation, a biological example of dendritic computation, is not possible with existing passive dendrite circuits but can be achieved using the proposed circuit. We also find that dendrites can form bursting neurons. This significant discovery suggests the potential to fabricate neural networks comprised solely of dendrite circuits. Comment: 13 pages; 6 figures in main text, 1 figure in supplementary material.
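
    For readers unfamiliar with dendritic sound localisation, the following minimal sketch (ours, not the paper's circuit) shows the underlying delay-and-coincide computation: a bank of candidate delays stands in for dendritic delay segments, and the coincidence count peaks at the delay that cancels the interaural time difference. All values are illustrative.

```python
# Minimal sketch of delay-line coincidence detection for sound localisation:
# each candidate delay models a dendritic delay segment; the best-scoring
# delay equals the interaural time difference (ITD).
import numpy as np

fs = 100_000                       # 100 kHz time grid
itd_s = 200e-6                     # true ITD: 200 microseconds
spikes = np.zeros(1000)
spikes[[100, 350, 700]] = 1.0      # left-ear spike train
left = spikes
right = np.roll(spikes, int(itd_s * fs))          # right ear lags by the ITD

best_delay, best_score = None, -1.0
for d in range(0, 50):             # candidate dendritic delays, 0-490 us
    score = np.sum(np.roll(left, d) * right)      # coincidences at this delay
    if score > best_score:
        best_delay, best_score = d, score
print(f"estimated ITD: {best_delay / fs * 1e6:.0f} us")   # -> 200 us
```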

    ์–‘์žํ™”๋œ ํ•™์Šต์„ ํ†ตํ•œ ์ €์ „๋ ฅ ๋”ฅ๋Ÿฌ๋‹ ํ›ˆ๋ จ ๊ฐ€์†๊ธฐ ์„ค๊ณ„

    Get PDF
    Thesis (Ph.D.), Seoul National University, Graduate School of Convergence Science and Technology, Department of Convergence Science (Intelligent Convergence Systems major), February 2022. Advisor: 전동석 (Dongsuk Jeon). With the advent of the deep learning era, the computational needs for processing deep neural networks (DNNs) have increased dramatically, both for training networks on various tasks and for performing inference with trained networks in specific use cases.
    To address those needs, many custom hardware designs have been proposed, ranging from systems based on field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) for deployment inside data centers to acceleration blocks in systems-on-chip (SoCs) for low-power processing in mobile devices. In this dissertation, custom integrated-circuit hardware for energy-efficient training of neural networks is designed, fabricated, and measured to evaluate different methodologies for more energy-efficient processing under the same training-performance constraints. These methodologies fall into three categories: (1) Training algorithms. While standard deep neural network training is performed with the back-propagation (BP) algorithm, we investigate alternative training algorithms, such as neuromorphic learning algorithms with spiking neurons or bio-plausible algorithms with asymmetric feedback, whose computational properties enable more efficient hardware implementation. (2) Low-precision arithmetic. One of the most powerful methods for increasing efficiency in DNN accelerators is scaling numerical precision. While low-precision numerics for the inference phase of DNNs are well studied, training DNNs without performance degradation is considerably more challenging; this dissertation proposes a novel numerical scheme for training DNNs across various models and scenarios. (3) System implementation techniques. In the actual realization of a custom training system in integrated circuits, a nearly infinite design space leads to vastly different quality of results depending on the on-chip dataflow, system load balancing, acceleration and gating blocks, and so on. Design techniques that lead to better performance and efficiency are introduced. First, a neuromorphic learning system for classifying handwritten digits (MNIST) is introduced. This learning system aims to deliver low training overhead while maintaining the training performance of classical machine learning. To achieve this goal, a neuromorphic learning algorithm is modified for a lower operation count and smaller memory-buffer requirements while maintaining, or even improving, machine-learning performance. Moreover, implementation techniques such as an update-skipping mechanism and lock-free parameter updates lower the training overhead further, dynamically reducing the training energy overhead from 25.6% to 7.5%. With these methodologies, the system greatly improves the accuracy-energy trade-off of on-chip learning while showing learning performance close to that of classical DNN training through back-propagation. Second, a programmable DNN training processor with a custom numerical format is introduced. While prior DNN inference accelerators have utilized 8-bit integers, implementing 8-bit numerics in a training accelerator remained a challenge due to the higher precision requirements of the backward step of DNN training. To overcome this limitation, a custom 8-bit floating-point format, dubbed 8-bit floating point with shared exponent bias (FP8-SEB), is introduced in this dissertation.
    Moreover, a processing architecture built on 24-way fused-multiply-add (FMA) trees greatly increases processing energy efficiency per MAC and is complemented by a novel 2-dimensional routing data-path that exploits spatiality to increase data reuse in the forward, backward, and weight-gradient steps of convolutional neural networks. The DNN training processor is implemented with a custom vector processing unit, acceleration instructions, and DMA to external DRAM for end-to-end DNN training across various models and datasets. Compared against a prior low-precision training processor on ResNet-18 training, this work achieves 2.48× higher energy efficiency, 43% fewer DRAM accesses, and 0.8%p higher training accuracy. Both of the introduced designs were fabricated in real silicon and verified both in simulation and in physical measurement. The design methodologies are carefully evaluated using simulations of the fabricated chips and measurements of monitored data and power consumption under varying conditions that expose the design techniques in effect. Based on the obtained measurements, this dissertation quantitatively analyzes the efficiency of the bio-plausible training algorithms, the novel numerical formats, and the system implementation techniques.
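
    As a rough illustration of the shared-exponent-bias idea, the sketch below fake-quantizes a tensor with a per-tensor exponent bias chosen from its largest magnitude. The 1/4/3 sign/exponent/mantissa split and the bias rule here are our assumptions for illustration, not the dissertation's exact FP8-SEB specification.

```python
# Hedged software emulation of a shared-exponent-bias FP8 format: the bit
# split (1 sign / 4 exponent / 3 mantissa) and bias rule are assumptions.
import numpy as np

E_BITS, M_BITS = 4, 3

def quantize_fp8_seb(x):
    """Fake-quantize a tensor: pick a per-tensor exponent bias so the largest
    magnitude lands at the top of the FP8 range, then round each value to the
    nearest representable number."""
    amax = np.max(np.abs(x))
    bias = int(np.floor(np.log2(amax))) - (2**E_BITS - 2)   # shared exponent bias
    out = np.zeros_like(x)
    for i, v in np.ndenumerate(x):
        if v == 0:
            continue
        e = int(np.floor(np.log2(abs(v))))
        e = int(np.clip(e, bias, bias + 2**E_BITS - 2))      # exponent window
        m = np.round(abs(v) / 2.0**e * 2**M_BITS) / 2**M_BITS  # 3-bit mantissa
        out[i] = np.sign(v) * m * 2.0**e
    return out

w = np.random.default_rng(1).normal(scale=0.02, size=(4, 4))
print(np.max(np.abs(w - quantize_fp8_seb(w))))   # small per-element error
```

    The point of sharing one exponent bias per tensor is that the narrow 4-bit exponent window can be re-centered on each tensor's actual dynamic range instead of wasting encodings on unused magnitudes.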

    Advanced Computing and Related Applications Leveraging Brain-inspired Spiking Neural Networks

    Full text link
    Amid the rapid evolution of next-generation brain-inspired artificial intelligence and increasingly sophisticated electromagnetic environments, the strongly bionic characteristics and anti-interference performance of spiking neural networks show great potential for computational speed, real-time information processing, and spatio-temporal information processing. The spiking neural network is one of the cores of brain-like artificial intelligence: it realizes brain-like computing by simulating the structure and information-transfer mode of biological neural networks. This paper summarizes the strengths, weaknesses, and applicability of five neuronal models and analyzes the characteristics of five network topologies; it then reviews spiking neural network algorithms, summarizing the unsupervised learning algorithms based on synaptic-plasticity rules and four types of supervised learning algorithms; finally, it reviews the brain-like neuromorphic chips under research both domestically and abroad. Through these systematic summaries, the paper is intended to provide learning concepts and research orientations for peers who are new to the field of spiking neural networks.
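
    As a point of orientation for the neuron-model families such reviews compare, here is a minimal sketch of the leaky integrate-and-fire (LIF) neuron, the simplest widely used spiking model; the parameters are illustrative, not drawn from the paper.

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) neuron: the membrane
# potential integrates input current, leaks toward rest, and emits a spike
# with reset whenever it crosses threshold.
import numpy as np

def lif(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Euler simulation of dv/dt = (-v + I) / tau with threshold-and-reset."""
    v, spikes = 0.0, []
    for t, i_t in enumerate(input_current):
        v += dt / tau * (-v + i_t)     # leaky integration step
        if v >= v_th:                  # threshold crossing -> spike
            spikes.append(t * dt)
            v = v_reset
    return spikes

# Constant drive above threshold yields a regular spike train.
print(lif(np.full(200, 1.5)))
```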

    Quality control and improvement of the aluminum alloy castings for the next generation of engine block cast components.

    Get PDF
    This research focuses on the quality control and improvement of the W319 aluminum alloy engine blocks produced at the NEMAK Windsor Aluminum Plant (WAP). The present WAP Quality Control (QC) system was critically evaluated using a cause-and-effect diagram, and a novel Plant-Wide Quality Control (PWQC) system is therefore proposed. The new QC system provides tools for both off-line and on-line quality control. The off-line tool uses heating-curve analysis to grade ingot suppliers. The on-line tool uses Tukey control charts of Thermal Analysis (TA) parameters for statistical process control. An Artificial Neural Network (ANN) model has also been developed for on-line prediction and control of the Silicon Modification Level (SiML). Student's t-statistical analysis has shown that even small-scale variations in the Fe and Mn levels significantly affect the shrink-porosity level of the 3.0L V6 engine block bulkhead. When the Fe and Mn levels are close to their upper specification limits (0.4 wt.% and 0.3 wt.%, respectively), the probability of low bulkhead shrink porosity is as high as 0.73. Elevated levels of Sn (~0.04 wt.%) and Pb (~0.03 wt.%) were found to lower the Brinell Hardness (HB) of the V6 bulkhead after the Thermal Sand Removal (TSR) and Artificial Aging (AA) processes; therefore, Sn and Pb levels must be kept below 0.005 wt.% and 0.02 wt.%, respectively, to satisfy the bulkhead HB requirements. Reliability studies of the Cosworth electromagnetic pump indicated that the life of the pump increased by a factor of 2.25, from 19,505 castings to 43,904 castings, after the implementation of preventive maintenance. The optimum preventive-maintenance period of the pump was calculated to be 43,000 castings. The solution-treatment parameters (temperature and time) of the novel Solution Treatment during Solidification (NSTS) process were optimized using the ANN and a Simulated Annealing (SA) algorithm. The optimal NSTS process (516°C for 66 minutes) would significantly reduce the present Thermal Sand Removal (TSR) time (4 hours) and would avoid the problem of incipient melting without sacrificing mechanical properties. To improve the cast-component characteristics and to lower the alloy price, a new alloy, Al 332 (Si = 10.5 wt.% and Cu = 2 wt.%), was developed by optimizing the Si and Cu levels of 3XX Al alloys as a replacement for the W319 alloy. The predicted as-cast characteristics of the new alloy satisfy the requirements of Ford engineering specification WSE-M2A-151-A2/A4. Thesis (Ph.D.), University of Windsor (Canada), 2005.
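
    For readers unfamiliar with Tukey control charts, the following sketch computes the robust control limits, Q1 - 1.5*IQR and Q3 + 1.5*IQR, from baseline samples of a thermal-analysis parameter and flags a new measurement; the numbers are illustrative only, not WAP data.

```python
# Hedged sketch of a Tukey control chart on a thermal-analysis parameter.
# Quartile-based limits are robust to the occasional outlier in the baseline.
import numpy as np

def tukey_limits(samples):
    """Return (lower, upper) Tukey control limits from baseline data."""
    q1, q3 = np.percentile(samples, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Illustrative baseline: 50 liquidus-temperature readings in degrees C.
baseline = np.random.default_rng(2).normal(577.0, 0.6, size=50)
lo, hi = tukey_limits(baseline)

new_melt = 579.8   # hypothetical new reading
status = "alarm" if not lo <= new_melt <= hi else "ok"
print(f"limits: [{lo:.2f}, {hi:.2f}] C -> {status}")
```

    Because quartiles barely move when a few baseline readings are contaminated, this chart avoids the inflated limits a mean-and-sigma chart would produce on the same data, which suits melt-quality signals with sporadic outliers.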