542 research outputs found
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors
© 2016 Cheung, Schultz and Luk. NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high-performance computing systems that use customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation, for example by tuning the degree of parallelism, to deliver optimized performance. The compilation process supports PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current- or conductance-based neuronal models, such as the integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A six-FPGA system can simulate a network of up to ~600,000 neurons and achieves real-time performance for up to 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times over an 8-core processor, or 2.83 times over GPU-based platforms. With its high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
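The per-neuron update that a platform like this implements can be illustrated with the Izhikevich model named in the abstract. The following is a minimal single-neuron sketch in plain Python, not NeuroFlow code: the parameters a=0.02, b=0.2, c=-65, d=8 are the standard regular-spiking settings from Izhikevich's model, and the constant input current I=10 is an assumed value chosen so the neuron fires.

```python
def simulate_izhikevich(I=10.0, T=1000.0, dt=0.25,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler integration of a single Izhikevich neuron.

    dv/dt = 0.04 v^2 + 5 v + 140 - u + I
    du/dt = a (b v - u)
    with reset v <- c, u <- u + d whenever v reaches 30 mV.
    """
    v, u = c, b * c              # start at the resting state
    spikes = []                  # spike times in ms
    for step in range(int(T / dt)):
        if v >= 30.0:            # spike detected: record and reset
            spikes.append(step * dt)
            v, u = c, u + d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
    return spikes

spikes = simulate_izhikevich()
print(len(spikes))               # spike count over 1 s of simulated time
```

A hardware simulator evaluates this same two-variable update for every neuron in parallel each time step, which is why the model is popular for FPGA implementations: it is cheap (a handful of multiply-adds) yet reproduces many firing patterns.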
Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay
Session 1: HLS Tooling
A Method for On-Device Personalization of Deep Neural Networks
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 2. Egger, Bernhard. There exist several deep neural network (DNN) architectures suitable for embedded inference; however, little work has focused on training neural networks on-device.
User customization of DNNs is desirable due to the difficulty of collecting a training set representative of real world scenarios.
Additionally, inter-user variation limits the accuracy a general model can achieve.
In this thesis, a DNN architecture that allows for low power on-device user customization is proposed.
This approach is applied to handwritten character recognition of both the Latin and the Korean alphabets.
Experiments show a 3.5-fold reduction of the prediction error after user customization for both alphabets compared to a DNN trained with general data.
This architecture is additionally evaluated on a number of embedded processors, demonstrating its practical applicability.

Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Motivation
Chapter 3 Background
3.1 Deep Neural Networks
3.1.1 Inference
3.1.2 Training
3.2 Convolutional Neural Networks
3.3 On-Device Acceleration
3.3.1 Hardware Accelerators
3.3.2 Software Optimization
Chapter 4 Methodology
4.1 Initialization
4.2 On-Device Training
Chapter 5 Implementation
5.1 Pre-processing
5.2 Latin Handwritten Character Recognition
5.2.1 Dataset and BIE Selection
5.2.2 AE Design
5.3 Korean Handwritten Character Recognition
5.3.1 Dataset and BIE Selection
5.3.2 AE Design
Chapter 6 On-Device Acceleration
6.1 Architecture Optimizations
6.2 Compiler Optimizations
Chapter 7 Experimental Setup
Chapter 8 Evaluation
8.1 Latin Handwritten Character Recognition
8.2 Korean Handwritten Character Recognition
8.3 On-Device Acceleration
Chapter 9 Related Work
Chapter 10 Conclusion
Bibliography
Abstract (in Korean)
Acknowledgements
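The customization scheme the abstract describes, a general pretrained model adapted on-device with a user's own samples, can be sketched as follows. This is an illustrative frozen-backbone fine-tuning sketch, not the thesis's actual BIE/AE architecture; the random-projection "backbone", the data-generation scheme, and all names here are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "general" feature extractor: a fixed random projection + ReLU.
# (Stands in for a backbone trained off-device on general data; it is
# never updated during on-device customization.)
W_frozen = rng.normal(size=(64, 16)) / 8.0

def features(x):
    return np.maximum(x @ W_frozen, 0.0)

# Small trainable head, adapted on-device with the user's own samples.
n_classes = 4
W_head = np.zeros((16, n_classes))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(X, y):
    """Cross-entropy loss and gradient w.r.t. the head only."""
    h = features(X)
    p = softmax(h @ W_head)
    n = len(y)
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    p[np.arange(n), y] -= 1.0            # dL/dlogits
    grad = h.T @ p / n                   # backbone receives no gradient
    return loss, grad

# Synthetic "per-user" data: one cluster per class plus small noise,
# mimicking a user's idiosyncratic handwriting samples.
means = rng.normal(size=(n_classes, 64))
y = rng.integers(0, n_classes, size=200)
X = means[y] + 0.1 * rng.normal(size=(200, 64))

loss0, _ = loss_and_grad(X, y)
for _ in range(500):                     # on-device fine-tuning loop
    loss, grad = loss_and_grad(X, y)
    W_head -= 0.05 * grad                # SGD step on the head only
print(loss0, loss)
```

Updating only a small head keeps the on-device memory and compute cost of training low, which matches the thesis's low-power setting: backpropagation never has to traverse the large frozen part of the network.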
- …