477 research outputs found

    Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

    Session 1: HLS Tooling

    Making a case for an ARM Cortex-A9 CPU interlay replacing the NEON SIMD unit


    Efficient performance scaling of future CGRAs for mobile applications


    κΈ°κΈ° μƒμ—μ„œμ˜ 심측 신경망 κ°œμΈν™” 방법

    Master's thesis (MS), Seoul National University, Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Bernhard Egger.

    There exist several deep neural network (DNN) architectures suitable for embedded inference; however, little work has focused on training neural networks on-device. User customization of DNNs is desirable because collecting a training set representative of real-world scenarios is difficult, and inter-user variation limits the accuracy achievable by a general model. In this thesis, a DNN architecture that allows low-power on-device user customization is proposed. The approach is applied to handwritten character recognition of both the Latin and the Korean alphabets. Experiments show a 3.5-fold reduction in prediction error after user customization for both alphabets compared to a DNN trained with general data. The architecture is additionally evaluated on a number of embedded processors, demonstrating its practical applicability.

    Table of contents: Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Motivation; Chapter 3 Background (3.1 Deep Neural Networks: 3.1.1 Inference, 3.1.2 Training; 3.2 Convolutional Neural Networks; 3.3 On-Device Acceleration: 3.3.1 Hardware Accelerators, 3.3.2 Software Optimization); Chapter 4 Methodology (4.1 Initialization; 4.2 On-Device Training); Chapter 5 Implementation (5.1 Pre-processing; 5.2 Latin Handwritten Character Recognition: 5.2.1 Dataset and BIE Selection, 5.2.2 AE Design; 5.3 Korean Handwritten Character Recognition: 5.3.1 Dataset and BIE Selection, 5.3.2 AE Design); Chapter 6 On-Device Acceleration (6.1 Architecture Optimizations; 6.2 Compiler Optimizations); Chapter 7 Experimental Setup; Chapter 8 Evaluation (8.1 Latin Handwritten Character Recognition; 8.2 Korean Handwritten Character Recognition; 8.3 On-Device Acceleration); Chapter 9 Related Work; Chapter 10 Conclusion; Bibliography; Abstract (Korean); Acknowledgements.
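    The record above describes the customization approach only at a high level. As a rough, hypothetical illustration of the general idea of on-device user customization (not the thesis's actual BIE/AE design), the Python sketch below freezes a generally trained feature extractor and trains only a small user-specific classification head on a handful of user samples; all layer sizes, names, and the training loop are assumptions made for illustration.

```python
# Hypothetical sketch of on-device user customization: a generally trained
# backbone is frozen on the device and only a small user-specific head is
# trained. This is NOT the thesis's BIE/AE architecture, only an illustration
# of the general approach.
import torch
import torch.nn as nn

class CharacterRecognizer(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Generally trained feature extractor (kept frozen on the device).
        # Assumes 28x28 single-channel character images.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        # Small user-specific head: the only part updated on the device.
        self.head = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

def customize_on_device(model, user_loader, epochs=5, lr=1e-3):
    """Fine-tune only the head on a small set of user-provided samples."""
    for p in model.backbone.parameters():
        p.requires_grad = False                      # keep the general model fixed
    opt = torch.optim.SGD(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in user_loader:           # the user's own handwriting
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()                           # gradients reach only the head
            opt.step()
    return model
```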

    A Micro Power Hardware Fabric for Embedded Computing

    Field Programmable Gate Arrays (FPGAs) mitigate many of the problems encountered in ASIC development by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics with FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named the domain-specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for digital signal processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Optimization techniques such as local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications, and a design space exploration tool has been developed to automate this process and generate a tailored architectural instance of the fabric. The fabric has been synthesized using a 160 nm cell-based ASIC fabrication process from OKI and a 130 nm process from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere, with comparisons to other hardware and software implementations. The optimized fabric implemented in the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA, and 2016X better than an Intel XScale processor.
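    The abstract names simulated annealing as one of the techniques used to select an interconnect configuration. The Python sketch below shows the generic simulated-annealing loop such a design space exploration tool could be built around; the configuration encoding, neighbour move, and cost function are placeholders rather than the DSF's actual power/performance model.

```python
# Generic simulated-annealing loop of the kind a design space exploration tool
# can use to pick an interconnect configuration. The configuration encoding,
# neighbour move, and cost function are placeholders, not the DSF's model.
import math
import random

def anneal(initial_config, cost, neighbour,
           t_start=1.0, t_end=1e-3, alpha=0.95, moves_per_temp=100):
    """Minimize cost(config) by simulated annealing."""
    current, best = initial_config, initial_config
    current_cost = best_cost = cost(current)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = neighbour(current)
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept worse moves with probability
            # exp(-delta / t) so the search can escape local minima.
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost

# Placeholder usage: choose per-hop interconnect widths for an 8-stage fabric;
# the cost function is a made-up stand-in for an energy/performance estimate.
if __name__ == "__main__":
    widths = [4, 8, 16]
    init = [random.choice(widths) for _ in range(8)]
    fake_cost = lambda cfg: sum(abs(w - 8) for w in cfg)   # placeholder model
    def move(cfg):
        cfg = list(cfg)
        cfg[random.randrange(len(cfg))] = random.choice(widths)
        return cfg
    print(anneal(init, fake_cost, move))
```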

    LTE implementation on CGRA based SiLago Platform

    Abstract. This thesis implements the long term evolution (LTE) transmission layer on a coarse-grained reconfigurable architecture called the dynamically reconfigurable resource array (DRRA). Specifically, we implement the physical downlink shared channel (PDSCH) baseband signal processing blocks at a high level. The overall implementation follows the silicon large grain object (SiLago) design methodology, which employs SiLago blocks instead of mainstream standard cells. The main ambition of this thesis was to prove that a standard as complex as LTE can be implemented using the in-house SiLago framework. The work aims to show that a customized design with efficiency close to an application-specific integrated circuit (ASIC) can be generated for LTE with the programming ease of MATLAB. During this thesis, we have produced a completely parametrizable high-level implementation of the LTE standard.
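    As a small, self-contained reference for the kind of PDSCH baseband block mentioned above, the Python sketch below implements the standard LTE QPSK modulation mapper from 3GPP TS 36.211 (two bits per complex symbol). It is only a stand-in illustration of one stage of the processing chain and is not taken from the SiLago/DRRA implementation.

```python
# Minimal example of one PDSCH baseband stage: the QPSK modulation mapper
# defined in 3GPP TS 36.211 (Table 7.1.2-1). Illustrates the block-level
# processing the thesis maps onto the DRRA fabric; not the SiLago code.
import numpy as np

def qpsk_modulate(bits: np.ndarray) -> np.ndarray:
    """Map pairs of bits to QPSK symbols: b -> (1 - 2*b) / sqrt(2)."""
    assert bits.size % 2 == 0, "QPSK consumes bits two at a time"
    b = bits.reshape(-1, 2)
    i = (1 - 2 * b[:, 0]) / np.sqrt(2)   # in-phase component
    q = (1 - 2 * b[:, 1]) / np.sqrt(2)   # quadrature component
    return i + 1j * q

# Example: modulate a short scrambled codeword into 4 unit-magnitude symbols.
if __name__ == "__main__":
    codeword = np.array([0, 0, 0, 1, 1, 0, 1, 1])
    print(qpsk_modulate(codeword))
```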