123 research outputs found

    SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices

    Full text link
    AI has led to significant advancements in computer vision and image processing tasks, enabling a wide range of applications in real-life scenarios, from autonomous vehicles to medical imaging. Many of these applications require efficient object detection algorithms together with real-time, low-latency hardware to perform inference. The YOLO family of models is considered the most efficient for object detection, requiring only a single model pass. Despite this, the complexity and size of YOLO models can be too computationally demanding for current edge-based platforms. To address this, we present SATAY: a Streaming Architecture Toolflow for Accelerating YOLO. This work tackles the challenges of deploying state-of-the-art object detection models onto FPGA devices for ultra-low-latency applications, enabling real-time, edge-based object detection. We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion. These accelerators are generated using an automated toolflow and can target a range of suitable FPGA devices. We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources. Our toolflow is able to generate accelerator designs which demonstrate competitive performance and energy characteristics to GPU devices, and which outperform current state-of-the-art FPGA accelerators.
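    As a rough illustration of the streaming execution model described above (hypothetical, not part of the SATAY toolflow itself), the following Python sketch models each layer as a concurrently running stage connected by shallow FIFOs, so a stage begins computing as soon as the first element arrives rather than waiting for whole tensors:

        # Minimal sketch of a deeply pipelined, streaming (dataflow) execution model.
        # Stage functions and FIFO depths are made up for illustration.
        from queue import Queue
        from threading import Thread

        def stage(fn, in_q, out_q):
            # One hardware stage per layer; all stages run concurrently.
            while True:
                x = in_q.get()
                if x is None:          # end-of-stream marker
                    out_q.put(None)
                    return
                out_q.put(fn(x))

        def build_pipeline(layer_fns):
            # Shallow bounded queues stand in for on-chip stream buffers.
            queues = [Queue(maxsize=4) for _ in range(len(layer_fns) + 1)]
            for i, fn in enumerate(layer_fns):
                Thread(target=stage, args=(fn, queues[i], queues[i + 1]), daemon=True).start()
            return queues[0], queues[-1]

        # Toy "layers"; in a real toolflow these would be convolution/activation blocks.
        layers = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2]
        src, sink = build_pipeline(layers)
        for pixel in [1, 2, 3]:
            src.put(pixel)
        src.put(None)
        while (y := sink.get()) is not None:
            print(y)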

    Optimising algorithm and hardware for deep neural networks on FPGAs

    Get PDF
    This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs). The first contribution of this thesis is an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-dimensional and 3-dimensional CNNs. The accelerator is also designed with runtime adaptability, adopting different parallelism strategies for different convolutional layers at runtime. The second contribution is a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both the algorithm and hardware levels. Our proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time cost of the co-optimisation process. The third contribution is an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, we propose a novel hardware architecture that exploits this structured sparsity. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.
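    As a rough, hypothetical illustration of the third contribution, the sketch below shows one form of structured sparsity (channel-wise pruning) applied to a mean-field Bayesian layer: because whole output channels are zeroed, every Monte Carlo weight sample skips the same dot products, which is the regularity a sparsity-aware accelerator can exploit. All sizes and the pruning criterion are invented and are not the thesis implementation:

        import numpy as np

        rng = np.random.default_rng(0)
        in_features, out_features, mc_samples = 64, 32, 8

        # Posterior mean/std per weight (mean-field Gaussian assumption).
        w_mu = rng.normal(size=(out_features, in_features))
        w_sigma = 0.1 * np.ones_like(w_mu)

        # Structured mask: keep only the output channels with the largest mean magnitude.
        channel_score = np.abs(w_mu).sum(axis=1)
        keep = channel_score >= np.median(channel_score)     # boolean mask per output channel

        x = rng.normal(size=(in_features,))
        outputs = []
        for _ in range(mc_samples):
            w = w_mu + w_sigma * rng.normal(size=w_mu.shape)  # sample weights
            y = np.zeros(out_features)
            y[keep] = w[keep] @ x                             # compute only the kept channels
            outputs.append(y)

        print("channels computed per sample:", int(keep.sum()), "of", out_features)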

    ์—ฃ์ง€ ํด๋ผ์šฐ๋“œ ํ™˜๊ฒฝ์„ ์œ„ํ•œ ์—ฐ์‚ฐ ์˜คํ”„๋กœ๋”ฉ ์‹œ์Šคํ…œ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2020. 2. ๋ฌธ์ˆ˜๋ฌต.The purpose of my dissertation is to build lightweight edge computing systems which provide seamless offloading services even when users move across multiple edge servers. I focused on two specific application domains: 1) web applications and 2) DNN applications. I propose an edge computing system which offload computations from web-supported devices to edge servers. The proposed system exploits the portability of web apps, i.e., distributed as source code and runnable without installation, when migrating the execution state of web apps. This significantly reduces the complexity of state migration, allowing a web app to migrate within a few seconds. Also, the proposed system supports offloading of webassembly, a standard low-level instruction format for web apps, having achieved up to 8.4x speedup compared to offloading of pure JavaScript codes. I also propose incremental offloading of neural network (IONN), which simultaneously offloads DNN execution while deploying a DNN model, thus reducing the overhead of DNN model deployment. Also, I extended IONN to support large-scale edge server environments by proactively migrating DNN layers to edge servers where mobile users are predicted to visit. Simulation with open-source mobility dataset showed that the proposed system could significantly reduce the overhead of deploying a DNN model.๋ณธ ๋…ผ๋ฌธ์˜ ๋ชฉ์ ์€ ์‚ฌ์šฉ์ž๊ฐ€ ์ด๋™ํ•˜๋Š” ๋™์•ˆ์—๋„ ์›ํ™œํ•œ ์—ฐ์‚ฐ ์˜คํ”„๋กœ๋”ฉ ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•˜๋Š” ๊ฒฝ๋Ÿ‰ ์—ฃ์ง€ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์›น ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜๊ณผ ์ธ๊ณต์‹ ๊ฒฝ๋ง (DNN: Deep Neural Network) ์ด๋ผ๋Š” ๋‘ ๊ฐ€์ง€ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋„๋ฉ”์ธ์—์„œ ์—ฐ๊ตฌ๋ฅผ ์ง„ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ฒซ์งธ, ์›น ์ง€์› ์žฅ์น˜์—์„œ ์—ฃ์ง€ ์„œ๋ฒ„๋กœ ์—ฐ์‚ฐ์„ ์˜คํ”„๋กœ๋“œํ•˜๋Š” ์—ฃ์ง€ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์ œ์•ˆ๋œ ์‹œ์Šคํ…œ์€ ์›น ์•ฑ์˜ ์‹คํ–‰ ์ƒํƒœ๋ฅผ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ํ•  ๋•Œ ์›น ์•ฑ์˜ ๋†’์€ ์ด์‹์„ฑ(์†Œ์Šค ์ฝ”๋“œ๋กœ ๋ฐฐํฌ๋˜๊ณ  ์„ค์น˜ํ•˜์ง€ ์•Š๊ณ  ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Œ)์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ƒํƒœ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜์˜ ๋ณต์žก์„ฑ์ด ํฌ๊ฒŒ ์ค„์—ฌ์„œ ์›น ์•ฑ์ด ๋ช‡ ์ดˆ ๋‚ด์— ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ œ์•ˆ๋œ ์‹œ์Šคํ…œ์€ ์›น ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์œ„ํ•œ ํ‘œ์ค€ ์ €์ˆ˜์ค€ ์ธ์ŠคํŠธ๋Ÿญ์…˜์ธ ์›น ์–ด์…ˆ๋ธ”๋ฆฌ ์˜คํ”„๋กœ๋“œ๋ฅผ ์ง€์›ํ•˜์—ฌ ์ˆœ์ˆ˜ํ•œ JavaScript ์ฝ”๋“œ ์˜คํ”„๋กœ๋“œ์™€ ๋น„๊ตํ•˜์—ฌ ์ตœ๋Œ€ 8.4 ๋ฐฐ์˜ ์†๋„ ํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค. ๋‘˜์งธ, DNN ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์—ฃ์ง€ ์„œ๋ฒ„์— ๋ฐฐํฌํ•  ๋•Œ, DNN ๋ชจ๋ธ์„ ์ „์†กํ•˜๋Š” ๋™์•ˆ DNN ์—ฐ์‚ฐ์„ ์˜คํ”„๋กœ๋“œ ํ•˜์—ฌ ๋น ๋ฅด๊ฒŒ ์„ฑ๋Šฅํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋Š” ์ ์ง„์  ์˜คํ”„๋กœ๋“œ ๋ฐฉ์‹์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ๋ชจ๋ฐ”์ผ ์‚ฌ์šฉ์ž๊ฐ€ ๋ฐฉ๋ฌธ ํ•  ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ๋˜๋Š” ์—ฃ์ง€ ์„œ๋ฒ„๋กœ DNN ๋ ˆ์ด์–ด๋ฅผ ์‚ฌ์ „์— ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ํ•˜์—ฌ ์ฝœ๋“œ ์Šคํƒ€ํŠธ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐฉ์‹์„ ์ œ์•ˆ ํ•ฉ๋‹ˆ๋‹ค. ์˜คํ”ˆ ์†Œ์Šค ๋ชจ๋นŒ๋ฆฌํ‹ฐ ๋ฐ์ดํ„ฐ์…‹์„ ์ด์šฉํ•œ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ, DNN ๋ชจ๋ธ์„ ๋ฐฐํฌํ•˜๋ฉด์„œ ๋ฐœ์ƒํ•˜๋Š” ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ์ œ์•ˆ ํ•˜๋Š” ๋ฐฉ์‹์ด ํฌ๊ฒŒ ์ค„์ผ ์ˆ˜ ์žˆ์Œ์„ ํ™•์ธํ•˜์˜€์Šต๋‹ˆ๋‹ค.Chapter 1. Introduction 1 1.1 Offloading Web App Computations to Edge Servers 1 1.2 Offloading DNN Computations to Edge Servers 3 Chapter 2. 
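    As a rough, hypothetical sketch of the partitioning idea behind IONN (omitting the incremental upload planning and mobility prediction), the Python snippet below picks the layer at which to cut a DNN between the device and an edge server, balancing local compute time, the cost of transferring the boundary tensor, and remote compute time. All timings and sizes are invented:

        def best_partition(local_ms, edge_ms, boundary_kb, uplink_kbps):
            """Return (k, cost): run layers [0, k) on the device, [k, n) on the edge server."""
            n = len(local_ms)
            best = (0, float("inf"))
            for k in range(n + 1):
                # boundary_kb[k] = KB to upload if the cut is before layer k (input image for k=0).
                transfer_ms = 0.0 if k == n else boundary_kb[k] * 8 / uplink_kbps * 1000
                cost = sum(local_ms[:k]) + transfer_ms + sum(edge_ms[k:])
                if cost < best[1]:
                    best = (k, cost)
            return best

        # Toy 5-layer model: early layers are cheap locally but produce large feature maps.
        local_ms    = [12, 30, 45, 40, 8]
        edge_ms     = [2, 5, 7, 6, 1]
        boundary_kb = [600, 400, 150, 40, 10]
        uplink_kbps = 20_000

        k, cost = best_partition(local_ms, edge_ms, boundary_kb, uplink_kbps)
        print(f"cut before layer {k}: estimated latency {cost:.1f} ms")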

    Low power and high performance heterogeneous computing on FPGAs

    Get PDF
    L'abstract รจ presente nell'allegato / the abstract is in the attachmen
    • โ€ฆ
    corecore