Search CORE

123 research outputs found

SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices

Author: Bouganis Christos-Savvas
Montgomerie-Corcoran Alexander
Toupas Petros
Yu Zhewen
Publication venue
Publication date: 04/09/2023
Field of study

AI has led to significant advancements in computer vision and image processing tasks, enabling a wide range of applications in real-life scenarios, from autonomous vehicles to medical imaging. Many of those applications require efficient object detection algorithms and complementary real-time, low latency hardware to perform inference of these algorithms. The YOLO family of models is considered the most efficient for object detection, having only a single model pass. Despite this, the complexity and size of YOLO models can be too computationally demanding for current edge-based platforms. To address this, we present SATAY: a Streaming Architecture Toolflow for Accelerating YOLO. This work tackles the challenges of deploying stateof-the-art object detection models onto FPGA devices for ultralow latency applications, enabling real-time, edge-based object detection. We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion. These accelerators are generated using an automated toolflow, and can target a range of suitable FPGA devices. We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources. Our toolflow is able to generate accelerator designs which demonstrate competitive performance and energy characteristics to GPU devices, and which outperform current state-of-the-art FPGA accelerators

arXiv.org e-Print Archive

Optimising algorithm and hardware for deep neural networks on FPGAs

Author: Fan Hongxiang
Publication venue: Computing, Imperial College London
Publication date: 01/01/2022
Field of study

This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs). The first contribution of this thesis is to propose an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-Dimension and 3-Dimension CNNs. The accelerator is also designed with runtime adaptability, which adopts different parallelism strategies for different convolutional layers at runtime. The second contribution of this thesis is to propose a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both algorithm and hardware levels. Our proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time-cost of the co-optimisation process. The third contribution of this thesis is to propose an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, we propose a novel hardware architecture with the aim of exploiting the structured sparsity for BayesNNs. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.Open Acces

Spiral - Imperial College Digital Repository

Development and Evaluation with AI Tools & Devices: Google Edge TPU for General-Purpose Computing

Author: Sakos Chronis
Σάκος Χρόνης
Publication venue
Publication date: 14/10/2022
Field of study

DSpace at NTUA

엣지 클라우드 환경을 위한 연산 오프로딩 시스템

Author: 정혁진
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 전기·정보공학부,2020. 2. 문수묵.The purpose of my dissertation is to build lightweight edge computing systems which provide seamless offloading services even when users move across multiple edge servers. I focused on two specific application domains: 1) web applications and 2) DNN applications. I propose an edge computing system which offload computations from web-supported devices to edge servers. The proposed system exploits the portability of web apps, i.e., distributed as source code and runnable without installation, when migrating the execution state of web apps. This significantly reduces the complexity of state migration, allowing a web app to migrate within a few seconds. Also, the proposed system supports offloading of webassembly, a standard low-level instruction format for web apps, having achieved up to 8.4x speedup compared to offloading of pure JavaScript codes. I also propose incremental offloading of neural network (IONN), which simultaneously offloads DNN execution while deploying a DNN model, thus reducing the overhead of DNN model deployment. Also, I extended IONN to support large-scale edge server environments by proactively migrating DNN layers to edge servers where mobile users are predicted to visit. Simulation with open-source mobility dataset showed that the proposed system could significantly reduce the overhead of deploying a DNN model.본 논문의 목적은 사용자가 이동하는 동안에도 원활한 연산 오프로딩 서비스를 제공하는 경량 엣지 컴퓨팅 시스템을 구축하는 것입니다. 웹 어플리케이션과 인공신경망 (DNN: Deep Neural Network) 이라는 두 가지 어플리케이션 도메인에서 연구를 진행했습니다. 첫째, 웹 지원 장치에서 엣지 서버로 연산을 오프로드하는 엣지 컴퓨팅 시스템을 제안합니다. 제안된 시스템은 웹 앱의 실행 상태를 마이그레이션 할 때 웹 앱의 높은 이식성(소스 코드로 배포되고 설치하지 않고 실행할 수 있음)을 활용합니다. 이를 통해 상태 마이그레이션의 복잡성이 크게 줄여서 웹 앱이 몇 초 내에 마이그레이션 될 수 있습니다. 또한, 제안된 시스템은 웹 어플리케이션을 위한 표준 저수준 인스트럭션인 웹 어셈블리 오프로드를 지원하여 순수한 JavaScript 코드 오프로드와 비교하여 최대 8.4 배의 속도 향상을 달성했습니다. 둘째, DNN 어플리케이션을 엣지 서버에 배포할 때, DNN 모델을 전송하는 동안 DNN 연산을 오프로드 하여 빠르게 성능향상을 달성할 수 있는 점진적 오프로드 방식을 제안합니다. 또한, 모바일 사용자가 방문 할 것으로 예상되는 엣지 서버로 DNN 레이어를 사전에 마이그레이션하여 콜드 스타트 성능을 향상시키는 방식을 제안 합니다. 오픈 소스 모빌리티 데이터셋을 이용한 시뮬레이션에서, DNN 모델을 배포하면서 발생하는 성능 저하를 제안 하는 방식이 크게 줄일 수 있음을 확인하였습니다.Chapter 1. Introduction 1 1.1 Offloading Web App Computations to Edge Servers 1 1.2 Offloading DNN Computations to Edge Servers 3 Chapter 2. Seamless Offloading of Web App Computations 7 2.1 Motivation: Computation-Intensive Web Apps 7 2.2 Mobile Web Worker System 10 2.2.1 Review of HTML5 Web Worker 10 2.2.2 Mobile Web Worker System 11 2.3 Migrating Web Worker 14 2.3.1 Runtime State of Web Worker 15 2.3.2 Snapshot of Mobile Web Worker 16 2.3.3 End-to-End Migration Process 21 2.4 Evaluation 22 2.4.1 Experimental Environment 22 2.4.2 Migration Performance 24 2.4.3 Application Execution Performance 27 Chapter 3. IONN: Incremental Offloading of Neural Network Computations 30 3.1 Motivation: Overhead of Deploying DNN Model 30 3.2 Background 32 3.2.1 Deep Neural Network 33 3.2.2 Offloading of DNN Computations 33 3.3 IONN For DNN Edge Computing 35 3.4 DNN Partitioning 37 3.4.1 Neural Network (NN) Execution Graph 38 3.4.2 Partitioning Algorithm 40 3.4.3 Handling DNNs with Multiple Paths. 43 3.5 Evaluation 45 3.5.1 Experimental Environment 45 3.5.2 DNN Query Performance 46 3.5.3 Accuracy of Prediction Functions 48 3.5.4 Energy Consumption. 49 Chapter 4. PerDNN: Offloading DNN Computations to Pervasive Edge Servers 51 4.1 Motivation: Cold Start Issue 51 4.2 Proposed Offloading System: PerDNN 52 4.2.1 Edge Server Environment 53 4.2.2 Overall Architecture 54 4.2.3 GPU-aware DNN Partitioning 56 4.2.4 Mobility Prediction 59 4.3 Evaluation 63 4.3.1 Performance Gain of Single Client 64 4.3.2 Large-Scale Simulation 65 Chapter 5. RelatedWorks 73 Chapter 6. Conclusion. 78 Chapter 5. RelatedWorks 73 Chapter 6. Conclusion 78 Bibliography 80Docto

SNU Open Repository and Archive

Low power and high performance heterogeneous computing on FPGAs

Author: Ma Liang
Publication venue: Politecnico di Torino
Publication date
Field of study

L'abstract è presente nell'allegato / the abstract is in the attachmen

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)