
12 research outputs found

A study on the support system to enhance NGO's autonomy (NGO의 자율성확보를 위한 지원체제 확립방안)

Author: 박충훈
Publication venue: Suwon City (수원시)
Publication date: 01/01/2002

Improving Deep Learning Application Throughput on Mobile Devices with Heterogeneous Processors (이기종 프로세서로 구성된 모바일 기기에서의 딥러닝 응용 처리량 향상)

Author: 박충훈 (Choonghoon Park)
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2024

Thesis (Master's), Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2024. 하순회.

Improving Deep Learning Application Throughput on Mobile Devices with Heterogeneous Processors
Choonghoon Park, Department of Computer Science and Engineering, College of Engineering, The Graduate School, Seoul National University

Abstract

Advances in deep learning have led to the widespread adoption of deep learning applications. Consequently, mobile devices equipped with CPUs, GPUs, and AI accelerators have been introduced. Numerous studies have sought to increase the throughput of deep learning applications on heterogeneous processors, either by quantizing network parameters to reduce processing time and memory accesses or by splitting execution across the processors. However, these studies mainly targeted embedded boards or, when they targeted mobile devices, did not consider accelerators. This research therefore proposes a method that improves throughput on commercial mobile devices with accelerators by applying quantization to FP32 networks and pipelining, taking the characteristics of the heterogeneous processors into account. To fully utilize the accelerator, weights and activations are quantized using post-training quantization. A genetic algorithm then explores mappings of pipeline stages to processors to maximize throughput. The approach was validated on a real mobile device, the Samsung Galaxy S22, and achieved 36× to 43× throughput improvements, depending on the network, compared to single-processor mapping of the FP32 networks.

Keywords: Mobile Device, Heterogeneous Processors, Quantization, Pipelining
Student Number: 2022-20126

Contents

Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Quantization
  2.2 Pipelining
Chapter 3 Background
Chapter 4 Proposed Methods
  4.1 Overview
  4.2 Post Training Quantization
  4.3 Deep Learning Application Pipelining
    4.3.1 Pre-/Post-processing Pipelining
    4.3.2 Inference Pipelining and Optimal Mapping Search
Chapter 5 Experiments
  5.1 Setups
  5.2 Quantization Result
  5.3 Pipelining Result
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
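
The abstract above mentions a genetic algorithm that searches for a mapping of pipeline stages to processors. The listing gives no details of the thesis's encoding, cost model, or operators, so the following is only a minimal sketch under assumed simplifications. The latency table, the bottleneck-only throughput model (which ignores data-transfer cost), and the names `LATENCY_MS`, `throughput`, and `genetic_search` are all hypothetical and invented for illustration.

```python
import random

# Hypothetical per-layer latencies in milliseconds on each processor.
# These numbers are invented for illustration; they are not from the thesis.
LATENCY_MS = {
    "CPU": [8.0, 9.0, 7.0, 6.0],
    "GPU": [4.0, 5.0, 3.0, 4.0],
    "NPU": [1.0, 2.0, 1.0, 3.0],
}
PROCESSORS = list(LATENCY_MS)


def throughput(mapping):
    """Frames per second for a layer-to-processor mapping.

    Assumed model: processors run concurrently on different frames, so the
    most heavily loaded processor sets the steady-state frame interval.
    Transfer costs between processors are ignored.
    """
    busy = {p: 0.0 for p in PROCESSORS}
    for layer, proc in enumerate(mapping):
        busy[proc] += LATENCY_MS[proc][layer]
    return 1000.0 / max(busy.values())


def genetic_search(num_layers, generations=50, pop_size=20, seed=0):
    """Small elitist genetic algorithm over layer-to-processor mappings."""
    rng = random.Random(seed)
    population = [
        [rng.choice(PROCESSORS) for _ in range(num_layers)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the better half unchanged (elitist selection).
        population.sort(key=throughput, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_layers)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:  # mutation: reassign one random layer
                child[rng.randrange(num_layers)] = rng.choice(PROCESSORS)
            children.append(child)
        population = parents + children
    return max(population, key=throughput)


best = genetic_search(num_layers=4)
print(best, round(throughput(best), 1))
```

With only four layers and three processors the search space is small enough to enumerate directly; a genetic search becomes useful when the number of layers and candidate stage boundaries makes exhaustive search impractical.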

SNU Open Repository and Archive