661 research outputs found

    ํšจ์œจ์ ์ธ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์„ ์œ„ํ•œ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ฐ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€,2020. 2. ์œ ์Šน์ฃผ.Deep neural networks (DNN) are becoming increasingly popular and widely adopted for various applications. Energy efficiency of neural networks is critically important for both edge devices and servers. It is imperative to optimize neural networks in terms of both speed and energy consumption while maintaining the accuracy of the network. Quantization is one of the most effective optimization techniques. By reducing the bit-width of activations and weights, both the speed and energy can be improved by executing more computations using the same amount of memory access and computational resources (e.g. silicon chip area and battery). It is expected that computations with 4-bit and lower precision will contribute to the energy efficient and real-time characteristics of future deep learning applications. One major drawback of quantization is the drop in accuracy, resulting from the reduction in the degree of freedom of data representation. Recently, there have been several studies that demonstrated that the inference of DNNs can be accurately done by using 8-bit precision. However, many studies show that the network quantized into 4-bit or less precision suffers from significant quality degradation. Especially, the state-of-the art networks cannot be quantized easily due to their optimized structure. In this dissertation, several methods are proposed that use different approaches to minimize the reduction in the accuracy of the quantized DNNs. Weighted- entropy-based quantization is designed to fully utilize the limited number of quantization levels by maximizing the weighted information of the quantized data. This work shows the potential of multi-bit quantization for both activation and weight. Value-aware quantization, or outlier-aware quantization is designed to support sub-4-bit quantization, while allowing a small amount (1 ~ 3 %) of large values in high precision. This helps the quantized data to maintain the statistics, e.g. mean and variance corresponding to the full-precision, thus minimizing the accuracy drop after quantization. The dedicated hardware accelerator, called OLAccel, is also proposed to maximize the performance of the network quantized by the outlier-aware quantization. The hardware takes advantage of the benefit of reduced precision, i.e. 4-bit, with minimal accuracy drop by the proposed quantization algorithm. Precision-highway is the structural concept that forms an end-to-end high-precision information flow while performing ultra-low-precision computations. This minimizes the accumulated quantization error, which helps to improve the accuracy of the network even with extremely low precision. BLast, the training methodology, and differentiable and unified quantization (DuQ), a novel quantization algorithm, are designed to support sub-4-bit quantization for the optimized mobile networks, i.e. MobileNet-v3. These methods allow the MobileNet-v3 network to be quantized into 4-bit for both activation and weight with negligible accuracy loss.๋”ฅ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ (DNN)๋Š” ํ™œ์šฉ ๋ฒ”์œ„๋ฅผ ์ ์ฐจ ๋„“ํ˜€๊ฐ€๋ฉฐ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์— ์ ์šฉ๋˜๊ณ  ์žˆ๋‹ค. ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋Š” ์„œ๋ฒ„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ž„๋ฒ ๋””๋“œ ๊ธฐ๊ธฐ์—์„œ๋„ ๋„๋ฆฌ ํ™œ์šฉ๋˜๊ณ  ์žˆ์œผ๋ฉฐ ์ด๋กœ์ธํ•ด ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ์˜ ํšจ์œจ์„ฑ์„ ๋†’์ด๋Š” ๊ฒƒ์€ ์ ์  ๋” ์ค‘์š”ํ•ด์ง€๋Š” ์ค‘์ด๋‹ค. ์ด์ œ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ ์†๋„๋ฅผ ๋น ๋ฅด๊ฒŒ ํ•˜๊ณ  ์—๋„ˆ์ง€ ์†Œ๋ชจ๋ฅผ ์ค„์ด๋Š” ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ์˜ ์ตœ์ ํ™”๋Š” ํ•„์ˆ˜์  ์š”์†Œ๋กœ ์ž๋ฆฌ์žก์•˜๋‹ค. ์–‘์žํ™”๋Š” ๊ฐ€์žฅ ํšจ๊ณผ์ ์ธ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๋‰ด๋Ÿฐ์˜ ํ™œ์„ฑ๋„ (activation) ๋ฐ ํ•™์Šต ๊ฐ€์ค‘์น˜ (weight)๋ฅผ ์ €์žฅํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ๋น„ํŠธ ์ˆ˜๋ฅผ ์ค„์ž„์œผ๋กœ์จ ๋™์ผํ•œ ์–‘์˜ ๋ฐ์ดํ„ฐ ์ ‘๊ทผ๊ณผ ์—ฐ์‚ฐ ๋น„์šฉ (์นฉ ๋ฉด์  ๋ฐ ์—๋„ˆ์ง€ ์†Œ๋ชจ ๋“ฑ)์œผ๋กœ ๋” ๋งŽ์€ ์—ฐ์‚ฐ์ด ๊ฐ€๋Šฅํ•ด์ง€๋ฉฐ ์ด๋กœ์ธํ•ด ์†๋„์™€ ์—๋„ˆ์ง€ ์†Œ๋ชจ๋ฅผ ๋™์‹œ์— ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ถ”ํ›„ ๋”ฅ ๋Ÿฌ๋‹์„ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ํ•„์š”ํ•  ๊ฒƒ์œผ๋กœ ์˜ˆ์ธก๋˜๋Š” ์—๋„ˆ์ง€ ํšจ์œจ ๋ฐ ์—ฐ์‚ฐ ์†๋„๋ฅผ ๋งŒ์กฑ์‹œํ‚ค๊ธฐ ์œ„ํ•ด์„œ 4 ๋น„ํŠธ ํ˜น์€ ๋” ์ ์€ ์ •๋ฐ€๋„ ๊ธฐ๋ฐ˜์˜ ์–‘์žํ™” ์—ฐ์‚ฐ์ด ์ง€๋Œ€ํ•œ ๊ณตํ—Œ์„ ํ•  ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์–‘์žํ™”์˜ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋‹จ์  ์ค‘ ํ•˜๋‚˜๋Š” ๋ฐ์ดํ„ฐ์˜ ํ‘œํ˜„ํ˜•์„ ์ œํ•œํ•˜์—ฌ ์ž์œ ๋„๊ฐ€ ๋–จ์–ด์ง€๊ฒŒ ๋จ์œผ๋กœ์„œ ๋ฐœ์ƒํ•˜๋Š” ์ •ํ™•๋„์˜ ์†์‹ค์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋‹จ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ๋“ค์ด ์ง„ํ–‰์ค‘์ด๋‹ค. ์ตœ๊ทผ ์ผ๋ถ€ ์—ฐ๊ตฌ๋“ค์€ 8 ๋น„ํŠธ์˜ ์ •๋ฐ€๋„์—์„œ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋ฅผ ํ™œ์šฉํ•ด ๊ฒฐ๊ณผ๋ฅผ ์ถ”๋ก  (inference)ํ•˜๋Š”๋ฐ ์ •ํ™•๋„ ์†์‹ค์ด ๊ฑฐ์˜ ์—†์Œ์„ ๋ณด๊ณ ํ•˜๊ณ  ์žˆ๋‹ค. ๋ฐ˜๋ฉด ๊ทธ ์™ธ์˜ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ๋“ค์„ ํ†ตํ•ด 4 ๋น„ํŠธ ํ˜น์€ ๋” ๋‚ฎ์€ ์ •๋ฐ€๋„์—์„œ ์–‘์žํ™”๋ฅผ ์ ์šฉํ–ˆ์„ ๋•Œ ๋งŽ์€ ๋„คํŠธ์›Œํฌ๋“ค์˜ ์ •ํ™•๋„๊ฐ€ ํฌ๊ฒŒ ์†์ƒ๋˜๋Š” ํ˜„์ƒ๋„ ํ•จ๊ป˜ ๋ณด๊ณ ๋˜๊ณ  ์žˆ๋‹ค. ํŠนํžˆ ์ตœ๊ทผ ์ œ์•ˆ๋œ ๋„คํŠธ์›Œํฌ๋“ค์˜ ๊ฒฝ์šฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•ด ๋„์ž…ํ•œ ์ตœ์ ํ™”๋œ ๊ตฌ์กฐ๊ฐ€ ์–‘์žํ™” ํ•˜๊ธฐ ์–ด๋ ค์šด ํŠน์„ฑ์„ ๊ฐ€์ ธ ์ด๋Ÿฌํ•œ ํ˜„์ƒ์ด ์‹ฌํ™”๋œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์–‘์žํ™”๋œ DNN์˜ ์ •ํ™•๋„ ์†์‹ค์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ์œ„ํ•œ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋“ค์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๊ฐ€์ค‘ ์—”ํŠธ๋กœํ”ผ ๊ธฐ๋ฐ˜ ์–‘์žํ™” (Weighted-entropy-based quantization)์€ ์ œํ•œ๋œ ๊ฐœ์ˆ˜์˜ ์–‘์žํ™” ๋ ˆ๋ฒจ์„ ์ตœ๋Œ€ํ•œ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์–‘์žํ™”๋œ ๋ฐ์ดํ„ฐ์˜ ์ •๋ณด๋Ÿ‰์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์–‘์žํ™”๋ฅผ ์ง„ํ–‰ํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ๋‹ค. ์ด ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด ์•„์ฃผ ๊นŠ์€ ๋„คํŠธ์›Œํฌ์—์„œ๋„ ๋‰ด๋Ÿฐ์˜ ํ™œ์„ฑ๋„์™€ ํ•™์Šต ๊ฐ€์ค‘์น˜ ๋ชจ๋‘์˜ ์–‘์žํ™”๊ฐ€ ์ ์šฉ ๊ฐ€๋Šฅํ•จ์„ ๋ณด์˜€๋‹ค. ๊ฐ’-์˜์‹ ์–‘์žํ™” (value-aware quantization), ํ˜น์€ ์˜ˆ์™ธ-์˜์‹ ์–‘์žํ™” (outlier-aware quantization)๋Š” ๋นˆ๋„๋Š” ๋‚ฎ์ง€๋งŒ ํฐ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํฐ ์ •๋ฐ€๋„๋กœ ์ €์žฅํ•˜๋Š” ๋Œ€์‹  ๋‚˜๋จธ์ง€ ๋ฐ์ดํ„ฐ์— 4 ๋น„ํŠธ ์ดํ•˜์˜ ์–‘์žํ™”๋ฅผ ์ ์šฉํ•˜๋„๋ก ์„ค๊ณ„๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. ์ด๋Š” ์›๋ณธ ๋ฐ์ดํ„ฐ์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ ๊ฐ™์€ ํŠน์„ฑ์ด ์–‘์žํ™”๋œ ํ›„์—๋„ ์œ ์ง€ํ•˜๋„๋ก ๋„์™€์ฃผ์–ด ์–‘์žํ™”๋œ ๋„คํŠธ์›Œํฌ์˜ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๋Š”๋ฐ ๊ธฐ์—ฌํ•œ๋‹ค. ์ด์— ๋”ํ•˜์—ฌ OLAccel์ด๋ผ ๋ช…๋ช…๋œ ํŠนํ™” ๊ฐ€์†๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด ๊ฐ€์†๊ธฐ๋Š” ๊ฐ’-์˜์‹ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ์–‘์žํ™”๋œ ๋„คํŠธ์›Œํฌ๋ฅผ ๊ฐ€์†ํ•จ์œผ๋กœ์จ ์ •ํ™•๋„ ๊ฐ์†Œ๋Š” ์ตœ์†Œํ™” ํ•˜๋ฉด์„œ ๋‚ฎ์€ ์ •๋ฐ€๋„์˜ ์„ฑ๋Šฅ ์ด๋“์„ ์ตœ๋Œ€ํ™”ํ•œ๋‹ค. ๊ณ ์ •๋ฐ€๋„-ํ†ต๋กœ ๊ตฌ์กฐ (precision-highway)๋Š” ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋ฅผ ๊ฐœ์„ ํ•˜์—ฌ ์ดˆ์ €์ •๋ฐ€๋„ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋ฉด์„œ๋„ ๊ณ ์ •๋ฐ€๋„ ์ •๋ณด ํ†ต๋กœ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. ์ด๋Š” ์–‘์žํ™”๋กœ ์ธํ•˜์—ฌ ์—๋Ÿฌ๊ฐ€ ๋ˆ„์ ๋˜๋Š” ํ˜„์ƒ์„ ์™„ํ™”ํ•˜์—ฌ ๋งค์šฐ ๋‚ฎ์€ ์ •๋ฐ€๋„์—์„œ ์ •ํ™•๋„๋ฅผ ๊ฐœ์„ ํ•˜๋Š”๋ฐ ๊ธฐ์—ฌํ•œ๋‹ค. ํ•™์Šต ๊ธฐ๋ฒ•์ธ BLast์™€ ๋ฏธ๋ถ„ ๊ฐ€๋Šฅํ•˜๊ณ  ํ†ตํ•ฉ๋œ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ (DuQ)๋Š” MobileNet-v3๊ณผ ๊ฐ™์€ ์ตœ์ ํ™”๋œ ๋ชจ๋ฐ”์ผํ–ฅ ๋„คํŠธ์›Œํฌ๋ฅผ ์ตœ์ ํ™”ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•๋“ค์„ ํ†ตํ•ด ๋ฏธ๋ฏธํ•œ ์ •ํ™•๋„ ์†์‹ค๋งŒ์œผ๋กœ MobileNet-v3์˜ ํ™œ์„ฑ๋„ ๋ฐ ํ•™์Šต ๊ฐ€์ค‘์น˜ ๋ชจ๋‘๋ฅผ 4 ๋น„ํŠธ ์ •๋ฐ€๋„๋กœ ์–‘์žํ™”ํ•˜๋Š”๋ฐ ์„ฑ๊ณตํ•˜์˜€๋‹ค.Chapter 1. Introduction 1 Chapter 2. Background and RelatedWork 4 Chapter 3. Weighted-entropy-based Quantization 15 3.1 Introduction 15 3.2 Motivation 17 3.3 Quantization based on Weighted Entropy 20 3.3.1 Weight Quantization 20 3.3.2 Activation Quantization 24 3.3.3 IntegratingWeight/Activation Quantization into the Training Algorithm 27 3.4 Experiment 28 3.4.1 Image Classification: AlexNet, GoogLeNet and ResNet-50/101 28 3.4.2 Object Detection: R-FCN with ResNet-50 35 3.4.3 Language Modeling: An LSTM 37 3.5 Conclusion 38 Chapter 4. Value-aware Quantization for Training and Inference of Neural Networks 40 4.1 Introduction 40 4.2 Motivation 41 4.3 Proposed Method 43 4.3.1 Quantized Back-Propagation 44 4.3.2 Back-Propagation of Full-Precision Loss 46 4.3.3 Potential of Further Reduction in Computation Cost 47 4.3.4 Local Sorting in Data Parallel Training 48 4.3.5 ReLU and Value-aware Quantization (RV-Quant) 49 4.3.6 Activation Annealing 50 4.3.7 Quantized Inference 50 4.4 Experiments 51 4.4.1 Training Results 52 4.4.2 Inference Results 59 4.4.3 LSTM Language Model 61 4.5 Conclusions 62 Chapter 5. Energy-efficient Neural Network Accelerator Based on Outlier-aware Low-precision Computation 63 5.1 Introduction 63 5.2 Proposed Architecture 65 5.2.1 Overall Structure 65 5.2.2 Dataflow 68 5.2.3 PE Cluster 72 5.2.4 Normal PE Group 72 5.2.5 Outlier PE Group and Cluster Output Tri-buffer 75 5.3 Evaluation Methodology 78 5.4 Experimental Results 80 5.5 Conclusion 90 Chapter 6. Precision Highway for Ultra Low-Precision Quantization 92 6.1 Introduction 92 6.2 Proposed Method 93 6.2.1 Precision Highway on Residual Network 94 6.2.2 Precision Highway on Recurrent Neural Network 96 6.2.3 Practical Issues with Precision Highway 98 6.3 Training 99 6.3.1 LinearWeight Quantization based on Laplace Distribution Model 99 6.3.2 Fine-tuning for Weight/Activation Quantization 100 6.4 Experiments 101 6.4.1 Experimental Setup 101 6.4.2 Analysis of Accumulated Quantization Error 101 6.4.3 Loss Surface Analysis of Quantized Model Training 103 6.4.4 Evaluating the Accuracy of Quantized Model 103 6.4.5 Hardware Cost Evaluation of Quantized Model 108 6.5 Conclusion 109 Chapter 7. Towards Sub-4-bit Quantization of Optimized Mobile Netowrks 114 7.1 Introduction 114 7.2 BLast Training 117 7.2.1 Notation 118 7.2.2 Observation 118 7.2.3 Activation Instability Metric 120 7.2.4 BLast Training 122 7.3 Differentiable and Unified Quantization 124 7.3.1 Rounding and Truncation Errors 124 7.3.2 Limitations of State-of-the-Art Methods 124 7.3.3 Proposed Method: DuQ 126 7.3.4 Handling Negative Values 128 7.4 Experiments 131 7.4.1 Accuracy on ImageNet Dataset 131 7.4.2 Discussion on Fused-BatchNorm 133 7.4.3 Ablation Study 134 7.5 Conclusion 137 Chapter 8 Conclusion 138 Bibliography 141 ๊ตญ๋ฌธ์ดˆ๋ก 154 Acknowledgements 157Docto

    ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์˜ ํ–ฅ์ƒ์„ ์œ„ํ•œ ๊นŠ์€ ์‹ ๊ฒฝ๋ง ์–‘์žํ™”

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€,2020. 2. ์„ฑ์›์šฉ.์ตœ๊ทผ ๊นŠ์€ ์‹ ๊ฒฝ๋ง(deep neural network, DNN)์€ ์˜์ƒ, ์Œ์„ฑ ์ธ์‹ ๋ฐ ํ•ฉ์„ฑ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๊ณ  ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ๋Œ€๋ถ€๋ถ„์˜ ์ธ๊ณต์‹ ๊ฒฝ๋ง์€ ๋งŽ์€ ๊ฐ€์ค‘์น˜(parameter) ์ˆ˜์™€ ๊ณ„์‚ฐ๋Ÿ‰์„ ์š”๊ตฌํ•˜์—ฌ ์ž„๋ฒ ๋””๋“œ ์‹œ์Šคํ…œ์—์„œ์˜ ๋™์ž‘์„ ๋ฐฉํ•ดํ•œ๋‹ค. ์ธ๊ณต์‹ ๊ฒฝ๋ง์€ ๋‚ฎ์€ ์ •๋ฐ€๋„์—์„œ๋„ ์ž˜ ๋™์ž‘ํ•˜๋Š” ์ธ๊ฐ„์˜ ์‹ ๊ฒฝ์„ธํฌ๋ฅผ ๋ชจ๋ฐฉํ•˜์˜€๊ธฐ ๋–„๋ฌธ์— ๋‚ฎ์€ ์ •๋ฐ€๋„์—์„œ๋„ ์ž˜ ๋™์ž‘ํ•  ๊ฐ€๋Šฅ์„ฑ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ์ธ๊ณต์‹ ๊ฒฝ๋ง์˜ ์–‘์žํ™”(quantization)๋Š” ์ด๋Ÿฌํ•œ ํŠน์ง•์„ ์ด์šฉํ•œ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ๊นŠ์€ ์‹ ๊ฒฝ๋ง ๊ณ ์ •์†Œ์ˆ˜์  ์–‘์žํ™”๋Š” 8-bit ์ด์ƒ์˜ ๋‹จ์–ด๊ธธ์ด์—์„œ ๋ถ€๋™์†Œ์ˆ˜์ ๊ณผ ์œ ์‚ฌํ•œ ์„ฑ๋Šฅ์„ ์–ป์„ ์ˆ˜์žˆ์ง€๋งŒ, ๊ทธ๋ณด๋‹ค ๋‚ฎ์€ 1-, 2-bit์—์„œ๋Š” ์„ฑ๋Šฅ์ด ๋–จ์–ด์ง„๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์กด ์—ฐ๊ตฌ๋“ค์€ ๋ถˆ๊ท ํ˜• ์–‘์žํ™”๊ธฐ๋‚˜ ์ ์‘์  ์–‘์žํ™” ๋“ฑ์˜ ๋” ์ •๋ฐ€ํ•œ ์ธ๊ณต์‹ ๊ฒฝ๋ง ์–‘์žํ™” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ๊ธฐ์กด์˜ ์—ฐ๊ตฌ์™€ ๋งค์šฐ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ๊ณ ์ • ์†Œ์ˆ˜์  ๋„คํŠธ์›Œํฌ์˜ ์ผ๋ฐ˜ํ™”๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š”๋ฐ ์ดˆ์ ์„ ๋งž์ถ”์—ˆ์œผ๋ฉฐ, ์ด๋ฅผ ์œ„ํ•ด ์žฌํ›ˆ๋ จ(retraining) ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ์–‘์žํ™”๋œ ์ธ๊ณต์‹ ๊ฒฝ๋ง์˜ ์„ฑ๋Šฅ์„ ๋ถ„์„ํ•œ๋‹ค. ์„ฑ๋Šฅ ๋ถ„์„์€ ๋ ˆ์ด์–ด๋ณ„ ๋ฏผ๊ฐ๋„ ์ธก์ •(layer-wise sensitivity analysis)์— ๊ธฐ๋ฐ˜ํ•œ๋‹ค. ๋˜ํ•œ ์–‘์žํ™” ๋ชจ๋ธ์˜ ๋„“์ด์™€ ๊นŠ์ด์— ๋”ฐ๋ฅธ ์„ฑ๋Šฅ๋„ ๋ถ„์„ํ•œ๋‹ค. ๋ถ„์„๋œ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์–‘์žํ™” ์Šคํ… ์ ์‘ ํ›ˆ๋ จ๋ฒ•(quantization step size adaptation)๊ณผ ์ ์ง„์  ์–‘์žํ™” ํ›ˆ๋ จ ๋ฐฉ๋ฒ•(gradual quantization)์„ ์ œ์•ˆํ•œ๋‹ค. ์–‘์žํ™”๋œ ์‹ ๊ฒฝ๋ง ํ›ˆ๋ จ์‹œ ์–‘์žํ™” ๋…ธ์ด์ฆˆ๋ฅผ ์ ๋‹นํžˆ ์กฐ์ •ํ•˜์—ฌ ์†์‹ค ํ‰๋ฉด(loss surface)์ƒ์— ํ‰ํ‰ํ•œ ๋ฏธ๋‹ˆ๋งˆ(minima)์— ๋„๋‹ฌ ํ•  ์ˆ˜ ์žˆ๋Š” ์–‘์žํ™” ํ›ˆ๋ จ ๋ฐฉ๋ฒ• ๋˜ํ•œ ์ œ์•ˆํ•œ๋‹ค. HLHLp (high-low-high-low-precision)๋กœ ๋ช…๋ช…๋œ ํ›ˆ๋ จ ๋ฐฉ๋ฒ•์€ ์–‘์žํ™” ์ •๋ฐ€๋„๋ฅผ ํ›ˆ๋ จ์ค‘์— ๋†’๊ฒŒ-๋‚ฎ๊ฒŒ-๋†’๊ฒŒ-๋‚ฎ๊ฒŒ ๋ฐ”๊พธ๋ฉด์„œ ํ›ˆ๋ จํ•œ๋‹ค. ํ›ˆ๋ จ๋ฅ (learning rate)๋„ ์–‘์žํ™” ์Šคํ… ์‚ฌ์ด์ฆˆ๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ์œ ๋™์ ์œผ๋กœ ๋ฐ”๋€๋‹ค. ์ œ์•ˆํ•˜๋Š” ํ›ˆ๋ จ๋ฐฉ๋ฒ•์€ ์ผ๋ฐ˜์ ์ธ ๋ฐฉ๋ฒ•์œผ๋กœ ํ›ˆ๋ จ๋œ ์–‘์žํ™” ๋ชจ๋ธ์— ๋น„ํ•ด ์ƒ๋‹นํžˆ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค. ๋˜ํ•œ ์„ ํ›ˆ๋ จ๋œ ์„ ์ƒ ๋ชจ๋ธ๋กœ ํ•™์ƒ ๋ชจ๋ธ์„ ํ›ˆ๋ จํ•˜๋Š” ์ง€์‹ ์ฆ๋ฅ˜(knowledge distillation, KD) ๊ธฐ์ˆ ์„ ์ด์šฉํ•˜์—ฌ ์–‘์žํ™”์˜ ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ํŠนํžˆ ์„ ์ƒ ๋ชจ๋ธ์„ ์„ ํƒํ•˜๋Š” ๋ฐฉ๋ฒ•๊ณผ ์ง€์‹ ์ฆ๋ฅ˜์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์„ฑ๋Šฅ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ๋ถ„์„ํ•œ๋‹ค. ๋ถ€๋™์†Œ์ˆ˜์  ์„ ์ƒ๋ชจ๋ธ๊ณผ ์–‘์žํ™” ๋œ ์„ ์ƒ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ ์‹œํ‚จ ๊ฒฐ๊ณผ ์„ ์ƒ ๋ชจ๋ธ์ด ๋งŒ๋“ค์–ด๋‚ด๋Š” ์†Œํ”„ํŠธ๋งฅ์Šค(softmax) ๋ถ„ํฌ๊ฐ€ ์ง€์‹์ฆ๋ฅ˜ํ•™์Šต ๊ฒฐ๊ณผ์— ํฌ๊ฒŒ ์˜ํ–ฅ์„ ์ฃผ๋Š” ๊ฒƒ์„ ๋ฐœ๊ฒฌํ•˜์˜€๋‹ค. ์†Œํ”„ํŠธ๋งฅ์Šค ๋ถ„ํฌ๋Š” ์ง€์‹์ฆ๋ฅ˜์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์„ ํ†ตํ•ด ์กฐ์ ˆ๋ ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์ง€์‹์ฆ๋ฅ˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋“ค๊ฐ„์˜ ์—ฐ๊ด€๊ด€๊ณ„ ๋ถ„์„์„ ํ†ตํ•ด ๋†’์€ ์„ฑ๋Šฅ์„ ์–ป์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋˜ํ•œ ์ ์ง„์ ์œผ๋กœ ์†Œํ”„ํŠธ ์†์‹ค ํ•จ์ˆ˜ ๋ฐ˜์˜ ๋น„์œจ์„ ํ›ˆ๋ จ์ค‘์— ์ค„์—ฌ๊ฐ€๋Š” ์ ์ง„์  ์†Œํ”„ํŠธ ์†์‹ค ๊ฐ์†Œ(gradual soft loss reducing)๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์—ฌ๋Ÿฌ ์–‘์žํ™”๋ชจ๋ธ์„ ํ‰๊ท ๋‚ด์–ด ๋†’์€ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ๊ฐ–๋Š” ์–‘์žํ™” ๋ชจ๋ธ์„ ์–ป๋Š” ํ›ˆ๋ จ ๋ฐฉ๋ฒ•์ธ ํ™•๋ฅ  ์–‘์žํ™” ๊ฐ€์ค‘์น˜ ํ‰๊ท (stochastic quantized weight averaging, SQWA) ํ›ˆ๋ จ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ (1) ๋ถ€๋™์†Œ์ˆ˜์  ํ›ˆ๋ จ, (2) ๋ถ€๋™์†Œ์ˆ˜์  ๋ชจ๋ธ์˜ ์ง์ ‘ ์–‘์žํ™”(direct quantization), (3) ์žฌํ›ˆ๋ จ(retraining)๊ณผ์ •์—์„œ ์ง„๋™ ํ›ˆ๋ จ์œจ(cyclical learning rate)์„ ์‚ฌ์šฉํ•˜์—ฌ ํœธ๋ จ์œจ์ด ์ง„๋™๋‚ด์—์„œ ๊ฐ€์žฅ ๋‚ฎ์„ ๋•Œ ๋ชจ๋ธ๋“ค์„ ์ €์žฅ, (4) ์ €์žฅ๋œ ๋ชจ๋ธ๋“ค์„ ํ‰๊ท , (5) ํ‰๊ท  ๋œ ๋ชจ๋ธ์„ ๋‚ฎ์€ ํ›ˆ๋ จ์œจ๋กœ ์žฌ์กฐ์ • ํ•˜๋Š” ๋‹ค์ค‘ ๋‹จ๊ณ„ ํ›ˆ๋ จ๋ฒ•์ด๋‹ค. ์ถ”๊ฐ€๋กœ ์–‘์žํ™” ๊ฐ€์ค‘์น˜ ๋„๋ฉ”์ธ์—์„œ ์—ฌ๋Ÿฌ ์–‘์žํ™” ๋ชจ๋ธ๋“ค์„ ํ•˜๋‚˜์˜ ์†์‹คํ‰๋ฉด๋‚ด์— ๋™์‹œ์— ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š” ์‹ฌ์ƒ(visualization) ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์‹ฌ์ƒ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด SQWA๋กœ ํ›ˆ๋ จ๋œ ์–‘์žํ™” ๋ชจ๋ธ์€ ์†์‹คํ‰๋ฉด์˜ ๊ฐ€์šด๋ฐ ๋ถ€๋ถ„์— ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์˜€๋‹ค.Deep neural networks (DNNs) achieve state-of-the-art performance for various applications such as image recognition and speech synthesis across different fields. However, their implementation in embedded systems is difficult owing to the large number of associated parameters and high computational costs. In general, DNNs operate well using low-precision parameters because they mimic the operation of human neurons; therefore, quantization of DNNs could further improve their operational performance. In many applications, word-length larger than 8 bits leads to DNN performance comparable to that of a full-precision model; however, shorter word-length such as those of 1 or 2 bits can result in significant performance degradation. To alleviate this problem, complex quantization methods implemented via asymmetric or adaptive quantizers have been employed in previous works. In contrast, in this study, we propose a different approach for quantization of DNNs. In particular, we focus on improving the generalization capability of quantized DNNs (QDNNs) instead of employing complex quantizers. To this end, first, we analyze the performance characteristics of quantized DNNs using a retraining algorithm; we employ layer-wise sensitivity analysis to investigate the quantization characteristics of each layer. In addition, we analyze the differences in QDNN performance for different quantized network sizes. Based on our analyses, two simple quantization training techniques, namely \textit{adaptive step size retraining} and \textit{gradual quantization} are proposed. Furthermore, a new training scheme for QDNNs is proposed, which is referred to as high-low-high-low-precision (HLHLp) training scheme, that allows the network to achieve flat minima on its loss surface with the aid of quantization noise. As the name suggests, the proposed training method employs high-low-high-low precision for network training in an alternating manner. Accordingly, the learning rate is also abruptly changed at each stage. Our obtained analysis results include that the proposed training technique leads to good performance improvement for QDNNs compared with previously reported fine tuning-based quantization schemes. Moreover, the knowledge distillation (KD) technique that utilizes a pre-trained teacher model for training a student network is exploited for the optimization of the QDNNs. We explore the effect of teacher network selection and investigate that of different hyperparameters on the quantization of DNNs using KD. In particular, we use several large floating-point and quantized models as teacher networks. Our experiments indicate that, for effective KD training, softmax distribution produced by a teacher network is more important than its performance. Furthermore, because softmax distribution of a teacher network can be controlled using KD hyperparameters, we analyze the interrelationship of each KD component for QDNN training. We show that even a small teacher model can achieve the same distillation performance as a larger teacher model. We also propose the gradual soft loss reducing (GSLR) technique for robust KD-based QDNN optimization, wherein the mixing ratio of hard and soft losses during training is controlled. In addition, we present a new QDNN optimization approach, namely \textit{stochastic quantized weight averaging} (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point model training, (2) direct quantization of weights, (3) capture of multiple low-precision models during retraining with cyclical learning rate, (4) averaging of the captured models, and (5) re-quantization of the averaged model and its fine-tuning with low learning rate. Additionally, we present a loss-visualization technique for the quantized weight domain to elucidate the behavior of the proposed method. Our visualization results indicate that a QDNN optimized using our proposed approach is located near the center of the flat minimum on the loss surface.1.Introduction 1 1.1 Quantization of Deep Neural Networks 1 1.2 Generalization Capability of DNNs 3 1.3 Improved Generalization Capability of QDNNs 3 1.4 Outline of the Dissertation 5 2. Analysis of Fixedpoint Quantization of Deep Neural Networks 6 2.1 Introduction 6 2.2 Fixedpoint Performance Analysis of Deep Neural Networks 8 2.2.1 Model Design of Deep Neural Networks 8 2.2.2 Retrainbased Weight Quantization 10 2.2.3 Quantization Sensitivity Analysis 12 2.2.4 Empirical Analysis 13 2.3 Step Size Adaptation and Gradual Quantization for Retraining of DeepNeural Networks 22 2.3.1 Stepsize adaptation during retraining 22 2.3.2 Gradual quantization scheme 24 2.3.3 Experimental Results 24 2.4 Concluding remarks 30 3. HLHLp:Quantized Neural Networks Training for Reaching Flat Minimain Loss Surface 32 3.1 Introduction 32 3.2 Related Works 33 3.2.1 Quantization of Deep Neural Networks 33 3.2.2 Flat Minima in Loss Surfaces 34 3.3 Training QDNN for IMproved Generalization Capability 35 3.3.1 Analysis of Training with Quantized Weights 35 3.3.2 Highlowhighlowprecision Training 38 3.4 Experimental Results 40 3.4.1 Image Classification with CNNs 41 3.4.2 Language Modeling on PTB and WikiText2 44 3.4.3 Speech Recognition on WSJ Corpus 48 3.4.4 Discussion 49 3.5 Concluding Remarks 55 4 Knowledge Distillation for Optimization of Quantized Deep Neural Networks 56 4.1 Introduction 56 4.2 Quantized Deep Neural Netowrk Training Using Knowledge Distillation 57 4.2.1 Quantization of deep neural networks and knowledge distillation 58 4.2.2 Teacher model selection for KD 59 4.2.3 Discussion on hyperparameters of KD 62 4.3 Experimental Results 62 4.3.1 Experimental setup 62 4.3.2 Results on CIFAR10 and CIFAR100 64 4.3.3 Model size and temperature 66 4.3.4 Gradual Soft Loss Reducing 68 4.4 Concluding Remarks 68 5 SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of LowPrecision Deep Neural Networks 70 5.1 Introduction 70 5.2 Related works 71 5.2.1 Quantization of deep neural networks for efficient implementations 71 5.2.2 Stochastic weight averaging and losssurface visualization 72 5.3 Quantization of DNN and loss surface visualization 73 5.3.1 Quantization of deep neural networks 73 5.3.2 Loss surface visualization for QDNNs 75 5.4 SQWA algorithm 76 5.5 Experimental results 80 5.5.1 CIFAR100 80 5.5.2 ImageNet 87 5.6 Concluding remarks 90 6 Conclusion 92 Abstract (In Korean) 110Docto
    • โ€ฆ
    corecore