108 research outputs found

    Optimization of Mixed-precision Neural Architecture with Knowledge Distillation

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ์ดํ˜์žฌ.Quantization์€ ๋ฉ”๋ชจ๋ฆฌ์™€ ๊ณ„์‚ฐ ๋Šฅ๋ ฅ์ด ์ œํ•œ๋œ edge device์—์„œ deep neural network๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ํ•„์ˆ˜์ ์ธ ํ”„๋กœ์„ธ์Šค์ด๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ bit-width ๋ฐ transformation function์„ ํฌํ•จํ•˜์—ฌ ๋˜‘๊ฐ™์€ quantization ์„ ๋ชจ๋“  ๋ ˆ์ด์–ด์— ๊ทธ๋Œ€๋กœ ์ ์šฉํ•˜๋Š” uniform-precision quantization์€ ์‹ฌ๊ฐํ•œ ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ๊ฒช๋Š” ๊ฒƒ์œผ๋กœ ๋„๋ฆฌ ์•Œ๋ ค์ ธ ์žˆ๋‹ค. ํ•œํŽธ ๊ฐ๊ฐ์˜ ๋ ˆ์ด์–ด์— ์„œ๋กœ ๋‹ค๋ฅธ quantization ์„ ์ฐพ์•„์„œ ์ ์šฉํ•˜๋Š” ๊ฒƒ์€ ํ›„๋ณด๊ตฐ์˜ ์ˆ˜๊ฐ€ ๋ ˆ์ด์–ด ์ˆ˜์— ๋”ฐ๋ผ ๊ธฐํ•˜๊ธ‰์ˆ˜์ ์œผ๋กœ ์ฆ๊ฐ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ ์šฉํ•˜๊ธฐ ์–ด๋ ต๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” Knowledge Distillation ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•˜์—ฌ ์„ ํ˜•์‹œ๊ฐ„ ๋‚ด์— ๊ฒ€์ƒ‰ ๊ณต๊ฐ„์„ ํšจ์œจ์ ์œผ๋กœ ํƒ์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ํŠนํžˆ, ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ๋Œ€์ƒ ๋ ˆ์ด์–ด์— ๋Œ€ํ•œ quantization ์˜ ์˜ํ–ฅ์„ ์ถ”์ •ํ•˜๊ธฐ ์œ„ํ•ด ๋ ˆ์ด์–ด๋ณ„๋กœ loss function์„ ๊ณต์‹ํ™”ํ•˜์˜€๋‹ค. ๋ ˆ์ด์–ด๊ฐ„์— ์˜์กด์„ฑ์ด ์ตœ์†Œ๋ผ๋Š” ๊ฐ€์ •์— ๊ทผ๊ฑฐํ•˜์—ฌ ๊ฐ ๋ ˆ์ด์–ด๋ณ„ ์ ์šฉ์„ ๊ฐœ๋ณ„์ ์œผ๋กœ ๊ฒฐ์ •ํ•ด์„œ ์„ฑ๋Šฅ์˜ ์†์‹ค์„ ์ตœ์†Œํ™”ํ•œ๋‹ค. ํ•˜๋“œ์›จ์–ด ์นœํ™”์ ์ธ quantization ๋งŒ ์‚ฌ์šฉํ•˜์—ฌ image classification๊ณผ object detection ์— ๋Œ€ํ•œ ์‹คํ—˜์„ ์‹ค์‹œํ•˜์˜€๋‹ค. ๊ทธ ๊ฒฐ๊ณผ, CIFAR-10 ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ํฌ๊ธฐ๊ฐ€ 13.65๋ฐฐ ์ž‘์€ ๊ฐ€์žฅ ํšจ์œจ์ ์ธ mixed-precision์˜ ResNet20 ๋ชจ๋ธ๋กœ 93.62%์˜ ์ •ํ™•๋„๋ฅผ ๋‹ฌ์„ฑํ•˜์˜€์œผ๋ฉฐ, ์ด๋Š” ๊ธฐ๋ณธ full-precision ๋ชจ๋ธ๋ณด๋‹ค 0.19% ๋‚ฎ์€ ์ˆ˜์ค€์ด๋‹ค. VOC ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ, ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ํ‰๊ท  precision ์ด 63.87% ์ธ mixed-precision Sim-YOLOv2-FPGA ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜์˜€๊ณ , ๋™์ผ ์••์ถ•๋ฅ ์˜ ๋ชจ๋“  uniform-precision ๋ชจ๋ธ๋ณด๋‹ค ์„ฑ๋Šฅ์ด ๋›ฐ์–ด๋‚˜๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋‹ค๋ฅธ ์ตœ์ฒจ๋‹จ mixed-precision quantization ์ ‘๊ทผ๋ฒ•๊ณผ ์œ ์‚ฌํ•œ ํšจ์œจ์„ ๋‹ฌ์„ฑํ•˜๋ฉด์„œ๋„ ์‹คํ–‰์€ ๋‹จ์ˆœํ•˜๋‹ค.Quantization is an essential process in the deployment of deep neural networks on edge devices which only have limited memory and computation capacity. However, it is widely known that the straightforward uniform-precision quantization method, which applies the same quantization scheme including bit-width and transformation function to all layers, suffers a severe performance degradation. Meanwhile, finding and assigning different quantization schemes to different layers is challenging as the number of candidates is exponential to the the number of layers. To address this problem, this study proposes a method that utilizes Knowledge Distillation technique to efficiently explore the search space in linear time. In particular, the proposed method formulates a per-layer loss function to estimate the impact of a quantization scheme on a target layer. Based on the assumption that the dependence among layers is minimal, the assignment is then decided for each layer separately to minimize the performance loss. Experiments are conducted for both image classification and object detection task, using only hardware-friendly quantization schemes. The results show that the most efficient mixed-precision ResNet20 model with 13.65 times smaller size can still achieve up to 93.62% accuracy on CIFAR-10 dataset, which is only 0.19% lower than the baseline full-precision model. On VOC dataset, the proposed method generates a mixed-precision Sim-YOLOv2-FPGA model with a mean average precision of 63.87, which outperforms all uniform-precision models with the same compression rate. 
The proposed method is practically simple to carry out while still achieving a comparable efficiency to other state-of-the-art approaches on mixed-precision quantization.Chapter 1: Introduction 1 Chapter 2: Related Work 4 2.1. Quantization techniques 4 2.2. Uniform quantization 5 2.3. Shifter quantization 6 2.4. Knowledge Distillation 8 2.5. Neural Architecture Search 10 Chapter 3: Knowledge-Distillation Mixed-Precision 11 3.1. Complexity challenge and solution 11 3.2. Impact assessment via Knowledge Distillation 15 3.3. Parallel Knowledge Distillation training 17 3.4. Candidate architecture generation 18 3.5. Final model fine-tuning 20 3.6. Summary 21 Chapter 4: Experimental Results 23 4.1. Final model fine-tuning time reduction 23 4.2. Scalable and flexible training 25 4.3. Effectiveness of Knowledge Distillation Mixed Precision 27 4.4. Intra-layer mixed precision 30 Chapter 5: Conclusion and Future Work 32 Appendix 33 A. Experiment with Sim-YOLOv2-FPGA on VOC dataset 33 B. Experiment with ResNet20 and ResNet32 on CIFAR-10 34 Reference 35 ์ดˆ ๋ก 38 Acknowledgement 40Maste
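
    To make the linear-time search concrete, the sketch below shows in plain NumPy how a per-layer, distillation-style loss could be used to pick a bit-width for each layer independently of the others. The loss definition, the candidate bit-widths, the tolerance, and the uniform quantizer are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

# Hypothetical per-layer distillation loss: how far the quantized layer's output
# drifts from the full-precision teacher's output on calibration data.
def per_layer_kd_loss(teacher_out: np.ndarray, student_out: np.ndarray) -> float:
    return float(np.mean((teacher_out - student_out) ** 2))

def assign_bit_widths(layer_outputs, quantize_fn, candidates=(2, 4, 8), tol=1e-2):
    """Pick a bit-width per layer independently (linear in the number of layers).

    layer_outputs: list of full-precision activations, one array per layer.
    quantize_fn:   callable (x, bits) -> quantized x, standing in for a
                   hardware-friendly quantization scheme.
    """
    assignment = []
    for x in layer_outputs:
        chosen = max(candidates)              # fall back to the widest candidate
        for bits in sorted(candidates):       # try the cheapest schemes first
            if per_layer_kd_loss(x, quantize_fn(x, bits)) <= tol:
                chosen = bits                 # first scheme within the loss budget
                break
        assignment.append(chosen)
    return assignment

# Toy usage with a symmetric uniform quantizer as the stand-in scheme.
def uniform_quantize(x, bits):
    scale = (np.max(np.abs(x)) + 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
acts = [rng.normal(size=64) * s for s in (0.1, 1.0, 5.0)]  # fake layer activations
print(assign_bit_widths(acts, uniform_quantize))           # one bit-width per layer
```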

    Real-time Optimal Resource Allocation for Embedded UAV Communication Systems

    We consider device-to-device (D2D) wireless information and power transfer systems that use an unmanned aerial vehicle (UAV) as a relay-assisted node. As the energy capacity and flight time of UAVs are limited, a significant issue in deploying UAVs is managing energy consumption in real-time applications, which is proportional to the UAV transmit power. To tackle this issue, we develop a real-time resource allocation algorithm that maximizes energy efficiency by jointly optimizing the energy-harvesting time and the power control of the considered UAV-assisted D2D communication system. We demonstrate the effectiveness of the proposed algorithm, whose running time is on the order of milliseconds.
    Comment: 11 pages, 5 figures, 1 table. This paper is accepted for publication in IEEE Wireless Communications Letters.
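
    As a rough illustration of what jointly optimizing the energy-harvesting time and transmit power for energy efficiency involves, the toy Python sketch below grid-searches a harvesting-time fraction and a power level against a placeholder energy-efficiency ratio. The throughput model, circuit power, and all constants are invented for illustration; this is not the paper's system model or its real-time solver.

```python
import numpy as np

# Toy energy-efficiency surrogate: a fraction tau of each slot is spent harvesting
# energy and the rest carries data. 'gain' (an effective channel gain) and
# 'p_circ' (circuit power) are invented constants, not values from the paper.
def energy_efficiency(tau, p, gain=5.0, p_circ=0.1):
    throughput = (1.0 - tau) * np.log2(1.0 + gain * p)   # bits per slot (toy model)
    energy = (1.0 - tau) * p + p_circ                    # Joules per slot (toy model)
    return throughput / energy

def grid_search(p_max=1.0, steps=200):
    """Exhaustive search over (tau, p): a slow reference point, not the
    real-time solver described in the abstract."""
    best = (0.0, 0.0, -np.inf)
    for tau in np.linspace(0.01, 0.99, steps):
        for p in np.linspace(1e-3, p_max, steps):
            ee = energy_efficiency(tau, p)
            if ee > best[2]:
                best = (tau, p, ee)
    return best

tau_opt, p_opt, ee_opt = grid_search()
print(f"tau={tau_opt:.2f}, p={p_opt:.3f}, EE={ee_opt:.2f} (toy units)")
```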

    Towards a Robust WiFi-based Fall Detection with Adversarial Data Augmentation

    Recent WiFi-based fall detection systems have drawn much attention due to their advantages over other sensory systems. Various implementations have achieved impressive progress in performance, thanks to machine learning and deep learning techniques. However, many such high-accuracy systems have low reliability because they fail to remain robust in unseen environments. To address this, this paper investigates a method of generalization through adversarial data augmentation. Our results show a slight improvement for deep learning systems in unseen domains, though the gain is not significant.
    Comment: Will appear in the Proceedings of the 54th Annual Conference on Information Sciences and Systems (CISS 2020).
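
    A common way to realize adversarial data augmentation is the Fast Gradient Sign Method (FGSM): each training sample is perturbed in the direction that increases its loss, and the model is trained on both clean and perturbed data. The NumPy sketch below applies FGSM to a tiny logistic classifier standing in for the deep fall-detection model; the features, labels, weights, and epsilon are placeholders, and the paper's exact augmentation scheme may differ.

```python
import numpy as np

# A tiny logistic "fall / no-fall" classifier stands in for the deep model so the
# input gradient is available in closed form; the feature vectors are random
# placeholders for WiFi CSI features.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_augment(X, y, w, eps=0.05):
    """FGSM-style augmentation: nudge each sample in the direction that increases
    its loss, then train on clean plus perturbed data."""
    p = sigmoid(X @ w)                        # predicted fall probability
    grad_x = (p - y)[:, None] * w[None, :]    # d(cross-entropy)/d(input)
    X_adv = X + eps * np.sign(grad_x)         # bounded worst-case perturbation
    return np.vstack([X, X_adv]), np.concatenate([y, y])

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 16))                # placeholder CSI feature vectors
y = (rng.random(100) > 0.5).astype(float)     # placeholder fall labels
w = rng.normal(size=16) * 0.1                 # placeholder classifier weights
X_aug, y_aug = fgsm_augment(X, y, w)
print(X_aug.shape, y_aug.shape)               # (200, 16) (200,)
```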

    ShortcutFusion: From Tensorflow to FPGA-based accelerator with reuse-aware memory allocation for shortcut data

    The residual block is a very common component in recent state-of-the-art CNNs such as EfficientNet and EfficientDet. Shortcut data accounts for nearly 40% of feature-map accesses in ResNet152 [8], yet most previous DNN compilers and accelerators ignore shortcut data optimization. This paper presents ShortcutFusion, an optimization tool for FPGA-based accelerators with reuse-aware static memory allocation for shortcut data, which maximizes on-chip data reuse under resource constraints. From TensorFlow DNN models, the proposed design generates instruction sets for groups of nodes that use optimized data reuse for each residual block. The accelerator design, implemented on a Xilinx KCU1500 FPGA card, significantly outperforms the NVIDIA RTX 2080 Ti, Titan Xp, and GTX 1080 Ti for EfficientNet inference. Compared to the RTX 2080 Ti, the proposed design is 1.35-2.33x faster and 6.7-7.9x more power efficient. Compared to a baseline in which the weights, inputs, and outputs are accessed from off-chip memory exactly once per layer, ShortcutFusion reduces DRAM accesses by 47.8-84.8% for RetinaNet, YOLOv3, ResNet152, and EfficientNet. Given a buffer size similar to ShortcutMining [8], which also mines shortcut data in hardware, the proposed work reduces off-chip feature-map accesses by 5.27x while accessing weights from off-chip memory exactly once.
    Comment: 12 pages.
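
    To give a flavor of what a reuse-aware allocation for shortcut data involves, the sketch below makes a per-block decision to keep a residual block's shortcut feature map in a dedicated on-chip buffer when it fits, and counts the DRAM traffic avoided. The sizes, buffer budget, and keep-if-it-fits policy are hypothetical illustrations, not ShortcutFusion's actual allocator.

```python
# Per-block decision for shortcut data: keep a residual block's shortcut
# feature map in a dedicated on-chip buffer when it fits, otherwise spill it
# to DRAM. The sizes, budget, and keep-if-it-fits rule are illustrative only.
def allocate_shortcuts(shortcut_bytes, buffer_budget):
    """Return a per-block placement and the DRAM traffic avoided (bytes).

    Each shortcut is live only inside its own block, so successive blocks can
    reuse the same on-chip region in this simplified model.
    """
    decisions, dram_saved = [], 0
    for size in shortcut_bytes:
        if size <= buffer_budget:
            decisions.append("on-chip")
            dram_saved += 2 * size   # skip one write to and one read from DRAM
        else:
            decisions.append("DRAM")
    return decisions, dram_saved

# Toy residual blocks with shortcut maps of 256 KiB, 512 KiB, 1 MiB, and 4 MiB.
sizes = [256 << 10, 512 << 10, 1 << 20, 4 << 20]
plan, saved = allocate_shortcuts(sizes, buffer_budget=2 << 20)
print(plan, f"{saved / (1 << 20):.1f} MiB of DRAM traffic avoided")
```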

    Federated Deep Reinforcement Learning-based Bitrate Adaptation for Dynamic Adaptive Streaming over HTTP

    In video streaming over HTTP, bitrate adaptation selects the quality of video chunks depending on the current network conditions. Some previous works have applied deep reinforcement learning (DRL) algorithms to determine each chunk's bitrate from the observed states in order to maximize the quality of experience (QoE). However, to build an intelligent model that can predict across various environments, such as 3G, 4G, WiFi, etc., the states observed in these environments must be sent to a server for centralized training. In this work, we integrate federated learning (FL) into DRL-based rate adaptation to train a model suitable for different environments. The clients in the proposed framework train their models locally and upload only the weight updates to the server. Simulations show that our federated DRL-based rate adaptation, called FDRLABR, combined with different DRL algorithms such as deep Q-learning, advantage actor-critic, and proximal policy optimization, yields better performance than traditional bitrate adaptation methods in various environments.
    Comment: 13 pages, 1 column.
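
    The federated training loop described above can be pictured as clients running local DRL updates and a server averaging the uploaded weights, FedAvg-style. The NumPy sketch below fakes the local DRL step with random gradient noise; the parameter shapes, client count, and sample sizes are placeholders rather than FDRLABR's actual implementation.

```python
import numpy as np

# One FedAvg-style round over DRL policy weights, sketched with NumPy arrays.
# Local training is faked with random gradient noise; in the actual framework
# each client would run its DRL algorithm (DQN, A2C, or PPO) on its own traces.
def local_update(weights, rng, lr=0.01):
    """Placeholder for a client's local DRL training on private streaming traces."""
    return {k: w - lr * rng.normal(size=w.shape) for k, w in weights.items()}

def fed_avg(client_weights, client_sizes):
    """Server step: weighted average of the uploaded client models. Only model
    parameters are exchanged; the raw observed states stay on each client."""
    total = sum(client_sizes)
    return {
        k: sum(n / total * cw[k] for cw, n in zip(client_weights, client_sizes))
        for k in client_weights[0]
    }

rng = np.random.default_rng(42)
global_w = {"policy": rng.normal(size=(8, 4)), "value": rng.normal(size=(8, 1))}
for _ in range(3):                                             # a few federated rounds
    updates = [local_update(global_w, rng) for _ in range(5)]  # 5 simulated clients
    global_w = fed_avg(updates, client_sizes=[100, 80, 120, 90, 110])
print({k: v.shape for k, v in global_w.items()})
```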