Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques, eliminating the need
for repeated manual redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.
Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
A Study on Deep Model Compression via Gradient Refinement and Explicit Regularization
Thesis (Ph.D.) -- Graduate School of Convergence Science and Technology, Seoul National University, 2022.2. Jangho Kim
Deep neural networks (DNNs) have developed rapidly and shown remarkable performance in many domains, including computer vision, natural language processing, and speech processing. In line with this development, demand for on-device DNNs, i.e., DNNs deployed on edge IoT devices and smartphones, has increased. However, with the growth of DNNs, the number of DNN parameters has risen drastically, which makes DNN models hard to deploy on resource-constrained edge devices. Another challenge is the power consumption of DNNs on edge devices, whose battery capacity is limited. To resolve these issues, model compression is essential.
In this dissertation, we propose three novel model compression methods spanning knowledge distillation, quantization, and pruning. First, we aim to train a student model with additional information from a teacher network, a framework known as knowledge distillation. It makes the most of a given parameter budget, which is essential when the device's resources are limited. Unlike previous knowledge distillation frameworks, we distill the knowledge indirectly by extracting a factor from the features, because inherent differences between the teacher and the student, such as network structure, batch randomness, and initial conditions, can hinder the transfer of appropriate knowledge.
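The factor-transfer idea above can be sketched in a few lines. This is a toy numpy illustration, not the thesis implementation: the paraphraser and translator are reduced to plain weight matrices standing in for the small convolutional modules used in practice, and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize a factor so the transfer loss compares directions,
    # not magnitudes (a common choice in feature distillation).
    return x / (np.linalg.norm(x) + eps)

def factor_transfer_loss(teacher_feat, student_feat, paraphraser, translator, p=1):
    # Extract a compact "factor" from each network's features and
    # penalize the Lp distance between the normalized factors.
    f_t = l2_normalize(paraphraser @ teacher_feat)
    f_s = l2_normalize(translator @ student_feat)
    return np.sum(np.abs(f_t - f_s) ** p)

rng = np.random.default_rng(0)
t_feat = rng.normal(size=8)   # teacher feature (e.g., pooled activations)
s_feat = rng.normal(size=8)   # student feature
P = rng.normal(size=(4, 8))   # paraphraser: teacher features -> factor
T = rng.normal(size=(4, 8))   # translator: student features -> factor

loss = factor_transfer_loss(t_feat, s_feat, P, T)
# Sanity check: identical features through identical maps give zero loss.
zero = factor_transfer_loss(t_feat, t_feat, P, P)
```

In the dissertation the paraphraser is trained separately to reconstruct teacher features, so the factor is an unsupervised compression of the teacher's knowledge; the linear stand-ins here only show the shape of the loss.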
Second, we propose a regularization method for quantization. A quantized model has advantages in power consumption and memory, which are essential on resource-constrained edge devices. We non-uniformly rescale the gradients of the model at training time to make the weight distribution quantization-friendly, using the position-based scaled gradient (PSG). Compared with stochastic gradient descent (SGD), our position-based scaled gradient descent (PSGD) mitigates the performance degradation after quantization because it produces a quantization-friendly weight distribution.
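A minimal sketch of the position-based idea, under the assumption that each weight's gradient is scaled by how far the weight sits from its nearest quantization grid point; the actual PSG scaling in the thesis is derived from a warped parameter space, so treat this only as an illustration.

```python
import numpy as np

def nearest_target(w, targets):
    # For each weight, find the nearest quantization grid point
    # and the distance to it.
    d = np.abs(w[:, None] - targets[None, :])
    idx = np.argmin(d, axis=1)
    return targets[idx], d[np.arange(len(w)), idx]

def psg_step(w, grad, targets, lr=0.1, alpha=1.0, eps=1e-3):
    # Toy position-based scaling: amplify the gradient of weights that
    # are far from any grid point, nudging the distribution toward
    # quantization-friendly positions. (Hypothetical scaling rule.)
    _, dist = nearest_target(w, targets)
    scale = 1.0 + alpha * dist / (dist.max() + eps)
    return w - lr * scale * grad

targets = np.array([-1.0, 0.0, 1.0])   # a tiny example grid
w = np.array([0.02, 0.5, -0.97])
grad = np.ones_like(w)                  # dummy task gradient
w_new = psg_step(w, grad, targets)
```

Here the weight at 0.5, farthest from any grid point, takes the largest step, while weights already near 0.0 or -1.0 move little; this is the elementwise, position-dependent behavior that distinguishes PSGD from plain SGD.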
Third, to prune unimportant parameters of overparameterized models, dynamic pruning methods have emerged that try to find diverse sparsity patterns during training by using the Straight-Through Estimator (STE) to approximate the gradients of pruned weights. The STE lets pruned weights revive while dynamic sparsity patterns are being searched. However, these coarse gradients cause training instability and performance degradation owing to the unreliable gradient signal of the STE approximation. To tackle this issue, we propose refined gradients that update the pruned weights by forming dual forwarding paths, and we call the resulting method Dynamic Collective Intelligence Learning (DCIL); it avoids using coarse gradients for pruning.
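The contrast between the STE's coarse gradient and a dual-path refined gradient can be shown on a single linear layer with a squared-error loss. A toy numpy sketch of the dual-forwarding idea, not the thesis implementation; the function names are hypothetical.

```python
import numpy as np

def magnitude_mask(w, sparsity):
    # Keep the largest-magnitude fraction of weights, zero the rest.
    k = int(round(len(w) * (1.0 - sparsity)))
    thresh = np.sort(np.abs(w))[::-1][k - 1] if k > 0 else np.inf
    return (np.abs(w) >= thresh).astype(w.dtype)

def ste_grad(x, w, mask, y):
    # Coarse gradient: backprop through the sparse forward pass only,
    # then copy that gradient onto ALL dense weights (STE): the mask
    # is ignored in the backward pass.
    pred = x @ (w * mask)
    return 2 * (pred - y) * x   # d/dw of (x.(w*mask) - y)^2 under STE

def dual_path_grads(x, w, mask, y):
    # Dual forwarding: the sparse path updates the kept weights, while
    # a dense path supplies a refined gradient for the pruned weights,
    # instead of the STE's copied coarse gradient.
    g_sparse = 2 * (x @ (w * mask) - y) * x * mask   # kept weights
    g_dense = 2 * (x @ w - y) * x * (1 - mask)       # pruned weights
    return g_sparse + g_dense

x = np.array([1.0, 1.0, 1.0, 1.0])
y = 0.0
w = np.array([1.0, 0.1, -0.8, 0.05])
mask = magnitude_mask(w, sparsity=0.5)   # keep the two largest weights
g_coarse = ste_grad(x, w, mask, y)
g_refined = dual_path_grads(x, w, mask, y)
```

In the coarse case every weight, pruned or not, receives the same gradient signal from the sparse prediction; in the dual-path case the pruned positions are driven by the dense path's own error, which is the refined signal DCIL exploits.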
Lastly, we combine the proposed methods into a unified model compression training framework that can train a drastically sparse and quantization-friendly model.
Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Motivation
1.2 Tasks
1.3 Contributions and Outline
2 Related work
2.1 Knowledge Distillation
2.2 Quantization
2.2.1 Sparse training
2.3 Pruning
3 Factor Transfer (FT) for Knowledge Distillation
3.1 Introduction
3.2 Proposed method
3.2.1 Teacher Factor Extraction with Paraphraser
3.2.2 Factor Transfer with Translator
3.3 Experiments
3.3.1 CIFAR-10
3.3.2 CIFAR-100
3.3.3 Ablation Study
3.3.4 ImageNet
3.3.5 Object Detection
3.3.6 Discussion
3.4 Conclusion
4 Position-based Scaled Gradients (PSG) for Quantization
4.1 Introduction
4.2 Proposed method
4.2.1 Optimization in warped space
4.2.2 Position-based scaled gradient
4.2.3 Target points
4.2.4 PSGD for deep networks
4.2.5 Geometry of the Warped Space
4.3 Experiments
4.3.1 Implementation details
4.3.2 Pruning
4.3.3 Quantization
4.3.4 Knowledge Distillation
4.3.5 Various architectures with PSGD
4.3.6 Adam optimizer with PSG
4.4 Discussion
4.4.1 Toy Example
4.4.2 Weight Distributions
4.4.3 Quantization-aware training vs. PSGD
4.4.4 Post-training with a PSGD-trained model
4.5 Conclusion
5 Dynamic Collective Intelligence Learning (DCIL) for Pruning
5.1 Introduction
5.2 Proposed method
5.2.1 Backgrounds
5.2.2 Dynamic Collective Intelligence Learning
5.2.3 Convergence analysis
5.3 Experiments
5.3.1 Experiment Setting
5.3.2 Experiment Results
5.3.3 Differences between dense and pruned models
5.3.4 Analysis of the stability
5.3.5 Cost of training
5.3.6 Fast convergence of DCIL
5.3.7 Tendency of warm-up
5.3.8 CIFAR-10
5.3.9 ImageNet
5.3.10 Analysis of training and inference overheads
5.4 Conclusion
6 Deep Model Compression via KD, Quantization and Pruning (KQP)
6.1 Method
6.2 Experiment
6.3 Conclusion
7 Conclusion
7.1 Summary
7.2 Limitations and Future Directions
Abstract (In Korean)
Acknowledgments
Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and
current von Neumann processor architectures is the way in which memory and
processing are organized. As Information and Communication Technologies continue
to address the need for increased computational power through the increase of
cores within a digital processor, neuromorphic engineers and scientists can
complement this need by building processor architectures where memory is
distributed with the processing. In this paper we present a survey of
brain-inspired processor architectures that support models of cortical networks
and deep neural networks. These architectures range from serial clocked
implementations of multi-neuron systems to massively parallel asynchronous ones
and from purely digital systems to mixed analog/digital systems which implement
more biological-like models of neurons and synapses together with a suite of
adaptation and learning mechanisms analogous to the ones found in biological
nervous systems. We describe the advantages of the different approaches being
pursued and present the challenges that need to be addressed for building
artificial neural processing systems that can display the richness of behaviors
seen in biological systems.
Comment: Submitted to Proceedings of the IEEE; a review of recently proposed neuromorphic computing platforms and systems