1,764 research outputs found

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges. (Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.)
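    One family of techniques covered by such surveys uses on-node prediction for data reduction: a node learns a simple local model of its readings and transmits only when a new reading deviates from the forecast, saving radio energy and prolonging network lifetime. The sketch below is purely illustrative and not taken from the paper; the sliding window, least-squares trend model, and error threshold are all assumptions.

```python
from collections import deque

import numpy as np


class DualPredictionNode:
    """Illustrative sensor-node filter: transmit a reading only when it deviates
    from a locally learned linear trend (a toy model, assumed for this sketch)."""

    def __init__(self, window: int = 10, threshold: float = 0.5):
        self.history = deque(maxlen=window)  # recent readings kept on the node
        self.threshold = threshold           # tolerated prediction error

    def _predict_next(self) -> float:
        # Fit a least-squares line to the recent window and extrapolate one step.
        t = np.arange(len(self.history))
        slope, intercept = np.polyfit(t, np.asarray(self.history), deg=1)
        return slope * len(self.history) + intercept

    def should_transmit(self, reading: float) -> bool:
        if len(self.history) < self.history.maxlen:
            self.history.append(reading)
            return True  # not enough history yet: always report
        send = abs(reading - self._predict_next()) > self.threshold
        self.history.append(reading)
        return send
```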

    ๊ทธ๋ผ๋””์–ธํŠธ ๊ฐœ์„  ๋ฐ ๋ช…์‹œ์  ์ •๊ทœํ™”๋ฅผ ํ†ตํ•œ ์‹ฌ์ธต ๋ชจ๋ธ ์••์ถ•์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Convergence Science, February 2022. Jangho Kim.

    Deep neural networks (DNNs) have developed rapidly and have shown remarkable performance in many domains, including computer vision, natural language processing, and speech processing. In line with this development, the demand for on-device DNNs, i.e., deploying DNNs on edge IoT devices and smartphones, has increased. However, with the growth of DNNs, the number of parameters has risen drastically, which makes DNN models hard to deploy on resource-constrained edge devices. Another challenge is the power consumption of DNNs on edge devices, which run on limited batteries. To resolve these issues, model compression is essential. In this dissertation, we propose three novel model compression methods spanning knowledge distillation, quantization, and pruning. First, in knowledge distillation, we aim to train a student model with additional information from a teacher network. This framework makes the most of a given parameter budget, which is essential when device resources are limited. Unlike previous knowledge distillation frameworks, we distill the knowledge indirectly by extracting a factor from the features, because inherent differences between the teacher and the student, such as network structure, batch randomness, and initial conditions, can hinder the transfer of appropriate knowledge. Second, we propose a regularization method for quantization. A quantized model has advantages in power consumption and memory, both essential on resource-constrained edge devices. We non-uniformly rescale the gradients of the model during training to make the weight distribution quantization-friendly, using the position-based scaled gradient (PSG). Compared with stochastic gradient descent (SGD), our position-based scaled gradient descent (PSGD) mitigates the performance degradation after quantization because it produces a quantization-friendly weight distribution. Third, to prune unimportant weights of overparameterized models, dynamic pruning methods have emerged that search for diverse sparsity patterns during training by using the Straight-Through Estimator (STE) to approximate the gradients of pruned weights. The STE can help pruned weights revive while dynamic sparsity patterns are being explored. However, these coarse gradients cause training instability and performance degradation owing to the unreliable gradient signal of the STE approximation. To tackle this issue, we propose refined gradients that update the pruned weights through dual forwarding paths, and we introduce Dynamic Collective Intelligence Learning (DCIL) to avoid using coarse gradients for pruning. Lastly, we combine the proposed methods into a unified model compression training framework. This method can train a drastically sparse and quantization-friendly model.
    Table of contents: Abstract; Contents; List of Tables; List of Figures
    1 Introduction: Motivation; Tasks; Contributions and Outline
    2 Related Work: Knowledge Distillation; Quantization (Sparse Training); Pruning
    3 Factor Transfer (FT) for Knowledge Distillation: Introduction; Proposed Method (Teacher Factor Extraction with Paraphraser; Factor Transfer with Translator); Experiments (CIFAR-10; CIFAR-100; Ablation Study; ImageNet; Object Detection; Discussion); Conclusion
    4 Position-based Scaled Gradients (PSG) for Quantization: Introduction; Proposed Method (Optimization in Warped Space; Position-based Scaled Gradient; Target Points; PSGD for Deep Networks; Geometry of the Warped Space); Experiments (Implementation Details; Pruning; Quantization; Knowledge Distillation; Various Architectures with PSGD; Adam Optimizer with PSG); Discussion (Toy Example; Weight Distributions; Quantization-aware Training vs. PSGD; Post-training with a PSGD-trained Model); Conclusion
    5 Dynamic Collective Intelligence Learning (DCIL) for Pruning: Introduction; Proposed Method (Backgrounds; Dynamic Collective Intelligence Learning; Convergence Analysis); Experiments (Experiment Setting; Experiment Results; Differences between Dense and Pruned Models; Analysis of the Stability; Cost of Training; Fast Convergence of DCIL; Tendency of Warm-up; CIFAR-10; ImageNet; Analysis of Training and Inference Overheads); Conclusion
    6 Deep Model Compression via KD, Quantization and Pruning (KQP): Method; Experiment; Conclusion
    7 Conclusion: Summary; Limitations and Future Directions
    Abstract (in Korean); Acknowledgments

    Memory and information processing in neuromorphic systems

    A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. As Information and Communication Technologies continue to address the need for increased computational power by increasing the number of cores within a digital processor, neuromorphic engineers and scientists can complement this effort by building processor architectures in which memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems that implement more biologically realistic models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to those found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems. (Comment: Submitted to Proceedings of the IEEE; a review of recently proposed neuromorphic computing platforms and systems.)
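    To make the contrast with the von Neumann organization concrete, the toy unit below keeps its synaptic weights as local state and updates them with a local, activity-driven rule while it integrates inputs and fires; the leak, threshold, and Hebbian-style update are illustrative assumptions rather than a model of any specific platform in the survey.

```python
import numpy as np


class LIFNeuron:
    """Toy leaky integrate-and-fire unit whose synaptic weights live next to the
    state it updates, illustrating memory colocated with processing. All constants
    and the Hebbian-style plasticity rule are illustrative assumptions."""

    def __init__(self, n_inputs: int, tau: float = 20.0,
                 threshold: float = 1.0, learn_rate: float = 0.01):
        self.weights = np.random.uniform(0.0, 0.1, n_inputs)  # local synaptic memory
        self.potential = 0.0
        self.tau = tau
        self.threshold = threshold
        self.learn_rate = learn_rate

    def step(self, spikes_in: np.ndarray) -> bool:
        # Leaky integration of weighted presynaptic spikes (0/1 vector assumed).
        self.potential += -self.potential / self.tau + self.weights @ spikes_in
        fired = self.potential >= self.threshold
        if fired:
            self.potential = 0.0  # reset after emitting a spike
            # Local plasticity: strengthen synapses whose inputs just fired.
            self.weights += self.learn_rate * spikes_in
        return bool(fired)
```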