976 research outputs found

    Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges

    Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and execute tasks with human cognitive abilities, engenders significant anticipation and intrigue across scientific, commercial, and societal arenas. This fascination extends particularly to the Internet of Things (IoT), a landscape characterized by the interconnection of countless devices, sensors, and systems, collectively gathering and sharing data to enable intelligent decision-making and automation. This research explores the opportunities and challenges of achieving AGI in the context of the IoT. Specifically, it starts by outlining the fundamental principles of IoT and the critical role of Artificial Intelligence (AI) in IoT systems. Subsequently, it delves into AGI fundamentals, culminating in the formulation of a conceptual framework for AGI's seamless integration within IoT. The application spectrum for AGI-infused IoT is broad, encompassing domains ranging from smart grids, residential environments, manufacturing, and transportation to environmental monitoring, agriculture, healthcare, and education. However, adapting AGI to resource-constrained IoT settings necessitates dedicated research efforts. Furthermore, the paper addresses constraints imposed by limited computing resources, intricacies associated with large-scale IoT communication, and critical concerns pertaining to security and privacy.

    Hardware-Friendly Neural Network Architecture and Accelerator Design for Efficient Inference

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Advisor: 이혁재.
Research on deep learning, currently the most prominent branch of machine learning, is advancing actively on both the hardware and the software front. On the software side, optimization methods such as mobile-oriented neural network architecture design and compression of trained models are being studied to enable efficient inference while maintaining high accuracy; on the hardware side, accelerators are being designed that deliver fast inference and high energy efficiency for a given trained deep learning model. Going beyond these existing optimization and design methods, this dissertation aims to build a more efficient inference system by applying new hardware design techniques and model transformation methods. First, a more efficient deep learning accelerator is designed by adopting stochastic computing, a new circuit design approach based on probabilistic arithmetic whose advantage is that it can implement the same arithmetic circuits with far fewer transistors than a conventional binary system. In particular, multiplication, the most frequently used operation in deep learning, requires an array multiplier in a binary circuit but can be implemented with a single AND gate in stochastic computing. Prior studies have designed deep learning accelerators based on stochastic computing circuits, but their recognition accuracy lagged far behind that of binary circuits. To resolve these problems, this dissertation designs an accelerator that uses unipolar encoding to raise arithmetic accuracy, and proposes sharing each stochastic number generator (SNG) among multiple neurons to reduce the generators' overhead.
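The abstract above notes that, in unipolar stochastic computing, a multiplication that needs an array multiplier in a binary circuit reduces to a single AND gate on bitstreams. A minimal Python sketch of that idea (a software simulation, not the dissertation's hardware; the function names are illustrative):

```python
import random

def to_stochastic(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar bitstream of `length` bits."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(stream_a, stream_b):
    """Unipolar SC multiplication: just a bitwise AND of the two streams."""
    return [a & b for a, b in zip(stream_a, stream_b)]

def to_value(stream):
    """Decode a unipolar bitstream back to a probability estimate."""
    return sum(stream) / len(stream)

rng = random.Random(0)
n = 100_000                      # longer streams -> lower sampling noise
a, b = 0.8, 0.5
product = to_value(sc_multiply(to_stochastic(a, n, rng),
                               to_stochastic(b, n, rng)))
# product approximates a * b = 0.4, up to bitstream sampling noise
```

The two streams must come from independent number generators for the AND gate to compute the product, which is why the per-neuron SNGs dominate area and why sharing them, as the dissertation proposes, matters.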
Second, to obtain a larger inference speedup, a method that transforms the network architecture is presented instead of compressing a trained deep learning model. Prior results show that applying model compression to state-of-the-art architectures yields high compression ratios on the weight parameters but only a marginal improvement in actual inference speed. This limited speedup stems from structural limitations of the network architecture itself, and changing the architecture is the most fundamental remedy. Based on this observation, this dissertation proposes a method that transforms the network architecture to achieve a larger speedup than prior work. Finally, a neural architecture search method is presented that widens the search space so that each layer can have a different structure while keeping training feasible. Architecture search in prior work finds the structure of a basic unit cell and builds one large network by replicating the found cell. Because only a single cell structure is used, such a method ignores position-dependent information such as the input feature map size and the weight parameter size. This dissertation presents a method that resolves these issues while training stably. In addition, a new penalty is devised that constrains not only the amount of computation but also the number of memory accesses, helping the search find more efficient architectures.
Deep learning is the most promising machine learning approach and is already used in everyday life; for instance, the latest smartphones use neural networks for better photography and voice recognition. However, as the performance of neural networks has improved, their hardware cost has increased dramatically. Until the past few years, much research focused on only a single side, hardware or software, so the actual cost was hardly improved. Therefore, hardware-software co-optimization is needed to achieve further improvement.
For this reason, this dissertation proposes an efficient inference system that spans hardware accelerator design through network architecture design. The first part of the dissertation is a deep neural network accelerator with stochastic computing. The main goal is an efficient stochastic-computing hardware design for convolutional neural networks. It includes a stochastic ReLU and an optimized max function, which are key components of the convolutional neural network. To avoid the range-limitation problem of stochastic numbers and to increase the signal-to-noise ratio, we perform weight normalization and upscaling. In addition, to reduce the overhead of binary-to-stochastic conversion, we propose a scheme for sharing stochastic number generators among the neurons in the convolutional neural network. The second part of the dissertation is neural architecture transformation. Network recasting is proposed, which enables network architecture transformation. The primary goal of this method is to accelerate the inference process through the transformation, but there are many other practical applications. The method is based on block-wise recasting: it recasts each source block in a pre-trained teacher network into a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such block-by-block recasting in a sequential manner transforms the network architecture while preserving accuracy. This method can be used to transform an arbitrary teacher network type into an arbitrary student network type, and it can even generate a mixed-architecture network that consists of two or more types of blocks. Network recasting can generate a network with fewer parameters and/or activations, which reduces the inference time significantly. Naturally, it can also be used for network compression by recasting a trained network into a smaller network of the same type.
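The block-training step of network recasting described above (train a target block so that its output activations approximate those of the source block) can be sketched with plain NumPy. Here a two-layer linear "teacher block" stands in for a deeper source block and is recast into a single-layer "student block"; the shapes, learning rate, and block choices are illustrative assumptions, not the dissertation's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained source block: two stacked linear layers.
W1 = rng.normal(size=(8, 8)) / np.sqrt(8)
W2 = rng.normal(size=(8, 8)) / np.sqrt(8)

X = rng.normal(size=(256, 8))        # input activations fed to both blocks
target = X @ W1 @ W2                 # source block's output activations

# Target block: a single, shallower linear layer trained so that its
# output activations match the source block's (the recasting step).
W_student = np.zeros((8, 8))
lr = 0.1
for _ in range(500):
    err = X @ W_student - target     # residual of the activation-matching loss
    W_student -= lr * X.T @ err / len(X)   # gradient step on the MSE

mse = float(np.mean((X @ W_student - target) ** 2))
# mse ends up near zero: the shallower block has recast the deeper one
```

Repeating this block by block, in sequence, is what lets recasting change the overall architecture (e.g. ResNet blocks into plain ConvNet blocks) while preserving accuracy.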
The third part of the dissertation is a fine-grained neural architecture search. InheritedNAS is a fine-grained architecture search method that starts from the coarse-grained architecture found by a cell-based architecture search. A fine-grained architecture has a very large search space, so it is hard to find directly. A stage-independent search is proposed: it divides the entire network into several stages and trains each stage independently. To break the dependency between stages, a two-point matching distillation method is also proposed. Operation pruning is then applied to remove unimportant operations; a block-wise pruning method is used to remove operations rather than node-wise pruning. In addition, a hardware-aware latency penalty is proposed that covers not only FLOPs but also memory accesses.
Contents:
1 Introduction
  1.1 DNN Accelerator with Stochastic Computing
  1.2 Neural Architecture Transformation
  1.3 Fine-Grained Neural Architecture Search
2 Background
  2.1 Stochastic Computing
  2.2 Neural Network
    2.2.1 Network Compression
    2.2.2 Neural Network Accelerator
  2.3 Knowledge Distillation
  2.4 Neural Architecture Search
3 DNN Accelerator with Stochastic Computing
  3.1 Motivation
    3.1.1 Multiplication Error on Stochastic Computing
    3.1.2 DNN with Stochastic Computing
  3.2 Unipolar SC Hardware for CNN
    3.2.1 Overall Hardware Design
    3.2.2 Stochastic ReLU Function
    3.2.3 Stochastic Max Function
    3.2.4 Efficient Average Function
  3.3 Weight Modulation for SC Hardware
    3.3.1 Weight Normalization for SC
    3.3.2 Weight Upscaling for Output Layer
  3.4 Early Decision Termination
  3.5 Stochastic Number Generator Sharing
  3.6 Experiments
    3.6.1 Accuracy of CNN using Unipolar SC
    3.6.2 Synthesis Result
  3.7 Summary
4 Neural Architecture Transformation
  4.1 Motivation
  4.2 Network Recasting
    4.2.1 Recasting from DenseNet to ResNet and ConvNet
    4.2.2 Recasting from ResNet to ConvNet
    4.2.3 Compression
    4.2.4 Block Training
    4.2.5 Sequential Recasting and Fine-tuning
  4.3 Experiments
    4.3.1 Visualization of Filter Reduction
    4.3.2 CIFAR
    4.3.3 ILSVRC2012
  4.4 Summary
5 Fine-Grained Neural Architecture Search
  5.1 Motivation
    5.1.1 Search Space Reduction Versus Diversity
    5.1.2 Hardware-Aware Optimization
  5.2 InheritedNAS
    5.2.1 Stage Independent Search
    5.2.2 Operation Pruning
    5.2.3 Entire Search Procedure
  5.3 Hardware-Aware Penalty Design
  5.4 Experiments
    5.4.1 Fine-Grained Architecture Search
    5.4.2 Penalty Analysis
  5.5 Summary
6 Conclusion
Abstract (In Korean)
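The hardware-aware latency penalty mentioned in the abstract, which constrains memory accesses as well as FLOPs, might take a budget-overshoot form like the following sketch (a hypothetical formulation for illustration, not the dissertation's exact penalty; all names and weights are assumptions):

```python
def latency_penalty(flops, mem_access, flops_budget, mem_budget,
                    alpha=0.5, beta=0.5):
    """Charge a candidate architecture for exceeding a compute (FLOPs)
    budget and a memory-access budget; zero penalty inside both budgets."""
    flops_term = max(0.0, flops / flops_budget - 1.0)   # relative FLOPs overshoot
    mem_term = max(0.0, mem_access / mem_budget - 1.0)  # relative memory overshoot
    return alpha * flops_term + beta * mem_term

# A candidate within both budgets pays nothing; one at 2x both budgets pays 1.0.
within = latency_penalty(9e8, 4e6, flops_budget=1e9, mem_budget=5e6)
over = latency_penalty(2e9, 1e7, flops_budget=1e9, mem_budget=5e6)
```

Adding such a term to the search objective steers the search away from architectures that look cheap in FLOPs alone but are memory-bound on real hardware, which is the gap the dissertation's penalty is designed to close.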