
    κΉŠμ€ 신경망 기반 일상 행동에 λŒ€ν•œ 평생 ν•™μŠ΅: λ“€μ–Ό λ©”λͺ¨λ¦¬ 아킀텍쳐와 점진적 λͺ¨λ©˜νŠΈ 맀칭

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2018. Advisor: Byoung-Tak Zhang.

    Learning from human behaviors in the real world is imperative for building human-aware intelligent systems. We attempt to train a personalized context recognizer continuously on a wearable device by rapidly adapting deep neural networks to sensor data streams of user behaviors. However, training deep neural networks from a data stream is challenging, because learning new data through neural networks often causes the loss of previously acquired information, a phenomenon referred to as catastrophic forgetting. The catastrophic forgetting problem has been studied for nearly three decades but remains unsolved, because the mechanisms of deep learning are not yet well understood. We introduce two methods to deal with catastrophic forgetting in deep neural networks.

    The first method is motivated by complementary learning systems (CLS) theory, which contends that effective learning from a data stream over a lifetime requires complementary systems, analogous to the neocortex and hippocampus in the human brain. We propose a dual memory architecture (DMA) that trains two learning structures: one gradually acquires structured knowledge representations, and the other rapidly learns the specifics of individual experiences. Online learning is achieved by new techniques such as weight transfer for new deep modules and hypernetworks for fast adaptation.

    The second method is the incremental moment matching (IMM) algorithm. IMM incrementally matches the moments of the posterior distributions of the neural networks trained on the previous and the current task, respectively. To smooth the search space of the posterior parameters, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, an L2-norm penalty between the old and new parameters, and a variant of dropout anchored at the old parameters (minimal sketches of these update rules are given after the table of contents below).

    To provide insight into the success of the two proposed lifelong learning methods, we also introduce two online learning methods for sum-product networks, a kind of deep probabilistic graphical model. We discuss online learning approaches that are valid for probabilistic models and explain how they can be extended to lifelong learning algorithms for deep neural networks. We evaluate the proposed DMA and IMM on two types of datasets: various artificial benchmarks devised for evaluating lifelong learning performance, and a lifelog dataset collected through Google Glass over 46 days. The experimental results show that our methods outperform comparative models in various experimental settings, and that our attempts to overcome catastrophic forgetting are valuable and promising.

    Table of Contents

    1 Introduction
      1.1 Wearable Devices and Lifelog Dataset
      1.2 Lifelong Learning and Catastrophic Forgetting
      1.3 Approach and Contribution
      1.4 Organization of the Dissertation
    2 Related Works
      2.1 Lifelong Learning
      2.2 Application-driven Lifelong Learning
      2.3 Classical Approach for Preventing Catastrophic Forgetting
      2.4 Learning Parameter Distribution for Preventing Catastrophic Forgetting
        2.4.1 Sequential Bayesian
        2.4.2 Approach to Simulating Parameter Distribution
      2.5 Learning Data Distribution for Preventing Catastrophic Forgetting
    3 Preliminary Study: Online Learning of Sum-Product Networks
      3.1 Introduction
      3.2 Sum-Product Networks
        3.2.1 Representation of Sum-Product Networks
        3.2.2 Structure Learning of Sum-Product Networks
      3.3 Online Incremental Structure Learning of Sum-Product Networks
        3.3.1 Methods
        3.3.2 Experiments
      3.4 Non-Parametric Bayesian Sum-Product Networks
        3.4.1 Model 1: A Prior Distribution for SPN Trees
        3.4.2 Model 2: A Prior Distribution for a Class of dag-SPNs
      3.5 Discussion
        3.5.1 History of Online Learning of Sum-Product Networks
        3.5.2 Toward Lifelong Learning of Deep Neural Networks
      3.6 Summary
    4 Structure Learning for Lifelong Learning: Dual Memory Architecture
      4.1 Introduction
      4.2 Complementary Learning Systems Theory
      4.3 Dual Memory Architectures
      4.4 Online Learning of Multiplicative-Gaussian Hypernetworks
        4.4.1 Multiplicative-Gaussian Hypernetworks
        4.4.2 Evolutionary Structure Learning
        4.4.3 Online Learning on Incremental Features
      4.5 Experiments
        4.5.1 Non-stationary Image Data Stream
        4.5.2 Lifelog Dataset
      4.6 Discussion
        4.6.1 Parameter-Decomposability in Deep Learning
        4.6.2 Online Bayesian Optimization
      4.7 Summary
    5 Sequential Bayesian for Lifelong Learning: Incremental Moment Matching
      5.1 Introduction
      5.2 Incremental Moment Matching
        5.2.1 Mean-based Incremental Moment Matching (mean-IMM)
        5.2.2 Mode-based Incremental Moment Matching (mode-IMM)
      5.3 Transfer Techniques for Incremental Moment Matching
        5.3.1 Weight-Transfer
        5.3.2 L2-transfer
        5.3.3 Drop-transfer
        5.3.4 IMM Procedure
      5.4 Experimental Results
        5.4.1 Disjoint MNIST Experiment
        5.4.2 Shuffled MNIST Experiment
        5.4.3 ImageNet to CUB Dataset
        5.4.4 Lifelog Dataset
      5.5 Discussion
        5.5.1 A Shift of Optimal Hyperparameter via Space Smoothing
        5.5.2 Bayesian Approach on Lifelong Learning
        5.5.3 Balancing the Information of an Old and a New Task
      5.6 Summary
    6 Concluding Remarks
      6.1 Summary of Methods and Contributions
      6.2 Suggestions for Future Research
    Abstract (in Korean)
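    The abstract describes the two IMM merging rules of Chapter 5 only in words. The following is a minimal NumPy sketch of mean-IMM and mode-IMM, assuming flattened parameter vectors and a diagonal Fisher approximation; the function and variable names are illustrative and not taken from the dissertation's code.

```python
import numpy as np

def mean_imm(thetas, alphas):
    """mean-IMM: merge per-task networks by a weighted average of
    their parameters, i.e., the mean of the assumed Gaussian posteriors.

    thetas: list of flattened parameter vectors, one per task
    alphas: mixing ratios over tasks, summing to 1
    """
    return sum(a * th for a, th in zip(alphas, thetas))

def mode_imm(thetas, fishers, alphas, eps=1e-8):
    """mode-IMM: a precision-weighted average that approximates the
    mode of a mixture of Gaussian posteriors, using each task's
    diagonal Fisher information as a per-parameter importance weight.

    fishers: list of diagonal Fisher information vectors, one per task
    """
    num = sum(a * f * th for a, f, th in zip(alphas, fishers, thetas))
    den = sum(a * f for a, f in zip(alphas, fishers))
    return num / (den + eps)
```

    Under mean-IMM every parameter is averaged equally, so parameters that matter greatly to one task can be washed out; mode-IMM moves a parameter less when its Fisher information, and hence its importance to a previous task, is high.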
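    Section 5.3 lists three transfer techniques that smooth the posterior search space between tasks: weight-transfer simply initializes the new task's network at the old parameters, L2-transfer adds a quadratic pull toward the old parameters, and drop-transfer is a dropout variant whose zero point is the old parameter vector rather than 0. A hedged sketch of the latter two, under the same illustrative naming as above:

```python
import numpy as np

def l2_transfer_penalty(theta, theta_old, lam):
    """L2-transfer: a quadratic penalty, added to the new task's loss,
    that pulls the new parameters toward the old task's parameters."""
    return lam * np.sum((theta - theta_old) ** 2)

def drop_transfer(theta, theta_old, p, rng):
    """Drop-transfer: dropout whose zero point is theta_old. Dropped
    parameters revert to the old task's values, and survivors are
    rescaled so that E[theta_hat] = theta."""
    keep_scale = 1.0 / (1.0 - p)
    survivors = keep_scale * theta - p * keep_scale * theta_old
    dropped = rng.random(theta.shape) < p
    return np.where(dropped, theta_old, survivors)

# Example: check the unbiasedness of drop-transfer empirically.
rng = np.random.default_rng(0)
theta, theta_old = np.ones(4), np.zeros(4)
avg = np.mean([drop_transfer(theta, theta_old, 0.5, rng)
               for _ in range(10_000)], axis=0)
# avg is close to theta, since E[theta_hat] = theta.
```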