5 research outputs found

    ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ ํ•™์Šต์„ ์œ„ํ•œ ์‹ฌ์ธต ํ•˜์ดํผ๋„คํŠธ์›Œํฌ

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. Advisor: Byoung-Tak Zhang.

    Recent advances in information and communication technology have led to an explosive increase in data. Unlike traditional data, which are structured and unimodal, data generated from dynamic environments are characterized by high dimensionality, multimodality, and lack of structure, as well as huge scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many successful reports, existing machine learning methods have mainly focused on practical problems represented by large-scale but static databases, such as image classification, tagging, and retrieval. Hypernetworks are probabilistic graphical models that represent an empirical distribution using a hypergraph structure, a large collection of hyperedges encoding associations among variables. This representation makes the model suitable for characterizing complex relationships between features with a population of building blocks. However, because a hypernetwork spans a huge combinatorial feature space, the model requires a large number of hyperedges to handle large-scale multimodal data and thus faces a scalability problem. In this dissertation, we propose deep hypernetworks, a deep architecture of hypernetworks that addresses this scalability issue when learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks handle the issue through abstraction at multiple levels, using a hierarchy of hypergraphs. We use a stochastic method based on Monte Carlo simulation, graph Monte Carlo, to efficiently construct hypergraphs representing the empirical distribution of the observed data. The structure of a deep hypernetwork changes continuously as learning proceeds, a flexibility that contrasts with other deep learning models. The proposed model learns incrementally from data, thus handling non-stationary properties such as concept drift. The abstract representations in the learned models serve as multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation, and thus formulate vision-language translation in terms of statistical machine translation. Since knowledge of the video stories is used for translation, we call this story-aware vision-language translation. We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmark datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data content, as well as the associations between vision and language. We explain how the introduction of a hierarchy deals with scalability and non-stationary properties. In addition, we present story-aware vision-language translation on cartoon videos by generating scene images from sentences and descriptive subtitles from scene images. Furthermore, we discuss the implications of our model for lifelong learning and directions for improvement toward human-level artificial intelligence.

    Contents:
    1 Introduction
      1.1 Background and Motivation
      1.2 Problems to be Addressed
      1.3 The Proposed Approach and its Contribution
      1.4 Organization of the Dissertation
    2 Related Work
      2.1 Multimodal Learning
      2.2 Models for Learning from Multimodal Data
        2.2.1 Topic Model-Based Multimodal Learning
        2.2.2 Deep Network-Based Multimodal Learning
      2.3 Higher-Order Graphical Models
        2.3.1 Hypernetwork Models
        2.3.2 Bayesian Evolutionary Learning of Hypernetworks
    3 Multimodal Hypernetworks for Text-to-Image Retrievals
      3.1 Overview
      3.2 Hypernetworks for Multimodal Associations
        3.2.1 Multimodal Hypernetworks
        3.2.2 Incremental Learning of Multimodal Hypernetworks
      3.3 Text-to-Image Crossmodal Inference
        3.3.1 Representation of Textual-Visual Data
        3.3.2 Text-to-Image Query Expansion
      3.4 Text-to-Image Retrieval via Multimodal Hypernetworks
        3.4.1 Data and Experimental Settings
        3.4.2 Text-to-Image Retrieval Performance
        3.4.3 Incremental Learning for Text-to-Image Retrieval
      3.5 Summary
    4 Deep Hypernetworks for Multimodal Concept Learning from Cartoon Videos
      4.1 Overview
      4.2 Visual-Linguistic Concept Representation of Cartoon Videos
      4.3 Deep Hypernetworks for Modeling Visual-Linguistic Concepts
        4.3.1 Sparse Population Coding
        4.3.2 Deep Hypernetworks for Concept Hierarchies
        4.3.3 Implication of Deep Hypernetworks on Cognitive Modeling
      4.4 Learning of Deep Hypernetworks
        4.4.1 Problem Space of Deep Hypernetworks
        4.4.2 Graph Monte-Carlo Simulation
        4.4.3 Learning of Concept Layers
        4.4.4 Incremental Concept Construction
      4.5 Incremental Concept Construction from Cartoon Videos
        4.5.1 Data Description and Parameter Setup
        4.5.2 Concept Representation and Development
        4.5.3 Character Classification via Concept Learning
        4.5.4 Vision-Language Conversion via Concept Learning
      4.6 Summary
    5 Story-aware Vision-Language Translation using Deep Concept Hierarchies
      5.1 Overview
      5.2 Vision-Language Conversion as a Machine Translation
        5.2.1 Statistical Machine Translation
        5.2.2 Vision-Language Translation
      5.3 Story-aware Vision-Language Translation using Deep Concept Hierarchies
        5.3.1 Story-aware Vision-Language Translation
        5.3.2 Vision-to-Language Translation
        5.3.3 Language-to-Vision Translation
      5.4 Story-aware Vision-Language Translation on Cartoon Videos
        5.4.1 Data and Experimental Setting
        5.4.2 Scene-to-Sentence Generation
        5.4.3 Sentence-to-Scene Generation
        5.4.4 Visual-Linguistic Story Summarization of Cartoon Videos
      5.5 Summary
    6 Concluding Remarks
      6.1 Summary of the Dissertation
      6.2 Directions for Further Research
    Bibliography
    Korean Abstract
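    The hyperedge population underlying a hypernetwork layer can be illustrated with a short sketch. The following Python snippet is a minimal, illustrative stand-in, not the dissertation's graph Monte Carlo algorithm: it builds one layer by randomly sampling fixed-order hyperedges from observed instances and classifies by counting matched hyperedges. All names and the feature encoding are assumptions made for the example.

        import random
        from collections import Counter

        def sample_hyperedges(instance, label, k=3, n_edges=20, rng=random):
            # Sample k-variable hyperedges (feature subsets) from one observed instance.
            features = list(instance.items())
            return [(tuple(sorted(rng.sample(features, k))), label) for _ in range(n_edges)]

        def build_layer(dataset, k=3, n_edges=20):
            # A hypernetwork layer is a weighted population of labeled hyperedges.
            return Counter(e for inst, y in dataset
                             for e in sample_hyperedges(inst, y, k, n_edges))

        def classify(layer, instance):
            # Score each label by the total weight of hyperedges the instance matches.
            items = set(instance.items())
            scores = Counter()
            for (subset, label), w in layer.items():
                if set(subset) <= items:
                    scores[label] += w
            return scores.most_common(1)[0][0] if scores else None

        data = [({"word:coffee": 1, "vis:cup": 1, "vis:sofa": 1}, "cafe"),
                ({"word:bed": 1, "vis:lamp": 1, "vis:pillow": 1}, "bedroom")]
        layer = build_layer(data, k=2, n_edges=10)
        print(classify(layer, {"word:coffee": 1, "vis:cup": 1}))  # most likely "cafe"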

    Multimodal Learning from TV Drama using Deep Hypernetworks

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 2. ์žฅ๋ณ‘ํƒ.์ตœ๊ทผ ์ธํ„ฐ๋„ท๊ธฐ์ˆ ์˜ ๋ฐœ์ „๊ณผ ๋”ฅ ๋Ÿฌ๋‹ ์—ฐ๊ตฌ์˜ ํ™œ์„ฑํ™”๋ฅผ ํ†ตํ•ด ์ธ๊ณต์ง€๋Šฅ ์—ฐ๊ตฌ์— ๊ด€๋ จ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ธ‰๊ฒฉํžˆ ์ฆ๊ฐ€ํ•˜๊ณ  ์žˆ๋‹ค. ImageNet, WordNet๊ณผ ๊ฐ™์€ ์ •ํ˜•ํ™”๋œ ๋‹จ์ผ ๋ชจ๋‹ฌ๋ฆฌํ‹ฐ ๋ฐ์ดํ„ฐ๋Š” ๋ฌผ๋ก , Flickr 8K, Flickr 30K, Microsoft COCO์™€ ๊ฐ™์€ ๋Œ€ํ‘œ์ ์ธ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ๋“ค๋„ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ •์  ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ํ•™์Šต๋œ ์ธ๊ณต์ง€๋Šฅ ๊ธฐ์ˆ ์€ ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰, ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ๋“ฑ ๋งŽ์€ ๋ถ„์•ผ์—์„œ ์„ฑ๊ณต์‚ฌ๋ก€๋“ค์„ ๋ณด์ด๊ณ  ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ์‹ค์„ธ๊ณ„์—์„œ ๋”์šฑ ๋‹ค์–‘ํ•œ ๋ฌธ์ œ๋ฅผ ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ์ธ๊ณต์ง€๋Šฅ ๊ธฐ์ˆ ์ด ํ•„์š”ํ•˜๋‹ค. TV๋“œ๋ผ๋งˆ๋Š” ์ธ๊ฐ„ ์‚ฌํšŒ์˜ ์—„์ฒญ๋‚œ ์ง€์‹์„ ํฌํ•จํ•˜๊ณ  ์žˆ๋Š” ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋น„๋””์˜ค ๋ฐ์ดํ„ฐ๋Š” ์ž์œ ๋กœ์šด ์Šคํ† ๋ฆฌ ์ „๊ฐœ๋ฅผ ํ†ตํ•ด ์ธ๋ฌผ๋“ค ๊ฐ„์˜ ๊ด€๊ณ„๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ฒฝ์ œ, ์ •์น˜, ๋ฌธํ™” ๋“ฑ ๋‹ค์–‘ํ•œ ์ง€์‹์„ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ์ „๋‹ฌํ•ด์ฃผ๊ณ  ์žˆ๋‹ค. ํŠนํžˆ ๋‹ค์–‘ํ•œ ์žฅ์†Œ์—์„œ ์ธ๊ฐ„์˜ ๋Œ€ํ™” ์Šต์„ฑ๊ณผ ํ–‰๋™ ํŒจํ„ด์€ ์‚ฌํšŒ๊ด€๊ณ„๋ฅผ ๋ถ„์„ํ•˜๋Š”๋ฐ ์žˆ์–ด์„œ ์•„์ฃผ ์ค‘์š”ํ•œ ์ •๋ณด์ด๋‹ค. ํ•˜์ง€๋งŒ TV๋“œ๋ผ๋งˆ์˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ๊ณผ ๋™์ ์ธ ํŠน์„ฑ์œผ๋กœ ์ธํ•ด ํ•™์Šต๋ชจ๋ธ์ด ๋น„๋””์˜ค๋กœ๋ถ€ํ„ฐ ์ž๋™์œผ๋กœ ์ง€์‹์„ ์Šต๋“ํ•˜๊ธฐ์—๋Š” ์•„์ง ๋งŽ์€ ์–ด๋ ค์›€์ด ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ๋“ค์„ ํ•ด๊ฒฐํ•˜๋ ค๋ฉด ํšจ๊ณผ์ ์ธ ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ ํ•™์Šต ๊ธฐ์ˆ ๊ณผ ๋‹ค์–‘ํ•œ ์˜์ƒ์ฒ˜๋ฆฌ ๊ธฐ์ˆ ๋“ค์ด ํ•„์š”ํ•˜๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” TV๋“œ๋ผ๋งˆ์˜ ์ง€์‹์„ ์ž๋™์œผ๋กœ ํ•™์Šตํ•˜๊ณ  ๋ถ„์„ํ•˜๋Š” ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ(Deep hypernetworks) ๊ธฐ๋ฐ˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ๋Š” ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ๋‹จ๊ณ„์˜ ์ถ”์ƒํ™”๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์ง€์‹์„ ํ•™์Šตํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ํŠน์ง•์œผ๋กœ ์ธํ•ด ๋ชจ๋ธ์ด ๋ณต์žกํ•œ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต์„ ํšจ์œจ์ ์œผ๋กœ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ธฐ์กด์˜ ๊ณ ์ •๋œ ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์˜ ๊ตฌ์กฐ์™€๋Š” ๋‹ฌ๋ฆฌ ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋Š” ์œ ๋™์ ์œผ๋กœ ๋ณ€ํ•  ์ˆ˜ ์žˆ์–ด ๋™์ ์ธ ์ •๋ณด๋ฅผ ๋‹ค๋ฃจ๊ธฐ์— ์ ํ•ฉํ•˜๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•๋ก ์„ ํ†ตํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” TV๋“œ๋ผ๋งˆ๋ฅผ ๋ถ„์„ํ•˜์˜€๋‹ค. ์‹คํ—˜์„ ์œ„ํ•ด 183ํŽธ ์—ํ”ผ์†Œ๋“œ, ์ด 4400๋ถ„ ๋ถ„๋Ÿ‰์˜ TV๋“œ๋ผ๋งˆ 'Friends'๋ฅผ ์‚ฌ์šฉํ–ˆ๊ณ  ๋‹ค์–‘ํ•œ ์˜์ƒ์ฒ˜๋ฆฌ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ์žฅ์†Œ์™€ ๋“ฑ์žฅ์ธ๋ฌผ ๋“ฑ ์‹œ๊ฐ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ๋ชจ๋ธ์„ ํ†ตํ•ด ์ž๋™์œผ๋กœ ์†Œ์…œ ๋„คํŠธ์›Œํฌ๋ฅผ ์ƒ์„ฑํ•˜์—ฌ TV๋“œ๋ผ๋งˆ์—์„œ ์ถœํ˜„ํ•˜๋Š” ๋‹ค์–‘ํ•œ ์žฅ๋ฉด์—์„œ์˜ ์ธ๋ฌผ ๊ด€๊ณ„ ๋ณ€ํ™”๋ฅผ ๋ถ„์„ํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ์†Œ์…œ ๋„คํŠธ์›Œํฌ ๋ถ„์„์œผ๋กœ๋ถ€ํ„ฐ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต์„ ํ•  ์ˆ˜ ์žˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋˜ํ•œ ์Šคํ† ๋ฆฌ์˜ ์ „๊ฐœ์— ๋”ฐ๋ฅธ ์ธ๋ฌผ๊ด€๊ณ„ ๋ณ€ํ™”๋กœ๋ถ€ํ„ฐ ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์—ˆ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๋ชจ๋ธ์˜ ํ•™์Šต์ •๋„๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ํ•™์Šต๋œ ์ง€์‹์„ ํ™œ์šฉํ•˜์—ฌ ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์‹คํ—˜๊ฒฐ๊ณผ๋กœ๋ถ€ํ„ฐ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต์„ ํ†ตํ•ด ์ถ”์ถœ๋œ ์ง€์‹์ด ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ์ •ํ™•๋„์— ๊ธฐ์—ฌํ•˜์˜€์Œ์„ ์•Œ ์ˆ˜๊ฐ€ ์žˆ๊ณ  ์Šคํ† ๋ฆฌ์˜ ์ถ•์ ์— ๋”ฐ๋ผ ์ •ํ™•๋„๊ฐ€ ๋†’์•„์กŒ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค.I. ์„œ ๋ก  1 1. ์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ ๋ฐ ๋ชฉ์  1 2. ๋…ผ๋ฌธ ๊ตฌ์„ฑ 4 II. ๊ด€๋ จ ์—ฐ๊ตฌ 5 1. ๋”ฅ ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต ์—ฐ๊ตฌ 5 2. ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ ๋ถ„์„ ์—ฐ๊ตฌ 7 2.1. ์†Œ์…œ ๋ฏธ๋””์–ด์˜ ์ •๋ณด ์ถ”์ถœ 7 2.2. 
๋น„๋””์˜ค ๋ฐ์ดํ„ฐ์˜ ์†Œ์…œ ์ •๋ณด ๋ถ„์„ 8 3. ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ์—ฐ๊ตฌ 9 III. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ 11 1. ํ•˜์ดํผ๋„คํŠธ์›Œํฌ 11 1.1. ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ 11 1.2. ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ํ•™์Šต 14 2. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ 15 2.1. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ 15 2.2. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ํ•™์Šต 18 IV. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ 23 1. TV๋“œ๋ผ๋งˆ ์‹œ๊ฐ ์ •๋ณด์˜ ์ถ”์ถœ 23 1.1. ๋“ฑ์žฅ์ธ๋ฌผ ์ธ์‹ ๋ฐฉ๋ฒ• 23 1.2. ์žฅ์†Œ ๋ถ„๋ฅ˜ ๋ฐฉ๋ฒ• 26 2. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๋ฐ ์‹คํ—˜ ์„ค์ • 28 V. ๊ฒฐ๊ณผ ๋ฐ ๋…ผ์˜ 30 1. ์†Œ์…œ ๋„คํŠธ์›Œํฌ ๋ถ„์„ 30 1.1. ์ธ๋ฌผ ์ค‘์‹ฌ ๋„คํŠธ์›Œํฌ ์‹œ๊ฐํ™” ๊ธฐ๋ฒ• 30 1.2. ์žฅ์†Œ ๊ธฐ๋ฐ˜ ๋„คํŠธ์›Œํฌ์˜ ์ •๋Ÿ‰์  ํ‰๊ฐ€ 34 2. ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ 38 VI. ๊ฒฐ ๋ก  42 ์ฐธ๊ณ ๋ฌธํ—Œ 43 ์˜๋ฌธ์š”์•ฝ 51Maste

    ๊นŠ์€ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜ ์ผ์ƒ ํ–‰๋™์— ๋Œ€ํ•œ ํ‰์ƒ ํ•™์Šต: ๋“€์–ผ ๋ฉ”๋ชจ๋ฆฌ ์•„ํ‚คํ…์ณ์™€ ์ ์ง„์  ๋ชจ๋ฉ˜ํŠธ ๋งค์นญ

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2018. Advisor: Byoung-Tak Zhang.

    Learning from human behaviors in the real world is imperative for building human-aware intelligent systems. We attempt to train a personalized context recognizer continuously on a wearable device by rapidly adapting deep neural networks to sensor data streams of user behaviors. However, training deep neural networks from a data stream is challenging, because learning new data through neural networks often results in loss of previously acquired information, referred to as catastrophic forgetting. This catastrophic forgetting problem has been studied for nearly three decades but remains unsolved, because the mechanism of deep learning is not yet sufficiently understood. We introduce two methods to deal with catastrophic forgetting in deep neural networks. The first is motivated by complementary learning systems (CLS) theory, which contends that effective learning of a data stream over a lifetime requires complementary systems, corresponding to the neocortex and hippocampus in the human brain. We propose a dual memory architecture (DMA), which trains two learning structures: one gradually acquires structured knowledge representations, while the other rapidly learns the specifics of individual experiences. Online learning is achieved by new techniques, such as weight transfer for new deep modules and hypernetworks for fast adaptation. The second method is the incremental moment matching (IMM) algorithm. IMM incrementally matches the moments of the posterior distributions of neural networks trained on the previous and the current task, respectively. To smooth the search space of the posterior parameters, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, an L2 penalty between the old and the new parameters, and a variant of dropout using the old parameters. To provide insight into the success of these two lifelong learning methods, we introduce two online learning methods for sum-product networks, a kind of deep probabilistic graphical model. We discuss online learning approaches that are valid in probabilistic models and explain how they can be extended to lifelong learning algorithms for deep neural networks. We evaluate the proposed DMA and IMM on two types of datasets: various artificial benchmarks devised for evaluating lifelong learning performance, and a lifelog dataset collected through Google Glass over 46 days. The experimental results show that our methods outperform comparative models in various experimental settings and that our attempts to overcome catastrophic forgetting are valuable and promising.

    Contents:
    1 Introduction
      1.1 Wearable Devices and Lifelog Dataset
      1.2 Lifelong Learning and Catastrophic Forgetting
      1.3 Approach and Contribution
      1.4 Organization of the Dissertation
    2 Related Works
      2.1 Lifelong Learning
      2.2 Application-driven Lifelong Learning
      2.3 Classical Approach for Preventing Catastrophic Forgetting
      2.4 Learning Parameter Distribution for Preventing Catastrophic Forgetting
        2.4.1 Sequential Bayesian
        2.4.2 Approach to Simulating Parameter Distribution
      2.5 Learning Data Distribution for Preventing Catastrophic Forgetting
    3 Preliminary Study: Online Learning of Sum-Product Networks
      3.1 Introduction
      3.2 Sum-Product Networks
        3.2.1 Representation of Sum-Product Networks
        3.2.2 Structure Learning of Sum-Product Networks
      3.3 Online Incremental Structure Learning of Sum-Product Networks
        3.3.1 Methods
        3.3.2 Experiments
      3.4 Non-Parametric Bayesian Sum-Product Networks
        3.4.1 Model 1: A Prior Distribution for SPN Trees
        3.4.2 Model 2: A Prior Distribution for a Class of dag-SPNs
      3.5 Discussion
        3.5.1 History of Online Learning of Sum-Product Networks
        3.5.2 Toward Lifelong Learning of Deep Neural Networks
      3.6 Summary
    4 Structure Learning for Lifelong Learning: Dual Memory Architecture
      4.1 Introduction
      4.2 Complementary Learning Systems Theory
      4.3 Dual Memory Architectures
      4.4 Online Learning of Multiplicative-Gaussian Hypernetworks
        4.4.1 Multiplicative-Gaussian Hypernetworks
        4.4.2 Evolutionary Structure Learning
        4.4.3 Online Learning on Incremental Features
      4.5 Experiments
        4.5.1 Non-stationary Image Data Stream
        4.5.2 Lifelog Dataset
      4.6 Discussion
        4.6.1 Parameter-Decomposability in Deep Learning
        4.6.2 Online Bayesian Optimization
      4.7 Summary
    5 Sequential Bayesian for Lifelong Learning: Incremental Moment Matching
      5.1 Introduction
      5.2 Incremental Moment Matching
        5.2.1 Mean-based Incremental Moment Matching (mean-IMM)
        5.2.2 Mode-based Incremental Moment Matching (mode-IMM)
      5.3 Transfer Techniques for Incremental Moment Matching
        5.3.1 Weight-Transfer
        5.3.2 L2-transfer
        5.3.3 Drop-transfer
        5.3.4 IMM Procedure
      5.4 Experimental Results
        5.4.1 Disjoint MNIST Experiment
        5.4.2 Shuffled MNIST Experiment
        5.4.3 ImageNet to CUB Dataset
        5.4.4 Lifelog Dataset
      5.5 Discussion
        5.5.1 A Shift of Optimal Hyperparameter via Space Smoothing
        5.5.2 Bayesian Approach on Lifelong Learning
        5.5.3 Balancing the Information of an Old and a New Task
      5.6 Summary
    6 Concluding Remarks
      6.1 Summary of Methods and Contributions
      6.2 Suggestions for Future Research
    Korean Abstract
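    The two IMM variants named in the abstract admit a compact sketch. Following the description above, mean-IMM averages the per-task posterior means of the network weights, and mode-IMM weights that average by a diagonal approximation of each task's Fisher information. The code below is a minimal illustration under those assumptions, operating on flat weight vectors with synthetic values rather than a trained network.

        import numpy as np

        def mean_imm(thetas, alphas):
            # Mean-based IMM: mixing-weighted average of per-task posterior means.
            return sum(a * th for a, th in zip(alphas, thetas))

        def mode_imm(thetas, fishers, alphas, eps=1e-8):
            # Mode-based IMM: Fisher-weighted average, approximating the mode of
            # the mixture of per-task Gaussian posteriors.
            num = sum(a * f * th for a, f, th in zip(alphas, fishers, thetas))
            den = sum(a * f for a, f in zip(alphas, fishers)) + eps
            return num / den

        rng = np.random.default_rng(0)
        theta1, theta2 = rng.normal(size=10), rng.normal(size=10)   # weights after task 1, 2
        fisher1, fisher2 = np.abs(rng.normal(size=10)), np.abs(rng.normal(size=10))
        merged = mode_imm([theta1, theta2], [fisher1, fisher2], alphas=[0.5, 0.5])
        print(merged)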

    ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ํŒจํ„ด ๋ถ„์„์„ ์œ„ํ•œ ์ข…๋‹จ ์‹ฌ์ธต ํ•™์Šต๋ง ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Byoung-Tak Zhang.

    Pattern recognition in time series data has become an important avenue of research in artificial intelligence following the paradigm shift of the fourth industrial revolution. Many related studies have been conducted over the past few years, and research using deep learning techniques is becoming increasingly popular. Owing to the non-stationary, nonlinear, and noisy nature of time series data, it is essential to design an appropriate model to extract their significant features for pattern recognition. This dissertation not only discusses pattern recognition using various hand-crafted feature engineering techniques on physiological time series signals, but also proposes end-to-end deep learning design methodologies that require no feature engineering. In the time domain, time series signals can be divided into those with periodic and those with non-periodic characteristics, and this thesis proposes an end-to-end deep learning design methodology for each of the two types. The first proposed design methodology is Deep ECGNet, a design scheme for an end-to-end deep learning model that exploits the periodic characteristics of electrocardiogram (ECG) signals. The ECG, recorded from the electrophysiologic patterns of the heart muscle during the heartbeat, is a promising candidate biomarker for estimating event-based stress levels. Conventionally, beat-to-beat alternations, i.e., heart rate variability (HRV), derived from the ECG have been used to monitor mental stress status as well as mortality in cardiac patients. These HRV parameters have the disadvantage of requiring a measurement period of at least five minutes. In this thesis, human stress states are estimated with a deep learning model from only 10 seconds of data, without any hand-crafted feature engineering. The design methodology incorporates the periodic characteristics of the ECG signal into the model: the main parameters of the 1D CNNs and RNNs that serve as hidden feature extractors reflect the ECG period, so the features of one signal cycle that correspond to stress states are extracted internally by the end-to-end network. Experimental results show that the proposed method outperforms both existing HRV parameter extraction methods and spectrogram-based methods. The second proposed methodology is an automatic end-to-end deep learning design methodology using Bayesian optimization for non-periodic signals. Electroencephalogram (EEG) signals, elicited from the central nervous system (CNS), reflect genuine emotional states, even at the unconscious level. Because of the low signal-to-noise ratio (SNR) of EEG signals, spectral analysis in the frequency domain has conventionally been applied in EEG studies: EEG signals are filtered into several frequency bands using Fourier or wavelet analysis, and these band features are then fed into a classifier. This thesis proposes an automatic end-to-end deep learning design method that uses optimization techniques instead of this basic feature engineering. Bayesian optimization is a popular technique in machine learning for optimizing model hyperparameters; it is well suited to problems whose objective is an expensive black-box function. Using 1D CNNs and RNNs as the basic deep learning models, we propose a method that performs whole-model hyperparameter and structural optimization with Bayesian optimization. On this basis, the thesis proposes the Deep EEGNet model for discriminating human emotional states from EEG signals. Experimental results show that the proposed method outperforms the conventional approach based on band power features. In conclusion, this thesis proposes several methodologies for time series pattern recognition, ranging from conventional methods based on hand-crafted feature extraction to end-to-end deep learning design methodologies that use only raw time series signals. Experimental results show that the proposed methodologies can be applied effectively to pattern recognition problems on time series data.

    Contents:
    Chapter 1 Introduction
      1.1 Pattern Recognition in Time Series
      1.2 Major Problems in Conventional Approaches
      1.3 The Proposed Approach and its Contribution
      1.4 Thesis Organization
    Chapter 2 Related Works
      2.1 Pattern Recognition in Time Series using Conventional Methods
        2.1.1 Time Domain Features
        2.1.2 Frequency Domain Features
        2.1.3 Signal Processing based on Multi-variate Empirical Mode Decomposition (MEMD)
        2.1.4 Statistical Time Series Model (ARIMA)
      2.2 Fundamental Deep Learning Algorithms
        2.2.1 Convolutional Neural Networks (CNNs)
        2.2.2 Recurrent Neural Networks (RNNs)
      2.3 Hyperparameter and Structural Optimization Techniques
        2.3.1 Grid and Random Search Algorithms
        2.3.2 Bayesian Optimization
        2.3.3 Neural Architecture Search
      2.4 Research Trends related to Time Series Data
        2.4.1 Generative Model of Raw Audio Waveform
    Chapter 3 Preliminary Researches: Pattern Recognition in Time Series using Various Feature Extraction Methods
      3.1 Conventional Methods using Time and Frequency Features: Motor Imagery Brain Response Classification
        3.1.1 Introduction
        3.1.2 Methods
        3.1.3 Ensemble Classification Method (Stacking & AdaBoost)
        3.1.4 Sensitivity Analysis
        3.1.5 Classification Results
      3.2 Statistical Feature Extraction Methods: ARIMA Model Based Feature Extraction Methodology
        3.2.1 Introduction
        3.2.2 ARIMA Model
        3.2.3 Signal Processing
        3.2.4 ARIMA Model Conformance Test
        3.2.5 Experimental Results
        3.2.6 Summary
      3.3 Application on Specific Time Series Data: Human Stress States Recognition using Ultra-Short-Term ECG Spectral Feature
        3.3.1 Introduction
        3.3.2 Experiments
        3.3.3 Classification Methods
        3.3.4 Experimental Results
        3.3.5 Summary
    Chapter 4 Master Framework for Pattern Recognition in Time Series
      4.1 The Concept of the Proposed Framework for Pattern Recognition in Time Series
        4.1.1 Optimal Basic Deep Learning Models for the Proposed Framework
      4.2 Two Categories for Pattern Recognition in Time Series Data
        4.2.1 The Proposed Deep Learning Framework for Periodic Time Series Signals
        4.2.2 The Proposed Deep Learning Framework for Non-periodic Time Series Signals
      4.3 Expanded Models of the Proposed Master Framework for Pattern Recognition in Time Series
    Chapter 5 Deep Learning Model Design Methodology for Periodic Signals using Prior Knowledge: Deep ECGNet
      5.1 Introduction
      5.2 Materials and Methods
        5.2.1 Subjects and Data Acquisition
        5.2.2 Conventional ECG Analysis Methods
        5.2.3 The Initial Setup of the Deep Learning Architecture
        5.2.4 The Deep ECGNet
      5.3 Experimental Results
      5.4 Summary
    Chapter 6 Deep Learning Model Design Methodology for Non-periodic Time Series Signals using Optimization Techniques: Deep EEGNet
      6.1 Introduction
      6.2 Materials and Methods
        6.2.1 Subjects and Data Acquisition
        6.2.2 Conventional EEG Analysis Methods
        6.2.3 Basic Deep Learning Units and Optimization Technique
        6.2.4 Optimization for Deep EEGNet
        6.2.5 Deep EEGNet Architectures using the EEG Channel Grouping Scheme
      6.3 Experimental Results
      6.4 Summary
    Chapter 7 Concluding Remarks
      7.1 Summary of Thesis and Contributions
      7.2 Limitations of the Proposed Methods
      7.3 Suggestions for Future Works
    Bibliography
    Korean Abstract
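    The hyperparameter search described for Deep EEGNet can be sketched as follows. This is an assumed, minimal setup using the scikit-optimize library's gp_minimize with an invented search space and a synthetic stand-in for the training run; the thesis's actual hyperparameter space and objective are not reproduced here.

        import numpy as np
        from skopt import gp_minimize
        from skopt.space import Integer, Real

        # Assumed search space for a small 1D CNN + RNN stack.
        space = [Integer(8, 128, name="conv_filters"),
                 Integer(3, 64, name="kernel_size"),
                 Integer(16, 256, name="rnn_units"),
                 Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate")]

        def train_and_validate(conv_filters, kernel_size, rnn_units, lr):
            # Stand-in for a real training run: returns a synthetic validation
            # error so the sketch runs end to end. Replace with actual training.
            return ((np.log10(lr) + 2.5) ** 2
                    + 0.001 * abs(conv_filters - 64)
                    + 0.005 * abs(kernel_size - 16)
                    + 0.001 * abs(rnn_units - 128))

        def objective(params):
            conv_filters, kernel_size, rnn_units, lr = params
            return train_and_validate(conv_filters, kernel_size, rnn_units, lr)

        result = gp_minimize(objective, space, n_calls=30, random_state=0)
        print(result.x, result.fun)  # best hyperparameters and validation error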

    Mixed Order Hyper-Networks for Function Approximation and Optimisation

    Get PDF
    Many systems have inputs, which can be measured and sometimes controlled, and outputs, which can also be measured and which depend on the inputs. Taking numerous measurements from such systems produces data, which may be used either to model the system with the goal of predicting the output associated with a given input (function approximation, or regression) or to find the input settings required to produce a desired output (optimisation, or search). Approximating or optimising a function is central to the field of computational intelligence. There are many existing methods for performing regression and optimisation based on samples of data, but they all have limitations. Multi-layer perceptrons (MLPs) are universal approximators, but they suffer from the black box problem, which means that their structure and the function they implement are opaque to the user. They also have a propensity to become trapped in local minima or large plateaux in the error function during learning. A regression method with a structure that allows models to be compared, human knowledge to be extracted, optimisation searches to be guided, and model complexity to be controlled is desirable. This thesis presents such a method: a single framework for both regression and optimisation, the mixed order hyper network (MOHN). A MOHN implements a function f: {-1, 1}^n → R to arbitrary precision. The structure of a MOHN makes explicit the ways in which input variables interact to determine the function output, which allows human insight and complexity control that are very difficult to achieve in neural networks with hidden units. The explicit structure representation also allows efficient algorithms for searching for an input pattern that leads to a desired output. A number of learning rules for estimating the weights from a sample of data are presented, along with a heuristic method for choosing which connections to include in a model. Several methods for searching a MOHN for inputs that lead to a desired output are compared. Experiments compare a MOHN to an MLP on regression tasks. The MOHN achieves a level of accuracy comparable to an MLP but suffers less from local minima in the error function and shows less variance across multiple training trials. It is also easier to interpret and to combine into an ensemble. The trade-off between the fit of a model to its training data and to an independent set of test data is shown to be easier to control in a MOHN than in an MLP. A MOHN is also compared to a number of existing optimisation methods, including estimation of distribution algorithms, genetic algorithms, and simulated annealing. The MOHN finds optimal solutions in far fewer function evaluations than these methods on tasks selected from the literature.
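    The MOHN representation lends itself to a short sketch: a function on {-1, 1}^n is written as a weighted sum of products over subsets of the inputs (the "connections"), and the weights can be estimated from samples, here with ordinary least squares. The connection set and data below are invented for illustration; the thesis presents several dedicated learning rules and a heuristic for choosing connections, which this example does not reproduce.

        import numpy as np
        from itertools import combinations

        def design_matrix(X, connections):
            # Each column is the product of one subset of inputs (a Walsh function).
            return np.column_stack([np.prod(X[:, list(c)], axis=1) for c in connections])

        n = 4
        rng = np.random.default_rng(0)
        X = rng.choice([-1, 1], size=(200, n))                    # inputs in {-1, 1}^n
        y = 0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] * X[:, 2]         # target with a 2nd-order term

        # Mixed-order connection set: bias plus all first- and second-order terms.
        connections = [()] + [c for k in (1, 2) for c in combinations(range(n), k)]
        Phi = design_matrix(X, connections)
        weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)

        for c, w in zip(connections, weights):
            if abs(w) > 0.1:
                print(c, round(w, 2))   # recovers (): 0.5, (0,): 2.0, (1, 2): -1.5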