5 research outputs found

    ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ ํ•™์Šต์„ ์œ„ํ•œ ์‹ฌ์ธต ํ•˜์ดํผ๋„คํŠธ์›Œํฌ

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. Advisor: Byoung-Tak Zhang.

    Recent advances in information and communication technology have led to an explosive increase in data. Unlike traditional data, which are structured and unimodal, data generated from dynamic environments are characterized by high dimensionality, multimodality, and lack of structure, as well as huge scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many successful reports, existing machine learning methods have mainly focused on practical problems represented by large-scale but static databases, such as image classification, tagging, and retrieval. Hypernetworks are probabilistic graphical models that represent an empirical distribution using a hypergraph structure, a large collection of hyperedges encoding associations among variables. This representation makes the model suitable for characterizing complex relationships between features with a population of building blocks. However, because a hypernetwork spans a huge combinatorial feature space, the model requires a large number of hyperedges to handle large-scale multimodal data and thus faces a scalability problem. In this dissertation, we propose deep hypernetworks, a deep architecture of hypernetworks that addresses this scalability issue when learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks handle the issue through abstraction at multiple levels, using a hierarchy of hypergraphs. We use a stochastic method based on Monte Carlo simulation, graph Monte Carlo, to efficiently construct hypergraphs representing the empirical distribution of the observed data. The structure of a deep hypernetwork changes continuously as learning proceeds, a flexibility that contrasts with other deep learning models. The proposed model learns incrementally from data, thus handling non-stationary properties such as concept drift. The abstract representations in the learned models serve as multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation, and thus formulate vision-language translation in terms of statistical machine translation. Since knowledge of the video stories is used for translation, we call this story-aware vision-language translation. We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmark datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data content, as well as the associations between vision and language. We explain how the introduction of a hierarchy deals with scalability and non-stationary properties. In addition, we present story-aware vision-language translation on cartoon videos by generating scene images from sentences and descriptive subtitles from scene images. Furthermore, we discuss the implications of our model for lifelong learning and directions for improvement toward human-level artificial intelligence.

    Contents:
    1 Introduction
      1.1 Background and Motivation
      1.2 Problems to be Addressed
      1.3 The Proposed Approach and its Contribution
      1.4 Organization of the Dissertation
    2 Related Work
      2.1 Multimodal Learning
      2.2 Models for Learning from Multimodal Data
        2.2.1 Topic Model-Based Multimodal Learning
        2.2.2 Deep Network-Based Multimodal Learning
      2.3 Higher-Order Graphical Models
        2.3.1 Hypernetwork Models
        2.3.2 Bayesian Evolutionary Learning of Hypernetworks
    3 Multimodal Hypernetworks for Text-to-Image Retrievals
      3.1 Overview
      3.2 Hypernetworks for Multimodal Associations
        3.2.1 Multimodal Hypernetworks
        3.2.2 Incremental Learning of Multimodal Hypernetworks
      3.3 Text-to-Image Crossmodal Inference
        3.3.1 Representation of Textual-Visual Data
        3.3.2 Text-to-Image Query Expansion
      3.4 Text-to-Image Retrieval via Multimodal Hypernetworks
        3.4.1 Data and Experimental Settings
        3.4.2 Text-to-Image Retrieval Performance
        3.4.3 Incremental Learning for Text-to-Image Retrieval
      3.5 Summary
    4 Deep Hypernetworks for Multimodal Concept Learning from Cartoon Videos
      4.1 Overview
      4.2 Visual-Linguistic Concept Representation of Cartoon Videos
      4.3 Deep Hypernetworks for Modeling Visual-Linguistic Concepts
        4.3.1 Sparse Population Coding
        4.3.2 Deep Hypernetworks for Concept Hierarchies
        4.3.3 Implication of Deep Hypernetworks on Cognitive Modeling
      4.4 Learning of Deep Hypernetworks
        4.4.1 Problem Space of Deep Hypernetworks
        4.4.2 Graph Monte-Carlo Simulation
        4.4.3 Learning of Concept Layers
        4.4.4 Incremental Concept Construction
      4.5 Incremental Concept Construction from Cartoon Videos
        4.5.1 Data Description and Parameter Setup
        4.5.2 Concept Representation and Development
        4.5.3 Character Classification via Concept Learning
        4.5.4 Vision-Language Conversion via Concept Learning
      4.6 Summary
    5 Story-aware Vision-Language Translation using Deep Concept Hierarchies
      5.1 Overview
      5.2 Vision-Language Conversion as a Machine Translation
        5.2.1 Statistical Machine Translation
        5.2.2 Vision-Language Translation
      5.3 Story-aware Vision-Language Translation using Deep Concept Hierarchies
        5.3.1 Story-aware Vision-Language Translation
        5.3.2 Vision-to-Language Translation
        5.3.3 Language-to-Vision Translation
      5.4 Story-aware Vision-Language Translation on Cartoon Videos
        5.4.1 Data and Experimental Setting
        5.4.2 Scene-to-Sentence Generation
        5.4.3 Sentence-to-Scene Generation
        5.4.4 Visual-Linguistic Story Summarization of Cartoon Videos
      5.5 Summary
    6 Concluding Remarks
      6.1 Summary of the Dissertation
      6.2 Directions for Further Research
    Bibliography
    Korean Abstract
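    The hyperedge population underlying a hypernetwork layer can be illustrated with a short sketch. The following Python snippet is a minimal, illustrative stand-in, not the dissertation's graph Monte Carlo algorithm: it builds one layer by randomly sampling fixed-order hyperedges from observed instances and classifies by counting matched hyperedges. All names and the feature encoding are assumptions made for the example.

        import random
        from collections import Counter

        def sample_hyperedges(instance, label, k=3, n_edges=20, rng=random):
            # Sample k-variable hyperedges (feature subsets) from one observed instance.
            features = list(instance.items())
            return [(tuple(sorted(rng.sample(features, k))), label) for _ in range(n_edges)]

        def build_layer(dataset, k=3, n_edges=20):
            # A hypernetwork layer is a weighted population of labeled hyperedges.
            return Counter(e for inst, y in dataset
                             for e in sample_hyperedges(inst, y, k, n_edges))

        def classify(layer, instance):
            # Score each label by the total weight of hyperedges the instance matches.
            items = set(instance.items())
            scores = Counter()
            for (subset, label), w in layer.items():
                if set(subset) <= items:
                    scores[label] += w
            return scores.most_common(1)[0][0] if scores else None

        data = [({"word:coffee": 1, "vis:cup": 1, "vis:sofa": 1}, "cafe"),
                ({"word:bed": 1, "vis:lamp": 1, "vis:pillow": 1}, "bedroom")]
        layer = build_layer(data, k=2, n_edges=10)
        print(classify(layer, {"word:coffee": 1, "vis:cup": 1}))  # most likely "cafe"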

    Multimodal Learning from TV Drama using Deep Hypernetworks

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 2. ์žฅ๋ณ‘ํƒ.์ตœ๊ทผ ์ธํ„ฐ๋„ท๊ธฐ์ˆ ์˜ ๋ฐœ์ „๊ณผ ๋”ฅ ๋Ÿฌ๋‹ ์—ฐ๊ตฌ์˜ ํ™œ์„ฑํ™”๋ฅผ ํ†ตํ•ด ์ธ๊ณต์ง€๋Šฅ ์—ฐ๊ตฌ์— ๊ด€๋ จ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ธ‰๊ฒฉํžˆ ์ฆ๊ฐ€ํ•˜๊ณ  ์žˆ๋‹ค. ImageNet, WordNet๊ณผ ๊ฐ™์€ ์ •ํ˜•ํ™”๋œ ๋‹จ์ผ ๋ชจ๋‹ฌ๋ฆฌํ‹ฐ ๋ฐ์ดํ„ฐ๋Š” ๋ฌผ๋ก , Flickr 8K, Flickr 30K, Microsoft COCO์™€ ๊ฐ™์€ ๋Œ€ํ‘œ์ ์ธ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ๋“ค๋„ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ •์  ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ํ•™์Šต๋œ ์ธ๊ณต์ง€๋Šฅ ๊ธฐ์ˆ ์€ ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰, ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ๋“ฑ ๋งŽ์€ ๋ถ„์•ผ์—์„œ ์„ฑ๊ณต์‚ฌ๋ก€๋“ค์„ ๋ณด์ด๊ณ  ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ์‹ค์„ธ๊ณ„์—์„œ ๋”์šฑ ๋‹ค์–‘ํ•œ ๋ฌธ์ œ๋ฅผ ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ์ธ๊ณต์ง€๋Šฅ ๊ธฐ์ˆ ์ด ํ•„์š”ํ•˜๋‹ค. TV๋“œ๋ผ๋งˆ๋Š” ์ธ๊ฐ„ ์‚ฌํšŒ์˜ ์—„์ฒญ๋‚œ ์ง€์‹์„ ํฌํ•จํ•˜๊ณ  ์žˆ๋Š” ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋น„๋””์˜ค ๋ฐ์ดํ„ฐ๋Š” ์ž์œ ๋กœ์šด ์Šคํ† ๋ฆฌ ์ „๊ฐœ๋ฅผ ํ†ตํ•ด ์ธ๋ฌผ๋“ค ๊ฐ„์˜ ๊ด€๊ณ„๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ฒฝ์ œ, ์ •์น˜, ๋ฌธํ™” ๋“ฑ ๋‹ค์–‘ํ•œ ์ง€์‹์„ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ์ „๋‹ฌํ•ด์ฃผ๊ณ  ์žˆ๋‹ค. ํŠนํžˆ ๋‹ค์–‘ํ•œ ์žฅ์†Œ์—์„œ ์ธ๊ฐ„์˜ ๋Œ€ํ™” ์Šต์„ฑ๊ณผ ํ–‰๋™ ํŒจํ„ด์€ ์‚ฌํšŒ๊ด€๊ณ„๋ฅผ ๋ถ„์„ํ•˜๋Š”๋ฐ ์žˆ์–ด์„œ ์•„์ฃผ ์ค‘์š”ํ•œ ์ •๋ณด์ด๋‹ค. ํ•˜์ง€๋งŒ TV๋“œ๋ผ๋งˆ์˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ๊ณผ ๋™์ ์ธ ํŠน์„ฑ์œผ๋กœ ์ธํ•ด ํ•™์Šต๋ชจ๋ธ์ด ๋น„๋””์˜ค๋กœ๋ถ€ํ„ฐ ์ž๋™์œผ๋กœ ์ง€์‹์„ ์Šต๋“ํ•˜๊ธฐ์—๋Š” ์•„์ง ๋งŽ์€ ์–ด๋ ค์›€์ด ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ๋“ค์„ ํ•ด๊ฒฐํ•˜๋ ค๋ฉด ํšจ๊ณผ์ ์ธ ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ ํ•™์Šต ๊ธฐ์ˆ ๊ณผ ๋‹ค์–‘ํ•œ ์˜์ƒ์ฒ˜๋ฆฌ ๊ธฐ์ˆ ๋“ค์ด ํ•„์š”ํ•˜๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” TV๋“œ๋ผ๋งˆ์˜ ์ง€์‹์„ ์ž๋™์œผ๋กœ ํ•™์Šตํ•˜๊ณ  ๋ถ„์„ํ•˜๋Š” ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ(Deep hypernetworks) ๊ธฐ๋ฐ˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ๋Š” ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ๋‹จ๊ณ„์˜ ์ถ”์ƒํ™”๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์ง€์‹์„ ํ•™์Šตํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ํŠน์ง•์œผ๋กœ ์ธํ•ด ๋ชจ๋ธ์ด ๋ณต์žกํ•œ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต์„ ํšจ์œจ์ ์œผ๋กœ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ธฐ์กด์˜ ๊ณ ์ •๋œ ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์˜ ๊ตฌ์กฐ์™€๋Š” ๋‹ฌ๋ฆฌ ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋Š” ์œ ๋™์ ์œผ๋กœ ๋ณ€ํ•  ์ˆ˜ ์žˆ์–ด ๋™์ ์ธ ์ •๋ณด๋ฅผ ๋‹ค๋ฃจ๊ธฐ์— ์ ํ•ฉํ•˜๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•๋ก ์„ ํ†ตํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” TV๋“œ๋ผ๋งˆ๋ฅผ ๋ถ„์„ํ•˜์˜€๋‹ค. ์‹คํ—˜์„ ์œ„ํ•ด 183ํŽธ ์—ํ”ผ์†Œ๋“œ, ์ด 4400๋ถ„ ๋ถ„๋Ÿ‰์˜ TV๋“œ๋ผ๋งˆ 'Friends'๋ฅผ ์‚ฌ์šฉํ–ˆ๊ณ  ๋‹ค์–‘ํ•œ ์˜์ƒ์ฒ˜๋ฆฌ ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ์žฅ์†Œ์™€ ๋“ฑ์žฅ์ธ๋ฌผ ๋“ฑ ์‹œ๊ฐ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ๋ชจ๋ธ์„ ํ†ตํ•ด ์ž๋™์œผ๋กœ ์†Œ์…œ ๋„คํŠธ์›Œํฌ๋ฅผ ์ƒ์„ฑํ•˜์—ฌ TV๋“œ๋ผ๋งˆ์—์„œ ์ถœํ˜„ํ•˜๋Š” ๋‹ค์–‘ํ•œ ์žฅ๋ฉด์—์„œ์˜ ์ธ๋ฌผ ๊ด€๊ณ„ ๋ณ€ํ™”๋ฅผ ๋ถ„์„ํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ์†Œ์…œ ๋„คํŠธ์›Œํฌ ๋ถ„์„์œผ๋กœ๋ถ€ํ„ฐ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต์„ ํ•  ์ˆ˜ ์žˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋˜ํ•œ ์Šคํ† ๋ฆฌ์˜ ์ „๊ฐœ์— ๋”ฐ๋ฅธ ์ธ๋ฌผ๊ด€๊ณ„ ๋ณ€ํ™”๋กœ๋ถ€ํ„ฐ ๋™์  ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์—ˆ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๋ชจ๋ธ์˜ ํ•™์Šต์ •๋„๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ํ•™์Šต๋œ ์ง€์‹์„ ํ™œ์šฉํ•˜์—ฌ ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์‹คํ—˜๊ฒฐ๊ณผ๋กœ๋ถ€ํ„ฐ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต์„ ํ†ตํ•ด ์ถ”์ถœ๋œ ์ง€์‹์ด ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ์ •ํ™•๋„์— ๊ธฐ์—ฌํ•˜์˜€์Œ์„ ์•Œ ์ˆ˜๊ฐ€ ์žˆ๊ณ  ์Šคํ† ๋ฆฌ์˜ ์ถ•์ ์— ๋”ฐ๋ผ ์ •ํ™•๋„๊ฐ€ ๋†’์•„์กŒ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค.I. ์„œ ๋ก  1 1. ์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ ๋ฐ ๋ชฉ์  1 2. ๋…ผ๋ฌธ ๊ตฌ์„ฑ 4 II. ๊ด€๋ จ ์—ฐ๊ตฌ 5 1. ๋”ฅ ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ•™์Šต ์—ฐ๊ตฌ 5 2. ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ฐ์ดํ„ฐ ๋ถ„์„ ์—ฐ๊ตฌ 7 2.1. ์†Œ์…œ ๋ฏธ๋””์–ด์˜ ์ •๋ณด ์ถ”์ถœ 7 2.2. 
๋น„๋””์˜ค ๋ฐ์ดํ„ฐ์˜ ์†Œ์…œ ์ •๋ณด ๋ถ„์„ 8 3. ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ ์—ฐ๊ตฌ 9 III. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ 11 1. ํ•˜์ดํผ๋„คํŠธ์›Œํฌ 11 1.1. ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ 11 1.2. ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ํ•™์Šต 14 2. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ 15 2.1. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ 15 2.2. ๋”ฅ ํ•˜์ดํผ๋„คํŠธ์›Œํฌ ํ•™์Šต 18 IV. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ 23 1. TV๋“œ๋ผ๋งˆ ์‹œ๊ฐ ์ •๋ณด์˜ ์ถ”์ถœ 23 1.1. ๋“ฑ์žฅ์ธ๋ฌผ ์ธ์‹ ๋ฐฉ๋ฒ• 23 1.2. ์žฅ์†Œ ๋ถ„๋ฅ˜ ๋ฐฉ๋ฒ• 26 2. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๋ฐ ์‹คํ—˜ ์„ค์ • 28 V. ๊ฒฐ๊ณผ ๋ฐ ๋…ผ์˜ 30 1. ์†Œ์…œ ๋„คํŠธ์›Œํฌ ๋ถ„์„ 30 1.1. ์ธ๋ฌผ ์ค‘์‹ฌ ๋„คํŠธ์›Œํฌ ์‹œ๊ฐํ™” ๊ธฐ๋ฒ• 30 1.2. ์žฅ์†Œ ๊ธฐ๋ฐ˜ ๋„คํŠธ์›Œํฌ์˜ ์ •๋Ÿ‰์  ํ‰๊ฐ€ 34 2. ์‹œ๊ฐ-์–ธ์–ด ๋ฒˆ์—ญ 38 VI. ๊ฒฐ ๋ก  42 ์ฐธ๊ณ ๋ฌธํ—Œ 43 ์˜๋ฌธ์š”์•ฝ 51Maste

    ๊นŠ์€ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜ ์ผ์ƒ ํ–‰๋™์— ๋Œ€ํ•œ ํ‰์ƒ ํ•™์Šต: ๋“€์–ผ ๋ฉ”๋ชจ๋ฆฌ ์•„ํ‚คํ…์ณ์™€ ์ ์ง„์  ๋ชจ๋ฉ˜ํŠธ ๋งค์นญ

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2018. Advisor: Byoung-Tak Zhang.

    Learning from human behaviors in the real world is imperative for building human-aware intelligent systems. We attempt to train a personalized context recognizer continuously on a wearable device by rapidly adapting deep neural networks to sensor data streams of user behaviors. However, training deep neural networks from a data stream is challenging, because learning new data through neural networks often results in loss of previously acquired information, referred to as catastrophic forgetting. This catastrophic forgetting problem has been studied for nearly three decades but remains unsolved, because the mechanism of deep learning is not yet sufficiently understood. We introduce two methods to deal with catastrophic forgetting in deep neural networks. The first is motivated by complementary learning systems (CLS) theory, which contends that effective learning of a data stream over a lifetime requires complementary systems, corresponding to the neocortex and hippocampus in the human brain. We propose a dual memory architecture (DMA), which trains two learning structures: one gradually acquires structured knowledge representations, while the other rapidly learns the specifics of individual experiences. Online learning is achieved by new techniques, such as weight transfer for new deep modules and hypernetworks for fast adaptation. The second method is the incremental moment matching (IMM) algorithm. IMM incrementally matches the moments of the posterior distributions of neural networks trained on the previous and the current task, respectively. To smooth the search space of the posterior parameters, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, an L2 penalty between the old and the new parameters, and a variant of dropout using the old parameters. To provide insight into the success of these two lifelong learning methods, we introduce two online learning methods for sum-product networks, a kind of deep probabilistic graphical model. We discuss online learning approaches that are valid in probabilistic models and explain how they can be extended to lifelong learning algorithms for deep neural networks. We evaluate the proposed DMA and IMM on two types of datasets: various artificial benchmarks devised for evaluating lifelong learning performance, and a lifelog dataset collected through Google Glass over 46 days. The experimental results show that our methods outperform comparative models in various experimental settings and that our attempts to overcome catastrophic forgetting are valuable and promising.

    Contents:
    1 Introduction
      1.1 Wearable Devices and Lifelog Dataset
      1.2 Lifelong Learning and Catastrophic Forgetting
      1.3 Approach and Contribution
      1.4 Organization of the Dissertation
    2 Related Works
      2.1 Lifelong Learning
      2.2 Application-driven Lifelong Learning
      2.3 Classical Approach for Preventing Catastrophic Forgetting
      2.4 Learning Parameter Distribution for Preventing Catastrophic Forgetting
        2.4.1 Sequential Bayesian
        2.4.2 Approach to Simulating Parameter Distribution
      2.5 Learning Data Distribution for Preventing Catastrophic Forgetting
    3 Preliminary Study: Online Learning of Sum-Product Networks
      3.1 Introduction
      3.2 Sum-Product Networks
        3.2.1 Representation of Sum-Product Networks
        3.2.2 Structure Learning of Sum-Product Networks
      3.3 Online Incremental Structure Learning of Sum-Product Networks
        3.3.1 Methods
        3.3.2 Experiments
      3.4 Non-Parametric Bayesian Sum-Product Networks
        3.4.1 Model 1: A Prior Distribution for SPN Trees
        3.4.2 Model 2: A Prior Distribution for a Class of dag-SPNs
      3.5 Discussion
        3.5.1 History of Online Learning of Sum-Product Networks
        3.5.2 Toward Lifelong Learning of Deep Neural Networks
      3.6 Summary
    4 Structure Learning for Lifelong Learning: Dual Memory Architecture
      4.1 Introduction
      4.2 Complementary Learning Systems Theory
      4.3 Dual Memory Architectures
      4.4 Online Learning of Multiplicative-Gaussian Hypernetworks
        4.4.1 Multiplicative-Gaussian Hypernetworks
        4.4.2 Evolutionary Structure Learning
        4.4.3 Online Learning on Incremental Features
      4.5 Experiments
        4.5.1 Non-stationary Image Data Stream
        4.5.2 Lifelog Dataset
      4.6 Discussion
        4.6.1 Parameter-Decomposability in Deep Learning
        4.6.2 Online Bayesian Optimization
      4.7 Summary
    5 Sequential Bayesian for Lifelong Learning: Incremental Moment Matching
      5.1 Introduction
      5.2 Incremental Moment Matching
        5.2.1 Mean-based Incremental Moment Matching (mean-IMM)
        5.2.2 Mode-based Incremental Moment Matching (mode-IMM)
      5.3 Transfer Techniques for Incremental Moment Matching
        5.3.1 Weight-Transfer
        5.3.2 L2-transfer
        5.3.3 Drop-transfer
        5.3.4 IMM Procedure
      5.4 Experimental Results
        5.4.1 Disjoint MNIST Experiment
        5.4.2 Shuffled MNIST Experiment
        5.4.3 ImageNet to CUB Dataset
        5.4.4 Lifelog Dataset
      5.5 Discussion
        5.5.1 A Shift of Optimal Hyperparameter via Space Smoothing
        5.5.2 Bayesian Approach on Lifelong Learning
        5.5.3 Balancing the Information of an Old and a New Task
      5.6 Summary
    6 Concluding Remarks
      6.1 Summary of Methods and Contributions
      6.2 Suggestions for Future Research
    Korean Abstract
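    The two IMM variants named in the abstract admit a compact sketch. Following the description above, mean-IMM averages the per-task posterior means of the network weights, and mode-IMM weights that average by a diagonal approximation of each task's Fisher information. The code below is a minimal illustration under those assumptions, operating on flat weight vectors with synthetic values rather than a trained network.

        import numpy as np

        def mean_imm(thetas, alphas):
            # Mean-based IMM: mixing-weighted average of per-task posterior means.
            return sum(a * th for a, th in zip(alphas, thetas))

        def mode_imm(thetas, fishers, alphas, eps=1e-8):
            # Mode-based IMM: Fisher-weighted average, approximating the mode of
            # the mixture of per-task Gaussian posteriors.
            num = sum(a * f * th for a, f, th in zip(alphas, fishers, thetas))
            den = sum(a * f for a, f in zip(alphas, fishers)) + eps
            return num / den

        rng = np.random.default_rng(0)
        theta1, theta2 = rng.normal(size=10), rng.normal(size=10)   # weights after task 1, 2
        fisher1, fisher2 = np.abs(rng.normal(size=10)), np.abs(rng.normal(size=10))
        merged = mode_imm([theta1, theta2], [fisher1, fisher2], alphas=[0.5, 0.5])
        print(merged)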

    ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ํŒจํ„ด ๋ถ„์„์„ ์œ„ํ•œ ์ข…๋‹จ ์‹ฌ์ธต ํ•™์Šต๋ง ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Byoung-Tak Zhang.

    Pattern recognition in time series data has become an important avenue of research in artificial intelligence following the paradigm shift of the fourth industrial revolution. Many related studies have been conducted over the past few years, and research using deep learning techniques is becoming increasingly popular. Owing to the non-stationary, nonlinear, and noisy nature of time series data, it is essential to design an appropriate model to extract their significant features for pattern recognition. This dissertation not only discusses pattern recognition using various hand-crafted feature engineering techniques on physiological time series signals, but also proposes end-to-end deep learning design methodologies that require no feature engineering. In the time domain, time series signals can be divided into those with periodic and those with non-periodic characteristics, and this thesis proposes an end-to-end deep learning design methodology for each of the two types. The first proposed design methodology is Deep ECGNet, a design scheme for an end-to-end deep learning model that exploits the periodic characteristics of electrocardiogram (ECG) signals. The ECG, recorded from the electrophysiologic patterns of the heart muscle during the heartbeat, is a promising candidate biomarker for estimating event-based stress levels. Conventionally, beat-to-beat alternations, i.e., heart rate variability (HRV), derived from the ECG have been used to monitor mental stress status as well as mortality in cardiac patients. These HRV parameters have the disadvantage of requiring a measurement period of at least five minutes. In this thesis, human stress states are estimated with a deep learning model from only 10 seconds of data, without any hand-crafted feature engineering. The design methodology incorporates the periodic characteristics of the ECG signal into the model: the main parameters of the 1D CNNs and RNNs that serve as hidden feature extractors reflect the ECG period, so the features of one signal cycle that correspond to stress states are extracted internally by the end-to-end network. Experimental results show that the proposed method outperforms both existing HRV parameter extraction methods and spectrogram-based methods. The second proposed methodology is an automatic end-to-end deep learning design methodology using Bayesian optimization for non-periodic signals. Electroencephalogram (EEG) signals, elicited from the central nervous system (CNS), reflect genuine emotional states, even at the unconscious level. Because of the low signal-to-noise ratio (SNR) of EEG signals, spectral analysis in the frequency domain has conventionally been applied in EEG studies: EEG signals are filtered into several frequency bands using Fourier or wavelet analysis, and these band features are then fed into a classifier. This thesis proposes an automatic end-to-end deep learning design method that uses optimization techniques instead of this basic feature engineering. Bayesian optimization is a popular technique in machine learning for optimizing model hyperparameters; it is well suited to problems whose objective is an expensive black-box function. Using 1D CNNs and RNNs as the basic deep learning models, we propose a method that performs whole-model hyperparameter and structural optimization with Bayesian optimization. On this basis, the thesis proposes the Deep EEGNet model for discriminating human emotional states from EEG signals. Experimental results show that the proposed method outperforms the conventional approach based on band power features. In conclusion, this thesis proposes several methodologies for time series pattern recognition, ranging from conventional methods based on hand-crafted feature extraction to end-to-end deep learning design methodologies that use only raw time series signals. Experimental results show that the proposed methodologies can be applied effectively to pattern recognition problems on time series data.

    Contents:
    Chapter 1 Introduction
      1.1 Pattern Recognition in Time Series
      1.2 Major Problems in Conventional Approaches
      1.3 The Proposed Approach and its Contribution
      1.4 Thesis Organization
    Chapter 2 Related Works
      2.1 Pattern Recognition in Time Series using Conventional Methods
        2.1.1 Time Domain Features
        2.1.2 Frequency Domain Features
        2.1.3 Signal Processing based on Multi-variate Empirical Mode Decomposition (MEMD)
        2.1.4 Statistical Time Series Model (ARIMA)
      2.2 Fundamental Deep Learning Algorithms
        2.2.1 Convolutional Neural Networks (CNNs)
        2.2.2 Recurrent Neural Networks (RNNs)
      2.3 Hyperparameter and Structural Optimization Techniques
        2.3.1 Grid and Random Search Algorithms
        2.3.2 Bayesian Optimization
        2.3.3 Neural Architecture Search
      2.4 Research Trends related to Time Series Data
        2.4.1 Generative Model of Raw Audio Waveform
    Chapter 3 Preliminary Researches: Pattern Recognition in Time Series using Various Feature Extraction Methods
      3.1 Conventional Methods using Time and Frequency Features: Motor Imagery Brain Response Classification
        3.1.1 Introduction
        3.1.2 Methods
        3.1.3 Ensemble Classification Method (Stacking & AdaBoost)
        3.1.4 Sensitivity Analysis
        3.1.5 Classification Results
      3.2 Statistical Feature Extraction Methods: ARIMA Model Based Feature Extraction Methodology
        3.2.1 Introduction
        3.2.2 ARIMA Model
        3.2.3 Signal Processing
        3.2.4 ARIMA Model Conformance Test
        3.2.5 Experimental Results
        3.2.6 Summary
      3.3 Application on Specific Time Series Data: Human Stress States Recognition using Ultra-Short-Term ECG Spectral Feature
        3.3.1 Introduction
        3.3.2 Experiments
        3.3.3 Classification Methods
        3.3.4 Experimental Results
        3.3.5 Summary
    Chapter 4 Master Framework for Pattern Recognition in Time Series
      4.1 The Concept of the Proposed Framework for Pattern Recognition in Time Series
        4.1.1 Optimal Basic Deep Learning Models for the Proposed Framework
      4.2 Two Categories for Pattern Recognition in Time Series Data
        4.2.1 The Proposed Deep Learning Framework for Periodic Time Series Signals
        4.2.2 The Proposed Deep Learning Framework for Non-periodic Time Series Signals
      4.3 Expanded Models of the Proposed Master Framework for Pattern Recognition in Time Series
    Chapter 5 Deep Learning Model Design Methodology for Periodic Signals using Prior Knowledge: Deep ECGNet
      5.1 Introduction
      5.2 Materials and Methods
        5.2.1 Subjects and Data Acquisition
        5.2.2 Conventional ECG Analysis Methods
        5.2.3 The Initial Setup of the Deep Learning Architecture
        5.2.4 The Deep ECGNet
      5.3 Experimental Results
      5.4 Summary
    Chapter 6 Deep Learning Model Design Methodology for Non-periodic Time Series Signals using Optimization Techniques: Deep EEGNet
      6.1 Introduction
      6.2 Materials and Methods
        6.2.1 Subjects and Data Acquisition
        6.2.2 Conventional EEG Analysis Methods
        6.2.3 Basic Deep Learning Units and Optimization Technique
        6.2.4 Optimization for Deep EEGNet
        6.2.5 Deep EEGNet Architectures using the EEG Channel Grouping Scheme
      6.3 Experimental Results
      6.4 Summary
    Chapter 7 Concluding Remarks
      7.1 Summary of Thesis and Contributions
      7.2 Limitations of the Proposed Methods
      7.3 Suggestions for Future Works
    Bibliography
    Korean Abstract
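    The hyperparameter search described for Deep EEGNet can be sketched as follows. This is an assumed, minimal setup using the scikit-optimize library's gp_minimize with an invented search space and a synthetic stand-in for the training run; the thesis's actual hyperparameter space and objective are not reproduced here.

        import numpy as np
        from skopt import gp_minimize
        from skopt.space import Integer, Real

        # Assumed search space for a small 1D CNN + RNN stack.
        space = [Integer(8, 128, name="conv_filters"),
                 Integer(3, 64, name="kernel_size"),
                 Integer(16, 256, name="rnn_units"),
                 Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate")]

        def train_and_validate(conv_filters, kernel_size, rnn_units, lr):
            # Stand-in for a real training run: returns a synthetic validation
            # error so the sketch runs end to end. Replace with actual training.
            return ((np.log10(lr) + 2.5) ** 2
                    + 0.001 * abs(conv_filters - 64)
                    + 0.005 * abs(kernel_size - 16)
                    + 0.001 * abs(rnn_units - 128))

        def objective(params):
            conv_filters, kernel_size, rnn_units, lr = params
            return train_and_validate(conv_filters, kernel_size, rnn_units, lr)

        result = gp_minimize(objective, space, n_calls=30, random_state=0)
        print(result.x, result.fun)  # best hyperparameters and validation error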

    Mixed Order Hyper-Networks for Function Approximation and Optimisation

    Get PDF
    Many systems have inputs, which can be measured and sometimes controlled, and outputs, which can also be measured and which depend on the inputs. Taking numerous measurements from such systems produces data, which may be used either to model the system with the goal of predicting the output associated with a given input (function approximation, or regression) or to find the input settings required to produce a desired output (optimisation, or search). Approximating or optimising a function is central to the field of computational intelligence. There are many existing methods for performing regression and optimisation based on samples of data, but they all have limitations. Multi-layer perceptrons (MLPs) are universal approximators, but they suffer from the black box problem, which means that their structure and the function they implement are opaque to the user. They also have a propensity to become trapped in local minima or large plateaux in the error function during learning. A regression method with a structure that allows models to be compared, human knowledge to be extracted, optimisation searches to be guided, and model complexity to be controlled is desirable. This thesis presents such a method: a single framework for both regression and optimisation, the mixed order hyper network (MOHN). A MOHN implements a function f: {-1, 1}^n → R to arbitrary precision. The structure of a MOHN makes explicit the ways in which input variables interact to determine the function output, which allows human insight and complexity control that are very difficult to achieve in neural networks with hidden units. The explicit structure representation also allows efficient algorithms for searching for an input pattern that leads to a desired output. A number of learning rules for estimating the weights from a sample of data are presented, along with a heuristic method for choosing which connections to include in a model. Several methods for searching a MOHN for inputs that lead to a desired output are compared. Experiments compare a MOHN to an MLP on regression tasks. The MOHN achieves a level of accuracy comparable to an MLP but suffers less from local minima in the error function and shows less variance across multiple training trials. It is also easier to interpret and to combine into an ensemble. The trade-off between the fit of a model to its training data and to an independent set of test data is shown to be easier to control in a MOHN than in an MLP. A MOHN is also compared to a number of existing optimisation methods, including estimation of distribution algorithms, genetic algorithms, and simulated annealing. The MOHN finds optimal solutions in far fewer function evaluations than these methods on tasks selected from the literature.
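    The MOHN representation lends itself to a short sketch: a function on {-1, 1}^n is written as a weighted sum of products over subsets of the inputs (the "connections"), and the weights can be estimated from samples, here with ordinary least squares. The connection set and data below are invented for illustration; the thesis presents several dedicated learning rules and a heuristic for choosing connections, which this example does not reproduce.

        import numpy as np
        from itertools import combinations

        def design_matrix(X, connections):
            # Each column is the product of one subset of inputs (a Walsh function).
            return np.column_stack([np.prod(X[:, list(c)], axis=1) for c in connections])

        n = 4
        rng = np.random.default_rng(0)
        X = rng.choice([-1, 1], size=(200, n))                    # inputs in {-1, 1}^n
        y = 0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] * X[:, 2]         # target with a 2nd-order term

        # Mixed-order connection set: bias plus all first- and second-order terms.
        connections = [()] + [c for k in (1, 2) for c in combinations(range(n), k)]
        Phi = design_matrix(X, connections)
        weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)

        for c, w in zip(connections, weights):
            if abs(w) > 0.1:
                print(c, round(w, 2))   # recovers (): 0.5, (0,): 2.0, (1, 2): -1.5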