    Deep Hypernetworks for Learning Dynamic Multimodal Data

    Doctoral thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015. Advisor: Byoung-Tak Zhang. Recent advances in information and communication technology have led to an explosive increase in data. Unlike traditional data, which are structured and unimodal, recent data generated in dynamic environments are characterized by high dimensionality, multimodality, lack of structure, and huge scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many reported successes, existing machine learning methods have mainly focused on practical problems posed by large-scale but static databases, such as image classification, tagging, and retrieval. Hypernetworks are probabilistic graphical models that represent an empirical distribution with a hypergraph structure: a large collection of hyperedges, each encoding an association among variables. This representation makes the model well suited to characterizing complex relationships between features with a population of building blocks. However, because a hypernetwork is defined over a huge combinatorial feature space, it requires a large number of hyperedges to handle large-scale multimodal data and thus faces a scalability problem. In this dissertation, we propose deep hypernetworks, a deep architecture of hypernetworks that addresses this scalability issue when learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks handle these issues through abstraction at multiple levels, using a hierarchy of multiple hypergraphs. We use graph Monte-Carlo (graph MC), a stochastic method based on Monte-Carlo simulation, to efficiently construct hypergraphs that represent the empirical distribution of the observed data.
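As a rough illustration of the (flat) hypernetwork representation described above, the sketch below builds a population of hyperedges by randomly sampling small subsets of (variable, value) pairs from labelled binary instances, then classifies a new instance by letting every fully matching hyperedge vote. This is a minimal sketch under stated assumptions: uniform random sampling, the hyperedge order `k`, and the function names `sample_hyperedges` and `classify` are hypothetical, and the code implements neither the graph MC procedure nor the deep, hierarchical architecture of the dissertation.

```python
import random
from collections import Counter


def sample_hyperedges(instances, labels, k=3, per_instance=10, seed=0):
    """Build a population of order-k hyperedges from labelled binary instances.

    Each hyperedge is a frozenset of (variable_index, value) pairs copied from
    one training instance and tagged with that instance's label. Uniform random
    sampling is an illustrative assumption, not graph MC.
    """
    rng = random.Random(seed)
    population = Counter()
    for x, y in zip(instances, labels):
        for _ in range(per_instance):
            idx = rng.sample(range(len(x)), k)
            edge = frozenset((i, x[i]) for i in idx)
            population[(edge, y)] += 1  # duplicate hyperedges act as weights
    return population


def classify(population, x):
    """Vote with every hyperedge whose (variable, value) pairs all match x."""
    votes = Counter()
    for (edge, y), weight in population.items():
        if all(x[i] == v for i, v in edge):
            votes[y] += weight
    return votes.most_common(1)[0][0] if votes else None


# Toy data: class "a" is active in the first half, class "b" in the second half.
X = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
Y = ["a", "a", "b", "b"]
hn = sample_hyperedges(X, Y, k=2, per_instance=20)
print(classify(hn, [1, 1, 1, 0, 0, 0]))  # prints: a
```

Because each hyperedge only constrains `k` variables, the population grows combinatorially with the number of variables and the order, which is the scalability problem the abstract attributes to flat hypernetworks.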
The structure of a deep hypernetwork changes continuously as learning proceeds, a flexibility that contrasts with other deep learning models. The proposed model learns incrementally from the data and thus handles non-stationary properties such as concept drift. The abstract representations in the learned models serve as multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation and therefore formulate vision-language translation in terms of statistical machine translation. Because knowledge of the video stories is used for translation, we call this story-aware vision-language translation. We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmark datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data content, as well as the associations between vision and language. We explain how introducing a hierarchy addresses scalability and non-stationarity. In addition, we present story-aware vision-language translation on cartoon videos, generating scene images from sentences and descriptive subtitles from scene images.
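The abstract casts vision-language translation as statistical machine translation. For orientation, the textbook noisy-channel decomposition of statistical machine translation, written here for the vision-to-language direction, picks the sentence s that is most probable given a visual input v. The symbols v and s and this particular factorization are our illustrative assumptions; the dissertation's own formulation may differ in detail.

```latex
\hat{s} = \arg\max_{s} P(s \mid v)
        = \arg\max_{s} \frac{P(v \mid s)\, P(s)}{P(v)}
        = \arg\max_{s} P(v \mid s)\, P(s)
```

Here P(v | s) plays the role of the translation model and P(s) the language model; P(v) is dropped because it does not depend on s. For the language-to-vision direction (generating scenes from sentences), the roles of v and s are swapped.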
Furthermore, we discuss the implications of our model for lifelong learning and directions for improvement toward human-level artificial intelligence.
1 Introduction
  1.1 Background and Motivation; 1.2 Problems to be Addressed; 1.3 The Proposed Approach and its Contribution; 1.4 Organization of the Dissertation
2 Related Work
  2.1 Multimodal Learning; 2.2 Models for Learning from Multimodal Data; 2.2.1 Topic Model-Based Multimodal Learning; 2.2.2 Deep Network-Based Multimodal Learning; 2.3 Higher-Order Graphical Models; 2.3.1 Hypernetwork Models; 2.3.2 Bayesian Evolutionary Learning of Hypernetworks
3 Multimodal Hypernetworks for Text-to-Image Retrieval
  3.1 Overview; 3.2 Hypernetworks for Multimodal Associations; 3.2.1 Multimodal Hypernetworks; 3.2.2 Incremental Learning of Multimodal Hypernetworks; 3.3 Text-to-Image Crossmodal Inference; 3.3.1 Representation of Textual-Visual Data; 3.3.2 Text-to-Image Query Expansion; 3.4 Text-to-Image Retrieval via Multimodal Hypernetworks; 3.4.1 Data and Experimental Settings; 3.4.2 Text-to-Image Retrieval Performance; 3.4.3 Incremental Learning for Text-to-Image Retrieval; 3.5 Summary
4 Deep Hypernetworks for Multimodal Concept Learning from Cartoon Videos
  4.1 Overview; 4.2 Visual-Linguistic Concept Representation of Cartoon Videos; 4.3 Deep Hypernetworks for Modeling Visual-Linguistic Concepts; 4.3.1 Sparse Population Coding; 4.3.2 Deep Hypernetworks for Concept Hierarchies; 4.3.3 Implication of Deep Hypernetworks on Cognitive Modeling; 4.4 Learning of Deep Hypernetworks; 4.4.1 Problem Space of Deep Hypernetworks; 4.4.2 Graph Monte-Carlo Simulation; 4.4.3 Learning of Concept Layers; 4.4.4 Incremental Concept Construction; 4.5 Incremental Concept Construction from Cartoon Videos; 4.5.1 Data Description and Parameter Setup; 4.5.2 Concept Representation and Development; 4.5.3 Character Classification via Concept Learning; 4.5.4 Vision-Language Conversion via Concept Learning; 4.6 Summary
5 Story-aware Vision-Language Translation using Deep Concept Hierarchies
  5.1 Overview; 5.2 Vision-Language Conversion as a Machine Translation; 5.2.1 Statistical Machine Translation; 5.2.2 Vision-Language Translation; 5.3 Story-aware Vision-Language Translation using Deep Concept Hierarchies; 5.3.1 Story-aware Vision-Language Translation; 5.3.2 Vision-to-Language Translation; 5.3.3 Language-to-Vision Translation; 5.4 Story-aware Vision-Language Translation on Cartoon Videos; 5.4.1 Data and Experimental Setting; 5.4.2 Scene-to-Sentence Generation; 5.4.3 Sentence-to-Scene Generation; 5.4.4 Visual-Linguistic Story Summarization of Cartoon Videos; 5.5 Summary
6 Concluding Remarks
  6.1 Summary of the Dissertation; 6.2 Directions for Further Research
Bibliography
Abstract (in Korean)
Docto

    A Survey on Few-Shot Class-Incremental Learning

    Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks: they must learn new tasks from just a few labeled samples without forgetting previously learned ones. This setup can easily lead to catastrophic forgetting and overfitting, severely affecting model performance. Studying FSCIL helps overcome the limitations of deep learning models with respect to data volume and acquisition time, while improving the practicality and adaptability of machine learning models. This paper provides a comprehensive survey of FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, introducing FSCIL from two perspectives while reviewing over 30 theoretical research studies and more than 20 applied research studies. From the theoretical perspective, we provide a novel categorization that divides the field into five subcategories: traditional machine learning methods, meta-learning-based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on FSCIL benchmark datasets. From the application perspective, FSCIL has achieved impressive results in various fields of computer vision, such as image classification, object detection, and image segmentation, as well as in natural language processing and graph learning. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from methodological, performance, and application perspectives.
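To make the "feature and feature space-based" category concrete, here is a minimal sketch of one common style of FSCIL baseline, assumed here for illustration rather than taken from the survey: a frozen feature extractor (replaced below by hand-written 2-D feature vectors) plus a nearest-class-mean classifier whose prototypes are per-class average features. A new class is added from a handful of shots by computing its prototype, leaving old prototypes untouched, which is one simple way to avoid overwriting earlier knowledge. The class name `PrototypeClassifier` and the toy features are hypothetical.

```python
import math


class PrototypeClassifier:
    """Nearest-class-mean classifier over fixed (frozen) feature vectors."""

    def __init__(self):
        self.prototypes = {}  # class label -> mean feature vector

    def add_class(self, label, features):
        """Add a (possibly few-shot) class; existing prototypes are not modified."""
        dim = len(features[0])
        mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
        self.prototypes[label] = mean

    def predict(self, feature):
        """Return the label whose prototype is closest in Euclidean distance."""
        return min(self.prototypes, key=lambda c: math.dist(feature, self.prototypes[c]))


clf = PrototypeClassifier()
# Base session: classes with more (toy) examples.
clf.add_class("cat", [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]])
clf.add_class("dog", [[0.0, 1.0], [0.1, 0.9]])
# Incremental session: a new class learned from only two shots.
clf.add_class("fox", [[-1.0, -1.0], [-0.8, -1.2]])
print(clf.predict([-0.9, -1.0]))  # prints: fox
```

The design choice worth noting is that nothing learned in the base session is updated later, so forgetting is avoided by construction; the trade-off is that the feature extractor cannot adapt to the new classes, which is where the other method families in the survey come in.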

    Lifelong Learning of Everyday Behaviors with Deep Neural Networks: Dual Memory Architecture and Incremental Moment Matching

    Doctoral thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2018. Advisor: Byoung-Tak Zhang. Learning from human behaviors in the real world is imperative for building human-aware intelligent systems. We attempt to continuously train a personalized context recognizer on a wearable device by rapidly adapting deep neural networks to sensor data streams of user behaviors. However, training deep neural networks on a data stream is challenging because learning new data often causes the network to lose previously acquired information, a problem referred to as catastrophic forgetting. Catastrophic forgetting has been studied for nearly three decades but has not yet been solved, because the mechanism of deep learning is not yet sufficiently understood. We introduce two methods to deal with catastrophic forgetting in deep neural networks. The first method is motivated by complementary learning systems (CLS) theory, which contends that effective lifelong learning from a data stream requires complementary systems, namely the neocortex and the hippocampus in the human brain. We propose a dual memory architecture (DMA), which trains two learning structures: one gradually acquires structured knowledge representations, and the other rapidly learns the specifics of individual experiences. Online learning is achieved through new techniques, such as weight transfer for the new deep module and hypernetworks for fast adaptation. The second method is the incremental moment matching (IMM) algorithm. IMM incrementally matches the moments of the posterior distributions of the neural networks trained on the previous and the current task, respectively. To smooth the search space of the posterior parameters, the IMM procedure is complemented by various transfer learning techniques, including weight transfer, an L2 penalty between the old and the new parameters, and a variant of dropout that uses the old parameters.
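The moment-matching step can be sketched as follows, assuming (as in the published IMM formulation) that each task's posterior is approximated by a Gaussian with diagonal covariance whose inverse variance is estimated by the diagonal of the Fisher information. Mean-IMM takes a weighted average of the per-task parameter vectors with mixing ratios `alphas`; mode-IMM weights each parameter by its Fisher information, so a parameter that matters more to one task stays close to that task's value. Parameters are flattened into plain lists; the function names and the small `eps` stabilizer are illustrative assumptions, and the transfer techniques (weight transfer, L2-transfer, drop-transfer) are not shown.

```python
def mean_imm(mus, alphas):
    """Mean-IMM: element-wise weighted average of per-task parameter vectors."""
    assert abs(sum(alphas) - 1.0) < 1e-9, "mixing ratios should sum to 1"
    return [sum(a * mu[i] for a, mu in zip(alphas, mus)) for i in range(len(mus[0]))]


def mode_imm(mus, fishers, alphas, eps=1e-8):
    """Mode-IMM with diagonal Gaussians: precision-weighted average of the means.

    fishers[k][i] approximates the inverse variance of parameter i after task k.
    """
    merged = []
    for i in range(len(mus[0])):
        precision = sum(a * f[i] for a, f in zip(alphas, fishers)) + eps
        weighted = sum(a * f[i] * mu[i] for a, f, mu in zip(alphas, fishers, mus))
        merged.append(weighted / precision)
    return merged


# Two tasks, two parameters: task 1 relies on w0, task 2 relies on w1.
mu_old, mu_new = [1.0, 0.0], [0.0, 1.0]
fisher_old, fisher_new = [10.0, 0.1], [0.1, 10.0]
print(mean_imm([mu_old, mu_new], [0.5, 0.5]))  # prints: [0.5, 0.5]
print(mode_imm([mu_old, mu_new], [fisher_old, fisher_new], [0.5, 0.5]))  # both entries near 0.99
```

With equal mixing ratios, mean-IMM lands halfway between the two solutions, whereas mode-IMM keeps each parameter near the value preferred by the task with the larger Fisher information for it.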
To provide insight into the success of the two proposed lifelong learning methods, we introduce two online learning methods for sum-product networks, a kind of deep probabilistic graphical model. We discuss online learning approaches that are valid for probabilistic models and explain how they can be extended to lifelong learning algorithms for deep neural networks. We evaluate the proposed DMA and IMM on two types of datasets: various artificial benchmarks devised to evaluate lifelong learning performance, and a lifelog dataset collected with Google Glass over 46 days. The experimental results show that our methods outperform comparative models in various experimental settings and that our attempts to overcome catastrophic forgetting are valuable and promising.
1 Introduction
  1.1 Wearable Devices and Lifelog Dataset; 1.2 Lifelong Learning and Catastrophic Forgetting; 1.3 Approach and Contribution; 1.4 Organization of the Dissertation
2 Related Works
  2.1 Lifelong Learning; 2.2 Application-driven Lifelong Learning; 2.3 Classical Approach for Preventing Catastrophic Forgetting; 2.4 Learning Parameter Distribution for Preventing Catastrophic Forgetting; 2.4.1 Sequential Bayesian; 2.4.2 Approach to Simulating Parameter Distribution; 2.5 Learning Data Distribution for Preventing Catastrophic Forgetting
3 Preliminary Study: Online Learning of Sum-Product Networks
  3.1 Introduction; 3.2 Sum-Product Networks; 3.2.1 Representation of Sum-Product Networks; 3.2.2 Structure Learning of Sum-Product Networks; 3.3 Online Incremental Structure Learning of Sum-Product Networks; 3.3.1 Methods; 3.3.2 Experiments; 3.4 Non-Parametric Bayesian Sum-Product Networks; 3.4.1 Model 1: A Prior Distribution for SPN Trees; 3.4.2 Model 2: A Prior Distribution for a Class of dag-SPNs; 3.5 Discussion; 3.5.1 History of Online Learning of Sum-Product Networks; 3.5.2 Toward Lifelong Learning of Deep Neural Networks; 3.6 Summary
4 Structure Learning for Lifelong Learning: Dual Memory Architecture
  4.1 Introduction; 4.2 Complementary Learning Systems Theory; 4.3 Dual Memory Architectures; 4.4 Online Learning of Multiplicative-Gaussian Hypernetworks; 4.4.1 Multiplicative-Gaussian Hypernetworks; 4.4.2 Evolutionary Structure Learning; 4.4.3 Online Learning on Incremental Features; 4.5 Experiments; 4.5.1 Non-stationary Image Data Stream; 4.5.2 Lifelog Dataset; 4.6 Discussion; 4.6.1 Parameter-Decomposability in Deep Learning; 4.6.2 Online Bayesian Optimization; 4.7 Summary
5 Sequential Bayesian for Lifelong Learning: Incremental Moment Matching
  5.1 Introduction; 5.2 Incremental Moment Matching; 5.2.1 Mean-based Incremental Moment Matching (mean-IMM); 5.2.2 Mode-based Incremental Moment Matching (mode-IMM); 5.3 Transfer Techniques for Incremental Moment Matching; 5.3.1 Weight-Transfer; 5.3.2 L2-transfer; 5.3.3 Drop-transfer; 5.3.4 IMM Procedure; 5.4 Experimental Results; 5.4.1 Disjoint MNIST Experiment; 5.4.2 Shuffled MNIST Experiment; 5.4.3 ImageNet to CUB Dataset; 5.4.4 Lifelog Dataset; 5.5 Discussion; 5.5.1 A Shift of Optimal Hyperparameter via Space Smoothing; 5.5.2 Bayesian Approach on Lifelong Learning; 5.5.3 Balancing the Information of an Old and a New Task; 5.6 Summary
6 Concluding Remarks
  6.1 Summary of Methods and Contributions; 6.2 Suggestions for Future Research
Abstract (in Korean)
Docto
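For readers unfamiliar with the sum-product networks used in this thesis's preliminary study, the sketch below evaluates a tiny, hand-built SPN over two binary variables. Leaves are indicator functions, product nodes multiply children with disjoint scopes, and sum nodes take weighted mixtures of children with the same scope; setting both indicators of a variable to 1 marginalizes it out, so a marginal costs a single bottom-up pass. The structure, the weights, and the tuple-based node encoding are made up for illustration; the thesis itself concerns online structure learning, which this sketch does not attempt.

```python
def evaluate(node, evidence):
    """Bottom-up evaluation of an SPN node.

    evidence maps a variable name to 0, 1, or None (None = marginalize out).
    Nodes are tuples: ("leaf", var, value), ("prod", children),
    or ("sum", [(weight, child), ...]).
    """
    kind = node[0]
    if kind == "leaf":
        _, var, value = node
        observed = evidence.get(var)
        return 1.0 if observed is None or observed == value else 0.0
    if kind == "prod":
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, evidence)
        return result
    if kind == "sum":
        return sum(w * evaluate(child, evidence) for w, child in node[1])
    raise ValueError(f"unknown node type: {kind}")


def bernoulli(var, p):
    """Sum node over the two indicator leaves of a binary variable."""
    return ("sum", [(p, ("leaf", var, 1)), (1.0 - p, ("leaf", var, 0))])


# A mixture of two product distributions over binary variables A and B.
spn = ("sum", [
    (0.6, ("prod", [bernoulli("A", 0.9), bernoulli("B", 0.2)])),
    (0.4, ("prod", [bernoulli("A", 0.1), bernoulli("B", 0.7)])),
])

p_joint = evaluate(spn, {"A": 1, "B": 1})        # P(A=1, B=1) = 0.6*0.9*0.2 + 0.4*0.1*0.7
p_marginal = evaluate(spn, {"A": 1, "B": None})  # P(A=1), with B summed out
print(p_joint, p_marginal)
```

Because every sum node here mixes children over the same variables and every product node combines disjoint variables, the network defines a normalized distribution, and the same pass answers both joint and marginal queries.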

    Learning Narrative Text Generators from Visual Stories via Latent Embedding

    Doctoral thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: Byoung-Tak Zhang. The ability to understand stories is essential to what makes humans unique among primates and other animals. Story understanding is also crucial for AI agents to live with people in everyday life and understand their context. However, most research on story AI focuses on automated story generation in manually designed closed worlds, an approach widely used for computational authoring. Machine learning techniques applied to story corpora face problems similar to those of natural language processing, such as omitted details and missing commonsense knowledge. Given the remarkable success of deep learning in computer vision and the growing interest in bridging vision and language, vision-grounded story data could improve story understanding and narrative text generation. Suppose AI agents are situated in an environment where sensory information is input through a camera. Such agents observe their surroundings, translate them into a story in natural language, and predict the following event, or several events in sequence. This dissertation studies the related problems of learning stories and generating narrative text from image streams or videos. The first problem is to generate narrative text from a sequence of ordered images. As a solution, we introduce GLAC Net (Global-local Attention Cascading Network). It translates image sequences into narrative paragraphs using an encoder-decoder framework in a sequence-to-sequence setting, with convolutional neural networks that extract information from images and recurrent neural networks that generate text. We introduce visual cue encoders with stacked bidirectional LSTMs, in which the outputs of all layers are aggregated into contextualized image vectors to extract visual clues.
The coherence of the generated text is further improved by conveying (cascading) the information of the previous sentence to the next sentence serially in the decoders. We evaluate GLAC Net on the Visual Storytelling (VIST) dataset. It outperforms other state-of-the-art methods and achieved the best scores, both overall and in all six aspects, in the Visual Storytelling Challenge, which was evaluated by human judges. The second problem is to predict the following events or narrative text given the earlier parts of a story. Prediction should be possible at any step, for stories of arbitrary length. We propose recurrent event retrieval models as a solution. They train a context accumulation function and two embedding functions that bring the cumulative context at the current time close to the next probable events in a latent space. The models update the cumulative context with each new input event using bilinear operations, and the updated cumulative context is then used to find the next event candidates. We evaluate them on the Story Cloze Test, where they show competitive performance and perform best in the open-ended generation setting. We also demonstrate working examples in an interactive setting. The third problem concerns composite representation learning of semantics and order for video stories. We embed each episode as a trajectory-like sequence of events in the latent space and propose ViStoryNet, which regenerates video stories from these embeddings (the task of story completion). We convert event sentences into thought vectors and train functions that embed successive events close to each other, so that episodes form trajectories. Bidirectional LSTMs are trained as sequence models, and GRU-based decoders generate event sentences. We test these models experimentally on the PororoQA dataset and observe that most episodes take the form of trajectories. When used to complete the blocked parts of stories, they produce results that are not perfect but broadly similar to the originals.
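The context-accumulation idea behind recurrent event retrieval models can be sketched with a bilinear update: the new context is a nonlinearity applied to a bilinear form of the previous context and the embedding of the incoming event, and candidate next events are ranked by their similarity to the updated context in the shared latent space. Everything concrete here is a hypothetical stand-in: random matrices replace the trained accumulation and embedding functions (`W`, `E_in`, `E_cand`), the dimensions are arbitrary, `tanh` and dot-product scoring are our choices, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, FEAT = 8, 16  # latent size and raw event-feature size (arbitrary)

# Hypothetical stand-ins for trained parameters.
W = rng.normal(scale=0.3, size=(DIM, DIM, DIM))  # bilinear context-update tensor
E_in = rng.normal(size=(DIM, FEAT))              # embeds an observed event
E_cand = rng.normal(size=(DIM, FEAT))            # embeds a candidate next event


def update_context(context, event_features):
    """Bilinear update: c_new[k] = tanh(sum_ij c[i] * W[i, j, k] * e[j])."""
    e = E_in @ event_features
    return np.tanh(np.einsum("i,ijk,j->k", context, W, e))


def rank_candidates(context, candidates):
    """Score candidates by dot product with the context; best first."""
    scores = np.array([context @ (E_cand @ c) for c in candidates])
    return np.argsort(-scores), scores


story = [rng.normal(size=FEAT) for _ in range(3)]       # toy features of observed events
candidates = [rng.normal(size=FEAT) for _ in range(5)]  # toy candidate next events

context = np.ones(DIM) / np.sqrt(DIM)  # stand-in for a learned initial context
for event in story:                     # one update per event, for any story length
    context = update_context(context, event)
order, scores = rank_candidates(context, candidates)
print("best candidate index:", order[0])
```

In a trained model the three functions would be fitted so that the true next event scores highest; because the update consumes one event at a time, a prediction can be made after any number of observed events, which matches the requirement that prediction work at any step.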
These results can be applied to AI agents that sense their living areas with cameras, explain the situation as stories, infer unobserved parts, and predict how the story will continue.
The ability to understand stories is an important ability that distinguishes humans not only from animals but also from other primates. For AI to live with people in everyday life and understand the context of their lives, the ability to understand stories is very important. However, because of the difficulty of language processing, existing research on stories has mainly studied techniques for generating high-quality works under predefined world models. Attempts to handle stories with machine learning have largely had to rely on data expressed in natural language and therefore suffer from the same problems as natural language processing. Data linked with visual information can help overcome this. Thanks to recent remarkable advances in deep learning, research on the relationship between vision and language is growing. As a research vision, consider an AI agent placed in an environment where it receives information about its surroundings through a camera. In this environment, the agent observes its surroundings, generates a story about them in natural language, and, based on the generated story, predicts what will happen next, from one step to several steps ahead. This dissertation covers methods for learning the stories that appear in photos and videos (visual stories), converting them into narrative text, and inferring hidden events and the next events. First, we address the problem of generating story text from a given set of photos (visual storytelling).
To solve this problem, we propose GLAC Net. It uses convolutional neural networks to extract information from the photos and recurrent neural networks to generate sentences. For the sequence-to-sequence encoder, we deploy multi-layer bidirectional recurrent neural networks to represent the overall story structure, and we propose a global-local attention model so that the information of each individual photo is also used. In addition, we propose a mechanism that passes on information from preceding sentences so that contextual and local information is not lost while multiple sentences are generated. We trained the proposed method on the VIST dataset, and it received the highest scores, both overall and in each of the six categories, based on human evaluation at the first Visual Storytelling Challenge. Second, we address the problem of predicting the next sentence when part of a story is given as sentences. Prediction must be possible at any position in a story of arbitrary length and must work regardless of the number of steps to be predicted. For this, we propose Recurrent Event Retrieval Models. This method learns a context accumulation function and two embedding functions so that, in a latent space, the context accumulated so far lies close to the likely next events. When a new event is added to a story, the existing context is updated through a bilinear operation to find the likely next events.
We trained this method on the ROCStories dataset and evaluated it with the Story Cloze Test, where it showed competitive performance and, in particular, the best performance among methods that can infer stories of arbitrary length. Third, we address the problem of recovering part of an event sequence that has been blocked in a video story. In particular, we aimed to reflect the semantic information and the order of each event in the model's representation learning. To this end, we propose ViStoryNet, a model that embeds each episode as a trajectory in a latent space and, based on this, regenerates the story to complete it. To give each episode a trajectory shape, we converted event sentences into thought vectors and trained with successive event order embedding so that preceding and following events are embedded close to each other. We verified the results experimentally on the PororoQA dataset. The embedded episodes clearly took the form of trajectories, and regenerating the episodes produced results that were broadly similar to the originals. These results correspond to methods for understanding stories from surrounding information captured by a camera, inferring unobserved parts, and predicting future stories.
Abstract
Chapter 1 Introduction
  1.1 Story of Everyday Lives in Videos and Story Understanding; 1.2 Problems to be Addressed; 1.3 Approach and Contribution; 1.4 Organization of Dissertation
Chapter 2 Background and Related Work
  2.1 Why We Study Stories; 2.2 Latent Embedding; 2.3 Order Embedding and Ordinal Embedding; 2.4 Comparison to Story Understanding; 2.5 Story Generation; 2.5.1 Abstract Event Representations; 2.5.2 Seq-to-seq Attentional Models; 2.5.3 Story Generation from Images
Chapter 3 Visual Storytelling via Global-local Attention Cascading Networks
  3.1 Introduction; 3.2 Evaluation for Visual Storytelling; 3.3 Global-local Attention Cascading Networks (GLAC Net); 3.3.1 Encoder: Contextualized Image Vector Extractor; 3.3.2 Decoder: Story Generator with Attention and Cascading Mechanism; 3.4 Experimental Results; 3.4.1 VIST Dataset; 3.4.2 Experiment Settings; 3.4.3 Network Training Details; 3.4.4 Qualitative Analysis; 3.4.5 Quantitative Analysis; 3.5 Summary
Chapter 4 Common Space Learning on Cumulative Contexts and the Next Events: Recurrent Event Retrieval Models
  4.1 Overview; 4.2 Problems of Context Accumulation; 4.3 Recurrent Event Retrieval Models for Next Event Prediction; 4.4 Experimental Results; 4.4.1 Preliminaries; 4.4.2 Story Cloze Test; 4.4.3 Open-ended Story Generation; 4.5 Summary
Chapter 5 ViStoryNet: Order Embedding of Successive Events and the Networks for Story Regeneration
  5.1 Introduction; 5.2 Order Embedding with Triple Learning; 5.2.1 Embedding Ordered Objects in Sequences; 5.3 Problems and Contextual Events; 5.3.1 Problem Definition; 5.3.2 Contextual Event Vectors from Kids Videos; 5.4 Architectures for the Story Regeneration Task; 5.4.1 Two Sentence Generators as Decoders; 5.4.2 Successive Event Order Embedding (SEOE); 5.4.3 Sequence Models of the Event Space; 5.5 Experimental Results; 5.5.1 Experimental Setup; 5.5.2 Quantitative Analysis; 5.5.3 Qualitative Analysis; 5.6 Summary
Chapter 6 Concluding Remarks
  6.1 Summary of Methods and Contributions; 6.2 Limitation and Outlook; 6.3 Suggestions for Future Research
Abstract (in Korean)
Docto

    Towards Deep Learning with Competing Generalisation Objectives

    The unreasonable effectiveness of deep learning continues to deliver unprecedented artificial intelligence capabilities to billions of people. Growing datasets and technological advances keep extending the reach of expressive model architectures trained through efficient optimisation. Thus, deep learning approaches continue to provide increasingly proficient subroutines for computer vision and for natural interaction through speech and text, among other areas. Thanks to scalable learning and inference priors, these models often gain higher performance cost-effectively through largely automatic training. As a result, new and improved capabilities empower more people while the costs of access drop. The arising opportunities and challenges have profoundly influenced research. Quality attributes of scalable software, including reusability, efficiency, robustness and safety, became central desiderata of deep learning paradigms. Ongoing research into continual, meta- and robust learning aims to maximise such scalability metrics in addition to multiple generalisation criteria, despite possible conflicts between them. A significant challenge is to satisfy competing criteria automatically and cost-effectively. In this thesis, we introduce a unifying perspective on learning with competing generalisation objectives and make three additional contributions. When autonomous learning through multi-criteria optimisation is impractical, it is reasonable to ask whether knowledge of appropriate trade-offs could make learning simultaneously effective and efficient. Informed by explicit trade-offs of interest to particular applications, we developed and evaluated bespoke model architecture priors. We introduced a novel architecture for sim-to-real transfer of robotic control policies that learns progressively to generalise anew. Competing desiderata of continual learning were balanced through disjoint capacity and hierarchical reuse of previously learnt representations.
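"Disjoint capacity and hierarchical reuse of previously learnt representations" matches the general pattern of progressive networks: a column trained on an earlier task is frozen, and a new column for the new task receives lateral connections from the frozen column's hidden layers. The sketch below shows only that forward-pass pattern, assuming NumPy; the layer sizes, the ReLU-everywhere choice, and the names `make_column` and `forward` are hypothetical, training is omitted, and the thesis's actual sim-to-real architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def make_column(sizes, with_lateral=False):
    """Weights for one column; optional lateral adapters U from a frozen column."""
    column = {"W": [rng.normal(scale=0.5, size=(o, i)) for i, o in zip(sizes[:-1], sizes[1:])]}
    if with_lateral:
        # U[l] feeds the frozen column's hidden layer l into this column's layer l + 1.
        column["U"] = [rng.normal(scale=0.5, size=(o, i)) for i, o in zip(sizes[1:-1], sizes[2:])]
    return column


def forward(column, x, frozen_acts=None):
    """Return the activations of every layer (ReLU on all layers, for brevity)."""
    acts, h = [], x
    for l, W in enumerate(column["W"]):
        z = W @ h
        if frozen_acts is not None and l >= 1:
            z = z + column["U"][l - 1] @ frozen_acts[l - 1]  # lateral reuse of old features
        h = relu(z)
        acts.append(h)
    return acts


sizes = [4, 6, 6, 2]
old_column = make_column(sizes)                     # trained on the first task, then frozen
new_column = make_column(sizes, with_lateral=True)  # fresh capacity for the new task
x = rng.normal(size=4)

frozen = forward(old_column, x)
new_out = forward(new_column, x, frozen_acts=frozen)[-1]
# Only new_column's W and U would be trained, so old_column's outputs cannot drift.
print(new_out.shape)  # prints: (2,)
```

The design trade-off is explicit: old-task performance is preserved by construction, at the cost of parameters that grow with every new column.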
A new state-of-the-art meta-learning approach is then proposed. We showed that meta-trained hypernetworks efficiently store and flexibly reuse knowledge for new generalisation criteria through few-shot gradient-based optimisation. Finally, we characterised empirical trade-offs between the many desiderata of adversarial robustness and demonstrated a novel defensive capability of implicit neural networks to hinder many attacks simultaneously
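To make the idea of reusing knowledge through a hypernetwork concrete, here is a minimal sketch under loudly stated assumptions; it is not the architecture from the thesis. It assumes a linear hypernetwork that maps a small task embedding z to the weights of a linear regressor, w = A z, where the matrix A stands in for meta-trained knowledge and is held fixed. Few-shot adaptation then runs gradient descent on z alone, using the chain rule dL/dz = Aᵀ dL/dw. The names `generate_weights`, `support_loss` and `adapt_embedding`, the toy data and the step size are all hypothetical.

```python
# Hypothetical sketch: a linear hypernetwork w = A @ z generates the weights
# of a linear regressor; only the task embedding z is adapted on a few-shot
# support set, while A (standing in for meta-trained knowledge) stays fixed.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def generate_weights(A: Matrix, z: Vector) -> Vector:
    """Hypernetwork forward pass: w = A z."""
    return [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row in A]


def support_loss(w: Vector, xs: List[Vector], ys: Vector) -> float:
    """Mean squared error of the generated linear model on the support set."""
    errs = [sum(wi * xi for wi, xi in zip(w, x)) - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errs) / len(errs)


def adapt_embedding(A: Matrix, z: Vector, xs: List[Vector], ys: Vector,
                    lr: float, steps: int) -> Vector:
    """Few-shot adaptation: gradient descent on z only, with A held fixed."""
    z = list(z)
    n = len(xs)
    for _ in range(steps):
        w = generate_weights(A, z)
        errs = [sum(wi * xi for wi, xi in zip(w, x)) - y for x, y in zip(xs, ys)]
        # dL/dw = (2/n) * sum_i err_i * x_i
        grad_w = [2.0 / n * sum(e * x[d] for e, x in zip(errs, xs))
                  for d in range(len(w))]
        # Chain rule through w = A z: dL/dz = A^T dL/dw
        grad_z = [sum(A[d][j] * grad_w[d] for d in range(len(w)))
                  for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad_z)]
    return z


# Toy usage: 3 regressor weights generated from a 2-dim embedding;
# the support labels are produced by the weights A @ [1, -1] = [1, -1, 0].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
ys = [1.0, -1.0, 0.0, 0.0]
z0 = [0.0, 0.0]
z_adapted = adapt_embedding(A, z0, xs, ys, lr=0.3, steps=50)
before = support_loss(generate_weights(A, z0), xs, ys)
after = support_loss(generate_weights(A, z_adapted), xs, ys)
```

Adapting only the low-dimensional embedding keeps the number of parameters touched per task small, which is one plausible reason a hypernetwork can adapt from few examples; a real meta-learner would also train A across many tasks, which this sketch omits.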

    Dynamic Mathematics for Automated Machine Learning Techniques

    Machine Learning and Neural Networks have been gaining popularity and are widely considered the driving force of the Fourth Industrial Revolution. Yet their core techniques are not new: backpropagation training was firmly established in 1986, and computer vision was revolutionised in 2012 with the introduction of AlexNet. Given all these accomplishments, why are neural networks still not an integral part of our society? "Because they are difficult to implement in practice." A common sentiment is: "I'd like to use machine learning, but I can't invest much time." The concept of Automated Machine Learning (AutoML) was first proposed by Professor Frank Hutter of the University of Freiburg. Machine learning is not simple; it requires a practitioner to have a thorough understanding of the attributes of their data and of the components their model entails. AutoML is the effort to automate all tedious aspects of machine learning to form a clean data analysis pipeline. This thesis is our effort to develop and understand ways to automate machine learning. Specifically, we focused on Recurrent Neural Networks (RNNs), Meta-Learning, and Continual Learning. We studied continual learning to enable a network to acquire skills sequentially in a dynamic environment; we studied meta-learning to understand how a network can be configured efficiently; and we studied RNNs to understand the consequences of consecutive actions. Our RNN study focused on mathematical interpretability. We described a large variety of RNNs as one mathematical class to understand their core network mechanism. This enabled us to extend meta-learning beyond network configuration to network pruning and continual learning. It also provided insight into how a single network should be consecutively configured and led us to create a simple generic patch that is compatible with several existing continual learning archetypes. 
This patch enhanced the robustness of continual learning techniques and allowed them to generalise better. By and large, this thesis presents a series of extensions that make AutoML simple, efficient, and robust. More importantly, all of our methods are motivated by mathematical understanding through the lens of dynamical systems. Thus, we also increased the interpretability of AutoML concepts.

    A Unified Framework for Gradient-based Hyperparameter Optimization and Meta-learning

    Machine learning algorithms and systems are progressively becoming part of our societies, leading to a growing need to build a vast multitude of accurate, reliable and interpretable models that ideally exploit similarities among tasks. Automating parts of machine learning itself is a natural step towards delivering increasingly capable systems that perform well in both the big-data and the few-shot learning regimes. Hyperparameter optimization (HPO) and meta-learning (MTL) constitute two building blocks of this growing effort. We explore these two topics under a unifying perspective, presenting a mathematical framework linked to bilevel programming that captures existing similarities and translates into procedures of practical interest rooted in algorithmic differentiation. We discuss the derivation, applicability and computational complexity of these methods and establish several approximation properties for a class of objective functions of the underlying bilevel programs. In HPO, these algorithms generalize and extend previous work on gradient-based methods. In MTL, the resulting framework subsumes classic and emerging strategies and provides a starting basis from which to build and analyze novel techniques. A series of examples and numerical simulations offer insight and highlight some limitations of these approaches. Experiments on larger-scale problems show the potential gains of the proposed methods in real-world applications. Finally, we develop two extensions of the basic algorithms: one optimizes a class of discrete hyperparameters (graph edges) in an application to relational learning, and the other tunes online learning rate schedules for training neural network models, an old but crucially important issue in machine learning.
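The bilevel view behind gradient-based HPO can be illustrated on a toy problem. The sketch below is a minimal illustration, not the code from this work: it assumes a 1-D quadratic training loss L_train(w) = 0.5·a·(w − b)², a quadratic validation loss E = 0.5·(w_T − c)², and T steps of plain gradient descent with learning rate η as the inner solver; the function names and constants are hypothetical. It uses forward-mode differentiation, propagating the tangent z_t = dw_t/dη alongside the iterates, and obtains the hypergradient dE/dη = (w_T − c)·z_T by the chain rule.

```python
# Hypothetical toy bilevel problem (illustrative choices only):
#   inner: w_T = result of T gradient-descent steps on
#          L_train(w) = 0.5 * a * (w - b)**2 with learning rate eta
#   outer: E(eta) = L_val(w_T) = 0.5 * (w_T - c)**2
from typing import Tuple


def unroll_with_tangent(eta: float, w0: float, a: float, b: float,
                        T: int) -> Tuple[float, float]:
    """Run T gradient-descent steps and forward-propagate z_t = dw_t/deta."""
    w, z = w0, 0.0  # w0 does not depend on eta, so z_0 = 0
    for _ in range(T):
        grad = a * (w - b)  # gradient of L_train at w_t
        hess = a            # Hessian of L_train (constant for a quadratic)
        # Differentiate w_{t+1} = w_t - eta * grad(w_t) with respect to eta:
        # z_{t+1} = z_t - grad(w_t) - eta * hess * z_t
        z = z - grad - eta * hess * z
        w = w - eta * grad
    return w, z


def val_loss_and_hypergrad(eta: float, w0: float = 0.0, a: float = 2.0,
                           b: float = 1.0, c: float = 0.8,
                           T: int = 10) -> Tuple[float, float]:
    """Validation loss E(eta) and its hypergradient dE/deta."""
    w_T, z_T = unroll_with_tangent(eta, w0, a, b, T)
    val = 0.5 * (w_T - c) ** 2
    hypergrad = (w_T - c) * z_T  # chain rule: L_val'(w_T) * dw_T/deta
    return val, hypergrad


# Usage: compare the forward-mode hypergradient with a central finite difference.
val, hg = val_loss_and_hypergrad(0.1)
h = 1e-6
fd = (val_loss_and_hypergrad(0.1 + h)[0] - val_loss_and_hypergrad(0.1 - h)[0]) / (2 * h)
```

Forward mode carries one extra tangent per hyperparameter, so it suits a handful of hyperparameters such as a single learning rate; reverse mode scales better to many hyperparameters but must store or recompute the inner trajectory.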