283 research outputs found

    Effective Graph-Based Content-Based Image Retrieval Systems for Large-Scale and Small-Scale Image Databases

    Get PDF
    This dissertation proposes two novel manifold graph-based ranking systems for Content-Based Image Retrieval (CBIR). The two proposed systems exploit the synergism between relevance-feedback-based transductive short-term learning and semantic-feature-based long-term learning to improve retrieval performance. The proposed systems first apply an active learning mechanism to construct a log of users' relevance feedback and to extract high-level semantic features for each image. They then create manifold graphs that incorporate both low-level visual similarity and high-level semantic similarity, achieving more meaningful structures for the image space. Finally, asymmetric relevance vectors are created to propagate relevance scores from labeled images to unlabeled images via the manifold graphs. Extensive experimental results demonstrate that the two proposed systems outperform other state-of-the-art CBIR systems under both correct and erroneous user feedback.
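    The score-propagation step can be illustrated with the classic manifold-ranking iteration (a minimal generic sketch, not the authors' exact system; the damping factor alpha, the affinity matrix, and the toy data are illustrative assumptions):

```python
import numpy as np

def manifold_rank(W, y, alpha=0.9, iters=200):
    """Propagate relevance scores y over an affinity graph W.

    W : (n, n) symmetric non-negative affinity matrix
    y : (n,) initial relevance (nonzero for labeled / fed-back images)
    Iterates f <- alpha * S f + (1 - alpha) * y, which converges to the
    closed form f* = (1 - alpha) (I - alpha S)^-1 y.
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))   # symmetric normalization D^-1/2 W D^-1/2
    f = y.astype(float).copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y
    return f

# Toy graph: images 0-1 are similar, 2-3 are similar; image 0 is marked relevant.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 0.0, 0.0])
scores = manifold_rank(W, y)
# Image 1 (visually close to the labeled image) outranks images 2 and 3.
```

    The ranking respects the graph structure rather than raw pairwise distances, which is the point of propagating feedback through a manifold graph.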

    Sampling-Based Methods for Factored Task and Motion Planning

    Full text link
    This paper presents a general-purpose formulation of a large class of discrete-time planning problems, with hybrid state and control spaces, as factored transition systems. Factoring allows state transitions to be described as the intersection of several constraints, each affecting a subset of the state and control variables. Robotic manipulation problems with many movable objects involve constraints that each affect only a few variables at a time and therefore exhibit large amounts of factoring. We develop a theoretical framework for solving factored transition systems with sampling-based algorithms. The framework characterizes conditions on the submanifold in which solutions lie, leading to a characterization of robust feasibility that incorporates dimensionality-reducing constraints. It then connects those conditions to corresponding conditional samplers that can be composed to produce values on this submanifold. We present two domain-independent, probabilistically complete planning algorithms that take a set of conditional samplers as input. We demonstrate the empirical efficiency of these algorithms on a set of challenging task and motion planning problems involving picking, placing, and pushing.
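    The idea of composing conditional samplers to hit a lower-dimensional solution submanifold can be made concrete with a toy two-variable constraint (the constraint, the `gap` parameter, and the sampler interfaces are illustrative assumptions, not the paper's API):

```python
import random

# Dimensionality-reducing constraint: the second value y must lie exactly
# `gap` units from the first value x (a 1-D manifold inside the 2-D joint
# space), so uniform sampling of (x, y) jointly would almost never satisfy it.

def sample_x():
    """Unconditional sampler for the first variable."""
    return random.uniform(0.0, 1.0)

def sample_y_given_x(x, gap=0.25):
    """Conditional sampler: produces y values satisfying |y - x| == gap."""
    return x + random.choice([-gap, gap])

def composed_samples(n, gap=0.25):
    """Compose the two samplers to draw (x, y) pairs on the constraint manifold."""
    out = []
    for _ in range(n):
        x = sample_x()
        y = sample_y_given_x(x, gap)
        out.append((x, y))
    return out

pairs = composed_samples(100)
```

    Every composed sample lies on the constraint manifold by construction, which is what makes such samplers useful building blocks for probabilistically complete planners.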

    잠재 μž„λ² λ”©μ„ ν†΅ν•œ μ‹œκ°μ  μŠ€ν† λ¦¬λ‘œλΆ€ν„°μ˜ μ„œμ‚¬ ν…μŠ€νŠΈ 생성기 ν•™μŠ΅

    Get PDF
    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: Byoung-Tak Zhang.

    The ability to understand stories is essential to what makes humans unique among primates and other animals. Story understanding is crucial for AI agents that live with people in everyday life and must understand their context. However, most research on story AI focuses on automated story generation over manually designed closed worlds, which are widely used for computational authoring. Machine learning techniques applied to story corpora face the same problems as natural language processing in general, such as omitted details and missing commonsense knowledge. Since the remarkable success of deep learning in computer vision has increased interest in bridging vision and language, vision-grounded story data can potentially improve the performance of story understanding and narrative text generation. Assume that AI agents are situated in an environment whose sensing information arrives through a camera: such agents observe their surroundings, translate them into stories in natural language, and predict the following event, or several events, sequentially. This dissertation studies the related problems: learning stories and generating narrative text from image streams or videos.

    The first problem is generating narrative text from a sequence of ordered images. As a solution, we introduce GLAC Net (Global-Local Attention Cascading Network), which translates image sequences into narrative paragraphs using an encoder-decoder framework in a sequence-to-sequence setting. It uses convolutional neural networks to extract information from images and recurrent neural networks for text generation. We introduce visual cue encoders with stacked bidirectional LSTMs, aggregating the outputs of each layer into contextualized image vectors that capture visual clues. The coherence of the generated text is further improved by conveying (cascading) the information of the previous sentence serially to the next sentence in the decoders. We evaluate the model on the Visual Storytelling (VIST) dataset: it outperforms other state-of-the-art systems and, under human evaluation in the first visual storytelling challenge, achieved the best total score and the best score on all six aspects.

    The second problem is predicting the following events or narrative texts from the earlier parts of a story; prediction must be possible at any step, for an arbitrary number of steps. We propose recurrent event retrieval models as a solution. They train a context accumulation function and two embedding functions that bring the cumulative context at the current time close to the next probable events in a latent space. The cumulative context is updated with each new event using bilinear operations, and the next event candidates are retrieved with the updated cumulative context. Trained on the ROCStories dataset and evaluated on the Story Cloze Test, the models show competitive performance and the best results among methods that support open-ended generation; we also demonstrate working examples in an interactive setting.

    The third problem concerns composite representation learning of semantics and order for video stories. We embed each episode as a trajectory-like sequence of events in the latent space and propose ViStoryNet to regenerate video stories from these embeddings (the story completion task). Event sentences are converted to thought vectors, and functions are trained so that successive events embed close to each other, forming episodes as trajectories. Bidirectional LSTMs are trained as sequence models, with GRU decoders generating event sentences. In experiments on the PororoQA dataset, most episodes indeed take the form of trajectories, and using them to complete blocked parts of stories yields results that, while not perfect, are similar overall.

    These results can be applied to AI agents that sense their living area with cameras, explain situations as stories, infer unobserved parts, and predict the future story.
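    The bilinear context-accumulation-and-retrieval loop described for the recurrent event retrieval models can be sketched as follows (the tensor `W`, the dimensions, and the random embeddings are illustrative stand-ins for the dissertation's learned functions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Hypothetical bilinear tensor standing in for the learned accumulation function.
W = rng.normal(scale=0.1, size=(d, d, d))

def accumulate(context, event):
    """Bilinear context update: c'_k = tanh(sum_ij W[k,i,j] * c_i * e_j)."""
    return np.tanh(np.einsum('kij,i,j->k', W, context, event))

def retrieve(context, candidates):
    """Rank candidate next-event embeddings by cosine similarity to the context."""
    sims = candidates @ context / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context) + 1e-12)
    return np.argsort(-sims)

context = rng.normal(size=d)
events = rng.normal(size=(3, d))      # three story events observed so far
for e in events:                      # accumulation works for any story length
    context = accumulate(context, e)
candidates = rng.normal(size=(5, d))  # candidate next events
ranking = retrieve(context, candidates)
```

    Because the update consumes one event at a time, prediction is possible at any step and for arbitrarily long stories, as the abstract requires.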

    Hyperbolic Deep Neural Networks: A Survey

    Full text link
    Recently, there has been a rising surge of momentum for deep representation learning in hyperbolic spaces, due to their high capacity for modeling data with hierarchical structure, such as knowledge graphs or synonym hierarchies. We refer to such models as hyperbolic deep neural networks in this paper. A hyperbolic neural architecture can potentially lead to a drastically more compact model with much more physical interpretability than its counterpart in Euclidean space. To stimulate future research, this paper presents a coherent and comprehensive review of the literature on the neural components used in the construction of hyperbolic deep neural networks, as well as generalizations of the leading deep approaches to hyperbolic space. It also presents current applications across various machine learning tasks on several publicly available datasets, together with insightful observations, open questions, and promising future directions.
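    The capacity claim rests on hyperbolic geometry itself; a minimal sketch of the Poincaré-ball distance (one standard model used throughout this literature; the sample points are illustrative) shows why tree-like data fits:

```python
import math

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball:
    d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))."""
    sq = lambda v: sum(c * c for c in v)
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq(x)) * (1.0 - sq(y))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Points near the boundary are exponentially far apart, so a tree (whose node
# count grows exponentially with depth) can be embedded with low distortion:
# siblings pushed toward the boundary separate without moving far from the root.
root, leaf_a, leaf_b = [0.0, 0.0], [0.9, 0.0], [0.0, 0.9]
assert poincare_distance(root, leaf_a) < poincare_distance(leaf_a, leaf_b)
```

    In Euclidean space the two leaves would be closer to each other than twice their distance from the origin allows here, which is exactly the property hierarchies need.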

    Unsupervised Representation Learning for Homogeneous, Heterogeneous, and Tree-Shaped Graphs

    Get PDF
    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Information Engineering, August 2022. Advisor: Jin Young Choi.

    The goal of unsupervised graph representation learning is to extract useful node-wise or graph-wise vector representations that are aware of the intrinsic structure of the graph and its attributes. Recently, designing unsupervised graph representation learning methods based on graph neural networks has received growing attention due to their powerful representation ability. Many methods focus on homogeneous graphs, i.e., networks with a single type of node and a single type of edge. However, as many types of relationships exist in the world, graphs can also be classified into various types by their structural and semantic properties. For this reason, to learn useful representations from graphs, the unsupervised learning framework must consider the characteristics of the input graph. In this dissertation, we focus on designing unsupervised learning models using graph neural networks for three widely available graph structures: homogeneous graphs, tree-like graphs, and heterogeneous graphs.

    First, we propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a homogeneous graph. In contrast to existing graph autoencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. For the reconstruction of node features, the decoder is designed based on Laplacian sharpening as the counterpart of the Laplacian smoothing in the encoder, which allows utilizing the graph structure in the whole process of the proposed autoencoder architecture. To prevent the numerical instability introduced by Laplacian sharpening, we further propose a new numerically stable form of Laplacian sharpening that incorporates signed graphs. The experimental results of clustering, link prediction, and visualization tasks on homogeneous graphs strongly support that the proposed model is stable and outperforms various state-of-the-art algorithms.

    Second, we analyze how unsupervised tasks can benefit from representations learned in hyperbolic space. To explore how well the hierarchical structure of unlabeled data can be represented in hyperbolic spaces, we design a novel hyperbolic message passing autoencoder whose entire auto-encoding is performed in hyperbolic space. The proposed model auto-encodes networks by fully utilizing hyperbolic geometry in message passing. Through extensive quantitative and qualitative analyses, we validate the properties and benefits of unsupervised hyperbolic representations of tree-like graphs.

    Third, we propose the novel concept of a metanode for message passing, to learn both heterogeneous and homogeneous relationships between any two nodes without meta-paths and meta-graphs. Unlike conventional methods, metanodes do not require a predetermined step that manipulates the given relations between different types to enrich relational information. Going one step further, we propose a metanode-based message passing layer and a contrastive learning model using the proposed layer. In our experiments, the proposed metanode-based message passing method shows competitive performance on node clustering and node classification tasks compared to state-of-the-art message passing networks for heterogeneous graphs.
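    The smoothing/sharpening pair at the heart of the symmetric autoencoder can be sketched on a toy graph (a minimal sketch of the two operators only, not the full architecture; the step size `gamma` and the path graph are illustrative assumptions):

```python
import numpy as np

# Toy 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                          # unnormalized graph Laplacian

gamma = 0.1
smooth  = np.eye(4) - gamma * L    # encoder-style: pull features toward neighbors
sharpen = np.eye(4) + gamma * L    # decoder-style: push features away from neighbors

H = np.array([[1.0], [0.0], [0.0], [0.0]])  # one-hot feature on node 0
Hs = smooth @ H                    # smoothing spreads mass to node 0's neighbor
Hr = sharpen @ Hs                  # sharpening approximately undoes the smoothing:
                                   # (I + gL)(I - gL) = I - g^2 L^2 ~ I for small g
```

    The sharpening operator amplifies high-frequency (neighbor-differing) components, which is why an unmodified version can be unstable and motivates the signed-graph reformulation in the dissertation.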

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:

    Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file).

    Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot.

    Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks.

    Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity.

    Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation.

    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.

    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
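    The encoding distinction the survey draws under its Representation dimension (scalar vs. one-hot vs. many-hot) is easy to make concrete for a single time step (the pitch range and the example chord are illustrative assumptions):

```python
# Encoding one time step of musical content, per the survey's taxonomy.
PITCHES = list(range(60, 72))          # one octave of MIDI pitch numbers, C4..B4

def one_hot(pitch):
    """One-hot: exactly one active entry, e.g. a monophonic melody note."""
    return [1 if p == pitch else 0 for p in PITCHES]

def many_hot(pitches):
    """Many-hot: several active entries, e.g. the notes of a chord."""
    active = set(pitches)
    return [1 if p in active else 0 for p in PITCHES]

melody_step = one_hot(64)              # E4
chord_step  = many_hot([60, 64, 67])   # C major triad: C4, E4, G4
```

    A piano-roll representation is then simply a sequence of such many-hot vectors, one per time step.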