11,046 research outputs found

    Semantic Object Parsing with Graph LSTM

    By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, a generalization of LSTM from sequential and multi-dimensional data to general graph-structured data. In particular, instead of evenly and rigidly dividing an image into pixels or patches as in existing multi-dimensional LSTM structures (e.g., Row, Grid, and Diagonal LSTMs), we take each arbitrary-shaped superpixel as a semantically consistent node and adaptively construct an undirected graph for each image, where the spatial relations of the superpixels naturally serve as edges. Built on such an adaptive graph topology, the Graph LSTM is more naturally aligned with the visual patterns in the image (e.g., object boundaries or appearance similarities) and provides a more economical information-propagation route. Furthermore, for each optimization step of the Graph LSTM, we propose a confidence-driven scheme that updates the hidden and memory states of nodes progressively until all nodes have been updated. In addition, for each node, the forget gates are adaptively learned to capture different degrees of semantic correlation with neighboring nodes. Comprehensive evaluations on four diverse semantic object parsing datasets demonstrate the significant superiority of our Graph LSTM over other state-of-the-art solutions.

    Comment: 18 pages
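    The per-node update the abstract describes (aggregating neighbor hidden states over an adaptive superpixel graph, visiting nodes in a confidence-driven order, and learning a separate forget gate per neighbor) can be sketched roughly as below. The hidden dimension, shared gate weights, and toy 4-node path graph are illustrative assumptions, not the paper's actual architecture:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = 8  # hidden/feature dimension (illustrative)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Hypothetical gate parameters, shared across all nodes.
    W = {g: rng.standard_normal((D, 2 * D)) * 0.1 for g in ("i", "o", "u", "f")}
    b = {g: np.zeros(D) for g in ("i", "o", "u", "f")}

    def graph_lstm_step(x, h, c, adjacency, confidence):
        """One sweep: visit nodes in descending confidence, aggregate
        neighbor hidden states, and apply an adaptive forget gate per
        neighbor, in the spirit of the abstract's description."""
        order = np.argsort(-confidence)  # confidence-driven update scheme
        for v in order:
            nbrs = adjacency[v]
            h_nbr = np.mean([h[u] for u in nbrs], axis=0) if nbrs else np.zeros(D)
            z = np.concatenate([x[v], h_nbr])
            i = sigmoid(W["i"] @ z + b["i"])   # input gate
            o = sigmoid(W["o"] @ z + b["o"])   # output gate
            u = np.tanh(W["u"] @ z + b["u"])   # candidate memory
            # Forget gate computed separately for each neighbor,
            # conditioned on that neighbor's hidden state.
            c_ctx = np.zeros(D)
            for nb in nbrs:
                f_nb = sigmoid(W["f"] @ np.concatenate([x[v], h[nb]]) + b["f"])
                c_ctx += f_nb * c[nb]
            f_self = sigmoid(W["f"] @ np.concatenate([x[v], h[v]]) + b["f"])
            c[v] = i * u + f_self * c[v] + (c_ctx / len(nbrs) if nbrs else 0)
            h[v] = o * np.tanh(c[v])
        return h, c

    # Toy graph: 4 superpixel nodes in a path 0-1-2-3.
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    x = rng.standard_normal((4, D))
    h, c = np.zeros((4, D)), np.zeros((4, D))
    confidence = np.array([0.9, 0.2, 0.7, 0.4])
    h, c = graph_lstm_step(x, h, c, adjacency, confidence)
    print(h.shape)  # (4, 8)
    ```

    Note how the edge set is data-dependent (one graph per image), which is what distinguishes this from Row/Grid/Diagonal LSTMs with their fixed pixel topology.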

    ์˜๋ฏธ๋ก ์  ์˜์ƒ ๋ถ„ํ• ์„ ์œ„ํ•œ ๋งฅ๋ฝ ์ธ์‹ ๊ธฐ๋ฐ˜ ํ‘œํ˜„ ํ•™์Šต

    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 2. ์ด๊ฒฝ๋ฌด.Semantic segmentation, segmenting all the objects and identifying their categories, is a fundamental and important problem in computer vision. Traditional approaches to semantic segmentation are based on two main elements: visual appearance features and semantic context. Visual appearance features such as color, edge, shape and so on, are a primary source of information for reasoning the objects in an image. However, image data are sometimes unable to fully capture diversity in the object classes, since the appearance of the objects presented in real world scenes is affected by imaging conditions such as illumination, texture, occlusion, and viewpoint. Therefore, semantic context, obtained from not only the presence but also the location of other objects, can help to disambiguate the visual appearance in semantic segmentation tasks. The modern contextualized semantic segmentation systems have successfully improved segmentation performance by refining inconsistently labeled pixels via modeling of contextual interactions. However, they considered semantic context and visual appearance features independently due to the absence of the suitable representation model. Motivated by this issue, this dissertation proposes a novel framework for learning semantic context-aware representations in which appearance features is enhanced and enriched by semantic context and vice versa. The first part of the dissertation will be devoted to semantic context-aware appearance modeling for semantic segmentation. Adaptive context aggregation network is studied to capture semantic context adequately while multiple steps of reasoning. Secondly, semantic context will be reinforced by utilizing visual appearance. Graph and example-based context model is presented for estimating contextual relationships according to the visual appearance of objects. 
    Finally, we propose multiscale Conditional Random Fields (CRFs) for integrating context-aware appearance and appearance-aware semantic context to produce accurate segmentations. Experimental evaluations show the effectiveness of the proposed context-aware representations on various challenging datasets.

    Table of contents:
    1 Introduction
      1.1 Backgrounds
      1.2 Context Modeling for Semantic Segmentation Systems
      1.3 Dissertation Goal and Contribution
      1.4 Organization of Dissertation
    2 Adaptive Context Aggregation Network
      2.1 Introduction
      2.2 Related Works
      2.3 Proposed Method
        2.3.1 Embedding Network
        2.3.2 Deeply Supervised Context Aggregation Network
      2.4 Experiments
        2.4.1 PASCAL VOC 2012 dataset
        2.4.2 SIFT Flow dataset
      2.5 Summary
    3 Second-order Semantic Relationships
      3.1 Introduction
      3.2 Related Work
      3.3 Our Approach
        3.3.1 Overview
        3.3.2 Retrieval System
        3.3.3 Graph Construction
        3.3.4 Context Exemplar Description
        3.3.5 Context Link Prediction
      3.4 Inference
      3.5 Experiments
      3.6 Summary
    4 High-order Semantic Relationships
      4.1 Introduction
      4.2 Related Work
      4.3 The High-order Semantic Relation Transfer Algorithm
        4.3.1 Problem Statement
        4.3.2 Objective Function
        4.3.3 Approximate Algorithm
      4.4 Semantic Segmentation through Semantic Relation Transfer
        4.4.1 Scene Retrieval
        4.4.2 Inference
      4.5 Experiments
      4.6 Summary
    5 Multiscale CRF Formulation
      5.1 Introduction
      5.2 Proposed Method
        5.2.1 Multiscale Potentials
        5.2.2 Non-Convex Optimization
      5.3 Experiments
        5.3.1 SiftFlow dataset
    6 Conclusion
      6.1 Summary of the Dissertation
      6.2 Future Works
    Abstract (In Korean)
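    As a generic illustration of the kind of context-aware inference a CRF over regions performs (a stand-in, not the dissertation's actual multiscale formulation), a mean-field-style refinement can re-weight each region's appearance scores by how compatible each label is with its neighbors' current beliefs. The class count, region graph, and compatibility matrix below are all toy assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    L, N = 3, 5                              # toy label and region counts

    unary = rng.random((N, L))               # appearance scores per region
    unary /= unary.sum(axis=1, keepdims=True)
    compat = rng.random((L, L))              # hypothetical label compatibility
    compat = (compat + compat.T) / 2         # symmetric context model
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)] # toy region adjacency

    def refine(unary, compat, edges, n_iters=10):
        """Mean-field-style refinement: each region's label distribution
        is re-weighted by the expected context support from neighbors,
        then renormalized (softmax over per-label scores)."""
        n = len(unary)
        nbrs = {i: [] for i in range(n)}
        for a, b in edges:
            nbrs[a].append(b)
            nbrs[b].append(a)
        q = unary.copy()
        for _ in range(n_iters):
            score = np.log(unary + 1e-9)     # appearance (unary) term
            for i in range(n):
                for j in nbrs[i]:
                    score[i] += q[j] @ compat.T  # context (pairwise) term
            q = np.exp(score - score.max(axis=1, keepdims=True))
            q /= q.sum(axis=1, keepdims=True)
        return q

    labels = refine(unary, compat, edges).argmax(axis=1)
    print(labels.shape)  # (5,)
    ```

    The appearance term plays the role of "context-aware appearance" and the pairwise term that of "appearance-aware semantic context"; a multiscale CRF would additionally couple such terms across scales.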
    • โ€ฆ
    corecore