
    Geometric Supervision and Deep Structured Models for Image Segmentation

    The task of semantic segmentation aims at understanding an image at a pixel level. Due to its applicability in many areas, such as autonomous vehicles, robotics and medical surgery assistance, semantic segmentation has become an essential task in image analysis. During the last few years, a lot of progress has been made on image segmentation algorithms, mainly due to the introduction of deep learning methods, in particular Convolutional Neural Networks (CNNs). CNNs are powerful for modeling complex connections between input and output data but have two drawbacks when it comes to semantic segmentation. Firstly, CNNs lack the ability to directly model dependent output structures, for instance by explicitly enforcing properties such as label smoothness and coherence. This drawback motivates the use of Conditional Random Fields (CRFs), applied as a post-processing step in semantic segmentation. Secondly, training CNNs requires large amounts of annotated data. For segmentation this amounts to dense, pixel-level annotations that are very time-consuming to acquire. This thesis summarizes the content of five papers addressing the two aforementioned drawbacks of CNNs. The first two papers present methods for using geometric 3D models to improve segmentation models. The 3D models can be created with little human labour and can be used as a supervisory signal to improve the robustness of semantic segmentation and long-term visual localization methods. The last three papers focus on models combining CNNs and CRFs for semantic segmentation. The models consist of a CNN capable of learning complex image features coupled with a CRF capable of learning dependencies between output variables. Emphasis has been on creating models that are possible to train end-to-end, giving the CNN and the CRF a chance to learn how to interact and exploit complementary information to achieve better performance.

    Recurrent Pixel Embedding for Instance Grouping

    We introduce a differentiable, end-to-end trainable framework, consisting of two novel components, for solving pixel-level grouping problems such as instance segmentation. First, we regress pixels into a hyper-spherical embedding space so that pixels from the same group have high cosine similarity while those from different groups have similarity below a specified margin. We analyze the choice of embedding dimension and margin, relating them to theoretical results on the problem of distributing points uniformly on the sphere. Second, to group instances, we utilize a variant of mean-shift clustering, implemented as a recurrent neural network parameterized by kernel bandwidth. This recurrent grouping module is differentiable and enjoys convergent dynamics and probabilistic interpretability. Backpropagating the group-weighted loss through this module allows learning to focus on correcting only those embedding errors that won't be resolved during subsequent clustering. Our framework, while conceptually simple and theoretically abundant, is also practically effective and computationally efficient. We demonstrate substantial improvements over state-of-the-art instance segmentation for object proposal generation, and we demonstrate the benefits of grouping loss for classification tasks such as boundary detection and semantic segmentation.
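
The two components described in this abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the exact loss form (1 - cos for same-group pairs, a hinge on cos - margin for different-group pairs), the exponential kernel, and the default `margin` and `bandwidth` values are all assumptions chosen for illustration.

```python
import numpy as np


def _unit_rows(x):
    """Project each row onto the unit hypersphere."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def pairwise_embedding_loss(embeddings, labels, margin=0.5):
    """Hypothetical pairwise loss on cosine similarity.

    Same-group pairs are penalised by (1 - cos); different-group pairs
    are penalised only when their cosine similarity exceeds `margin`.
    Expects at least two embeddings.
    """
    unit = _unit_rows(embeddings)
    labels = np.asarray(labels)
    cos = unit @ unit.T
    same = labels[:, None] == labels[None, :]
    pos = np.where(same, 1.0 - cos, 0.0)
    neg = np.where(~same, np.maximum(0.0, cos - margin), 0.0)
    off_diag = ~np.eye(len(labels), dtype=bool)  # ignore self-pairs
    return float((pos + neg)[off_diag].mean())


def spherical_mean_shift(embeddings, bandwidth=10.0, steps=5):
    """Assumed mean-shift variant on the sphere.

    Each step replaces every point by the kernel-weighted mean of all
    points (kernel: exp(bandwidth * cos)), then renormalises. The fixed
    number of steps mirrors unrolling a recurrent module.
    """
    unit = _unit_rows(embeddings)
    for _ in range(steps):
        weights = np.exp(bandwidth * (unit @ unit.T))
        unit = _unit_rows(weights @ unit)
    return unit
```

Because every operation above is a differentiable array operation, the same structure could be written in an autodiff framework so that a loss evaluated after the mean-shift steps backpropagates into the embeddings, which is the property the abstract emphasises.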

    End-to-End Learning of Deep Structured Models for Semantic Segmentation

    The task of semantic segmentation aims at understanding an image at a pixel level. This means assigning a label to each pixel of an image, describing the object it depicts. Due to its applicability in many areas, such as autonomous vehicles, robotics and medical surgery assistance, semantic segmentation has become an essential task in image analysis. During the last few years, a lot of progress has been made on image segmentation algorithms, mainly due to the introduction of deep learning methods, in particular Convolutional Neural Networks (CNNs). CNNs are powerful for modeling complex connections between input and output data but lack the ability to directly model dependent output structures, for instance by enforcing properties such as label smoothness and coherence. This drawback motivates the use of Conditional Random Fields (CRFs), widely applied as a post-processing step in semantic segmentation. This thesis summarizes the content of three papers, all of them presenting solutions to semantic segmentation problems. The applications have varied widely and several different types of data have been considered, ranging from 3D CT images to RGB images of horses. The main focus has been on developing robust and accurate models to solve these problems. The models consist of a CNN capable of learning complex image features coupled with a CRF capable of learning dependencies between output variables. Emphasis has been on creating models that are possible to train end-to-end, as well as developing the corresponding optimization methods needed to enable efficient training. End-to-end training gives the CNN and the CRF a chance to learn how to interact and exploit complementary information to achieve better performance.
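
For orientation, a CNN coupled with a CRF is commonly written as an energy over the labeling. The form below is the generic textbook formulation, not necessarily the one used in the thesis; the symbols $\psi_i$, $\psi_{ij}$, $\mathcal{E}$ and $\theta$ are notation assumed here.

```latex
% x = (x_1, ..., x_N): pixel labels, I: input image,
% \mathcal{E}: set of neighbouring pixel pairs (assumed notation).
E(x \mid I; \theta)
  = \sum_{i} \psi_i(x_i \mid I; \theta_{\mathrm{CNN}})
  + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j \mid I; \theta_{\mathrm{CRF}}),
\qquad
\hat{x} = \arg\min_{x} E(x \mid I; \theta)
```

Here the unary terms $\psi_i$ would come from the CNN's per-pixel scores and the pairwise terms $\psi_{ij}$ encode label smoothness and coherence. End-to-end training then means learning $\theta_{\mathrm{CNN}}$ and $\theta_{\mathrm{CRF}}$ jointly rather than fitting the CRF as a separate post-processing step.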

    MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

    In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. MaskLab builds on top of the Faster-RCNN object detector, whose predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects of different semantic classes including background, while the direction prediction, estimating each pixel's direction towards its corresponding center, allows separating instances of the same semantic class. Moreover, we explore the effect of incorporating recent successful methods from both segmentation and detection (i.e. atrous convolution and hypercolumn). Our proposed model is evaluated on the COCO instance segmentation benchmark and shows comparable performance with other state-of-the-art models. Comment: 10 pages including referenc
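
As a rough illustration of what a per-pixel "direction towards its corresponding center" target could look like, the sketch below quantises the angle from each pixel to its instance centroid into a fixed number of bins. This is an assumption-laden toy, not MaskLab's actual target encoding: the function name `direction_targets`, the use of the mask centroid as the center, the choice of 8 bins, and the convention that label 0 is background are all hypothetical.

```python
import numpy as np


def direction_targets(instance_mask, num_bins=8):
    """Quantised direction from each pixel to its instance centroid.

    instance_mask: 2-D integer array, 0 = background (assumed convention),
                   positive values = instance ids.
    Returns an array of the same shape holding a bin index in
    [0, num_bins) for instance pixels and -1 for background.
    """
    mask = np.asarray(instance_mask)
    target = np.full(mask.shape, -1, dtype=int)
    bin_width = 2.0 * np.pi / num_bins
    for inst_id in np.unique(mask):
        if inst_id == 0:
            continue  # skip background
        rows, cols = np.nonzero(mask == inst_id)
        # Offset vector pointing from each pixel to the centroid.
        dy = rows.mean() - rows
        dx = cols.mean() - cols
        # Angle in [0, 2*pi); a pixel exactly at the centroid gets angle 0.
        angle = np.arctan2(dy, dx) % (2.0 * np.pi)
        target[rows, cols] = np.floor(angle / bin_width).astype(int) % num_bins
    return target
```

Two touching instances of the same class produce opposing direction bins along their shared border, which gives an intuition for why such a signal can help separate instances that semantic segmentation alone would merge.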