    Weakly supervised structured output learning for semantic segmentation

    Multi-utility Learning: Structured-output Learning with Multiple Annotation-specific Loss Functions

    Structured-output learning is a challenging problem, particularly because of the difficulty of obtaining large datasets of fully labelled instances for training. In this paper we try to overcome this difficulty by presenting a multi-utility learning framework for structured prediction that can learn from training instances with different forms of supervision. We propose a unified technique for inferring the loss functions most suitable for quantifying the consistency of solutions with the given weak annotation. We demonstrate the effectiveness of our framework on the challenging semantic image segmentation problem, for which a wide variety of annotations can be used. For instance, the popular training datasets for semantic segmentation are composed of images with hard-to-generate full pixel labellings, as well as images with easy-to-obtain weak annotations, such as bounding boxes around objects or image-level labels that specify which object categories are present in an image. Experimental evaluation shows that the use of annotation-specific loss functions dramatically improves segmentation accuracy compared to the baseline system where only one type of weak annotation is used.

    Learning to segment with image-level supervision

    Deep convolutional networks have achieved state-of-the-art results on semantic image segmentation tasks. However, training these networks requires densely labeled images, which are very expensive to obtain. On the other hand, the web provides an almost unlimited source of images annotated at the image level. How can one utilize this much larger weakly annotated set for tasks that require dense labeling? Prior work often relied on localization cues, such as saliency maps, objectness priors, and bounding boxes, to address this challenging problem. In this paper, we propose a model that generates auxiliary labels for each image, while simultaneously forcing the output of the CNN to satisfy the mean-field constraints imposed by a conditional random field (CRF). We show that one can enforce the CRF constraints by forcing the distribution at each pixel to be close to the distribution of its neighbors. This is in stark contrast with methods that compute a recursive expansion of the mean-field distribution using a recurrent architecture and train the resultant distribution. Instead, the proposed model adds an extra loss term to the output of the CNN and is hence faster than recursive implementations. We achieve the state of the art for weakly supervised semantic image segmentation on the VOC 2012 dataset, assuming no manually labeled pixel-level information is available. Furthermore, the incorporation of conditional random fields in the CNN incurs little extra time during training. Comment: Published in WACV 201
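
The idea of an extra loss term that pulls each pixel's class distribution toward that of its neighbors can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not specify. It assumes a 4-connected neighborhood, an unweighted average of the neighbor distributions, and a KL divergence as the measure of closeness. The function name `neighbor_consistency_loss` and the `eps` smoothing parameter are hypothetical.

```python
import math


def neighbor_consistency_loss(probs, eps=1e-12):
    """Mean KL(p_pixel || mean of 4-neighbor distributions) over a grid.

    probs: H x W x C nested lists; each innermost list is a probability
    vector over C classes. The 4-neighborhood, the unweighted mean, and
    the KL divergence are illustrative assumptions, not the paper's loss.
    """
    height, width = len(probs), len(probs[0])
    num_classes = len(probs[0][0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            # Collect the distributions of in-bounds 4-connected neighbors.
            neighbors = [
                probs[a][b]
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= a < height and 0 <= b < width
            ]
            if not neighbors:
                continue  # a 1 x 1 grid has no neighbors to compare with
            mean_q = [
                sum(n[c] for n in neighbors) / len(neighbors)
                for c in range(num_classes)
            ]
            p = probs[i][j]
            # KL divergence between the pixel and its neighborhood average.
            total += sum(
                pc * math.log((pc + eps) / (qc + eps))
                for pc, qc in zip(p, mean_q)
            )
    return total / (height * width)


# A smooth prediction incurs no penalty; a sharp disagreement does.
smooth = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
sharp = [[[0.9, 0.1], [0.1, 0.9]]]
smooth_loss = neighbor_consistency_loss(smooth)
sharp_loss = neighbor_consistency_loss(sharp)
```

In a training setup, a term like this would be added to the segmentation loss with some weight, so the smoothness constraint is applied in a single pass rather than through recurrent mean-field iterations.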