Multi-utility Learning: Structured-output Learning with Multiple Annotation-specific Loss Functions
Structured-output learning is a challenging problem, particularly so because
of the difficulty in obtaining large datasets of fully labelled instances for
training. In this paper we try to overcome this difficulty by presenting a
multi-utility learning framework for structured prediction that can learn from
training instances with different forms of supervision. We propose a unified
technique for inferring the loss functions most suitable for quantifying the
consistency of solutions with the given weak annotation. We demonstrate the
effectiveness of our framework on the challenging semantic image segmentation
problem for which a wide variety of annotations can be used. For instance, the
popular training datasets for semantic segmentation are composed of images with
hard-to-generate full pixel labellings, as well as images with easy-to-obtain
weak annotations, such as bounding boxes around objects, or image-level labels
that specify which object categories are present in an image. Experimental
evaluation shows that the use of annotation-specific loss functions
dramatically improves segmentation accuracy compared to the baseline system
where only one type of weak annotation is used.
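To make the idea of annotation-specific losses concrete, the following is a minimal sketch that scores one predicted labelling against three kinds of annotation: a full pixel labelling, a set of image-level labels, and a bounding box. It is an illustration only. The abstract says the framework infers suitable loss functions, whereas the three losses here are hand-written, and every function name, penalty value, and the box format (inclusive `top, left, bottom, right` pixel coordinates) is an assumption, not something taken from the paper.

```python
def full_label_loss(pred, truth):
    """Hamming loss: fraction of pixels that differ from the full labelling."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(p != t for p, t in pairs) / len(pairs)


def image_label_loss(pred, present):
    """Fraction of pixels given an unlisted class, plus 1 per listed class never predicted."""
    flat = [p for row in pred for p in row]
    wrong = sum(p not in present for p in flat) / len(flat)
    missing = len(set(present) - set(flat))
    return wrong + missing


def bounding_box_loss(pred, box, cls):
    """Fraction of pixels labelled `cls` outside the box, plus 1 if none lie inside."""
    top, left, bottom, right = box  # assumed inclusive pixel coordinates
    inside = outside = 0
    for i, row in enumerate(pred):
        for j, p in enumerate(row):
            if p == cls:
                if top <= i <= bottom and left <= j <= right:
                    inside += 1
                else:
                    outside += 1
    total = sum(len(row) for row in pred)
    return outside / total + (0 if inside else 1)


def annotation_loss(pred, kind, annotation):
    """Dispatch to the loss that matches the form of supervision (hypothetical API)."""
    if kind == "full":
        return full_label_loss(pred, annotation)
    if kind == "image":
        return image_label_loss(pred, annotation)
    if kind == "box":
        box, cls = annotation
        return bounding_box_loss(pred, box, cls)
    raise ValueError(f"unknown annotation kind: {kind}")


pred = [[0, 1],
        [0, 1]]
print(annotation_loss(pred, "full", [[0, 1], [1, 1]]))
print(annotation_loss(pred, "image", {0, 1}))
print(annotation_loss(pred, "box", ((0, 1, 1, 1), 1)))
```

The dispatcher lets one training loop mix instances with different forms of supervision. How the resulting losses are weighted against each other is left open here.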
Learning to segment with image-level supervision
Deep convolutional networks have achieved state-of-the-art results on semantic
image segmentation tasks. However, training these networks requires access to
densely labeled images, which are known to be very expensive to obtain. On the
other hand, the web provides an almost unlimited source of images annotated at
the image level. How can one utilize this much larger weakly annotated set for
tasks that require dense labeling? Prior work often relied on localization
cues, such as saliency maps, objectness priors, bounding boxes etc., to address
this challenging problem. In this paper, we propose a model that generates
auxiliary labels for each image, while simultaneously forcing the output of the
CNN to satisfy the mean-field constraints imposed by a conditional random
field. We show that one can enforce the CRF constraints by forcing the
distribution at each pixel to be close to the distribution of its neighbors.
This is in stark contrast with methods that compute a recursive expansion of
the mean-field distribution using a recurrent architecture and train the
resultant distribution. Instead, the proposed model adds an extra loss term to
the output of the CNN, and hence, is faster than recursive implementations. We
achieve state-of-the-art results for weakly supervised semantic image segmentation
on the VOC 2012 dataset, assuming no manually labeled pixel-level information is
available. Furthermore, the incorporation of conditional random fields in CNN
incurs little extra time during training.
Comment: Published in WACV 201
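The neighbour-agreement idea in the abstract above can be sketched as an extra loss term computed directly on the per-pixel class distributions. This is a minimal sketch under stated assumptions, not the paper's formulation: it uses a 4-connected neighbourhood with uniform weights and a KL divergence, none of which the abstract specifies, and the function name is hypothetical.

```python
import math


def neighbor_consistency_loss(probs, eps=1e-12):
    """Mean KL(p_pixel || mean of 4-neighbour distributions) over a grid.

    probs: H x W x C nested lists; probs[i][j] is a class distribution.
    Assumption: uniform weights over the 4-connected neighbours.
    """
    height, width = len(probs), len(probs[0])
    num_classes = len(probs[0][0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            neighbours = [
                probs[y][x]
                for y, x in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= y < height and 0 <= x < width
            ]
            if not neighbours:
                continue  # a 1x1 grid has no neighbours to agree with
            mean = [
                sum(n[c] for n in neighbours) / len(neighbours)
                for c in range(num_classes)
            ]
            total += sum(
                p * math.log((p + eps) / (q + eps))
                for p, q in zip(probs[i][j], mean)
            )
    return total / (height * width)


smooth = [[[0.9, 0.1], [0.9, 0.1]],
          [[0.9, 0.1], [0.9, 0.1]]]
noisy = [[[0.9, 0.1], [0.1, 0.9]],
         [[0.1, 0.9], [0.9, 0.1]]]
print(neighbor_consistency_loss(smooth))  # zero: every pixel matches its neighbours
print(neighbor_consistency_loss(noisy))   # positive: neighbours disagree
```

Because the term is a single pass over the network output, it can be added to the training loss without unrolling mean-field iterations, which is the speed argument the abstract makes.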