1,399 research outputs found
Learning to Segment Every Thing
Most methods for object instance segmentation require all training examples
to be labeled with segmentation masks. This requirement makes it expensive to
annotate new categories and has restricted instance segmentation models to ~100
well-annotated classes. The goal of this paper is to propose a new partially
supervised training paradigm, together with a novel weight transfer function,
that enables training instance segmentation models on a large set of categories
all of which have box annotations, but only a small fraction of which have mask
annotations. These contributions allow us to train Mask R-CNN to detect and
segment 3000 visual concepts using box annotations from the Visual Genome
dataset and mask annotations from the 80 classes in the COCO dataset. We
evaluate our approach in a controlled study on the COCO dataset. This work is a
first step towards instance segmentation models that have broad comprehension
of the visual world
Learning to segment with image-level supervision
Deep convolutional networks have achieved the state-of-the-art for semantic
image segmentation tasks. However, training these networks requires access to
densely labeled images, which are known to be very expensive to obtain. On the
other hand, the web provides an almost unlimited source of images annotated at
the image level. How can one utilize this much larger weakly annotated set for
tasks that require dense labeling? Prior work often relied on localization
cues, such as saliency maps, objectness priors, bounding boxes etc., to address
this challenging problem. In this paper, we propose a model that generates
auxiliary labels for each image, while simultaneously forcing the output of the
CNN to satisfy the mean-field constraints imposed by a conditional random
field. We show that one can enforce the CRF constraints by forcing the
distribution at each pixel to be close to the distribution of its neighbors.
This is in stark contrast with methods that compute a recursive expansion of
the mean-field distribution using a recurrent architecture and train the
resultant distribution. Instead, the proposed model adds an extra loss term to
the output of the CNN, and hence, is faster than recursive implementations. We
achieve the state-of-the-art for weakly supervised semantic image segmentation
on VOC 2012 dataset, assuming no manually labeled pixel level information is
available. Furthermore, the incorporation of conditional random fields in CNN
incurs little extra time during training.Comment: Published in WACV 201
Learning to Segment Breast Biopsy Whole Slide Images
We trained and applied an encoder-decoder model to semantically segment
breast biopsy images into biologically meaningful tissue labels. Since
conventional encoder-decoder networks cannot be applied directly on large
biopsy images and the different sized structures in biopsies present novel
challenges, we propose four modifications: (1) an input-aware encoding block to
compensate for information loss, (2) a new dense connection pattern between
encoder and decoder, (3) dense and sparse decoders to combine multi-level
features, (4) a multi-resolution network that fuses the results of
encoder-decoders run on different resolutions. Our model outperforms a
feature-based approach and conventional encoder-decoders from the literature.
We use semantic segmentations produced with our model in an automated diagnosis
task and obtain higher accuracies than a baseline approach that employs an SVM
for feature-based segmentation, both using the same segmentation-based
diagnostic features.Comment: Added more WSI images in appendi
- …