Budget-aware Semi-Supervised Semantic and Instance Segmentation
Methods that move towards less supervised scenarios are key for image
segmentation, as dense labels demand significant human intervention. Generally,
the annotation burden is mitigated by labeling datasets with weaker forms of
supervision, e.g. image-level labels or bounding boxes. Another option is the
semi-supervised setting, which commonly leverages a few strong annotations and
a large amount of unlabeled/weakly-labeled data. In this paper, we revisit
semi-supervised segmentation schemes and significantly narrow down the
annotation budget (in terms of total labeling time of the training set)
compared to previous approaches. With a very simple pipeline, we demonstrate
that, at low annotation budgets, semi-supervised methods outperform
weakly-supervised ones by a wide margin for both semantic and instance segmentation. Our
approach also outperforms previous semi-supervised works at a much reduced
labeling cost. We present results for the Pascal VOC benchmark and unify weakly
and semi-supervised approaches by considering the total annotation budget, thus
allowing a fairer comparison between methods.
Comment: To appear in CVPR-W 2019 (DeepVision workshop).
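The unified budget accounting described above can be illustrated with a minimal sketch. The per-annotation times and the `annotation_budget` helper below are hypothetical, not the paper's measurements; they only show how total labeling time puts weak and semi-supervised splits on a common axis:

```python
# Hypothetical per-image annotation times in seconds (illustrative only,
# not the timings used in the paper).
COST = {
    "full_mask": 239.0,    # dense per-pixel mask
    "image_level": 20.0,   # image-level class labels
    "none": 0.0,           # unlabeled image
}

def annotation_budget(counts):
    """Total labeling time (seconds) for a dataset described as
    {annotation_type: number_of_images}."""
    return sum(COST[kind] * n for kind, n in counts.items())

# A semi-supervised split: a few strong labels plus many unlabeled images.
semi = annotation_budget({"full_mask": 100, "none": 10_000})
# A weakly-supervised split: image-level labels for every image.
weak = annotation_budget({"image_level": 10_000})
```

With the same total budget axis, two very different labeling strategies become directly comparable.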
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
Large-scale data is of crucial importance for learning semantic segmentation
models, but annotating per-pixel masks is a tedious and inefficient procedure.
We note that in interactive image segmentation, scribbles are widely used in
academic research and commercial software, and are recognized as one of the
most user-friendly ways of interacting. In this paper,
we propose to use scribbles to annotate images, and develop an algorithm to
train convolutional networks for semantic segmentation supervised by scribbles.
Our algorithm is based on a graphical model that jointly propagates information
from scribbles to unmarked pixels and learns network parameters. We present
competitive object semantic segmentation results on the PASCAL VOC dataset by
using scribbles as annotations. Scribbles are also favored for annotating stuff
(e.g., water, sky, grass) that has no well-defined shape, and our method shows
excellent results on the PASCAL-CONTEXT dataset thanks to extra inexpensive
scribble annotations. Our scribble annotations on PASCAL VOC are available at
http://research.microsoft.com/en-us/um/people/jifdai/downloads/scribble_sup
Comment: accepted by CVPR 2016
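The core idea above, propagating sparse scribble labels to unmarked pixels, can be sketched in a few lines. The paper's graphical model also uses appearance terms and is optimized jointly with network training; the function below (a hypothetical simplification) only illustrates the spatial-propagation half via iterative neighbor voting:

```python
import numpy as np

def propagate_scribbles(labels, iters=50):
    """Propagate sparse scribble labels (-1 = unmarked) to every pixel by
    iteratively assigning each unmarked pixel the majority label of its
    already-decided 4-neighbors. A toy stand-in for the paper's graphical
    model, which also includes appearance terms and joint CNN training."""
    lab = labels.copy()
    h, w = lab.shape
    for _ in range(iters):
        changed = False
        for y in range(h):
            for x in range(w):
                if lab[y, x] != -1:
                    continue
                votes = []
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and lab[ny, nx] != -1:
                        votes.append(lab[ny, nx])
                if votes:
                    vals, counts = np.unique(votes, return_counts=True)
                    lab[y, x] = vals[np.argmax(counts)]
                    changed = True
        if not changed:
            break
    return lab
```

In the full method, these propagated labels and the network parameters are updated alternately, so the segmentation network and the label map refine each other.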
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
We propose an approach to discover class-specific pixels for the
weakly-supervised semantic segmentation task. We show that properly combining
saliency and attention maps allows us to obtain reliable cues capable of
significantly boosting the performance. First, we propose a simple yet powerful
hierarchical approach to discover the class-agnostic salient regions, obtained
using a salient object detector, which otherwise would be ignored. Second, we
use fully convolutional attention maps to reliably localize the class-specific
regions in a given image. We combine these two cues to discover class-specific
pixels which are then used as an approximate ground truth for training a CNN.
While solving the weakly-supervised semantic segmentation task, we ensure that
the image-level classification task is also solved, in order to force the CNN
to assign at least one pixel to each object present in the image.
Experimentally, on the PASCAL VOC12 val and test sets, we obtain mIoU of
60.8% and 61.9%, gains of 5.1% and 5.2% over the published
state-of-the-art results. The code is made publicly available.
Learning to Segment Human by Watching YouTube
An intuition for human segmentation is that when a human moves in a video,
the video context (e.g., appearance and motion cues) can suggest a reasonable
mask for the whole human body. Inspired by this, and building on popular deep
convolutional neural networks (CNNs), we explore a very-weakly supervised
learning framework for the human segmentation task, where only an imperfect
human detector is available along with massive weakly-labeled YouTube videos.
In our solution, video-context-guided human mask inference and CNN-based
segmentation network learning iterate to mutually enhance each other until no
further improvement is gained. In the first step, each video is decomposed
into supervoxels by unsupervised video segmentation. The superpixels within
the supervoxels are then classified as human or non-human by graph optimization
with unary energies from the imperfect human detection results and from the
confidence maps predicted by the CNN trained in the previous iteration. In the
second step, the video-context-derived human masks are used as direct labels to
train the CNN. Extensive experiments on the challenging PASCAL VOC 2012 semantic
segmentation benchmark demonstrate that the proposed framework already achieves
superior results to all previous weakly-supervised methods with object-class or
bounding-box annotations. In addition, by augmenting with the
annotated masks from PASCAL VOC 2012, our method reaches a new state-of-the-art
performance on the human segmentation task.
Comment: Very-weakly supervised learning framework. New state-of-the-art
performance on the human segmentation task! (Published in T-PAMI 2017)
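The per-superpixel decision described above can be sketched as a fusion of the two unary cues. The full method solves a graph optimization that also includes pairwise smoothness between superpixels; the function below (with assumed weights and threshold) illustrates only the unary part:

```python
import numpy as np

def label_superpixels(det_scores, cnn_conf, w_det=0.5, w_cnn=0.5, tau=0.5):
    """Classify superpixels as human/non-human by fusing two unary cues:
    the (imperfect) detector score and the confidence predicted by the CNN
    from the previous iteration. Weights and threshold are illustrative
    assumptions; the paper's graph optimization also has pairwise terms.

    det_scores, cnn_conf: arrays of per-superpixel scores in [0, 1]
    returns: boolean array, True = human
    """
    fused = w_det * det_scores + w_cnn * cnn_conf
    return fused > tau

# Toy example: three superpixels with detector and CNN confidences.
is_human = label_superpixels(np.array([0.9, 0.2, 0.6]),
                             np.array([0.8, 0.1, 0.3]))
```

In the iterative framework, these labels would then supervise the next round of CNN training, whose confidence maps feed back into the following labeling pass.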
- …
