Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection
Top-down saliency models produce a probability map that peaks at target
locations specified by a task/goal such as object detection. They are usually
trained in a fully supervised setting involving pixel-level annotations of
objects. We propose a weakly supervised top-down saliency framework using only
binary labels that indicate the presence/absence of an object in an image.
First, the probabilistic contribution of each image region to the confidence of
a CNN-based image classifier is computed through a backtracking strategy to
produce top-down saliency. From a set of saliency maps of an image produced by
fast bottom-up saliency approaches, we select the one best suited to the
top-down task. The selected bottom-up saliency map is combined with the
top-down saliency map. Features having high combined saliency are used to train
a linear SVM classifier to estimate feature saliency. This is integrated with
combined saliency and further refined through multi-scale superpixel averaging
of the saliency map. We evaluate the proposed weakly supervised top-down
saliency framework and show that it achieves performance comparable to fully
supervised approaches. Experiments are carried out on seven challenging
datasets, and quantitative results are compared with 40 closely related
approaches across 4 different applications.
Comment: 14 pages, 7 figures
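A minimal sketch of the combination and refinement steps, in plain Python. The abstract does not state the fusion rule, so the pixelwise product in `combine_saliency` is an assumption, and the superpixel label maps (one per scale, e.g. from a superpixel algorithm run at several segment counts) are hypothetical precomputed inputs. The backtracking classifier and the SVM stage are not shown.

```python
from collections import defaultdict
from typing import List

Map = List[List[float]]    # saliency values in [0, 1], indexed [row][col]
Labels = List[List[int]]   # superpixel id per pixel, same shape as the map


def combine_saliency(top_down: Map, bottom_up: Map) -> Map:
    """Fuse two saliency maps by pixelwise product (an assumed fusion rule)."""
    return [[t * b for t, b in zip(row_t, row_b)]
            for row_t, row_b in zip(top_down, bottom_up)]


def superpixel_average(sal: Map, labels: Labels) -> Map:
    """Replace every pixel by the mean saliency of the superpixel it belongs to."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row_s, row_l in zip(sal, labels):
        for s, l in zip(row_s, row_l):
            sums[l] += s
            counts[l] += 1
    return [[sums[l] / counts[l] for l in row_l] for row_l in labels]


def multiscale_refine(sal: Map, label_maps: List[Labels]) -> Map:
    """Average the superpixel-averaged maps over all scales (expects >= 1 scale)."""
    averaged = [superpixel_average(sal, labels) for labels in label_maps]
    n = len(averaged)
    return [[sum(m[i][j] for m in averaged) / n for j in range(len(sal[0]))]
            for i in range(len(sal))]
```

Averaging within superpixels smooths the map along region boundaries, and averaging over several segmentation scales reduces dependence on any single choice of superpixel size.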
Exploiting saliency for object segmentation from image level labels
There have been remarkable improvements in the semantic labelling task in
recent years. However, state-of-the-art methods rely on large-scale
pixel-level annotations. This paper studies the problem of training a
pixel-wise semantic labeller network from image-level annotations of the
present object classes. Recently, it has been shown that high quality seeds
indicating discriminative object regions can be obtained from image-level
labels. Without additional information, obtaining the full extent of the object
is an inherently ill-posed problem due to co-occurrences. We propose using a
saliency model as additional information and thereby exploit prior knowledge of
the object extent and image statistics. We show how to combine both sources of
information to recover 80% of the fully supervised performance, which is the
new state of the art in weakly supervised training for pixel-wise semantic
labelling. The code is available at https://goo.gl/KygSeb.
Comment: CVPR 201
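One plausible way to merge classifier seeds with a saliency map into training labels is sketched below. This is an assumed rule for illustration, not necessarily the paper's method: `pseudo_label`, the 0.5 saliency threshold, and the ignore index 255 (a common convention in PASCAL VOC-style label maps) are hypothetical choices.

```python
from typing import List, Optional

BACKGROUND = 0
IGNORE = 255  # assumed "ignore" index for ambiguous pixels


def pseudo_label(seeds: List[List[Optional[int]]],
                 saliency: List[List[float]],
                 threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical seed + saliency fusion.

    seeds[i][j] holds a class id where a discriminative seed fires, else None.
    Seeded pixels keep their class; unseeded, non-salient pixels become
    background; unseeded, salient pixels take the class only when exactly one
    class is seeded in the image, otherwise they are marked IGNORE.
    """
    classes = {c for row in seeds for c in row if c is not None}
    out = []
    for seed_row, sal_row in zip(seeds, saliency):
        row = []
        for seed, s in zip(seed_row, sal_row):
            if seed is not None:
                row.append(seed)                  # trust the seed as-is
            elif s < threshold:
                row.append(BACKGROUND)            # not salient -> background
            elif len(classes) == 1:
                row.append(next(iter(classes)))   # extend the single class
            else:
                row.append(IGNORE)                # several (or no) candidate classes
        out.append(row)
    return out
```

The single-class case shows how saliency can recover object extent beyond the seeds; with co-occurring classes, this sketch simply leaves the ambiguous salient pixels unlabelled.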
Harvesting Information from Captions for Weakly Supervised Semantic Segmentation
Since acquiring pixel-wise annotations for training convolutional neural
networks for semantic image segmentation is time-consuming, weakly supervised
approaches that only require class tags have been proposed. In this work, we
propose another form of supervision, namely image captions as they can be found
on the Internet. These captions have two advantages: they do not require
additional curation, as is the case for the clean class tags used by current
weakly supervised approaches, and they provide textual context for the classes
present in an image. To leverage such textual context, we deploy a multi-modal
network that learns a joint embedding of the visual representation of the image
and the textual representation of the caption. The network estimates text
activation maps (TAMs) for class names as well as compound concepts, i.e.
combinations of nouns and their attributes. The TAMs of compound concepts
describing classes of interest substantially improve the quality of the
estimated class activation maps which are then used to train a network for
semantic segmentation. We evaluate our method on the COCO dataset, where it
achieves state-of-the-art results for weakly supervised image segmentation
…
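A sketch of how a text activation map can be read off a joint embedding space: each spatial visual feature is scored against a text embedding by cosine similarity. It assumes the visual and text vectors already live in the same learned space; the element-wise mean used by `compound_embedding` to form a noun-attribute concept and the clip-and-rescale normalization are assumptions, not the paper's stated design.

```python
import math
from typing import List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def compound_embedding(noun: Vec, attribute: Vec) -> Vec:
    """Assumed composition of a compound concept: mean of noun and attribute."""
    return [(n + a) / 2 for n, a in zip(noun, attribute)]


def text_activation_map(features: List[List[Vec]], text: Vec) -> List[List[float]]:
    """Score each location against the text vector, clip at 0, rescale to max 1."""
    raw = [[max(0.0, cosine(f, text)) for f in row] for row in features]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return raw  # no positive response anywhere
    return [[v / peak for v in row] for row in raw]
```

Scoring a compound concept such as an attribute plus a noun, rather than the bare class name, lets the map favour locations that match both parts of the phrase.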