Two-Phase Learning for Weakly Supervised Object Localization
Weakly supervised semantic segmentation and localization suffer from focusing only on the most discriminative parts of an image, since they use only image-level annotations. In this paper, we solve this problem fundamentally via two-phase learning. Our networks are trained in two steps. In the first step, a conventional fully convolutional network (FCN) is trained to find the most discriminative parts of an image. In the second step, the activations on the most salient parts are suppressed by inference conditional feedback, and a second round of learning is performed to find the next most important parts. By combining the activations of both phases, the entire extent of the target object can be captured. Our proposed training scheme is novel and can be utilized in well-designed techniques for weakly supervised semantic segmentation, salient region detection, and object location prediction. Detailed experiments demonstrate the effectiveness of our two-phase learning in each task.
Comment: Accepted at ICCV 2017.
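The suppress-and-combine mechanism can be sketched roughly as below. This is a minimal illustration assuming a fixed suppression threshold and element-wise max fusion; the paper's inference conditional feedback is more involved, and the function names are hypothetical.

```python
import torch

def suppress_salient(features, cam_phase1, thresh=0.7):
    """Zero out feature activations wherever the phase-1 CAM is highly
    confident, so the phase-2 network must find the next-most-important
    parts. The 0.7 threshold is an illustrative choice."""
    mask = (cam_phase1 < thresh * cam_phase1.max()).float()
    return features * mask

def combine_phases(cam_phase1, cam_phase2):
    """Merge both phases' maps; element-wise max is one simple fusion."""
    return torch.maximum(cam_phase1, cam_phase2)

# toy usage with random 14x14 activation maps
cam1, cam2 = torch.rand(14, 14), torch.rand(14, 14)
full_object_map = combine_phases(cam1, cam2)
```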
Combining Weakly and Webly Supervised Learning for Classifying Food Images
Food classification from images is a fine-grained classification problem. Manual curation of food images is prohibitive in cost, time, and scalability. On the other hand, web data is freely available but contains noise. In this paper, we address the problem of classifying food images with minimal data curation. We also tackle a key problem with food images from the web: they often contain multiple co-occurring food types but are weakly labeled with a single label. We first demonstrate that by sequentially adding a few manually curated samples to a larger uncurated dataset from two web sources, the top-1 classification accuracy increases from 50.3% to 72.8%. To tackle the issue of weak labels, we augment the deep model with Weakly Supervised Learning (WSL), which raises performance to 76.2%. Finally, we show some qualitative results to provide insight into the performance improvements obtained with the proposed ideas.
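The data-pooling step the abstract describes, mixing a small curated set into a large noisy web set, can be sketched as follows. The dataset sizes, image shape, and 101-way label space are placeholders, not the paper's data.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# stand-ins for the two sources: a small curated set and a large noisy web set
curated = TensorDataset(torch.randn(50, 3, 32, 32), torch.randint(0, 101, (50,)))
web = TensorDataset(torch.randn(500, 3, 32, 32), torch.randint(0, 101, (500,)))

# pool the curated samples into the uncurated web data and train on the mix
loader = DataLoader(ConcatDataset([curated, web]), batch_size=32, shuffle=True)
for images, labels in loader:
    pass  # feed batches to the classifier here
```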
Self-Erasing Network for Integral Object Attention
Recently, adversarial erasing for weakly-supervised object attention has been
deeply studied due to its capability in localizing integral object regions.
However, such a strategy raises a key problem: attention regions gradually expand into non-object regions as training iterations continue, which significantly degrades the quality of the produced attention maps. To tackle
such an issue as well as promote the quality of object attention, we introduce
a simple yet effective Self-Erasing Network (SeeNet) that prohibits attention from spreading to unexpected background regions. In particular, SeeNet leverages two self-erasing strategies that encourage networks to use reliable object and background cues for attention learning. In this way, integral object regions can be effectively highlighted without including much of the background. To test the quality of the generated attention maps, we
employ the mined object regions as heuristic cues for learning semantic
segmentation models. Experiments on Pascal VOC clearly demonstrate the superiority of our SeeNet over other state-of-the-art methods.
Comment: Accepted at NIPS 2018.
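One reading of a self-erasing step is sketched below: split the normalized attention map into a confident object zone and a confident background zone, then erase the object zone so a second branch must rely on the remaining cues. The two thresholds and the function name are illustrative assumptions.

```python
import torch

def self_erase(features, attention, hi=0.7, lo=0.3):
    """Split the attention map into object / background / undefined zones
    and erase the confident object evidence from the features."""
    a = attention / (attention.max() + 1e-8)
    object_zone = (a >= hi).float()
    background_zone = (a <= lo).float()
    erased = features * (1.0 - object_zone)  # hide confident object evidence
    return erased, object_zone, background_zone

# toy usage
feats, attn = torch.rand(256, 14, 14), torch.rand(14, 14)
erased, obj, bg = self_erase(feats, attn)
```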
Fully Using Classifiers for Weakly Supervised Semantic Segmentation with Modified Cues
This paper proposes a novel weakly-supervised semantic segmentation method using image-level labels only. The class-specific activation maps from the
well-trained classifiers are used as cues to train a segmentation network. The
well-known defects of these cues are coarseness and incompleteness. We use
super-pixel to refine them, and fuse the cues extracted from both a color image
trained classifier and a gray image trained classifier to compensate for their
incompleteness. The conditional random field is adapted to regulate the
training process and to refine the outputs further. Besides initializing the
segmentation network, the previously trained classifier is also used in the
testing phase to suppress non-existent classes. Experimental results on the PASCAL VOC 2012 dataset illustrate the effectiveness of our method.
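The superpixel refinement and color/gray cue fusion can be sketched as below; averaging the CAM inside SLIC superpixels and element-wise max fusion are simple illustrative realizations of the steps the abstract names, not necessarily the paper's exact choices.

```python
import numpy as np
from skimage.segmentation import slic

def refine_with_superpixels(cam, image, n_segments=200):
    """Average the CAM inside each SLIC superpixel so activations snap
    to low-level image boundaries."""
    segments = slic(image, n_segments=n_segments, compactness=10)
    refined = np.zeros_like(cam)
    for s in np.unique(segments):
        refined[segments == s] = cam[segments == s].mean()
    return refined

def fuse_cues(cam_color, cam_gray):
    """Element-wise max fuses the color- and gray-classifier cues to
    compensate for each map's incompleteness."""
    return np.maximum(cam_color, cam_gray)

# toy usage on a random image and CAMs of matching spatial size
image = np.random.rand(64, 64, 3)
fused = fuse_cues(np.random.rand(64, 64), np.random.rand(64, 64))
refined = refine_with_superpixels(fused, image)
```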
Tell Me Where to Look: Guided Attention Inference Network
Weakly supervised learning with only coarse labels can obtain visual explanations of a deep neural network, such as attention maps, by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. In one common framework, we address three shortcomings of previous approaches to modeling such attention maps: we (1) make attention maps, for the first time, an explicit and natural component of end-to-end training, (2) provide self-guidance directly on these maps by exploring supervision from the network itself to improve them, and (3) seamlessly bridge the gap between using weak and extra supervision if available. Despite its simplicity, experiments on the semantic segmentation task demonstrate the effectiveness of our methods. We clearly surpass the state-of-the-art on the Pascal VOC 2012 val and test sets. Moreover, the proposed framework provides a way not only to explain the focus of the learner but also to feed back direct guidance towards specific tasks. Under mild assumptions, our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance.
Comment: Accepted at CVPR 2018.
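A loss in the spirit of the self-guidance idea can be sketched as follows: mask out the regions the network currently attends to, then penalize any class evidence that survives, which pushes attention to cover the whole object. The soft sigmoid masking, its constants, and the function name are implementation assumptions.

```python
import torch

def attention_mining_loss(model, images, labels, attention):
    """Erase attended regions and minimize the remaining class score."""
    a = attention / (attention.amax(dim=(2, 3), keepdim=True) + 1e-8)
    soft_mask = torch.sigmoid(10.0 * (a - 0.5))  # ~1 on attended pixels
    erased = images * (1.0 - soft_mask)          # hide attended evidence
    scores = torch.sigmoid(model(erased))
    return scores.gather(1, labels.unsqueeze(1)).mean()  # drive toward 0

# toy usage with a tiny stand-in classifier
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
attn = torch.rand(4, 1, 32, 32)
loss = attention_mining_loss(model, x, y, attn)
```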
Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation receives much research attention
since it alleviates the need to obtain a large amount of dense pixel-wise
ground-truth annotations for the training images. Compared with other forms of
weak supervision, image labels are quite efficient to obtain. In our work, we
focus on the weakly supervised semantic segmentation with image label
annotations. Recent progress for this task has been largely dependent on the
quality of generated pseudo-annotations. In this work, inspired by spatial
neural-attention for image captioning, we propose a decoupled spatial neural
attention network for generating pseudo-annotations. Our decoupled attention structure simultaneously identifies object regions and localizes discriminative parts, generating high-quality pseudo-annotations in one forward pass. The resulting pseudo-annotations lead to segmentation results that achieve the state-of-the-art in weakly-supervised semantic segmentation.
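A toy version of a decoupled head is sketched below: two parallel 1x1-conv branches over shared features, one meant to capture whole object regions and one the discriminative parts. The max-fusion into a pseudo-annotation map and the class/module names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DecoupledAttentionHead(nn.Module):
    """Two decoupled attention branches over shared features."""
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.region_branch = nn.Conv2d(in_ch, n_classes, kernel_size=1)
        self.part_branch = nn.Conv2d(in_ch, n_classes, kernel_size=1)

    def forward(self, feats):
        region = torch.sigmoid(self.region_branch(feats))  # object extent
        parts = torch.sigmoid(self.part_branch(feats))     # salient parts
        return torch.maximum(region, parts)  # fused pseudo-annotation map

# toy usage: one forward pass yields per-class pseudo-annotation maps
head = DecoupledAttentionHead(in_ch=256, n_classes=21)
maps = head(torch.randn(1, 256, 14, 14))  # (1, 21, 14, 14)
```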
C-WSL: Count-guided Weakly Supervised Localization
We introduce count-guided weakly supervised localization (C-WSL), an approach
that uses per-class object count as a new form of supervision to improve weakly
supervised localization (WSL). C-WSL uses a simple count-based region selection
algorithm to select high-quality regions, each of which covers a single object
instance during training, and improves existing WSL methods by training with
the selected regions. To demonstrate the effectiveness of C-WSL, we integrate
it into two WSL architectures and conduct extensive experiments on VOC2007 and
VOC2012. Experimental results show that C-WSL leads to large improvements in
WSL and that the proposed approach significantly outperforms the
state-of-the-art methods. The results of annotation experiments on VOC2007 suggest that only a modest amount of extra time is needed to obtain per-class object counts compared to labeling only the object categories in an image. Furthermore, we reduce the annotation time by more than and compared to center-click and bounding-box annotations.
Comment: ECCV 2018.
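A simple reading of count-based region selection can be sketched as a greedy procedure: keep the highest-scoring regions that barely overlap the ones already kept, stopping once the annotated per-class count is reached. The IoU threshold and function names are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_guided_select(boxes, scores, count, iou_thresh=0.3):
    """Greedily select up to `count` high-scoring, weakly-overlapping regions."""
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(int(i))
        if len(keep) == count:
            break
    return keep

# toy usage: three proposals, annotated count of two objects
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
print(count_guided_select(boxes, np.array([0.9, 0.8, 0.7]), count=2))  # [0, 2]
```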
Weakly Supervised Localization Using Background Images
Weakly Supervised Object Localization (WSOL) methodsusually rely on fully
convolutional networks in order to ob-tain class activation maps(CAMs) of
targeted labels. How-ever, these networks always highlight the most
discriminativeparts to perform the task, the located areas are much smallerthan
entire targeted objects. In this work, we propose a novelend-to-end model to
enlarge CAMs generated from classifi-cation models, which can localize targeted
objects more pre-cisely. In detail, we add an additional module in
traditionalclassification networks to extract foreground object propos-als from
images without classifying them into specific cate-gories. Then we set these
normalized regions as unrestrictedpixel-level mask supervision for the
following classificationtask. We collect a set of images defined as Background
ImageSet from the Internet. The number of them is much smallerthan the targeted
dataset but surprisingly well supports themethod to extract foreground regions
from different pictures.The region extracted is independent from classification
task,where the extracted region in each image covers almost en-tire object
rather than just a significant part. Therefore, theseregions can serve as masks
to supervise the response mapgenerated from classification models to become
larger andmore precise. The method achieves state-of-the-art results
onCUB-200-2011 in terms of Top-1 and Top-5 localization er-ror while has a
competitive result on ILSVRC2016 comparedwith other approaches.Comment: Course project of CSC577, University of Rocheste
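The mask-supervision step can be sketched as a simple loss pulling the classifier's response map toward the class-agnostic foreground mask; the BCE formulation, normalization, and function name below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mask_supervision_loss(response_map, foreground_mask):
    """Pull the response map toward the mined foreground mask so the
    map grows from the most salient part to the whole object."""
    r = response_map / (response_map.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return F.binary_cross_entropy(r.clamp(1e-6, 1 - 1e-6), foreground_mask)

# toy usage: batch of 2 response maps and binary foreground masks
resp = torch.rand(2, 14, 14)
mask = (torch.rand(2, 14, 14) > 0.5).float()
loss = mask_supervision_loss(resp, mask)
```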
Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation
Training a Convolutional Neural Network (CNN) for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations, a hard and expensive task. In contrast, simple image tags are much easier to gather. In this paper, we introduce a novel weakly-supervised semantic segmentation model that learns from image labels, and just image
labels. Our model uses the prior knowledge of a network trained for image
recognition, employing these image annotations as an attention mechanism to
identify semantic regions in the images. We then present a methodology that
builds accurate class-specific segmentation masks from these regions, where
neither external objectness nor saliency algorithms are required. We describe
how to incorporate this mask generation strategy into a fully end-to-end
trainable process where the network jointly learns to classify and segment
images. Our experiments on PASCAL VOC 2012 dataset show that exploiting these
generated class-specific masks in conjunction with our novel end-to-end
learning process outperforms several recent weakly-supervised semantic
segmentation methods that use image tags only, and even some models that leverage additional supervision or training data.
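Exploiting a recognition network's prior knowledge in this way is in the family of class activation mapping; a minimal sketch with a torchvision classifier as a stand-in for the recognition network looks like this (random input and the ResNet-50 backbone are placeholders).

```python
import torch
from torchvision import models

# a classifier pretrained for image recognition supplies the prior knowledge
net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

feats = {}
net.layer4.register_forward_hook(lambda m, i, o: feats.update(last=o))

img = torch.randn(1, 3, 224, 224)  # stand-in for a real image
with torch.no_grad():
    cls = net(img).argmax(1).item()

# classic CAM: weight the last conv features by the chosen class's fc row
w = net.fc.weight[cls]                                            # (2048,)
cam = torch.relu(torch.einsum('c,chw->hw', w, feats['last'][0]))  # semantic region
```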
Structure Label Prediction Using Similarity-Based Retrieval and Weakly-Supervised Label Mapping
Recently, there has been significant interest in various supervised machine
learning techniques that can help reduce the time and effort consumed by manual
interpretation workflows. However, most successful supervised machine learning
algorithms require huge amounts of annotated training data. Obtaining these
labels for large seismic volumes is a very time-consuming and laborious task.
We address this problem by presenting a weakly-supervised approach for
predicting the labels of various seismic structures. By having an interpreter
select a very small number of exemplar images for every class of subsurface
structures, we use a novel similarity-based retrieval technique to extract
thousands of images that contain similar subsurface structures from the seismic
volume. By assuming that similar images belong to the same class, we obtain
thousands of image-level labels for these images; we validate this assumption
in our results section. We then introduce a novel weakly-supervised algorithm
for mapping these rough image-level labels into more accurate pixel-level
labels that localize the different subsurface structures within the image. This
approach dramatically simplifies the process of obtaining labeled data for
training supervised machine learning algorithms on seismic interpretation
tasks. Using our method, we generate thousands of automatically labeled images
from the Netherlands Offshore F3 block with reasonably accurate pixel-level
labels. We believe this work will enable further advances in machine learning for seismic interpretation.
Comment: Published in the SEG journal Geophysics in Dec 201
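The retrieval step can be sketched as nearest-neighbor search in a feature space: for each exemplar, pull the k most similar candidate images and transfer the exemplar's class as an image-level label, following the paper's assumption that similar images share a class. Cosine similarity, the feature dimension, and the function name are illustrative assumptions.

```python
import numpy as np

def retrieve_similar(exemplar_feats, candidate_feats, k=100):
    """For each exemplar, return the indices of the k most similar
    candidate images under cosine similarity."""
    e = exemplar_feats / np.linalg.norm(exemplar_feats, axis=1, keepdims=True)
    c = candidate_feats / np.linalg.norm(candidate_feats, axis=1, keepdims=True)
    sims = e @ c.T                            # (n_exemplars, n_candidates)
    return np.argsort(-sims, axis=1)[:, :k]   # retrieved image indices

# toy usage: 3 exemplar classes, 1000 unlabeled seismic image features
retrieved = retrieve_similar(np.random.randn(3, 128), np.random.randn(1000, 128), k=5)
```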