2,296 research outputs found
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
The main obstacle to weakly supervised semantic image segmentation is the
difficulty of obtaining pixel-level information from coarse image-level
annotations. Most methods based on image-level annotations use localization
maps obtained from the classifier, but these only focus on the small
discriminative parts of objects and do not capture precise boundaries.
FickleNet explores diverse combinations of locations on feature maps created by
generic deep neural networks. It selects hidden units randomly and then uses
them to obtain activation scores for image classification. FickleNet implicitly
learns the coherence of each location in the feature maps, resulting in a
localization map which identifies both discriminative and other parts of
objects. The ensemble effects are obtained from a single network by selecting
random hidden unit pairs, which means that a variety of localization maps are
generated from a single image. Our approach does not require any additional
training steps and only adds a simple layer to a standard convolutional neural
network; nevertheless it outperforms recent comparable techniques on the Pascal
VOC 2012 benchmark in both weakly and semi-supervised settings.Comment: To appear in CVPR 201
On the Importance of Visual Context for Data Augmentation in Scene Understanding
Performing data augmentation for learning deep neural networks is known to be
important for training visual recognition systems. By artificially increasing
the number of training examples, it helps reducing overfitting and improves
generalization. While simple image transformations can already improve
predictive performance in most vision tasks, larger gains can be obtained by
leveraging task-specific prior knowledge. In this work, we consider object
detection, semantic and instance segmentation and augment the training images
by blending objects in existing scenes, using instance segmentation
annotations. We observe that randomly pasting objects on images hurts the
performance, unless the object is placed in the right context. To resolve this
issue, we propose an explicit context model by using a convolutional neural
network, which predicts whether an image region is suitable for placing a given
object or not. In our experiments, we show that our approach is able to improve
object detection, semantic and instance segmentation on the PASCAL VOC12 and
COCO datasets, with significant gains in a limited annotation scenario, i.e.
when only one category is annotated. We also show that the method is not
limited to datasets that come with expensive pixel-wise instance annotations
and can be used when only bounding boxes are available, by employing
weakly-supervised learning for instance masks approximation.Comment: Updated the experimental section. arXiv admin note: substantial text
overlap with arXiv:1807.0742
Weakly supervised segment annotation via expectation kernel density estimation
Since the labelling for the positive images/videos is ambiguous in weakly
supervised segment annotation, negative mining based methods that only use the
intra-class information emerge. In these methods, negative instances are
utilized to penalize unknown instances to rank their likelihood of being an
object, which can be considered as a voting in terms of similarity. However,
these methods 1) ignore the information contained in positive bags, 2) only
rank the likelihood but cannot generate an explicit decision function. In this
paper, we propose a voting scheme involving not only the definite negative
instances but also the ambiguous positive instances to make use of the extra
useful information in the weakly labelled positive bags. In the scheme, each
instance votes for its label with a magnitude arising from the similarity, and
the ambiguous positive instances are assigned soft labels that are iteratively
updated during the voting. It overcomes the limitations of voting using only
the negative bags. We also propose an expectation kernel density estimation
(eKDE) algorithm to gain further insight into the voting mechanism.
Experimental results demonstrate the superiority of our scheme beyond the
baselines.Comment: 9 pages, 2 figure
- …