Deep Self-Taught Learning for Weakly Supervised Object Localization
Most existing weakly supervised localization (WSL) approaches learn detectors
by finding positive bounding boxes based on features learned with image-level
supervision. However, those features contain no spatial-location information
and usually provide poor-quality positive samples for training a
detector. To overcome this issue, we propose a deep self-taught learning
approach, in which the detector learns reliable object-level features for
acquiring tight positive samples and afterwards re-trains itself on them.
Consequently, the detector progressively improves its detection ability and
localizes more informative positive samples. To implement such self-taught
learning, we propose a seed sample acquisition method via image-to-object
transferring and dense subgraph discovery to find reliable positive samples for
initializing the detector. An online supportive sample harvesting scheme is
further proposed to dynamically select the most confident tight positive
samples and train the detector in a mutual boosting way. To prevent the
detector from being trapped in poor optima due to overfitting, we propose a new
relative improvement of predicted CNN scores for guiding the self-taught
learning process. Extensive experiments on PASCAL VOC 2007 and 2012 show that
our approach outperforms the state of the art, strongly validating its
effectiveness.
Comment: Accepted as a spotlight paper at CVPR 2017.
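The seed-acquisition step builds a graph over candidate proposals and looks for a dense subgraph of mutually consistent ones. As a rough illustration only (not the paper's exact algorithm; the function and variable names are mine), the classic greedy peeling heuristic for the densest-subgraph problem can be sketched as:

```python
import numpy as np

def densest_subgraph(adj):
    """Greedy peeling: repeatedly drop the lowest-degree node, keeping the
    node set whose density (edges per node, |E| / |V|) was highest."""
    nodes = list(range(len(adj)))
    best_density, best_nodes = -1.0, list(nodes)
    while nodes:
        sub = adj[np.ix_(nodes, nodes)]
        degrees = sub.sum(axis=1)
        density = degrees.sum() / (2 * len(nodes))  # each edge counted twice
        if density > best_density:
            best_density, best_nodes = density, list(nodes)
        nodes.pop(int(np.argmin(degrees)))  # peel the weakest node
    return best_nodes

# Toy similarity graph: proposals 0-2 are mutually similar, 3 is an outlier.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
print(densest_subgraph(adj))
```

On this toy graph the outlier proposal is peeled away and the mutually consistent clique {0, 1, 2} survives as the seed set.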
Self-taught Object Localization with Deep Networks
This paper introduces self-taught object localization, a novel approach that
leverages deep convolutional networks trained for whole-image recognition to
localize objects in images without additional human supervision, i.e., without
using any ground-truth bounding boxes for training. The key idea is to analyze
the change in the recognition scores when artificially masking out different
regions of the image. The masking out of a region that includes the object
typically causes a significant drop in recognition score. This idea is embedded
into an agglomerative clustering technique that generates self-taught
localization hypotheses. Our object localization scheme outperforms existing
proposal methods in both precision and recall for a small number of subwindow
proposals (e.g., on ILSVRC-2012 it produces a relative gain of 23.4% over the
state-of-the-art for top-1 hypothesis). Furthermore, our experiments show that
the annotations automatically generated by our method can be used to train
object detectors yielding recognition results remarkably close to those
obtained by training on manually annotated bounding boxes.
Comment: WACV 2016.
BAOD: Budget-Aware Object Detection
We study the problem of object detection from a novel perspective in which
annotation budget constraints are taken into consideration, appropriately
coined Budget Aware Object Detection (BAOD). When provided with a fixed budget,
we propose a strategy for building a diverse and informative dataset that can
be used to optimally train a robust detector. We investigate both optimization
and learning-based methods to sample which images to annotate and what type of
annotation (strongly or weakly supervised) to annotate them with. We adopt a
hybrid supervised learning framework to train the object detector from both
these types of annotation. We conduct a comprehensive empirical study showing
that a handcrafted optimization method outperforms other selection techniques
including random sampling, uncertainty sampling and active learning. By
combining an optimal image/annotation selection scheme with hybrid supervised
learning to solve the BAOD problem, we show that one can achieve the
performance of a strongly supervised detector on PASCAL-VOC 2007 while saving
12.8% of its original annotation budget. Furthermore, when the full budget is
used, it surpasses this performance by 2.0 mAP percentage points.
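The selection problem can be made concrete with a toy greedy sketch (illustrative only; the paper uses optimization- and learning-based selection, and all names and values below are mine): each image can receive a cheap weak label or a costly box label, and we buy the best value-per-cost option until the budget runs out.

```python
def select_annotations(images, budget):
    """images: list of (name, weak_value, weak_cost, strong_value, strong_cost).
    Greedily buy the annotation with the best value-per-cost ratio that still
    fits the remaining budget, at most one annotation per image."""
    options = []
    for name, wv, wc, sv, sc in images:
        options.append((wv / wc, name, "weak", wc))
        options.append((sv / sc, name, "strong", sc))
    plan, spent = {}, 0.0
    for ratio, name, kind, cost in sorted(options, reverse=True):
        if name in plan or spent + cost > budget:
            continue
        plan[name] = kind
        spent += cost
    return plan, spent

# Image "a" contains a highly informative instance: a box label pays off there.
images = [("a", 1.0, 1.0, 9.0, 7.0), ("b", 1.0, 1.0, 3.0, 7.0)]
plan, spent = select_annotations(images, budget=8.0)
print(plan, spent)
```

With this budget the greedy pass buys a strong label for "a" and a weak label for "b", mirroring the hybrid weak/strong supervision the paper trains from.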
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
We introduce a new loss function for the weakly-supervised training of
semantic image segmentation models based on three guiding principles: to seed
with weak localization cues, to expand objects based on the information about
which classes can occur in an image, and to constrain the segmentations to
coincide with object boundaries. We show experimentally that training a deep
convolutional neural network using the proposed loss function leads to
substantially better segmentations than previous state-of-the-art methods on
the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the
working mechanism of our method by a detailed experimental study that
illustrates how the segmentation quality is affected by each term of the
proposed loss function as well as their combinations.
Comment: ECCV 2016.
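Of the three terms, the seeding term is the simplest to sketch: penalize the network's per-pixel class probabilities only at locations where the weak localization cues fire, and ignore everything else. A minimal version (shapes and names are mine, not the paper's notation):

```python
import numpy as np

def seed_loss(probs, seeds):
    """Seeding term only: average negative log-probability of each seeded
    class at its cue locations.
    probs: (C, H, W) per-pixel class probabilities.
    seeds: {class_index: boolean (H, W) cue mask}."""
    total, count = 0.0, 0
    for c, mask in seeds.items():
        total += -np.log(probs[c][mask]).sum()
        count += int(mask.sum())
    return total / count

# Uniform two-class prediction, one seed pixel for class 0.
probs = np.full((2, 3, 3), 0.5)
seeds = {0: np.zeros((3, 3), dtype=bool)}
seeds[0][1, 1] = True
print(seed_loss(probs, seeds))
```

The expand and constrain terms then spread these seeds to whole objects and snap the result to object boundaries, which is where the bulk of the paper's machinery lives.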
Zero-Annotation Object Detection with Web Knowledge Transfer
Object detection is one of the major problems in computer vision, and has
been extensively studied. Most of the existing detection works rely on
labor-intensive supervision, such as ground truth bounding boxes of objects or
at least image-level annotations. On the contrary, we propose an object
detection method that does not require any form of human annotation on target
tasks, by exploiting freely available web images. In order to facilitate
effective knowledge transfer from web images, we introduce a multi-instance
multi-label domain adaptation learning framework with two key innovations. First
of all, we propose an instance-level adversarial domain adaptation network with
attention on foreground objects to transfer the object appearances from web
domain to target domain. Second, to preserve the class-specific semantic
structure of transferred object features, we propose a simultaneous transfer
mechanism to transfer the supervision across domains through pseudo strong
label generation. With our end-to-end framework that simultaneously learns a
weakly supervised detector and transfers knowledge across domains, we achieved
significant improvements over baseline methods on the benchmark datasets.
Comment: Accepted at ECCV 2018.
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization
We propose `Hide-and-Seek', a weakly-supervised framework that aims to
improve object localization in images and action localization in videos. Most
existing weakly-supervised methods localize only the most discriminative parts
of an object rather than all relevant parts, which leads to suboptimal
performance. Our key idea is to hide patches in a training image randomly,
forcing the network to seek other relevant parts when the most discriminative
part is hidden. Our approach only needs to modify the input image and can work
with any network designed for object localization. During testing, we do not
need to hide any patches. Our Hide-and-Seek approach obtains superior
performance compared to previous methods for weakly-supervised object
localization on the ILSVRC dataset. We also demonstrate that our framework can
be easily extended to weakly-supervised action localization.
Comment: Camera-ready version (ICCV 2017).
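The input modification is simple enough to sketch directly. A minimal version of the patch-hiding step (the paper fills hidden patches with the dataset's mean pixel; a constant fill is used here for brevity, and the function name is mine):

```python
import numpy as np

def hide_patches(image, patch=4, p_hide=0.5, fill=0.0, rng=None):
    """Hide each patch of a training image independently with probability
    p_hide, forcing the network to rely on other object parts.
    Test images are left untouched."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = image.copy()
    h, w = image.shape[:2]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            if rng.random() < p_hide:
                out[y:y + patch, x:x + patch] = fill
    return out

image = np.ones((8, 8))
hidden = hide_patches(image, patch=4, p_hide=0.5)
```

Because only the input is altered, the trick composes with any localization network, which is the point the abstract makes.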
Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images
Convolutional neural networks (CNNs) show impressive performance for image
classification and detection, extending heavily to the medical image domain.
Nevertheless, medical experts are sceptical of these predictions, as the
nonlinear multilayer structure that produces a classification outcome is not
directly interpretable. Recently, approaches have been proposed that help the
user understand the discriminative regions within an image that are decisive
for the CNN's assignment of a certain class. Although these approaches could
help to build trust in a CNN's predictions, they have rarely been shown to work
with medical image data, which often poses a challenge because the decision for
a class relies on different lesion areas scattered across the entire image. Using the
DiaretDB1 dataset, we show that on retina images different lesion areas
fundamental for diabetic retinopathy are detected on an image level with high
accuracy, comparable to or exceeding supervised methods. At the lesion level,
we achieve few false positives with high sensitivity, even though the network
is trained solely on image-level labels, which contain no information about the
existing lesions. When classifying between diseased and healthy images, we
achieve an AUC of 0.954 on DiaretDB1.
Comment: Accepted in Proc. IEEE International Conference on Image Processing
(ICIP), 2017.
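One common way such discriminative-region heatmaps are produced is class activation mapping: weight the last convolutional feature maps by the classifier weights for the class of interest. A minimal sketch (illustrative of the general technique, not necessarily this paper's exact method; names are mine):

```python
import numpy as np

def class_activation_map(features, weights, cls):
    """features: (K, H, W) maps from the last conv layer; weights: (C, K)
    classifier weights after global average pooling. The heatmap for a class
    is the class-weighted sum of the feature maps."""
    return np.tensordot(weights[cls], features, axes=1)  # -> (H, W)

features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0            # channel 0 fires at the top-left
features[1, 1, 1] = 1.0            # channel 1 fires at the bottom-right
weights = np.array([[1.0, 2.0]])   # class 0 weighs channel 1 more heavily
cam = class_activation_map(features, weights, cls=0)
```

For scattered lesions, such a map can highlight several disjoint regions at once, which matches the abstract's point about decisions relying on lesion areas spread across the whole image.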
Adversarial Learning for Semi-Supervised Semantic Segmentation
We propose a method for semi-supervised semantic segmentation using an
adversarial network. While most existing discriminators are trained to classify
input images as real or fake on the image level, we design a discriminator in a
fully convolutional manner to differentiate the predicted probability maps from
the ground truth segmentation distribution with the consideration of the
spatial resolution. We show that the proposed discriminator can be used to
improve semantic segmentation accuracy by coupling the adversarial loss with
the standard cross entropy loss of the proposed model. In addition, the fully
convolutional discriminator enables semi-supervised learning through
discovering the trustworthy regions in predicted results of unlabeled images,
thereby providing additional supervisory signals. In contrast to existing
methods that utilize weakly-labeled images, our method leverages unlabeled
images to enhance the segmentation model. Experimental results on the PASCAL
VOC 2012 and Cityscapes datasets demonstrate the effectiveness of the proposed
algorithm.
Comment: Accepted in BMVC 2018. Code and models available at
https://github.com/hfslyc/AdvSemiSe
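The semi-supervised signal can be sketched in a few lines (a toy illustration under my own shapes and names, not the paper's implementation): threshold the discriminator's per-pixel confidence map for an unlabeled image, then apply cross-entropy against the model's own pseudo labels only at the trusted pixels.

```python
import numpy as np

def masked_self_training_loss(probs, pseudo_labels, confidence, threshold=0.2):
    """Cross-entropy against pseudo labels, restricted to pixels the fully
    convolutional discriminator trusts.
    probs: (C, H, W) predicted maps; pseudo_labels: (H, W) int labels;
    confidence: (H, W) discriminator outputs in [0, 1]."""
    mask = confidence > threshold
    if not mask.any():
        return 0.0
    picked = np.take_along_axis(probs, pseudo_labels[None], axis=0)[0]
    return float(-np.log(picked[mask]).mean())

probs = np.full((2, 2, 2), 0.5)
pseudo = np.zeros((2, 2), dtype=int)
confidence = np.array([[0.9, 0.1], [0.9, 0.1]])
loss = masked_self_training_loss(probs, pseudo, confidence)
```

Untrusted pixels contribute nothing, so a poorly calibrated prediction in one region cannot pollute the training signal from the rest of the image.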
Collaborative Learning for Weakly Supervised Object Detection
Weakly supervised object detection has recently received much attention,
since it only requires image-level labels instead of the bounding-box labels
consumed in strongly supervised learning. Nevertheless, the savings in labeling
expense usually come at the cost of model accuracy. In this paper, we propose a
simple but effective weakly supervised collaborative learning framework to
resolve this problem, which trains a weakly supervised learner and a strongly
supervised learner jointly by enforcing partial feature sharing and prediction
consistency. For object detection, taking WSDDN-like architecture as weakly
supervised detector sub-network and Faster-RCNN-like architecture as strongly
supervised detector sub-network, we propose an end-to-end Weakly Supervised
Collaborative Detection Network. As there is no strong supervision available to
train the Faster-RCNN-like sub-network, a new prediction consistency loss is
defined to enforce consistency of predictions between the two sub-networks as
well as within the Faster-RCNN-like sub-network. At the same time, the two
detectors are designed to partially share features to further guarantee the
model consistency at the perceptual level. Extensive experiments on the PASCAL
VOC 2007 and 2012 datasets have demonstrated the effectiveness of the proposed
framework.
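With no box labels available for the strong branch, the consistency loss is what carries the training signal across. One simple instance of such a loss (the paper defines its own; this only illustrates the idea, with my own names):

```python
import numpy as np

def consistency_loss(weak_scores, strong_scores):
    """Mean squared disagreement between the two branches' per-proposal class
    scores: the strongly supervised branch is pushed to agree with the weakly
    supervised one (and vice versa) in the absence of box-level ground truth."""
    return float(((weak_scores - strong_scores) ** 2).mean())

weak = np.array([[0.9, 0.1], [0.2, 0.8]])    # weak branch, proposals x classes
strong = np.array([[0.7, 0.3], [0.2, 0.8]])  # strong branch
loss = consistency_loss(weak, strong)
```

The loss is zero exactly when the branches agree, so minimizing it jointly with the weak branch's MIL objective couples the two detectors.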
Weakly Supervised Object Detection with Segmentation Collaboration
Weakly supervised object detection aims at learning precise object detectors,
given image category labels. In recent prevailing works, this problem is
generally formulated as a multiple instance learning module guided by an image
classification loss. The object bounding box is assumed to be the one
contributing most to the classification among all proposals. However, the
region contributing most is also likely to be a crucial part or the supporting
context of an object. To obtain a more accurate detector, in this work we
propose a novel end-to-end weakly supervised detection approach, where a newly
introduced generative adversarial segmentation module interacts with the
conventional detection module in a collaborative loop. The collaboration
mechanism takes full advantage of the complementary interpretations of the
weakly supervised localization task, namely detection and segmentation tasks,
forming a more comprehensive solution. Consequently, our method obtains more
precise object bounding boxes, rather than parts or irrelevant surroundings.
As expected, the proposed method achieves an accuracy of 51.0% on the PASCAL
VOC 2007 dataset, outperforming the state of the art and demonstrating its
superiority for weakly supervised object detection.
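The multiple-instance-learning formulation these works share follows the WSDDN two-stream pattern. A toy sketch of that scoring (names and shapes are mine): one softmax over classes, one over proposals, their product scoring every (proposal, class) pair.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def wsddn_scores(features, w_cls, w_det):
    """features: (R, D) per-proposal features; w_cls, w_det: (D, C) weights.
    The classification stream softmaxes over classes, the detection stream
    over proposals; summing their product over proposals yields image-level
    class probabilities trainable from image labels alone."""
    cls = softmax(features @ w_cls, axis=1)  # per-proposal class posterior
    det = softmax(features @ w_det, axis=0)  # proposals compete per class
    region = cls * det                       # (R, C) detection scores
    return region, region.sum(axis=0)        # image-level scores in [0, 1]

features = np.eye(2)   # two proposals, two feature dimensions
w = np.zeros((2, 2))   # uninformative weights -> uniform streams
region, image_scores = wsddn_scores(features, w, w)
```

The proposal with the highest region score becomes the pseudo box, which is exactly the point of failure the segmentation-collaboration paper targets: that proposal is often a discriminative part rather than the whole object.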