20 research outputs found
Saliency Guided End-to-End Learning for Weakly Supervised Object Detection
Weakly supervised object detection (WSOD), which is the problem of learning
detectors using only image-level labels, has been attracting more and more
interest. However, this problem is quite challenging due to the lack of
location supervision. To address this issue, this paper integrates saliency
into a deep architecture, in which the location in- formation is explored both
explicitly and implicitly. Specifically, we select highly confident object pro-
posals under the guidance of class-specific saliency maps. The location
information, together with semantic and saliency information, of the selected
proposals are then used to explicitly supervise the network by imposing two
additional losses. Meanwhile, a saliency prediction sub-network is built in the
architecture. The prediction results are used to implicitly guide the
localization procedure. The entire network is trained end-to-end. Experiments
on PASCAL VOC demonstrate that our approach outperforms all state-of-the-arts.Comment: Accepted to appear in IJCAI 201
LSTD: A Low-Shot Transfer Detector for Object Detection
Recent advances in object detection are mainly driven by deep learning with
large-scale detection benchmarks. However, the fully-annotated training set is
often limited for a target detection task, which may deteriorate the
performance of deep detectors. To address this challenge, we propose a novel
low-shot transfer detector (LSTD) in this paper, where we leverage rich
source-domain knowledge to construct an effective target-domain detector with
very few training examples. The main contributions are described as follows.
First, we design a flexible deep architecture of LSTD to alleviate transfer
difficulties in low-shot detection. This architecture can integrate the
advantages of both SSD and Faster RCNN in a unified deep framework. Second, we
introduce a novel regularized transfer learning framework for low-shot
detection, where the transfer knowledge (TK) and background depression (BD)
regularizations are proposed to leverage object knowledge respectively from
source and target domains, in order to further enhance fine-tuning with a few
target images. Finally, we examine our LSTD on a number of challenging low-shot
detection experiments, where LSTD outperforms other state-of-the-art
approaches. The results demonstrate that LSTD is a preferable deep detector for
low-shot scenarios.Comment: Accepted by AAAI201
Multiple Instance Curriculum Learning for Weakly Supervised Object Detection
When supervising an object detector with weakly labeled data, most existing
approaches are prone to trapping in the discriminative object parts, e.g.,
finding the face of a cat instead of the full body, due to lacking the
supervision on the extent of full objects. To address this challenge, we
incorporate object segmentation into the detector training, which guides the
model to correctly localize the full objects. We propose the multiple instance
curriculum learning (MICL) method, which injects curriculum learning (CL) into
the multiple instance learning (MIL) framework. The MICL method starts by
automatically picking the easy training examples, where the extent of the
segmentation masks agree with detection bounding boxes. The training set is
gradually expanded to include harder examples to train strong detectors that
handle complex images. The proposed MICL method with segmentation in the loop
outperforms the state-of-the-art weakly supervised object detectors by a
substantial margin on the PASCAL VOC datasets.Comment: Published in BMVC 201
Exploiting saliency for object segmentation from image level labels
There have been remarkable improvements in the semantic labelling task in the
recent years. However, the state of the art methods rely on large-scale
pixel-level annotations. This paper studies the problem of training a
pixel-wise semantic labeller network from image-level annotations of the
present object classes. Recently, it has been shown that high quality seeds
indicating discriminative object regions can be obtained from image-level
labels. Without additional information, obtaining the full extent of the object
is an inherently ill-posed problem due to co-occurrences. We propose using a
saliency model as additional information and hereby exploit prior knowledge on
the object extent and image statistics. We show how to combine both information
sources in order to recover 80% of the fully supervised performance - which is
the new state of the art in weakly supervised training for pixel-wise semantic
labelling. The code is available at https://goo.gl/KygSeb.Comment: CVPR 201