Saliency Guided End-to-End Learning for Weakly Supervised Object Detection
Weakly supervised object detection (WSOD), which is the problem of learning
detectors using only image-level labels, has been attracting more and more
interest. However, this problem is quite challenging due to the lack of
location supervision. To address this issue, this paper integrates saliency
into a deep architecture, in which the location information is explored both
explicitly and implicitly. Specifically, we select highly confident object
proposals under the guidance of class-specific saliency maps. The location
information, together with semantic and saliency information, of the selected
proposals is then used to explicitly supervise the network by imposing two
additional losses. Meanwhile, a saliency prediction sub-network is built in the
architecture. The prediction results are used to implicitly guide the
localization procedure. The entire network is trained end-to-end. Experiments
on PASCAL VOC demonstrate that our approach outperforms all state-of-the-art methods.
Comment: Accepted to appear in IJCAI 201
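The saliency-guided proposal selection described in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the scoring rule (mean saliency inside a box), the function names, and the `min_score` and `top_k` parameters are all assumptions chosen for illustration.

```python
# Hypothetical sketch: rank proposals by mean saliency inside each box.
# Boxes are (x1, y1, x2, y2) with exclusive x2/y2; the saliency map is a
# 2D list of floats in [0, 1] for one class.

def mean_saliency(saliency, box):
    """Average saliency value over the pixels covered by the box."""
    x1, y1, x2, y2 = box
    values = [saliency[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values) if values else 0.0


def select_confident_proposals(saliency, proposals, min_score=0.5, top_k=2):
    """Keep up to top_k proposals whose mean saliency is at least min_score.

    min_score and top_k are illustrative assumptions, not values from the paper.
    """
    scored = [(mean_saliency(saliency, box), box) for box in proposals]
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: item[0], reverse=True)  # most salient first
    return [box for _, box in kept[:top_k]]


if __name__ == "__main__":
    # A 4x4 map with a salient 2x2 block in the centre.
    saliency = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    proposals = [(0, 0, 4, 4), (1, 1, 3, 3), (1, 1, 4, 3)]
    print(select_confident_proposals(saliency, proposals))
```

In this toy setup the tight box around the salient block ranks first, and the box covering the whole image is discarded because most of its area is non-salient.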
LSTD: A Low-Shot Transfer Detector for Object Detection
Recent advances in object detection are mainly driven by deep learning with
large-scale detection benchmarks. However, the fully-annotated training set is
often limited for a target detection task, which may deteriorate the
performance of deep detectors. To address this challenge, we propose a novel
low-shot transfer detector (LSTD) in this paper, where we leverage rich
source-domain knowledge to construct an effective target-domain detector with
very few training examples. The main contributions are described as follows.
First, we design a flexible deep architecture of LSTD to alleviate transfer
difficulties in low-shot detection. This architecture can integrate the
advantages of both SSD and Faster RCNN in a unified deep framework. Second, we
introduce a novel regularized transfer learning framework for low-shot
detection, where the transfer knowledge (TK) and background depression (BD)
regularizations are proposed to leverage object knowledge from the source and
target domains, respectively, in order to further enhance fine-tuning with a few
target images. Finally, we examine our LSTD on a number of challenging low-shot
detection experiments, where LSTD outperforms other state-of-the-art
approaches. The results demonstrate that LSTD is a preferable deep detector for
low-shot scenarios.
Comment: Accepted by AAAI201
Multiple Instance Curriculum Learning for Weakly Supervised Object Detection
When supervising an object detector with weakly labeled data, most existing
approaches are prone to getting trapped in discriminative object parts, e.g.,
finding the face of a cat instead of the full body, because they lack
supervision on the extent of full objects. To address this challenge, we
incorporate object segmentation into the detector training, which guides the
model to correctly localize the full objects. We propose the multiple instance
curriculum learning (MICL) method, which injects curriculum learning (CL) into
the multiple instance learning (MIL) framework. The MICL method starts by
automatically picking the easy training examples, where the extent of the
segmentation masks agrees with the detection bounding boxes. The training set is
gradually expanded to include harder examples to train strong detectors that
handle complex images. The proposed MICL method with segmentation in the loop
outperforms the state-of-the-art weakly supervised object detectors by a
substantial margin on the PASCAL VOC datasets.
Comment: Published in BMVC 201
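The easy-example selection step described in the MICL abstract above can be sketched roughly as follows. This is an illustrative assumption rather than the authors' code: agreement is measured here as intersection-over-union (IoU) between the tight box around the segmentation mask and the detection box, and the `threshold` value, the dictionary keys, and all function names are hypothetical.

```python
# Hypothetical sketch: an example counts as "easy" when the tight box around
# its segmentation mask overlaps the detection box strongly.
# Boxes are (x1, y1, x2, y2) with exclusive x2/y2; masks are 2D lists of 0/1.

def mask_to_box(mask):
    """Tight bounding box around the nonzero pixels, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def iou(a, b):
    """Intersection-over-union of two boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_easy_examples(examples, threshold=0.5):
    """Return the examples whose mask box and detection box agree.

    Each example is a dict with assumed keys "mask" and "det_box".
    """
    easy = []
    for example in examples:
        box = mask_to_box(example["mask"])
        if box is not None and iou(box, example["det_box"]) >= threshold:
            easy.append(example)
    return easy
```

A curriculum in this spirit could start training on the output of `select_easy_examples` and lower `threshold` over time so that harder examples enter the training set gradually; that schedule is likewise an assumption, not a detail given in the abstract.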
Activity Driven Weakly Supervised Object Detection
Weakly supervised object detection aims at reducing the amount of supervision
required to train detection models. Such models are traditionally learned from
images/videos labelled only with the object class and not the object bounding
box. In our work, we try to leverage not only the object class labels but also
the action labels associated with the data. We show that the action depicted in
the image/video can provide strong cues about the location of the associated
object. We learn a spatial prior for the object dependent on the action (e.g.
"ball" is closer to "leg of the person" in "kicking ball"), and incorporate
this prior to simultaneously train a joint object detection and action
classification model. We conducted experiments on both video datasets and image
datasets to evaluate the performance of our weakly supervised object detection
model. Our approach outperformed the current state-of-the-art (SOTA) method by
more than 6% in mAP on the Charades video dataset.
Comment: CVPR'19 camera read