Weakly supervised object detection (WSOD), the problem of learning
detectors using only image-level labels, has attracted growing
interest. However, the problem is challenging due to the lack of
location supervision. To address this issue, this paper integrates saliency
into a deep architecture, in which the location information is explored both
explicitly and implicitly. Specifically, we select highly confident object
proposals under the guidance of class-specific saliency maps. The location
information of the selected proposals, together with their semantic and saliency
information, is then used to explicitly supervise the network by imposing two
additional losses. Meanwhile, a saliency prediction sub-network is built in the
architecture. The prediction results are used to implicitly guide the
localization procedure. The entire network is trained end-to-end. Experiments
on PASCAL VOC demonstrate that our approach outperforms all state-of-the-art methods.

Comment: Accepted to appear in IJCAI 201
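
As a rough illustration of the saliency-guided proposal selection mentioned in the abstract, the sketch below keeps proposals whose mean saliency inside the box meets a threshold. This is not the paper's procedure: the scoring rule (mean saliency), the threshold value, the box format, and the function name `select_confident_proposals` are all assumptions made for the example.

```python
import numpy as np


def select_confident_proposals(saliency, boxes, threshold=0.5):
    """Return indices of boxes whose mean saliency is at least `threshold`.

    Hypothetical helper, not taken from the paper.
    saliency: 2-D array (H, W) with values in [0, 1] for a single class.
    boxes: iterable of (x1, y1, x2, y2) integer pixel coordinates,
           with x2 and y2 exclusive.
    """
    selected = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        region = saliency[y1:y2, x1:x2]
        # Skip degenerate boxes; keep boxes that are mostly salient.
        if region.size > 0 and region.mean() >= threshold:
            selected.append(i)
    return selected


# Toy class-specific saliency map with one salient 4x4 square.
saliency = np.zeros((8, 8))
saliency[2:6, 2:6] = 1.0

# Three candidate proposals: tight, whole image, and background corner.
boxes = [(2, 2, 6, 6), (0, 0, 8, 8), (0, 0, 2, 2)]
selected = select_confident_proposals(saliency, boxes)
```

With these toy inputs only the tight box passes the default threshold; lowering the threshold admits looser boxes, which hints at why a selection rule of this kind trades recall against localization quality.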