Fully supervised object detection has achieved great success in recent years.
However, abundant bounding-box annotations are needed to train a detector
for novel classes. To reduce the human labeling effort, we propose a novel
webly supervised object detection (WebSOD) method for novel classes that
requires only web images, with no further annotations. Our proposed method
combines bottom-up and top-down cues for novel class detection. Within our
approach, we introduce a bottom-up mechanism that uses a well-trained fully
supervised object detector (i.e., Faster R-CNN) as an object region estimator
for web images, exploiting the common objectness shared by base and novel
classes. With the estimated regions on the web images, we then use top-down
attention cues as guidance for region classification. Furthermore,
we propose a residual feature refinement (RFR) block to tackle the domain
mismatch between the web domain and the target domain. We evaluate our proposed
method on the PASCAL VOC dataset with three different novel/base splits. Without
any target-domain novel-class images and annotations, our proposed webly
supervised object detection model is able to achieve promising performance for
novel classes. Moreover, we conduct transfer learning experiments on the
large-scale ILSVRC 2013 detection dataset and achieve state-of-the-art performance.
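
The abstract does not describe the internals of the RFR block, so the following is only a minimal sketch of a generic residual refinement, under the assumption that a small learned correction is added to a region feature to move it from the web domain toward the target domain. The function names, the bottleneck shape, and the zero initialization are all illustrative assumptions, not the authors' design.

```python
import random

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]

def residual_refine(feature, w_down, w_up):
    """Hypothetical residual refinement: feature + w_up @ relu(w_down @ feature).

    The skip connection keeps the original feature intact; the two-layer
    bottleneck only learns a correction on top of it.
    """
    hidden = relu(matvec(w_down, feature))
    correction = matvec(w_up, hidden)
    return [f + c for f, c in zip(feature, correction)]

def init_weights(rows, cols, scale, rng):
    """Small uniform random weights (illustrative initialization)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim, bottleneck = 8, 4  # assumed sizes, chosen only for the demo
w_down = init_weights(bottleneck, dim, 0.1, rng)
# Zero-initialized up-projection: the block starts out as the identity map.
w_up = [[0.0] * bottleneck for _ in range(dim)]

web_feature = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
refined = residual_refine(web_feature, w_down, w_up)
```

With the up-projection initialized to zero, the block initially passes features through unchanged, so training would only need to learn the web-to-target correction. This is one common reason to prefer a residual form over a plain feature transform, though whether the paper uses such an initialization is not stated here.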