18 research outputs found
Feature Selective Networks for Object Detection
Objects for detection usually have distinct characteristics in different
sub-regions and different aspect ratios. However, in prevalent two-stage object
detection methods, Region-of-Interest (RoI) features are extracted by RoI
pooling with little emphasis on these translation-variant feature components.
We present feature selective networks to reform the feature representations of
RoIs by exploiting their disparities among sub-regions and aspect ratios. Our
network produces the sub-region attention bank and aspect ratio attention bank
for the whole image. The RoI-based sub-region attention map and aspect ratio
attention map are selectively pooled from the banks, and then used to refine
the original RoI features for RoI classification. Equipped with a light-weight
detection subnetwork, our network gets a consistent boost in detection
performance based on general ConvNet backbones (ResNet-101, GoogLeNet and
VGG-16). Without bells and whistles, our detectors equipped with ResNet-101
achieve more than 3% mAP improvement compared to counterparts on PASCAL VOC
2007, PASCAL VOC 2012 and MS COCO datasets.
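The selective pooling described above can be sketched in plain Python. This is a minimal illustration only: the bank layout (a per-cell score dictionary), the RoI format, and the elementwise reweighting are assumptions for illustration, not the paper's exact formulation.

```python
def pool_attention(bank, roi):
    """Selectively pool the attention values covering an RoI from a
    whole-image attention bank (a dict mapping (y, x) cells to scores)."""
    y0, x0, y1, x1 = roi
    return [bank[(y, x)] for y in range(y0, y1) for x in range(x0, x1)]

def refine_roi_features(features, attention):
    """Refine the original RoI features by reweighting them with the
    pooled attention map."""
    return [f * a for f, a in zip(features, attention)]

# Toy example: a 2x2 RoI pooled from a 3x3 attention bank.
bank = {(y, x): y + x + 1 for y in range(3) for x in range(3)}
features = [1.0, 2.0, 3.0, 4.0]
attention = pool_attention(bank, (0, 0, 2, 2))
refined = refine_roi_features(features, attention)
```

In the actual network the bank is produced convolutionally for the whole image and pooling is differentiable, but the selection-then-refinement flow is the same.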
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
How do we learn an object detector that is invariant to occlusions and
deformations? Our current solution is to use a data-driven strategy -- collect
large-scale datasets which have object instances under different conditions.
The hope is that the final classifier can use these examples to learn
invariances. But is it really possible to see all the occlusions in a dataset?
We argue that like categories, occlusions and object deformations also follow a
long-tail. Some occlusions and deformations are so rare that they hardly
happen; yet we want to learn a model invariant to such occurrences. In this
paper, we propose an alternative solution. We propose to learn an adversarial
network that generates examples with occlusions and deformations. The goal of
the adversary is to generate examples that are difficult for the object
detector to classify. In our framework both the original detector and adversary
are learned in a joint manner. Our experimental results indicate a 2.3% mAP
boost on VOC07 and a 2.6% mAP boost on VOC2012 object detection challenge
compared to the Fast-RCNN pipeline. We also release the code for this paper.
Comment: CVPR 2017 camera-ready
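The adversary's objective can be sketched as follows: among candidate occlusion masks, choose the one that most hurts the current detector. This is a toy illustration, assuming a stand-in scoring function and binary feature masks; the paper's adversary is itself a learned network trained jointly with the detector.

```python
def apply_mask(features, mask):
    """Zero out (occlude) the feature entries selected by the mask."""
    return [0.0 if m else f for f, m in zip(features, mask)]

def adversarial_mask(features, score_fn, candidate_masks):
    """Pick the occlusion mask that minimizes the detector's score,
    i.e. produces the hardest positive for the current detector."""
    return min(candidate_masks, key=lambda m: score_fn(apply_mask(features, m)))

# Toy detector: the score is just the sum of the (masked) features.
features = [3.0, 1.0, 4.0, 1.0]
masks = [
    [True, False, False, False],   # occlude feature 0
    [False, False, True, False],   # occlude feature 2
]
hardest = adversarial_mask(features, sum, masks)
```

The detector is then trained on such hard examples, so both players improve together rather than the adversary being fixed.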
A Global Context Mechanism for Sequence Labeling
Sequential labeling tasks necessitate the computation of sentence
representations for each word within a given sentence. With the advent of
advanced pretrained language models, one common approach involves incorporating
a BiLSTM layer to bolster the sequence structure information at the output
level. Nevertheless, it has been empirically demonstrated (P.-H. Li et al.,
2020) that the potential of BiLSTM for generating sentence representations for
sequence labeling tasks is constrained, primarily due to the amalgamation of
fragments from past and future sentence representations to form a complete
sentence representation. In this study, we discovered that strategically
integrating the whole sentence representation, which exists in the first and
last cells of the BiLSTM, into the sentence representation of each cell, could
markedly enhance the F1 score and accuracy. Using BERT embedded within a
BiLSTM as an illustration, we conducted exhaustive experiments on nine datasets for
sequence labeling tasks, encompassing named entity recognition (NER), part of
speech (POS) tagging and End-to-End Aspect-Based sentiment analysis (E2E-ABSA).
We noted significant improvements in F1 scores and accuracy across all examined
datasets.
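The integration step above can be sketched with plain Python lists: the whole-sentence representation lives in the last forward state and the first backward state, and is concatenated onto every timestep's output. Concatenation as the combination operator and the toy state shapes are assumptions for illustration.

```python
def add_global_context(forward_states, backward_states):
    """Concatenate each timestep's BiLSTM output with the whole-sentence
    representation: the last forward state plus the first backward state."""
    sentence_repr = forward_states[-1] + backward_states[0]  # list concat
    return [
        f + b + sentence_repr  # per-token forward + backward + global context
        for f, b in zip(forward_states, backward_states)
    ]

# Toy 3-token sentence with 2-dimensional states per direction.
fwd = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
bwd = [[0.9, 0.8], [0.7, 0.6], [0.5, 0.4]]
enriched = add_global_context(fwd, bwd)
```

Each token representation thus carries both its local bidirectional fragment and the same global sentence summary, which is what a per-token classifier then consumes.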
Revisiting knowledge transfer for training object class detectors
We propose to revisit knowledge transfer for training object detectors on
target classes from weakly supervised training images, helped by a set of
source classes with bounding-box annotations. We present a unified knowledge
transfer framework based on training a single neural network multi-class object
detector over all source classes, organized in a semantic hierarchy. This
generates proposals with scores at multiple levels in the hierarchy, which we
use to explore knowledge transfer over a broad range of generality, ranging
from class-specific (bicycle to motorbike) to class-generic (objectness to any
class). Experiments on the 200 object classes in the ILSVRC 2013 detection
dataset show that our technique: (1) leads to much better performance on the
target classes (70.3% CorLoc, 36.9% mAP) than a weakly supervised baseline
which uses manually engineered objectness [11] (50.5% CorLoc, 25.4% mAP). (2)
delivers target object detectors reaching 80% of the mAP of their fully
supervised counterparts. (3) outperforms the best reported transfer learning
results on this dataset (+41% CorLoc and +3% mAP over [18, 46], +16.2% mAP over
[32]). Moreover, we also carry out several across-dataset knowledge transfer
experiments [27, 24, 35] and find that (4) our technique outperforms the weakly
supervised baseline in all dataset pairs by 1.5x-1.9x, establishing its general
applicability.
Comment: CVPR 1
Single-Shot Refinement Neural Network for Object Detection
For object detection, the two-stage approach (e.g., Faster R-CNN) has been
achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has
the advantage of high efficiency. To inherit the merits of both while
overcoming their disadvantages, in this paper, we propose a novel single-shot
based detector, called RefineDet, that achieves better accuracy than two-stage
methods and maintains efficiency comparable to that of one-stage methods. RefineDet
consists of two inter-connected modules, namely, the anchor refinement module
and the object detection module. Specifically, the former aims to (1) filter
out negative anchors to reduce search space for the classifier, and (2)
coarsely adjust the locations and sizes of anchors to provide better
initialization for the subsequent regressor. The latter module takes the
refined anchors as the input from the former to further improve the regression
and predict multi-class labels. Meanwhile, we design a transfer connection block
to transfer the features in the anchor refinement module to predict locations,
sizes and class labels of objects in the object detection module. The
multi-task loss function enables us to train the whole network in an end-to-end
way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO
demonstrate that RefineDet achieves state-of-the-art detection accuracy with
high efficiency. Code is available at https://github.com/sfzhang15/RefineDet
Comment: 14 pages, 7 figures, 7 tables
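The anchor refinement module's two jobs — filtering likely negatives and coarsely shifting the survivors — can be sketched as follows. The threshold value, the (x, y, w, h) anchor format, and the offset-only adjustment are simplifying assumptions; the real module is convolutional and also rescales anchor sizes.

```python
def refine_anchors(anchors, scores, offsets, neg_threshold=0.3):
    """Anchor refinement sketch: drop likely-negative anchors to shrink
    the classifier's search space, then coarsely shift the survivors to
    give the second-stage regressor a better initialization."""
    refined = []
    for (x, y, w, h), score, (dx, dy) in zip(anchors, scores, offsets):
        if score < neg_threshold:           # (1) filter out negative anchors
            continue
        refined.append((x + dx, y + dy, w, h))  # (2) coarse location adjustment
    return refined

# Toy example: three anchors; the middle one is filtered as negative.
anchors = [(10, 10, 4, 4), (20, 20, 4, 4), (30, 30, 4, 4)]
scores = [0.9, 0.1, 0.6]
offsets = [(1, -1), (0, 0), (-2, 2)]
refined = refine_anchors(anchors, scores, offsets)
```

The object detection module then regresses and classifies only these refined anchors, which is how the one-shot pipeline recovers some of the benefit of a second stage.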