Cascade Region Proposal and Global Context for Deep Object Detection
A deep region-based object detector consists of a region proposal step and a
deep object recognition step. In this paper, we make significant improvements
to both steps. For region proposal we propose a novel lightweight
cascade structure which can effectively improve RPN proposal quality. For
object recognition we re-implement global context modeling with a few
modifications and obtain a performance boost (4.2% mAP gain on the ILSVRC 2016
validation set). Besides, we apply the idea of pre-training extensively and
show its importance in both steps. Together with common training and testing
tricks, we improve Faster R-CNN baseline by a large margin. In particular, we
obtain 87.9% mAP on the PASCAL VOC 2012 test set, 65.3% on the ILSVRC 2016 test
set and 36.8% on the COCO test-std set.
Comment: Preprint to appear in Neurocomputing
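The cascade proposal idea above can be sketched in a few lines. This is a toy illustration under our own assumptions (function names and the delta encoding are ours, not the paper's code): each lightweight stage refines the previous stage's proposals by applying predicted box deltas in the usual R-CNN encoding, improving proposal quality before recognition.

```python
import math

def apply_delta(box, delta):
    """Refine a box (x1, y1, x2, y2) with deltas (dx, dy, dw, dh)
    in the standard R-CNN parameterization: shift the center by a
    fraction of the size, scale the size exponentially."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = delta
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * math.exp(dw), h * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def cascade_refine(proposals, stage_deltas):
    """Apply each cascade stage's predicted deltas in sequence."""
    for deltas in stage_deltas:
        proposals = [apply_delta(b, d) for b, d in zip(proposals, deltas)]
    return proposals
```

In a real detector the deltas come from a learned regression head per stage; here they are plain tuples so the refinement logic stands alone.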
Rethinking Classification and Localization for Cascade R-CNN
We extend the state-of-the-art Cascade R-CNN with a simple feature sharing
mechanism. Our approach targets a key problem this detector suffers from:
performance increases at high IoU thresholds but decreases at low ones.
Feature sharing is extremely helpful: our results show that with this
mechanism embedded into all stages, we can easily narrow the gap between the
last stage and the preceding stages at low IoU thresholds, relying on the
network itself rather than the commonly used testing ensemble. We also
observe clear improvements at all IoU thresholds, benefiting from feature
sharing, and the
resulting cascade structure can easily match or exceed its counterparts, only
with negligible extra parameters introduced. To push the envelope, we
demonstrate 43.2 AP on COCO object detection without any bells and whistles
including testing ensemble, surpassing previous Cascade R-CNN by a large
margin. Our framework is easy to implement and we hope it can serve as a
general and strong baseline for future research.
Comment: BMVC 2019 Camera Ready
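A minimal sketch of the feature-sharing idea, with toy functions of our own (not the authors' code): every cascade stage's head consumes the same shared feature instead of re-extracting features per stage, which is what lets later stages stay consistent with earlier ones.

```python
def shared_cascade(feature, stage_heads):
    """Run each stage's head on one shared feature vector and
    collect the per-stage scores."""
    return [head(feature) for head in stage_heads]

# Toy usage: three linear "heads" with different weights applied
# to a single shared pooled feature.
feature = [0.2, 0.5, 0.3]
heads = [
    lambda f: sum(w * x for w, x in zip((1.0, 0.5, 0.2), f)),
    lambda f: sum(w * x for w, x in zip((0.8, 0.6, 0.4), f)),
    lambda f: sum(w * x for w, x in zip((0.6, 0.7, 0.6), f)),
]
scores = shared_cascade(feature, heads)
```

In the real detector the shared input would be a pooled RoI feature map and the heads would be learned layers; the point here is only the dataflow: one feature, many stage heads.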
Modulating Localization and Classification for Harmonized Object Detection
Object detection involves two sub-tasks, i.e. localizing objects in an image
and classifying them into various categories. For existing CNN-based detectors,
we notice the widespread divergence between localization and classification,
which leads to degradation in performance. In this work, we propose a mutual
learning framework to modulate the two tasks. In particular, the two tasks are
forced to learn from each other with a novel mutual labeling strategy. Besides,
we introduce a simple yet effective IoU rescoring scheme, which further reduces
the divergence. Moreover, we define a Spearman rank correlation-based metric to
quantify the divergence, which correlates well with the detection performance.
The proposed approach is general-purpose and can be easily injected into
existing detectors such as FCOS and RetinaNet. We achieve a significant
performance gain over the baseline detectors on the COCO dataset.
Comment: Accepted by ICME 202
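The Spearman rank correlation metric mentioned above is easy to compute for a set of detections. A minimal sketch, assuming no tied ranks so the closed form rho = 1 - 6*sum(d^2)/(n(n^2-1)) applies (the function names are ours): correlate each detection's classification score with its box IoU; rho near 1 means the two tasks agree, rho near -1 means they diverge.

```python
def ranks(values):
    """1-based ascending ranks (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(scores, ious):
    """Spearman rank correlation between classification scores and
    localization IoUs; 1.0 means the rankings agree perfectly."""
    n = len(scores)
    rs, ri = ranks(scores), ranks(ious)
    d2 = sum((a - b) ** 2 for a, b in zip(rs, ri))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Consistent detections (higher score -> higher IoU) -> rho = 1.0
print(spearman([0.9, 0.7, 0.5], [0.95, 0.80, 0.60]))
# Fully inverted rankings -> rho = -1.0
print(spearman([0.9, 0.7, 0.5], [0.60, 0.80, 0.95]))
```

A production version would handle ties (e.g. via average ranks, as `scipy.stats.spearmanr` does), but this conveys how a single scalar can quantify the classification/localization divergence.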
Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection
Recent research attempts to improve detection performance by adopting
the idea of cascade for single-stage detectors. In this paper, we analyze and
discover that inconsistency is the major factor limiting the performance. The
refined anchors are associated with features extracted at the previous
location, and the classifier is confused by misaligned classification and
localization. Further, we point out two main designing rules for the cascade
manner: improving consistency between classification confidence and
localization performance, and maintaining feature consistency between different
stages. A multistage object detector, named Cas-RetinaNet, is then proposed for
reducing the misalignments. It consists of sequential stages trained with
increasing IoU thresholds for improving the correlation, and a novel Feature
Consistency Module for mitigating the feature inconsistency. Experiments show
that our proposed Cas-RetinaNet achieves stable performance gains across
different models and input scales. Specifically, our method improves RetinaNet
from 39.1 AP to 41.1 AP on the challenging MS COCO dataset without any bells or
whistles.
Comment: BMVC 201
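The "sequential stages trained with increasing IoU thresholds" scheme can be sketched concretely. This is an illustrative toy under our own naming (not the authors' code): anchors are labeled positive per stage only when their best IoU with a ground-truth box meets that stage's threshold, so each successive stage trains on a stricter set of positives.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign_labels(anchors, gt_boxes, thresholds=(0.5, 0.6, 0.7)):
    """For each cascade stage, mark an anchor positive iff its best
    IoU against any ground-truth box meets the stage's threshold."""
    best = [max(iou(a, g) for g in gt_boxes) for a in anchors]
    return [[b >= t for b in best] for t in thresholds]
```

With thresholds (0.5, 0.6, 0.7), an anchor at IoU 0.6 is a positive for the first two stages but a negative for the last, which is exactly the stage-wise tightening the abstract describes.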
Region Proposal by Guided Anchoring
Region anchors are the cornerstone of modern object detection techniques.
State-of-the-art detectors mostly rely on a dense anchoring scheme, where
anchors are sampled uniformly over the spatial domain with a predefined set of
scales and aspect ratios. In this paper, we revisit this foundational stage.
Our study shows that it can be done much more effectively and efficiently.
Specifically, we present an alternative scheme, named Guided Anchoring, which
leverages semantic features to guide the anchoring. The proposed method jointly
predicts the locations where the centers of objects of interest are likely to
exist as well as the scales and aspect ratios at different locations. On top of
predicted anchor shapes, we mitigate the feature inconsistency with a feature
adaption module. We also study the use of high-quality proposals to improve
detection performance. The anchoring scheme can be seamlessly integrated into
proposal methods and detectors. With Guided Anchoring, we achieve 9.1% higher
recall on MS COCO with 90% fewer anchors than the RPN baseline. We also adopt
Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, respectively
improving the detection mAP by 2.2%, 2.7% and 1.2%. Code will be available at
https://github.com/open-mmlab/mmdetection.
Comment: CVPR 2019 camera ready
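The sparse, guided scheme described above can be contrasted with dense tiling in a short sketch. This is our own toy rendition (names and the thresholding rule are illustrative assumptions): keep only the feature-map locations whose predicted center probability clears a threshold, and attach the per-location predicted width/height instead of a fixed set of scales and aspect ratios.

```python
def guided_anchors(prob_map, wh_map, stride=8, thresh=0.5):
    """prob_map: H x W predicted center probabilities;
    wh_map: H x W grid of (w, h) shape predictions.
    Emit one anchor per confident location, centered on the cell."""
    anchors = []
    for y, row in enumerate(prob_map):
        for x, p in enumerate(row):
            if p >= thresh:
                cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
                w, h = wh_map[y][x]
                anchors.append((cx - w / 2, cy - h / 2,
                                cx + w / 2, cy + h / 2))
    return anchors
```

Because most locations fall below the threshold, far fewer anchors are emitted than with uniform tiling, which mirrors the "90% fewer anchors" effect reported in the abstract.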
Detection in Crowded Scenes: One Proposal, Multiple Predictions
We propose a simple yet effective proposal-based object detector, aiming at
detecting highly-overlapped instances in crowded scenes. The key of our
approach is to let each proposal predict a set of correlated instances rather
than a single one in previous proposal-based frameworks. Equipped with new
techniques such as EMD Loss and Set NMS, our detector can effectively handle
the difficulty of detecting highly overlapped objects. On a FPN-Res50 baseline,
our detector can obtain 4.9% AP gains on the challenging CrowdHuman dataset and
1.0% improvements on the CityPersons dataset, without bells and
whistles. Moreover, on less crowded datasets like COCO, our approach can still
achieve moderate improvement, suggesting the proposed method is robust to
crowdedness. Code and pre-trained models will be released at
https://github.com/megvii-model/CrowdDetection.
Comment: CVPR 2020 Oral
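Set NMS, named above, admits a compact sketch. Under our own assumed naming (this is an illustration of the described rule, not the released code): run greedy NMS as usual, except that boxes predicted from the same proposal are never allowed to suppress each other, so the correlated set of instances one proposal emits all survive even when they heavily overlap.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def set_nms(boxes, scores, proposal_ids, iou_thresh=0.5):
    """Greedy NMS, but boxes sharing a proposal id never suppress
    each other. Returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        suppressed = any(
            proposal_ids[i] != proposal_ids[j]
            and iou(boxes[i], boxes[j]) > iou_thresh
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    return keep
```

With plain NMS, two heavily overlapped pedestrians predicted by one proposal would collapse to a single box; the `proposal_ids` exemption is what keeps both.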