Cascade R-CNN: Delving into High Quality Object Detection
In object detection, an intersection over union (IoU) threshold is required
to define positives and negatives. An object detector trained with a low IoU
threshold, e.g. 0.5, usually produces noisy detections. However, detection
performance tends to degrade as the IoU threshold increases. Two main
factors are responsible for this: 1) overfitting during training, due to
exponentially vanishing positive samples, and 2) inference-time mismatch
between the IoUs for which the detector is optimal and those of the input
hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is
proposed to address these problems. It consists of a sequence of detectors
trained with increasing IoU thresholds, to be sequentially more selective
against close false positives. The detectors are trained stage by stage,
leveraging the observation that the output of a detector is a good distribution
for training the next higher quality detector. The resampling of progressively
improved hypotheses guarantees that all detectors have a positive set of
examples of equivalent size, reducing the overfitting problem. The same cascade
procedure is applied at inference, enabling a closer match between the
hypotheses and the detector quality of each stage. A simple implementation of
the Cascade R-CNN is shown to surpass all single-model object detectors on the
challenging COCO dataset. Experiments also show that the Cascade R-CNN is
widely applicable across detector architectures, achieving consistent gains
independently of the baseline detector strength. The code will be made
available at https://github.com/zhaoweicai/cascade-rcnn
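
The effect of the IoU threshold on the positive set can be illustrated with a small sketch. This is not the authors' code: the box format (x1, y1, x2, y2), the helper names `iou` and `count_positives`, the toy proposals, and the thresholds 0.6 and 0.7 are all assumptions made for illustration. It only shows how the number of proposals labeled positive shrinks as the threshold is raised, which is the overfitting concern the abstract describes.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, gt_box, threshold):
    """Count proposals whose IoU with the ground-truth box meets the threshold."""
    return sum(1 for p in proposals if iou(p, gt_box) >= threshold)


# Hypothetical ground-truth box and proposals shifted progressively to the right.
gt = (0, 0, 10, 10)
proposals = [
    (0, 0, 10, 10),  # exact match
    (1, 0, 11, 10),  # shifted by 1
    (2, 0, 12, 10),  # shifted by 2
    (3, 0, 13, 10),  # shifted by 3
    (5, 0, 15, 10),  # shifted by 5
]

# Thresholds chosen for illustration only.
counts = {t: count_positives(proposals, gt, t) for t in (0.5, 0.6, 0.7)}
print(counts)
```

With these toy boxes, fewer proposals qualify as positives at each higher threshold. A cascade trained on the resampled output of the previous stage is meant to counter this shrinkage, since the hypotheses fed to later stages already overlap the ground truth more closely.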
Accurate Single Stage Detector Using Recurrent Rolling Convolution
Most of the recent successful methods in accurate object detection and
localization used some variants of R-CNN style two stage Convolutional Neural
Networks (CNN), where plausible regions were proposed in the first stage and
then followed by a second stage for decision refinement. Despite the simplicity of
training and the efficiency in deployment, single stage detection methods
have not been as competitive when evaluated on benchmarks that consider mAP at high
IoU thresholds. In this paper, we proposed a novel single stage end-to-end
trainable object detection network to overcome this limitation. We achieved
this by introducing Recurrent Rolling Convolution (RRC) architecture over
multi-scale feature maps to construct object classifiers and bounding box
regressors which are "deep in context". We evaluated our method on the
challenging KITTI dataset, which evaluates methods at an IoU threshold of 0.7. We
showed that with RRC, a single reduced VGG-16 based model already significantly
outperformed all the previously published results. At the time this paper was
written, our models ranked first in KITTI car detection (the hard level),
first in cyclist detection and second in pedestrian detection. These
results were not reached by the previous single stage methods. The code is
publicly available.
Comment: CVPR 201