Occlusion Coherence: Detecting and Localizing Occluded Faces
The presence of occluders significantly impacts object recognition accuracy.
However, occlusion is typically treated as an unstructured source of noise and
explicit models for occluders have lagged behind those for object appearance
and shape. In this paper we describe a hierarchical deformable part model for
face detection and landmark localization that explicitly models part occlusion.
The proposed model structure makes it possible to augment positive training
data with large numbers of synthetically occluded instances. This allows us to
easily incorporate the statistics of occlusion patterns in a discriminatively
trained model. We test the model on several benchmarks for landmark
localization and detection including challenging new data sets featuring
significant occlusion. We find that the addition of an explicit occlusion model
yields a detection system that outperforms existing approaches for occluded
instances while maintaining competitive accuracy in detection and landmark
localization for unoccluded instances.
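The augmentation idea above can be sketched as pasting random occluder patches onto positive training images. This is a minimal illustration, not the paper's exact procedure: the helper name `occlude`, the rectangular occluder, and the random-noise fill are all assumptions made for the example.

```python
import numpy as np

def occlude(image, rng, max_frac=0.4):
    """Paste one random rectangular occluder onto a copy of `image`.

    Illustrative only: occluder size, position, and fill are arbitrary
    choices standing in for a real occluder model.
    """
    h, w = image.shape[:2]
    out = image.copy()
    # Sample occluder dimensions up to max_frac of the image size.
    oh = int(rng.integers(1, max(2, int(h * max_frac))))
    ow = int(rng.integers(1, max(2, int(w * max_frac))))
    # Sample a position that keeps the occluder inside the image.
    y = int(rng.integers(0, h - oh + 1))
    x = int(rng.integers(0, w - ow + 1))
    # Fill the patch with random pixel values as a stand-in occluder.
    out[y:y + oh, x:x + ow] = rng.integers(
        0, 256, size=(oh, ow) + image.shape[2:], dtype=image.dtype)
    return out, (x, y, ow, oh)

rng = np.random.default_rng(0)
img = np.zeros((64, 64, 3), dtype=np.uint8)
aug, box = occlude(img, rng)
```

Recording the occluder box alongside the image is what lets the occlusion pattern statistics enter the discriminative training as part-level occlusion labels.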
An Analysis of Scale Invariance in Object Detection - SNIP
An analysis of different techniques for recognizing and detecting objects
under extreme scale variation is presented. Scale specific and scale invariant
design of detectors are compared by training them with different configurations
of input data. By evaluating the performance of different network architectures
for classifying small objects on ImageNet, we show that CNNs are not robust to
changes in scale. Based on this analysis, we propose to train and test
detectors on the same scales of an image-pyramid. Since small and large objects
are difficult to recognize at smaller and larger scales respectively, we
present a novel training scheme called Scale Normalization for Image Pyramids
(SNIP) which selectively back-propagates the gradients of object instances of
different sizes as a function of the image scale. On the COCO dataset, our
single model performance is 45.7% and an ensemble of 3 networks obtains an mAP
of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train
with bounding box supervision. Our submission won the Best Student Entry in the
COCO 2017 challenge. Code will be made available at
\url{http://bit.ly/2yXVg4c}.
Comment: CVPR 2018, camera-ready version
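The selective back-propagation idea can be sketched as a per-scale validity mask: at each image-pyramid scale, only instances whose resized size falls inside a valid range contribute gradients. The function name `valid_mask` and the scale and range values below are illustrative assumptions, not the paper's settings.

```python
def valid_mask(instance_sizes, scale, lo=32, hi=320):
    """Return True for instances whose size (in pixels, after resizing
    the image by `scale`) lies inside the valid range [lo, hi].

    Instances outside the range would have their gradients ignored at
    this pyramid scale and be handled at another scale instead.
    """
    return [lo <= s * scale <= hi for s in instance_sizes]

sizes = [10, 50, 400]          # object sizes in the original image
for scale in (0.5, 1.0, 2.0):  # image-pyramid scales
    mask = valid_mask(sizes, scale)
    # e.g. the small object is only valid once upscaling brings it
    # into range, and the large object only once it is downscaled.
```

Each instance thus trains the detector at roughly one "comfortable" resolution, which is the normalization the abstract describes.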