Grid Loss: Detecting Occluded Faces
Detection of partially occluded objects is a challenging computer vision
problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of
the detection window are occluded, since not every sub-part of the window is
discriminative on its own. To address this issue, we propose a novel loss layer
for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a
convolution layer independently rather than over the whole feature map. This
results in parts being more discriminative on their own, enabling the detector
to recover if the detection window is partially occluded. By mapping our loss
layer back to a regular fully connected layer, no additional computational cost
is incurred at runtime compared to standard CNNs. We demonstrate our method for
face detection on several public face detection benchmarks and show that our
method outperforms regular CNNs, is suitable for real-time applications, and achieves state-of-the-art performance.
Comment: accepted to ECCV 2016
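The core idea of grid loss can be sketched compactly: split the detection window's feature map into sub-blocks and apply a classification loss to each block independently, so every part must be discriminative on its own. The NumPy sketch below is an illustrative assumption, not the paper's exact layer; the per-block logistic loss, the block count, and the way the shared weight vector is sliced are all simplifications.

```python
import numpy as np

def grid_loss(feature_map, weights, bias, label, n_blocks=4):
    """Illustrative grid-loss sketch (NOT the paper's exact layer):
    the flattened feature map and a shared weight vector are split into
    n_blocks aligned parts, and a logistic loss is applied to each
    part's score independently. Label is +1 (face) or -1 (non-face)."""
    x = np.ravel(feature_map)
    w = np.ravel(weights)
    blocks_x = np.array_split(x, n_blocks)
    blocks_w = np.array_split(w, n_blocks)
    total = 0.0
    for bx, bw in zip(blocks_x, blocks_w):
        # Each sub-block gets its own score and its own loss term,
        # so occluding one block cannot silence the others.
        score = bx @ bw + bias / n_blocks
        total += np.log1p(np.exp(-label * score))
    return total / n_blocks
```

Because the per-block scores sum to the full-window score (up to the bias split), the block classifiers can be folded back into one fully connected layer at test time, which is what makes the runtime cost match a standard CNN.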
Holistic Guidance for Occluded Person Re-Identification
In real-world video surveillance applications, person re-identification
(ReID) suffers from the effects of occlusions and detection errors. Despite
recent advances, occlusions continue to corrupt the features extracted by
state-of-the-art CNN backbones, thereby degrading the accuracy of ReID
systems. To address this issue, methods in the literature use an additional
costly process such as pose estimation, where pose maps provide supervision to
exclude occluded regions. In contrast, we introduce a novel Holistic Guidance
(HG) method that relies only on person identity labels, and on the distribution
of pairwise matching distances of datasets to alleviate the problem of
occlusion, without requiring additional supervision. Hence, our proposed
student-teacher framework is trained to address the occlusion problem by
matching the distributions of between- and within-class distances (DCDs) of
occluded samples with that of holistic (non-occluded) samples, thereby using
the latter as a soft labeled reference to learn well separated DCDs. This
approach is supported by our empirical study, which shows that the distributions of between- and within-class distances overlap more in occluded datasets than in holistic ones. In particular, features extracted from both datasets are
jointly learned using the student model to produce an attention map that allows
separating visible regions from occluded ones. In addition to this, a joint
generative-discriminative backbone is trained with a denoising autoencoder,
allowing the system to self-recover from occlusions. Extensive experiments on
several challenging public datasets indicate that the proposed approach can
outperform state-of-the-art methods on both occluded and holistic datasets.
Comment: 10 pages
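The distance-distribution idea underlying the HG method can be illustrated with a toy computation: collect within-class and between-class pairwise distances for occluded and holistic batches, then penalize the gap between the two distributions. The sketch below matches only the means of the distance sets; the actual method aligns full distributions inside a student-teacher framework, so the helper names and the mean-matching loss here are assumptions for illustration.

```python
import numpy as np

def pairwise_distance_sets(feats, labels):
    """Collect within-class and between-class Euclidean distances
    over all pairs in a batch of embeddings (illustrative helper)."""
    within, between = [], []
    n = len(feats)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(feats[i] - feats[j])
            (within if labels[i] == labels[j] else between).append(d)
    return np.array(within), np.array(between)

def distribution_alignment_loss(occ_feats, occ_labels, hol_feats, hol_labels):
    """Toy stand-in for the paper's DCD alignment: penalize the gap
    between the mean within-/between-class distances of occluded
    samples and those of holistic (teacher) samples. Mean matching is
    a simplification of matching the full distributions."""
    ow, ob = pairwise_distance_sets(occ_feats, occ_labels)
    hw, hb = pairwise_distance_sets(hol_feats, hol_labels)
    return abs(ow.mean() - hw.mean()) + abs(ob.mean() - hb.mean())
```

Driving this loss to zero pushes occluded embeddings toward the same within/between separation the holistic samples exhibit, which is the sense in which the holistic set serves as a soft labeled reference.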