785 research outputs found
Loss Guided Activation for Action Recognition in Still Images
One significant problem of deep-learning based human action recognition is
that it can be easily misled by the presence of irrelevant objects or
backgrounds. Existing methods commonly address this problem by employing
bounding boxes on the target humans as part of the input, in both training and
testing stages. This requirement of bounding boxes as part of the input is
needed to enable the methods to ignore irrelevant contexts and extract only
human features. However, we consider this solution is inefficient, since the
bounding boxes might not be available. Hence, instead of using a person
bounding box as an input, we introduce a human-mask loss to automatically guide
the activations of the feature maps to the target human who is performing the
action, and hence suppress the activations of misleading contexts. We propose a
multi-task deep learning method that jointly predicts the human action class
and human location heatmap. Extensive experiments demonstrate our approach is
more robust compared to the baseline methods under the presence of irrelevant
misleading contexts. Our method achieves 94.06\% and 40.65\% (in terms of mAP)
on Stanford40 and MPII dataset respectively, which are 3.14\% and 12.6\%
relative improvements over the best results reported in the literature, and
thus set new state-of-the-art results. Additionally, unlike some existing
methods, we eliminate the requirement of using a person bounding box as an
input during testing.Comment: Accepted to appear in ACCV 201
Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN
Discriminative localization is essential for fine-grained image
classification task, which devotes to recognizing hundreds of subcategories in
the same basic-level category. Reflecting on discriminative regions of objects,
key differences among different subcategories are subtle and local. Existing
methods generally adopt a two-stage learning framework: The first stage is to
localize the discriminative regions of objects, and the second is to encode the
discriminative features for training classifiers. However, these methods
generally have two limitations: (1) Separation of the two-stage learning is
time-consuming. (2) Dependence on object and parts annotations for
discriminative localization learning leads to heavily labor-consuming labeling.
It is highly challenging to address these two important limitations
simultaneously. Existing methods only focus on one of them. Therefore, this
paper proposes the discriminative localization approach via saliency-guided
Faster R-CNN to address the above two limitations at the same time, and our
main novelties and advantages are: (1) End-to-end network based on Faster R-CNN
is designed to simultaneously localize discriminative regions and encode
discriminative features, which accelerates classification speed. (2)
Saliency-guided localization learning is proposed to localize the
discriminative region automatically, avoiding labor-consuming labeling. Both
are jointly employed to simultaneously accelerate classification speed and
eliminate dependence on object and parts annotations. Comparing with the
state-of-the-art methods on the widely-used CUB-200-2011 dataset, our approach
achieves both the best classification accuracy and efficiency.Comment: 9 pages, to appear in ACM MM 201
BoundaryFace: A mining framework with noise label self-correction for Face Recognition
Face recognition has made tremendous progress in recent years due to the
advances in loss functions and the explosive growth in training sets size. A
properly designed loss is seen as key to extract discriminative features for
classification. Several margin-based losses have been proposed as alternatives
of softmax loss in face recognition. However, two issues remain to consider: 1)
They overlook the importance of hard sample mining for discriminative learning.
2) Label noise ubiquitously exists in large-scale datasets, which can seriously
damage the model's performance. In this paper, starting from the perspective of
decision boundary, we propose a novel mining framework that focuses on the
relationship between a sample's ground truth class center and its nearest
negative class center. Specifically, a closed-set noise label self-correction
module is put forward, making this framework work well on datasets containing a
lot of label noise. The proposed method consistently outperforms SOTA methods
in various face recognition benchmarks. Training code has been released at
https://github.com/SWJTU-3DVision/BoundaryFace.Comment: ECCV 2022. Code available at
https://github.com/SWJTU-3DVision/BoundaryFac
- …