Mis-classified Vector Guided Softmax Loss for Face Recognition
Face recognition has witnessed significant progress due to the advances of
deep convolutional neural networks (CNNs), the central task of which is how to
improve the feature discrimination. To this end, several margin-based
(\textit{e.g.}, angular, additive and additive angular margins) softmax loss
functions have been proposed to increase the feature margin between different
classes. However, despite these achievements, they mainly suffer from three
issues: 1) they ignore the importance of informative feature mining for
discriminative learning; 2) they encourage the feature margin only from the
ground-truth class, without exploiting the discriminability of the other,
non-ground-truth classes; 3) the feature margin between different classes is
the same and fixed, which may not adapt well to all situations. To cope with
these issues, this paper develops a novel loss function,
which adaptively emphasizes the mis-classified feature vectors to guide the
discriminative feature learning. Thus we can address all the above issues and
achieve more discriminative face features. To the best of our knowledge, this
is the first attempt to inherit the advantages of feature margin and feature
mining into a unified loss function. Experimental results on several benchmarks
have demonstrated the effectiveness of our method over state-of-the-art
alternatives.
Comment: Accepted by AAAI 2020 as an oral presentation. arXiv admin note: substantial text overlap with arXiv:1812.1131
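The abstract's core idea, re-weighting mis-classified negative classes inside a margin softmax, can be sketched roughly as below. This is a minimal numpy sketch, not the paper's exact formulation: the re-weighting function, the additive margin, and all constants are illustrative assumptions.

```python
import numpy as np

def mv_softmax_loss(cos_theta, labels, margin=0.35, t=0.2, scale=32.0):
    """Illustrative mis-classified-vector-guided softmax loss.

    cos_theta: (N, C) cosines between features and class weight vectors.
    A negative class is 'mis-classified' (hard) when its cosine exceeds the
    margin-penalized target cosine; those logits get an extra emphasis term.
    """
    n = cos_theta.shape[0]
    logits = cos_theta.astype(float).copy()
    target = logits[np.arange(n), labels] - margin      # additive-margin target
    hard = logits > target[:, None]                     # hard negative classes
    hard[np.arange(n), labels] = False
    logits[hard] = logits[hard] * (t + 1.0) + t         # emphasize hard negatives
    logits[np.arange(n), labels] = target               # apply margin to target
    logits *= scale
    # numerically stable cross-entropy
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), labels].mean()
```

Setting t = 0 recovers a plain additive-margin loss, which is how the emphasis term isolates the mining effect.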
Deep Networks with Internal Selective Attention through Feedback Connections
Traditional convolutional neural networks (CNN) are stationary and
feedforward. They neither change their parameters during evaluation nor use
feedback from higher to lower layers. Real brains, however, do. So does our
Deep Attention Selective Network (dasNet) architecture. DasNet's feedback
structure can dynamically alter its convolutional filter sensitivities during
classification. It harnesses the power of sequential processing to improve
classification performance, by allowing the network to iteratively focus its
internal attention on some of its convolutional filters. Feedback is trained
through direct policy search in a huge million-dimensional parameter space,
through scalable natural evolution strategies (SNES). On the CIFAR-10 and
CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model.
Comment: 13 pages, 3 figures
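The internal attention described above, a policy rescaling the network's convolutional filter sensitivities between iterations, can be sketched as a per-filter gating step. The shapes and the plain multiplicative gating are illustrative assumptions; the policy that produces the attention weights (trained with SNES in the paper) is not shown.

```python
import numpy as np

def apply_filter_attention(feature_maps, attention):
    """Rescale convolutional feature maps filter-wise by attention weights.

    feature_maps: (F, H, W) activations of one layer's F filters.
    attention: (F,) per-filter weights emitted by a policy; broadcasting
    multiplies each filter's whole map by its weight before the next pass.
    """
    return feature_maps * attention[:, None, None]
```

Iterating classification with updated attention is what lets the network sequentially focus on some filters and suppress others.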
Multi-Scale Body-Part Mask Guided Attention for Person Re-identification
Person re-identification becomes a more and more important task due to its
wide applications. In practice, person re-identification still remains
challenging due to the variation of person pose, different lighting, occlusion,
misalignment, background clutter, etc. In this paper, we propose a multi-scale
body-part mask guided attention network (MMGA), which jointly learns whole-body
and part body attention to help extract global and local features
simultaneously. In MMGA, body-part masks are used to guide the training of
corresponding attention. Experiments show that our proposed method can reduce
the negative influence of variation of person pose, misalignment and background
clutter. Our method achieves rank-1/mAP of 95.0%/87.2% on the Market1501
dataset and 89.5%/78.1% on the DukeMTMC-reID dataset, outperforming current
state-of-the-art methods.
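The joint whole-body/part-body extraction the abstract describes can be sketched as mask-weighted pooling. This is an illustrative simplification: MMGA learns attention maps supervised by the body-part masks, whereas the sketch below pools directly through the masks, and the shapes are assumptions.

```python
import numpy as np

def mask_guided_features(feat_map, part_masks):
    """Pool one global and P part-local features from a feature map.

    feat_map: (C, H, W) backbone features; part_masks: (P, H, W) nonnegative
    body-part masks. The global feature is an unmasked average pool; each
    local feature is an average pool weighted by that part's mask.
    """
    C = feat_map.shape[0]
    global_feat = feat_map.reshape(C, -1).mean(axis=1)
    local_feats = []
    for m in part_masks:
        w = m / (m.sum() + 1e-6)                      # normalized spatial weights
        local_feats.append((feat_map * w).reshape(C, -1).sum(axis=1))
    return global_feat, np.stack(local_feats)
```

Because each local feature only aggregates pixels inside its part mask, background clutter and misalignment outside the part contribute nothing to it, which is the intuition behind the robustness claim.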
Adversarial Attacks and Defences: A Survey
Deep learning has emerged as a strong and efficient framework that can be
applied to a broad spectrum of complex learning problems which were difficult
to solve using the traditional machine learning techniques in the past. In the
last few years, deep learning has advanced radically in such a way that it can
surpass human-level performance on a number of tasks. As a consequence, deep
learning is being used extensively in many day-to-day applications. However,
deep learning systems are vulnerable to crafted adversarial examples, which
may be imperceptible to the human eye but can lead the model to misclassify
its output. In recent times, adversaries with different threat models have
leveraged these vulnerabilities to compromise deep learning systems where the
incentives for attack are high.
Hence, it is extremely important to provide robustness to deep learning
algorithms against these adversaries. However, there are only a few strong
countermeasures which can be used in all types of attack scenarios to design a
robust deep learning system. In this paper, we attempt to provide a detailed
discussion on different types of adversarial attacks with various threat models
and also elaborate on the efficiency and challenges of recent countermeasures
against them.
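One attack family such surveys typically cover is the Fast Gradient Sign Method (FGSM): shift the input by a small amount in the sign of the loss gradient. In the sketch below a logistic-regression scorer stands in for a deep network, an illustrative simplification that keeps the gradient analytic.

```python
import numpy as np

def fgsm_attack(x, w, b, y, eps):
    """One-step FGSM against a logistic 'model' score w.x + b.

    The gradient of the logistic loss w.r.t. the input is
    (sigmoid(w.x + b) - y) * w; the adversarial input moves eps in the
    sign of that gradient, a bounded, often imperceptible perturbation.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)
```

Even this toy shows the survey's point: a correctly classified input can be flipped by a perturbation whose per-coordinate magnitude is only eps.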
BoundaryFace: A mining framework with noise label self-correction for Face Recognition
Face recognition has made tremendous progress in recent years due to the
advances in loss functions and the explosive growth in training sets size. A
properly designed loss is seen as key to extract discriminative features for
classification. Several margin-based losses have been proposed as alternatives
to the softmax loss in face recognition. However, two issues remain: 1) they
overlook the importance of hard-sample mining for discriminative learning, and
2) label noise is ubiquitous in large-scale datasets and can seriously
damage the model's performance. In this paper, starting from the perspective of
decision boundary, we propose a novel mining framework that focuses on the
relationship between a sample's ground truth class center and its nearest
negative class center. Specifically, a closed-set noise label self-correction
module is put forward, making this framework work well on datasets containing a
lot of label noise. The proposed method consistently outperforms SOTA methods
in various face recognition benchmarks. Training code has been released at
https://github.com/SWJTU-3DVision/BoundaryFace.
Comment: ECCV 2022. Code available at https://github.com/SWJTU-3DVision/BoundaryFace
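The closed-set self-correction idea, comparing a sample's ground-truth class center with its nearest negative class center at the decision boundary, can be sketched with a threshold rule. The rule below is an illustrative simplification of the paper's boundary-based criterion, and the threshold value is an assumption.

```python
import numpy as np

def self_correct_labels(cos_theta, labels, tol=0.5):
    """Closed-set noise-label self-correction sketch.

    cos_theta: (N, C) cosines to class centers; labels: (N,) given labels.
    If the nearest negative class center beats the labeled class's cosine
    by more than `tol`, the label is treated as closed-set noise and
    switched to that class; otherwise it is kept.
    """
    corrected = labels.copy()
    for i in range(cos_theta.shape[0]):
        neg = cos_theta[i].copy()
        neg[labels[i]] = -np.inf             # mask out the labeled class
        j = int(np.argmax(neg))              # nearest negative class center
        if neg[j] - cos_theta[i, labels[i]] > tol:
            corrected[i] = j
    return corrected
```

Clean samples sit closer to their labeled center than to any negative center, so the rule leaves them untouched while flipping only confidently mislabeled ones.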
CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition
As an emerging topic in face recognition, designing margin-based loss
functions can increase the feature margin between different classes for
enhanced discriminability. More recently, the idea of mining-based strategies
is adopted to emphasize the misclassified samples, achieving promising results.
However, during training, prior methods either do not explicitly weight
samples by their importance, leaving hard samples not fully exploited, or they
emphasize semi-hard/hard samples even in the early training stage, which may
lead to convergence issues. In this work, we propose a novel Adaptive Curriculum
Learning loss (CurricularFace) that embeds the idea of curriculum learning into
the loss function to achieve a novel training strategy for deep face
recognition, which mainly addresses easy samples in the early training stage
and hard ones in the later stage. Specifically, our CurricularFace adaptively
adjusts the relative importance of easy and hard samples during different
training stages. In each stage, different samples are assigned different
importance according to their difficulty. Extensive
experimental results on popular benchmarks demonstrate the superiority of our
CurricularFace over the state-of-the-art competitors.
Comment: CVPR 202
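The stage-dependent emphasis described above can be sketched as a modulation of hard negative logits. In the sketch, a negative class is "hard" when its cosine exceeds the angular-margin-penalized target cosine, and a hard logit cos_j becomes cos_j * (t + cos_j), where t grows during training (the paper tracks it with a moving average of target cosines). The margin value and exact form are illustrative.

```python
import numpy as np

def curricular_modulate(cos_theta, labels, t, margin=0.5):
    """Curriculum-style modulation of hard negative logits (sketch).

    Early in training (small t), t + cos_j < 1 for typical cosines, so hard
    negatives are down-weighted (easy samples dominate); later (large t)
    the same logits are amplified, shifting focus to hard samples.
    """
    n = cos_theta.shape[0]
    out = cos_theta.astype(float).copy()
    # angular-margin-penalized target cosine: cos(theta_y + margin)
    tgt = np.cos(np.arccos(np.clip(out[np.arange(n), labels], -1.0, 1.0)) + margin)
    hard = out > tgt[:, None]
    hard[np.arange(n), labels] = False
    out[hard] = out[hard] * (t + out[hard])
    out[np.arange(n), labels] = tgt
    return out
```

The same hard logit is thus suppressed or emphasized purely as a function of t, which is what makes the curriculum adaptive rather than fixed.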
Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling
This paper proposes a new method called Multimodal RNNs for RGB-D scene
semantic segmentation. It is optimized to classify image pixels given two input
sources: RGB color channels and Depth maps. It simultaneously performs training
of two recurrent neural networks (RNNs) that are cross-connected through
information transfer layers, which are learnt to adaptively extract relevant
cross-modality features. Each RNN model learns its representations from its own
previous hidden states and from patterns transferred from the other RNN's
previous hidden states; thus, both model-specific and cross-modality features are
retained. We exploit the structure of quad-directional 2D-RNNs to model the
short and long range contextual information in the 2D input image. We carefully
designed various baselines to efficiently examine our proposed model structure.
We test our Multimodal RNNs method on popular RGB-D benchmarks and show how it
outperforms previous methods significantly and achieves competitive results
with other state-of-the-art works.
Comment: 15 pages, 13 figures, IEEE TMM 201
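One cross-connected update through an information-transfer layer can be sketched as follows. The matrix names, the tanh nonlinearity, and the simple additive mixing are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def transfer_step(h_rgb, h_depth, W_self, W_cross, U_self, U_cross):
    """One update of two cross-connected RNN hidden states (sketch).

    Each modality's next hidden state mixes a transform of its own previous
    hidden state with a transform of the other modality's, so model-specific
    and cross-modality information are both retained.
    """
    new_rgb = np.tanh(W_self @ h_rgb + W_cross @ h_depth)
    new_depth = np.tanh(U_self @ h_depth + U_cross @ h_rgb)
    return new_rgb, new_depth
```

In the full model this step runs along the quad-directional 2D scan, so transferred patterns accumulate over both short and long spatial ranges.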
Learning from Higher-Layer Feature Visualizations
Driven by the goal to enable sleep apnea monitoring and machine
learning-based detection at home with small mobile devices, we investigate
whether interpretation-based indirect knowledge transfer can be used to create
classifiers with acceptable performance. Interpretation-based indirect
knowledge transfer means that a classifier (student) learns from a synthetic
dataset based on the knowledge representation from an already trained Deep
Network (teacher). We use activation maximization to generate visualizations
and create a synthetic dataset to train the student classifier. This approach
has the advantage that student classifiers can be trained without access to the
original training data. With experiments we investigate the feasibility of
interpretation-based indirect knowledge transfer and its limitations. The
student achieves an accuracy of 97.8% on MNIST (teacher accuracy: 99.3%) with a
smaller architecture similar to that of the teacher. The student classifier
achieves an accuracy of 86.1% and 89.5% for a subset of the Apnea-ECG dataset
(teacher: 89.5% and 91.1%, respectively).
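The teacher-interpretation step, activation maximization, can be sketched as gradient ascent on an input to maximize a unit's activation under a regularizer. Below, a single linear unit stands in for the teacher network's class output, an illustrative simplification; inputs generated this way would form the synthetic dataset the student trains on.

```python
import numpy as np

def activation_maximization(w, x0, steps=200, lr=0.1, l2=1.0):
    """Gradient-ascend an input to maximize a unit's pre-activation w.x.

    An L2 penalty keeps the synthesized input bounded; for this linear unit
    the objective w.x - (l2/2) * ||x||^2 has the closed-form optimum w / l2,
    which the iteration converges to.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        grad = w - l2 * x          # gradient of w.x - (l2/2) * ||x||^2
        x += lr * grad
    return x
```

For a real deep teacher the gradient would come from backpropagation instead of the closed form, but the loop is the same.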
Deep Boosting Multi-Modal Ensemble Face Recognition with Sample-Level Weighting
Deep convolutional neural networks have achieved remarkable success in face
recognition (FR), partly due to the abundant data availability. However, the
current training benchmarks exhibit an imbalanced quality distribution; most
images are of high quality. This poses issues for generalization on hard
samples since they are underrepresented during training. In this work, we
employ the multi-model boosting technique to deal with this issue. Inspired by
the well-known AdaBoost, we propose a sample-level weighting approach to
incorporate the importance of different samples into the FR loss. Individual
models of the proposed framework are experts at distinct levels of sample
hardness. Therefore, the combination of models leads to a robust feature
extractor without losing the discriminability on the easy samples. Also, for
incorporating the sample hardness into the training criterion, we analytically
show the effect of sample mining on the important aspects of current angular
margin loss functions, i.e., margin and scale. The proposed method shows
superior performance in comparison with the state-of-the-art algorithms in
extensive experiments on the CFP-FP, LFW, CPLFW, CALFW, AgeDB, TinyFace, IJB-B,
and IJB-C evaluation datasets.
Comment: 2023 IEEE International Joint Conference on Biometrics (IJCB)
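The classic AdaBoost re-weighting that the abstract says inspires the sample-level weighting can be sketched in a few lines. This shows the generic boosting update, not the paper's FR-loss integration.

```python
import numpy as np

def adaboost_reweight(weights, missed, alpha):
    """AdaBoost-style sample re-weighting (sketch).

    weights: (N,) current sample weights; missed: (N,) boolean mask of
    samples the current model got wrong; alpha: the model's vote weight.
    Misclassified samples are up-weighted by exp(alpha), so the next
    ensemble member specializes in the harder samples.
    """
    w = weights * np.exp(alpha * missed.astype(float))
    return w / w.sum()
```

Successive models trained under these shifting weights become experts at different hardness levels, which is the ensemble property the abstract appeals to.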
Cartoon Face Recognition: A Benchmark Dataset
Recent years have witnessed increasing attention in cartoon media, powered by
the strong demands of industrial applications. As the first step to understand
this media, cartoon face recognition is a crucial but less-explored task with
few datasets proposed. In this work, we first present a new challenging
benchmark dataset, consisting of 389,678 images of 5,013 cartoon characters
annotated with identity, bounding box, pose, and other auxiliary attributes.
The dataset, named iCartoonFace, is currently the largest-scale, high-quality,
richly annotated dataset in the field of cartoon face recognition, spanning
multiple types of occurrence, including near-duplications, occlusions, and
appearance changes.
In addition, we provide two types of annotations for cartoon media, i.e., face
recognition, and face detection, with the help of a semi-automatic labeling
algorithm. To further investigate this challenging dataset, we propose a
multi-task domain adaptation approach that jointly utilizes the human and
cartoon domain knowledge with three discriminative regularizations. We hence
perform a benchmark analysis of the proposed dataset and verify the superiority
of the proposed approach in the cartoon face recognition task. We believe this
public availability will attract more research attention in broad practical
application scenarios.
Comment: 9 pages, 6 figures