10 research outputs found
Ensemble Soft-Margin Softmax Loss for Image Classification
Softmax loss is arguably one of the most popular losses to train CNN models
for image classification. However, recent works have exposed its limitation on
feature discriminability. This paper offers a new viewpoint on the weakness of
softmax loss. On the one hand, the CNN features learned using the softmax loss
are often inadequately discriminative. We hence introduce a soft-margin softmax
function to explicitly encourage discrimination between different classes.
On the other hand, the learned classifier of softmax loss is weak. We propose
to assemble multiple such weak classifiers into a strong one, inspired by the
recognition that diversity among weak classifiers is critical to a good
ensemble. To achieve this diversity, we adopt the Hilbert-Schmidt Independence
Criterion (HSIC). Considering these two aspects in one framework, we design a
novel loss, named Ensemble soft-Margin Softmax (EM-Softmax). Extensive
experiments on benchmark datasets are conducted to show the superiority of our
design over the baseline softmax loss and several state-of-the-art
alternatives.
Comment: Accepted by IJCAI 201
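The abstract does not give the exact formula; a minimal sketch of the soft-margin idea, assuming the common form that subtracts a fixed margin m from the ground-truth logit, is:

```python
import numpy as np

def soft_margin_softmax_loss(logits, label, m=0.5):
    """Soft-margin softmax: subtract a margin m from the target logit
    before normalizing, so the target score must beat the others by at
    least m to achieve the same loss as plain softmax (m=0)."""
    z = logits.astype(float).copy()
    z[label] -= m                       # penalize the ground-truth logit
    z -= z.max()                        # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

logits = np.array([4.0, 1.0, 0.5])
plain = soft_margin_softmax_loss(logits, 0, m=0.0)   # ordinary softmax loss
margin = soft_margin_softmax_loss(logits, 0, m=0.5)  # stricter objective
```

For the same prediction, the margin variant yields a larger loss, pushing features of different classes further apart; the margin value here is illustrative.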
Skeleton-Based Action Recognition with Synchronous Local and Non-local Spatio-temporal Learning and Frequency Attention
Benefiting from its succinctness and robustness, skeleton-based action
recognition has recently attracted much attention. Most existing methods
utilize local networks (e.g., recurrent, convolutional, and graph convolutional
networks) to extract spatio-temporal dynamics hierarchically. As a consequence,
the local and non-local dependencies, which contain more details and semantics
respectively, are captured asynchronously at different levels of layers.
Moreover, existing methods are limited to the spatio-temporal domain and ignore
information in the frequency domain. To better extract synchronous detailed and
semantic information from multi-domains, we propose a residual frequency
attention (rFA) block to focus on discriminative patterns in the frequency
domain, and a synchronous local and non-local (SLnL) block to simultaneously
capture the details and semantics in the spatio-temporal domain. Besides, a
soft-margin focal loss (SMFL) is proposed to optimize the whole learning
process, which automatically conducts data selection and encourages intrinsic
margins in classifiers. Our approach significantly outperforms other
state-of-the-art methods on several large-scale datasets.
Comment: 6 pages, 4 figures; accepted to ICME 201
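The precise SMFL formulation is not given in the abstract; a plausible sketch that combines a soft margin on the target logit with focal down-weighting (the `m` and `gamma` values are illustrative assumptions) is:

```python
import numpy as np

def soft_margin_focal_loss(logits, label, m=0.35, gamma=2.0):
    """Sketch of a soft-margin focal loss: a margin m on the target
    logit encourages intrinsic class margins, while the focal factor
    (1 - p_t)^gamma down-weights easy samples, which acts as an
    automatic data-selection mechanism during training."""
    z = logits.astype(float).copy()
    z[label] -= m                       # margin on the ground-truth logit
    z -= z.max()                        # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    pt = p[label]
    return -((1.0 - pt) ** gamma) * np.log(pt)

easy = soft_margin_focal_loss(np.array([6.0, 0.0, 0.0]), 0)  # confident sample
hard = soft_margin_focal_loss(np.array([1.0, 0.9, 0.8]), 0)  # ambiguous sample
```

The focal factor makes the confident sample contribute almost nothing, so optimization focuses on the ambiguous one.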
A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing
Face anti-spoofing is essential to prevent face recognition systems from
security breaches. Much of the progress has been made possible by the
availability of face anti-spoofing benchmark datasets in recent years. However,
existing face anti-spoofing benchmarks have a limited number of subjects ()
and modalities (), which hinders the further development of
the academic community. To facilitate face anti-spoofing research, we introduce
a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest
publicly available dataset for face anti-spoofing in terms of both subjects and
visual modalities. Specifically, it consists of subjects with
videos, and each sample has three modalities (i.e., RGB, Depth and IR). We also
provide a measurement set, evaluation protocol and training/validation/testing
subsets, developing a new benchmark for face anti-spoofing. Moreover, we
present a new multi-modal fusion method as a baseline, which performs feature
re-weighting to select the more informative channel features while suppressing
the less useful ones for each modality. Extensive experiments have been conducted
on the proposed dataset to verify its significance and generalization
capability. The dataset is available at
https://sites.google.com/qq.com/chalearnfacespoofingattackdete
Comment: CVPR 2019 Camera Ready
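As a rough illustration of the described channel re-weighting, here is a squeeze/excitation-style gating over concatenated modality features; the actual learned fusion parameters are not specified in the abstract, so an identity gating stands in for them:

```python
import numpy as np

def reweight_fuse(feats):
    """feats: dict mapping modality name -> (C, H, W) feature map.
    Concatenate per-modality features along the channel axis, then
    compute per-channel weights from globally pooled responses, so
    informative channels are amplified and weak ones suppressed."""
    x = np.concatenate(list(feats.values()), axis=0)  # (C_total, H, W)
    squeeze = x.mean(axis=(1, 2))                     # global average pool
    # toy gating: sigmoid of the pooled response; in the real model a
    # small learned network would produce these weights
    gate = 1.0 / (1.0 + np.exp(-squeeze))
    return x * gate[:, None, None]

feats = {"rgb": np.ones((2, 4, 4)),
         "depth": np.full((2, 4, 4), 2.0),
         "ir": np.zeros((2, 4, 4))}
fused = reweight_fuse(feats)
```

Channels with stronger pooled responses (here the depth features) receive higher weights, mimicking the selection of more informative channels per modality.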
Decision Propagation Networks for Image Classification
High-level (e.g., semantic) features encoded in the later layers of
convolutional neural networks are extensively exploited for image
classification, leaving low-level (e.g., color) features in the early layers
underexplored. In this paper, we propose a novel Decision Propagation Module
(DPM) to make an intermediate decision that acts as category-coherent
guidance extracted from the early layers and is then propagated to the later
layers. By stacking a collection of DPMs into a classification
network, the resulting Decision Propagation Network is explicitly formulated
to progressively encode more discriminative features guided by the decision,
and then to refine the decision based on the newly generated features layer by
layer. Comprehensive results on four publicly available datasets validate that
DPM brings significant improvements to existing classification networks with
minimal additional computational cost and is superior to state-of-the-art
methods.
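A toy sketch of the propagation idea, assuming the intermediate decision is a softmax over early-layer features that is concatenated into a later layer; the actual DPM design is not detailed in the abstract:

```python
import numpy as np

def decision_propagation(early_feat, W_early, late_feat):
    """Make an intermediate class decision from early-layer features
    and propagate it to a later layer by concatenation, so the later
    layer is conditioned on a category-coherent guess. W_early is a
    hypothetical small classifier on the early features."""
    logits = early_feat @ W_early
    z = logits - logits.max()                    # stable softmax
    decision = np.exp(z) / np.exp(z).sum()       # intermediate decision
    return np.concatenate([late_feat, decision]) # propagate forward

out = decision_propagation(np.array([1.0, 2.0]),
                           np.array([[1.0, 0.0, 0.0],
                                     [0.0, 1.0, 0.0]]),
                           np.zeros(5))
```

Stacking several such modules lets each later stage both consume and refine the running decision, which is the progressive encode/refine loop the abstract describes.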
Implicit Semantic Data Augmentation for Deep Networks
In this paper, we propose a novel implicit semantic data augmentation (ISDA)
approach to complement traditional augmentation techniques like flipping,
translation or rotation. Our work is motivated by the intriguing property that
deep networks are surprisingly good at linearizing features, such that certain
directions in the deep feature space correspond to meaningful semantic
transformations, e.g., adding sunglasses or changing backgrounds. As a
consequence, translating training samples along many semantic directions in the
feature space can effectively augment the dataset to improve generalization. To
implement this idea effectively and efficiently, we first perform an online
estimate of the covariance matrix of deep features for each class, which
captures the intra-class semantic variations. Then random vectors are drawn
from a zero-mean normal distribution with the estimated covariance to augment
the training data in that class. Importantly, instead of augmenting the samples
explicitly, we can directly minimize an upper bound of the expected
cross-entropy (CE) loss on the augmented training set, leading to a highly
efficient algorithm. In fact, we show that the proposed ISDA amounts to
minimizing a novel robust CE loss, which adds negligible extra computational
cost to a normal training procedure. Despite its simplicity, ISDA consistently
improves the generalization performance of popular deep models (ResNets and
DenseNets) on a variety of datasets, e.g., CIFAR-10, CIFAR-100 and ImageNet.
Code for reproducing our results is available at
https://github.com/blackfeather-wang/ISDA-for-Deep-Networks.
Comment: Accepted by NeurIPS 201
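The explicit (sampling) view of ISDA can be sketched as follows; the paper itself avoids this sampling by minimizing a closed-form upper bound of the expected CE loss, so the sketch is for intuition only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deep features for one class; in ISDA the per-class covariance is
# estimated online during training rather than from a fixed batch.
feats = rng.normal(size=(200, 8))
cov = np.cov(feats, rowvar=False)   # intra-class semantic variation

def isda_augment(x, cov, lam=0.5, n=5):
    """Translate a sample along random semantic directions drawn from
    N(0, lam * cov), i.e. directions consistent with the intra-class
    variation, to produce semantically augmented copies."""
    dirs = rng.multivariate_normal(np.zeros(len(x)), lam * cov, size=n)
    return x + dirs

aug = isda_augment(feats[0], cov)
```

Because the directions follow the class's own covariance, the augmented points stay on plausible semantic variations of the sample, which is what makes the implicit closed-form version of the loss possible.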
Mis-classified Vector Guided Softmax Loss for Face Recognition
Face recognition has witnessed significant progress due to the advances of
deep convolutional neural networks (CNNs), the central task of which is how to
improve feature discrimination. To this end, several margin-based
(e.g., angular, additive and additive angular margins) softmax loss
functions have been proposed to increase the feature margin between different
classes. However, despite the great achievements made, these losses mainly suffer
from three issues: 1) they ignore the importance of mining informative
features for discriminative learning; 2) they encourage the feature
margin only from the ground-truth class, without exploiting the discriminability
of other, non-ground-truth classes; 3) the feature margin between different
classes is set to be the same and fixed, which may not adapt well to all
situations. To cope with these issues, this paper develops a novel loss function,
which adaptively emphasizes the mis-classified feature vectors to guide the
discriminative feature learning. Thus we can address all the above issues and
achieve more discriminative face features. To the best of our knowledge, this
is the first attempt to inherit the advantages of feature margin and feature
mining into a unified loss function. Experimental results on several benchmarks
have demonstrated the effectiveness of our method over state-of-the-art
alternatives.
Comment: Accepted by AAAI 2020 as oral presentation. arXiv admin note:
substantial text overlap with arXiv:1812.1131
Support Vector Guided Softmax Loss for Face Recognition
Face recognition has witnessed significant progress due to the advances of
deep convolutional neural networks (CNNs), the central challenge of which is
feature discrimination. To address it, one group tries to exploit mining-based
strategies (e.g., hard example mining and focal loss) to focus on the
informative examples. The other group is devoted to designing margin-based loss
functions (e.g., angular, additive and additive angular margins) to
increase the feature margin from the perspective of ground truth class. Both of
them have been well-verified to learn discriminative features. However, they
suffer from either the ambiguity of hard examples or the lack of discriminative
power of other classes. In this paper, we design a novel loss function, namely
support vector guided softmax loss (SV-Softmax), which adaptively emphasizes
the mis-classified points (support vectors) to guide discriminative
feature learning. The developed SV-Softmax loss is thus able to eliminate the
ambiguity of hard examples as well as absorb the discriminative power of other
classes, resulting in more discriminative features. To the best of our
knowledge, this is the first attempt to inherit the advantages of mining-based
and margin-based losses into one framework. Experimental results on several
benchmarks have demonstrated the effectiveness of our approach over
state-of-the-art alternatives.
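A simplified sketch of the support-vector emphasis described in this and the preceding abstract; a plain multiplicative factor `t` stands in for the paper's exact reweighting function, so treat this as an illustration of the mechanism, not the published formula:

```python
import numpy as np

def sv_softmax_loss(logits, label, t=1.2):
    """Non-target classes whose logit exceeds the target logit are
    'support vectors' (mis-classified directions); their contribution
    to the softmax denominator is amplified by t > 1, so training
    focuses on exactly the classes the sample is confused with."""
    z = logits.astype(float) - logits.max()   # numerical stability
    e = np.exp(z)
    mask = logits > logits[label]             # indicator of support vectors
    mask[label] = False
    e[mask] *= t                              # emphasize support vectors
    return -np.log(e[label] / e.sum())

sample = np.array([1.0, 2.0, 0.5])            # class 1 beats the label (0)
loss_plain = sv_softmax_loss(sample, 0, t=1.0)
loss_sv = sv_softmax_loss(sample, 0, t=1.5)
```

With t=1.0 the loss reduces to plain softmax; t>1 raises the loss only when a support vector exists, which is how mining and margin effects combine in one term.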
Recent Advances in Large Margin Learning
This paper serves as a survey of recent advances in large margin training and
its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs)
that are probably the most prominent machine learning models for large-scale
data in the community over the past decade. We generalize the formulation of
classification margins from classical research to the latest DNNs, summarize
theoretical connections between the margin, network generalization, and
robustness, and introduce recent efforts in enlarging the margins for DNNs
comprehensively. Since different methods take discrepant viewpoints, we
categorize them into groups for ease of comparison and discussion in the paper.
Hopefully, our discussion and overview will inspire new research in the
community aiming to improve the performance of DNNs, and we also point to
directions where the large margin principle can be verified to provide
theoretical evidence for why certain regularizations for DNNs work well in
practice. We have kept the paper short so that the crucial spirit of large
margin learning and related methods is better emphasized.
Comment: 8 pages, 3 figures
CASIA-SURF: A Large-scale Multi-modal Benchmark for Face Anti-spoofing
Face anti-spoofing is essential to prevent face recognition systems from
security breaches. Much of the progress has been made possible by the
availability of face anti-spoofing benchmark datasets in recent years. However,
existing face anti-spoofing benchmarks have a limited number of subjects ()
and modalities (), which hinders the further development of
the academic community. To facilitate face anti-spoofing research, we introduce
a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest
publicly available dataset for face anti-spoofing in terms of both subjects and
modalities. Specifically, it consists of subjects with videos,
and each sample has three modalities (i.e., RGB, Depth and IR). We also provide
comprehensive evaluation metrics, diverse evaluation protocols,
training/validation/testing subsets and a measurement tool, developing a new
benchmark for face anti-spoofing. Moreover, we present a novel multi-modal
multi-scale fusion method as a strong baseline, which performs feature
re-weighting to select the more informative channel features while suppressing
the less useful ones for each modality across different scales. Extensive
experiments have been conducted on the proposed dataset to verify its
significance and generalization capability. The dataset is available at
https://sites.google.com/qq.com/face-anti-spoofing/welcome/challengecvpr2019?authuser=0
Comment: Accepted by TBIOM; journal extension of our previous conference
paper: arXiv:1812.0040
Adversarial Margin Maximization Networks
The tremendous recent success of deep neural networks (DNNs) has sparked a
surge of interest in understanding their predictive ability. Unlike the human
visual system which is able to generalize robustly and learn with little
supervision, DNNs normally require a massive amount of data to learn new
concepts. In addition, research has also shown that DNNs are vulnerable to
adversarial examples: maliciously generated images that seem perceptually
similar to natural ones but are crafted to fool learning models,
which means the models have trouble generalizing to unseen data with certain
types of distortions. In this paper, we analyze the generalization ability of
DNNs comprehensively and attempt to improve it from a geometric point of view.
We propose adversarial margin maximization (AMM), a learning-based
regularization which exploits an adversarial perturbation as a proxy. It
encourages a large margin in the input space, just like support vector
machines. With a differentiable formulation of the perturbation, we train the
regularized DNNs simply through back-propagation in an end-to-end manner.
Experimental results on various datasets (including MNIST, CIFAR-10/100, SVHN
and ImageNet) and different DNN architectures demonstrate the superiority of
our method over previous state-of-the-art methods. Code and models for
reproducing our results will be made publicly available.
Comment: 11 pages + 1 page appendix, accepted by T-PAMI
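The perturbation-as-margin-proxy idea can be illustrated on a linear classifier, where the first-order estimate of the input-space margin is exact; for a DNN, AMM uses an adversarial perturbation in the same role and maximizes it as a regularizer:

```python
import numpy as np

def linearized_margin(x, W, label):
    """First-order estimate of the input-space margin: the size of the
    smallest perturbation that moves x across a decision boundary of a
    linear classifier (rows of W are class weight vectors). This is the
    quantity an adversarial perturbation approximates for a DNN."""
    scores = W @ x
    margins = []
    for j in range(len(W)):
        if j == label:
            continue
        gap = scores[label] - scores[j]                 # score difference
        grad_norm = np.linalg.norm(W[label] - W[j])     # boundary normal
        margins.append(gap / grad_norm)                 # distance to boundary j
    return min(margins)

# Two classes separated by the plane x[0] = 0; the point sits 2.0 away.
margin = linearized_margin(np.array([2.0, 0.0]),
                           np.array([[1.0, 0.0], [-1.0, 0.0]]), 0)
```

Maximizing this distance during training pushes decision boundaries away from the data, which is the geometric generalization argument the paper builds on.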