Feature Incay for Representation Regularization
Softmax loss is widely used in deep neural networks for multi-class
classification, where each class is represented by a weight vector, a sample is
represented as a feature vector, and the feature vector has the largest
projection on the weight vector of the correct category when the model
correctly classifies a sample. To ensure generalization, weight decay, which
shrinks the weight norm, is often used as a regularizer. Unlike traditional
learning algorithms, where the features are fixed and only the weights are
tunable, deep learning also tunes the features through representation learning.
Thus, we propose feature incay to also regularize representation learning,
which favors feature vectors with large norm when the samples can be correctly
classified. With the feature incay, feature vectors are further pushed away
from the origin along the direction of their corresponding weight vectors,
which achieves better inter-class separability. In addition, the proposed
feature incay encourages intra-class compactness along the directions of the
weight vectors by increasing small feature norms faster than large ones.
Empirical results on MNIST, CIFAR10, and CIFAR100 demonstrate that feature
incay can improve generalization ability.
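As a minimal sketch of the idea, the loss below adds a reciprocal-norm penalty to softmax cross-entropy for correctly classified samples; the exact penalty form and the `incay_weight` coefficient are illustrative assumptions, not the paper's formulation.

```python
import math

def softmax_cross_entropy(logits, label):
    # numerically stable log-sum-exp, then negative log-probability of the label
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def feature_incay_loss(feature, weights, label, incay_weight=0.01):
    """Softmax loss plus a penalty on the reciprocal squared feature norm,
    applied only when the sample is correctly classified, which pushes such
    features away from the origin (assumed form, for illustration)."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, feature)) for w in weights]
    ce = softmax_cross_entropy(logits, label)
    correct = max(range(len(logits)), key=logits.__getitem__) == label
    norm_sq = sum(x * x for x in feature)
    incay = incay_weight / norm_sq if correct else 0.0
    return ce + incay
```

Because the penalty is inversely proportional to the squared norm, its gradient is larger for small norms, so small feature norms grow faster than large ones, matching the intra-class compactness argument above.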
A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification
Despite the growing popularity of metric learning approaches, very little
work has attempted to perform a fair comparison of these techniques for speaker
verification. We try to fill this gap and compare several metric learning loss
functions in a systematic manner on the VoxCeleb dataset. The first family of
loss functions is derived from the cross entropy loss (usually used for
supervised classification) and includes the congenerous cosine loss, the
additive angular margin loss, and the center loss. The second family of loss
functions focuses on the similarity between training samples and includes the
contrastive loss and the triplet loss. We show that the additive angular margin
loss function outperforms all other loss functions in the study, while learning
more robust representations. Based on a combination of SincNet trainable
features and the x-vector architecture, the network used in this paper brings
us a step closer to a truly end-to-end speaker verification system, when
combined with the additive angular margin loss, while still being competitive
with the x-vector baseline. In the spirit of reproducible research, we also
release open source Python code for reproducing our results, and share
pretrained PyTorch models on torch.hub that can be used either directly or
after fine-tuning
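The additive angular margin loss that wins this comparison can be sketched as cross-entropy over scaled cosines with a margin added to the target-class angle. This is a minimal illustration assuming unit-normalized embeddings and class weights; the scale `s=30` and margin `m=0.2` are placeholder values, not the paper's settings.

```python
import math

def additive_angular_margin_loss(cosines, label, s=30.0, m=0.2):
    """Cross-entropy over scaled cosine logits, with margin m added to the
    target-class angle: cos(theta_y + m) replaces cos(theta_y)."""
    theta_y = math.acos(max(-1.0, min(1.0, cosines[label])))
    logits = [s * c for c in cosines]
    logits[label] = s * math.cos(theta_y + m)
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[label]
```

Since cos(theta_y + m) < cos(theta_y) for angles in (0, pi - m), the margin shrinks the target logit and demands a larger angular gap between classes before the loss becomes small.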
Ensemble Soft-Margin Softmax Loss for Image Classification
Softmax loss is arguably one of the most popular losses to train CNN models
for image classification. However, recent works have exposed its limitation on
feature discriminability. This paper casts a new viewpoint on the weakness of
softmax loss. On the one hand, the CNN features learned using the softmax loss
are often inadequately discriminative. We hence introduce a soft-margin softmax
function to explicitly encourage the discrimination between different classes.
On the other hand, the learned classifier of the softmax loss is weak. We propose
to assemble multiple such weak classifiers into a strong one, inspired by the
recognition that the diversity among weak classifiers is critical to a good
ensemble. To achieve the diversity, we adopt the Hilbert-Schmidt Independence
Criterion (HSIC). Considering these two aspects in one framework, we design a
novel loss, named Ensemble soft-Margin Softmax (EM-Softmax). Extensive
experiments on benchmark datasets are conducted to show the superiority of our
design over the baseline softmax loss and several state-of-the-art
alternatives.
Comment: Accepted by IJCAI 2018.
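The soft-margin part of the design can be sketched as cross-entropy with a margin subtracted from the target logit, so the correct class must beat the others by at least `m`. This sketch omits the HSIC-based ensemble term, and `m=0.5` is an illustrative value.

```python
import math

def soft_margin_softmax_loss(logits, label, m=0.5):
    """Cross-entropy with margin m subtracted from the target logit,
    encouraging w_y^T x to exceed the other logits by at least m."""
    adj = list(logits)
    adj[label] -= m
    mx = max(adj)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in adj))
    return log_z - adj[label]
```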
A Performance Comparison of Loss Functions for Deep Face Recognition
Face recognition is one of the most widely publicized features in devices
today and hence represents an important problem that should be studied with the
utmost priority. As per recent trends, Convolutional Neural Network (CNN)
based approaches are highly successful in many Computer Vision tasks, including
face recognition. The loss function is used on top of the CNN to judge the
goodness of any network. In this paper, we present a performance
comparison of different loss functions such as Cross-Entropy, Angular Softmax,
Additive-Margin Softmax, ArcFace and Marginal Loss for face recognition. The
experiments are conducted with two CNN architectures namely, ResNet and
MobileNet. Two widely used face datasets, namely CASIA-WebFace and MS-Celeb-1M,
are used for training, and the benchmark Labeled Faces in the Wild (LFW)
dataset is used for testing.
Comment: Accepted in the NCVPRIPG 2019 Conference.
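Several of the losses compared here follow one template: cross-entropy over modified cosine logits. As one example, here is a sketch of Additive-Margin Softmax, with illustrative (not the paper's) scale and margin values.

```python
import math

def am_softmax_loss(cosines, label, s=30.0, m=0.35):
    """Additive-Margin Softmax: subtract m from the target-class cosine,
    then scale all cosines by s before cross-entropy."""
    logits = [s * (c - m) if j == label else s * c
              for j, c in enumerate(cosines)]
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[label]
```

Angular Softmax and ArcFace fit the same template but modify the target angle multiplicatively or additively instead of shifting the cosine directly.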
Anchor-based Nearest Class Mean Loss for Convolutional Neural Networks
Discriminative features are critical for machine learning applications. Most
existing deep learning approaches, however, rely on convolutional neural
networks (CNNs) for learning features, whose discriminant power is not
explicitly enforced. In this paper, we propose a novel approach to train deep
CNNs by imposing the intra-class compactness and the inter-class separability,
so as to enhance the learned features' discriminant power. To this end, we
introduce anchors, which are predefined vectors regarded as the centers for
each class and fixed during training. Discriminative features are obtained by
constraining the deep CNNs to map training samples to the corresponding anchors
as close as possible. We propose two principles to select the anchors, and
measure the proximity of two points using the Euclidean and cosine distance
metric functions, which results in two novel loss functions. These loss
functions require no sample pairs or triplets and can be efficiently optimized
by batch stochastic gradient descent. We test the proposed method on three
benchmark image classification datasets and demonstrate its promising results.
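The anchor idea can be sketched with fixed per-class anchor vectors and a squared Euclidean pull toward the correct anchor. Using one-hot anchors in the example below is an assumption for illustration; the paper proposes its own two selection principles and also a cosine-distance variant.

```python
def anchor_loss_euclidean(feature, anchors, label):
    """Pull the feature toward its class anchor via squared Euclidean
    distance; anchors are predefined vectors fixed during training."""
    a = anchors[label]
    return sum((x - t) ** 2 for x, t in zip(feature, a))

def predict(feature, anchors):
    """Classify by nearest anchor (nearest class mean with fixed means)."""
    dists = [sum((x - t) ** 2 for x, t in zip(feature, a)) for a in anchors]
    return min(range(len(dists)), key=dists.__getitem__)
```

Note that the loss involves only a sample and its own class anchor, so no sample pairs or triplets need to be mined, as the abstract emphasizes.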
NormFace: L2 Hypersphere Embedding for Face Verification
Thanks to the recent developments of Convolutional Neural Networks, the
performance of face verification methods has increased rapidly. In a typical
face verification method, feature normalization is a critical step for boosting
performance. This motivates us to introduce and study the effect of
normalization during training. But we find this is non-trivial, despite
normalization being differentiable. We identify and study four issues related
to normalization through mathematical analysis, which yields understanding and
helps with parameter settings. Based on this analysis we propose two strategies
for training using normalized features. The first is a modification of softmax
loss, which optimizes cosine similarity instead of inner-product. The second is
a reformulation of metric learning by introducing an agent vector for each
class. We show that both strategies, and small variants, consistently improve
performance by 0.2% to 0.4% on the LFW dataset with two models.
This is significant because the performance of the two models on the LFW
dataset is already close to saturation at over 98%. Code and models are
released at
https://github.com/happynear/NormFace
Comment: camera-ready version.
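The first strategy (a cosine softmax) can be sketched as cross-entropy over scaled cosine similarities between the L2-normalized feature and L2-normalized class weights. The scale `s=20` below is an illustrative value, though the need for such a scale is one of the issues the paper's analysis uncovers.

```python
import math

def normface_loss(feature, weights, label, s=20.0):
    """Softmax cross-entropy over scaled cosine similarities between the
    L2-normalized feature and the L2-normalized class weight vectors."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) + 1e-12
        return [x / n for x in v]
    f = normalize(feature)
    logits = [s * sum(a * b for a, b in zip(f, normalize(w))) for w in weights]
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[label]
```

Because both sides are normalized, the loss depends only on angles: rescaling the feature leaves the loss unchanged, which is exactly what makes the scale parameter `s` necessary for trainable gradients.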
P2SGrad: Refined Gradients for Optimizing Deep Face Models
Cosine-based softmax losses significantly improve the performance of deep
face recognition networks. However, these losses always include sensitive
hyper-parameters that can make the training process unstable, and it is very
tricky to set suitable hyper-parameters for a specific dataset. This paper
addresses this challenge by directly designing the gradients for adaptively
training deep neural networks. We first investigate and unify previous cosine
softmax losses by analyzing their gradients. This unified view inspires us to
propose a novel gradient called P2SGrad (Probability-to-Similarity Gradient),
which leverages cosine similarity, the metric used at test time, instead of
classification probability to directly form the gradients for updating neural
network parameters.
P2SGrad is adaptive and hyper-parameter free, which makes the training process
more efficient and faster. We evaluate our P2SGrad on three face recognition
benchmarks, LFW, MegaFace, and IJB-C. The results show that P2SGrad is stable
in training, robust to noise, and achieves state-of-the-art performance on all
the three benchmarks.
Comment: Accepted by CVPR 2019.
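Under my reading of the abstract, the refined gradient replaces the probability-based coefficient of softmax losses with one built from the cosine similarity itself. The coefficient below is an assumption about that replacement, not a verified restatement of the paper.

```python
def p2sgrad_coefficients(cosines, label):
    """P2SGrad-style per-class gradient coefficients (assumed form):
    the scalar multiplying d(cos theta_j)/d(params) is cos(theta_j) - 1
    for the target class and cos(theta_j) for the others, so the update
    vanishes only when the target cosine reaches 1 and the rest reach 0."""
    return [c - 1.0 if j == label else c for j, c in enumerate(cosines)]
```

Note there is no scale or margin hyper-parameter anywhere in the expression, which is the property the abstract highlights.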
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations
The cosine-based softmax losses and their variants achieve great success in
deep learning based face recognition. However, hyperparameter settings in these
losses have significant influences on the optimization path as well as the
final recognition performance. Manually tuning those hyperparameters heavily
relies on user experience and requires many training tricks. In this paper, we
investigate in depth the effects of two important hyperparameters of
cosine-based softmax losses, the scale parameter and angular margin parameter,
by analyzing how they modulate the predicted classification probability. Based
on this analysis, we propose a novel cosine-based softmax loss, AdaCos, which
is hyperparameter-free and leverages an adaptive scale parameter to
automatically strengthen the training supervision during the training process.
We apply the proposed AdaCos loss to large-scale face verification and
identification datasets, including LFW, MegaFace, and IJB-C 1:1 Verification.
Our results show that training deep neural networks with the AdaCos loss is
stable and able to achieve high face recognition accuracy. Our method
outperforms state-of-the-art softmax losses on all three datasets.
Comment: CVPR 2019 Oral.
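The adaptive scale can be sketched as a per-iteration update computed from batch statistics of the cosine logits. The median clamp and the exact update below are my assumptions about the dynamic variant, with illustrative inputs.

```python
import math

def adacos_scale(cosines_batch, labels, prev_s):
    """One AdaCos-style scale update (assumed form):
    s = ln(B_avg) / cos(theta_med), where B_avg averages the non-target
    exp(prev_s * cos) sums over the batch and theta_med is the median
    target-class angle, clamped at pi/4."""
    b_vals, thetas = [], []
    for cosines, y in zip(cosines_batch, labels):
        b_vals.append(sum(math.exp(prev_s * c)
                          for j, c in enumerate(cosines) if j != y))
        thetas.append(math.acos(max(-1.0, min(1.0, cosines[y]))))
    b_avg = sum(b_vals) / len(b_vals)
    thetas.sort()
    theta_med = min(thetas[len(thetas) // 2], math.pi / 4)
    return math.log(b_avg) / math.cos(theta_med)
```

The effect is that the scale is chosen so the predicted target probability stays in a sensitive range as the angles shrink during training, instead of being fixed by hand.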
Towards Interpretable Face Recognition
Deep CNNs have been pushing the frontier of visual recognition over the past
years. Besides recognition accuracy, strong demands in understanding deep CNNs
in the research community motivate developments of tools to dissect pre-trained
models to visualize how they make predictions. Recent works further push the
interpretability in the network learning stage to learn more meaningful
representations. In this work, focusing on a specific area of visual
recognition, we report our efforts towards interpretable face recognition. We
propose a spatial activation diversity loss to learn more structured face
representations. By leveraging the structure, we further design a feature
activation diversity loss to push the interpretable representations to be
discriminative and robust to occlusions. We demonstrate on three face
recognition benchmarks that our proposed method is able to improve face
recognition accuracy with easily interpretable face representations.
Comment: 10 pages, 9 figures, 6 tables. To appear in ICCV 2019 as an oral paper.
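One plausible instantiation of a spatial activation diversity loss (an assumption on my part, since the abstract gives no formula) penalizes pairwise cosine overlap between filters' flattened spatial activation maps, pushing different filters to respond at different face locations.

```python
import math

def spatial_diversity_loss(maps):
    """Sum of pairwise cosine similarities between flattened spatial
    activation maps: zero when filters fire at disjoint locations,
    large when they all respond at the same place."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) + 1e-12
        return [x / n for x in v]
    nm = [normalize(m) for m in maps]
    loss = 0.0
    for i in range(len(nm)):
        for j in range(i + 1, len(nm)):
            loss += sum(a * b for a, b in zip(nm[i], nm[j]))
    return loss
```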
Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation
Speech translation (ST) aims to learn transformations from speech in the
source language to the text in the target language. Previous works show that
multitask learning improves the ST performance, in which the recognition
decoder generates the text of the source language, and the translation decoder
obtains the final translations based on the output of the recognition decoder.
Because it is more critical that the output of the recognition decoder carries
the correct semantics than that it is word-for-word accurate, we propose to
improve the multitask ST model by using word embeddings as the intermediate
representation.
Comment: Accepted by ACL 2020.
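The word-embedding intermediate can be sketched as replacing the recognition decoder's hard cross-entropy target with a regression toward pretrained embeddings, so semantically close outputs are penalized less. The tiny embedding table and cosine-distance loss below are assumptions for illustration, not the paper's exact setup.

```python
import math

# tiny illustrative embedding table (assumed, not from the paper)
EMB = {
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "car": [0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def embedding_loss(predicted_vec, target_word):
    """Cosine-distance loss to the target word's embedding: a prediction
    near a semantically similar word is penalized less than one near an
    unrelated word, unlike 0/1 cross-entropy targets."""
    return 1.0 - cosine(predicted_vec, EMB[target_word])
```

Here `embedding_loss(EMB["kitten"], "cat")` is far smaller than `embedding_loss(EMB["car"], "cat")`: a recognition error that preserves semantics costs little, which is the worse-WER-but-better-BLEU tradeoff in the title.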