Virtual Class Enhanced Discriminative Embedding Learning
Learning discriminative features to improve recognition performance has
recently become a primary goal of deep learning, and numerous remarkable works
have emerged. In this paper, we propose a novel yet extremely simple method,
Virtual Softmax, to enhance the discriminative property of learned features by
injecting a dynamic virtual negative class into the original softmax. Injecting
the virtual class aims to enlarge the inter-class margin
and compress intra-class distribution by strengthening the decision boundary
constraint. Although it seems weird to optimize with this additional virtual
class, we show that our method derives from an intuitive and clear motivation,
and it indeed encourages the features to be more compact and separable.
Experiments demonstrate the superiority of Virtual Softmax, which improves
performance on a variety of object classification and face verification tasks.
Comment: NeurIPS 201
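To make the idea of an injected virtual negative class concrete, here is a minimal sketch in plain Python. It assumes the virtual class weight is the feature direction rescaled to the norm of the ground-truth class weight, so the extra logit equals ||W_y|| * ||x||. That choice and the names `virtual_softmax_loss`, `weights`, `feature` and `label` are illustrative assumptions; the abstract does not spell out the construction.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def softmax_loss(logits, label):
    # Numerically stable cross-entropy: log-sum-exp minus the target logit.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

def virtual_softmax_loss(weights, feature, label):
    # Ordinary class logits W_j . x
    logits = [dot(w, feature) for w in weights]
    # Assumed virtual class: weight aligned with x and scaled to ||W_label||,
    # which gives the logit ||W_label|| * ||x||.
    virtual_logit = norm(weights[label]) * norm(feature)
    # The virtual class only enters the denominator as an extra negative.
    return softmax_loss(logits + [virtual_logit], label)

weights = [[1.0, 0.0], [0.0, 1.0]]
feature = [2.0, 1.0]
plain = softmax_loss([dot(w, feature) for w in weights], 0)
virtual = virtual_softmax_loss(weights, feature, 0)
```

Because the virtual logit is never smaller than the target logit, the loss with the virtual class is always larger than the plain softmax loss, which is one way to read the "strengthened decision boundary constraint" mentioned above.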
SphereFace: Deep Hypersphere Embedding for Face Recognition
This paper addresses the deep face recognition (FR) problem under the open-set
protocol, where ideal face features are expected to have smaller maximal
intra-class distance than minimal inter-class distance under a suitably chosen
metric space. However, few existing algorithms can effectively achieve this
criterion. To this end, we propose the angular softmax (A-Softmax) loss that
enables convolutional neural networks (CNNs) to learn angularly discriminative
features. Geometrically, A-Softmax loss can be viewed as imposing
discriminative constraints on a hypersphere manifold, which intrinsically
matches the prior that faces also lie on a manifold. Moreover, the size of
angular margin can be quantitatively adjusted by a parameter $m$. We further
derive specific $m$ to approximate the ideal feature criterion. Extensive
analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF)
and MegaFace Challenge show the superiority of A-Softmax loss in FR tasks. The
code has also been made publicly available.
Comment: CVPR 2017 (v4: updated the Appendix)
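The angular margin described above can be sketched as follows. This assumes the commonly cited piecewise form psi(theta) = (-1)^k cos(m * theta) - 2k on the interval [k*pi/m, (k+1)*pi/m], with class weights normalized to unit length. The abstract does not state this formula, so both the form and the names `psi` and `a_softmax_loss` are assumptions for illustration only.

```python
import math

def psi(theta, m):
    # Assumed monotonic replacement for cos(m * theta) on [0, pi].
    k = min(int(theta * m / math.pi), m - 1)  # interval index, capped at m - 1
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(weights, feature, label, m):
    x_norm = math.sqrt(sum(v * v for v in feature))
    logits = []
    for j, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(w, feature)) / (w_norm * x_norm)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding
        if j == label:
            # Target class: angle is multiplied by the margin m via psi.
            logits.append(x_norm * psi(math.acos(cos_t), m))
        else:
            logits.append(x_norm * cos_t)
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

weights = [[1.0, 0.0], [0.0, 1.0]]
feature = [2.0, 1.0]
loss_m1 = a_softmax_loss(weights, feature, 0, 1)  # m = 1: no margin
loss_m4 = a_softmax_loss(weights, feature, 0, 4)  # m = 4: larger margin
```

With m = 1 the function reduces to a softmax over norm-scaled cosines; increasing m makes the same sample harder to classify, so its loss grows.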
Subject Identification Across Large Expression Variations Using 3D Facial Landmarks
Landmark localization is an important first step towards geometry-based
vision research, including subject identification. Considering this, we propose
to use 3D facial landmarks for the task of subject identification across a range
of expressed emotions. Landmarks are detected using a Temporal Deformable Shape
Model and used to train a Support Vector Machine (SVM), Random Forest (RF), and
Long Short-term Memory (LSTM) neural network for subject identification. As we
are interested in subject identification with large variations in expression,
we conducted experiments on 3 emotion-based databases, namely the BU-4DFE,
BP4D, and BP4D+ 3D/4D face databases. We show that our proposed method
outperforms current state-of-the-art methods for subject identification on
BU-4DFE and BP4D. To the best of our knowledge, this is the first work to
investigate subject identification on BP4D+, resulting in a baseline for
the community.
Bi-directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification
RGB-Infrared person re-identification (RGB-IR Re-ID) is a cross-modality
matching problem, where the modality discrepancy is a big challenge. Most
existing works use Euclidean metric based constraints to resolve the
discrepancy between features of images from different modalities. However,
these methods are incapable of learning angularly discriminative feature
embedding because Euclidean distance cannot measure the included angle between
embedding vectors effectively. As an angularly discriminative feature space is
important for classifying the human images based on their embedding vectors, in
this paper, we propose a novel ranking loss function, named Bi-directional
Exponential Angular Triplet Loss, to help learn an angularly separable common
feature space by explicitly constraining the included angles between embedding
vectors. Moreover, to help stabilize and learn the magnitudes of embedding
vectors, we adopt a common space batch normalization layer. The quantitative
and qualitative experiments on the SYSU-MM01 and RegDB datasets support our
analysis. On the SYSU-MM01 dataset, the performance is improved from 7.40% / 11.46%
to 38.57% / 38.61% for rank-1 accuracy / mAP compared with the baseline. The
proposed method can be generalized to the task of single-modality Re-ID and
improves the rank-1 accuracy / mAP from 92.0% / 81.7% to 94.7% / 86.6% on the
Market-1501 dataset, and from 82.6% / 70.6% to 87.6% / 77.1% on the DukeMTMC-reID
dataset. Code: https://github.com/prismformore/expAT
Comment: First Submission: April 2019. The final revision accepted by the IEEE
Transactions on Image Processing in December 202
Angular Softmax Loss for End-to-end Speaker Verification
End-to-end speaker verification systems have received increasing interest.
The traditional i-vector approach trains a generative model (basically a
factor-analysis model) to extract i-vectors as speaker embeddings. In contrast,
the end-to-end approach directly trains a discriminative model (often a neural
network) to learn discriminative speaker embeddings; a crucial component is the
training criterion. In this paper, we use angular softmax (A-softmax), which was
originally proposed for face verification, as the loss function for feature
learning in end-to-end speaker verification. By introducing margins between
classes into softmax loss, A-softmax can learn more discriminative features
than softmax loss and triplet loss, and at the same time is easy and stable
to use. We make two contributions in this work. 1) We introduce A-softmax
loss into end-to-end speaker verification and achieve significant EER
reductions. 2) We find that the combination of using A-softmax in training the
front-end and using PLDA in the back-end scoring further boosts the performance
of end-to-end systems under the short-utterance condition (short in both enrollment
and test). Experiments are conducted on part of dataset and
demonstrate the improvements of using A-softmax.
Fix Your Features: Stationary and Maximally Discriminative Embeddings using Regular Polytope (Fixed Classifier) Networks
Neural networks are widely used as a model for classification in a large
variety of tasks. Typically, a learnable transformation (i.e. the classifier)
is placed at the end of such models returning a value for each class used for
classification. This transformation plays an important role in determining how
the generated features change during the learning process.
In this work we argue that this transformation not only can be fixed (i.e.
set as non-trainable) with no loss of accuracy, but it can also be used to
learn stationary and maximally discriminative embeddings.
We show that the stationarity of the embedding and its maximal discriminative
representation can be theoretically justified by setting the weights of the
fixed classifier to values taken from the coordinate vertices of three regular
polytopes available in $\mathbb{R}^d$, namely: the $d$-Simplex, the $d$-Cube
and the $d$-Orthoplex. These regular polytopes have the maximal amount of
symmetry that can be exploited to generate stationary features angularly
centered around their corresponding fixed weights.
Our approach improves and broadens the concept of a fixed classifier,
recently proposed in \cite{hoffer2018fix}, to a larger class of fixed
classifier models. Experimental results confirm both the theoretical analysis
and the generalization capability of the proposed method.
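As a small illustration of a fixed, non-trainable classifier built from polytope vertices, the sketch below constructs the orthoplex case, whose vertices are the signed standard basis vectors. It takes `d` as the feature dimension (an assumed symbol) and uses hypothetical helper names `orthoplex_weights` and `fixed_logits`; it is not the authors' implementation and covers only one of the three polytopes.

```python
def orthoplex_weights(d):
    # Vertices of the orthoplex in d dimensions: +e_i and -e_i for each axis,
    # giving 2 * d fixed class weight vectors of unit length.
    rows = []
    for i in range(d):
        for sign in (1.0, -1.0):
            row = [0.0] * d
            row[i] = sign
            rows.append(row)
    return rows

def fixed_logits(weights, feature):
    # The classifier is a constant matrix: only the feature extractor
    # producing `feature` would be trained.
    return [sum(w_k * x_k for w_k, x_k in zip(w, feature)) for w in weights]

W = orthoplex_weights(3)
logits = fixed_logits(W, [0.5, -2.0, 0.1])
predicted = max(range(len(logits)), key=lambda j: logits[j])
```

Every pair of distinct weight vectors here is either orthogonal or opposite, which is the kind of symmetry the abstract refers to.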
L2-constrained Softmax Loss for Discriminative Face Verification
In recent years, the performance of face verification systems has
significantly improved using deep convolutional neural networks (DCNNs). A
typical pipeline for face verification includes training a deep network for
subject classification with softmax loss, using the penultimate layer output as
the feature descriptor, and generating a cosine similarity score given a pair
of face images. The softmax loss function does not optimize the features to
have higher similarity score for positive pairs and lower similarity score for
negative pairs, which leads to a performance gap. In this paper, we add an
L2-constraint to the feature descriptors which restricts them to lie on a
hypersphere of a fixed radius. This module can be easily implemented using
existing deep learning frameworks. We show that integrating this simple step in
the training pipeline significantly boosts the performance of face
verification. Specifically, we achieve state-of-the-art results on the
challenging IJB-A dataset, achieving a True Accept Rate of 0.909 at a False Accept
Rate of 0.0001 on the face verification protocol. Additionally, we achieve
state-of-the-art performance on the LFW dataset with an accuracy of 99.78%, and
competitive performance on the YTF dataset with an accuracy of 96.08%.
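The L2-constraint described above amounts to rescaling each feature descriptor to a fixed norm before the classification layer. A minimal sketch follows; the radius name `alpha` and the helper names are assumptions, and the value 16.0 is arbitrary.

```python
import math

def l2_constrain(feature, alpha):
    # Project the descriptor onto a hypersphere of radius alpha.
    n = math.sqrt(sum(v * v for v in feature))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [alpha * v / n for v in feature]

def cosine_similarity(a, b):
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

f1 = [3.0, 4.0]
f2 = [1.0, 0.5]
c1 = l2_constrain(f1, alpha=16.0)
c2 = l2_constrain(f2, alpha=16.0)
radius = math.sqrt(sum(v * v for v in c1))
```

Rescaling leaves the cosine similarity between two descriptors unchanged, so the constraint affects training through the softmax logits rather than the verification score itself.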
ReadNet: Towards Accurate ReID with Limited and Noisy Samples
Person re-identification (ReID) is an essential cross-camera retrieval task
to identify pedestrians. However, the number of photos per pedestrian usually
differs drastically, and thus the data limitation and imbalance problem hinders
the prediction accuracy greatly. Additionally, in real-world applications,
pedestrian images are captured by different surveillance cameras, so noisy
camera-related information, such as lighting, perspective and resolution,
results in inevitable domain gaps for ReID algorithms. These challenges bring
difficulties to current deep learning methods with triplet loss for coping with
such problems. To address these challenges, this paper proposes ReadNet, an
adversarial camera network (ACN) with an angular triplet loss (ATL). In detail,
ATL focuses on learning the angular distance among different identities to
mitigate the effect of data imbalance, and guarantees a linear decision
boundary as well, while ACN takes the camera discriminator as a game opponent
of feature extractor to filter camera related information to bridge the
multi-camera gaps. ReadNet is designed to be flexible so that either ATL or ACN
can be deployed independently or simultaneously. Experimental results on
various benchmark datasets show that ReadNet can deliver better
prediction performance than current state-of-the-art methods.
Tackling Early Sparse Gradients in Softmax Activation Using Leaky Squared Euclidean Distance
Softmax activation is commonly used to output the probability distribution
over categories based on a certain distance metric. In scenarios like one-shot
learning, the distance metric is often chosen to be squared Euclidean distance
between the query sample and the category prototype. This practice works well
most of the time. However, we find that choosing squared Euclidean distance may
cause distance explosion, making gradients extremely sparse in the early
stage of backpropagation. We term this phenomenon the early sparse gradients
problem. Though it doesn't deteriorate the convergence of the model, it may set
up a barrier to further model improvement. To tackle this problem, we propose
to use leaky squared Euclidean distance to impose a restriction on distances.
In this way, we can avoid distance explosion and increase the magnitude of
gradients. Extensive experiments are conducted on Omniglot and miniImageNet
datasets. We show that using leaky squared Euclidean distance can improve
one-shot classification accuracy on both datasets.
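The sparsity effect described above can be reproduced with a toy example. The sketch below shows only the problem, not the proposed leaky distance, whose definition is not given in the abstract. It assumes logits equal to the negative squared Euclidean distance to each prototype and uses the standard cross-entropy gradient with respect to the logits (probabilities minus the one-hot target). The prototypes, the threshold and all names are made up for illustration.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_gradient(query, prototypes, label):
    # Logits are negative squared distances; gradient is p - one_hot(label).
    probs = softmax([-squared_distance(query, p) for p in prototypes])
    return [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]

def active_entries(grad, threshold=1e-3):
    # Count gradient entries that are not numerically negligible.
    return sum(1 for g in grad if abs(g) > threshold)

query = [0.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 1.2], [1.5, 1.5], [2.0, 0.0]]
far_prototypes = [[10.0 * v for v in p] for p in prototypes]  # distances x100

near_count = active_entries(logit_gradient(query, prototypes, 0))
far_count = active_entries(logit_gradient(query, far_prototypes, 0))
```

When the embeddings are spread out, the squared distances grow quadratically, the softmax saturates, and fewer gradient entries carry a usable signal than in the compact case.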