Unique Faces Recognition in Videos
This paper tackles face recognition in videos employing metric learning
methods and similarity ranking models. The paper compares the use of the
Siamese network with contrastive loss and Triplet Network with triplet loss
implementing the following architectures: Google/Inception architecture, 3D
Convolutional Network (C3D), and a 2-D Long short-term memory (LSTM) Recurrent
Neural Network. We make use of still images and sequences from videos for
training the networks and compare the performances implementing the above
architectures. The dataset used was the YouTube Face Database designed for
investigating the problem of face recognition in videos. The contribution of
this paper is two-fold. First, the experiments establish that 3-D
Convolutional networks and 2-D LSTMs with contrastive loss on image
sequences do not outperform the Google/Inception architecture with contrastive loss
in top-rank face retrievals with still images, whereas the 3-D Convolutional
networks and 2-D LSTM with triplet loss outperform Google/Inception with
triplet loss in top-rank face retrievals on the dataset. Second, a Support
Vector Machine (SVM) was used in conjunction with the CNNs' learned feature
representations for facial identification. The results show that feature
representation learned with triplet loss is significantly better for n-shot
facial identification compared to contrastive loss. The most useful feature
representations for facial identification are from the 2-D LSTM with triplet
loss. The experiments show that learning spatio-temporal features from video
sequences is beneficial for facial recognition in videos.
Comment: Paper was accepted into Fusion 2020 conference but will only be
published after the virtual conference in July 2020. 7 pages long
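The two objectives compared in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: the embeddings are plain lists of floats, and the function names and margin values are assumptions chosen for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same_identity, margin=1.0):
    """Pairwise loss used with Siamese networks (margin value is an assumption).

    Matching pairs are pulled together; non-matching pairs are pushed
    apart until their distance exceeds the margin.
    """
    d = euclidean(a, b)
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss (margin value is an assumption).

    The positive must be closer to the anchor than the negative by at
    least the margin, measured in squared Euclidean distance.
    """
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)
```

The contrastive loss constrains absolute distances for each pair, while the triplet loss only constrains the relative ordering of distances within a triplet.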
Learning Face Representation from Scratch
Pushed by big data and deep convolutional neural networks (CNNs), the
performance of face recognition is becoming comparable to that of humans. Using private
large-scale training datasets, several groups achieve very high performance on
LFW, i.e., 97% to 99%. While there are many open-source implementations of CNNs,
no large-scale face dataset is publicly available. The current situation
in the field of face recognition is that data is more important than algorithms.
To solve this problem, this paper proposes a semi-automatic way to collect
face images from the Internet and builds a large-scale dataset containing about
10,000 subjects and 500,000 images, called CASIAWebFace. Based on the database,
we use an 11-layer CNN to learn discriminative representations and obtain
state-of-the-art accuracy on LFW and YTF. The publication of CASIAWebFace will
attract more research groups to this field and accelerate the development
of face recognition in the wild.
SphereFace: Deep Hypersphere Embedding for Face Recognition
This paper addresses the deep face recognition (FR) problem under the open-set
protocol, where ideal face features are expected to have smaller maximal
intra-class distance than minimal inter-class distance under a suitably chosen
metric space. However, few existing algorithms can effectively achieve this
criterion. To this end, we propose the angular softmax (A-Softmax) loss that
enables convolutional neural networks (CNNs) to learn angularly discriminative
features. Geometrically, A-Softmax loss can be viewed as imposing
discriminative constraints on a hypersphere manifold, which intrinsically
matches the prior that faces also lie on a manifold. Moreover, the size of
the angular margin can be quantitatively adjusted by a parameter m. We further
derive specific m to approximate the ideal feature criterion. Extensive
analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF)
and MegaFace Challenge show the superiority of A-Softmax loss in FR tasks. The
code has also been made publicly available.
Comment: CVPR 2017 (v4: updated the Appendix)
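The angular margin can be illustrated with a small sketch. The abstract does not give the formula, so the piecewise function below follows the commonly cited form psi(theta) = (-1)^k cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m], with class weights treated as unit-norm and biases omitted; these details, the function names, and the default m = 4 are assumptions to be checked against the paper.

```python
import math

def psi(theta, m=4):
    """Monotonically decreasing extension of cos(m * theta) on [0, pi]."""
    # k indexes the interval [k*pi/m, (k+1)*pi/m] that contains theta.
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(feature, weights, label, m=4):
    """Negative log-probability of `label` with an angular margin on its logit."""
    norm_x = math.sqrt(sum(v * v for v in feature))
    logits = []
    for j, w in enumerate(weights):
        norm_w = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(feature, w)) / (norm_x * norm_w)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
        theta = math.acos(cos_t)
        # Only the target class receives the margin; the others use cos(theta).
        logits.append(norm_x * (psi(theta, m) if j == label else cos_t))
    peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]
```

With m = 1 the function reduces to a softmax over normalised weights; larger m makes the target-class logit harder to satisfy, which is what forces a larger angular gap between classes.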
Successive Embedding and Classification Loss for Aerial Image Classification
Deep neural networks can be an effective means to automatically classify aerial
images but easily overfit to the training data. It is critical for trained
neural networks to be robust to variations that exist between training and test
environments. To address the overfitting problem in aerial image
classification, we consider the neural network as successive transformations of
an input image into embedded feature representations and ultimately into a
semantic class label, and train neural networks to optimize image
representations in the embedded space in addition to optimizing the final
classification score. We demonstrate that networks trained with this dual
embedding and classification loss outperform networks with classification loss
only. We
also find that moving the embedding loss from commonly-used feature space to
the classifier space, which is the space just before softmax nonlinearization,
leads to the best classification performance for aerial images. Visualizations
of the network's embedded representations reveal that the embedding loss
encourages greater separation between target class clusters for both training
and testing partitions of two aerial image classification benchmark datasets,
MSTAR and AID. Our code is publicly available on GitHub.
EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition
Facial expressions carry essential cues for inferring a person's state of mind
and convey adequate information to understand an individual's actual feelings.
Thus, automatic facial expression recognition is an interesting and crucial
task for interpreting a human's cognitive state by machine. In this
paper, we propose an Exigent Features Preservative Network (EXPERTNet) to
describe the features of facial expressions. EXPERTNet extracts only
pertinent features and neglects others by using an exigent feature (ExFeat) block,
which mainly comprises an elective layer. Specifically, the elective layer selects the
desired edge variation features from the previous layer's outputs, which are
generated by applying filters of different sizes: 1 x 1, 3 x 3, 5 x 5 and 7 x 7.
Filters of different sizes help to elicit both micro and high-level features that
enhance the learnability of neurons. The ExFeat block preserves the spatial
structural information of the facial expression, which allows the network to discriminate
between different classes of facial expressions. Visual representations of the
proposed method over different facial expressions show the learning capability
of the neurons of different layers. Experimental and comparative analysis
results over four comprehensive datasets, CK+, MMI, DISFA and GEMEP-FERA, confirm
the better performance of the proposed network compared to existing
networks.
Anchor-based Nearest Class Mean Loss for Convolutional Neural Networks
Discriminative features are critical for machine learning applications. Most
existing deep learning approaches, however, rely on convolutional neural
networks (CNNs) for learning features, whose discriminant power is not
explicitly enforced. In this paper, we propose a novel approach to train deep
CNNs by imposing the intra-class compactness and the inter-class separability,
so as to enhance the learned features' discriminant power. To this end, we
introduce anchors, which are predefined vectors regarded as the centers for
each class and fixed during training. Discriminative features are obtained by
constraining the deep CNNs to map training samples as close as possible to the
corresponding anchors. We propose two principles to select the anchors, and
measure the proximity of two points using the Euclidean and cosine distance
metric functions, which results in two novel loss functions. These loss
functions require no sample pairs or triplets and can be efficiently optimized
by batch stochastic gradient descent. We test the proposed method on three
benchmark image classification datasets and demonstrate its promising results.
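A minimal sketch of the anchor idea, assuming scaled one-hot vectors as the fixed anchors. The abstract does not state the two anchor selection principles, so `make_anchors`, its `scale` parameter, and the exact loss forms below are hypothetical stand-ins for illustration only.

```python
import math

def make_anchors(num_classes, scale=10.0):
    """Hypothetical anchor choice: one scaled one-hot vector per class, never updated."""
    return [[scale if i == c else 0.0 for i in range(num_classes)]
            for c in range(num_classes)]

def euclidean_anchor_loss(feature, anchor):
    """Squared Euclidean distance between a sample's feature and its class anchor."""
    return sum((f - a) ** 2 for f, a in zip(feature, anchor))

def cosine_anchor_loss(feature, anchor):
    """Cosine distance (1 - cosine similarity) between feature and class anchor."""
    dot = sum(f * a for f, a in zip(feature, anchor))
    norm_f = math.sqrt(sum(f * f for f in feature))
    norm_a = math.sqrt(sum(a * a for a in anchor))
    return 1.0 - dot / (norm_f * norm_a)

def predict(feature, anchors):
    """Nearest class mean rule: return the index of the closest anchor."""
    return min(range(len(anchors)),
               key=lambda c: euclidean_anchor_loss(feature, anchors[c]))
```

Each loss term depends on a single sample and its own class anchor, which is why no pairs or triplets need to be mined.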
von Mises-Fisher Mixture Model-based Deep learning: Application to Face Verification
A number of pattern recognition tasks, e.g., face verification, can
be boiled down to classification or clustering of unit length directional
feature vectors whose distance can be simply computed by their angle. In this
paper, we propose the von Mises-Fisher (vMF) mixture model as the theoretical
foundation for an effective deep-learning of such directional features and
derive a novel vMF Mixture Loss and its corresponding vMF deep features. The
proposed vMF feature learning achieves the characteristics of discriminative
learning, i.e., compacting the instances of the same class while
increasing the distance of instances from different classes. Moreover, it
subsumes a number of popular loss functions as well as an effective method in
deep learning, namely normalization. We conduct extensive experiments on face
verification using 4 different challenging face datasets, i.e., LFW,
YouTube faces, CACD and IJB-A. Results show the effectiveness and excellent
generalization ability of the proposed approach as it achieves state-of-the-art
results on the LFW, YouTube faces and CACD datasets and competitive results on
the IJB-A dataset.
Comment: Under review
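For reference, the standard von Mises-Fisher density for a unit vector x in d dimensions, with mean direction mu and concentration kappa, is given below, followed by the class posterior of a mixture under two simplifying assumptions made here for illustration (equal mixture weights and a shared kappa). The abstract does not state the paper's exact loss, so the second expression is only a sketch of why such a model relates to a softmax over normalised features.

```latex
f(\mathbf{x} \mid \boldsymbol{\mu}, \kappa)
  = C_d(\kappa)\, \exp\!\left(\kappa\, \boldsymbol{\mu}^{\top} \mathbf{x}\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\qquad
\lVert \mathbf{x} \rVert = \lVert \boldsymbol{\mu} \rVert = 1

p(j \mid \mathbf{x})
  = \frac{\exp\!\left(\kappa\, \boldsymbol{\mu}_j^{\top} \mathbf{x}\right)}
         {\sum_{l=1}^{M} \exp\!\left(\kappa\, \boldsymbol{\mu}_l^{\top} \mathbf{x}\right)}
```

Here I denotes the modified Bessel function of the first kind and M the number of mixture components. The normalising constant cancels in the posterior because kappa is shared across components.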
Towards Distortion-Predictable Embedding of Neural Networks
Current research in Computer Vision has shown that Convolutional Neural
Networks (CNN) give state-of-the-art performance in many classification tasks
and Computer Vision problems. The embedding of CNN, which is the internal
representation produced by the last layer, can indirectly learn topological and
relational properties. Moreover, by using a suitable loss function, CNN models
can learn invariance to a wide range of non-linear distortions such as
rotation, viewpoint angle or lighting condition. In this work, new insights are
discovered about CNN embeddings and a new loss function is proposed, derived
from the contrastive loss, that creates models with more predictable mappings
and also quantifies distortions. In typical distortion-dependent methods, there
is no simple relation between the features corresponding to one image and the
features of its distorted version. Therefore, these methods require
feed-forwarding inputs under every distortion in order to find the corresponding
feature representations. Our contribution makes a step towards embeddings
where features of distorted inputs are related and can be derived from each
other by the intensity of the distortion.
Comment: 54 pages, 28 figures. Master project at EPFL (Switzerland) in 2015.
For source code on GitHub, see https://github.com/axel-angel/master-projec
Automated Simulations of Galaxy Morphology Evolution using Deep Learning and Particle Swarm Optimisation
The formation of Hoag-type galaxies with central spheroidal galaxies and
outer stellar rings has yet to be understood in astronomy. We consider that
these unique objects were formed from the past interaction between elliptical
galaxies and gas-rich dwarf galaxies. We have modelled this potential formation
process through simulation. These numerical simulations are a means of
investigating this formation hypothesis; however, the parameter space to be
explored for these simulations is vast. Through the application of machine
learning and computational science, we implement a new two-fold method to find
the best model parameters for stellar rings in the simulations. First, test
particle simulations are run to find a possible range of parameters for which
stellar rings can be formed around elliptical galaxies (i.e. Hoag-type
galaxies). A novel combination of particle swarm optimisation and Siamese
neural networks has been implemented to perform the search over the parameter
space and test the level of consistency between observations and simulations
for numerous models. Upon the success of this initial step, we subsequently run
full chemodynamical simulations for the derived range of model parameters in
order to verify the output of the test particle simulations. We successfully
find parameter sets at which stellar rings can be formed from the interaction
between a gas-rich dwarf galaxy and a central elliptical galaxy. This is
evidence that supports our hypothesis about the formation process of Hoag-type
galaxies. In addition, this suggests that our new two-fold method has been
successfully implemented in this problem search-space and can be investigated
further in future applications.
Comment: 32 pages: Master thesis at UWA (Computer science)
Contrastive-center loss for deep neural networks
The deep convolutional neural network (CNN) has significantly raised the
performance of image classification and face recognition. Softmax is usually
used as supervision, but it only penalizes the classification loss. In this
paper, we propose a novel auxiliary supervision signal called contrastive-center
loss, which can further enhance the discriminative power of the features, for
it learns a class center for each class. The proposed contrastive-center loss
simultaneously considers intra-class compactness and inter-class separability,
by penalizing the contrastive values between: (1) the distances of training
samples to their corresponding class centers, and (2) the sum of the distances
of training samples to their non-corresponding class centers. Experiments on
different datasets demonstrate the effectiveness of contrastive-center loss.
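One way to read the two quantities in this abstract is as a ratio: the squared distance to the sample's own class center divided by the summed squared distances to all other centers. The sketch below takes that reading as an assumption; the constant `delta` (added to keep the denominator away from zero), the 0.5 factor, and the function names are likewise illustrative and not confirmed by the abstract.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_center_loss(features, labels, centers, delta=1.0):
    """Ratio of own-center distance to summed other-center distances, per sample."""
    total = 0.0
    for x, y in zip(features, labels):
        own = squared_distance(x, centers[y])
        others = sum(squared_distance(x, c)
                     for j, c in enumerate(centers) if j != y)
        # Small when x is near its own center and far from all the others.
        total += own / (others + delta)
    return 0.5 * total
```

In training, such a term would be added to the softmax loss with a weighting factor, and the centers would be learned jointly with the network.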