Recent Advances in Deep Learning Techniques for Face Recognition
In recent years, researchers have proposed many deep learning (DL) methods
for various tasks, and particularly face recognition (FR) made an enormous leap
using these techniques. Deep FR systems benefit from the hierarchical
architecture of the DL methods to learn discriminative face representation.
Consequently, DL techniques have significantly improved the state-of-the-art performance
of FR systems and enabled diverse and efficient real-world applications. In this
paper, we present a comprehensive analysis of various FR systems that leverage
the different types of DL techniques, and for the study, we summarize 168
recent contributions from this area. We discuss the papers related to different
algorithms, architectures, loss functions, activation functions, datasets,
challenges, improvement ideas, current and future trends of DL-based FR
systems. We provide a detailed discussion of various DL methods to understand
the current state-of-the-art, and then we discuss various activation and loss
functions for the methods. Additionally, we summarize different datasets used
widely for FR tasks and discuss challenges related to illumination, expression,
pose variations, and occlusion. Finally, we discuss improvement ideas, current
and future trends of FR tasks.
Comment: 32 pages; citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613
SmartyFlow - Robust Facial Biometrics for Virtual Identification
Identity theft is an ever-increasing problem in our society. Thus, it is necessary to ensure
that the existing authentication methods are secure against presentation attacks. The
proposed thesis aims to study authentication methods based on facial biometrics, more
specifically, facial verification. Nonetheless, despite being a rather modern method, it
is also vulnerable to security attacks, in particular, to face spoofing. Several approaches
have recently emerged that use liveness checks to detect such threats.
So, in the context of this thesis, liveness will be detected through a video of an
individual's face, using the heart rate estimated through Eulerian Video Magnification
(EVM). The heart-rate signal is then classified using two different types of deep
neural networks: a Convolutional Neural Network (CNN) and a Temporal Convolutional
Network (TCN). By using this detection technique, it is possible to ensure a higher
level of resilience to presentation attacks, considering that heart rate is a
physiological characteristic that is difficult to forge.
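The thesis does not reproduce its network architecture here, but the building block that lets a TCN classify a long heart-rate sequence is the causal dilated 1-D convolution. A minimal numpy sketch, assuming nothing beyond the standard TCN construction (function name and toy filter are illustrative, not from the thesis):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """Causal dilated 1-D convolution: the output at time t only sees
    x[t], x[t-d], x[t-2d], ... so no future samples leak into the prediction."""
    k = len(w)
    pad = (k - 1) * dilation                      # left-pad so output stays causal
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

# Stacking such layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially, which is why TCNs cope with long heart-rate windows.
y = causal_dilated_conv([1, 2, 3, 4], [1, 1], dilation=2)  # y[t] = x[t] + x[t-2]
```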
Besides classifying the estimated heart-rate signal, an efficient way to further
increase the robustness of the implemented models in detecting presentation attacks
was developed. To achieve this, on the basis of Adversarial Training, a Deep
Convolutional Generative Adversarial Network (DCGAN) was developed, which allows the
creation of artificial heart-rate signals.
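The DCGAN itself is not specified in detail here; conventionally its generator is built from fractionally-strided (transposed) convolutions that upsample a noise vector into a signal. A one-dimensional numpy sketch of that operation, with illustrative names and shapes:

```python
import numpy as np

def transposed_conv1d(z, w, stride=2):
    """Fractionally-strided 1-D convolution: each input sample 'stamps' the
    kernel w into the output at stride-spaced positions, upsampling z."""
    z, w = np.asarray(z, dtype=float), np.asarray(w, dtype=float)
    out = np.zeros(stride * (len(z) - 1) + len(w))
    for i, zi in enumerate(z):
        out[i * stride : i * stride + len(w)] += zi * w
    return out

# A DCGAN-style generator stacks several of these (with learned kernels and
# nonlinearities) to turn low-dimensional noise into a full-length signal.
```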
As a result, it was concluded that the TCN is more appropriate for this task
(achieving 90.17 efficacy without artificial signals) and that the introduction of
artificial signals produced by the DCGAN does in fact improve the robustness of the
model (achieving 93.55 efficacy).
Augmented Deep Representations for Unconstrained Still/Video-based Face Recognition
Face recognition is one of the active areas of research in computer vision and biometrics. Many approaches have been proposed in the literature that demonstrate impressive performance, especially those based on deep learning. However, unconstrained face recognition with large pose, illumination, occlusion and other variations is still an unsolved problem. Unconstrained video-based face recognition is even more challenging due to the large volume of data to be processed, the lack of labeled training data, and significant intra/inter-video variations in scene, blur, video quality, etc. Although Deep Convolutional Neural Networks (DCNNs) have provided discriminative representations for faces and achieved performance surpassing humans in controlled scenarios, modifications are necessary for face recognition in unconstrained conditions. In this dissertation, we propose several methods that improve unconstrained face recognition performance by augmenting the representation provided by the deep networks using correlation or contextual information in the data.
For unconstrained still face recognition, we present an encoding approach to combine the Fisher vector (FV) encoding and DCNN representations, which is called FV-DCNN. The feature maps from the last convolutional layer in the deep network are encoded by FV into a robust representation, which utilizes the correlation between facial parts within each face. A VLAD-based encoding method called VLAD-DCNN is also proposed as an extension. Extensive evaluations on three challenging face recognition datasets show that the proposed FV-DCNN and VLAD-DCNN perform comparably to or better than many state-of-the-art face verification methods.
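As a concrete illustration of the encoding idea, here is a minimal numpy sketch of plain VLAD over a set of local descriptors; VLAD-DCNN applies this kind of encoding to conv-layer feature maps, and the cluster centers would normally come from k-means on training features (names here are illustrative):

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """VLAD: assign each local descriptor to its nearest center, accumulate
    the residuals per center, then L2-normalise the flattened vector."""
    K, D = centers.shape
    v = np.zeros((K, D))
    for x in descriptors:
        k = np.argmin(np.linalg.norm(centers - x, axis=1))
        v[k] += x - centers[k]            # residual to the assigned center
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```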
For the more challenging video-based face recognition task, we first propose an automatic system and model the video-to-video similarity as subspace-to-subspace similarity, where the subspaces characterize the correlation between deep representations of faces in videos. In the system, a quality-aware subspace-to-subspace similarity is introduced, where subspaces are learned using quality-aware principal component analysis. Subspaces along with quality-aware exemplars of templates are used to produce the similarity scores between video pairs by a quality-aware principal angle-based subspace-to-subspace similarity metric. The method is evaluated on four video datasets. The experimental results demonstrate the superior performance of the proposed method.
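The principal-angle machinery behind such a subspace-to-subspace similarity is standard linear algebra; a small numpy sketch, without the quality weighting that is specific to the dissertation:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between span(A) and span(B): orthonormalise both
    bases, then the singular values of Qa^T Qb are the cosines of the angles."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# A simple similarity between two video templates could then be, e.g.,
# the product of the cosines: np.prod(np.cos(principal_angles(A, B))).
```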
To utilize the temporal information in videos, a hybrid dictionary learning method is also proposed for video-based face recognition. The proposed unsupervised approach effectively models the temporal correlation between deep representations of video faces using dynamical dictionaries. A practical iterative optimization algorithm is introduced to learn the dynamical dictionary. Experiments on three video-based face recognition datasets demonstrate that the proposed method can effectively learn robust and discriminative representation for videos and improve the face recognition performance.
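The hybrid/dynamical details are specific to the dissertation, but the inner loop of any dictionary-learning scheme alternates sparse coding against a fixed dictionary with a dictionary update. A greedy matching-pursuit sketch of the sparse-coding half, using a toy unit-norm dictionary and illustrative names:

```python
import numpy as np

def sparse_code(x, D, n_atoms=2):
    """Matching pursuit: greedily pick the (unit-norm) dictionary atom most
    correlated with the residual and subtract its contribution."""
    r = np.asarray(x, dtype=float).copy()
    a = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        c = D.T @ r                      # correlation of residual with each atom
        k = int(np.argmax(np.abs(c)))
        a[k] += c[k]
        r = r - c[k] * D[:, k]
    return a, r                          # sparse codes and final residual
```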
Finally, to leverage contextual information in videos, we present the Uncertainty-Gated Graph (UGG) for unconstrained video-based face recognition. It utilizes contextual information between faces by conducting graph-based identity propagation between sample tracklets, where identity information is initialized by the deep representations of video faces. UGG explicitly models the uncertainty of the contextual connections between tracklets by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at inference time only or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.
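Stripped of the uncertainty gating (here the edge weights are simply fixed rather than adaptively updated), the underlying identity propagation is ordinary graph label propagation. A numpy sketch with illustrative names:

```python
import numpy as np

def propagate_identities(P0, W, anchors, alpha=0.9, n_iter=100):
    """Graph-based identity propagation: each node holds an identity
    distribution; anchor nodes (known identity) stay fixed while the rest
    repeatedly absorb their neighbours' edge-weighted distributions."""
    W = W / W.sum(axis=1, keepdims=True)          # row-stochastic edge weights
    P = P0.copy()
    for _ in range(n_iter):
        P = alpha * (W @ P) + (1 - alpha) * P0    # propagate + pull toward prior
        P[anchors] = P0[anchors]                  # anchors keep their identity
    return P
```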
Deep visual learning with spike-timing dependent plasticity
For most animal species, reliable and fast visual pattern recognition is vital for
their survival. The ventral stream, a primary pathway within the visual cortex, plays
an important role in object representation and form recognition. It is a hierarchical
system consisting of various visual areas, each of which extracts a different level of
abstraction. It is known that the neurons within the ventral stream use spikes to
represent these abstractions. To increase the level of realism in a neural simulation,
a spiking neural network (SNN) is often used as the neural network model. From the SNN
point of view, the analog output values generated by a traditional artificial neural
network (ANN) can be considered as average firing rates. Unlike a traditional ANN, an
SNN can use not only spiking rates but also specific spike-timing sequences to
represent the structural information of the input visual stimuli, which greatly
increases their distinguishability.
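The rate-coding correspondence mentioned above, an ANN activation read as an average firing rate, is often simulated by drawing a Poisson spike train whose rate equals the analog value. A small numpy sketch:

```python
import numpy as np

def poisson_spikes(rate_hz, duration_s, dt=0.001, rng=None):
    """Rate coding: emit a 0/1 spike train whose expected spike count is
    rate_hz * duration_s; the timing carries no information, only the rate."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_steps = int(duration_s / dt)
    return (rng.random(n_steps) < rate_hz * dt).astype(int)

# A temporal (SNN-style) code would instead place spikes at specific times,
# so two trains with the same count can still represent different stimuli.
```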
To simulate the learning procedure of the ventral stream, various research questions
need to be resolved. In most cases, traditional methods use a winner-take-all strategy
to distinguish different classes. However, such a strategy does not work well for
classes that overlap in the decision space. Moreover, neurons within the ventral
stream tend to recognize new input visual stimuli in a limited time window, which
requires a fast learning procedure. Furthermore, within the ventral stream, neurons
receive continuous input visual stimuli and can only access local information during
the learning procedure. However, most traditional methods use separate visual stimuli
as the input and incorporate global information within the learning period. Finally,
to verify the universality of the proposed SNN framework, it is necessary to
investigate its classification performance on complex real-world tasks such as
video-based disguise face recognition.
To address the above problems, a novel classification method inspired by the soft
winner-take-all strategy has been proposed first, in which each associated class is
assigned a probability and the input visual stimulus is classified as the class with
the highest probability. Moreover, to achieve a fast learning procedure, a novel
feed-forward SNN framework equipped with an unsupervised spike-timing-dependent
plasticity (STDP) learning rule has been proposed. Furthermore, an event-driven
continuous STDP (ECS) learning method has been proposed, in which two novel
continuous input mechanisms generate continuous input visual stimuli and a new
event-driven STDP learning rule based on local information is applied within the
training procedure. Finally, these methodologies have also been extended to the
video-based disguise face recognition (VDFR) task, in which human identities are
recognized not just from a few images but from video sequences showing facial muscle
movements while speaking.
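For reference, the pair-based STDP rule that such frameworks build on is usually written as an exponential window: potentiation when the presynaptic spike precedes the postsynaptic one, depression otherwise. A numpy sketch with commonly used (illustrative, not thesis-specific) constants:

```python
import numpy as np

def stdp_dw(dt_ms, a_plus=0.05, a_minus=0.055, tau_ms=20.0):
    """Weight change for spike-time difference dt = t_post - t_pre (ms):
    dt > 0 (pre before post) potentiates, dt <= 0 depresses, and the
    magnitude decays exponentially with |dt|."""
    dt = np.asarray(dt_ms, dtype=float)
    return np.where(dt > 0,
                    a_plus * np.exp(-dt / tau_ms),
                    -a_minus * np.exp(dt / tau_ms))
```

The rule is local: it needs only the two spike times of one synapse, which is what makes event-driven, online variants like ECS possible.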