
    Recent Advances in Deep Learning Techniques for Face Recognition

    In recent years, researchers have proposed many deep learning (DL) methods for various tasks, and face recognition (FR) in particular has made an enormous leap using these techniques. Deep FR systems benefit from the hierarchical architecture of DL methods to learn discriminative face representations. DL techniques have therefore significantly improved the state-of-the-art performance of FR systems and encouraged diverse and efficient real-world applications. In this paper, we present a comprehensive analysis of FR systems that leverage different types of DL techniques, summarizing 168 recent contributions from this area. We discuss papers on different algorithms, architectures, loss functions, activation functions, datasets, challenges, improvement ideas, and current and future trends of DL-based FR systems. We provide a detailed discussion of various DL methods to understand the current state of the art, and then discuss the activation and loss functions used by these methods. Additionally, we summarize datasets widely used for FR tasks and discuss challenges related to illumination, expression, pose variations, and occlusion. Finally, we discuss improvement ideas and current and future trends of FR tasks.

    Comment: 32 pages; citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613
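    The loss functions this survey covers for deep FR are dominated by margin-based softmax variants. As an illustration of that family (not code from the paper), here is a minimal PyTorch sketch of an ArcFace-style additive angular margin loss; the class name and the scale/margin hyperparameters s and m are illustrative choices, not values taken from the survey.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ArcMarginLoss(nn.Module):
        """Additive angular margin (ArcFace-style) softmax loss sketch.

        Embeddings and class weights are L2-normalized so logits are the
        cosines of the angles between them; a margin m is added to the
        ground-truth-class angle before rescaling by s.
        """
        def __init__(self, embed_dim: int, num_classes: int,
                     s: float = 64.0, m: float = 0.5):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
            self.s, self.m = s, m

        def forward(self, embeddings: torch.Tensor,
                    labels: torch.Tensor) -> torch.Tensor:
            # Cosine similarities between normalized embeddings and weights: (B, C)
            cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
            theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
            # Add the angular margin only to the ground-truth class.
            one_hot = F.one_hot(labels, cos.size(1)).bool()
            logits = torch.where(one_hot, torch.cos(theta + self.m), cos) * self.s
            return F.cross_entropy(logits, labels)

    # Usage: loss = ArcMarginLoss(512, 10000)(face_embeddings, identity_labels)
    ```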

    Illumination Processing in Face Recognition


    SmartyFlow - Biometria Facial Robusta para Identificação Virtual

    Identity theft is an ever-growing problem in our society, so it is necessary to ensure that existing authentication methods are secure against presentation attacks. This thesis studies authentication methods based on facial biometrics, more specifically facial verification. Despite being a modern method, it is also vulnerable to security attacks, in particular face spoofing. Several approaches have recently emerged that use liveness checks to detect such threats. In the context of this thesis, liveness is detected from a video of an individual's face, using the heart rate estimated through Eulerian Video Magnification (EVM). The heart-rate signal is then classified using two different types of deep neural networks: a Convolutional Neural Network (CNN) and a Temporal Convolutional Network (TCN). This detection technique provides greater resilience to presentation attacks, since heart rate is a physiological characteristic that is difficult to forge. Beyond classifying the estimated heart-rate signal, an efficient way to further improve the robustness of the implemented models in detecting presentation attacks was developed: building on adversarial training, a Deep Convolutional Generative Adversarial Network (DCGAN) was developed to generate artificial heart signals. The results show that the TCN is more appropriate for this task (achieving 90.17% efficacy without artificial signals) and that introducing artificial signals produced by the DCGAN does improve the robustness of the model (achieving 93.55% efficacy).
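    As a rough illustration of the classification stage described above (not the thesis's actual architecture), here is a minimal PyTorch sketch of a TCN-style classifier for a 1-D estimated heart-rate signal; the channel widths, dilations, and live/spoof head are assumptions made for the example.

    ```python
    import torch
    import torch.nn as nn

    class TinyTCN(nn.Module):
        """Minimal TCN-style classifier for 1-D heart-rate signals.

        Stacked dilated 1-D convolutions summarize the signal over an
        exponentially growing receptive field; a linear head outputs a
        single live/spoof logit.
        """
        def __init__(self, channels: int = 32, levels: int = 4):
            super().__init__()
            layers, in_ch = [], 1
            for i in range(levels):
                d = 2 ** i  # dilation doubles at each level
                layers += [nn.Conv1d(in_ch, channels, kernel_size=3,
                                     padding=d, dilation=d), nn.ReLU()]
                in_ch = channels
            self.body = nn.Sequential(*layers)
            self.head = nn.Linear(channels, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, 1, time) estimated heart-rate signal
            h = self.body(x).mean(dim=-1)   # global average pool over time
            return self.head(h)             # logit > 0 -> classified as live

    # Usage: logit = TinyTCN()(torch.randn(8, 1, 256))
    ```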

    Augmented Deep Representations for Unconstrained Still/Video-based Face Recognition

    Face recognition is one of the active areas of research in computer vision and biometrics. Many approaches proposed in the literature demonstrate impressive performance, especially those based on deep learning. However, unconstrained face recognition with large pose, illumination, occlusion, and other variations is still an unsolved problem. Unconstrained video-based face recognition is even more challenging due to the large volume of data to be processed, the lack of labeled training data, and significant intra/inter-video variations in scene, blur, video quality, etc. Although Deep Convolutional Neural Networks (DCNNs) have provided discriminative representations for faces and achieved performance surpassing humans in controlled scenarios, modifications are necessary for face recognition in unconstrained conditions. In this dissertation, we propose several methods that improve unconstrained face recognition performance by augmenting the representation provided by deep networks with correlation or contextual information in the data.

    For unconstrained still face recognition, we present an encoding approach that combines Fisher vector (FV) encoding with DCNN representations, called FV-DCNN. The feature maps from the last convolutional layer of the deep network are encoded by FV into a robust representation that exploits the correlation between facial parts within each face. A VLAD-based encoding method called VLAD-DCNN is also proposed as an extension. Extensive evaluations on three challenging face recognition datasets show that the proposed FV-DCNN and VLAD-DCNN perform comparably to or better than many state-of-the-art face verification methods.

    For the more challenging video-based face recognition task, we first propose an automatic system that models video-to-video similarity as subspace-to-subspace similarity, where the subspaces characterize the correlation between deep representations of faces in videos. The system introduces a quality-aware subspace-to-subspace similarity, where subspaces are learned using quality-aware principal component analysis. Subspaces, along with quality-aware exemplars of templates, are used to produce similarity scores between video pairs via a quality-aware, principal angle-based subspace-to-subspace similarity metric. The method is evaluated on four video datasets, and the experimental results demonstrate its superior performance.

    To utilize the temporal information in videos, a hybrid dictionary learning method is also proposed for video-based face recognition. This unsupervised approach effectively models the temporal correlation between deep representations of video faces using dynamical dictionaries, and a practical iterative optimization algorithm is introduced to learn the dynamical dictionary. Experiments on three video-based face recognition datasets demonstrate that the proposed method learns robust and discriminative representations for videos and improves face recognition performance.

    Finally, to leverage contextual information in videos, we present the Uncertainty-Gated Graph (UGG) for unconstrained video-based face recognition. UGG exploits contextual information between faces by conducting graph-based identity propagation between sample tracklets, where identity information is initialized by the deep representations of video faces. UGG explicitly models the uncertainty of the contextual connections between tracklets by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at inference time only or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released and challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.
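    As a concrete illustration of the principal angle-based subspace-to-subspace similarity described above, here is a small NumPy sketch. It uses plain PCA in place of the dissertation's quality-aware PCA, and the subspace rank k and the mean-of-cosines score are assumptions made for the example.

    ```python
    import numpy as np

    def principal_angle_similarity(feats_a: np.ndarray, feats_b: np.ndarray,
                                   k: int = 5) -> float:
        """Cosine-of-principal-angles similarity between two face videos.

        Each row of feats_a / feats_b is a deep face descriptor from one
        video frame. A rank-k subspace is fit to each set; the singular
        values of the product of the two orthonormal bases are the
        cosines of the principal angles between the subspaces.
        (k must not exceed min(n_frames, feature_dim) for either video.)
        """
        def basis(feats):
            # Columns of u are the principal directions of the centered features.
            centered = feats - feats.mean(axis=0, keepdims=True)
            u, _, _ = np.linalg.svd(centered.T, full_matrices=False)
            return u[:, :k]

        qa, qb = basis(feats_a), basis(feats_b)
        cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)  # cos of principal angles
        return float(cosines.mean())  # 1.0 = identical subspaces

    # Usage: sim = principal_angle_similarity(video1_feats, video2_feats)
    ```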

    Deep visual learning with spike-timing dependent plasticity

    For most animal species, reliable and fast visual pattern recognition is vital for survival. The ventral stream, a primary pathway within the visual cortex, plays an important role in object representation and form recognition. It is a hierarchical system consisting of various visual areas, each of which extracts a different level of abstraction, and the neurons within the ventral stream are known to use spikes to represent these abstractions. To increase the level of realism in a neural simulation, a spiking neural network (SNN) is often used as the neural network model. From an SNN point of view, the analog output values generated by a traditional artificial neural network (ANN) can be considered average spiking firing rates. Unlike a traditional ANN, an SNN can use not only spiking rates but also specific spike-timing sequences to represent the structural information of the input visual stimuli, which greatly increases distinguishability.

    To simulate the learning procedure of the ventral stream, several research questions need to be resolved. In most cases, traditional methods use a winner-take-all strategy to distinguish different classes; however, such a strategy does not work well for overlapping classes within the decision space. Moreover, neurons within the ventral stream tend to recognize new input visual stimuli in a limited time window, which requires a fast learning procedure. Furthermore, within the ventral stream, neurons receive continuous input visual stimuli and can only access local information during the learning procedure, whereas most traditional methods use separated visual stimuli as input and incorporate global information within the learning period. Finally, to verify the universality of the proposed SNN framework, it is necessary to investigate its classification performance on complex real-world tasks such as video-based disguise face recognition.

    To address the above problems, a novel classification method inspired by the soft winner-take-all strategy is first proposed, in which each associated class is assigned a probability and the input visual stimulus is classified as the class with the highest probability. Moreover, to achieve a fast learning procedure, a novel feed-forward SNN framework equipped with an unsupervised spike-timing dependent plasticity (STDP) learning rule is proposed. Furthermore, an event-driven continuous STDP (ECS) learning method is proposed, in which two novel continuous input mechanisms generate continuous input visual stimuli and a new event-driven STDP learning rule based on local information is applied within the training procedure. Finally, these methodologies are extended to the video-based disguise face recognition (VDFR) task, in which human identities are recognized not just from a few images but from video sequences showing facial muscle movements while speaking.
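    As a minimal illustration of the pairwise STDP rule underlying the learning procedures described above (not the thesis's exact ECS rule), the following NumPy sketch applies one exponential-window weight update; the constants a_plus, a_minus, and tau are common textbook values, not parameters taken from the thesis.

    ```python
    import numpy as np

    def stdp_update(w: float, dt: float,
                    a_plus: float = 0.01, a_minus: float = 0.012,
                    tau: float = 20.0) -> float:
        """One pairwise STDP weight update.

        dt = t_post - t_pre in milliseconds. Pre-before-post (dt > 0)
        potentiates the synapse; post-before-pre (dt < 0) depresses it,
        with exponentially decaying influence as |dt| grows.
        """
        if dt > 0:
            w += a_plus * np.exp(-dt / tau)   # long-term potentiation
        elif dt < 0:
            w -= a_minus * np.exp(dt / tau)   # long-term depression
        return float(np.clip(w, 0.0, 1.0))    # keep weight in [0, 1]

    # Usage: w = stdp_update(w=0.5, dt=4.0)  # pre fired 4 ms before post
    ```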