10 research outputs found
Robust face recognition
University of Technology Sydney. Faculty of Engineering and Information Technology.

Face recognition is one of the most important and promising biometric techniques. In face recognition, a similarity score is automatically calculated between face images to decide their identity. Due to its non-invasive nature and ease of use, it has shown great potential in many real-world applications, e.g., video surveillance, access control systems, forensics and security, and social networks. This thesis addresses key challenges inherent in real-world face recognition systems, including pose and illumination variations, occlusion, and image blur. To tackle these challenges, a series of robust face recognition algorithms are proposed. These can be summarized as follows:
In Chapter 2, we present a novel, manually designed face image descriptor named “Dual-Cross Patterns” (DCP). DCP efficiently encodes the second-order statistics of facial textures in the most informative directions within a face image. It proves to be more descriptive and discriminative than previous descriptors. We further extend DCP into a comprehensive face representation scheme named “Multi-Directional Multi-Level Dual-Cross Patterns” (MDML-DCPs). MDML-DCPs efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. MDML-DCPs achieves the best performance on the challenging FERET, FRGC 2.0, CAS-PEAL-R1, and LFW databases.
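The pattern idea can be illustrated with a toy encoder. The sketch below is a hypothetical simplification of the DCP scheme: for each centre pixel, two points are sampled per direction (inner and outer radius), the first- and second-order gray-level comparisons give two bits per direction, and the eight directions split into two "cross" sub-codes. The radii, direction ordering, and bit packing here are illustrative assumptions, not the chapter's exact formulation.

```python
import numpy as np

# Direction offsets: E, NE, N, NW, W, SW, S, SE.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
           (0, -1), (1, -1), (1, 0), (1, 1)]

def s(x):
    """Thresholding function: 1 if non-negative, else 0."""
    return 1 if x >= 0 else 0

def dcp_codes(img, y, x, r_in=1, r_ex=2):
    """Return two DCP-style sub-codes for the pixel at (y, x).

    Directions 0,2,4,6 form one 'cross', directions 1,3,5,7 the
    other, so each sub-code packs four 2-bit direction codes
    (first-order s(A - O) and second-order s(B - A)) into [0, 255].
    """
    codes = []
    for start in (0, 1):                      # the two crosses
        code = 0
        for d in range(start, 8, 2):          # every other direction
            dy, dx = OFFSETS[d]
            o = int(img[y, x])                        # centre pixel O
            a = int(img[y + dy * r_in, x + dx * r_in])  # inner point A
            b = int(img[y + dy * r_ex, x + dx * r_ex])  # outer point B
            two_bits = 2 * s(a - o) + s(b - a)
            code = code * 4 + two_bits        # append the 2 bits
        codes.append(code)
    return codes
```

A full descriptor would histogram such codes over local regions and concatenate the histograms; that aggregation step is omitted here.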
In Chapter 3, we develop a deep learning-based face image descriptor named “Multimodal Deep Face Representation” (MM-DFR) to automatically learn face representations from multimodal image data. In brief, convolutional neural networks (CNNs) are designed to extract complementary information from the original holistic face image, the frontal pose image rendered by 3D modeling, and uniformly sampled image patches. The recognition ability of each CNN is optimized by carefully integrating a number of published or newly developed tricks. A feature level fusion approach using stacked auto-encoders is designed to fuse the features extracted from the set of CNNs, which is advantageous for non-linear dimension reduction. MM-DFR achieves over 99% recognition rate on LFW using publicly available training data.
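The feature-level fusion step can be sketched as follows. This is a minimal illustration under assumed shapes: the encoder weights here are random and untrained, standing in for one layer of a trained stacked auto-encoder, and the 512/1024-dimensional per-CNN features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(features, hidden_dim=128):
    """Concatenate per-CNN features and compress them non-linearly.

    A trained stacked auto-encoder would learn this projection by
    minimising reconstruction error; here a small random encoder
    merely illustrates the data flow.
    """
    x = np.concatenate(features)                       # feature-level fusion
    w_e = rng.standard_normal((hidden_dim, x.size)) * 0.01
    return sigmoid(w_e @ x)                            # fused representation

# e.g. three CNNs: holistic image, rendered frontal pose, image patches
f_holistic = rng.standard_normal(512)
f_frontal = rng.standard_normal(512)
f_patches = rng.standard_normal(1024)
h = fuse([f_holistic, f_frontal, f_patches])
```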
In Chapter 4, based on our research on handcrafted face image descriptors, we propose a powerful pose-invariant face recognition (PIFR) framework capable of handling the full range of pose variations within ±90° of yaw. The framework has two parts: the first is Patch-based Partial Representation (PBPR), and the second is Multi-task Feature Transformation Learning (MtFTL). PBPR transforms the original PIFR problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is developed to represent the synthesized partial frontal faces. For each patch, a transformation dictionary is learnt under the MtFTL scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace in which face matching is performed. The PBPR-MtFTL framework outperforms previous state-of-the-art PIFR methods on the FERET, CMU-PIE, and Multi-PIE databases.
In Chapter 5, based on our research on deep learning-based face image descriptors, we design a novel framework named Trunk-Branch Ensemble CNN (TBE-CNN) to handle challenges in video-based face recognition (VFR) under surveillance circumstances. Three major challenges are considered: image blur, occlusion, and pose variation. First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Second, to enhance the robustness of CNN features to pose variations and occlusion, we propose the TBE-CNN architecture, which efficiently extracts complementary information from holistic face images and patches cropped around facial components. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. With the proposed techniques, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces.
Report on the BTAS 2016 Video Person Recognition Evaluation
© 2016 IEEE. This report presents results from the Video Person Recognition Evaluation held in conjunction with the 8th IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS). Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted high-quality video camera. The second contained videos acquired from 5 different handheld video cameras. Each experiment comprised 1,401 videos of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. An additional experiment required algorithms to recognize people in videos from the Video Database of Moving Faces and People (VDMFP). This experiment comprised 958 videos of 297 subjects. Four groups from around the world participated in the evaluation. The top verification rate for PaSC from this evaluation is 0.98 at a false accept rate of 0.01 - a remarkable advancement in performance over the competition held at FG 2015.
Face Recognition: Issues, Methods and Alternative Applications
Face recognition, as one of the most successful applications of image analysis, has recently gained significant attention, owing to the availability of feasible technologies, including mobile solutions. Research in automatic face recognition has been conducted since the 1960s, but the problem remains largely unsolved. The last decade has seen significant progress in this area owing to advances in face modelling and analysis techniques. Although systems have been developed for face detection and tracking, reliable face recognition still poses a great challenge to computer vision and pattern recognition researchers. There are several reasons for the recent increased interest in face recognition, including rising public concern for security, the need for identity verification in the digital world, and the use of face analysis and modelling techniques in multimedia data management and computer entertainment. This chapter discusses face recognition processing, including major components such as face detection, tracking, alignment and feature extraction, and points out the technical challenges of building a face recognition system. We focus on the most successful solutions available so far. The final part of the chapter describes selected face recognition methods and applications and their potential use in areas not related to face recognition.
Deep visual learning with spike-timing dependent plasticity
For most animal species, reliable and fast visual pattern recognition is vital for survival. The ventral stream, a primary pathway within the visual cortex, plays an important role in object representation and form recognition. It is a hierarchical system consisting of various visual areas, each of which extracts a different level of abstraction. It is known that the neurons within the ventral stream use spikes to represent these abstractions. To increase the level of realism in a neural simulation, a spiking neural network (SNN) is often used as the neural network model. From the SNN point of view, the analog output values generated by a traditional artificial neural network (ANN) can be considered as average spike firing rates. Unlike a traditional ANN, an SNN can use not only spiking rates but also specific spike timing sequences to represent the structural information of the input visual stimuli, which greatly increases their distinguishability.

To simulate the learning procedure of the ventral stream, various research questions need to be resolved. In most cases, traditional methods use a winner-take-all strategy to distinguish different classes. However, such a strategy does not work well for overlapping classes within the decision space. Moreover, neurons within the ventral stream tend to recognize new input visual stimuli within a limited time window, which requires a fast learning procedure. Furthermore, within the ventral stream, neurons receive continuous input visual stimuli and can only access local information during the learning procedure. However, most traditional methods use separate visual stimuli as input and incorporate global information within the learning period. Finally, to verify the universality of the proposed SNN framework, it is necessary to investigate its classification performance on complex real-world tasks such as video-based face disguise recognition.

To address the above problems, first, a novel classification method inspired by the soft winner-take-all strategy has been proposed, in which each associated class is assigned a probability and the input visual stimulus is classified as the class with the highest probability. Moreover, to achieve a fast learning procedure, a novel feed-forward SNN framework equipped with an unsupervised spike-timing-dependent plasticity (STDP) learning rule has been proposed. Furthermore, an event-driven continuous STDP (ECS) learning method has been proposed, in which two novel continuous input mechanisms are used to generate continuous input visual stimuli, and a new event-driven STDP learning rule based on local information is applied within the training procedure. Finally, these methodologies have also been extended to the video-based disguise face recognition (VDFR) task, in which human identities are recognized not just from a few images but from video sequences showing facial muscle movements while speaking.
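The STDP mechanism underlying the proposed frameworks can be sketched in its standard pair-based form. The thesis's specific ECS rule is not reproduced here; the parameter values below are illustrative assumptions.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP weight change for one pre/post spike pair.

    If the presynaptic spike precedes the postsynaptic one (dt > 0),
    the synapse is potentiated (LTP); otherwise it is depressed
    (LTD).  The magnitude decays exponentially with the timing gap,
    with time constant tau (all times in ms).
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # pre before post: LTP
    elif dt < 0:
        return -a_minus * math.exp(dt / tau)   # post before pre: LTD
    return 0.0
```

This local, timing-based update is what lets learning proceed from spike events alone, without access to global information.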
Biometric identification and authentication based on facial geometry
The work is published in accordance with NAU Rector's order No. 311/од of 27.05.2021, "On placing qualification works of higher-education students in the university repository". Project supervisor: Associate Professor, Candidate of Technical Sciences, Tetiana Volodymyrivna Kholiavkina.

Biometric authentication has long since penetrated deeply into the lives of many people. We no longer pay particular attention to, or are surprised by, the fact that digital devices can recognize our identity. But does this actually matter now? People who do not work in digital technologies often do not even know how biometric authentication works, how reliable such technologies are, what stage of development they have reached, or what it would be like if we could not use what we have.

We should start with something people use every day, namely the mobile phone. All new smartphone models now offer identity recognition based on facial geometry. It might seem that this is merely a pleasant feature that could easily be replaced by ordinary passwords, but that is not so. Passwords are very often stolen by attackers with relative ease, and then become the key to stealing personal data. Cybersecurity, however, is not the only drawback. On average, a person unlocks their phone about 150 times a day. Given that entering a six-character password takes roughly two seconds, this procedure consumes five minutes every day. A rather unpleasant result, considering that biometric authentication completes the same process in milliseconds.

The smartphone example is, in fact, only a small fraction of the benefit of integrating biometric identification into our everyday lives. The technology is now widely used by law-enforcement agencies to search for criminals, by private companies to protect important business secrets, in homes, in banking institutions, even in the government application "Diia" for digital signatures, and in many other areas (Figure 1). Most importantly, all of this makes people's everyday lives more convenient and wins the most valuable resource: time.
Recent Advances in Deep Learning Techniques for Face Recognition
In recent years, researchers have proposed many deep learning (DL) methods
for various tasks, and particularly face recognition (FR) made an enormous leap
using these techniques. Deep FR systems benefit from the hierarchical
architecture of the DL methods to learn discriminative face representation.
Therefore, DL techniques significantly improve state-of-the-art performance on
FR systems and encourage diverse and efficient real-world applications. In this
paper, we present a comprehensive analysis of various FR systems that leverage
the different types of DL techniques, and for the study, we summarize 168
recent contributions from this area. We discuss the papers related to different
algorithms, architectures, loss functions, activation functions, datasets,
challenges, improvement ideas, current and future trends of DL-based FR
systems. We provide a detailed discussion of various DL methods to understand
the current state-of-the-art, and then we discuss various activation and loss
functions for the methods. Additionally, we summarize different datasets used
widely for FR tasks and discuss challenges related to illumination, expression,
pose variations, and occlusion. Finally, we discuss improvement ideas, current
and future trends of FR tasks.
Comment: 32 pages. Citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613
RECOGNITION OF FACES FROM SINGLE AND MULTI-VIEW VIDEOS
Face recognition has been an active research field for decades. In recent years, with videos playing an increasingly important role in our everyday life, video-based face recognition has begun to attract considerable research interest. This leads to a wide range of potential application areas, including TV/movies search and parsing, video surveillance, access control etc. Preliminary research results in this field have suggested that by exploiting the abundant spatial-temporal information contained in videos, we can greatly improve the accuracy and robustness of a visual recognition system. On the other hand, as this research area is still in its infancy, developing an end-to-end face processing pipeline that can robustly detect, track and recognize faces remains a challenging task. The goal of this dissertation is to study some of the related problems under different settings.
We address the video-based face association problem, in which one attempts to extract face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty in handling this task, especially when challenging nuisance factors like motion blur, low resolution or significant camera motions are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Different from many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference and handling false positives/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases.
We next propose a novel video-based face recognition framework. We address the problem from two different aspects: To handle pose variations, we learn a Structural-SVM based detector which can simultaneously localize the face fiducial points and estimate the face pose. By adopting a different optimization criterion from existing algorithms, we are able to improve localization accuracy. To model other face variations, we use intra-personal/extra-personal dictionaries. The intra-personal/extra-personal modeling of human faces has been shown to work successfully in the Bayesian face recognition framework. It has additional advantages in scalability and generalization, which are of critical importance to real-world applications. Combining intra-personal/extra-personal models with dictionary learning enables us to achieve state-of-the-art performance on unconstrained video data, even when the training data come from a different database.
Finally, we present an approach for video-based face recognition using camera networks. The focus is on handling pose variations by applying the strength of the multi-view camera network. However, rather than taking the typical approach of modeling these variations, which eventually requires explicit knowledge about pose parameters, we rely on a pose-robust feature that eliminates the need for pose estimation. The pose-robust feature is developed using the Spherical Harmonic (SH) representation theory. It is extracted using the surface texture map of a spherical model which approximates the subject's head. Feature vectors extracted from a video are modeled as an ensemble of instances of a probability distribution in the Reproducing Kernel Hilbert Space (RKHS). The ensemble similarity measure in RKHS improves both robustness and accuracy of the recognition system. The proposed approach outperforms traditional algorithms on a multi-view video database collected using a camera network.
Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition
© 1979-2012 IEEE. Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.
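The triplet loss that the third contribution builds on can be sketched in its standard form. The paper's specific improvement is not reproduced here; the margin value and embedding dimensions below are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Standard triplet loss on L2-normalised embeddings.

    Pushes the anchor closer to a positive sample (same identity)
    than to a negative sample (different identity) by at least the
    margin alpha; the loss is zero once that gap is achieved.
    """
    a = anchor / np.linalg.norm(anchor)
    p = positive / np.linalg.norm(positive)
    n = negative / np.linalg.norm(negative)
    d_ap = np.sum((a - p) ** 2)   # squared distance anchor-positive
    d_an = np.sum((a - n) ** 2)   # squared distance anchor-negative
    return max(d_ap - d_an + alpha, 0.0)
```

In training, such a loss is minimised over many sampled (anchor, positive, negative) triplets so that same-identity faces cluster tightly in the embedding space.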