36 research outputs found

    3D Face Reconstruction for Forensic Recognition - A Survey

    Get PDF
    3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector, thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must observe strict requirements that still make unclear its possible role in bringing evidence to a lawsuit. Shedding some light on this matter is the goal of the present survey, where we start by clarifying the relation between forensic applications and biometrics. To our knowledge, no previous work adopted this relation to make the point on the state of the art. Therefore, we analyzed the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and discussed the current obstacles that separate 3D face reconstruction from an active role in forensic applications

    Active Shape Model with Applications in Facial Landmark Localization

    Get PDF
    Active Shape Model (AAM) and Active Appearance Model (AAM) are the commonly used methods in facial landmark localization, in the past 20 years, there are a few extended methods, which are based on the classical ones, being published. In this master thesis, the classical Active Shape Model (ASM) and Active Appearance Model (AAM) are well studied and summarized, especially ASM, which is also introduced in formulation. Two newly extended versions based on Active Shape Model are introduced and implemented through the thesis work. Stacked Active Shape Model (Stasm), which is much closer to the classical ASM, achieves a very good result on frontal face image landmark detection, so that it is the emphasis of this thesis. Besides we use Component based ASM as the comparison method, which is another Active Shape Model method based on component analysis. We performed these two methods for facial images from different situations: frontal and non-frontal images, single and group images. From the observation and data results, we show that Stasm still has room for improvement on facial feature localization. We explore the theoretical differences of these two extended versions and propose ideas for improvement in the later chapters

    Sign correlation subspace for face alignment

    Get PDF
    © 2018, Springer-Verlag GmbH Germany, part of Springer Nature. Face alignment is an essential task for facial performance capture and expression analysis. Current methods such as random subspace supervised descent method, stage-wise relational dictionary and coarse-to-fine shape searching can ease multi-pose face alignment problem, but no method can deal with the multiple local minima problem directly. In this paper, we propose a sign correlation subspace method for domain partition in only one reduced low-dimensional subspace. Unlike previous methods, we analyze the sign correlation between features and shapes and project both of them into a mutual sign correlation subspace. Each pair of projected shape and feature keeps their signs consistent in each dimension of the subspace, so that each hyper octant holds the condition that one general descent exists. Then a set of general descents are learned from the samples in different hyperoctants. Requiring only the feature projection for domain partition, our proposed method is effective for face alignment. We have validated our approach with the public face datasets which include a range of poses. The validation results show that our method can reveal their latent relationships to poses. The comparison with state-of-the-art methods demonstrates that our method outperforms them, especially in uncontrolled conditions with various poses, while enjoying the comparable speed

    The Menpo benchmark for multi-pose 2D and 3D facial landmark localisation and tracking

    Get PDF
    In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to the previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile pose. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making the 2D face alignment full-pose. In Menpo 3D benchmark, a united landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also corresponding to the real-world 3D space. Based on the considerable number of annotated images, we organised Menpo 2D Challenge and Menpo 3D Challenge for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with the abundant data, lead to excellent results. We also provide a very simple, yet effective solution, named Cascade Multi-view Hourglass Model, to 2D and 3D face alignment. In our method, we take advantage of all 2D and 3D facial landmark annotations in a joint way. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment

    The Menpo benchmark for multi-pose 2D and 3D facial landmark localisation and tracking

    Get PDF
    In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to the previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile pose. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making the 2D face alignment full-pose. In Menpo 3D benchmark, a united landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also corresponding to the real-world 3D space. Based on the considerable number of annotated images, we organised Menpo 2D Challenge and Menpo 3D Challenge for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with the abundant data, lead to excellent results. We also provide a very simple, yet effective solution, named Cascade Multi-view Hourglass Model, to 2D and 3D face alignment. In our method, we take advantage of all 2D and 3D facial landmark annotations in a joint way. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment

    Gender classification using facial components.

    Get PDF
    Master’s degree. University of KwaZulu-Natal, Durban.Gender classification is very important in facial analysis as it can be used as input into a number of systems such as face recognition. Humans are able to classify gender with great accuracy however passing this ability to machines is a complex task because of many variables such as lighting to mention a few. For the purpose of this research we have approached gender classification as a binary problem, involving the two classes male and female. Two datasets are used in this research which are the FG-NET dataset and Pilots Parliament datasets. Two appearance based feature extractors are used which are the LBP and LDP with the Active Shape model being included by fusing. The classifiers used here are the Support Vector Machine with Radial Basis Function kernel and an Artificial Neural Network with backpropagation. On the FG-NET an average detection of 90.6% against that of 87.5% to that of the PPB. Gender is then detected from the facial components the nose, eyes among others. The forehead recorded the highest accuracy with 92%, followed by the nose with 90%, cheeks with 89.2% and the eyes with 87% and the mouth recorded the lowest accuracy of 75%. As a result feature fusion is then carried out to improve classification accuracies especially that of the mouth and eyes with lowest accuracies. The eyes with an accuracy of 87% is fused with the forehead with 92% and the resulting accuracy is an increase to 93%. The mouth, with the lowest accuracy of 75% is fused with the nose which has an accuracy of 90% and the resulting accuracy is 87%. These results carried out by fusing through addition showed improved results. Fusion is then carried out between Appearance based and shape based features. On the FG-NET dataset using the LBP and LDP an accuracy of 85.33% and 89.53% with the PPB recording 83.13%, 89.3% for LBP and LDP respectively. As expected and shown by previous researchers the LDP clearly obtains higher classification accuracies as it than LBP as it uses gradient rather than pixel intensity. We then fuse the vectors of the LDP, LBP with that of the ASM and carry out dimensionality reduction, then fusion by addition. On the PPB dataset fusion of LDP and ASM records 81.56%, and 94.53% with the FG-NET recording 89.53% respectively

    Feature extraction on faces : from landmark localization to depth estimation

    Get PDF
    Le sujet de cette thèse porte sur les algorithmes d'apprentissage qui extraient les caractéristiques importantes des visages. Les caractéristiques d’intérêt principal sont des points clés; La localisation en deux dimensions (2D) ou en trois dimensions (3D) de traits importants du visage telles que le centre des yeux, le bout du nez et les coins de la bouche. Les points clés sont utilisés pour résoudre des tâches complexes qui ne peuvent pas être résolues directement ou qui requièrent du guidage pour l’obtention de performances améliorées, telles que la reconnaissance de poses ou de gestes, le suivi ou la vérification du visage. L'application des modèles présentés dans cette thèse concerne les images du visage; cependant, les algorithmes proposés sont plus généraux et peuvent être appliqués aux points clés de d'autres objets, tels que les mains, le corps ou des objets fabriqués par l'homme. Cette thèse est écrite par article et explore différentes techniques pour résoudre plusieurs aspects de la localisation de points clés. Dans le premier article, nous démêlons l'identité et l'expression d'un visage donné pour apprendre une distribution à priori sur l'ensemble des points clés. Cette distribution à priori est ensuite combinée avec un classifieur discriminant qui apprend une distribution de probabilité indépendante par point clé. Le modèle combiné est capable d'expliquer les différences dans les expressions pour une même représentation d'identité. Dans le deuxième article, nous proposons une architecture qui vise à conserver les caractéristiques d’images pour effectuer des tâches qui nécessitent une haute précision au niveau des pixels, telles que la localisation de points clés ou la segmentation d’images. L’architecture proposée extrait progressivement les caractéristiques les plus grossières dans les étapes d'encodage pour obtenir des informations plus globales sur l’image. Ensuite, il étend les caractéristiques grossières pour revenir à la résolution de l'image originale en recombinant les caractéristiques du chemin d'encodage. Le modèle, appelé Réseaux de Recombinaison, a obtenu l’état de l’art sur plusieurs jeux de données, tout en accélérant le temps d’apprentissage. Dans le troisième article, nous visons à améliorer la localisation des points clés lorsque peu d'images comportent des étiquettes sur des points clés. En particulier, nous exploitons une forme plus faible d’étiquettes qui sont plus faciles à acquérir ou plus abondantes tel que l'émotion ou la pose de la tête. Pour ce faire, nous proposons une architecture permettant la rétropropagation du gradient des étiquettes les plus faibles à travers des points clés, ainsi entraînant le réseau de localisation des points clés. Nous proposons également une composante de coût non supervisée qui permet des prédictions de points clés équivariantes en fonction des transformations appliquées à l'image, sans avoir les vraies étiquettes des points clés. Ces techniques ont considérablement amélioré les performances tout en réduisant le pourcentage d'images étiquetées par points clés. Finalement, dans le dernier article, nous proposons un algorithme d'apprentissage permettant d'estimer la profondeur des points clés sans aucune supervision de la profondeur. Nous y parvenons en faisant correspondre les points clés de deux visages en les transformant l'un vers l'autre. Cette transformation nécessite une estimation de la profondeur sur un visage, ainsi que une transformation affine qui transforme le premier visage au deuxième. Nous démontrons que notre formulation ne nécessite que la profondeur et que les paramètres affines peuvent être estimés avec un solution analytique impliquant les points clés augmentés par profondeur. Même en l'absence de supervision directe de la profondeur, la technique proposée extrait des valeurs de profondeur raisonnables qui diffèrent des vraies valeurs de profondeur par un facteur d'échelle et de décalage. Nous démontrons des applications d'estimation de profondeur pour la tâche de rotation de visage, ainsi que celle d'échange de visage.This thesis focuses on learning algorithms that extract important features from faces. The features of main interest are landmarks; the two dimensional (2D) or three dimensional (3D) locations of important facial features such as eye centers, nose tip, and mouth corners. Landmarks are used to solve complex tasks that cannot be solved directly or require guidance for enhanced performance, such as pose or gesture recognition, tracking, or face verification. The application of the models presented in this thesis is on facial images; however, the algorithms proposed are more general and can be applied to the landmarks of other forms of objects, such as hands, full body or man-made objects. This thesis is written by article and explores different techniques to solve various aspects of landmark localization. In the first article, we disentangle identity and expression of a given face to learn a prior distribution over the joint set of landmarks. This prior is then merged with a discriminative classifier that learns an independent probability distribution per landmark. The merged model is capable of explaining differences in expressions for the same identity representation. In the second article, we propose an architecture that aims at uncovering image features to do tasks that require high pixel-level accuracy, such as landmark localization or image segmentation. The proposed architecture gradually extracts coarser features in its encoding steps to get more global information over the image and then it expands the coarse features back to the image resolution by recombining the features of the encoding path. The model, termed Recombinator Networks, obtained state-of-the-art on several datasets, while also speeding up training. In the third article, we aim at improving landmark localization when only a few images with labelled landmarks are available. In particular, we leverage a weaker form of data labels that are easier to acquire or more abundantly available such as emotion or head pose. To do so, we propose an architecture to backpropagate gradients of the weaker labels through landmarks, effectively training the landmark localization network. We also propose an unsupervised loss component which makes equivariant landmark predictions with respect to transformations applied to the image without having ground truth landmark labels. These techniques improved performance considerably when we have a low percentage of labelled images with landmarks. Finally, in the last article, we propose a learning algorithm to estimate the depth of the landmarks without any depth supervision. We do so by matching landmarks of two faces through transforming one to another. This transformation requires estimation of depth on one face and an affine transformation that maps the first face to the second one. Our formulation, which only requires depth estimation and affine parameters, can be estimated as a closed form solution of the 2D landmarks and the estimated depth. Even without direct depth supervision, the proposed technique extracts reasonable depth values that differ from the ground truth depth values by a scale and a shift. We demonstrate applications of the estimated depth in face rotation and face replacement tasks

    3D Face Modelling, Analysis and Synthesis

    Get PDF
    Human faces have always been of a special interest to researchers in the computer vision and graphics areas. There has been an explosion in the number of studies around accurately modelling, analysing and synthesising realistic faces for various applications. The importance of human faces emerges from the fact that they are invaluable means of effective communication, recognition, behaviour analysis, conveying emotions, etc. Therefore, addressing the automatic visual perception of human faces efficiently could open up many influential applications in various domains, e.g. virtual/augmented reality, computer-aided surgeries, security and surveillance, entertainment, and many more. However, the vast variability associated with the geometry and appearance of human faces captured in unconstrained videos and images renders their automatic analysis and understanding very challenging even today. The primary objective of this thesis is to develop novel methodologies of 3D computer vision for human faces that go beyond the state of the art and achieve unprecedented quality and robustness. In more detail, this thesis advances the state of the art in 3D facial shape reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition and facial synthesis with the aid of 3D face modelling. We give a special attention to the case where the input comes from monocular imagery data captured under uncontrolled settings, a.k.a. \textit{in-the-wild} data. This kind of data are available in abundance nowadays on the internet. Analysing these data pushes the boundaries of currently available computer vision algorithms and opens up many new crucial applications in the industry. We define the four targeted vision problems (3D facial reconstruction &\& tracking, fine-grained 3D facial motion estimation, expression recognition, facial synthesis) in this thesis as the four 3D-based essential systems for the automatic facial behaviour understanding and show how they rely on each other. Finally, to aid the research conducted in this thesis, we collect and annotate a large-scale videos dataset of monocular facial performances. All of our proposed methods demonstarte very promising quantitative and qualitative results when compared to the state-of-the-art methods

    Face pose estimation with automatic 3D model creation for a driver inattention monitoring application

    Get PDF
    Texto en inglés y resumen en inglés y españolRecent studies have identified inattention (including distraction and drowsiness) as the main cause of accidents, being responsible of at least 25% of them. Driving distraction has been less studied, since it is more diverse and exhibits a higher risk factor than fatigue. In addition, it is present over half of the inattention involved crashes. The increased presence of In Vehicle Information Systems (IVIS) adds to the potential distraction risk and modifies driving behaviour, and thus research on this issue is of vital importance. Many researchers have been working on different approaches to deal with distraction during driving. Among them, Computer Vision is one of the most common, because it allows for a cost effective and non-invasive driver monitoring and sensing. Using Computer Vision techniques it is possible to evaluate some facial movements that characterise the state of attention of a driver. This thesis presents methods to estimate the face pose and gaze direction of a person in real-time, using a stereo camera as a basic for assessing driver distractions. The methods are completely automatic and user-independent. A set of features in the face are identified at initialisation, and used to create a sparse 3D model of the face. These features are tracked from frame to frame, and the model is augmented to cover parts of the face that may have been occluded before. The algorithm is designed to work in a naturalistic driving simulator, which presents challenging low light conditions. We evaluate several techniques to detect features on the face that can be matched between cameras and tracked with success. Well-known methods such as SURF do not return good results, due to the lack of salient points in the face, as well as the low illumination of the images. We introduce a novel multisize technique, based on Harris corner detector and patch correlation. This technique benefits from the better performance of small patches under rotations and illumination changes, and the more robust correlation of the bigger patches under motion blur. The head rotates in a range of ±90º in the yaw angle, and the appearance of the features change noticeably. To deal with these changes, we implement a new re-registering technique that captures new textures of the features as the face rotates. These new textures are incorporated to the model, which mixes the views of both cameras. The captures are taken at regular angle intervals for rotations in yaw, so that each texture is only used in a range of ±7.5º around the capture angle. Rotations in pitch and roll are handled using affine patch warping. The 3D model created at initialisation can only take features in the frontal part of the face, and some of these may occlude during rotations. The accuracy and robustness of the face tracking depends on the number of visible points, so new points are added to the 3D model when new parts of the face are visible from both cameras. Bundle adjustment is used to reduce the accumulated drift of the 3D reconstruction. We estimate the pose from the position of the features in the images and the 3D model using POSIT or Levenberg-Marquardt. A RANSAC process detects incorrectly tracked points, which are not considered for pose estimation. POSIT is faster, while LM obtains more accurate results. Using the model extension and the re-registering technique, we can accurately estimate the pose in the full head rotation range, with error levels that improve the state of the art. A coarse eye direction is composed with the face pose estimation to obtain the gaze and driver's fixation area, parameter which gives much information about the distraction pattern of the driver. The resulting gaze estimation algorithm proposed in this thesis has been tested on a set of driving experiments directed by a team of psychologists in a naturalistic driving simulator. This simulator mimics conditions present in real driving, including weather changes, manoeuvring and distractions due to IVIS. Professional drivers participated in the tests. The driver?s fixation statistics obtained with the proposed system show how the utilisation of IVIS influences the distraction pattern of the drivers, increasing reaction times and affecting the fixation of attention on the road and the surroundings
    corecore