44 research outputs found

    DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image

    We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in the wild. It contains annotations of over 3.5K landmarks that accurately represent 3D head shape compared to the ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset, learns shape, expression, and pose parameters, and performs 3D reconstruction of a FLAME mesh. The model also incorporates a landmark prediction branch to take advantage of rich supervision and co-training of multiple related tasks. Experimentally, DAD-3DNet outperforms or is comparable to the state-of-the-art models in (i) 3D Head Pose Estimation on AFLW2000-3D and BIWI, (ii) 3D Face Shape Reconstruction on NoW and Feng, and (iii) 3D Dense Head Alignment and 3D Landmarks Estimation on DAD-3DHeads dataset. Finally, the diversity of DAD-3DHeads in camera angles, facial expressions, and occlusions enables a benchmark to study in-the-wild generalization and robustness to distribution shifts. The dataset webpage is https://p.farm/research/dad-3dheads
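The FLAME mesh that DAD-3DNet regresses is, at its core, a linear blendshape model: a template mesh plus identity and expression offsets, followed by pose articulation. A minimal numpy sketch of that linear part is shown below; the dimensions and random placeholder bases are toy assumptions, not the real FLAME assets, and the pose/skinning stage is omitted.

```python
import numpy as np

# Toy stand-ins for the FLAME template and blendshape bases.
n_verts, n_shape, n_expr = 100, 10, 5
rng = np.random.default_rng(42)
template = rng.normal(size=(n_verts, 3))            # mean head mesh
B_shape = rng.normal(size=(n_verts, 3, n_shape))    # identity basis
B_expr = rng.normal(size=(n_verts, 3, n_expr))      # expression basis

def flame_mesh(beta, psi):
    """Linear blendshape part of a FLAME-style model: template plus
    shape and expression offsets (pose/LBS omitted for brevity)."""
    return template + B_shape @ beta + B_expr @ psi

# Parameters of this kind are what a network like DAD-3DNet predicts.
beta = rng.normal(size=n_shape) * 0.1
psi = rng.normal(size=n_expr) * 0.1
mesh = flame_mesh(beta, psi)
print(mesh.shape)  # (100, 3)
```

A landmark-prediction branch, as in DAD-3DNet, would simply select or regress a subset of these mesh vertices for dense supervision.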

    Artificial Intelligence Tools for Facial Expression Analysis.

    Inner emotions show visibly on the human face and serve as a basic guide to an individual's inner world. It is therefore possible to infer a person's attitudes, and the effect of others' behaviour on their deeper feelings, by examining facial expressions. In real-world applications, machines that interact with people need robust facial expression recognition, which benefits varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU is the activation of a localized group of facial muscles; AUs occur in combination to constitute natural facial expression events. Detecting AUs automatically offers clear benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting both static and dynamic features, using hand-crafted and deep-learning representations, from each static image of a video. This confirmed the markedly superior performance of pretrained deep models. Next, temporal modelling was investigated to detect the underlying temporal variation phases using supervised and unsupervised methods on dynamic sequences. During these experiments, stacking dynamic features on top of static ones proved important for encoding deep features that capture temporal information when the spatial and temporal schemes are combined simultaneously. The study also found that fusing spatial and temporal features yields more long-term temporal pattern information. Moreover, we hypothesised that an unsupervised method would enable the learning of invariant information from dynamic textures.
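The "stacking dynamic on top of static" idea described above can be illustrated with a small sketch: per-frame static descriptors are augmented with their temporal differences before being fed to a temporal model. The feature dimensions and the use of simple frame differences are illustrative assumptions, not the thesis's exact pipeline.

```python
import numpy as np

# Toy per-frame "static" features (e.g. CNN descriptors): T frames x D dims.
T, D = 8, 16
rng = np.random.default_rng(1)
static = rng.normal(size=(T, D))

# Dynamic features as temporal differences between consecutive frames;
# prepending the first frame makes the first difference zero and keeps T rows.
dynamic = np.diff(static, axis=0, prepend=static[:1])

# Stack dynamic on top of static to form the joint spatio-temporal input.
stacked = np.concatenate([static, dynamic], axis=1)
print(stacked.shape)  # (8, 32)
```

A temporal model (e.g. an RNN or temporal CNN) would then consume `stacked` frame by frame, seeing both appearance and motion cues at once.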
Recently, approaches based on Generative Adversarial Networks (GANs) have produced cutting-edge results. In the second part of this thesis, we propose a model that adopts an unsupervised DCGAN for facial feature extraction and classification, in order to achieve the following: the creation of facial expression images under arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to solve the problem of recognising the seven static emotion classes in the wild. Thorough cross-database experimentation demonstrates that this approach improves generalization. Additionally, we showed that the features learnt by the DCGAN are poorly suited to encoding facial expressions observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variation and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.

    3D reconstruction for plastic surgery simulation based on statistical shape models

    This thesis was carried out at Crisalix in collaboration with Universitat Pompeu Fabra within the Doctorats Industrials program. Crisalix has the mission of enhancing communication between plastic surgery professionals and patients by providing an answer to the most common question during surgery planning: ``How will I look after the surgery?''. The solution proposed by Crisalix is based on 3D imaging technology, which generates a 3D reconstruction that accurately represents the area of the patient to be operated on. Multiple simulations of the plastic procedure can then be created, representing the possible outcomes of the surgery. This thesis presents a framework capable of reconstructing 3D shapes of the faces and breasts of plastic surgery patients from 2D images and 3D scans. 3D reconstruction of an object is a challenging problem with many inherent ambiguities, and statistical-model-based methods are a powerful approach to overcoming some of them. We follow the intuition of maximizing the use of available prior information by introducing it into statistical-model-based methods to enhance their properties. First, we explore Active Shape Models (ASM), a well-known method for 2D shape alignment. However, it is challenging to keep prior information (e.g. a small set of given landmarks) unchanged once the statistical model constraints are applied. We propose a new weighted regularized projection into the parameter space which allows us to obtain shapes that simultaneously fulfill the imposed shape constraints and remain plausible according to the statistical model. Second, we extend this methodology to 3D Morphable Models (3DMM), a widespread method for 3D reconstruction. However, existing methods present some limitations.
Some are based on computationally expensive non-linear optimizations that can get stuck in local minima. Another limitation is that not all methods provide enough resolution to accurately represent the anatomical details needed for this application. Given the medical use of the application, the accuracy and robustness of the method are important factors to take into consideration. We show how 3DMM initialization and 3DMM fitting can be improved using our weighted regularized projection. Finally, we present a framework capable of reconstructing 3D shapes of plastic surgery patients from two possible inputs: 2D images and 3D scans. Our method is used at different stages of the 3D reconstruction pipeline: shape alignment, 3DMM initialization, and 3DMM fitting. The developed methods have been integrated into the production environment of Crisalix, proving their validity.
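A weighted regularized projection of the kind described can be sketched as a weighted least-squares fit of the model parameters b, minimizing ||W^(1/2)(x - mean - Pb)||^2 + lambda ||b||^2, with the closed-form solution b = (P^T W P + lambda I)^(-1) P^T W (x - mean). The numpy sketch below uses a toy orthonormal basis and uniform weights; the thesis's actual formulation may differ in detail.

```python
import numpy as np

def weighted_regularized_projection(x, mean, P, w, lam):
    """Project shape x into the PCA parameter space of a statistical
    shape model. w weights each coordinate (e.g. high weights on known
    landmarks); lam regularizes the parameters toward zero so the
    result stays plausible under the model."""
    W = np.diag(w)
    A = P.T @ W @ P + lam * np.eye(P.shape[1])
    return np.linalg.solve(A, P.T @ W @ (x - mean))

# Toy model: 10 2D points (20 coordinates), 3 orthonormal modes.
rng = np.random.default_rng(0)
mean = rng.normal(size=20)
P, _ = np.linalg.qr(rng.normal(size=(20, 3)))
b_true = np.array([1.0, -0.5, 0.2])
x = mean + P @ b_true  # a shape that lies exactly in the model span

w = np.ones(20)
b = weighted_regularized_projection(x, mean, P, w, lam=0.0)
print(np.allclose(b, b_true))  # exact recovery with no regularization
```

Raising the weights on landmark coordinates pulls the projected shape toward the given landmarks, while lam > 0 trades that fidelity against statistical plausibility.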

    Handbook of Digital Face Manipulation and Detection

    This open access book provides the first comprehensive collection of studies on the hot topic of digital face manipulation, such as DeepFakes, face morphing, and reenactment. It combines the research fields of biometrics and media forensics, including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic for readers who wish to gain a brief overview of the state of the art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide to further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily serve as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and general security.

    High-quality face capture, animation and editing from monocular video

    Digitization of virtual faces in movies requires complex capture setups and extensive manual work to produce superb animations and video-realistic editing. This thesis pushes the boundaries of the digitization pipeline by proposing automatic algorithms for high-quality 3D face capture and animation, as well as photo-realistic face editing. These algorithms reconstruct and modify faces in 2D videos recorded in uncontrolled scenarios and illumination. In particular, advances in three main areas offer solutions for the lack of depth and overall uncertainty in video recordings. First, contributions in capture include model-based reconstruction of detailed, dynamic 3D geometry that exploits optical and shading cues, multilayer parametric reconstruction of accurate 3D models in unconstrained setups based on inverse rendering, and regression-based 3D lip shape enhancement from high-quality data. Second, advances in animation are video-based face reenactment based on robust appearance metrics and temporal clustering, performance-driven retargeting of detailed facial models in sync with audio, and the automatic creation of personalized controllable 3D rigs. Finally, advances in plausible photo-realistic editing are dense face albedo capture and mouth interior synthesis using image warping and 3D teeth proxies. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of photo-realistic 3D faces.
