
    A PCA approach to the object constancy for faces using view-based models of the face

    Get PDF
    The analysis of object and face recognition by humans attracts a great deal of interest, mainly because of its many applications in various fields, including psychology, security, computer technology, medicine and computer graphics. The aim of this work is to investigate whether a PCA-based mapping approach can offer a new perspective on models of object constancy for faces in human vision. An existing system for facial motion capture and animation developed for performance-driven animation of avatars is adapted, improved and repurposed to study face representation in the context of viewpoint and lighting invariance. The main goal of the thesis is to develop and evaluate a new approach to viewpoint invariance that is view-based and allows mapping of facial variation between different views to construct a multi-view representation of the face. The thesis describes a computer implementation of a model that uses PCA to generate example-based models of the face. The work explores the joint encoding of expression and viewpoint using PCA and the mapping between view-specific PCA spaces. The simultaneous, synchronised video recording of 6 views of the face was used to construct multi-view representations, which helped to investigate how well multiple views could be recovered from a single view via the content-addressable memory property of PCA. A similar approach was taken to lighting invariance. Finally, the possibility of constructing a multi-view representation from asynchronous view-based data was explored. The results of this thesis have implications for a continuing research problem in computer vision – the problem of recognising faces and objects from different perspectives and in different lighting. It also provides a new approach to understanding viewpoint invariance and lighting invariance in human observers.
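    As an illustration of the content-addressable use of PCA described above (not the thesis's actual implementation), the following sketch builds a joint PCA over concatenated view vectors and recovers the missing views from a single observed view by repeated project-and-reconstruct; the data shapes, component count and variable names are illustrative assumptions.

        # Hedged sketch: joint multi-view PCA and single-view recovery.
        import numpy as np
        from sklearn.decomposition import PCA

        n_views, view_dim = 6, 2000                     # e.g. six synchronised camera views
        X = np.random.rand(300, n_views * view_dim)     # placeholder training set: concatenated views per frame
        pca = PCA(n_components=40).fit(X)

        def recover_views(known_view, known_idx, n_iters=20):
            """Estimate all views given one, by iterated PCA projection."""
            x = pca.mean_.copy()
            sl = slice(known_idx * view_dim, (known_idx + 1) * view_dim)
            for _ in range(n_iters):
                x[sl] = known_view                      # clamp the observed view
                x = pca.inverse_transform(pca.transform(x[None]))[0]
            x[sl] = known_view
            return x.reshape(n_views, view_dim)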

    3D Human Face Reconstruction and 2D Appearance Synthesis

    Get PDF
    3D human face reconstruction has been an active research topic for decades because of its wide range of applications, such as animation, recognition and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains particularly valuable because images are much easier to acquire and store. In this dissertation, we first propose three image-based face reconstruction approaches, each making different assumptions about the input. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; the camera must be calibrated under this assumption. As the first approach is limited to videos, our second approach focuses on a single image. It also refines the geometry by adding fine-grained detail using shading cues, based on a novel albedo estimation and linear optimization algorithm. In the third approach, we further relax the constraints on the input to arbitrary in-the-wild images; the proposed approach robustly reconstructs high-quality models even with extreme expressions and large poses. We then explore the applicability of our face reconstructions in four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization and video face replacement, demonstrating the potential of our reconstruction approaches in real-world settings. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a major obstacle to face-to-face communication. In a further application, we therefore explore hardware/software solutions for synthesizing face images in the presence of HMDs. We design two setups (experimental and mobile) which integrate two near-IR cameras and one color camera to solve this problem; with our algorithm and prototype we achieve photo-realistic results. We also propose a deep neural network that treats HMD removal as a face inpainting problem. This approach does not need special hardware and runs in real time with satisfying results.
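    The dissertation's albedo estimation and linear optimization are not specified in the abstract, so the sketch below only illustrates the standard Lambertian spherical-harmonics shading model commonly used for shading-based refinement: intensity is approximately albedo times (l . H(n)), solved by alternating two linear steps. The array names and the first-order basis are assumptions for illustration.

        # Hedged sketch: alternating linear estimation of lighting and per-pixel albedo.
        import numpy as np

        def sh_basis(normals):
            """First-order spherical-harmonics basis at unit normals, (N, 3) -> (N, 4)."""
            n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
            return np.column_stack([np.ones(len(n)), n[:, 0], n[:, 1], n[:, 2]])

        def estimate_lighting_and_albedo(intensities, normals, n_iters=3):
            """intensities: (N,) grey values; normals: (N, 3) from a coarse mesh fit."""
            H = sh_basis(normals)
            albedo = np.ones_like(intensities)          # start from constant albedo
            for _ in range(n_iters):
                light, *_ = np.linalg.lstsq(albedo[:, None] * H, intensities, rcond=None)
                shading = H @ light
                albedo = intensities / np.clip(shading, 1e-6, None)
            return light, albedo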

    Geometric Expression Invariant 3D Face Recognition using Statistical Discriminant Models

    No full text
    Currently there is no complete face recognition system that is invariant to all facial expressions. Although humans find it easy to identify and recognise faces regardless of changes in illumination, pose and expression, producing a computer system with a similar capability has proved to be particularly difficult. Three-dimensional face models are geometric in nature and therefore have the advantage of being invariant to head pose and lighting. However, they are still susceptible to facial expressions. This can be seen in the decrease in the recognition results using principal component analysis when expressions are added to a data set. In order to achieve expression-invariant face recognition systems, we have employed a tensor algebra framework to represent 3D face data with facial expressions in a parsimonious space. Face variation factors are organised in particular subject and facial expression modes. We manipulate this using singular value decomposition on sub-tensors representing one variation mode. This framework possesses the ability to deal with the shortcomings of PCA in less constrained environments and still preserves the integrity of the 3D data. The results show improved recognition rates for faces and facial expressions, even recognising high-intensity expressions that are not in the training datasets. We have determined, experimentally, a set of anatomical landmarks that describe facial expressions most effectively. We found that the best placement of landmarks to distinguish different facial expressions is in areas around the prominent features, such as the cheeks and eyebrows. Recognition results using landmark-based face recognition could be improved with better placement. We looked into the possibility of achieving expression-invariant face recognition by reconstructing and manipulating realistic facial expressions. We proposed a tensor-based statistical discriminant analysis method to reconstruct facial expressions and in particular to neutralise facial expressions. The synthesised facial expressions are visually more realistic than facial expressions generated using conventional active shape modelling (ASM). We then used reconstructed neutral faces in the sub-tensor framework for recognition purposes. The recognition results showed slight improvement. Besides biometric recognition, this novel tensor-based synthesis approach could be used in computer games and real-time animation applications.
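    Since the tensor framework above builds on mode-wise singular value decompositions, here is a minimal, generic HOSVD-style sketch (not the thesis pipeline): unfold a subjects x expressions x features data tensor along each mode and take the SVD of each unfolding to obtain per-subject and per-expression coefficient spaces. The tensor shape and names are assumed for illustration.

        # Hedged sketch: mode unfoldings and mode-wise SVD of a 3-way face tensor.
        import numpy as np

        n_subjects, n_expressions, n_features = 40, 7, 15000
        T = np.random.rand(n_subjects, n_expressions, n_features)   # placeholder data tensor

        def mode_unfold(tensor, mode):
            """Unfold a 3-way tensor along `mode` into a (size_of_mode, rest) matrix."""
            return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

        U_subject, _, _ = np.linalg.svd(mode_unfold(T, 0), full_matrices=False)
        U_expression, _, _ = np.linalg.svd(mode_unfold(T, 1), full_matrices=False)
        # Rows of U_subject / U_expression give subject and expression coefficients
        # that a discriminant classifier can operate on.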

    FreeStyleGAN: Free-view Editable Portrait Rendering with the Camera Manifold

    Get PDF
    Current Generative Adversarial Networks (GANs) produce photorealistic renderings of portrait images. Embedding real images into the latent space of such models enables high-level image editing. While recent methods provide considerable semantic control over the (re-)generated images, they can only generate a limited set of viewpoints and cannot explicitly control the camera. Such 3D camera control is required for 3D virtual and mixed reality applications. In our solution, we use a few images of a face to perform 3D reconstruction, and we introduce the notion of the GAN camera manifold, the key element allowing us to precisely define the range of images that the GAN can reproduce in a stable manner. We train a small face-specific neural implicit representation network to map a captured face to this manifold and complement it with a warping scheme to obtain free-viewpoint novel-view synthesis. We show how our approach, due to its precise camera control, enables the integration of a pre-trained StyleGAN into standard 3D rendering pipelines, allowing e.g. stereo rendering or consistent insertion of faces in synthetic 3D environments. Our solution proposes the first truly free-viewpoint rendering of realistic faces at interactive rates, using only a small number of casual photos as input, while simultaneously allowing semantic editing capabilities, such as facial expression or lighting changes.
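    The paper's camera manifold is defined precisely in the full text; purely as an illustration of constraining cameras to a face-centred range of viewpoints, the sketch below parameterises look-at cameras on a partial sphere with clamped yaw and pitch. The limits, radius and function name are assumptions, not the authors' actual manifold.

        # Hedged sketch: a simple constrained look-at camera parameterisation.
        import numpy as np

        def camera_on_manifold(yaw, pitch, radius=1.0, target=np.zeros(3)):
            """Return a world-to-camera rotation R and translation t for clamped (yaw, pitch)."""
            yaw = np.clip(yaw, -0.6, 0.6)            # keep within a pose range a face model can render
            pitch = np.clip(pitch, -0.3, 0.3)
            eye = target + radius * np.array([np.sin(yaw) * np.cos(pitch),
                                              np.sin(pitch),
                                              np.cos(yaw) * np.cos(pitch)])
            fwd = target - eye
            fwd = fwd / np.linalg.norm(fwd)
            right = np.cross(fwd, np.array([0.0, 1.0, 0.0]))
            right = right / np.linalg.norm(right)
            up = np.cross(right, fwd)
            R = np.stack([right, up, -fwd])          # rows are the camera axes in world coordinates
            t = -R @ eye
            return R, t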

    3D reconstruction for plastic surgery simulation based on statistical shape models

    Get PDF
    This thesis has been carried out at Crisalix in collaboration with Universitat Pompeu Fabra within the Doctorats Industrials programme. Crisalix has the mission of enhancing communication between plastic surgery professionals and patients by providing an answer to the question most commonly asked during surgery planning: "How will I look after the surgery?". The solution proposed by Crisalix is based on 3D imaging technology, which generates a 3D reconstruction that accurately represents the area of the patient to be operated on, and then allows multiple simulations of the plastic procedure, representing the possible outcomes of the surgery. This thesis presents a framework capable of reconstructing 3D shapes of the faces and breasts of plastic surgery patients from 2D images and 3D scans. The 3D reconstruction of an object is a challenging problem with many inherent ambiguities, and statistical model based methods are a powerful way to overcome some of them. We follow the intuition of maximizing the use of available prior information by introducing it into statistical model based methods to enhance their properties. First, we explore Active Shape Models (ASM), a well-known method for 2D shape alignment. However, it is challenging to keep prior information (e.g. a small set of given landmarks) unchanged once the statistical model constraints are applied. We propose a new weighted regularized projection into the parameter space which allows us to obtain shapes that simultaneously fulfil the imposed shape constraints and remain plausible according to the statistical model. Second, we extend this methodology to 3D Morphable Models (3DMM), a widespread method for 3D reconstruction. Existing methods have some limitations: some rely on computationally expensive non-linear optimizations that can get stuck in local minima, and not all of them provide enough resolution to accurately represent the anatomical details needed for this application. Given the medical nature of the application, the accuracy and robustness of the method are important factors to take into consideration. We show how 3DMM initialization and 3DMM fitting can be improved using our weighted regularized projection. Finally, we present a framework capable of reconstructing the 3D shapes of plastic surgery patients from two possible inputs: 2D images and 3D scans. Our method is used in different stages of the 3D reconstruction pipeline: shape alignment, 3DMM initialization and 3DMM fitting. The developed methods have been integrated into the production environment of Crisalix, proving their validity.
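    A minimal sketch of the kind of weighted, regularised projection described above, assuming a linear (PCA) shape basis: solve for the parameters b that minimise a weighted data term plus a ridge penalty, so coordinates carrying prior information (e.g. given landmarks) receive large weights and stay close to their values while the result remains plausible under the model. The matrix names and weighting scheme are assumptions, not Crisalix's implementation.

        # Hedged sketch: weighted ridge projection into a linear shape-model parameter space.
        import numpy as np

        def weighted_regularised_projection(x, mean, P, w, lam=1.0):
            """x: observed shape (d,); mean: model mean (d,); P: (d, k) basis; w: (d,) weights."""
            Wd = np.diag(w)
            A = P.T @ Wd @ P + lam * np.eye(P.shape[1])
            b = np.linalg.solve(A, P.T @ Wd @ (x - mean))
            return mean + P @ b, b                   # constrained shape and its parameters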

    ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation

    Full text link
    Gaze estimation is a fundamental task in many applications of computer vision, human-computer interaction and robotics. Many state-of-the-art methods are trained and tested on custom datasets, making comparison across methods challenging. Furthermore, existing gaze estimation datasets have limited head pose and gaze variations, and the evaluations are conducted using different protocols and metrics. In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. We collect this dataset from 110 participants with a custom hardware setup including 18 digital SLR cameras and adjustable illumination conditions, and a calibrated system to record ground truth gaze targets. We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles. Additionally, we define a standardized experimental protocol and evaluation metric on ETH-XGaze, to better unify gaze estimation research going forward. The dataset and benchmark website are available at https://ait.ethz.ch/projects/2020/ETH-XGaze. Comment: Accepted at ECCV 2020 (Spotlight).
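    For context on the evaluation metric mentioned above, the snippet below computes the angular error commonly reported in gaze estimation benchmarks: the angle, in degrees, between predicted and ground-truth 3D gaze direction vectors. This is a generic sketch and is not quoted from the ETH-XGaze protocol.

        # Hedged sketch: angular error between predicted and true gaze directions.
        import numpy as np

        def angular_error_deg(pred, gt):
            """pred, gt: (N, 3) gaze direction vectors; returns per-sample error in degrees."""
            pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
            gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
            cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
            return np.degrees(np.arccos(cos))        # report the mean over the test set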

    RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

    Full text link
    Synthesizing high-fidelity head avatars is a central problem for computer vision and graphics. While head avatar synthesis algorithms have advanced rapidly, the best ones still face great obstacles in real-world scenarios. One of the main causes is inadequate datasets: 1) current public datasets only support researchers in exploring high-fidelity head avatars in one or two task directions; 2) these datasets usually contain digital head assets with limited data volume and narrow distribution over different attributes. In this paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar research. It contains massive data assets, with 243+ million complete head frames and over 800k video sequences from 500 different identities captured by synchronized multi-view cameras at 30 FPS. It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees. 2) High Diversity: the collected subjects span different ages, eras, ethnicities, and cultures, providing abundant materials with distinctive styles in appearance and geometry. Moreover, each subject is asked to perform various motions, such as expressions and head rotations, which further extends the richness of assets. 3) Rich Annotations: we provide annotations with different granularities: camera parameters, matting, scans, 2D/3D facial landmarks, FLAME fitting, and text descriptions. Based on the dataset, we build a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods evaluated on five main tasks: novel view synthesis, novel expression synthesis, hair rendering, hair editing, and talking head generation. Our experiments uncover the strengths and weaknesses of current methods. RenderMe-360 opens the door for future exploration in head avatars. Comment: Technical Report; Project Page: 36; Github Link: https://github.com/RenderMe-360/RenderMe-36

    3D face modelling from sparse data

    Get PDF
    EThOS - Electronic Theses Online Service, United Kingdom