485 research outputs found

    GazeDirector: Fully articulated eye gaze redirection in video

    We present GazeDirector, a new approach for eye gaze redirection that uses model-fitting. Our method first tracks the eyes by fitting a multi-part eye region model to video frames using analysis-by-synthesis, thereby recovering eye region shape, texture, pose, and gaze simultaneously. It then redirects gaze by 1) warping the eyelids from the original image using a model-derived flow field, and 2) rendering and compositing synthesized 3D eyeballs onto the output image in a photorealistic manner. GazeDirector allows us to change where people are looking without person-specific training data, and with full articulation, i.e. we can precisely specify new gaze directions in 3D. Quantitatively, we evaluate both model-fitting and gaze synthesis, with experiments for gaze estimation and redirection on the Columbia gaze dataset. Qualitatively, we compare GazeDirector against recent work on gaze redirection, showing better results, especially for large redirection angles. Finally, we demonstrate gaze redirection on YouTube videos by introducing new 3D gaze targets and by manipulating visual behavior.

    FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances

    Visual effects commonly require both the creation of realistic synthetic humans and the retargeting of actors' performances to humanoid characters such as aliens and monsters. Achieving the expressive performances demanded in entertainment requires manipulating complex models with hundreds of parameters. Full creative control requires the freedom to make edits at any stage of the production, which prohibits the use of a fully automatic "black box" solution with uninterpretable parameters. On the other hand, producing realistic animation with these sophisticated models is difficult and laborious. This paper describes FDLS (Facial Deep Learning Solver), which is Weta Digital's solution to these challenges. FDLS adopts a coarse-to-fine and human-in-the-loop strategy, allowing a solved performance to be verified and edited at several stages in the solving process. To train FDLS, we first transform the raw motion-captured data into robust graph features. Second, based on the observation that artists typically finalize the jaw pass animation before proceeding to finer detail, we solve for the jaw motion first and predict fine expressions with region-based networks conditioned on the jaw position. Finally, artists can optionally invoke a non-linear finetuning process on top of the FDLS solution to follow the motion-captured virtual markers as closely as possible. FDLS supports editing where needed to improve the results of the deep learning solution, and it can handle small daily changes in the actor's face shape. FDLS permits reliable, production-quality performance solving with minimal training and little or no manual effort in many cases, while also allowing the solve to be guided and edited in unusual and difficult cases. The system has been under development for several years and has been used in major movies. Comment: DigiPro '22: The Digital Production Symposium

    GIF: Generative Interpretable Faces

    Photo-realistic visualization and animation of expressive human faces have been a long-standing challenge. 3D face modeling methods provide parametric control but generate unrealistic images; generative 2D models like GANs (Generative Adversarial Networks), on the other hand, output photo-realistic face images but lack explicit control. Recent methods gain partial control, either by attempting to disentangle different factors in an unsupervised manner, or by adding control post hoc to a pre-trained model. Unconditional GANs, however, may entangle factors that are hard to undo later. We condition our generative model on pre-defined control parameters to encourage disentanglement in the generation process. Specifically, we condition StyleGAN2 on FLAME, a generative 3D face model. While conditioning on FLAME parameters yields unsatisfactory results, we find that conditioning on rendered FLAME geometry and photometric details works well. This gives us a generative 2D face model named GIF (Generative Interpretable Faces) that offers FLAME's parametric control. Here, interpretable refers to the semantic meaning of different parameters. Given FLAME parameters for shape, pose, and expressions, parameters for appearance and lighting, and an additional style vector, GIF outputs photo-realistic face images. We perform an AMT-based perceptual study to quantitatively and qualitatively evaluate how well GIF follows its conditioning. The code, data, and trained model are publicly available for research purposes at http://gif.is.tue.mpg.de. Comment: International Conference on 3D Vision (3DV) 202

    Deep into the Eyes: Applying Machine Learning to improve Eye-Tracking

    Eye-tracking has been an active research area with applications in personal and behavioral studies, medical diagnosis, virtual reality, and mixed reality applications. Improving the robustness, generalizability, accuracy, and precision of eye-trackers while maintaining privacy is crucial. Unfortunately, many existing low-cost portable commercial eye trackers suffer from signal artifacts and a low signal-to-noise ratio. These trackers are highly dependent on low-level features such as pupil edges or diffused bright spots in order to precisely localize the pupil and corneal reflection. As a result, they are not reliable for studying eye movements that require high precision, such as microsaccades, smooth pursuit, and vergence. Additionally, these methods suffer from reflective artifacts and occlusion of the pupil boundary by the eyelid, and often require a manual update of person-dependent parameters to identify the pupil region. In this dissertation, I demonstrate (I) a new method to improve precision while maintaining the accuracy of head-fixed eye trackers by combining velocity information from iris textures across frames with position information, (II) a generalized semantic segmentation framework for identifying eye regions with a further extension to identify ellipse fits on the pupil and iris, (III) a data-driven rendering pipeline to generate a temporally contiguous synthetic dataset for use in many eye-tracking applications, and (IV) a novel strategy to preserve privacy in eye videos captured as part of the eye-tracking process. My work also provides the foundation for future research by addressing critical questions like the suitability of using synthetic datasets to improve eye-tracking performance in real-world applications, and ways to improve the precision of future commercial eye trackers with improved camera specifications.

    EMS: 3D Eyebrow Modeling from Single-view Images

    Eyebrows play a critical role in facial expression and appearance. Although the 3D digitization of faces is well explored, less attention has been paid to 3D eyebrow modeling. In this work, we propose EMS, the first learning-based framework for single-view 3D eyebrow reconstruction. Following the methods of scalp hair reconstruction, we represent the eyebrow as a set of fiber curves and cast the reconstruction as a fiber-growing problem. Three modules are then carefully designed: RootFinder first localizes the fiber root positions, which indicate where to grow; OriPredictor predicts an orientation field in 3D space to guide the growing of fibers; FiberEnder determines when to stop the growth of each fiber. OriPredictor directly borrows the method used in hair reconstruction. Considering the differences between hair and eyebrows, both RootFinder and FiberEnder are newly proposed. Specifically, to cope with the challenge that the root location is severely occluded, we formulate root localization as a density map estimation task. Given the predicted density map, a density-based clustering method is further used for finding the roots. For each fiber, the growth starts from the root point and moves step by step until the end, where each step is defined as an oriented line with a constant length according to the predicted orientation field. To determine when to end, a pixel-aligned RNN architecture is designed to form a binary classifier, which outputs whether or not to stop at each growing step. To support the training of all proposed networks, we build the first 3D synthetic eyebrow dataset, which contains 400 high-quality eyebrow models manually created by artists. Extensive experiments have demonstrated the effectiveness of the proposed EMS pipeline on a variety of eyebrow styles and lengths, ranging from short and sparse to long, bushy eyebrows. Comment: To appear in SIGGRAPH Asia 2023 (Journal Track). 19 pages, 19 figures, 6 tables
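
    The fiber-growing loop described in the EMS abstract (start at a root, take constant-length steps along an orientation field, ask a stop classifier before each step) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `grow_fiber`, `orientation_at`, and `should_stop` are hypothetical, and plain Python callables stand in for the learned OriPredictor and FiberEnder networks.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


def grow_fiber(
    root: Point,
    orientation_at: Callable[[Point], Point],   # hypothetical stand-in for OriPredictor
    should_stop: Callable[[Point, int], bool],  # hypothetical stand-in for FiberEnder
    step_len: float = 0.25,
    max_steps: int = 200,
) -> List[Point]:
    """Grow one fiber as a polyline, starting from `root`."""
    points: List[Point] = [root]
    for _ in range(max_steps):
        current = points[-1]
        steps_taken = len(points) - 1
        # Binary stop decision, queried before every growing step.
        if should_stop(current, steps_taken):
            break
        direction = orientation_at(current)
        norm = math.sqrt(sum(c * c for c in direction))
        if norm == 0.0:
            break  # no usable orientation here, so end the fiber
        # Constant-length step along the normalized orientation.
        nxt = tuple(p + step_len * d / norm for p, d in zip(current, direction))
        points.append(nxt)
    return points


# Toy usage: a uniform field pointing along +x, stopping after four steps.
fiber = grow_fiber(
    root=(0.0, 0.0, 0.0),
    orientation_at=lambda p: (2.0, 0.0, 0.0),
    should_stop=lambda p, n: n >= 4,
)
```

    In this toy setup every step has length `step_len` regardless of the field's magnitude, since the direction is normalized first; `max_steps` is an added safety cap assumed here so that a classifier that never fires cannot loop forever.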