165 research outputs found

    Exploring new methods for measuring, analyzing, and visualizing facial expressions

    We explore new methods for measuring, analyzing, and visualizing facial expressions and demonstrate the utility of these methods in a case study on polar questions in Sign Language of the Netherlands.

    Enhancing Mesh Deformation Realism: Dynamic Mesostructure Detailing and Procedural Microstructure Synthesis

    We propose a solution for generating dynamic heightmap data to simulate deformations of soft surfaces, with a focus on human skin. The solution incorporates mesostructure-level wrinkles and utilizes procedural textures to add static microstructure details. It offers flexibility beyond human skin, enabling the generation of patterns mimicking deformations in other soft materials, such as leather, during animation. Existing solutions for simulating wrinkles and deformation cues often rely on specialized hardware, which is costly and not easily accessible. Moreover, relying solely on captured data limits artistic direction and hinders adaptability to changes. In contrast, our proposed solution provides dynamic texture synthesis that adapts to the underlying mesh deformations in a physically plausible way. Various methods have been explored to synthesize wrinkles directly in the geometry, but they suffer from limitations such as self-intersections and increased storage requirements. Manual intervention by artists using wrinkle maps and tension maps provides control but may fall short for complex deformations or where greater realism is required. Our work highlights the potential of procedural methods to enhance the generation of dynamic deformation patterns, including wrinkles, with greater creative control and without reliance on captured data. Incorporating static procedural patterns improves realism, and the approach can be extended to other soft materials beyond skin.
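
    The central mechanism described above — driving a procedural wrinkle pattern by how much the underlying mesh is compressed, and layering a static procedural microstructure on top — can be illustrated with a small sketch. Everything below (per-vertex tension from edge-length change, a sine-based wrinkle pattern, noise standing in for pores) is our own simplified assumption, not the paper's implementation.

```python
import numpy as np

def vertex_tension(rest_verts, deformed_verts, edges):
    """Per-vertex stretch/compression estimated from edge-length change.
    Negative values mean compression (where wrinkles should appear)."""
    rest_len = np.linalg.norm(rest_verts[edges[:, 0]] - rest_verts[edges[:, 1]], axis=1)
    def_len = np.linalg.norm(deformed_verts[edges[:, 0]] - deformed_verts[edges[:, 1]], axis=1)
    strain = (def_len - rest_len) / rest_len            # per-edge strain
    tension = np.zeros(len(rest_verts))
    counts = np.zeros(len(rest_verts))
    for (a, b), s in zip(edges, strain):                # average edge strain onto vertices
        tension[a] += s; tension[b] += s
        counts[a] += 1;  counts[b] += 1
    return tension / np.maximum(counts, 1)

def wrinkle_heightmap(uv, tension_per_texel, frequency=40.0, noise_scale=0.02, rng=None):
    """Dynamic mesostructure wrinkles (driven by compression) plus a static
    procedural microstructure layer (random noise standing in for pores)."""
    if rng is None:
        rng = np.random.default_rng(0)
    compression = np.clip(-tension_per_texel, 0.0, 1.0)        # only compressed regions wrinkle
    wrinkles = compression * np.sin(frequency * uv[..., 0]) ** 2
    microstructure = noise_scale * rng.random(uv.shape[:-1])   # static fine detail
    return wrinkles + microstructure
```

    A renderer would resample such a heightmap into a displacement or normal map each frame, so the wrinkle pattern follows the animation without any captured data.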

    Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes

    Computationally weak systems and demanding graphical applications are still mostly dependent on linear blendshapes for facial animations. The accompanying artifacts such as self-intersections, loss of volume, or missing soft tissue elasticity can be avoided by using physics-based animation models. However, these are cumbersome to implement and require immense computational effort. We propose neural volumetric blendshapes, an approach that combines the advantages of physics-based simulations with real-time runtimes even on consumer-grade CPUs. To this end, we present a neural network that efficiently approximates the involved volumetric simulations and generalizes across human identities as well as facial expressions. Our approach can be used on top of any linear blendshape system and, hence, can be deployed straightforwardly. Furthermore, it only requires a single neutral face mesh as input in the minimal setting. Along with the design of the network, we introduce a pipeline for the challenging creation of anatomically and physically plausible training data. Part of the pipeline is a novel hybrid regressor that densely positions a skull within a skin surface while avoiding intersections. The fidelity of all parts of the data generation pipeline as well as the accuracy and efficiency of the network are evaluated in this work. Upon publication, the trained models and associated code will be released.
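
    Reading the abstract at face value, the network acts as a corrective layer on top of an ordinary linear blendshape evaluation. The sketch below assumes flattened vertex tensors, a fixed delta basis taken from an existing rig, and a plain MLP corrector; the actual network design is not specified here.

```python
import torch
import torch.nn as nn

class NeuralCorrectedBlendshapes(nn.Module):
    """Linear blendshape evaluation plus a learned corrective network (illustrative only)."""
    def __init__(self, basis, hidden=256):
        super().__init__()
        # basis: (K, V*3) delta blendshapes from an existing linear rig, kept fixed here.
        self.register_buffer("basis", basis)
        num_blendshapes, flat_verts = basis.shape
        self.corrector = nn.Sequential(
            nn.Linear(flat_verts + num_blendshapes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, flat_verts),
        )

    def forward(self, neutral, weights):
        # neutral: (B, V*3) flattened neutral mesh, weights: (B, K) blendshape activations
        linear = neutral + weights @ self.basis       # ordinary linear blendshape result
        correction = self.corrector(torch.cat([neutral, weights], dim=-1))
        return linear + correction                    # learned stand-in for the volumetric simulation
```

    In this reading the corrector plays the role of the approximated volumetric simulation, which is why it can sit on top of any existing linear rig.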

    Neural radiance fields for heads: towards accurate digital avatars

    Digitalizing humans in 3D environments has been a subject of study in computer vision and computer graphics for decades, but it remains an open problem. No current technology can digitalize humans with excellent quality and dynamism that can be used in 3D engines, such as in a virtual reality headset or a mobile phone, at real-time speeds. In this thesis, we aim to contribute to this problem by exploring how to combine the two most commonly used approaches of recent years: neural radiance fields and parametric 3D meshes. We attempt to design a model capable of creating digital, animatable avatars of human faces at reasonable speeds. Our work focuses mostly on creating a machine learning model capable of generating a facial avatar from a set of images and camera poses, but we also build a pipeline that integrates all steps of obtaining such data, allowing us to demonstrate our method on real-world data. Additionally, we implement a framework to generate synthetic data, in order to alleviate the errors that arise when obtaining real data, such as problems with camera calibration, and to facilitate the development of other human-related projects.
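
    As a rough, assumption-laden sketch of how a radiance field can be combined with a parametric face model: condition the NeRF MLP on the model's expression parameters so that one network renders different facial configurations. The encoding, layer sizes, and the name `ExpressionConditionedNeRF` are illustrative choices, not the thesis's architecture.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    """Standard NeRF-style sinusoidal encoding of 3D sample positions."""
    feats = [x]
    for i in range(num_freqs):
        feats += [torch.sin((2.0 ** i) * x), torch.cos((2.0 ** i) * x)]
    return torch.cat(feats, dim=-1)

class ExpressionConditionedNeRF(nn.Module):
    """Radiance field conditioned on parametric face-model expression codes (illustrative)."""
    def __init__(self, expr_dim=50, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * (1 + 2 * num_freqs) + expr_dim
        self.num_freqs = num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # RGB + density
        )

    def forward(self, points, expr_code):
        # points: (N, 3) sample positions, expr_code: (N, expr_dim) expression parameters
        h = torch.cat([positional_encoding(points, self.num_freqs), expr_code], dim=-1)
        out = self.mlp(h)
        rgb, density = torch.sigmoid(out[..., :3]), torch.relu(out[..., 3:])
        return rgb, density
```

    Standard volume rendering along camera rays would then turn these per-sample colors and densities into images of the avatar for a given expression.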

    High-fidelity Interpretable Inverse Rig: An Accurate and Sparse Solution Optimizing the Quartic Blendshape Model

    We propose a method to fit arbitrarily accurate blendshape rig models by solving the inverse rig problem in realistic human face animation. The method considers blendshape models with different levels of added corrections and solves the regularized least-squares problem using coordinate descent, i.e., iteratively estimating blendshape weights. Besides making the optimization easier to solve, this approach ensures that mutually exclusive controllers will not be activated simultaneously and improves the goodness of fit after each iteration. We show experimentally that the proposed method yields solutions with mesh error comparable to or lower than the state-of-the-art approaches while significantly reducing the cardinality of the weight vector (by over 20 percent), hence giving a high-fidelity reconstruction of the reference expression that is easier to manipulate manually in post-production. Python scripts for the algorithm will be made publicly available upon acceptance of the paper.
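
    The coordinate-descent idea can be sketched for the simplest case — a purely linear blendshape term with an L1 penalty and box constraints — whereas the paper's actual objective includes corrective terms up to quartic order. The helper below is an illustration under those simplifying assumptions.

```python
import numpy as np

def inverse_rig_coordinate_descent(B, target, lam=0.1, iters=50):
    """Fit blendshape weights w in [0, 1] so that B @ w approximates the target offsets.

    B:      (3V, K) delta blendshape matrix
    target: (3V,)   target mesh offsets from the neutral face
    lam:    L1 regularization weight encouraging a sparse, editable solution
    """
    _, K = B.shape
    w = np.zeros(K)
    for _ in range(iters):
        for k in range(K):
            bk = B[:, k]
            denom = bk @ bk
            if denom == 0:
                continue
            residual = target - B @ w + bk * w[k]     # residual with weight k removed
            # Closed-form 1D minimizer of ||residual - bk*w_k||^2 + lam*|w_k|
            # (soft-thresholding), clamped to the valid activation range.
            raw = bk @ residual
            w[k] = np.clip(np.sign(raw) * max(abs(raw) - lam / 2, 0.0) / denom, 0.0, 1.0)
    return w
```

    Updating one weight at a time via soft-thresholding is what keeps the solution sparse, and hence easier to edit manually after the fit.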

    Digital Facial Reconstruction through Differentiable Rendering with a Facial Rig Prior

    Realistic facial animation is an important component in creating believable characters in digital media. Facial performance capture attempts to address this need by recording actor performances and reproducing them digitally. One approach to performance capture is differentiable rendering: an analysis-by-synthesis approach in which candidate images are rendered and compared against reference material to optimize scene parameters such as geometry and texture so that they match the reference as closely as possible. A differentiable renderer makes this possible by computing the gradient of an objective function over the scene parameters. This thesis explores differentiable rendering for inferring facial animation from markerless multi-view reference video footage. This has been done before, but the approaches have not been directly applicable in the video game industry, where head stabilization data and a semantically meaningful animation basis are crucial for further processing. To obtain these advantages we leverage a highly tailored facial rig as prior data: a facial model based on blendshapes, parametrized as a linear combination of meshes representing a range of facial expressions, commonly used in the industry to control character animation. We design and implement a facial performance capture pipeline for Remedy Entertainment as an open-source contribution and infer animation with varying configurations. The underlying optimization architecture is built on Nvidia's nvdiffrast library, a Python- and PyTorch-based differentiable rendering framework that utilizes highly optimized graphics pipelines on GPUs to accelerate performance. Experiments with the implemented pipeline show that staying completely on-model with a blendshape-based facial rig as prior data provides both advantages and disadvantages. Although we propose numerous improvements, the animation falls short of cinematic quality, particularly for more extreme facial expressions. However, we benefit from the shape constraints set by our rig prior, and we gain computational simplicity by learning only per-frame shape activations instead of the shapes themselves, as done in previous works. Moreover, we obtain head stabilization data that is important to have down the line in video game production, and the use of blendshapes as the basis of our resulting animation enables semantically meaningful editing after inference.
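
    The analysis-by-synthesis loop the thesis describes can be sketched as follows. The renderer is hidden behind a hypothetical `differentiable_render` callable (in the actual pipeline nvdiffrast plays that role), the rig is reduced to a plain delta-blendshape basis, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def fit_frame_activations(neutral, basis, reference_image, differentiable_render,
                          num_steps=200, lr=0.05):
    """Optimize per-frame blendshape activations so the rendered face matches a reference frame.

    neutral: (V, 3) neutral mesh, basis: (K, V, 3) delta blendshapes,
    reference_image: (H, W, 3) target frame,
    differentiable_render: callable mapping (V, 3) vertices -> (H, W, 3) image
                           (e.g. an nvdiffrast-based rasterization pass).
    """
    weights = torch.zeros(basis.shape[0], requires_grad=True)
    optimizer = torch.optim.Adam([weights], lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        clamped = weights.clamp(0.0, 1.0)                       # keep activations on-model
        vertices = neutral + (clamped[:, None, None] * basis).sum(dim=0)
        rendered = differentiable_render(vertices)
        loss = F.l1_loss(rendered, reference_image)             # photometric objective
        loss.backward()                                         # gradients flow through the renderer
        optimizer.step()
    return weights.detach().clamp(0.0, 1.0)
```

    Because only the activations are optimized, the result stays within the expressions the rig can represent, which is the trade-off the abstract describes.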

    An Actor-Centric Approach to Facial Animation Control by Neural Networks For Non-Player Characters in Video Games

    Game developers increasingly consider the degree to which character animation emulates facial expressions found in cinema. Employing animators and actors to produce cinematic facial animation by mixing motion capture and hand-crafted animation is labor intensive and therefore expensive. Emotion corpora and neural network controllers have shown promise toward developing autonomous animation that does not rely on motion capture. Previous research and practice in the disciplines of Computer Science, Psychology, and the Performing Arts have provided frameworks on which to build a workflow toward creating an emotion AI system that can animate the facial mesh of a 3D non-player character by deploying a combination of related theories and methods. However, past investigations and their resulting production methods largely ignore the emotion generation systems that have evolved in the performing arts for more than a century. We find very little research that embraces the intellectual process of trained actors as complex collaborators from which to understand and model the training of a neural network for character animation. This investigation demonstrates a workflow design that integrates knowledge from the performing arts and the affective branches of the social and biological sciences. Our workflow spans developing and annotating a fictional scenario with actors, producing a video emotion corpus, designing, training, and validating a neural network, analyzing the emotion data annotation of the corpus and neural network, and finally determining resemblant behavior of its autonomous animation control of a 3D character facial mesh. The resulting workflow includes a method for the development of a neural network architecture whose initial efficacy as a facial emotion expression simulator has been tested and validated as substantially resemblant to the character behavior developed by a human actor.

    EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation

    Speech-driven 3D face animation aims to generate realistic facial expressions that match the speech content and emotion. However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network to disentangle different emotions in speech so as to generate rich 3D facial expressions. Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle the emotion and content in the speech by cross-reconstructing speech signals with different emotion labels. Then an emotion-guided feature fusion decoder is employed to generate a 3D talking face with enhanced emotion. The decoder is driven by the disentangled identity, emotional, and content embeddings so as to generate controllable personal and emotional styles. Finally, considering the scarcity of 3D emotional talking face data, we resort to the supervision of facial blendshapes, which enables the reconstruction of plausible 3D faces from 2D emotional data, and contribute a large-scale 3D emotional talking face dataset (3D-ETF) to train the network. Our experiments and user studies demonstrate that our approach outperforms state-of-the-art methods and exhibits more diverse facial movements. We recommend watching the supplementary video: https://ziqiaopeng.github.io/emotalk
    Comment: Accepted by ICCV 2023.
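
    A hedged sketch of the cross-reconstruction idea behind the disentanglement: given two clips of the same sentence spoken with different emotions, swapping the emotion embeddings between them should still reproduce the matching ground-truth animations. The encoder/decoder interfaces and the pairing scheme below are assumptions for illustration, not the paper's exact formulation.

```python
import torch.nn.functional as F

def cross_reconstruction_loss(content_enc, emotion_enc, decoder,
                              audio_a, audio_b, target_a, target_b):
    """audio_a / audio_b: the same sentence spoken with two different emotions.
    target_a / target_b: ground-truth blendshape sequences for each clip.
    Swapping emotion embeddings across clips must still reproduce the matching targets."""
    c_a, c_b = content_enc(audio_a), content_enc(audio_b)   # content embeddings
    e_a, e_b = emotion_enc(audio_a), emotion_enc(audio_b)   # emotion embeddings
    pred_ab = decoder(c_a, e_b)   # clip-a content + clip-b emotion -> should match target_b
    pred_ba = decoder(c_b, e_a)   # clip-b content + clip-a emotion -> should match target_a
    return F.l1_loss(pred_ab, target_b) + F.l1_loss(pred_ba, target_a)
```

    A loss of this form only goes to zero when the emotion encoder carries no content information and vice versa, which is the disentanglement the encoder is meant to achieve.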

    FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning

    This paper presents FaceXHuBERT, a text-less speech-driven 3D facial animation generation method that captures personalized and subtle cues in speech (e.g. identity, emotion, and hesitation). It is also very robust to background noise and can handle audio recorded in a variety of situations (e.g. multiple people speaking). Recent approaches employ end-to-end deep learning that takes both audio and text as input to generate facial animation for the whole face. However, the scarcity of publicly available expressive audio-3D facial animation datasets poses a major bottleneck. The resulting animations still have issues regarding accurate lip-synching, expressivity, person-specific information, and generalizability. We effectively employ the self-supervised pretrained HuBERT model in the training process, which allows us to incorporate both lexical and non-lexical information in the audio without using a large lexicon. Additionally, guiding the training with a binary emotion condition and speaker identity helps distinguish even the subtlest facial motions. We carried out extensive objective and subjective evaluations in comparison to ground truth and state-of-the-art work. A perceptual user study demonstrates that our approach produces superior results with respect to the realism of the animation 78% of the time in comparison to the state-of-the-art. In addition, our method is 4 times faster by eliminating the use of complex sequential models such as transformers. We strongly recommend watching the supplementary video before reading the paper. We also provide the implementation and evaluation code via a GitHub repository link.
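
    One way the ingredients could fit together — frozen HuBERT features as the audio representation, a binary emotion flag and a speaker one-hot as conditions, and a lightweight recurrent decoder instead of a transformer — is sketched below. The layer choices and tensor shapes are our own assumptions, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class AudioToFaceDecoder(nn.Module):
    """Map self-supervised speech features to per-frame vertex offsets,
    conditioned on a binary emotion flag and a speaker identity one-hot."""
    def __init__(self, num_vertices, hubert_dim=768, num_speakers=8, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(hubert_dim + 1 + num_speakers, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_vertices * 3)

    def forward(self, hubert_features, emotion_flag, speaker_onehot):
        # hubert_features: (B, T, hubert_dim); emotion_flag: (B, 1); speaker_onehot: (B, num_speakers)
        T = hubert_features.shape[1]
        cond = torch.cat([emotion_flag, speaker_onehot], dim=-1)   # (B, 1 + num_speakers)
        cond = cond[:, None, :].expand(-1, T, -1)                  # repeat condition along time
        h, _ = self.rnn(torch.cat([hubert_features, cond], dim=-1))
        return self.head(h)                                        # (B, T, num_vertices * 3) offsets
```

    A recurrent decoder of this kind processes the sequence in a single pass, which is consistent with the speed advantage the abstract claims over transformer-based models.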

    Dynamic Scene Reconstruction and Understanding

    Traditional approaches to 3D reconstruction have achieved remarkable progress in static scene acquisition. The acquired data serve as priors or benchmarks for many vision and graphics tasks, such as object detection and robotic navigation. Thus, obtaining interpretable and editable representations from a raw monocular RGB-D video sequence is an outstanding goal in scene understanding. However, acquiring an interpretable representation becomes significantly more challenging when a scene contains dynamic activities, for example a moving camera, rigid object movement, and non-rigid motions. These dynamic scene elements introduce a scene factorization problem, i.e., dividing a scene into elements and jointly estimating the elements' motion and geometry. Moreover, the monocular setting brings in the problems of tracking and fusing partially occluded objects, as they are scanned from one viewpoint at a time. This thesis explores several ideas for acquiring an interpretable model in dynamic environments. First, we utilize synthetic assets such as floor plans and object meshes to generate dynamic data for training and evaluation. Then, we explore the idea of learning geometry priors with an instance segmentation module, which predicts the location and grouping of indoor objects. We use the learned geometry priors to infer occluded object geometry for tracking and reconstruction. While instance segmentation modules usually have a generalization issue, i.e., they struggle to handle unknown objects, we observed that the empty-space information in the background geometry is more reliable for detecting moving objects. We therefore propose a segmentation-by-reconstruction strategy for acquiring rigidly moving objects and backgrounds. Finally, we present a novel neural representation to learn a factorized scene representation, reconstructing every dynamic element. The proposed model supports both rigid and non-rigid motions without pre-trained templates. We demonstrate that our systems and representation improve reconstruction quality on synthetic test sets and real-world scans.
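
    The free-space cue mentioned above can be made concrete with a simple voxel-grid formulation (our own stand-in, not the thesis's exact method): a depth sample that lands in space the static background reconstruction has already observed as empty cannot belong to the background, so it is flagged as a moving-object candidate.

```python
import numpy as np

def flag_moving_points(points_world, free_space_grid, grid_origin, voxel_size):
    """Mark 3D points that fall inside voxels previously observed as free space.

    points_world:    (N, 3) back-projected depth samples in world coordinates
    free_space_grid: (X, Y, Z) boolean grid, True where the static background
                     reconstruction has already seen empty space
    Returns a boolean mask: True = likely belongs to a moving object.
    """
    idx = np.floor((points_world - grid_origin) / voxel_size).astype(int)
    in_bounds = np.all((idx >= 0) & (idx < np.array(free_space_grid.shape)), axis=1)
    moving = np.zeros(len(points_world), dtype=bool)
    valid = idx[in_bounds]
    moving[in_bounds] = free_space_grid[valid[:, 0], valid[:, 1], valid[:, 2]]
    return moving
```

    Points flagged this way seed the per-object segmentation, while the remaining samples continue to refine the static background model.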