1,940 research outputs found

    AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation

    Full text link
    This paper presents a method that can quickly adapt dynamic 3D avatars to arbitrary text descriptions of novel styles. Among existing approaches for avatar stylization, direct optimization methods can produce excellent results for arbitrary styles but they are unpleasantly slow. Furthermore, they require redoing the optimization process from scratch for every new input. Fast approximation methods using feed-forward networks trained on a large dataset of style images can generate results for new inputs quickly, but tend not to generalize well to novel styles and fall short in quality. We therefore investigate a new approach, AlteredAvatar, that combines those two approaches using the meta-learning framework. In the inner loop, the model learns to optimize to match a single target style well; while in the outer loop, the model learns to stylize efficiently across many styles. After training, AlteredAvatar learns an initialization that can quickly adapt within a small number of update steps to a novel style, which can be given using texts, a reference image, or a combination of both. We show that AlteredAvatar can achieve a good balance between speed, flexibility and quality, while maintaining consistency across a wide range of novel views and facial expressions.Comment: 10 main pages, 14 figures. Project page: https://alteredavatar.github.i

    Facial Expression Retargeting from Human to Avatar Made Easy

    Full text link
    Facial expression retargeting from humans to virtual characters is a useful technique in computer graphics and animation. Traditional methods use markers or blendshapes to construct a mapping between the human and avatar faces. However, these approaches require a tedious 3D modeling process, and the performance relies on the modelers' experience. In this paper, we propose a brand-new solution to this cross-domain expression transfer problem via nonlinear expression embedding and expression domain translation. We first build low-dimensional latent spaces for the human and avatar facial expressions with variational autoencoder. Then we construct correspondences between the two latent spaces guided by geometric and perceptual constraints. Specifically, we design geometric correspondences to reflect geometric matching and utilize a triplet data structure to express users' perceptual preference of avatar expressions. A user-friendly method is proposed to automatically generate triplets for a system allowing users to easily and efficiently annotate the correspondences. Using both geometric and perceptual correspondences, we trained a network for expression domain translation from human to avatar. Extensive experimental results and user studies demonstrate that even nonprofessional users can apply our method to generate high-quality facial expression retargeting results with less time and effort.Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appea

    Choreographic and Somatic Approaches for the Development of Expressive Robotic Systems

    Full text link
    As robotic systems are moved out of factory work cells into human-facing environments questions of choreography become central to their design, placement, and application. With a human viewer or counterpart present, a system will automatically be interpreted within context, style of movement, and form factor by human beings as animate elements of their environment. The interpretation by this human counterpart is critical to the success of the system's integration: knobs on the system need to make sense to a human counterpart; an artificial agent should have a way of notifying a human counterpart of a change in system state, possibly through motion profiles; and the motion of a human counterpart may have important contextual clues for task completion. Thus, professional choreographers, dance practitioners, and movement analysts are critical to research in robotics. They have design methods for movement that align with human audience perception, can identify simplified features of movement for human-robot interaction goals, and have detailed knowledge of the capacity of human movement. This article provides approaches employed by one research lab, specific impacts on technical and artistic projects within, and principles that may guide future such work. The background section reports on choreography, somatic perspectives, improvisation, the Laban/Bartenieff Movement System, and robotics. From this context methods including embodied exercises, writing prompts, and community building activities have been developed to facilitate interdisciplinary research. The results of this work is presented as an overview of a smattering of projects in areas like high-level motion planning, software development for rapid prototyping of movement, artistic output, and user studies that help understand how people interpret movement. Finally, guiding principles for other groups to adopt are posited.Comment: Under review at MDPI Arts Special Issue "The Machine as Artist (for the 21st Century)" http://www.mdpi.com/journal/arts/special_issues/Machine_Artis

    An Implementation of Multimodal Fusion System for Intelligent Digital Human Generation

    Full text link
    With the rapid development of artificial intelligence (AI), digital humans have attracted more and more attention and are expected to achieve a wide range of applications in several industries. Then, most of the existing digital humans still rely on manual modeling by designers, which is a cumbersome process and has a long development cycle. Therefore, facing the rise of digital humans, there is an urgent need for a digital human generation system combined with AI to improve development efficiency. In this paper, an implementation scheme of an intelligent digital human generation system with multimodal fusion is proposed. Specifically, text, speech and image are taken as inputs, and interactive speech is synthesized using large language model (LLM), voiceprint extraction, and text-to-speech conversion techniques. Then the input image is age-transformed and a suitable image is selected as the driving image. Then, the modification and generation of digital human video content is realized by digital human driving, novel view synthesis, and intelligent dressing techniques. Finally, we enhance the user experience through style transfer, super-resolution, and quality evaluation. Experimental results show that the system can effectively realize digital human generation. The related code is released at https://github.com/zyj-2000/CUMT_2D_PhotoSpeaker

    A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities

    Full text link
    Embodied avatars as virtual agents have many applications and provide benefits over disembodied agents, allowing non-verbal social and interactional cues to be leveraged, in a similar manner to how humans interact with each other. We present an open embodied avatar built upon the Unreal Engine that can be controlled via a simple python programming interface. The avatar has lip syncing (phoneme control), head gesture and facial expression (using either facial action units or cardinal emotion categories) capabilities. We release code and models to illustrate how the avatar can be controlled like a puppet or used to create a simple conversational agent using public application programming interfaces (APIs). GITHUB link: https://github.com/danmcduff/AvatarSimComment: International Conference on Multimodal Interaction (ICMI 2019

    A Survey of Computer Graphics Facial Animation Methods: Comparing Traditional Approaches to Machine Learning Methods

    Get PDF
    Human communications rely on facial expression to denote mood, sentiment, and intent. Realistic facial animation of computer graphic models of human faces can be difficult to achieve as a result of the many details that must be approximated in generating believable facial expressions. Many theoretical approaches have been researched and implemented to create more and more accurate animations that can effectively portray human emotions. Even though many of these approaches are able to generate realistic looking expressions, they typically require a lot of artistic intervention to achieve a believable result. To reduce the intervention needed to create realistic facial animation, new approaches that utilize machine learning are being researched to reduce the amount of effort needed to generate believable facial animations. This survey paper summarizes over 20 research papers related to facial animation and compares the traditional animation approaches to newer machine learning methods as well as highlights the strengths, weaknesses, and use cases of each different approach

    Video-driven Neural Physically-based Facial Asset for Production

    Full text link
    Production-level workflows for producing convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components but the corresponding latent representations cannot provide artists with explicit controls as in conventional tools. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. For data collection, we construct a hybrid multiview-photometric capture stage, coupling with ultra-fast video cameras to obtain raw 3D facial assets. We then set out to model the facial expression, geometry and physically-based textures using separate VAEs where we impose a global MLP based expression mapping across the latent spaces of respective networks, to preserve characteristics across respective attributes. We also model the delta information as wrinkle maps for the physically-based textures, achieving high-quality 4K dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and cross-identity facial motion retargeting. In addition, our multi-VAE-based neural asset, along with the fast adaptation schemes, can also be deployed to handle in-the-wild videos. Besides, we motivate the utility of our explicit facial disentangling strategy by providing various promising physically-based editing results with high realism. Comprehensive experiments show that our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.Comment: For project page, see https://sites.google.com/view/npfa/ Notice: You may not copy, reproduce, distribute, publish, display, perform, modify, create derivative works, transmit, or in any way exploit any such content, nor may you distribute any part of this content over any network, including a local area network, sell or offer it for sale, or use such content to construct any kind of databas
    corecore