1,940 research outputs found
AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation
This paper presents a method that can quickly adapt dynamic 3D avatars to
arbitrary text descriptions of novel styles. Among existing approaches for
avatar stylization, direct optimization methods can produce excellent results
for arbitrary styles but they are unpleasantly slow. Furthermore, they require
redoing the optimization process from scratch for every new input. Fast
approximation methods using feed-forward networks trained on a large dataset of
style images can generate results for new inputs quickly, but tend not to
generalize well to novel styles and fall short in quality. We therefore
investigate a new approach, AlteredAvatar, that combines those two approaches
using the meta-learning framework. In the inner loop, the model learns to
optimize to match a single target style well; while in the outer loop, the
model learns to stylize efficiently across many styles. After training,
AlteredAvatar learns an initialization that can quickly adapt within a small
number of update steps to a novel style, which can be given using texts, a
reference image, or a combination of both. We show that AlteredAvatar can
achieve a good balance between speed, flexibility and quality, while
maintaining consistency across a wide range of novel views and facial
expressions.Comment: 10 main pages, 14 figures. Project page:
https://alteredavatar.github.i
Facial Expression Retargeting from Human to Avatar Made Easy
Facial expression retargeting from humans to virtual characters is a useful
technique in computer graphics and animation. Traditional methods use markers
or blendshapes to construct a mapping between the human and avatar faces.
However, these approaches require a tedious 3D modeling process, and the
performance relies on the modelers' experience. In this paper, we propose a
brand-new solution to this cross-domain expression transfer problem via
nonlinear expression embedding and expression domain translation. We first
build low-dimensional latent spaces for the human and avatar facial expressions
with variational autoencoder. Then we construct correspondences between the two
latent spaces guided by geometric and perceptual constraints. Specifically, we
design geometric correspondences to reflect geometric matching and utilize a
triplet data structure to express users' perceptual preference of avatar
expressions. A user-friendly method is proposed to automatically generate
triplets for a system allowing users to easily and efficiently annotate the
correspondences. Using both geometric and perceptual correspondences, we
trained a network for expression domain translation from human to avatar.
Extensive experimental results and user studies demonstrate that even
nonprofessional users can apply our method to generate high-quality facial
expression retargeting results with less time and effort.Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to
appea
Choreographic and Somatic Approaches for the Development of Expressive Robotic Systems
As robotic systems are moved out of factory work cells into human-facing
environments questions of choreography become central to their design,
placement, and application. With a human viewer or counterpart present, a
system will automatically be interpreted within context, style of movement, and
form factor by human beings as animate elements of their environment. The
interpretation by this human counterpart is critical to the success of the
system's integration: knobs on the system need to make sense to a human
counterpart; an artificial agent should have a way of notifying a human
counterpart of a change in system state, possibly through motion profiles; and
the motion of a human counterpart may have important contextual clues for task
completion. Thus, professional choreographers, dance practitioners, and
movement analysts are critical to research in robotics. They have design
methods for movement that align with human audience perception, can identify
simplified features of movement for human-robot interaction goals, and have
detailed knowledge of the capacity of human movement. This article provides
approaches employed by one research lab, specific impacts on technical and
artistic projects within, and principles that may guide future such work. The
background section reports on choreography, somatic perspectives,
improvisation, the Laban/Bartenieff Movement System, and robotics. From this
context methods including embodied exercises, writing prompts, and community
building activities have been developed to facilitate interdisciplinary
research. The results of this work is presented as an overview of a smattering
of projects in areas like high-level motion planning, software development for
rapid prototyping of movement, artistic output, and user studies that help
understand how people interpret movement. Finally, guiding principles for other
groups to adopt are posited.Comment: Under review at MDPI Arts Special Issue "The Machine as Artist (for
the 21st Century)"
http://www.mdpi.com/journal/arts/special_issues/Machine_Artis
An Implementation of Multimodal Fusion System for Intelligent Digital Human Generation
With the rapid development of artificial intelligence (AI), digital humans
have attracted more and more attention and are expected to achieve a wide range
of applications in several industries. Then, most of the existing digital
humans still rely on manual modeling by designers, which is a cumbersome
process and has a long development cycle. Therefore, facing the rise of digital
humans, there is an urgent need for a digital human generation system combined
with AI to improve development efficiency. In this paper, an implementation
scheme of an intelligent digital human generation system with multimodal fusion
is proposed. Specifically, text, speech and image are taken as inputs, and
interactive speech is synthesized using large language model (LLM), voiceprint
extraction, and text-to-speech conversion techniques. Then the input image is
age-transformed and a suitable image is selected as the driving image. Then,
the modification and generation of digital human video content is realized by
digital human driving, novel view synthesis, and intelligent dressing
techniques. Finally, we enhance the user experience through style transfer,
super-resolution, and quality evaluation. Experimental results show that the
system can effectively realize digital human generation. The related code is
released at https://github.com/zyj-2000/CUMT_2D_PhotoSpeaker
A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities
Embodied avatars as virtual agents have many applications and provide
benefits over disembodied agents, allowing non-verbal social and interactional
cues to be leveraged, in a similar manner to how humans interact with each
other. We present an open embodied avatar built upon the Unreal Engine that can
be controlled via a simple python programming interface. The avatar has lip
syncing (phoneme control), head gesture and facial expression (using either
facial action units or cardinal emotion categories) capabilities. We release
code and models to illustrate how the avatar can be controlled like a puppet or
used to create a simple conversational agent using public application
programming interfaces (APIs). GITHUB link:
https://github.com/danmcduff/AvatarSimComment: International Conference on Multimodal Interaction (ICMI 2019
A Survey of Computer Graphics Facial Animation Methods: Comparing Traditional Approaches to Machine Learning Methods
Human communications rely on facial expression to denote mood, sentiment, and intent. Realistic facial animation of computer graphic models of human faces can be difficult to achieve as a result of the many details that must be approximated in generating believable facial expressions. Many theoretical approaches have been researched and implemented to create more and more accurate animations that can effectively portray human emotions. Even though many of these approaches are able to generate realistic looking expressions, they typically require a lot of artistic intervention to achieve a believable result. To reduce the intervention needed to create realistic facial animation, new approaches that utilize machine learning are being researched to reduce the amount of effort needed to generate believable facial animations. This survey paper summarizes over 20 research papers related to facial animation and compares the traditional animation approaches to newer machine learning methods as well as highlights the strengths, weaknesses, and use cases of each different approach
Video-driven Neural Physically-based Facial Asset for Production
Production-level workflows for producing convincing 3D dynamic human faces
have long relied on an assortment of labor-intensive tools for geometry and
texture generation, motion capture and rigging, and expression synthesis.
Recent neural approaches automate individual components but the corresponding
latent representations cannot provide artists with explicit controls as in
conventional tools. In this paper, we present a new learning-based,
video-driven approach for generating dynamic facial geometries with
high-quality physically-based assets. For data collection, we construct a
hybrid multiview-photometric capture stage, coupling with ultra-fast video
cameras to obtain raw 3D facial assets. We then set out to model the facial
expression, geometry and physically-based textures using separate VAEs where we
impose a global MLP based expression mapping across the latent spaces of
respective networks, to preserve characteristics across respective attributes.
We also model the delta information as wrinkle maps for the physically-based
textures, achieving high-quality 4K dynamic textures. We demonstrate our
approach in high-fidelity performer-specific facial capture and cross-identity
facial motion retargeting. In addition, our multi-VAE-based neural asset, along
with the fast adaptation schemes, can also be deployed to handle in-the-wild
videos. Besides, we motivate the utility of our explicit facial disentangling
strategy by providing various promising physically-based editing results with
high realism. Comprehensive experiments show that our technique provides higher
accuracy and visual fidelity than previous video-driven facial reconstruction
and animation methods.Comment: For project page, see https://sites.google.com/view/npfa/ Notice: You
may not copy, reproduce, distribute, publish, display, perform, modify,
create derivative works, transmit, or in any way exploit any such content,
nor may you distribute any part of this content over any network, including a
local area network, sell or offer it for sale, or use such content to
construct any kind of databas
- …