HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and demonstrate that it enables much greater
flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'18
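The view- and pose-dependent texturing step can be sketched as a weighted blend of captured textures, where each view's contribution falls off with its angular distance from the requested novel view. A minimal numpy sketch, assuming a simple softmax weighting over view directions (the function name and the sharpness parameter are illustrative, not from the paper):

```python
import numpy as np

def blend_view_dependent(textures, view_dirs, novel_dir, sharpness=8.0):
    """Blend captured textures by angular proximity of their capture
    directions to the requested novel view (hypothetical weighting).

    textures:  (V, H, W, 3) texture images from V captured views
    view_dirs: (V, 3) capture view directions
    novel_dir: (3,) requested novel view direction
    """
    novel_dir = novel_dir / np.linalg.norm(novel_dir)
    dirs = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
    cos_sim = dirs @ novel_dir                       # (V,) angular similarity
    w = np.exp(sharpness * cos_sim)                  # sharper = more selective
    w /= w.sum()                                     # normalize to sum to 1
    return np.tensordot(w, textures, axes=1)         # weighted average (H, W, 3)
```

A higher `sharpness` approaches picking the single nearest captured view; a lower value blends more views smoothly.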
Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction
Reconstructing 3D clothed human avatars from single images is a challenging
task, especially when encountering complex poses and loose clothing. Current
methods exhibit limitations in performance, largely attributable to their
dependence on insufficient 2D image features and inconsistent query methods.
To address this, we present the Global-correlated 3D-decoupling Transformer for
clothed Avatar reconstruction (GTA), a novel transformer-based architecture
that reconstructs clothed human avatars from monocular images. Our approach
leverages transformer architectures by utilizing a Vision Transformer model as
an encoder for capturing global-correlated image features. Subsequently, our
innovative 3D-decoupling decoder employs cross-attention to decouple tri-plane
features, using learnable embeddings as queries for cross-plane generation. To
effectively enhance feature fusion with the tri-plane 3D feature and human body
prior, we propose a hybrid prior fusion strategy combining spatial and
prior-enhanced queries, leveraging the benefits of spatial localization and
human body prior knowledge. Comprehensive experiments on CAPE and THuman2.0
datasets illustrate that our method outperforms state-of-the-art approaches in
both geometry and texture reconstruction, exhibiting high robustness to
challenging poses and loose clothing, and producing higher-resolution textures.
Code will be available at https://github.com/River-Zhang/GTA.
Comment: Accepted by NeurIPS 2023. Project page: https://river-zhang.github.io/GTA-projectpage
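The 3D-decoupling idea, learnable per-plane query embeddings cross-attending to global image tokens to produce three separate plane feature sets, can be illustrated with a toy single-head attention in numpy. All shapes and names here are assumptions for illustration; the actual model uses multi-head attention with learned projection matrices:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def decouple_triplanes(img_tokens, plane_queries):
    """Toy cross-attention: one learnable query set per tri-plane
    attends to the global image tokens.

    img_tokens:    (N, D) encoder output tokens
    plane_queries: (3, M, D) learnable embeddings, one set per plane
    returns:       (3, M, D) decoupled per-plane features
    """
    d_k = img_tokens.shape[-1]
    planes = []
    for q in plane_queries:                              # one plane at a time
        attn = softmax(q @ img_tokens.T / np.sqrt(d_k))  # (M, N) weights
        planes.append(attn @ img_tokens)                 # weighted token mix
    return np.stack(planes)
```

Because each row of the attention matrix sums to one, every output feature is a convex combination of the image tokens, with each plane's queries selecting a different mixture.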
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is a
complex task that needs proper guidance and direction. Existing surveys on
the Metaverse focus only on a specific aspect or discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive to date, allowing
users, scholars, and entrepreneurs to gain an in-depth understanding of the
Metaverse ecosystem and identify their opportunities for contribution.
TADA! Text to Animatable Digital Avatars
We introduce TADA, a simple-yet-effective approach that takes textual
descriptions and produces expressive 3D avatars with high-quality geometry and
lifelike textures that can be animated and rendered with traditional graphics
pipelines. Existing text-based character generation methods are limited in
terms of geometry and texture quality, and cannot be realistically animated due
to inconsistent alignment between the geometry and the texture, particularly in
the face region. To overcome these limitations, TADA leverages the synergy of a
2D diffusion model and an animatable parametric body model. Specifically, we
derive an optimizable high-resolution body model from SMPL-X with 3D
displacements and a texture map, and use hierarchical rendering with score
distillation sampling (SDS) to create high-quality, detailed, holistic 3D
avatars from text. To ensure alignment between the geometry and texture, we
render normals and RGB images of the generated character and exploit their
latent embeddings in the SDS training process. We further introduce various
expression parameters to deform the generated character during training,
ensuring that the semantics of our generated character remain consistent with
the original SMPL-X model, resulting in an animatable character. Comprehensive
evaluations demonstrate that TADA significantly surpasses existing approaches
on both qualitative and quantitative measures. TADA enables creation of
large-scale digital character assets that are ready for animation and
rendering, while also being easily editable through natural language. The code
will be made publicly available for research purposes.
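At the core of this optimization is score distillation sampling: noise is added to a rendered image, a frozen diffusion model predicts that noise given the text prompt, and the weighted difference is backpropagated to the avatar's displacements and texture. A schematic numpy version of the SDS gradient (variable names are illustrative; the real pipeline operates on diffusion-model latents):

```python
import numpy as np

def sds_gradient(eps_pred, eps, w_t):
    """Schematic SDS gradient: the gap between the diffusion model's
    noise prediction for the noisy render (eps_pred) and the noise
    actually injected (eps), scaled by the timestep weight w(t).
    This quantity is treated as the gradient of the (intractable)
    distillation loss with respect to the rendered image."""
    return w_t * (eps_pred - eps)
```

When the model's prediction matches the injected noise exactly, the render already looks like a plausible sample for the prompt and the gradient vanishes; otherwise the render is nudged toward what the model expects.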
3D Face Synthesis Driven by Personality Impression
Synthesizing 3D faces that give certain personality impressions is commonly
needed in computer games, animations, and virtual world applications for
producing realistic virtual characters. In this paper, we propose a novel
approach to synthesize 3D faces based on personality impression for creating
virtual characters. Our approach consists of two major steps. In the first
step, we train classifiers using deep convolutional neural networks on a
dataset of images with personality impression annotations, which are capable of
predicting the personality impression of a face. In the second step, given a 3D
face and a desired personality impression type as user inputs, our approach
optimizes the facial details against the trained classifiers, so as to
synthesize a face which gives the desired personality impression. We
demonstrate our approach for synthesizing 3D faces giving desired personality
impressions on a variety of 3D face models. Perceptual studies show that the
perceived personality impressions of the synthesized faces agree with the
target personality impressions specified for synthesizing the faces. Please
refer to the supplementary materials for all results.
Comment: 8 pages; 6 figures
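The second step, optimizing facial details against a trained classifier, can be sketched as gradient ascent on the classifier's score for the target impression. A minimal finite-difference version, where `score_fn` stands in for the trained CNN and all hyperparameters are illustrative:

```python
import numpy as np

def optimize_face(params, score_fn, lr=0.1, steps=50, eps=1e-3):
    """Maximize a personality-impression score over facial detail
    parameters via finite-difference gradient ascent (a stand-in for
    the paper's optimization; score_fn is a hypothetical classifier
    mapping a parameter vector to a scalar score)."""
    p = params.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(p)
        for i in range(p.size):               # central differences
            d = np.zeros_like(p)
            d[i] = eps
            grad[i] = (score_fn(p + d) - score_fn(p - d)) / (2 * eps)
        p += lr * grad                        # ascend the score surface
    return p
```

In practice one would backpropagate through the differentiable classifier instead of using finite differences; the sketch only shows the optimize-against-classifier loop.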