HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show that it enables significantly greater
flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at SIGGRAPH '18
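A minimal sketch of the kind of view- and pose-dependent texture blending the abstract describes, assuming a Gaussian weighting over pose distance; the function name, descriptors, and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def blend_textures(ref_textures, ref_poses, query_pose, sigma=0.1):
    """Hypothetical pose-dependent texture blend (not the HeadOn code).

    ref_textures: (N, H, W, 3) texture maps sampled from the target video
    ref_poses:    (N, D) pose/view descriptors for each reference frame
    query_pose:   (D,) descriptor of the novel head/torso pose to render
    """
    # Distance of each reference pose to the queried pose
    d = np.linalg.norm(ref_poses - query_pose, axis=1)
    # Gaussian falloff: references with a similar pose/view dominate the blend
    w = np.exp(-(d ** 2) / (2 * sigma ** 2))
    w /= w.sum()
    # Per-pixel weighted average of the reference textures
    return np.tensordot(w, ref_textures, axes=1)
```

In practice such a blend would run per output pixel on the GPU, but the weighting idea is the same: nearby reference poses contribute most to the composited portrait.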
Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps
We present a novel deep learning-based approach to the 3D reconstruction of
clothed humans using weak supervision via 2D normal maps. Given a single RGB
image or multiview images, our network infers a signed distance function (SDF)
discretized on a tetrahedral mesh surrounding the body in a rest pose.
Subsequently, inferred pose and camera parameters are used to generate a normal
map from the SDF. A key aspect of our approach is the use of Marching
Tetrahedra to (uniquely) compute a triangulated surface from the SDF on the
tetrahedral mesh, facilitating straightforward differentiation (and thus
backpropagation). As a result, given only ground-truth normal maps (with no
ground-truth volumetric information), we can train the network to produce SDF
values from corresponding RGB images. Optionally, an additional multiview loss
leads to improved results. We demonstrate the efficacy of our approach for both
network inference and 3D reconstruction.
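Two pieces of this pipeline are compact enough to sketch: the Marching Tetrahedra edge interpolation that makes surface vertices differentiable functions of the SDF values, and a masked normal-map loss. This is a hedged sketch under assumed tensor shapes, not the paper's code.

```python
import torch
import torch.nn.functional as F

def edge_crossing(p0, p1, s0, s1):
    """Marching Tetrahedra edge rule (sketch): where the SDF changes sign
    along an edge, place a surface vertex at the linear zero crossing.
    The weight t is differentiable w.r.t. the SDF values s0 and s1, which
    is what lets a normal-map loss backpropagate into the network.

    p0, p1: (E, 3) edge endpoint positions; s0, s1: (E,) SDF values."""
    t = (s0 / (s0 - s1)).unsqueeze(-1)  # zero crossing of the linear SDF
    return p0 + t * (p1 - p0)

def normal_map_loss(pred_n, gt_n, mask):
    """Masked L1 loss between rendered and ground-truth normal maps.
    pred_n, gt_n: (B, 3, H, W) normal maps; mask: (B, 1, H, W) foreground."""
    pred_n = F.normalize(pred_n, dim=1)
    gt_n = F.normalize(gt_n, dim=1)
    return (torch.abs(pred_n - gt_n) * mask).sum() / mask.sum().clamp(min=1)
```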
Practical and Rich User Digitization
A long-standing vision in computer science has been to evolve computing
devices into proactive assistants that enhance our productivity, health and
wellness, and many other facets of our lives. User digitization is crucial in
achieving this vision as it allows computers to intimately understand their
users, capturing activity, pose, routine, and behavior. Today's consumer
devices - like smartphones and smartwatches - provide a glimpse of this
potential, offering coarse digital representations of users with metrics such
as step count, heart rate, and a handful of human activities like running and
biking. Even these very low-dimensional representations are already bringing
value to millions of people's lives, but there is significant potential for
improvement. At the other end of the spectrum, professional, high-fidelity,
comprehensive user digitization systems exist: motion capture suits and
multi-camera rigs digitize our full body and appearance, and scanning machines
such as MRI capture our detailed anatomy. However, these carry significant user
practicality burdens, such as financial, privacy, ergonomic, aesthetic, and
instrumentation considerations, that preclude consumer use. In general, the
higher the fidelity of capture, the lower the user's practicality. Most
conventional approaches strike a balance between user practicality and
digitization fidelity.
My research aims to break this trend, developing sensing systems that
increase user digitization fidelity to create new and powerful computing
experiences while retaining or even improving user practicality and
accessibility, allowing such technologies to have a societal impact. Armed with
such knowledge, our future devices could offer longitudinal health tracking,
more productive work environments, full body avatars in extended reality, and
embodied telepresence experiences, to name just a few domains.
Comment: PhD thesis
Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction
Reconstructing 3D clothed human avatars from single images is a challenging
task, especially when encountering complex poses and loose clothing. Current
methods exhibit limitations in performance, largely attributable to their
dependence on insufficient 2D image features and inconsistent query methods.
To address these issues, we present the Global-correlated 3D-decoupling Transformer for
clothed Avatar reconstruction (GTA), a novel transformer-based architecture
that reconstructs clothed human avatars from monocular images. Our approach
employs a Vision Transformer encoder to capture global-correlated image
features. Subsequently, our
innovative 3D-decoupling decoder employs cross-attention to decouple tri-plane
features, using learnable embeddings as queries for cross-plane generation. To
effectively enhance feature fusion with the tri-plane 3D feature and human body
prior, we propose a hybrid prior fusion strategy combining spatial and
prior-enhanced queries, leveraging the benefits of spatial localization and
human body prior knowledge. Comprehensive experiments on CAPE and THuman2.0
datasets illustrate that our method outperforms state-of-the-art approaches in
both geometry and texture reconstruction, exhibiting high robustness to
challenging poses and loose clothing, and producing higher-resolution textures.
Code will be available at https://github.com/River-Zhang/GTA.
Comment: Accepted by NeurIPS 2023. Project page: https://river-zhang.github.io/GTA-projectpage
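As a rough illustration of the 3D-decoupling idea, learnable per-plane queries can cross-attend to the encoder's global tokens to produce the three tri-plane feature maps. The module below is a minimal sketch with assumed names and dimensions, not the released GTA architecture.

```python
import torch
import torch.nn as nn

class TriplaneDecoder(nn.Module):
    """Sketch of a 3D-decoupling decoder: three sets of learnable plane
    queries cross-attend to global image tokens (e.g., from a ViT encoder)
    to produce xy/xz/yz tri-plane features. Sizes are illustrative."""

    def __init__(self, dim=256, plane_res=32, n_heads=8):
        super().__init__()
        # One learnable query per tri-plane cell, for each of the 3 planes
        self.queries = nn.Parameter(torch.randn(3, plane_res * plane_res, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.plane_res = plane_res

    def forward(self, tokens):  # tokens: (B, N, dim) encoder features
        B, _, dim = tokens.shape
        planes = []
        for q in self.queries:  # decouple one plane at a time
            q = q.unsqueeze(0).expand(B, -1, -1)
            feat, _ = self.attn(q, tokens, tokens)  # cross-attention
            planes.append(feat.transpose(1, 2).reshape(
                B, dim, self.plane_res, self.plane_res))
        return planes  # [xy, xz, yz] feature maps
```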
DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos
Reconstructing a dynamic human with loose clothing is an important but
difficult task. To address this challenge, we propose a method named DLCA-Recon
to create human avatars from monocular videos. The distance from loose clothing
to the underlying body rapidly changes in every frame when the human freely
moves and acts. Previous methods lack effective geometric initialization and
constraints to guide the optimization of deformation and explain this
dramatic change, resulting in discontinuous and incomplete reconstructed
surfaces. To model the deformation more accurately, we propose to initialize an
estimated 3D clothed human in the canonical space, as it is easier for
deformation fields to learn from the clothed human than from SMPL. With both
representations of explicit mesh and implicit SDF, we utilize the physical
connection information between consecutive frames and propose a dynamic
deformation field (DDF) to optimize deformation fields. DDF accounts for
contributive forces on loose clothing to enhance the interpretability of
deformations and effectively capture the free movement of loose clothing.
Moreover, we propagate SMPL skinning weights to each individual subject and
refine pose and skinning weights during optimization to improve the skinning
transformation. Based on this more reasonable initialization and the DDF, we can
simulate real-world physics more accurately. Extensive experiments on public
and our own datasets validate that our method can produce superior results for
humans with loose clothing compared to state-of-the-art methods.
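The skinning-weight propagation described above builds on standard linear blend skinning, which is compact enough to sketch; the weights and per-joint transforms below are illustrative inputs, not DLCA-Recon's learned quantities.

```python
import torch

def linear_blend_skinning(verts, weights, joint_transforms):
    """Standard LBS sketch: each canonical vertex is moved by a weighted
    blend of per-joint rigid transforms. DLCA-Recon propagates SMPL
    skinning weights to the clothed avatar and refines them during
    optimization; here both inputs are placeholders.

    verts:            (V, 3) canonical vertices
    weights:          (V, J) skinning weights, rows sum to 1
    joint_transforms: (J, 4, 4) per-joint rigid transforms
    """
    V = verts.shape[0]
    homo = torch.cat([verts, torch.ones(V, 1)], dim=1)  # (V, 4) homogeneous
    # Blend the 4x4 transforms per vertex: (V, J) @ (J, 16) -> (V, 4, 4)
    T = (weights @ joint_transforms.reshape(-1, 16)).reshape(V, 4, 4)
    posed = torch.einsum('vij,vj->vi', T, homo)  # apply blended transform
    return posed[:, :3]
```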