400 research outputs found
Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction
Reconstructing 3D clothed human avatars from single images is a challenging
task, especially when encountering complex poses and loose clothing. Current
methods exhibit limitations in performance, largely attributable to their
dependence on insufficient 2D image features and inconsistent query methods.
Owing to this, we present the Global-correlated 3D-decoupling Transformer for
clothed Avatar reconstruction (GTA), a novel transformer-based architecture
that reconstructs clothed human avatars from monocular images. Our approach
leverages transformer architectures by utilizing a Vision Transformer model as
an encoder for capturing global-correlated image features. Subsequently, our
innovative 3D-decoupling decoder employs cross-attention to decouple tri-plane
features, using learnable embeddings as queries for cross-plane generation. To
effectively enhance feature fusion with the tri-plane 3D feature and human body
prior, we propose a hybrid prior fusion strategy combining spatial and
prior-enhanced queries, leveraging the benefits of spatial localization and
human body prior knowledge. Comprehensive experiments on CAPE and THuman2.0
datasets illustrate that our method outperforms state-of-the-art approaches in
both geometry and texture reconstruction, exhibiting high robustness to
challenging poses and loose clothing, and producing higher-resolution textures.
Codes will be available at https://github.com/River-Zhang/GTA.Comment: Accepted by NeurIPS 2023. Project page:
https://river-zhang.github.io/GTA-projectpage
MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar
The ability to animate photo-realistic head avatars reconstructed from
monocular portrait video sequences represents a crucial step in bridging the
gap between the virtual and real worlds. Recent advancements in head avatar
techniques, including explicit 3D morphable meshes (3DMM), point clouds, and
neural implicit representation have been exploited for this ongoing research.
However, 3DMM-based methods are constrained by their fixed topologies,
point-based approaches suffer from a heavy training burden due to the extensive
quantity of points involved, and the last ones suffer from limitations in
deformation flexibility and rendering efficiency. In response to these
challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head
Avatar), a novel approach that harnesses 3D Gaussian point representation
coupled with a Gaussian deformation field to learn explicit head avatars from
monocular portrait videos. We define our head avatars with Gaussian points
characterized by adaptable shapes, enabling flexible topology. These points
exhibit movement with a Gaussian deformation field in alignment with the target
pose and expression of a person, facilitating efficient deformation.
Additionally, the Gaussian points have controllable shape, size, color, and
opacity combined with Gaussian splatting, allowing for efficient training and
rendering. Experiments demonstrate the superior performance of our method,
which achieves state-of-the-art results among previous methods.Comment: The link to our projectpage is
https://yufan1012.github.io/MonoGaussianAvata
PERGAMO: Personalized 3D Garments from Monocular Video
Clothing plays a fundamental role in digital humans. Current approaches to
animate 3D garments are mostly based on realistic physics simulation, however,
they typically suffer from two main issues: high computational run-time cost,
which hinders their development; and simulation-to-real gap, which impedes the
synthesis of specific real-world cloth samples. To circumvent both issues we
propose PERGAMO, a data-driven approach to learn a deformable model for 3D
garments from monocular images. To this end, we first introduce a novel method
to reconstruct the 3D geometry of garments from a single image, and use it to
build a dataset of clothing from monocular videos. We use these 3D
reconstructions to train a regression model that accurately predicts how the
garment deforms as a function of the underlying body pose. We show that our
method is capable of producing garment animations that match the real-world
behaviour, and generalizes to unseen body motions extracted from motion capture
dataset.Comment: Published at Computer Graphics Forum (Proc. of ACM/SIGGRAPH SCA),
2022. Project website http://mslab.es/projects/PERGAMO
Neural Radiance Fields: Past, Present, and Future
The various aspects like modeling and interpreting 3D environments and
surroundings have enticed humans to progress their research in 3D Computer
Vision, Computer Graphics, and Machine Learning. An attempt made by Mildenhall
et al in their paper about NeRFs (Neural Radiance Fields) led to a boom in
Computer Graphics, Robotics, Computer Vision, and the possible scope of
High-Resolution Low Storage Augmented Reality and Virtual Reality-based 3D
models have gained traction from res with more than 1000 preprints related to
NeRFs published. This paper serves as a bridge for people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.Comment: 413 pages, 9 figures, 277 citation
Implicit Neural Head Synthesis via Controllable Local Deformation Fields
High-quality reconstruction of controllable 3D head avatars from 2D videos is
highly desirable for virtual human applications in movies, games, and
telepresence. Neural implicit fields provide a powerful representation to model
3D head avatars with personalized shape, expressions, and facial parts, e.g.,
hair and mouth interior, that go beyond the linear 3D morphable model (3DMM).
However, existing methods do not model faces with fine-scale facial features,
or local control of facial parts that extrapolate asymmetric expressions from
monocular videos. Further, most condition only on 3DMM parameters with poor(er)
locality, and resolve local features with a global neural field. We build on
part-based implicit shape models that decompose a global deformation field into
local ones. Our novel formulation models multiple implicit deformation fields
with local semantic rig-like control via 3DMM-based parameters, and
representative facial landmarks. Further, we propose a local control loss and
attention mask mechanism that promote sparsity of each learned deformation
field. Our formulation renders sharper locally controllable nonlinear
deformations than previous implicit monocular approaches, especially mouth
interior, asymmetric expressions, and facial details.Comment: Accepted at CVPR 202
Relightable and Animatable Neural Avatar from Sparse-View Video
This paper tackles the challenge of creating relightable and animatable
neural avatars from sparse-view (or even monocular) videos of dynamic humans
under unknown illumination. Compared to studio environments, this setting is
more practical and accessible but poses an extremely challenging ill-posed
problem. Previous neural human reconstruction methods are able to reconstruct
animatable avatars from sparse views using deformed Signed Distance Fields
(SDF) but cannot recover material parameters for relighting. While
differentiable inverse rendering-based methods have succeeded in material
recovery of static objects, it is not straightforward to extend them to dynamic
humans as it is computationally intensive to compute pixel-surface intersection
and light visibility on deformed SDFs for inverse rendering. To solve this
challenge, we propose a Hierarchical Distance Query (HDQ) algorithm to
approximate the world space distances under arbitrary human poses.
Specifically, we estimate coarse distances based on a parametric human model
and compute fine distances by exploiting the local deformation invariance of
SDF. Based on the HDQ algorithm, we leverage sphere tracing to efficiently
estimate the surface intersection and light visibility. This allows us to
develop the first system to recover animatable and relightable neural avatars
from sparse view (or monocular) inputs. Experiments demonstrate that our
approach is able to produce superior results compared to state-of-the-art
methods. Our code will be released for reproducibility.Comment: Project page: https://zju3dv.github.io/relightable_avata
Instant Volumetric Head Avatars
We present Instant Volumetric Head Avatars (INSTA), a novel approach for
reconstructing photo-realistic digital avatars instantaneously. INSTA models a
dynamic neural radiance field based on neural graphics primitives embedded
around a parametric face model. Our pipeline is trained on a single monocular
RGB portrait video that observes the subject under different expressions and
views. While state-of-the-art methods take up to several days to train an
avatar, our method can reconstruct a digital avatar in less than 10 minutes on
modern GPU hardware, which is orders of magnitude faster than previous
solutions. In addition, it allows for the interactive rendering of novel poses
and expressions. By leveraging the geometry prior of the underlying parametric
face model, we demonstrate that INSTA extrapolates to unseen poses. In
quantitative and qualitative studies on various subjects, INSTA outperforms
state-of-the-art methods regarding rendering quality and training time.Comment: Website: https://zielon.github.io/insta/ Video:
https://youtu.be/HOgaeWTih7
Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars
Neural radiance fields are capable of reconstructing high-quality drivable
human avatars but are expensive to train and render. To reduce consumption, we
propose Animatable 3D Gaussian, which learns human avatars from input images
and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of
skinned 3D Gaussians and a corresponding skeleton in canonical space and
deforming 3D Gaussians to posed space according to the input poses. We
introduce hash-encoded shape and appearance to speed up training and propose
time-dependent ambient occlusion to achieve high-quality reconstructions in
scenes containing complex motions and dynamic shadows. On both novel view
synthesis and novel pose synthesis tasks, our method outperforms existing
methods in terms of training time, rendering speed, and reconstruction quality.
Our method can be easily extended to multi-human scenes and achieve comparable
novel view synthesis results on a scene with ten people in only 25 seconds of
training
NSF: Neural Surface Fields for Human Modeling from Monocular Depth
Obtaining personalized 3D animatable avatars from a monocular camera has
several real world applications in gaming, virtual try-on, animation, and
VR/XR, etc. However, it is very challenging to model dynamic and fine-grained
clothing deformations from such sparse data. Existing methods for modeling 3D
humans from depth data have limitations in terms of computational efficiency,
mesh coherency, and flexibility in resolution and topology. For instance,
reconstructing shapes using implicit functions and extracting explicit meshes
per frame is computationally expensive and cannot ensure coherent meshes across
frames. Moreover, predicting per-vertex deformations on a pre-designed human
template with a discrete surface lacks flexibility in resolution and topology.
To overcome these limitations, we propose a novel method `\keyfeature: Neural
Surface Fields' for modeling 3D clothed humans from monocular depth. NSF
defines a neural field solely on the base surface which models a continuous and
flexible displacement field. NSF can be adapted to the base surface with
different resolution and topology without retraining at inference time.
Compared to existing approaches, our method eliminates the expensive per-frame
surface extraction while maintaining mesh coherency, and is capable of
reconstructing meshes with arbitrary resolution without retraining. To foster
research in this direction, we release our code in project page at:
https://yuxuan-xue.com/nsf.Comment: Accpted to ICCV 2023; Homepage at: https://yuxuan-xue.com/ns
- …