FLARE: Fast Learning of Animatable and Relightable Mesh Avatars
Our goal is to efficiently learn personalized animatable 3D head avatars from
videos that are geometrically accurate, realistic, relightable, and compatible
with current rendering systems. While 3D meshes enable efficient processing and
are highly portable, they lack realism in terms of shape and appearance. Neural
representations, on the other hand, are realistic but lack compatibility and
are slow to train and render. Our key insight is that it is possible to
efficiently learn high-fidelity 3D mesh representations via differentiable
rendering by exploiting highly-optimized methods from traditional computer
graphics and approximating some of the components with neural networks. To that
end, we introduce FLARE, a technique that enables the creation of animatable
and relightable mesh avatars from a single monocular video. First, we learn a
canonical geometry using a mesh representation, enabling efficient
differentiable rasterization and straightforward animation via learned
blendshapes and linear blend skinning weights. Second, we follow
physically-based rendering and factor observed colors into intrinsic albedo,
roughness, and a neural representation of the illumination, allowing the
learned avatars to be relit in novel scenes. Since our input videos are
captured on a single device with a narrow field of view, modeling the
surrounding environment light is non-trivial. Based on the split-sum
approximation for modeling specular reflections, we address this by
approximating the pre-filtered environment map with a multi-layer perceptron
(MLP) modulated by the surface roughness, eliminating the need to explicitly
model the light. We demonstrate that our mesh-based avatar formulation,
combined with learned deformation, material, and lighting MLPs, produces
avatars with high-quality geometry and appearance, while also being efficient
to train and render compared to existing approaches.
Comment: 15 pages, Accepted: ACM Transactions on Graphics (Proceedings of
SIGGRAPH Asia), 202
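The FLARE abstract mentions animation via linear blend skinning weights. As a rough illustration of that standard technique only (not the authors' implementation), the sketch below poses each vertex as a weighted sum of its positions under per-bone transforms. The function names, the 3x4 `[R | t]` transform layout, and the toy data are assumptions made for this example.

```python
def apply_transform(T, v):
    """Apply a 3x4 transform [R | t] (list of 3 rows) to a 3D point v."""
    return tuple(
        T[r][0] * v[0] + T[r][1] * v[1] + T[r][2] * v[2] + T[r][3]
        for r in range(3)
    )


def linear_blend_skinning(vertices, weights, bone_transforms):
    """Pose vertices by blending per-bone transformed positions.

    vertices:        list of (x, y, z) rest-pose positions
    weights:         per-vertex skinning weights, one per bone (summing to 1)
    bone_transforms: list of 3x4 transforms, one per bone
    """
    posed = []
    for v, vertex_weights in zip(vertices, weights):
        x = y = z = 0.0
        for w, T in zip(vertex_weights, bone_transforms):
            px, py, pz = apply_transform(T, v)
            x += w * px
            y += w * py
            z += w * pz
        posed.append((x, y, z))
    return posed


# Toy data (hypothetical): bone 0 stays fixed, bone 1 shifts by +2 along x.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

vertices = [(0, 0, 0), (1, 0, 0)]
weights = [(1.0, 0.0), (0.5, 0.5)]  # vertex 1 is shared equally by both bones
posed = linear_blend_skinning(vertices, weights, [IDENTITY, SHIFT_X])
print(posed)
```

In this toy setup the first vertex follows bone 0 only and does not move, while the second vertex lands halfway between its two per-bone positions.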
LiveHand: Real-time and Photorealistic Neural Hand Rendering
The human hand is the main medium through which we interact with our
surroundings. Hence, its digitization is of utmost importance, with direct
applications in VR/AR, gaming, and media production amongst other areas. While
there are several works for modeling the geometry and articulations of hands,
little attention has been dedicated to capturing photo-realistic appearance. In
addition, for applications in extended reality and gaming, real-time rendering
is critical. In this work, we present the first neural-implicit approach to
photo-realistically render hands in real-time. This is a challenging problem as
hands are textured and undergo strong articulations with various pose-dependent
effects. However, we show that this can be achieved through our carefully
designed method. This includes training on a low-resolution rendering of a
neural radiance field, together with a 3D-consistent super-resolution module
and mesh-guided space canonicalization and sampling. In addition, we show that the
novel application of a perceptual loss in image space is critical for
achieving photorealism. We show rendering results for several identities, and
demonstrate that our method captures pose- and view-dependent appearance
effects. We also show a live demo of our method where we photo-realistically
render the human hand in real-time for the first time in literature. We ablate
all our design choices and show that our design optimizes for both photorealism
and rendering speed. Our code will be released to encourage further research in
this area.
Comment: 11 pages, 8 figure
Neural function approximation on graphs: shape modelling, graph discrimination & compression
Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.
OT-Net: A Reusable Neural Optimal Transport Solver
With the widespread application of optimal transport (OT), its calculation
becomes essential, and various algorithms have emerged. However, the existing
methods either have low efficiency or cannot represent discontinuous maps. A
novel reusable neural OT solver, OT-Net, is thus presented, which first learns
Brenier's height representation via a neural network to obtain its potential,
and then obtains the OT map by computing the gradient of the potential. The
algorithm has two merits: 1) It can easily represent discontinuous maps, which
allows it to match any target distribution with discontinuous supports and
achieve sharp boundaries. This can effectively eliminate mode collapse in
generative models. 2) The OT map can be computed directly by the proposed
algorithm when new target samples are added, which greatly improves the
efficiency and reusability of the map. Moreover, the theoretical error bound of
the algorithm is analyzed, and we have demonstrated the empirical success of
our approach in image generation, color transfer, and domain adaptation.
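To make the "gradient of the potential" idea in the OT-Net abstract concrete, here is a minimal sketch of the standard semi-discrete construction, in which the Brenier potential is the upper envelope of affine functions, u(x) = max_i (<x, y_i> + h_i), and its gradient at x is the target sample y_i that attains the maximum. This is an illustration under stated assumptions, not the paper's solver: the heights `h_i` are fixed by hand here rather than learned by a network, and all names are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def brenier_potential(x, targets, heights):
    """Piecewise-linear convex potential u(x) = max_i (<x, y_i> + h_i)."""
    return max(dot(x, y) + h for y, h in zip(targets, heights))


def ot_map(x, targets, heights):
    """Gradient of the potential at x: the target whose affine piece is active."""
    best = max(range(len(targets)), key=lambda i: dot(x, targets[i]) + heights[i])
    return targets[best]


# Toy data (hypothetical): two target samples with equal, hand-picked heights.
targets = [(-1.0, 0.0), (1.0, 0.0)]
heights = [0.0, 0.0]

right = ot_map((0.3, 0.5), targets, heights)
left = ot_map((-0.2, 0.1), targets, heights)
print(right, left)
```

The map jumps from one target sample to the other as x crosses the line where the two affine pieces are equal, which is the kind of discontinuity the abstract says a gradient-of-potential representation can express.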
Latent Disentanglement for the Analysis and Generation of Digital Human Shapes
Analysing and generating digital human shapes is crucial for a wide variety of applications ranging from movie production to healthcare. The most common approaches for the analysis and generation of digital human shapes involve the creation of statistical shape models. At the heart of these techniques is the definition of a mapping between shapes and a low-dimensional representation. However, making these representations interpretable is still an open challenge. This thesis explores latent disentanglement as a powerful technique to make the latent space of geometric deep learning based statistical shape models more structured and interpretable. In particular, it introduces two novel techniques to disentangle the latent representation of variational autoencoders and generative adversarial networks with respect to the local shape attributes characterising the identity of the generated body and head meshes. This work was inspired by a shape completion framework that was proposed as a viable alternative to intraoperative registration in minimally invasive surgery of the liver. In addition, one of these methods for latent disentanglement was also applied to plastic surgery, where it was shown to improve the diagnosis of craniofacial syndromes and aid surgical planning.
ASM: Adaptive Skinning Model for High-Quality 3D Face Modeling
The research fields of parametric face models and 3D face reconstruction have
been extensively studied. However, a critical question remains unanswered: how
to tailor the face model for specific reconstruction settings. We argue that
reconstruction with multi-view uncalibrated images demands a new model with
stronger capacity. Our study shifts attention from data-dependent 3D Morphable
Models (3DMM) to an understudied human-designed skinning model. We propose
Adaptive Skinning Model (ASM), which redefines the skinning model with more
compact and fully tunable parameters. With extensive experiments, we
demonstrate that ASM achieves significantly higher capacity than 3DMM, with
the additional advantages of compact model size and easy implementation for new
topologies. We achieve state-of-the-art performance with ASM for multi-view
reconstruction on the Florence MICC Coop benchmark. Our quantitative analysis
demonstrates the importance of a high-capacity model for fully exploiting
abundant information from multi-view input in reconstruction. Furthermore, our
model with physical-semantic parameters can be directly utilized for real-world
applications, such as in-game avatar creation. As a result, our work opens up
new research directions for parametric face models and facilitates future
research on multi-view reconstruction.
FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields
As recent advances in Neural Radiance Fields (NeRF) have enabled
high-fidelity 3D face reconstruction and novel view synthesis, manipulating
the reconstructed faces has also become an essential task in 3D vision. However,
existing manipulation methods require extensive human labor, such as a
user-provided semantic mask and manual attribute search, which makes them
unsuitable for non-expert users. Instead, our
approach is designed to require a single text to manipulate a face
reconstructed with NeRF. To do so, we first train a scene manipulator, a latent
code-conditional deformable NeRF, over a dynamic scene to control a face
deformation using the latent code. However, representing a scene deformation
with a single latent code is unfavorable for compositing local deformations
observed in different instances. Therefore, our proposed Position-conditional
Anchor Compositor (PAC) learns to represent a manipulated scene with spatially
varying latent codes. Their renderings with the scene manipulator are then
optimized to yield high cosine similarity to a target text in CLIP embedding
space for text-driven manipulation. To the best of our knowledge, our approach
is the first to address the text-driven manipulation of a face reconstructed
with NeRF. Extensive results, comparisons, and ablation studies demonstrate the
effectiveness of our approach.
Comment: ICCV 202
MyStyle: A Personalized Generative Prior
We introduce MyStyle, a personalized deep generative prior trained with a few
shots of an individual. MyStyle allows one to reconstruct, enhance and edit images
of a specific person, such that the output is faithful to the person's key
facial characteristics. Given a small reference set of portrait images of a
person (~100), we tune the weights of a pretrained StyleGAN face generator to
form a local, low-dimensional, personalized manifold in the latent space. We
show that this manifold constitutes a personalized region that spans latent
codes associated with diverse portrait images of the individual. Moreover, we
demonstrate that we obtain a personalized generative prior, and propose a
unified approach to apply it to various ill-posed image enhancement problems,
such as inpainting and super-resolution, as well as semantic editing. Using the
personalized generative prior we obtain outputs that exhibit high fidelity to
the input images and are also faithful to the key facial characteristics of the
individual in the reference set. We demonstrate our method with fair-use images
of numerous widely recognizable individuals for whom we have the prior
knowledge for a qualitative evaluation of the expected outcome. We evaluate our
approach against few-shot baselines and show that our personalized prior,
quantitatively and qualitatively, outperforms state-of-the-art alternatives.
Comment: Project webpage: https://mystyle-personalized-prior.github.io/,
Video: https://youtu.be/QvOdQR3tlO
Intelligent Sensors for Human Motion Analysis
The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.
VolTeMorph: Realtime, Controllable and Generalisable Animation of Volumetric Representations
The recent increase in popularity of volumetric representations for scene
reconstruction and novel view synthesis has put renewed focus on animating
volumetric content at high visual quality and in real-time. While implicit
deformation methods based on learned functions can produce impressive results,
they are 'black boxes' to artists and content creators, they require large
amounts of training data to generalise meaningfully, and they do not produce
realistic extrapolations outside the training data. In this work we solve these
issues by introducing a volume deformation method which is real-time, easy to
edit with off-the-shelf software and can extrapolate convincingly. To
demonstrate the versatility of our method, we apply it in two scenarios:
physics-based object deformation and telepresence where avatars are controlled
using blendshapes. We also perform thorough experiments showing that our method
compares favourably to both volumetric approaches combined with implicit
deformation and methods based on mesh deformation.
Comment: 18 pages, 21 figure