Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, by reviewing the literature and relating existing works through both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
Synthesization and reconstruction of 3D faces by deep neural networks
The past few decades have witnessed substantial progress in 3D facial modelling and reconstruction, as it is of high importance for many computer vision and graphics applications including Augmented/Virtual Reality (AR/VR), computer games, movie post-production, image/video editing, medical applications, etc. In traditional approaches, facial texture and shape are represented as a triangle mesh that can cover identity and expression variation with non-rigid deformation. A dataset of 3D face scans is then densely registered into a common topology in order to construct a linear statistical model. Such models are called 3D Morphable Models (3DMMs) and can be used for 3D face synthesis or reconstruction from a single or a few 2D face images. The works presented in this thesis focus on the modernization of these traditional techniques in light of recent advances in deep learning and the availability of large-scale datasets.
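To make the construction concrete, a linear 3DMM of this kind can be sketched as PCA over registered scans. The sketch below is illustrative only: the array sizes, variable names, and random stand-ins for real scan data are all assumptions, not the thesis code.

```python
import numpy as np

# Hypothetical inputs: N registered scans, each with V vertices in dense
# correspondence, flattened into the rows of a data matrix. Real scans
# would replace the random stand-in below.
N, V = 1000, 5000
scans = np.random.rand(N, 3 * V)

mean_shape = scans.mean(axis=0)
centered = scans - mean_shape

# PCA via SVD: the rows of Vt are the principal shape components.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 80                               # keep only the first k components
components = Vt[:k]                  # (k, 3V) linear basis
stddevs = S[:k] / np.sqrt(N - 1)     # per-component standard deviations

def synthesize(alpha):
    """Generate a new face from k latent coefficients (in units of std dev)."""
    return mean_shape + (alpha * stddevs) @ components

new_face = synthesize(np.random.randn(k)).reshape(V, 3)
```

The truncation to k components is exactly the low-rank bottleneck discussed next: it keeps the smooth part of the signal and discards high-frequency detail.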
Since the introduction of 3DMMs over two decades ago, there has been a lot of progress on them, and they are still considered one of the best methodologies for modelling 3D faces. Nevertheless, several aspects of them still need to be upgraded to the "deep era". Firstly, conventional 3DMMs are built by linear statistical approaches such as Principal Component Analysis (PCA), which by its nature omits high-frequency information. While this does not greatly affect shape, which is often smooth in the original data, texture models suffer heavily from the loss of high-frequency details and photorealism. Secondly, existing 3DMM fitting approaches rely on very primitive (e.g., RGB values, sparse landmarks) or hand-crafted (e.g., HOG, SIFT) features as supervision, which are sensitive to "in-the-wild" conditions (e.g., lighting, pose, occlusion) or fail to preserve identity/expression resemblance with the target image. Finally, the shape, texture, and expression modalities are modelled separately, ignoring the correlations among them, which places a fundamental limit on the synthesis of semantically meaningful 3D faces. Moreover, photorealistic 3D face synthesis has not been studied thoroughly in the literature.
This thesis attempts to address the above-mentioned issues by harnessing the power of deep neural networks and generative adversarial networks, as explained below:
Due to their linear texture models, many state-of-the-art methods are still not capable of reconstructing facial textures with high-frequency details. We therefore take a radically different approach and build a high-quality, detail-preserving texture model with Generative Adversarial Networks (GANs). That is, we utilize GANs to train a very powerful generator of facial texture in UV space, and then show that this generator network can be employed as a statistical texture prior in 3DMM fitting. The resulting texture reconstructions are plausible and photorealistic, as GANs are faithful to the real-data distribution in both the low- and high-frequency domains.
We then revisit the conventional 3DMM fitting approaches, which use non-linear optimization to find the latent parameters that best reconstruct the test image, under a new perspective. We propose to optimize the parameters with the supervision of pretrained deep identity features through an end-to-end differentiable framework. To be robust to initialization and to expedite the fitting process, we also propose a novel self-supervised regression-based approach. We demonstrate excellent 3D face reconstructions that are photorealistic and identity-preserving, and achieve, to the best of our knowledge, the first facial texture reconstruction with high-frequency details.
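A minimal sketch of such an identity-supervised fitting loop is given below. The generator, identity network, renderer, and all dimensions are tiny stand-ins assumed for illustration, not the thesis implementation.

```python
import torch
import torch.nn as nn

# Tiny frozen stand-ins for the pretrained networks assumed by the sketch:
# G maps a latent code to a UV texture, id_net maps an image to an identity
# embedding, and render stands in for a differentiable renderer.
G = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())
id_net = nn.Sequential(nn.Linear(3 * 64 * 64, 128))
for p in list(G.parameters()) + list(id_net.parameters()):
    p.requires_grad_(False)

def render(shape_params, texture):
    # Placeholder: a real renderer would rasterize the 3DMM mesh with texture.
    return texture + 0.0 * shape_params.sum()

target_image = torch.rand(1, 3 * 64 * 64)
target_emb = id_net(target_image)

# Optimize the GAN texture latent and 3DMM shape/expression parameters.
z = torch.zeros(1, 512, requires_grad=True)
shape_params = torch.zeros(1, 157, requires_grad=True)
opt = torch.optim.Adam([z, shape_params], lr=0.01)

for step in range(200):
    opt.zero_grad()
    texture = G(z)                       # GAN generator as a texture prior
    rendered = render(shape_params, texture)
    # Deep identity features supervise the fit instead of raw pixels alone.
    id_loss = 1 - torch.cosine_similarity(id_net(rendered), target_emb).mean()
    pix_loss = (rendered - target_image).abs().mean()
    (id_loss + 0.1 * pix_loss).backward()
    opt.step()
```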
To extend the non-linear texture model to photorealistic 3D face synthesis, we present a methodology that generates high-quality texture, shape, and normals jointly. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how the generation can be conditioned on expression, creating faces with various facial expressions. Additionally, we study another approach to photorealistic face synthesis by 3D guidance: generating 3D faces with a linear 3DMM and then translating their 2D renderings into the photorealistic face domain with an image-to-image translation network. Both works demonstrate excellent photorealistic face synthesis and show that the generated faces improve face recognition benchmarks when used as synthetic training data.
Finally, we study expression reconstruction for personalized 3D face models, where we improve the generalization and robustness of expression encoding. First, we propose a 3D augmentation approach for 2D head-mounted camera images that increases robustness to perspective changes. Second, we propose to train a generic expression encoder network by increasing the number of identities through a novel multi-id personalized model training architecture, in a self-supervised manner. Both approaches show promising results in qualitative and quantitative experiments.
Hybrid Integration of Euclidean and Geodesic Distance-Based RBF Interpolation for Facial Animation
Simulating believable facial animation is a topic of increasing interest in computer graphics and visual effects. In this paper we present a hybrid technique for generating facial animation from motion capture data. After capturing a range of facial expressions defined by the Facial Action Coding System (FACS), radial basis function (RBF) interpolation is used to transfer the motion data onto two facial models, one realistic and one stylized. The distances for the RBF technique are computed in three variants: Euclidean-based, geodesic mesh-based, and hybrid-based, the last combining the advantages of the first two. To improve efficiency, the calculations are aided by preprocessed distance data. The results are then evaluated quantitatively and qualitatively by comparing the animation outcomes with the real footage. Our findings show the efficiency of the hybrid technique when generating facial animation from motion capture data.
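As a rough illustration of the approach, the following sketch performs RBF-based motion transfer with a Gaussian kernel over precomputed distance matrices. The marker and vertex data are random stand-ins, and only the Euclidean variant is computed; a geodesic or hybrid variant would substitute on-surface distances as noted in the comments.

```python
import numpy as np

def rbf_weights(dist_markers, values, eps=2.0):
    """Solve for RBF weights from pairwise marker distances (Gaussian kernel)."""
    phi = np.exp(-(eps * dist_markers) ** 2)
    return np.linalg.solve(phi, values)

def rbf_apply(dist_to_markers, weights, eps=2.0):
    """Evaluate the interpolant at mesh vertices from vertex-marker distances."""
    phi = np.exp(-(eps * dist_to_markers) ** 2)
    return phi @ weights

# Illustrative stand-in data: 30 mocap markers and 5000 mesh vertices.
markers = np.random.rand(30, 3)
displacements = np.random.randn(30, 3) * 0.01   # captured marker motion
verts = np.random.rand(5000, 3)

# Euclidean variant: straight-line distances. A geodesic variant would
# substitute precomputed on-surface distances here, and a hybrid variant
# could blend the two, e.g. d = (1 - a) * d_euclidean + a * d_geodesic.
d_mm = np.linalg.norm(markers[:, None] - markers[None], axis=-1)
d_vm = np.linalg.norm(verts[:, None] - markers[None], axis=-1)

w = rbf_weights(d_mm, displacements)
deformed = verts + rbf_apply(d_vm, w)
```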
Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation
Morphable models are essential for the statistical modeling of 3D faces.
Previous works on morphable models mostly focus on large-scale facial geometry
but ignore facial details. This paper augments morphable models in representing
facial details by learning a Structure-aware Editable Morphable Model (SEMM).
SEMM introduces a detail structure representation based on the distance field
of wrinkle lines, jointly modeled with detail displacements to establish better
correspondences and enable intuitive manipulation of wrinkle structure.
In addition, SEMM introduces two transformation modules that translate expression
blendshape weights and age values into changes in latent space, allowing
effective semantic detail editing while maintaining identity. Extensive
experiments demonstrate that the proposed model compactly represents facial
details, outperforms previous methods in expression animation qualitatively and
quantitatively, and achieves effective age editing and wrinkle line editing of
facial details. Code and model are available at
https://github.com/gerwang/facial-detail-manipulation.
Comment: ECCV 2022
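As a toy illustration of a wrinkle-line distance field of the kind SEMM builds on (the UV resolution and line rasterization here are assumptions, not the paper's code):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy UV-space wrinkle mask: True on wrinkle-line texels, False elsewhere.
uv_res = 256
lines = np.zeros((uv_res, uv_res), dtype=bool)
lines[100, 50:200] = True     # a horizontal "wrinkle line"
lines[60:180, 170] = True     # a vertical one

# Distance field: each texel stores its distance to the nearest wrinkle line
# (distance_transform_edt measures distance to the nearest zero entry, so we
# pass the inverted mask). Editing the lines and recomputing the field gives
# a smooth structure signal for a detail-displacement model to condition on.
dist_field = distance_transform_edt(~lines)
dist_field /= dist_field.max()   # normalize for use as a network input
```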
4D Facial Expression Diffusion Model
Facial expression generation is one of the most challenging and long-sought
aspects of character animation, with many interesting applications. The
challenging task, which has traditionally relied heavily on digital craftspersons, remains largely unexplored. In this paper, we introduce a generative framework
for generating 3D facial expression sequences (i.e. 4D faces) that can be
conditioned on different inputs to animate an arbitrary 3D face mesh. It is
composed of two tasks: (1) Learning the generative model that is trained over a
set of 3D landmark sequences, and (2) Generating 3D mesh sequences of an input
facial mesh driven by the generated landmark sequences. The generative model is
based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved
remarkable success in generative tasks of other domains. While it can be
trained unconditionally, its reverse process can still be conditioned on various signals. This allows us to efficiently develop several downstream tasks involving conditional generation, using expression labels, text, partial sequences, or simply a facial geometry. To obtain the
full mesh deformation, we then develop a landmark-guided encoder-decoder to
apply the geometrical deformation embedded in landmarks on a given facial mesh.
Experiments show that our model has learned to generate realistic, high-quality expressions solely from a relatively small dataset, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.
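For orientation, the DDPM machinery referred to above can be sketched as standard ancestral sampling with a condition vector appended to the denoiser input. The network, schedule, and dimensions below are placeholder assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

# Standard linear noise schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Stand-in denoiser over flattened 3D landmark sequences; a real model would
# be a temporal network. Input = noisy sequence + timestep + condition vector.
seq_len, n_landmarks, cond_dim = 30, 68, 16
dim = seq_len * n_landmarks * 3
eps_model = nn.Linear(dim + 1 + cond_dim, dim)

@torch.no_grad()
def sample(cond):
    """DDPM ancestral sampling, conditioned on e.g. an expression embedding."""
    x = torch.randn(1, dim)                          # start from pure noise
    for t in reversed(range(T)):
        t_in = torch.full((1, 1), t / T)
        eps = eps_model(torch.cat([x, t_in, cond], dim=1))
        # Reverse-process mean (the standard DDPM update rule).
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x.view(seq_len, n_landmarks, 3)

landmark_seq = sample(torch.zeros(1, cond_dim))      # one generated 4D sequence
```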
Structural Surface Mapping for Shape Analysis
Natural surfaces are usually associated with feature graphs, such as the cortical surface with its anatomical atlas structure. Such a feature graph subdivides the whole surface into meaningful sub-regions. Existing brain mapping and registration methods do not integrate anatomical atlas structures; as a result, with existing brain mappings it is difficult to visualize and compare the atlas structures. Moreover, existing brain registration methods cannot guarantee the best possible alignment of the cortical regions, which would enable computing more accurate shape similarity metrics for neurodegenerative disease analysis, e.g., Alzheimer's disease (AD) classification. In addition, little attention has been paid to tackling surface parameterization and registration with graph constraints in a rigorous way, even though these problems have many applications in graphics, e.g., surface and image morphing.
This dissertation explores structural mappings for shape analysis of surfaces using feature graphs as constraints. (1) First, we propose structural brain mapping, which maps the brain cortical surface onto a planar convex domain using a Tutte embedding of a novel atlas graph and a harmonic map with atlas-graph constraints, facilitating visualization and comparison of the atlas structures. (2) Next, we propose a novel brain registration technique based on an intrinsic atlas-constrained harmonic map, which provides the best possible alignment of the cortical regions. (3) We then apply the proposed registration technique to compute shape similarity metrics for AD classification. (4) Finally, we propose techniques to compute intrinsic graph-constrained parameterization and registration for general genus-0 surfaces, which have been applied to surface and image morphing.
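As a small illustration of the Tutte-embedding step mentioned in (1), the following sketch maps a disk-topology mesh to a convex planar domain by solving a sparse Laplacian system with the boundary pinned to a circle. It uses a toy mesh and uniform weights; the dissertation's atlas-graph construction is not reproduced here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def tutte_embedding(n_verts, edges, boundary):
    """Embed a disk-topology mesh in the plane (Tutte, uniform weights).
    `boundary` must list the boundary loop vertices in order."""
    # Pin the boundary loop to the unit circle, a convex domain.
    theta = np.linspace(0, 2 * np.pi, len(boundary), endpoint=False)
    pos = np.zeros((n_verts, 2))
    pos[boundary] = np.column_stack([np.cos(theta), np.sin(theta)])

    # Uniform-weight graph Laplacian L = D - W.
    i, j = edges[:, 0], edges[:, 1]
    W = sp.coo_matrix((np.ones(len(edges)), (i, j)), shape=(n_verts, n_verts))
    W = (W + W.T).tocsr()
    L = (sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()

    # Each interior vertex sits at the average of its neighbours: solve
    # L x = 0 on the interior with boundary positions as Dirichlet data.
    interior = np.setdiff1d(np.arange(n_verts), boundary)
    A = L[interior][:, interior].tocsc()
    b = -L[interior][:, boundary] @ pos[boundary]
    pos[interior] = np.column_stack([spsolve(A, b[:, k]) for k in range(2)])
    return pos

# Toy mesh: triangle 0-1-2 on the boundary, vertex 3 interior.
edges = np.array([[0, 1], [1, 2], [2, 0], [0, 3], [1, 3], [2, 3]])
uv = tutte_embedding(4, edges, boundary=np.array([0, 1, 2]))
```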
Bridging the gap between reconstruction and synthesis
3D reconstruction and image synthesis are two of the main pillars of computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. Spurred by Deep Learning, the field has progressed rapidly, making it possible to achieve more complex, higher-level tasks. For example, 3D reconstruction results that traditionally required multi-view approaches can now be obtained with single-view methods. Similarly, early pattern-based texture synthesis works have evolved into techniques that can generate novel high-resolution images.
In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics, and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers, and architectures that directly embed prior 3D geometric knowledge for the tasks of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unified novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes and synthesised images, while reducing the amount of supervision and training data required to train them.
In summary, we provide a variety of low-, mid-, and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.
Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks
Generating realistic 3D faces is of high importance for computer graphics and
computer vision applications. Generally, research on 3D face generation
revolves around linear statistical models of the facial surface. Nevertheless, these models cannot faithfully represent either the facial texture or the normals of the face, both of which are crucial for photo-realistic face synthesis.
Recently, it was demonstrated that Generative Adversarial Networks (GANs) can
be used for generating high-quality textures of faces. Nevertheless, the
generation process either omits the geometry and normals, or independent
processes are used to produce 3D shape information. In this paper, we present
the first methodology that generates high-quality texture, shape, and normals
jointly, which can be used for photo-realistic synthesis. To do so, we propose
a novel GAN that can generate data from different modalities while exploiting
their correlations. Furthermore, we demonstrate how we can condition the
generation on the expression and create faces with various facial expressions.
The qualitative results shown in this paper are compressed due to size limitations; full-resolution results and the accompanying video can be found in the supplementary documents. The code and models are available at the project page: https://github.com/barisgecer/TBGAN.
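Schematically, the trunk-branch idea can be sketched as a generator whose shared trunk feeds separate per-modality decoders. The layer types, sizes, and condition shapes below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TrunkBranchGenerator(nn.Module):
    """Shared trunk captures cross-modality correlation; one branch per modality."""
    def __init__(self, z_dim=512, expr_dim=30, trunk_dim=1024, out_dim=3 * 64 * 64):
        super().__init__()
        # Trunk: latent code plus expression condition -> shared feature.
        self.trunk = nn.Sequential(
            nn.Linear(z_dim + expr_dim, trunk_dim), nn.ReLU(),
            nn.Linear(trunk_dim, trunk_dim), nn.ReLU(),
        )
        # Branches: correlated but modality-specific decoders.
        self.texture = nn.Sequential(nn.Linear(trunk_dim, out_dim), nn.Tanh())
        self.shape = nn.Sequential(nn.Linear(trunk_dim, out_dim), nn.Tanh())
        self.normals = nn.Sequential(nn.Linear(trunk_dim, out_dim), nn.Tanh())

    def forward(self, z, expr):
        h = self.trunk(torch.cat([z, expr], dim=1))
        return self.texture(h), self.shape(h), self.normals(h)

G = TrunkBranchGenerator()
z = torch.randn(2, 512)
expr = torch.zeros(2, 30)        # expression condition, e.g. blendshape weights
tex, shp, nrm = G(z, expr)       # coupled texture, shape and normal UV maps
```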