2,586 research outputs found
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
In this work, we present a multimodal solution to the problem of 4D face
reconstruction from monocular videos. 3D face reconstruction from 2D images is
an under-constrained problem due to the ambiguity of depth. State-of-the-art
methods try to solve this problem by leveraging visual information from a
single image or video, whereas 3D mesh animation approaches rely more on audio.
However, in most cases (e.g. AR/VR applications), videos include both visual
and speech information. We propose AVFace that incorporates both modalities and
accurately reconstructs the 4D facial and lip motion of any speaker, without
requiring any 3D ground truth for training. A coarse stage estimates the
per-frame parameters of a 3D morphable model, followed by a lip refinement, and
then a fine stage recovers facial geometric details. Due to the temporal audio
and video information captured by transformer-based modules, our method is
robust in cases when either modality is insufficient (e.g. face occlusions).
Extensive qualitative and quantitative evaluation demonstrates the superiority
of our method over the current state-of-the-art.
High-performance shape memory composites with intrinsic heating capabilities
Shape morphing structures have played a significant role within the field of aerospace for more than a century. While the shape morphing aerostructures of the past and present have depended on hinges and motors to achieve morphing, their future is expected to rely on smart materials and structures that have intrinsic shape morphing capabilities.
One such smart material, previously developed at Imperial College London, is the carbon fibre reinforced epoxy polymer (CFRP) composite with thermoplastic (TP) interleaves. These interleaved composites exhibit controllable stiffness (CS) and shape memory (SM) capabilities under suitable thermal conditions. While they showed excellent shape morphing capabilities, they had several drawbacks: compared to epoxy-based non-interleaved CFRP, they exhibited poor flexural modulus and through-thickness shear strength, and they required an oven to reach the high temperatures needed to activate the CS and SM capabilities.
This thesis describes studies conducted to mitigate these drawbacks. In the first study described in this thesis, the source of the premature through-thickness shear failure in TP interleaved CFRP composites was discovered to be the low shear strength of the polystyrene (PS) interleaves used in previous works. It was then demonstrated that replacing PS with Poly(styrene-co-acrylonitrile) (SAN) could improve the through-thickness shear strength of the interleaved composites to be almost as high as that of pristine CFRP. Furthermore, the SAN-interleaved CFRP laminates also exhibited excellent CS and SM capabilities.
In the next study described in this thesis, it was demonstrated that the flexural modulus of TP interleaved CFRP composites can be substantially improved by two different methods: (i) reducing the thickness of the TP interleaves, and (ii) introducing reinforcements within the TP interleaves.
The following study describes how intrinsic heating capability was achieved in TP interleaved CFRP composites, through resistive heating of heater elements such as stainless steel (SS) meshes and woven carbon fabric (WCF) embedded within the layup of the composite. This intrinsic heating strategy was used to supply the temperature necessary for the corresponding composites to exhibit CS and SM capabilities. As a result, these intrinsically heated TP interleaved CFRP composites exhibited successful out-of-oven morphing capabilities.
In the final study described in this thesis, composite structures were designed that were initially flat in their as-cured state but capable of deployment into planar and curved meshes. Finite element numerical models were used to predict the deployment capabilities of these composite structures. Finally, the deployable composite mesh structures were manufactured and characterised.
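The out-of-oven morphing described above relies on Joule heating of embedded elements such as SS meshes or woven carbon fabric. A back-of-the-envelope sizing of such a heater can be sketched as below; all numerical values are hypothetical placeholders for illustration, not measurements from the thesis, and the lossless-heating assumption ignores conduction and convection losses.

```python
# Back-of-the-envelope sizing of a resistive (Joule) heater embedded in a
# composite laminate for out-of-oven activation of CS/SM capabilities.
# All numbers are hypothetical placeholders, not values from the thesis.

def joule_power(voltage_v: float, resistance_ohm: float) -> float:
    """Electrical power dissipated by the heater element, P = V^2 / R."""
    return voltage_v ** 2 / resistance_ohm

def heating_time_s(mass_kg: float, specific_heat_j_per_kg_k: float,
                   delta_t_k: float, power_w: float) -> float:
    """Idealized (lossless) time to raise the laminate temperature:
    t = m * c * dT / P."""
    return mass_kg * specific_heat_j_per_kg_k * delta_t_k / power_w

power = joule_power(voltage_v=12.0, resistance_ohm=4.0)         # 36 W
t = heating_time_s(mass_kg=0.05, specific_heat_j_per_kg_k=900.0,
                   delta_t_k=100.0, power_w=power)              # 125 s
print(power, round(t, 1))
```

In practice the achievable temperature is set by the balance between dissipated power and heat losses, so such an estimate is only a lower bound on the heating time.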
Reconstruction and Synthesis of Human-Scene Interaction
In this thesis, we argue that the 3D scene is vital for understanding, reconstructing, and synthesizing human motion. We present several approaches which take the scene into consideration in reconstructing and synthesizing Human-Scene Interaction (HSI). We first observe that state-of-the-art pose estimation methods ignore the 3D scene and hence reconstruct poses that are inconsistent with the scene. We address this by proposing a pose estimation method that takes the 3D scene explicitly into account. We call our method PROX for Proximal Relationships with Object eXclusion. We leverage the data generated using PROX and build a method to automatically place 3D scans of people with clothing in scenes. The core novelty of our method is encoding the proximal relationships between the human and the scene in a novel HSI model, called POSA for Pose with prOximitieS and contActs. POSA is limited to static HSI, however. We propose a real-time method for synthesizing dynamic HSI, which we call SAMP for Scene-Aware Motion Prediction. SAMP enables virtual humans to navigate cluttered indoor scenes and naturally interact with objects. Data-driven kinematic models, like SAMP, can produce high-quality motion when applied in environments similar to those seen during training. However, when applied to new scenarios, kinematic models can struggle to generate realistic behaviors that respect scene constraints. In contrast, we present InterPhys, which uses adversarial imitation learning and reinforcement learning to train physically-simulated characters that perform scene interaction tasks in a physical and life-like manner.
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
Speech-driven 3D facial animation synthesis has been a challenging task both
in industry and research. Recent methods mostly focus on deterministic deep
learning methods meaning that given a speech input, the output is always the
same. However, in reality, the non-verbal facial cues that reside throughout
the face are non-deterministic in nature. In addition, the majority of
approaches focus on 3D vertex-based datasets, and methods that are compatible
with existing facial animation pipelines with rigged characters are scarce. To
eliminate these issues, we present FaceDiffuser, a non-deterministic deep
learning model to generate speech-driven facial animations that is trained with
both 3D vertex and blendshape based datasets. Our method is based on the
diffusion technique and uses the pre-trained large speech representation model
HuBERT to encode the audio input. To the best of our knowledge, we are the
first to employ the diffusion method for the task of speech-driven 3D facial
animation synthesis. We have run extensive objective and subjective analyses
and show that our approach achieves better or comparable results in comparison
to the state-of-the-art methods. We also introduce a new in-house dataset that
is based on a blendshape-based rigged character. We recommend watching the
accompanying supplementary video. The code and the dataset will be publicly
available.
Comment: Pre-print of the paper accepted at ACM SIGGRAPH MIG 202
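The non-deterministic behaviour described in this abstract comes from the diffusion formulation, in which animation parameters are progressively noised and a network learns to denoise them conditioned on audio features. A minimal sketch of the DDPM-style forward process on a toy vector of blendshape weights is shown below; the schedule values and the toy data are illustrative assumptions, and the real model would condition a learned denoiser on HuBERT features rather than use the known noise as done here.

```python
import math, random

# Toy DDPM-style forward noising on a 1-D "animation parameter" vector.
# A trained model would predict eps with a network; here we reuse the known
# noise to show that the forward step is exactly invertible.

T = 100
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= (1.0 - b)
    alpha_bars.append(prod)  # cumulative product, strictly decreasing

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, noise)]

def predict_x0(xt, t, eps):
    """Invert the forward step given the noise eps."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps)]

random.seed(0)
x0 = [0.3, -0.1, 0.7]                      # toy blendshape weights
eps = [random.gauss(0, 1) for _ in x0]
xt = q_sample(x0, t=50, noise=eps)
x0_hat = predict_x0(xt, t=50, eps=eps)
print([round(v, 6) for v in x0_hat])       # recovers x0 up to float error
```

At inference time the model starts from pure noise and repeats the denoising step for every timestep, which is what makes each sampled animation different for the same speech input.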
FLARE: Fast Learning of Animatable and Relightable Mesh Avatars
Our goal is to efficiently learn personalized animatable 3D head avatars from
videos that are geometrically accurate, realistic, relightable, and compatible
with current rendering systems. While 3D meshes enable efficient processing and
are highly portable, they lack realism in terms of shape and appearance. Neural
representations, on the other hand, are realistic but lack compatibility and
are slow to train and render. Our key insight is that it is possible to
efficiently learn high-fidelity 3D mesh representations via differentiable
rendering by exploiting highly-optimized methods from traditional computer
graphics and approximating some of the components with neural networks. To that
end, we introduce FLARE, a technique that enables the creation of animatable
and relightable mesh avatars from a single monocular video. First, we learn a
canonical geometry using a mesh representation, enabling efficient
differentiable rasterization and straightforward animation via learned
blendshapes and linear blend skinning weights. Second, we follow
physically-based rendering and factor observed colors into intrinsic albedo,
roughness, and a neural representation of the illumination, allowing the
learned avatars to be relit in novel scenes. Since our input videos are
captured on a single device with a narrow field of view, modeling the
surrounding environment light is non-trivial. Based on the split-sum
approximation for modeling specular reflections, we address this by
approximating the pre-filtered environment map with a multi-layer perceptron
(MLP) modulated by the surface roughness, eliminating the need to explicitly
model the light. We demonstrate that our mesh-based avatar formulation,
combined with learned deformation, material, and lighting MLPs, produces
avatars with high-quality geometry and appearance, while also being efficient
to train and render compared to existing approaches.
Comment: 15 pages, Accepted: ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), 202
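The "straightforward animation" mentioned in the abstract rests on linear blend skinning, where each deformed vertex is a weighted sum of the vertex transformed by each joint. The sketch below is a minimal pure-Python illustration of that formula; the transforms and weights are made-up examples, whereas a real avatar would use learned per-vertex weights and FLAME-style joint transforms.

```python
# Minimal sketch of linear blend skinning (LBS): v' = sum_i w_i * (T_i @ v).

def mat_vec(m, v):
    """Apply a 4x4 row-major matrix to a homogeneous [x, y, z, 1] point."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def lbs(vertex, transforms, weights):
    """Blend the vertex positions produced by each transform, weighted by
    skinning weights that sum to 1."""
    out = [0.0, 0.0, 0.0]
    v = vertex + [1.0]
    for T, w in zip(transforms, weights):
        p = mat_vec(T, v)
        for k in range(3):
            out[k] += w * p[k]
    return out

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
shift_x  = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# A 50/50 blend of "stay" and "move +2 in x" moves the vertex by +1 in x.
print(lbs([0.0, 0.0, 0.0], [identity, shift_x], [0.5, 0.5]))  # [1.0, 0.0, 0.0]
```

Because this blend is linear in the transforms, it composes cleanly with differentiable rasterization, which is part of why mesh avatars are cheap to train and render.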
DynamicSurf: Dynamic Neural RGB-D Surface Reconstruction with an Optimizable Feature Grid
We propose DynamicSurf, a model-free neural implicit surface reconstruction
method for high-fidelity 3D modelling of non-rigid surfaces from monocular
RGB-D video. To cope with the lack of multi-view cues in monocular sequences of
deforming surfaces, one of the most challenging settings for 3D reconstruction,
DynamicSurf exploits depth, surface normals, and RGB losses to improve
reconstruction fidelity and optimisation time. DynamicSurf learns a neural
deformation field that maps a canonical representation of the surface geometry
to the current frame. We depart from current neural non-rigid surface
reconstruction models by designing the canonical representation as a learned
feature grid which leads to faster and more accurate surface reconstruction
than competing approaches that use a single MLP. We demonstrate DynamicSurf on
public datasets and show that it can optimize sequences of varying numbers of
frames with a speedup over pure MLP-based approaches while achieving results
comparable to state-of-the-art methods. The project is available at
https://mirgahney.github.io//DynamicSurf.io/
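A learned feature grid like the one described above is queried by interpolating the features stored at the surrounding voxel corners. The sketch below shows the trilinear lookup on a toy scalar grid; the grid contents are synthetic, and a real canonical representation would store learned feature vectors fed to a small decoder.

```python
# Sketch of a feature-grid lookup by trilinear interpolation, the kind of
# canonical representation contrasted with a single large MLP.

def trilinear(grid, x, y, z):
    """Interpolate grid[z][y][x] at a continuous (x, y, z) position."""
    x0, y0, z0 = int(x), int(y), int(z)
    dx, dy, dz = x - x0, y - y0, z - z0
    def g(i, j, k):  # value at a corner offset from (x0, y0, z0)
        return grid[z0 + k][y0 + j][x0 + i]
    # Interpolate along x, then y, then z.
    c00 = g(0, 0, 0) * (1 - dx) + g(1, 0, 0) * dx
    c10 = g(0, 1, 0) * (1 - dx) + g(1, 1, 0) * dx
    c01 = g(0, 0, 1) * (1 - dx) + g(1, 0, 1) * dx
    c11 = g(0, 1, 1) * (1 - dx) + g(1, 1, 1) * dx
    c0 = c00 * (1 - dy) + c10 * dy
    c1 = c01 * (1 - dy) + c11 * dy
    return c0 * (1 - dz) + c1 * dz

# A 2x2x2 grid whose corner value equals the corner's x coordinate:
grid = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(trilinear(grid, 0.25, 0.5, 0.5))  # 0.25: linear in x by construction
```

Because each lookup touches only eight cells instead of evaluating a deep network, grid-based representations tend to converge and render faster than a single MLP, at the cost of memory.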
MA-NeRF: Motion-Assisted Neural Radiance Fields for Face Synthesis from Sparse Images
We address the problem of photorealistic 3D face avatar synthesis from sparse
images. Existing parametric models for face avatar reconstruction struggle to
generate details that originate from the inputs. Meanwhile, although current
NeRF-based avatar methods provide promising results for novel view synthesis,
they fail to generalize well to unseen expressions. We improve upon NeRF and
propose a novel framework that, by leveraging parametric 3DMM models, can
reconstruct a high-fidelity drivable face avatar and successfully handle
unseen expressions. At the core of our implementation are a structured
displacement feature and a semantic-aware learning module. The structured
displacement feature introduces a motion prior as an additional constraint
and, by constructing a displacement volume, improves performance on unseen
expressions. In addition, the semantic-aware learning module incorporates
multi-level priors, e.g., semantic embeddings and learnable latent codes, to
further lift performance. Thorough experiments have been conducted both
quantitatively and qualitatively to demonstrate the design of our framework,
and our method achieves much better results than the current state of the art.
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
Face recognition is a prevailing authentication solution in numerous
biometric applications. Physical adversarial attacks, as an important
surrogate, can identify the weaknesses of face recognition systems and evaluate
their robustness before deployment. However, most existing physical attacks are
either readily detectable or ineffective against commercial recognition
systems. The goal of this work is to develop a more reliable technique that can
carry out an end-to-end evaluation of adversarial robustness for commercial
systems. It requires that this technique can simultaneously deceive black-box
recognition models and evade defensive mechanisms. To fulfill this, we design
adversarial textured 3D meshes (AT3D) with an elaborate topology on a human
face, which can be 3D-printed and pasted on the attacker's face to evade the
defenses. However, the mesh-based optimization regime calculates gradients in
high-dimensional mesh space, and can be trapped into local optima with
unsatisfactory transferability. To deviate from the mesh-based space, we
propose to perturb the low-dimensional coefficient space of a 3D Morphable
Model, which significantly improves black-box transferability while enjoying
faster search efficiency and better visual quality. Extensive
experiments in digital and physical scenarios show that our method effectively
explores the security vulnerabilities of multiple popular commercial services,
including three recognition APIs, four anti-spoofing APIs, two prevailing
mobile phones and two automated access control systems.
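The advantage of optimizing in 3DMM coefficient space rather than vertex space can be seen from how a morphable model generates geometry: the mesh is a mean shape plus a linear combination of basis directions, so perturbing a handful of coefficients moves every vertex coherently. The tiny two-vertex model below is invented purely for illustration.

```python
# Sketch of the linear 3DMM generator: vertices = mean + sum_j coeffs[j] * basis[j].
# Perturbing the low-dimensional coeffs moves all vertices along smooth,
# face-like directions, unlike independent per-vertex perturbations.

def mesh_from_coeffs(mean, basis, coeffs):
    """Generate a flattened xyz vertex list from morphable-model coefficients."""
    out = list(mean)
    for c, direction in zip(coeffs, basis):
        for i, d in enumerate(direction):
            out[i] += c * d
    return out

mean  = [0.0, 0.0, 0.0,  1.0, 0.0, 0.0]           # two vertices, flattened
basis = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0],          # mode 0: widen in x
         [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]]          # mode 1: lift in y
# Perturbing 2 coefficients moves all 6 vertex components at once:
print(mesh_from_coeffs(mean, basis, [0.1, -0.2]))
# [0.1, -0.2, 0.0, 1.1, -0.2, 0.0]
```

Searching over a few dozen coefficients instead of thousands of vertex positions shrinks the attack's search space, which is consistent with the faster search and better transferability the abstract reports.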
Multi-view 3D Face Reconstruction Based on Flame
At present, 3D face reconstruction has broad application prospects in various
fields, but research on it is still at an early stage. In this paper, we aim
to achieve better 3D face reconstruction quality by combining a multi-view
training framework with the parametric face model Flame, and propose a
multi-view training and testing model, MFNet (Multi-view Flame Network). We
build a self-supervised training framework, implement constraints such as a
multi-view optical flow loss and a face landmark loss, and finally obtain a
complete MFNet. We propose innovative implementations of the multi-view
optical flow loss and the covisible mask. We test our model on the AFLW and
FaceScape datasets and also take pictures of our own faces to reconstruct 3D
faces while simulating real-world scenarios as closely as possible, achieving
good results. Our work mainly addresses the problem of combining parametric
face models with multi-view 3D face reconstruction and explores the
implementation of a Flame-based multi-view training and testing framework,
contributing to the field of 3D face reconstruction.
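A landmark loss of the kind listed among the self-supervised constraints above penalizes the distance between projected model landmarks and detected image landmarks. The sketch below uses a hypothetical orthographic projection and made-up points for illustration; the actual MFNet formulation, projection model, and landmark detector are not specified in the abstract.

```python
import math

# Sketch of a 2D face-landmark loss: mean Euclidean distance between
# projected 3D model landmarks and detected 2D image landmarks.

def project(v3):
    """Orthographic projection: keep (x, y), drop z (a simplifying assumption;
    a real pipeline would use the camera's perspective projection)."""
    return (v3[0], v3[1])

def landmark_loss(model_landmarks_3d, image_landmarks_2d):
    """Mean Euclidean distance between projected and detected landmarks."""
    total = 0.0
    for v3, l2 in zip(model_landmarks_3d, image_landmarks_2d):
        px, py = project(v3)
        total += math.hypot(px - l2[0], py - l2[1])
    return total / len(image_landmarks_2d)

pred = [(0.0, 0.0, 1.0), (3.0, 4.0, 2.0)]   # toy 3D landmarks on the mesh
gt   = [(0.0, 0.0), (0.0, 0.0)]             # toy detected 2D landmarks
print(landmark_loss(pred, gt))              # (0 + 5) / 2 = 2.5
```

Because both the projection and the distance are differentiable, such a loss can be minimized by gradient descent over the Flame parameters alongside the optical flow term.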