
    3D Face Modelling, Analysis and Synthesis

    Human faces have always been of special interest to researchers in computer vision and graphics. There has been an explosion in the number of studies on accurately modelling, analysing and synthesising realistic faces for various applications. The importance of human faces stems from the fact that they are an invaluable means of effective communication, recognition, behaviour analysis, conveying emotions, etc. Therefore, addressing the automatic visual perception of human faces efficiently could open up many influential applications in various domains, e.g. virtual/augmented reality, computer-aided surgery, security and surveillance, and entertainment. However, the vast variability in the geometry and appearance of human faces captured in unconstrained videos and images renders their automatic analysis and understanding very challenging even today. The primary objective of this thesis is to develop novel 3D computer vision methodologies for human faces that go beyond the state of the art and achieve unprecedented quality and robustness. In more detail, this thesis advances the state of the art in 3D facial shape reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition, and facial synthesis with the aid of 3D face modelling. We give special attention to the case where the input comes from monocular imagery captured under uncontrolled settings, a.k.a. in-the-wild data. Such data are available in abundance on the internet nowadays. Analysing them pushes the boundaries of currently available computer vision algorithms and opens up many new crucial applications in industry. We define the four targeted vision problems (3D facial reconstruction & tracking, fine-grained 3D facial motion estimation, expression recognition, and facial synthesis) as the four essential 3D-based systems for automatic facial behaviour understanding and show how they rely on each other. Finally, to aid the research conducted in this thesis, we collect and annotate a large-scale video dataset of monocular facial performances. All of our proposed methods demonstrate very promising quantitative and qualitative results when compared to state-of-the-art methods.

    AI-generated Content for Various Data Modalities: A Survey

    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent work, AIGC has attracted considerable attention, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Finally, we discuss open challenges and potential future research directions.

    Adaptive face modelling for reconstructing 3D face shapes from single 2D images

    Example-based statistical face models using principal component analysis (PCA) have been widely deployed for three-dimensional (3D) face reconstruction and face recognition. Two factors of general concern in such models are the size of the training dataset and the selection of examples in the training set. The representational power (RP) of an example-based model is its capability to depict a new 3D face for a given 2D face image. The RP of the model can be increased by increasing the number of training samples. In this contribution, a novel approach is proposed to increase the RP of the 3D face reconstruction model by deforming a set of examples in the training dataset. A PCA-based 3D face model is adapted for each new near-frontal input face image to reconstruct the 3D face shape. Further, an extended Tikhonov regularisation method has been
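
    The core mechanism described above is a linear statistical shape model fitted under Tikhonov (ridge) regularisation. The sketch below shows this in Python/NumPy for an orthographic camera; the array shapes, the landmark indexing and the weight `lam` are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of fitting a PCA face model to sparse 2D landmarks
# with Tikhonov (ridge) regularisation. All names and shapes here are
# illustrative assumptions.
import numpy as np

def fit_pca_face(mean_shape, components, landmarks_2d, lm_idx, lam=0.1):
    """Recover PCA coefficients alpha so that the projected model
    landmarks match the observed 2D landmarks.

    mean_shape  : (3N,)   mean 3D face, xyz flattened per vertex
    components  : (3N, K) principal components as columns
    landmarks_2d: (L, 2)  observed image landmarks
    lm_idx      : (L,)    integer vertex index of each landmark
    lam         : Tikhonov weight penalising large coefficients
    """
    # Orthographic projection: keep the x and y rows of the landmark vertices.
    rows = np.concatenate([3 * lm_idx, 3 * lm_idx + 1])
    A = components[rows]                              # (2L, K)
    b = landmarks_2d.T.reshape(-1) - mean_shape[rows] # [x1..xL, y1..yL] residual

    # Ridge (Tikhonov) solution: (A^T A + lam I) alpha = A^T b
    K = A.shape[1]
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
    return mean_shape + components @ alpha            # reconstructed 3D shape (3N,)
```

    Increasing `lam` pulls the coefficients towards the mean face; this is the usual Tikhonov trade-off between fitting the observed landmarks and keeping the reconstructed shape plausible.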

    Towards Scalable Multi-View Reconstruction of Geometry and Materials

    In this paper, we propose a novel method for joint recovery of camera pose, object geometry and the spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object scale and hence cannot be captured with stationary light stages. The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study of the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and allows for expansion to spatially larger scenes. We believe that this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.
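
    To make the "single objective function minimized with off-the-shelf gradient-based solvers" idea concrete, here is a toy PyTorch sketch that jointly optimizes per-keyframe depth maps and a stand-in material parameter under a photometric term plus a consistency regularizer coupling neighboring keyframes. The rendering stub, tensor shapes and loss weights are invented for illustration and do not reproduce the paper's formulation.

```python
# Toy joint optimization: per-keyframe 2.5D depth maps plus a shared
# material parameter, minimized with a standard gradient-based solver.
# Everything here is an illustrative assumption.
import torch

torch.manual_seed(0)
n_kf, H, W = 4, 32, 32
observed = torch.rand(n_kf, H, W)                   # captured image intensities

depth = torch.rand(n_kf, H, W, requires_grad=True)  # 2.5D keyframe depth maps
albedo = torch.rand(1, requires_grad=True)          # stand-in svBRDF parameter

def render(d, a):
    # Hypothetical differentiable shading stub: brightness falls off
    # with depth and scales with the material parameter.
    return a / (1.0 + d)

opt = torch.optim.Adam([depth, albedo], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    photometric = ((render(depth, albedo) - observed) ** 2).mean()
    # Multi-view consistency: neighboring keyframes should agree
    # (crudely approximated here by adjacent-frame depth agreement).
    consistency = ((depth[1:] - depth[:-1]) ** 2).mean()
    loss = photometric + 0.1 * consistency
    loss.backward()
    opt.step()
```

    In the paper's distributed setting, each keyframe block would be optimized largely independently, with the consistency term performing the synchronization; the toy version above collapses this into a single loop.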

    State of the Art on Neural Rendering

    Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
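
    The defining ingredient mentioned above, integrating differentiable rendering into network training, can be illustrated with a minimal PyTorch sketch: a differentiable shader sits inside the loss, so image-space error back-propagates to scene parameters. The Lambertian shader, the known normal map and the learnable light/albedo are illustrative assumptions, not a specific method from the report.

```python
# Minimal differentiable-rendering-in-the-loss sketch. All inputs and
# parameters are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
normals = F.normalize(torch.rand(64, 64, 3), dim=-1)  # known surface normals
target = torch.rand(64, 64)                           # reference photograph

light = torch.rand(3, requires_grad=True)    # learnable light direction
albedo = torch.rand(1, requires_grad=True)   # learnable diffuse albedo

opt = torch.optim.Adam([light, albedo], lr=5e-2)
for step in range(300):
    opt.zero_grad()
    l = F.normalize(light, dim=0)
    # Differentiable Lambertian shading: I = albedo * max(n . l, 0).
    image = albedo * (normals @ l).clamp(min=0)
    loss = ((image - target) ** 2).mean()
    loss.backward()   # image-space error reaches the scene parameters
    opt.step()
```

    The same pattern scales up to full neural rendering pipelines: replace the fixed shader with a differentiable renderer and the scalar parameters with network weights, and the photometric loss trains the whole system end to end.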