41 research outputs found

    Uncalibrated Non-Rigid Factorisation by Independent Subspace Analysis

    We propose a general, prior-free approach to the uncalibrated non-rigid structure-from-motion problem for modelling and analysing non-rigid objects such as human faces. The word general refers to an approach that recovers the non-rigid affine structure and motion from 2D point correspondences by assuming that (1) the non-rigid shapes are generated by a linear combination of rigid 3D basis shapes, (2) the non-rigid shapes are affine in nature, i.e., they can be modelled as deviations from the mean, rigid shape, and (3) the basis shapes are statistically independent. In contrast to the majority of existing works, no prior information is assumed for the structure and motion apart from the assumption that the underlying basis shapes are statistically independent. The independent 3D shape bases are recovered by independent subspace analysis (ISA). Likewise, in contrast to most previous approaches, no calibration information is assumed for the affine cameras; the reconstruction is solved up to a global affine ambiguity, which makes our approach simple but efficient. In the experiments, we evaluated the method on several standard data sets, including a real facial expression data set of 7200 faces with 2D point correspondences and unknown 3D structure and motion, for which we obtained promising results.
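As a rough illustration of assumption (1) in the abstract above, the following sketch synthesises 2D point tracks from a linear combination of rigid 3D basis shapes viewed by uncalibrated affine cameras, then checks that the stacked measurement matrix has rank at most 3K for K basis shapes. It is not the authors' method and contains no ISA step. The function name, array shapes, and random data are all hypothetical, and translations are assumed to have been removed by centring the tracks.

```python
import numpy as np


def synthesize_measurements(num_frames=20, num_points=30, num_bases=2, seed=0):
    """Build a 2F x P measurement matrix from a linear shape-basis model.

    Hypothetical helper for illustration only; all data are random.
    """
    rng = np.random.default_rng(seed)
    # K rigid basis shapes, each a 3 x P matrix of 3D points.
    bases = rng.standard_normal((num_bases, 3, num_points))
    # Per-frame mixing coefficients, F x K.
    coeffs = rng.standard_normal((num_frames, num_bases))
    # Per-frame 2 x 3 affine cameras; uncalibrated, so no orthonormality
    # constraint is imposed on the rows.
    cameras = rng.standard_normal((num_frames, 2, 3))
    # Shape in frame f is the coefficient-weighted sum of the basis shapes.
    shapes = np.einsum("fk,kdp->fdp", coeffs, bases)
    # Affine projection of each frame's shape (translation assumed removed).
    projections = np.einsum("fid,fdp->fip", cameras, shapes)
    # Stack the two image rows of every frame into one 2F x P matrix.
    return projections.reshape(2 * num_frames, num_points)


W = synthesize_measurements()
rank = np.linalg.matrix_rank(W)
print(W.shape, rank)
```

Because the stacked matrix factors into a 2F x 3K motion matrix times a 3K x P basis matrix, its rank cannot exceed 3K. Factorisation approaches rely on this low-rank structure, and the factors are only determined up to an invertible 3K x 3K transformation.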

    Deformable and articulated 3D reconstruction from monocular video sequences

    This PhD thesis addresses the problem of deformable and articulated structure from motion from monocular uncalibrated video sequences. Structure from motion is defined as the problem of recovering information about the 3D structure of scenes imaged by a camera in a video sequence. Our study targets the challenging problem of non-rigid shapes (e.g. a beating heart or a smiling face). Non-rigid structures appear constantly in everyday life: think of a bicep curling, a torso twisting or a face smiling. Our research seeks a general method to perform 3D shape recovery purely from data, without relying on a pre-computed model or training data. Open problems in the field are the difficulty of the non-linear estimation, the lack of a real-time system, large amounts of missing data in real-world video sequences, measurement noise and strong deformations. Solving these problems would take us far beyond the current state of the art in non-rigid structure from motion. This dissertation presents our contributions to the field of non-rigid structure from motion, detailing a novel algorithm that enforces the exact metric structure of the problem at each step of the minimisation by projecting the motion matrices onto the correct deformable or articulated metric motion manifold, respectively. An important advantage of this new algorithm is its ability to handle missing data, which becomes crucial when dealing with real video sequences. We present a generic bilinear estimation framework, which improves convergence and makes use of the manifold constraints. Finally, we demonstrate a sequential, frame-by-frame estimation algorithm, which provides a 3D model and camera parameters for each video frame while simultaneously building a model of object deformation.

    3D Non-Rigid Reconstruction with Prior Shape Constraints

    3D non-rigid shape recovery from a single uncalibrated camera is a challenging, under-constrained problem in computer vision. Although tremendous progress has been made towards solving the problem, two main limitations still exist in most previous solutions. First, current methods focus on non-incremental solutions, that is, the algorithms require all the measurement data to be collected before the reconstruction takes place. This methodology is inherently unsuitable for applications requiring real-time solutions. Second, most existing approaches assume that 3D shapes can be accurately modelled in a linear subspace. These methods are simple and have proven effective for reconstructing objects with relatively small deformations, but have considerable limitations when the deformations are large or complex. Non-linear deformations are often observed in highly flexible objects, for which the use of the linear model is impractical. Note that specific types of shape variation might be governed by only a small number of parameters and can therefore be well represented in a low-dimensional manifold. The methods proposed in this thesis aim to estimate the non-rigid shapes and the corresponding camera trajectories, based on both the observations and the prior learned manifold. Firstly, an incremental approach is proposed for estimating deformable objects. An important advantage of this method is the ability to reconstruct the 3D shape from a newly observed image and update the parameters in 3D shape space. However, this recursive method assumes that the deformable shapes have only small variations from a mean shape, and is thus still not feasible for objects subject to large-scale deformations. To address this problem, a series of approaches are proposed, all based on non-linear manifold learning techniques. Such a manifold is used as a shape prior, with the reconstructed shapes constrained to lie within the manifold. 
These non-linear manifold-based approaches significantly improve the quality of the reconstructed results and are well adapted to different types of shapes undergoing significant and complex deformations. Throughout the thesis, methods are validated quantitatively on 2D point sequences projected from 3D motion capture data for a ground-truth comparison, and are demonstrated qualitatively on real examples of 2D video sequences. The proposed methods are compared against several state-of-the-art techniques, with results shown for a variety of challenging deformable objects. Extensive experiments also demonstrate the robustness of the proposed algorithms with respect to measurement noise and missing data.

    Modelling human pose and shape based on a database of human 3D scans

    Generating realistic human shapes and motion is an important task both in the motion picture industry and in computer games. In feature films, high quality and believability are the most important characteristics. Additionally, when creating virtual doubles, the generated characters have to match given real persons as closely as possible. In contrast, in computer games the level of realism does not need to be as high, but real-time performance is essential. It is desirable to meet all these requirements with a general model of human pose and shape. In addition, many markerless human tracking methods applied, e.g., in biomedicine or sports science can benefit greatly from the availability of such a model, because most methods require a 3D model of the tracked subject as input, which can be generated on-the-fly given a suitable shape and pose model. In this thesis, a comprehensive procedure is presented to generate different general models of human pose. A database of 3D scans spanning the space of human pose and shape variations is introduced. Then, four different approaches for transforming the database into a general model of human pose and shape are presented, which improve the current state of the art. Experiments are performed to evaluate and compare the proposed models on real-world problems, i.e., characters are generated given semantic constraints, and the underlying shape and pose of humans are estimated given 3D scans, multi-view video, or uncalibrated monocular images.
Generating realistic human models is an important application in the film industry and in computer games. In games, real-time synthesis is indispensable, but the level of detail need not be as high as in films. For virtual doubles, as used for example in films, the generated character must resemble the given real person as closely as possible. A general model of human pose and body shape makes it possible to meet all of these requirements. In addition, many markerless motion capture methods, as used for example in biomedicine or sports science, can benefit from a general model of pose and body shape, because they require a 3D model of the captured person, which can now be generated at runtime. This doctoral thesis presents a comprehensive approach for computing different models of pose and body shape. First, a database of 3D scans is built that covers the pose and body shape variations of humans. Then four different methods are introduced that compute general models of pose and body shape from it and remedy problems in the state of the art. The presented models are tested on realistic problems: human models are generated from a small number of constraints, and the pose and body shape of subjects are estimated from 3D scans, multi-camera video data, and single images of the clothed persons.

    Multilinear methods for disentangling variations with applications to facial analysis

    Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, among others. Each factor accounts for a source of variability in the data. It is assumed that the multiplicative interactions of these factors emulate the entangled variability, giving rise to the rich structure of visual object appearance. Disentangling such unobserved factors from visual data is a challenging task, especially when the data have been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label information is not available. The work presented in this thesis focuses on disentangling the variations contained in visual data, in particular applied to 2D and 3D faces. The motivation behind this work lies in recent developments in the field, such as (i) the creation of large visual databases for face analysis, (ii) the need to extract information without the use of labels and (iii) the need to deploy systems under demanding, real-world conditions. In the first part of this thesis, we present a method to synthesise plausible 3D expressions that preserve the identity of a target subject. This method is supervised, as the model learns from labels, in this case 3D facial meshes of people performing a defined set of facial expressions. The ability to synthesise an entire facial rig from a single neutral expression has a large range of applications in both computer graphics and computer vision, ranging from the efficient and cost-effective creation of CG characters to scalable data generation for machine learning purposes. Unlike previous methods based on multilinear models, the proposed approach is capable of extrapolating well outside the sample pool, which allows it to accurately reproduce the identity of the target subject and create artefact-free expression shapes while requiring only a small input dataset. 
We introduce global-local multilinear models that leverage the strengths of expression-specific and identity-specific local models combined with coarse motion estimations from a global model. The expression-specific and identity-specific local models are built from different slices of the patch-wise local multilinear model. Experimental results show that we achieve high-quality, identity-preserving facial expression synthesis results that outperform existing methods both quantitatively and qualitatively. In the second part of this thesis, we investigate how the modes of variation can be extracted from visual data. Our assumption is that visual data has an underlying structure consisting of factors of variation and their interactions. Finding this structure and the factors is important: it would not only help us to better understand visual data but, once obtained, the factors can be edited for use in various applications. Shape from Shading and expression transfer are just two of the potential applications. To extract the factors of variation, several supervised methods have been proposed, but they require both labels regarding the modes of variation and the same number of samples under all modes of variation. Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. We propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in an unsupervised setting. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild. 
Finally, leveraging the proposed unsupervised multilinear method as well as recent advances in deep learning, we propose a weakly supervised deep learning method for disentangling multiple latent factors of variation in face images captured in-the-wild. To this end, we propose a deep latent variable model in which the multiplicative interactions of multiple latent factors of variation are modelled explicitly as a multilinear structure. We demonstrate that the proposed approach indeed learns disentangled representations of facial expressions and pose, which can be used in various applications, including face editing, as well as 3D face reconstruction and classification of facial expression, identity and pose.

    A Benchmark and Evaluation of Non-Rigid Structure from Motion

    Non-rigid structure from motion (NRSfM) is a long-standing and central problem in computer vision, allowing us to obtain 3D information from multiple images when the scene is dynamic. A main issue regarding the further development of this important computer vision topic is the lack of high-quality data sets. We here address this issue by presenting a data set compiled for this purpose, which is made publicly available and is considerably larger than the previous state of the art. To validate the applicability of this data set, and to provide an investigation into the state of the art of NRSfM, including potential directions forward, we here present a benchmark and a scrupulous evaluation using this data set. This benchmark evaluates 16 different methods with available code, which we argue reasonably spans the state of the art in NRSfM. We also hope that the presented public data set and evaluation will provide benchmark tools for further development in this field.