Uncalibrated Non-Rigid Factorisation by Independent Subspace Analysis
We propose a general, prior-free approach for the uncalibrated non-rigid
structure-from-motion problem for modelling and analysis of non-rigid objects
such as human faces. The word general refers to an approach that recovers the
non-rigid affine structure and motion from 2D point correspondences by assuming
that (1) the non-rigid shapes are generated by a linear combination of rigid 3D
basis shapes, (2) the non-rigid shapes are affine in nature, i.e., they
can be modelled as deviations from the mean, rigid shape, and (3) the
basis shapes are statistically independent. In contrast to the majority of
existing works, no prior information is assumed for the structure and motion
apart from the assumption that the underlying basis shapes are statistically
independent. The independent 3D shape bases are recovered by independent
subspace analysis (ISA). Likewise, in contrast to most previous approaches,
no calibration information is assumed for affine cameras; the reconstruction is
solved up to a global affine ambiguity, which makes our approach simple but
efficient. In the experiments, we evaluated the method with several standard
data sets including a real face expression data set of 7200 faces with 2D point
correspondences and unknown 3D structure and motion for which we obtained
promising results.
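
The linear shape model in assumptions (1) and (2) can be illustrated with a minimal sketch. It is not the authors' method and does not implement ISA; it only generates synthetic shapes as a mean shape plus a linear combination of basis shapes, projects them with random affine cameras, and checks the well-known consequence that the stacked 2D measurement matrix has rank at most 3(K+1). The function name `synth_measurements` and all sizes are assumptions chosen for illustration, and translations are omitted (as if the measurements were already centred).

```python
import numpy as np

def synth_measurements(F=40, P=60, K=2, seed=0):
    """Build a 2F x P measurement matrix from a linear basis-shape model.

    Hypothetical helper: F frames, P points, K deformation basis shapes.
    """
    rng = np.random.default_rng(seed)
    mean_shape = rng.standard_normal((3, P))   # rigid mean shape
    basis = rng.standard_normal((K, 3, P))     # K basis shapes (deviations)
    coeffs = rng.standard_normal((F, K))       # per-frame mixing weights
    cameras = rng.standard_normal((F, 2, 3))   # uncalibrated affine cameras

    # Shape in frame f: mean shape plus a weighted sum of basis shapes.
    shapes = mean_shape + np.einsum("fk,kdp->fdp", coeffs, basis)
    # Affine projection of every frame, stacked into a 2F x P matrix.
    return np.einsum("fid,fdp->fip", cameras, shapes).reshape(2 * F, P)

K = 2
W = synth_measurements(F=40, P=60, K=K)
# Explicit tolerance so numerically zero singular values are not counted.
rank = np.linalg.matrix_rank(W, tol=1e-8)
print(W.shape, rank, "<=", 3 * (K + 1))
```

The low rank is what makes factorisation possible at all; recovering the particular basis (here, via statistical independence) is the part the abstract addresses and this sketch leaves out.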
Deformable and articulated 3D reconstruction from monocular video sequences
This thesis addresses the problem of deformable and articulated structure from motion from
monocular uncalibrated video sequences. Structure from motion is defined as the problem of
recovering information about the 3D structure of scenes imaged by a camera in a video sequence.
Our study aims at the challenging problem of non-rigid shapes (e.g. a beating heart or a smiling
face). Non-rigid structures appear constantly in our everyday life: think of a biceps curling, a
torso twisting or a smiling face. Our research seeks a general method to perform 3D shape
recovery purely from data, without having to rely on a pre-computed model or training data.
Open problems in the field are the difficulty of the non-linear estimation, the lack of a real-time
system, large amounts of missing data in real-world video sequences, measurement noise and
strong deformations. Solving these problems would take us far beyond the current state of the
art in non-rigid structure from motion. This dissertation presents our contributions in the field
of non-rigid structure from motion, detailing a novel algorithm that enforces the exact metric
structure of the problem at each step of the minimisation by projecting the motion matrices
onto the correct deformable or articulated metric motion manifolds respectively. An important
advantage of this new algorithm is its ability to handle missing data which becomes crucial
when dealing with real video sequences. We present a generic bilinear estimation framework,
which improves convergence and makes use of the manifold constraints. Finally, we demonstrate
a sequential, frame-by-frame estimation algorithm, which provides a 3D model and camera
parameters for each video frame, while simultaneously building a model of object deformation.
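
The idea of projecting motion matrices onto a metric manifold can be sketched for the simplest case only. The example below assumes a rigid orthographic camera, whose 2x3 motion matrix should have orthonormal rows, and computes the closest such matrix in the Frobenius norm via the SVD. The deformable and articulated manifolds treated in the thesis are more involved and are not reproduced here; the function name is hypothetical.

```python
import numpy as np

def project_to_orthonormal_rows(M):
    """Return the matrix with orthonormal rows closest to M (Frobenius norm).

    Assumption: M is a 2x3 motion matrix estimate for one frame under a
    rigid orthographic camera model.
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    # Replacing the singular values by ones gives the nearest point
    # on the set of matrices with orthonormal rows.
    return U @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((2, 3))      # unconstrained estimate
R = project_to_orthonormal_rows(M)   # metric (orthonormal-row) version
print(np.round(R @ R.T, 6))
```

In an alternating minimisation, a projection step of this kind would be applied after each update of the motion estimates so that every iterate satisfies the metric constraints.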
3D Non-Rigid Reconstruction with Prior Shape Constraints
3D non-rigid shape recovery from a single uncalibrated camera is a challenging, under-constrained problem in computer vision. Although tremendous progress has been achieved towards solving the problem, two main limitations still exist in most previous solutions. First, current methods focus on non-incremental solutions, that is, the algorithms require collection of all the measurement data before the reconstruction takes place. This methodology is inherently unsuitable for applications requiring real-time solutions. Second, most of the existing approaches assume that 3D shapes can be accurately modelled in a linear subspace. These methods are simple and have been proven effective for reconstructions of objects with relatively small deformations, but have considerable limitations when the deformations are large or complex. Non-linear deformations are often observed in highly flexible objects, for which the use of the linear model is impractical.
Note that specific types of shape variation might be governed by only a small number of parameters and therefore can be well-represented in a low dimensional manifold. The methods proposed in this thesis aim to estimate the non-rigid shapes and the corresponding camera trajectories, based on both the observations and the prior learned manifold.
Firstly, an incremental approach is proposed for estimating the deformable objects. An important advantage of this method is the ability to reconstruct the 3D shape from a newly observed image and update the parameters in 3D shape space. However, this recursive method assumes the deformable shapes only have small variations from a mean shape, and is thus still not feasible for objects subject to large-scale deformations. To address this problem, a series of approaches are proposed, all based on non-linear manifold learning techniques. Such a manifold is used as a shape prior, with the reconstructed shapes constrained to lie within the manifold. These non-linear manifold-based approaches significantly improve the quality of reconstructed results and are well-adapted to different types of shapes undergoing significant and complex deformations.
Throughout the thesis, methods are validated quantitatively on 2D point sequences projected from 3D motion capture data for a ground truth comparison, and are qualitatively demonstrated on real examples of 2D video sequences. Comparisons are made for the proposed methods against several state-of-the-art techniques, with results shown for a variety of challenging deformable objects. Extensive experiments also demonstrate the robustness of the proposed algorithms with respect to measurement noise and missing data.
Modelling human pose and shape based on a database of human 3D scans
Generating realistic human shapes and motion is an important task both in the motion picture industry and in computer games. In feature films, high quality and believability are the most important characteristics. Additionally, when creating virtual doubles the generated characters have to match given real persons as closely as possible. In contrast, in computer games the level of realism does not need to be as high but real-time performance is essential. It is desirable to meet all these requirements with a general model of human pose and shape. In addition, many markerless human tracking methods applied, e.g., in biomedicine or sports science can benefit greatly from the availability of such a model because most methods require a 3D model of the tracked subject as input, which can be generated on-the-fly given a suitable shape and pose model.
In this thesis, a comprehensive procedure is presented to generate different general models of human pose. A database of 3D scans spanning the space of human pose and shape variations is introduced. Then, four different approaches for transforming the database into a general model of human pose and shape are presented, which improve the current state of the art. Experiments are performed to evaluate and compare the proposed models on real-world problems, i.e., characters are generated given semantic constraints and the underlying shape and pose of humans given 3D scans, multi-view video, or uncalibrated monocular images is estimated.
Generating realistic human models is an important application in the film industry and in computer games. In games, real-time synthesis is indispensable, but the level of detail need not be as high as in films. For virtual doubles, as used for example in films, the generated character must resemble the given real person as closely as possible. A general model of human pose and body shape makes it possible to meet all these requirements. In addition, many markerless motion capture methods, as used for example in biomedicine or sports science, can benefit from a general model of pose and body shape, since they require a 3D model of the captured person, which can now be generated at run time. This doctoral thesis presents a comprehensive approach for computing different models of pose and body shape. First, a database of 3D scans is built that covers the pose and body shape variations of humans. Then, four different methods are introduced that compute general models of pose and body shape from this database and remedy problems in the state of the art. The presented models are tested on realistic problems. Human models are generated from a small number of constraints, and the pose and body shape of subjects are estimated from 3D scans, multi-camera video data, and single images of the clothed persons.
Multilinear methods for disentangling variations with applications to facial analysis
Several factors contribute to the appearance of an object in a visual scene, including pose,
illumination, and deformation, among others. Each factor accounts for a source of variability
in the data. It is assumed that the multiplicative interactions of these factors emulate the
entangled variability, giving rise to the rich structure of visual object appearance. Disentangling
such unobserved factors from visual data is a challenging task, especially when the data have
been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label
information is not available. The work presented in this thesis focuses on disentangling the
variations contained in visual data, in particular applied to 2D and 3D faces. The motivation
behind this work lies in recent developments in the field, such as (i) the creation of large, visual
databases for face analysis, (ii) the need to extract information without the use of labels
and (iii) the need to deploy systems under demanding, real-world conditions.
In the first part of this thesis, we present a method to synthesise plausible 3D expressions
that preserve the identity of a target subject. This method is supervised as the model uses
labels, in this case 3D facial meshes of people performing a defined set of facial expressions, to
learn. The ability to synthesise an entire facial rig from a single neutral expression has a large
range of applications both in computer graphics and computer vision, ranging from the efficient
and cost-effective creation of CG characters to scalable data generation for machine learning
purposes. Unlike previous methods based on multilinear models, the proposed approach is
capable of extrapolating well outside the sample pool, which allows it to accurately reproduce
the identity of the target subject and create artefact-free expression shapes while requiring
only a small input dataset. We introduce global-local multilinear models that leverage the
strengths of expression-specific and identity-specific local models combined with coarse motion
estimations from a global model. The expression-specific and identity-specific local models
are built from different slices of the patch-wise local multilinear model. Experimental results
show that we achieve high-quality, identity-preserving facial expression synthesis results that
outperform existing methods both quantitatively and qualitatively.
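
For readers unfamiliar with multilinear face models, the following is a minimal sketch of the standard global (Tucker/HOSVD) formulation that such work builds on. It is not the global-local model proposed in the thesis. The data tensor is random and stands in for registered face meshes arranged as vertices x identities x expressions; the sizes and all function names are assumptions made for illustration.

```python
import numpy as np

def unfold(T, mode):
    """Matricise tensor T along `mode` (rows index that mode)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def hosvd(T):
    """Higher-order SVD: one orthonormal factor per mode plus a core tensor."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
               for m in range(T.ndim)]
    core = T
    for m, U in enumerate(factors):
        core = mode_product(core, U.T, m)
    return core, factors

def synthesize(core, U_vert, w_id, w_exp):
    """Generate one face from identity and expression weight vectors."""
    basis = mode_product(core, U_vert, 0)          # back to vertex space
    return np.einsum("vie,i,e->v", basis, w_id, w_exp)

rng = np.random.default_rng(0)
T = rng.standard_normal((30, 5, 4))   # stand-in: 30 coords, 5 identities, 4 expressions
core, (U_vert, U_id, U_exp) = hosvd(T)

# Using the weights of training identity 2 and expression 1 reproduces that sample.
face = synthesize(core, U_vert, U_id[2], U_exp[1])
print(np.allclose(face, T[:, 2, 1]))
```

New identities or expressions are obtained by supplying weight vectors that are not rows of the factor matrices; how well this extrapolates beyond the training samples is the limitation of plain multilinear models that the abstract refers to.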
In the second part of this thesis, we investigate how the modes of variations from visual data
can be extracted. Our assumption is that visual data has an underlying structure consisting of
factors of variation and their interactions. Finding this structure and the factors is important
as it would not only help us to better understand visual data but also, once obtained, allow us to edit the factors for use in various applications. Shape from Shading and expression transfer are just two
of the potential applications. To extract the factors of variation, several supervised methods
have been proposed but they require both labels regarding the modes of variations and the same
number of samples under all modes of variations. Therefore, their applicability is limited to
well-organised data, usually captured in well-controlled conditions. We propose a novel general
multilinear matrix decomposition method that discovers the multilinear structure of possibly
incomplete sets of visual data in an unsupervised setting. We demonstrate the applicability of the
proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the
wild and with occlusion removal), expression transfer, and estimation of surface normals from
images captured in the wild.
Finally, leveraging the unsupervised multilinear method proposed as well as recent advances in
deep learning, we propose a weakly supervised deep learning method for disentangling multiple
latent factors of variation in face images captured in-the-wild. To this end, we propose a deep
latent variable model, where we model the multiplicative interactions of multiple latent factors
of variation explicitly as a multilinear structure. We demonstrate that the proposed approach
indeed learns disentangled representations of facial expressions and pose, which can be used in
various applications, including face editing, as well as 3D face reconstruction and classification
of facial expression, identity and pose.
A Benchmark and Evaluation of Non-Rigid Structure from Motion
Non-rigid structure from motion (NRSfM) is a long-standing and central
problem in computer vision, allowing us to obtain 3D information from multiple
images when the scene is dynamic. A main issue regarding the further
development of this important computer vision topic is the lack of high
quality data sets. We address this issue by presenting a data set
compiled for this purpose, which is made publicly available and is considerably
larger than the previous state of the art. To validate the applicability of this
data set, and to provide an investigation into the state of the art of NRSfM,
including potential directions forward, we present a benchmark and a
scrupulous evaluation using this data set. This benchmark evaluates 16
different methods with available code, which we argue reasonably spans the
state of the art in NRSfM. We also hope that the presented public data set
and evaluation will provide benchmark tools for further development in this
field.