7 research outputs found

    Deformable and articulated 3D reconstruction from monocular video sequences

    Get PDF
    PhDThis thesis addresses the problem of deformable and articulated structure from motion from monocular uncalibrated video sequences. Structure from motion is defined as the problem of recovering information about the 3D structure of scenes imaged by a camera in a video sequence. Our study aims at the challenging problem of non-rigid shapes (e.g. a beating heart or a smiling face). Non-rigid structures appear constantly in our everyday life, think of a bicep curling, a torso twisting or a smiling face. Our research seeks a general method to perform 3D shape recovery purely from data, without having to rely on a pre-computed model or training data. Open problems in the field are the difficulty of the non-linear estimation, the lack of a real-time system, large amounts of missing data in real-world video sequences, measurement noise and strong deformations. Solving these problems would take us far beyond the current state of the art in non-rigid structure from motion. This dissertation presents our contributions in the field of non-rigid structure from motion, detailing a novel algorithm that enforces the exact metric structure of the problem at each step of the minimisation by projecting the motion matrices onto the correct deformable or articulated metric motion manifolds respectively. An important advantage of this new algorithm is its ability to handle missing data which becomes crucial when dealing with real video sequences. We present a generic bilinear estimation framework, which improves convergence and makes use of the manifold constraints. Finally, we demonstrate a sequential, frame-by-frame estimation algorithm, which provides a 3D model and camera parameters for each video frame, while simultaneously building a model of object deformation

    Learning and recovering 3D surface deformations

    Get PDF
    Recovering the 3D deformations of a non-rigid surface from a single viewpoint has applications in many domains such as sports, entertainment, and medical imaging. Unfortunately, without any knowledge of the possible deformations that the object of interest can undergo, it is severely under-constrained, and extremely different shapes can have very similar appearances when reprojected onto an image plane. In this thesis, we first exhibit the ambiguities of the reconstruction problem when relying on correspondences between a reference image for which we know the shape and an input image. We then propose several approaches to overcoming these ambiguities. The core idea is that some a priori knowledge about how a surface can deform must be introduced to solve them. We therefore present different ways to formulate that knowledge that range from very generic constraints to models specifically designed for a particular object or material. First, we propose generally applicable constraints formulated as motion models. Such models simply link the deformations of the surface from one image to the next in a video sequence. The obvious advantage is that they can be used independently of the physical properties of the object of interest. However, to be effective, they require the presence of texture over the whole surface, and, additionally, do not prevent error accumulation from frame to frame. To overcome these weaknesses, we propose to introduce statistical learning techniques that let us build a model from a large set of training examples, that is, in our case, known 3D deformations. The resulting model then essentially performs linear or non-linear interpolation between the training examples. Following this approach, we first propose a linear global representation that models the behavior of the whole surface. As is the case with all statistical learning techniques, the applicability of this representation is limited by the fact that acquiring training data is far from trivial. A large surface can undergo many subtle deformations, and thus a large amount of training data must be available to build an accurate model. We therefore propose an automatic way of generating such training examples in the case of inextensible surfaces. Furthermore, we show that the resulting linear global models can be incorporated into a closed-form solution to the shape recovery problem. This lets us not only track deformations from frame to frame, but also reconstruct surfaces from individual images. The major drawback of global representations is that they can only model the behavior of a specific surface, which forces us to re-train a new model for every new shape, even though it is made of a material observed before. To overcome this issue, and simultaneously reduce the amount of required training data, we propose local deformation models. Such models describe the behavior of small portions of a surface, and can be combined to form arbitrary global shapes. For this purpose, we study both linear and non-linear statistical learning methods, and show that, whereas the latter are better suited for traking deformations from frame to frame, the former can also be used for reconstruction from a single image

    Single-View 3D Reconstruction of Animals

    Get PDF
    Humans have a remarkable ability to infer the 3D shape of objects from just a single image. Even for complex and non-rigid objects like people and animals, from just a single picture we can say much about its 3D shape, configuration and even the viewpoint that the photo was taken from. Today, the same cannot be said for computers – the existing solutions are limited, particularly for highly articulated and deformable objects. Hence, the purpose of this thesis is to develop methods for single-view 3D reconstruction of non-rigid objects, specifically for people and animals. Our goal is to recover a full 3D surface model of these objects from a single unconstrained image. The ability to do so, even with some user interaction, will have a profound impact in AR/VR and the entertainment industry. Immediate applications are virtual avatars and pets, virtual clothes fitting, immersive games, as well as applications in biology, neuroscience, ecology, and farming. However, this is a challenging problem because these objects can appear in many different forms. This thesis begins by providing the first fully automatic solution for recovering a 3D mesh of a human body from a single image. Our solution follows the classical paradigm of bottom-up estimation followed by top-down verification. The key is to solve for the mostly likely 3D model that explains the image observations by using powerful priors. The rest of the thesis explores how to extend a similar approach for other animals. Doing so reveals novel challenges whose common thread is the lack of specialized data. For solving the bottom-up estimation problem well, current methods rely on the availability of human supervision in the form of 2D part annotations. However, these annotations do not exist in the same scale for animals. We deal with this problem by means of data synthesis for the case of fine-grained categories such as bird species. There is also little work that systematically addresses the 3D scanning of animals, which almost all prior works require for learning a deformable 3D model. We propose a solution to learn a 3D deformable model from a set of annotated 2D images with a template 3D mesh and from a few set of 3D toy figurine scans. We show results on birds, house cats, horses, cows, dogs, big cats, and even hippos. This thesis makes steps towards a fully automatic system for single-view 3D reconstruction of animals. We hope this work inspires more future research in this direction

    Local Deformation Modelling for Non-Rigid Structure from Motion

    Get PDF
    PhDReconstructing the 3D geometry of scenes based on monocular image sequences is a long-standing problem in computer vision. Structure from motion (SfM) aims at a data-driven approach without requiring a priori models of the scene. When the scene is rigid, SfM is a well understood problem with solutions widely used in industry. However, if the scene is non-rigid, monocular reconstruction without additional information is an ill-posed problem and no satisfactory solution has yet been found. Current non-rigid SfM (NRSfM) methods typically aim at modelling deformable motion globally. Additionally, most of these methods focus on cases where deformable motion is seen as small variations from a mean shape. In turn, these methods fail at reconstructing highly deformable objects such as a flag waving in the wind. Additionally, reconstructions typically consist of low detail, sparse point-cloud representation of objects. In this thesis we aim at reconstructing highly deformable surfaces by modelling them locally. In line with a recent trend in NRSfM, we propose a piecewise approach which reconstructs local overlapping regions independently. These reconstructions are merged into a global object by imposing 3D consistency of the overlapping regions. We propose our own local model – the Quadratic Deformation model – and show how patch division and reconstruction can be formulated in a principled approach by alternating at minimizing a single geometric cost – the image re-projection error of the reconstruction. Moreover, we extend our approach to dense NRSfM, where reconstructions are preformed at the pixel level, improving the detail of state of the art reconstructions. Finally we show how our principled approach can be used to perform simultaneous segmentation and reconstruction of articulated motion, recovering meaningful segments which provide a coarse 3D skeleton of the object.Fundacao para a Ciencia e a Tecnologia (FCT) under Doctoral Grant SFRH/BD/70312/2010; European Research Council under ERC Starting Grant agreement 204871-HUMANI

    Non-Rigid Structure from Motion

    Get PDF
    This thesis revisits a challenging classical problem in geometric computer vision known as "Non-Rigid Structure-from-Motion" (NRSfM). It is a well-known problem where the task is to recover the 3D shape and motion of a non-rigidly moving object from image data. A reliable solution to this problem is valuable in several industrial applications such as virtual reality, medical surgery, animation movies etc. Nevertheless, to date, there does not exist any algorithm that can solve NRSfM for all kinds of conceivable motion. As a result, additional constraints and assumptions are often employed to solve NRSfM. The task is challenging due to the inherent unconstrained nature of the problem itself as many 3D varying configurations can have similar image projections. The problem becomes even more challenging if the camera is moving along with the object. The thesis takes on a modern view to this challenging problem and proposes a few algorithms that have set a new performance benchmark to solve NRSfM. The thesis not only discusses the classical work in NRSfM but also proposes some powerful elementary modification to it. The foundation of this thesis surpass the traditional single object NRSFM and for the first time provides an effective formulation to realise multi-body NRSfM. Most techniques for NRSfM under factorisation can only handle sparse feature correspondences. These sparse features are then used to construct a scene using the organisation of points, lines, planes or other elementary geometric primitive. Nevertheless, sparse representation of the scene provides an incomplete information about the scene. This thesis goes from sparse NRSfM to dense NRSfM for a single object, and then slowly lifts the intuition to realise dense 3D reconstruction of the entire dynamic scene as a global as rigid as possible deformation problem. The core of this work goes beyond the traditional approach to deal with deformation. It shows that relative scales for multiple deforming objects can be recovered under some mild assumption about the scene. The work proposes a new approach for dense detailed 3D reconstruction of a complex dynamic scene from two perspective frames. Since the method does not need any depth information nor it assumes a template prior, or per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios including YouTube Videos. Lastly, this thesis provides a new way to perceive the depth of a dynamic scene which essentially trivialises the notion of motion estimation as a compulsory step to solve this problem. Conventional geometric methods to address depth estimation requires a reliable estimate of motion parameters for each moving object, which is difficult to obtain and validate. In contrast, this thesis introduces a new motion-free approach to estimate the dense depth map of a complex dynamic scene for successive/multiple frames. The work show that given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we can recover the dense depth map for the successive frames without solving for motion parameters. By assigning the locally rigid structure to the piece-wise planar approximation of a dynamic scene which transforms as rigid as possible over frames, we can bypass the motion estimation step. Experiments results and MATLAB codes on relevant examples are provided to validate the motion-free idea

    SDP‐based approach to monocular reconstruction of inextensible surfaces

    No full text
    This study addresses the problem of monocular reconstruction of surfaces that deform isometrically, using points tracked in a single image. To tackle this problem, a flat three‐dimensional (3D) shape of the surface and its image are used as template. Such deformations are characterised by certain geometric constraints and to reconstruct the surfaces, these constraints have to be properly exploited. Therefore, the authors propose an algebraic formula that aims at the joint expression of the geometric constraints, namely those based on the differential properties and also those based on the upper‐bound model. This expression is, in fact, a unique formulation that results from integrating these two types of constraints, and leads to the intended reconstructions, even when the surface is not strictly isometric. The template shape is used to set the parameters of the expression, which is then optimised (along with the projection equations) by means of a semi‐definite programming (SDP) problem. This optimisation enables the estimation of 3D positions of the points on the surface. However, and for implementation purposes, this optimisation is applied separately to patches which, together, make up the whole surface. The experimental results show that the proposed approach improves the results from other methods in terms of accuracy
    corecore