8 research outputs found

    Force-based representation for non-rigid shape and elastic model estimation

    This paper addresses the problem of simultaneously recovering the 3D shape, pose, and elastic model of a deformable object from only 2D point tracks in a monocular video. This is a severely under-constrained problem that has typically been addressed by enforcing the shape or the point trajectories to lie on low-rank spaces. We show that formulating the problem in terms of a low-rank force space that induces the deformation, and introducing the elastic model as an additional unknown, allows for a better physical interpretation of the resulting priors and a more accurate representation of the actual object's behavior. To simultaneously estimate the force, pose, and elastic model of the object, we use an expectation-maximization strategy in which each of these parameters is successively learned by partial M-steps. Once the elastic model is learned, it can be transferred to similar objects to encode their 3D deformation. Moreover, our approach robustly deals with missing data and encodes both rigid and non-rigid points under the same formalism. We thoroughly validate the approach on Mocap and real sequences, showing more accurate 3D reconstructions than the state of the art, and additionally provide an estimate of the full elastic model with no a priori information.
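    As a toy illustration of the alternation described above, the sketch below fits a low-rank force model under an orthographic camera by cycling through partial updates of the force coefficients, the stiffness, and the camera. It is a minimal sketch, not the paper's algorithm: the linear-elastic model, the basis D, and all names are assumptions.

```python
# Minimal sketch, assuming a toy linear-elastic model: each frame's shape is
#   S_t = S0 + (1/k) * sum_j C[t, j] * D[j]      (low-rank force basis D)
# observed as W_t = R @ S_t under an orthographic 2x3 camera R.
import numpy as np

def fit_force_nrsfm(W, S0, D, n_iters=50, k=1.0):
    """W: (F,2,P) 2D tracks; S0: (3,P) rest shape; D: (r,3,P) force basis."""
    F, _, P = W.shape
    r = D.shape[0]
    R = np.eye(3)[:2]                          # initial orthographic camera
    C = np.zeros((F, r))                       # per-frame force coefficients
    for _ in range(n_iters):
        # (1) Force coefficients per frame, camera and stiffness held fixed.
        PD = np.stack([(R @ D[j]).ravel() for j in range(r)], axis=1)  # (2P,r)
        for t in range(F):
            rhs = (W[t] - R @ S0).ravel()
            C[t] = k * np.linalg.lstsq(PD, rhs, rcond=None)[0]
        # (2) Partial update of the stiffness k (scalar closed form, res ~ M/k).
        M = (C @ PD.T).reshape(F, 2, P)
        res = W - (R @ S0)[None]
        k = max(1e-6, np.sum(M * M) / np.sum(res * M))
        # (3) Partial update of the camera: least squares, then project the
        #     2x3 estimate back to orthonormal rows via SVD.
        S = S0[None] + np.einsum('tr,rcp->tcp', C, D) / k              # (F,3,P)
        Scat = S.transpose(1, 0, 2).reshape(3, -1)
        Wcat = W.transpose(1, 0, 2).reshape(2, -1)
        M2 = np.linalg.lstsq(Scat.T, Wcat.T, rcond=None)[0].T
        U, _, Vt = np.linalg.svd(M2, full_matrices=False)
        R = U @ Vt
    return S, R, k, C
```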

    A Constrained Latent Variable Model

    Latent variable models provide valuable compact representations for learning and inference in many computer vision tasks. However, most existing models cannot directly encode prior knowledge about the specific problem at hand. In this paper, we introduce a constrained latent variable model whose generated output inherently accounts for such knowledge. To this end, we propose an approach that explicitly imposes equality and inequality constraints on the model's output during learning, thus avoiding the computational burden of having to account for these constraints at inference. Our learning mechanism can exploit non-linear kernels, while only involving sequential closed-form updates of the model parameters. We demonstrate the effectiveness of our constrained latent variable model on the problem of non-rigid 3D reconstruction from monocular images, and show that it yields qualitative and quantitative improvements over several baselines.
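    The key idea, imposing the constraints during learning rather than at inference, can be illustrated with a toy penalty-based variant (not the paper's kernel formulation with closed-form updates): a linear decoder from latent codes to mesh vertices is trained so that the learned map itself respects inequality constraints on edge lengths. All data, names, and the penalty form below are assumptions.

```python
# Toy penalty-trained decoder: outputs learn to respect inequality
# constraints (edge lengths <= rest lengths), so inference needs no
# constraint handling. Data, shapes, and the penalty are assumptions.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 5))              # latent codes (toy data)
Y = rng.normal(size=(200, 12))             # targets: 4 mesh vertices in 3D
edges = [(0, 1), (1, 2), (2, 3)]           # constrained mesh edges
rest = np.ones(len(edges))                 # assumed rest lengths
W = np.zeros((5, 12))                      # linear decoder z -> vertices

def edge_penalty_grad(out):
    """Gradient of sum_e max(0, ||vi - vj|| - rest_e)^2 w.r.t. the output."""
    V = out.reshape(-1, 4, 3)
    g = np.zeros_like(V)
    for e, (i, j) in enumerate(edges):
        d = V[:, i] - V[:, j]
        L = np.linalg.norm(d, axis=1, keepdims=True)
        gd = 2.0 * np.maximum(0.0, L - rest[e]) * d / np.maximum(L, 1e-9)
        g[:, i] += gd
        g[:, j] -= gd
    return g.reshape(out.shape)

lam, lr = 10.0, 1e-3
for _ in range(2000):
    out = Z @ W
    grad = Z.T @ (2.0 * (out - Y) + lam * edge_penalty_grad(out)) / len(Z)
    W -= lr * grad                         # W now decodes toward feasible outputs
```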

    Monocular 3D Reconstruction of Locally Textured Surfaces

    Most recent approaches to monocular non-rigid 3D shape recovery rely on exploiting point correspondences and work best when the whole surface is well-textured. The alternative is to rely either on contours or shading information, which has only been demonstrated in very restrictive settings. Here, we propose a novel approach to monocular deformable shape recovery that can operate under complex lighting and handle partially textured surfaces. At the heart of our algorithm are a learned mapping from intensity patterns to the shape of local surface patches and a principled approach to piecing together the resulting local shape estimates. We validate our approach quantitatively and qualitatively using both synthetic and real data.
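    A toy sketch of the two ingredients described above: a learned mapping from local intensity patterns to local shape (here a simple ridge regression to per-patch depths, standing in for the paper's learned mapping) and the piecing together of local estimates by blending overlapping patches. The data, patch size, and regressor are assumptions.

```python
# Toy patch regressor + stitching. Ridge regression stands in for the
# learned intensity-to-shape mapping; training data here is random.
import numpy as np

rng = np.random.default_rng(1)
ps = 8                                               # patch size (assumed)
X = rng.normal(size=(500, ps * ps))                  # intensity patches
Yd = rng.normal(size=(500, ps * ps))                 # matching depth patches
A = np.linalg.solve(X.T @ X + 1e-2 * np.eye(ps * ps), X.T @ Yd)  # ridge map

def reconstruct(img, stride=4):
    """Predict local shape per patch, then blend overlapping predictions."""
    H, W = img.shape
    depth = np.zeros_like(img, dtype=float)
    count = np.zeros_like(img, dtype=float)
    for r in range(0, H - ps + 1, stride):
        for c in range(0, W - ps + 1, stride):
            pred = (img[r:r + ps, c:c + ps].ravel() @ A).reshape(ps, ps)
            depth[r:r + ps, c:c + ps] += pred        # accumulate local estimate
            count[r:r + ps, c:c + ps] += 1.0
    return depth / np.maximum(count, 1.0)            # average over overlaps

depth_map = reconstruct(rng.normal(size=(64, 64)))
```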

    Sensing Highly Non-Rigid Objects with RGBD Sensors for Robotic Systems

    The goal of this research is to enable a robotic system to manipulate clothing and other highly non-rigid objects using an RGBD sensor. The focus of this thesis is to define and test various algorithms/models that are used to solve parts of the laundry process (i.e., handling, classifying, sorting, unfolding, and folding).

    First, a system is presented for automatically extracting and classifying items in a pile of laundry. Using only visual sensors, the robot identifies and extracts items sequentially from the pile. When an item is removed and isolated, a model is captured of the shape and appearance of the object, which is then compared against a dataset of known items. The contributions of this part of the laundry process are a novel method for extracting articles of clothing from a pile of laundry, a novel method of classifying clothing using interactive perception, and a multi-layer approach termed L-M-H (more specifically, L-C-S-H) for clothing classification. This thesis describes two different approaches to classifying clothing into categories. The first approach relies upon silhouettes, edges, and other low-level image measurements of the articles of clothing. Experiments with the first approach demonstrate the ability of the system to efficiently classify and label items into one of six categories (pants, shorts, short-sleeve shirt, long-sleeve shirt, socks, or underwear). These results show that, on average, classification rates using robot interaction are 59% higher than those that do not use interaction. The second approach relies upon color, texture, shape, and edge information from 2D and 3D data within a local and global perspective. The multi-layer approach compartmentalizes the problem into a high (H) layer, multiple mid-level layers (characteristics (C) and selection masks (S)), and a low (L) layer. This approach produces 'local' solutions to solve the global classification problem. Experiments demonstrate the ability of the system to efficiently classify each article of clothing into one of seven categories (pants, shorts, shirts, socks, dresses, cloths, or jackets). The results show that, on average, the classification rates improve by +27.47% for three categories, +17.90% for four categories, and +10.35% for seven categories over the baseline system, using support vector machines.

    Second, an algorithm is presented for automatically unfolding a piece of clothing. A piece of cloth is pulled in different directions at various points in order to flatten it. Features of the cloth are extracted and used to determine a valid location and orientation at which to interact with it; these features include the peak region, corner locations, and continuity/discontinuity of the cloth. A two-stage algorithm is presented, introducing a novel solution to the unfolding/flattening problem using interactive perception. Simulations using 3D simulation software and experiments with robot hardware demonstrate the ability of the algorithm to flatten pieces of laundry from different starting configurations. These results show that, at best, the algorithm flattens a piece of cloth from 11.1% to 95.6% of the canonical configuration.

    Third, an energy-minimization algorithm is presented that estimates the configuration of a deformable object. This approach uses an RGBD image to compute feature correspondences (using SURF features), depth values, and boundary locations. Input from a Kinect sensor is used to segment the deformable surface from the background using an alpha-beta swap algorithm. Using this segmentation, the system creates an initial mesh model without prior information about the surface geometry, and it reinitializes the configuration of the mesh model after a loss of input data. The approach handles in-plane rotation, out-of-plane rotation, and varying changes in translation and scale. Results are reported for the proposed algorithm on a dataset consisting of seven shirts, two pairs of shorts, two posters, and a pair of pants. The approach is evaluated against a simulated shirt model by computing the mean squared error of the distances from vertices on the mesh model to the ground truth provided by the simulation model. A correspondence sketch is shown below.
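    For the feature-correspondence step, a minimal OpenCV sketch follows. The thesis uses SURF (available in opencv-contrib via cv2.xfeatures2d.SURF_create); ORB is used here only as a freely available stand-in, and the image paths are placeholders.

```python
# Feature correspondence between consecutive frames (ORB as a stand-in
# for SURF; paths are placeholders).
import cv2

img1 = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_next.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with a ratio test to drop ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Matched 2D points, e.g. to anchor mesh vertices across frames.
pts1 = [kp1[m.queryIdx].pt for m in good]
pts2 = [kp2[m.trainIdx].pt for m in good]
```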

    From light rays to 3D models


    Resolving Ambiguities in Monocular 3D Reconstruction of Deformable Surfaces

    In this thesis, we focus on the problem of recovering the 3D shape of deformable surfaces from a single camera. This problem is known to be ill-posed: for a given 2D input image there exist many 3D shapes that give visually identical projections. We present three methods that make headway towards resolving these ambiguities, and we believe our work represents a significant step towards making surface reconstruction methods of practical use.

    First, we propose a surface reconstruction method that overcomes the limitations of state-of-the-art template-based and non-rigid structure-from-motion methods: we neither track points over many frames, require a sophisticated deformation model, nor depend on a reference image. In our method, we establish correspondences between pairs of frames in which the shape is different and unknown. We then estimate homographies between corresponding local planar patches in both images. These yield approximate 3D reconstructions of the points within each patch up to a scale factor. Since we consider overlapping patches, we can enforce consistency over the whole surface. Finally, a local deformation model is used to fit a triangulated mesh to the 3D point cloud, which makes the reconstruction robust to both noise and outliers in the image data.

    Second, we propose a novel approach to recovering the 3D shape of a deformable surface from monocular input by taking advantage of shading information in more generic contexts than conventional Shape-from-Shading (SfS) methods. This includes surfaces that may be fully or partially textured and lit by arbitrarily many light sources. To this end, given a lighting model, we learn the relationship between a shading pattern and the corresponding local surface shape. At run time, we first use this knowledge to recover the shape of surface patches and then enforce spatial consistency between the patches to produce a global 3D shape. Instead of treating texture as noise, as many SfS approaches do, we exploit it as an additional source of information. We validate our approach quantitatively and qualitatively using both synthetic and real data.

    Third, we introduce a constrained latent variable model that inherently accounts for geometric constraints, such as inextensibility, defined on the mesh model. To this end, we learn a non-linear mapping from the latent space to the output space, which corresponds to the vertex positions of a mesh model, such that the generated outputs comply with equality and inequality constraints expressed in terms of the problem variables. Since its output is encouraged to satisfy such constraints inherently, using our model removes the need for computationally expensive methods that enforce these constraints at run time. In addition, our approach is completely generic and could be used in many other contexts as well, such as image classification, to impose separation of the classes, and articulated tracking, to constrain the space of possible poses.
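    The patch-wise step of the first method can be illustrated with OpenCV: estimate a homography between corresponding local planar patches in two frames, then decompose it to recover the patch's relative pose and plane normal up to scale. The intrinsics and point coordinates below are placeholders, and the solution selection and scale propagation across overlapping patches are omitted.

```python
# Patch-wise homography estimation and decomposition (placeholder data).
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],            # assumed camera intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
pts1 = np.float32([[100, 100], [200, 110], [210, 220], [90, 230]])   # patch, frame 1
pts2 = np.float32([[120, 105], [215, 120], [220, 235], [105, 240]])  # same patch, frame 2

H, _ = cv2.findHomography(pts1, pts2)
n_sol, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
# Each candidate (R, t/d, n) gives the patch plane up to scale; enforcing
# agreement on overlapping patches selects solutions and propagates scale.
```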

    Non-rigid structure from motion using quadratic deformation models

    In this paper we present a new approach to modelling non-rigid 3D surfaces from the observation of 2D motion in images captured by an orthographic camera. Our aim is to characterize strong variations of the shape due, for instance, to bending motions. Such motions are hard to describe with previously used deformation models, such as the linear basis shapes model, which would tend to overestimate the dimensionality of the deformable data. Our approach uses a quadratic deformation model which is able to represent non-linear non-rigid motions such as bending, stretching, shearing, and twisting. The model is bilinear and thus fits easily into previous schemes for Non-Rigid Structure from Motion (NRSfM). We formulate the NRSfM problem using a non-linear optimization scheme that minimizes image reprojection error and recovers the camera parameters, the 3D shape at rest, and the quadratic deformation transformations. Our experiments with synthetic and real data show examples in which methods based on the linear basis shape model perform poorly or do not converge, while the quadratic model achieves accurate 3D reconstructions.
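    A minimal sketch of a quadratic deformation model of the kind described above: each deformed shape is a linear map of the rest shape's linear, squared, and cross terms, which can represent bending, stretching, shearing, and twisting. The coefficient layout and values below are illustrative, not the paper's exact parameterization.

```python
# Quadratic deformation sketch: deformed shape = L @ S0 + Q @ (squared
# terms) + C @ (cross terms). Coefficients here are illustrative.
import numpy as np

def quadratic_deform(S0, L, Q, C):
    """S0: (3,P) rest shape; L, Q, C: (3,3) per-frame coefficient matrices."""
    x, y, z = S0
    sq = np.stack([x * x, y * y, z * z])        # squared terms
    cross = np.stack([x * y, y * z, z * x])     # cross terms
    return L @ S0 + Q @ sq + C @ cross

# Example: bend a straight rod by displacing x with a z^2 term.
P = 20
S0 = np.stack([np.zeros(P), np.zeros(P), np.linspace(-1.0, 1.0, P)])
L = np.eye(3)
Q = np.zeros((3, 3)); Q[0, 2] = 0.5             # x += 0.5 * z**2
C = np.zeros((3, 3))
bent = quadratic_deform(S0, L, Q, C)
```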

    Local Deformation Modelling for Non-Rigid Structure from Motion

    Reconstructing the 3D geometry of scenes from monocular image sequences is a long-standing problem in computer vision. Structure from motion (SfM) aims at a data-driven approach that does not require a priori models of the scene. When the scene is rigid, SfM is a well-understood problem with solutions widely used in industry. However, if the scene is non-rigid, monocular reconstruction without additional information is an ill-posed problem and no satisfactory solution has yet been found. Current non-rigid SfM (NRSfM) methods typically aim at modelling deformable motion globally, and most focus on cases where deformable motion is seen as small variations from a mean shape. In turn, these methods fail at reconstructing highly deformable objects such as a flag waving in the wind, and their reconstructions typically consist of low-detail, sparse point-cloud representations of objects.

    In this thesis we aim at reconstructing highly deformable surfaces by modelling them locally. In line with a recent trend in NRSfM, we propose a piecewise approach which reconstructs local overlapping regions independently; these reconstructions are merged into a global object by imposing 3D consistency on the overlapping regions (see the sketch below). We propose our own local model, the Quadratic Deformation model, and show how patch division and reconstruction can be formulated in a principled approach by alternating minimization of a single geometric cost: the image reprojection error of the reconstruction. Moreover, we extend our approach to dense NRSfM, where reconstructions are performed at the pixel level, improving the detail of state-of-the-art reconstructions. Finally, we show how our principled approach can be used to perform simultaneous segmentation and reconstruction of articulated motion, recovering meaningful segments which provide a coarse 3D skeleton of the object.

    Funded by the Fundação para a Ciência e a Tecnologia (FCT) under Doctoral Grant SFRH/BD/70312/2010 and by the European Research Council under ERC Starting Grant agreement 204871-HUMANI
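    The merging of overlapping local reconstructions can be illustrated with a simple sequential scheme; this is an assumption for illustration (the thesis merges patches by minimizing a single reprojection cost), in which each new patch is aligned to the current global estimate by a least-squares scale and offset computed on the points it shares with already-merged patches.

```python
# Sequential merging of overlapping patch reconstructions (assumes each
# patch shares points with the already-merged region).
import numpy as np

def merge_patches(patches, n_points):
    """patches: list of (ids, X); ids: (m,) global point indices, X: (m,3)."""
    global_X = np.full((n_points, 3), np.nan)
    ids0, X0 = patches[0]
    global_X[ids0] = X0                          # seed with the first patch
    for ids, X in patches[1:]:
        seen = ~np.isnan(global_X[ids, 0])       # overlap with merged region
        loc, glob = X[seen], global_X[ids[seen]]
        # Least-squares scale and offset on the overlap: glob ~ s * loc + t
        lc, gc = loc - loc.mean(0), glob - glob.mean(0)
        s = np.sum(gc * lc) / np.sum(lc * lc)
        t = glob.mean(0) - s * loc.mean(0)
        global_X[ids[~seen]] = s * X[~seen] + t  # place the new points
    return global_X
```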