1,223 research outputs found

    Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective

    Full text link
    This paper addresses the task of dense non-rigid structure-from-motion (NRSfM) using multiple images. State-of-the-art methods to this problem are often hurdled by scalability, expensive computations, and noisy measurements. Further, recent methods to NRSfM usually either assume a small number of sparse feature points or ignore local non-linearities of shape deformations, and thus cannot reliably model complex non-rigid deformations. To address these issues, in this paper, we propose a new approach for dense NRSfM by modeling the problem on a Grassmann manifold. Specifically, we assume the complex non-rigid deformations lie on a union of local linear subspaces both spatially and temporally. This naturally allows for a compact representation of the complex non-rigid deformation over frames. We provide experimental results on several synthetic and real benchmark datasets. The procured results clearly demonstrate that our method, apart from being scalable and more accurate than state-of-the-art methods, is also more robust to noise and generalizes to highly non-linear deformations.Comment: 10 pages, 7 figure, 4 tables. Accepted for publication in Conference on Computer Vision and Pattern Recognition (CVPR), 2018, typos fixed and acknowledgement adde

    Multi-body Non-rigid Structure-from-Motion

    Get PDF
    Conventional structure-from-motion (SFM) research is primarily concerned with the 3D reconstruction of a single, rigidly moving object seen by a static camera, or a static and rigid scene observed by a moving camera --in both cases there are only one relative rigid motion involved. Recent progress have extended SFM to the areas of {multi-body SFM} (where there are {multiple rigid} relative motions in the scene), as well as {non-rigid SFM} (where there is a single non-rigid, deformable object or scene). Along this line of thinking, there is apparently a missing gap of "multi-body non-rigid SFM", in which the task would be to jointly reconstruct and segment multiple 3D structures of the multiple, non-rigid objects or deformable scenes from images. Such a multi-body non-rigid scenario is common in reality (e.g. two persons shaking hands, multi-person social event), and how to solve it represents a natural {next-step} in SFM research. By leveraging recent results of subspace clustering, this paper proposes, for the first time, an effective framework for multi-body NRSFM, which simultaneously reconstructs and segments each 3D trajectory into their respective low-dimensional subspace. Under our formulation, 3D trajectories for each non-rigid structure can be well approximated with a sparse affine combination of other 3D trajectories from the same structure (self-expressiveness). We solve the resultant optimization with the alternating direction method of multipliers (ADMM). We demonstrate the efficacy of the proposed framework through extensive experiments on both synthetic and real data sequences. Our method clearly outperforms other alternative methods, such as first clustering the 2D feature tracks to groups and then doing non-rigid reconstruction in each group or first conducting 3D reconstruction by using single subspace assumption and then clustering the 3D trajectories into groups.Comment: 21 pages, 16 figure

    Non-Rigid Structure from Motion for Complex Motion

    Get PDF
    Recovering deformable 3D motion from temporal 2D point tracks in a monocular video is an open problem with many everyday applications throughout science and industry, or the new augmented reality. Recently, several techniques have been proposed to deal the problem called Non-Rigid Structure from Motion (NRSfM), however, they can exhibit poor reconstruction performance on complex motion. In this project, we will analyze these situations for primitive human actions such as walk, run, sit, jump, etc. on different scenarios, reviewing first the current techniques to finally present our novel method. This approach is able to model complex motion into a union of subspaces, rather than the summation occurring in standard low-rank shape methods, allowing better reconstruction accuracy. Experiments in a wide range of sequences and types of motion illustrate the benefits of this new approac

    A closed-form solution to estimate uncertainty in non-rigid structure from motion

    Full text link
    Semi-Definite Programming (SDP) with low-rank prior has been widely applied in Non-Rigid Structure from Motion (NRSfM). Based on a low-rank constraint, it avoids the inherent ambiguity of basis number selection in conventional base-shape or base-trajectory methods. Despite the efficiency in deformable shape reconstruction, it remains unclear how to assess the uncertainty of the recovered shape from the SDP process. In this paper, we present a statistical inference on the element-wise uncertainty quantification of the estimated deforming 3D shape points in the case of the exact low-rank SDP problem. A closed-form uncertainty quantification method is proposed and tested. Moreover, we extend the exact low-rank uncertainty quantification to the approximate low-rank scenario with a numerical optimal rank selection method, which enables solving practical application in SDP based NRSfM scenario. The proposed method provides an independent module to the SDP method and only requires the statistic information of the input 2D tracked points. Extensive experiments prove that the output 3D points have identical normal distribution to the 2D trackings, the proposed method and quantify the uncertainty accurately, and supports that it has desirable effects on routinely SDP low-rank based NRSfM solver.Comment: 9 pages, 2 figure

    DUST: dual union of spatio-temporal subspaces for monocular multiple object 3D reconstruction

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.We present an approach to reconstruct the 3D shape of multiple deforming objects from incomplete 2D trajectories acquired by a single camera. Additionally, we simultaneously provide spatial segmentation (i.e., we identify each of the objects in every frame) and temporal clustering (i.e., we split the sequence into primitive actions). This advances existing work, which only tackled the problem for one single object and non-occluded tracks. In order to handle several objects at a time from partial observations, we model point trajectories as a union of spatial and temporal subspaces, and optimize the parameters of both modalities, the non-observed point tracks and the 3D shape via augmented Lagrange multipliers. The algorithm is fully unsupervised and results in a formulation which does not need initialization. We thoroughly validate the method on challenging scenarios with several human subjects performing different activities which involve complex motions and close interaction. We show our approach achieves state-of-the-art 3D reconstruction results, while it also provides spatial and temporal segmentation.Peer ReviewedPostprint (author's final draft

    Simultaneous completion and spatiotemporal grouping of corrupted motion tracks

    Get PDF
    Given an unordered list of 2D or 3D point trajectories corrupted by noise and partial observations, in this paper we introduce a framework to simultaneously recover the incomplete motion tracks and group the points into spatially and temporally coherent clusters. This advances existing work, which only addresses partial problems and without considering a unified and unsupervised solution. We cast this problem as a matrix completion one, in which point tracks are arranged into a matrix with the missing entries set as zeros. In order to perform the double clustering, the measurement matrix is assumed to be drawn from a dual union of spatiotemporal subspaces. The bases and the dimensionality for these subspaces, the affinity matrices used to encode the temporal and spatial clusters to which each point belongs, and the non-visible tracks, are then jointly estimated via augmented Lagrange multipliers in polynomial time. A thorough evaluation on incomplete motion tracks for multiple-object typologies shows that the accuracy of the matrix we recover compares favorably to that obtained with existing low-rank matrix completion methods, specially under noisy measurements. In addition, besides recovering the incomplete tracks, the point trajectories are directly grouped into different object instances, and a number of semantically meaningful temporal primitive actions are automatically discoveredThis work has been partially supported by the Spanish State Research Agency through the MarĂ­a de Maeztu Seal of Excellence to IRI MDM-2016-0656, by the Spanish Ministry of Science and Innovation under project HuMoUR TIN2017-90086-R and the Salvador de Madariaga grant PRX19/00626, and by the ERA-net CHIST-ERA project IPALM PCI2019-103386.Peer ReviewedPostprint (published version

    Non-Rigid Structure from Motion

    Get PDF
    This thesis revisits a challenging classical problem in geometric computer vision known as "Non-Rigid Structure-from-Motion" (NRSfM). It is a well-known problem where the task is to recover the 3D shape and motion of a non-rigidly moving object from image data. A reliable solution to this problem is valuable in several industrial applications such as virtual reality, medical surgery, animation movies etc. Nevertheless, to date, there does not exist any algorithm that can solve NRSfM for all kinds of conceivable motion. As a result, additional constraints and assumptions are often employed to solve NRSfM. The task is challenging due to the inherent unconstrained nature of the problem itself as many 3D varying configurations can have similar image projections. The problem becomes even more challenging if the camera is moving along with the object. The thesis takes on a modern view to this challenging problem and proposes a few algorithms that have set a new performance benchmark to solve NRSfM. The thesis not only discusses the classical work in NRSfM but also proposes some powerful elementary modification to it. The foundation of this thesis surpass the traditional single object NRSFM and for the first time provides an effective formulation to realise multi-body NRSfM. Most techniques for NRSfM under factorisation can only handle sparse feature correspondences. These sparse features are then used to construct a scene using the organisation of points, lines, planes or other elementary geometric primitive. Nevertheless, sparse representation of the scene provides an incomplete information about the scene. This thesis goes from sparse NRSfM to dense NRSfM for a single object, and then slowly lifts the intuition to realise dense 3D reconstruction of the entire dynamic scene as a global as rigid as possible deformation problem. The core of this work goes beyond the traditional approach to deal with deformation. It shows that relative scales for multiple deforming objects can be recovered under some mild assumption about the scene. The work proposes a new approach for dense detailed 3D reconstruction of a complex dynamic scene from two perspective frames. Since the method does not need any depth information nor it assumes a template prior, or per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios including YouTube Videos. Lastly, this thesis provides a new way to perceive the depth of a dynamic scene which essentially trivialises the notion of motion estimation as a compulsory step to solve this problem. Conventional geometric methods to address depth estimation requires a reliable estimate of motion parameters for each moving object, which is difficult to obtain and validate. In contrast, this thesis introduces a new motion-free approach to estimate the dense depth map of a complex dynamic scene for successive/multiple frames. The work show that given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we can recover the dense depth map for the successive frames without solving for motion parameters. By assigning the locally rigid structure to the piece-wise planar approximation of a dynamic scene which transforms as rigid as possible over frames, we can bypass the motion estimation step. Experiments results and MATLAB codes on relevant examples are provided to validate the motion-free idea

    Structure from Articulated Motion: Accurate and Stable Monocular 3D Reconstruction without Training Data

    Full text link
    Recovery of articulated 3D structure from 2D observations is a challenging computer vision problem with many applications. Current learning-based approaches achieve state-of-the-art accuracy on public benchmarks but are restricted to specific types of objects and motions covered by the training datasets. Model-based approaches do not rely on training data but show lower accuracy on these datasets. In this paper, we introduce a model-based method called Structure from Articulated Motion (SfAM), which can recover multiple object and motion types without training on extensive data collections. At the same time, it performs on par with learning-based state-of-the-art approaches on public benchmarks and outperforms previous non-rigid structure from motion (NRSfM) methods. SfAM is built upon a general-purpose NRSfM technique while integrating a soft spatio-temporal constraint on the bone lengths. We use alternating optimization strategy to recover optimal geometry (i.e., bone proportions) together with 3D joint positions by enforcing the bone lengths consistency over a series of frames. SfAM is highly robust to noisy 2D annotations, generalizes to arbitrary objects and does not rely on training data, which is shown in extensive experiments on public benchmarks and real video sequences. We believe that it brings a new perspective on the domain of monocular 3D recovery of articulated structures, including human motion capture.Comment: 21 pages, 8 figures, 2 table

    Spline human motion recovery

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Simultaneous camera pose, 4D reconstruction of an object and deformation clustering from incomplete 2D point tracks in a video is a challenging problem. To solve it, in this work we introduce a union of piecewise subspaces to encode the 4D shape, where two modalities based on B-splines and Catmull-Rom curves are considered. We demonstrate that formulating the problem in terms of B-spline or Catmull-Rom functions, allows for a better physical interpretation of the resulting priors while C1 and C2 continuities are automatically imposed without needing any additional constraint. An optimization framework is proposed to sort out the problem in a unified, accurate, unsupervised and efficient manner. We extensively validate our claims on a wide range of human motions, including articulated and continuous deformations as well as those cases with noisy and missing measurements where our approach provides competing joint solutions.Peer ReviewedPostprint (author's final draft
    • …
    corecore