136 research outputs found

    Shape basis interpretation for monocular deformable 3D reconstruction

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.In this paper, we propose a novel interpretable shape model to encode object non-rigidity. We first use the initial frames of a monocular video to recover a rest shape, used later to compute a dissimilarity measure based on a distance matrix measurement. Spectral analysis is then applied to this matrix to obtain a reduced shape basis, that in contrast to existing approaches, can be physically interpreted. In turn, these pre-computed shape bases are used to linearly span the deformation of a wide variety of objects. We introduce the low-rank basis into a sequential approach to recover both camera motion and non-rigid shape from the monocular video, by simply optimizing the weights of the linear combination using bundle adjustment. Since the number of parameters to optimize per frame is relatively small, specially when physical priors are considered, our approach is fast and can potentially run in real time. Validation is done in a wide variety of real-world objects, undergoing both inextensible and extensible deformations. Our approach achieves remarkable robustness to artifacts such as noisy and missing measurements and shows an improved performance to competing methods.Peer ReviewedPostprint (author's final draft

    Piecewise BĂ©zier space: recovering 3D dynamic motion from video

    Get PDF
    © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.In this paper we address the problem of jointly retrieving a 3D dynamic shape, camera motion, and deformation grouping from partial 2D point trajectories in a monocular video. To this end, we introduce a union of piecewise Bézier subspaces with enforcing continuities to model 3D motion. We show that formulating the problem in terms of piecewise curves, allows for a better physical interpretation of the resulting priors and a more accurate representation of the motion. An energy-based formulation is presented to solve the problem in an unsupervised, unified, accurate and efficient manner, by means of the use of augmented Lagrange multipliers. We thoroughly validate the approach on a wide variety of human video sequences, including those cases with noisy and missing observations, and providing more accurate joint estimations than state-of-the-art approaches.This work has been partially supported by the Spanish Ministry of Science and Innovation under project HuMoUR TIN2017-90086-R, by the ERA-Net Chistera project IPALM PCI2019-103386, and the María de Maeztu Seal of Excellence to IRI MDM-2016-0656Peer ReviewedPostprint (author's final draft

    Spline human motion recovery

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Simultaneous camera pose, 4D reconstruction of an object and deformation clustering from incomplete 2D point tracks in a video is a challenging problem. To solve it, in this work we introduce a union of piecewise subspaces to encode the 4D shape, where two modalities based on B-splines and Catmull-Rom curves are considered. We demonstrate that formulating the problem in terms of B-spline or Catmull-Rom functions, allows for a better physical interpretation of the resulting priors while C1 and C2 continuities are automatically imposed without needing any additional constraint. An optimization framework is proposed to sort out the problem in a unified, accurate, unsupervised and efficient manner. We extensively validate our claims on a wide range of human motions, including articulated and continuous deformations as well as those cases with noisy and missing measurements where our approach provides competing joint solutions.Peer ReviewedPostprint (author's final draft

    Unsupervised 3D reconstruction and grouping of rigid and non-rigid categories

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.In this paper we present an approach to jointly recover camera pose, 3D shape, and object and deformation type grouping, from incomplete 2D annotations in a multi-instance collection of RGB images. Our approach is able to handle indistinctly both rigid and non-rigid categories. This advances existing work, which only addresses the problem for one single object or, they assume the groups to be known a priori when multiple instances are handled. In order to address this broader version of the problem, we encode object deformation by means of multiple unions of subspaces, that is able to span from small rigid motion to complex deformations. The model parameters are learned via Augmented Lagrange Multipliers, in a completely unsupervised manner that does not require any training data at all. Extensive experimental evaluation is provided in a wide variety of synthetic and real scenarios, including rigid and non-rigid categories with small and large deformations. We obtain state-of-the-art solutions in terms of 3D reconstruction accuracy, while also providing grouping results that allow splitting the input images into object instances and their associated type of deformation.Peer ReviewedPostprint (author's final draft

    Safari from visual signals: recovering volumetric 3d shapes

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.In this paper we propose a convex approach for recovering a detailed 3D volumetric geometry of several objects from visual signals. To this end, we first present a minimal detailed surface energy that is optimized together with a volume constraint by considering some geometrical priors, and without requiring neither additional training data nor templates in order to constrain the solution. Our problem can be efficiently solved by means of a gradient descent, and be applied for single RGB images or monocular videos even with very small rigid motions. Temporal-aware solutions and driven by point correspondences are incorporated without assuming any 2D tracking data over time. Thanks to this formulation, both rigid and non-rigid objects can be considered. We have extensively validated our approach in a wide variety of scenarios in the wild, recovering challenging type of shapes that have not been previously attempted without assuming any training data.Peer ReviewedPostprint (author's final draft

    Vehicle pose estimation using G-Net: multi-class localization and depth estimation

    Get PDF
    In this paper we present a new network architecture, called G-Net, for 3D pose estimation on RGB images which is trained in a weakly supervised manner. We introduce a two step pipeline based on region-based Convolutional neural networks (CNNs) for feature localization, bounding box refinement based on non-maximum-suppression and depth estimation. The G-Net is able to estimate the depth from single monocular images with a self-tuned loss function. The combination of this predicted depth and the presented two-step localization allows the extraction of the 3D pose of the object. We show in experiments that our method achieves good results compared to other state-of-the-art approaches which are trained in a fully supervised manner.Peer ReviewedPostprint (author's final draft

    Image collection pop-up: 3D reconstruction and clustering of rigid and non-rigid categories

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.This paper introduces an approach to simultaneously estimate 3D shape, camera pose, and object and type of deformation clustering, from partial 2D annotations in a multi-instance collection of images. Furthermore, we can indistinctly process rigid and non-rigid categories. This advances existing work, which only addresses the problem for one single object or, if multiple objects are considered, they are assumed to be clustered a priori. To handle this broader version of the problem, we model object deformation using a formulation based on multiple unions of subspaces, able to span from small rigid motion to complex deformations. The parameters of this model are learned via Augmented Lagrange Multipliers, in a completely unsupervised manner that does not require any training data at all. Extensive validation is provided in a wide variety of synthetic and real scenarios, including rigid and non-rigid categories with small and large deformations. In all cases our approach outperforms state-of-the-art in terms of 3D reconstruction accuracy, while also providing clustering results that allow segmenting the images into object instances and their associated type of deformation (or action the object is performing).Postprint (author's final draft

    DUST: dual union of spatio-temporal subspaces for monocular multiple object 3D reconstruction

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.We present an approach to reconstruct the 3D shape of multiple deforming objects from incomplete 2D trajectories acquired by a single camera. Additionally, we simultaneously provide spatial segmentation (i.e., we identify each of the objects in every frame) and temporal clustering (i.e., we split the sequence into primitive actions). This advances existing work, which only tackled the problem for one single object and non-occluded tracks. In order to handle several objects at a time from partial observations, we model point trajectories as a union of spatial and temporal subspaces, and optimize the parameters of both modalities, the non-observed point tracks and the 3D shape via augmented Lagrange multipliers. The algorithm is fully unsupervised and results in a formulation which does not need initialization. We thoroughly validate the method on challenging scenarios with several human subjects performing different activities which involve complex motions and close interaction. We show our approach achieves state-of-the-art 3D reconstruction results, while it also provides spatial and temporal segmentation.Peer ReviewedPostprint (author's final draft

    Force-based representation for non-rigid shape and elastic model estimation

    Get PDF
    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.IEEE This paper addresses the problem of simultaneously recovering 3D shape, pose and the elastic model of a deformable object from only 2D point tracks in a monocular video. This is a severely under-constrained problem that has been typically addressed by enforcing the shape or the point trajectories to lie on low-rank dimensional spaces. We show that formulating the problem in terms of a low-rank force space that induces the deformation and introducing the elastic model as an additional unknown, allows for a better physical interpretation of the resulting priors and a more accurate representation of the actual object's behavior. In order to simultaneously estimate force, pose, and the elastic model of the object we use an expectation maximization strategy, where each of these parameters are successively learned by partial M-steps. Once the elastic model is learned, it can be transfered to similar objects to code its 3D deformation. Moreover, our approach can robustly deal with missing data, and encode both rigid and non-rigid points under the same formalism. We thoroughly validate the approach on Mocap and real sequences, showing more accurate 3D reconstructions than state-of-the-art, and additionally providing an estimate of the full elastic model with no a priori information.Peer ReviewedPostprint (author's final draft

    Deformable motion 3D reconstruction by union of regularized subspaces

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.This paper presents an approach to jointly retrieve camera pose, time-varying 3D shape, and automatic clustering based on motion primitives, from incomplete 2D trajectories in a monocular video. We introduce the concept of order-varying temporal regularization in order to exploit video data, that can be indistinctly applied to the 3D shape evolution as well as to the similarities between images. This results in a union of regularized subspaces which effectively encodes the 3D shape deformation. All parameters are learned via augmented Lagrange multipliers, in a unified and unsupervised manner that does not assume any training data at all. Experimental validation is reported on human motion from sparse to dense shapes, providing more robust and accurate solutions than state-of-the-art approaches in terms of 3D reconstruction, while also obtaining motion grouping results.Peer ReviewedPostprint (author's final draft
    • …
    corecore