154 research outputs found
Deformable motion 3D reconstruction by union of regularized subspaces
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This paper presents an approach to jointly retrieve camera pose, time-varying 3D shape, and automatic clustering based on motion primitives, from incomplete 2D trajectories in a monocular video. We introduce the concept of order-varying temporal regularization to exploit video data, which can be applied equally to the 3D shape evolution and to the similarities between images. This results in a union of regularized subspaces that effectively encodes the 3D shape deformation. All parameters are learned via augmented Lagrange multipliers, in a unified and unsupervised manner that does not assume any training data at all. Experimental validation is reported on human motion from sparse to dense shapes, providing more robust and accurate solutions than state-of-the-art approaches in terms of 3D reconstruction, while also obtaining motion grouping results. Peer Reviewed. Postprint (author's final draft)
3D Shape Estimation from 2D Landmarks: A Convex Relaxation Approach
We investigate the problem of estimating the 3D shape of an object, given a
set of 2D landmarks in a single image. To alleviate the reconstruction
ambiguity, a widely-used approach is to confine the unknown 3D shape within a
shape space built upon existing shapes. While this approach has proven to be
successful in various applications, a challenging issue remains, i.e., the
joint estimation of shape parameters and camera-pose parameters requires
solving a nonconvex optimization problem. The existing methods often adopt an
alternating minimization scheme to locally update the parameters, and
consequently the solution is sensitive to initialization. In this paper, we
propose a convex formulation to address this problem and develop an efficient
algorithm to solve the proposed convex program. We demonstrate the exact
recovery property of the proposed method, its merits compared to alternative
methods, and the applicability in human pose and car shape estimation. Comment: In Proceedings of CVPR 201
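The shape-space idea in the abstract above can be illustrated with a small sketch. It assumes, purely for illustration, that the shape space is a linear combination of basis shapes and that the camera is weak-perspective; the function names `combine_basis`, `rotation_about_y` and `project_weak_perspective` are hypothetical and are not taken from the paper. It shows only the forward model (shape coefficients and camera pose to 2D landmarks), not the convex relaxation the paper proposes.

```python
import math


def combine_basis(basis_shapes, coeffs):
    """Return the 3D shape sum_i coeffs[i] * basis_shapes[i].

    Each basis shape is a list of (x, y, z) points with the same length.
    The linear model is an assumption made for this sketch.
    """
    n_points = len(basis_shapes[0])
    shape = []
    for p in range(n_points):
        point = tuple(
            sum(c * basis[p][axis] for c, basis in zip(coeffs, basis_shapes))
            for axis in range(3)
        )
        shape.append(point)
    return shape


def rotation_about_y(angle):
    """3x3 rotation matrix about the y axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project_weak_perspective(shape, rotation, scale=1.0):
    """Rotate each 3D point, drop its depth, and scale the remaining x and y."""
    landmarks = []
    for x, y, z in shape:
        xr = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z
        yr = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z
        landmarks.append((scale * xr, scale * yr))
    return landmarks


if __name__ == "__main__":
    # Two toy basis shapes with two points each (made-up values).
    basis = [
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 2.0), (0.0, 2.0, 0.0)],
    ]
    shape = combine_basis(basis, [0.5, 0.5])
    print(project_weak_perspective(shape, rotation_about_y(0.3), scale=1.5))
```

Because the 2D landmarks depend on the product of the rotation and the shape coefficients, fitting both at once is bilinear, which is the source of the nonconvexity the abstract mentions.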
MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from
Motion (NRSfM). MHR-Net aims to find a set of reasonable reconstructions for a
2D view, and it also selects the most likely reconstruction from the set. To
deal with the challenging unsupervised generation of non-rigid shapes, we
develop a new Deterministic Basis and Stochastic Deformation scheme in MHR-Net.
The non-rigid shape is first expressed as the sum of a coarse shape basis and a
flexible shape deformation, then multiple hypotheses are generated with
uncertainty modeling of the deformation part. MHR-Net is optimized with
reprojection loss on the basis and the best hypothesis. Furthermore, we design
a new Procrustean Residual Loss, which reduces the rigid rotations between
similar shapes and further improves the performance. Experiments show that
MHR-Net achieves state-of-the-art reconstruction accuracy on Human3.6M, SURREAL
and 300-VW datasets. Comment: Accepted to ECCV 202
Spline human motion recovery
© 2022 IEEE. Simultaneous camera pose, 4D reconstruction of an object and deformation clustering from incomplete 2D point tracks in a video is a challenging problem. To solve it, in this work we introduce a union of piecewise subspaces to encode the 4D shape, where two modalities based on B-splines and Catmull-Rom curves are considered. We demonstrate that formulating the problem in terms of B-spline or Catmull-Rom functions allows for a better physical interpretation of the resulting priors, while C1 and C2 continuities are automatically imposed without needing any additional constraint. An optimization framework is proposed to solve the problem in a unified, accurate, unsupervised and efficient manner. We extensively validate our claims on a wide range of human motions, including articulated and continuous deformations as well as cases with noisy and missing measurements, where our approach provides competitive joint solutions. Peer Reviewed. Postprint (author's final draft)
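As a rough illustration of the kind of curve the abstract above refers to, the sketch below evaluates one segment of a uniform Catmull-Rom spline for a single 3D point trajectory. The function name `catmull_rom_point` and the sample control points are hypothetical; the paper's actual parameterization, its B-spline variant and its optimization are not reproduced here.

```python
def catmull_rom_point(p0, p1, p2, p3, t):
    """Evaluate a uniform Catmull-Rom segment at t in [0, 1].

    The segment runs from p1 (t = 0) to p2 (t = 1); p0 and p3 only shape
    the tangents. Points are tuples of equal length, for example (x, y, z).
    """
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (
            2.0 * b
            + (-a + c) * t
            + (2.0 * a - 5.0 * b + 4.0 * c - d) * t2
            + (-a + 3.0 * b - 3.0 * c + d) * t3
        )
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


if __name__ == "__main__":
    # Four made-up 3D positions of one tracked point at consecutive key frames.
    keys = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 1.0), (3.0, 1.0, 1.0)]
    for step in range(5):
        print(catmull_rom_point(*keys, step / 4))
```

Adjacent segments share the tangent at their common control point, so the resulting trajectory is C1 without any extra constraint; the C2 case mentioned in the abstract corresponds to the B-spline modality.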
Piecewise Bézier space: recovering 3D dynamic motion from video
© 2021 IEEE. In this paper we address the problem of jointly retrieving a 3D dynamic shape, camera motion, and deformation grouping from partial 2D point trajectories in a monocular video. To this end, we introduce a union of piecewise Bézier subspaces with enforced continuity to model 3D motion. We show that formulating the problem in terms of piecewise curves allows for a better physical interpretation of the resulting priors and a more accurate representation of the motion. An energy-based formulation is presented to solve the problem in an unsupervised, unified, accurate and efficient manner, using augmented Lagrange multipliers. We thoroughly validate the approach on a wide variety of human video sequences, including cases with noisy and missing observations, and provide more accurate joint estimations than state-of-the-art approaches. This work has been partially supported by the Spanish Ministry of Science and Innovation under project HuMoUR TIN2017-90086-R, by the ERA-Net Chistera project IPALM PCI2019-103386, and the María de Maeztu Seal of Excellence to IRI MDM-2016-0656. Peer Reviewed. Postprint (author's final draft)
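To make the notion of a Bézier trajectory concrete, the following sketch evaluates a Bézier curve with de Casteljau's algorithm, a standard evaluation scheme that is swapped in here for illustration and is not claimed to be the paper's implementation. The function name `de_casteljau` and the control points are hypothetical.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve of any degree at t in [0, 1].

    Repeatedly interpolates between neighbouring control points until a
    single point remains. Points are tuples of equal length.
    """
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


if __name__ == "__main__":
    # Two cubic pieces of one 3D point trajectory (made-up control points).
    # The second piece starts where the first ends, so the join is continuous.
    first = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (3.0, 2.0, 0.0), (4.0, 0.0, 0.0)]
    second = [(4.0, 0.0, 0.0), (5.0, -2.0, 0.0), (7.0, -2.0, 1.0), (8.0, 0.0, 1.0)]
    for piece in (first, second):
        print([de_casteljau(piece, step / 4) for step in range(5)])
```

In this toy example the last two control points of the first piece and the first two of the second are collinear and equally spaced, which also makes the tangent continuous at the join; this is one simple way to obtain the kind of continuity the abstract says is enforced.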
Unsupervised 3D reconstruction and grouping of rigid and non-rigid categories
© 2022 IEEE. In this paper we present an approach to jointly recover camera pose, 3D shape, and object and deformation type grouping, from incomplete 2D annotations in a multi-instance collection of RGB images. Our approach handles both rigid and non-rigid categories. This advances existing work, which either addresses the problem for a single object only or assumes the groups to be known a priori when multiple instances are handled. To address this broader version of the problem, we encode object deformation by means of multiple unions of subspaces, which are able to span from small rigid motion to complex deformations. The model parameters are learned via augmented Lagrange multipliers, in a completely unsupervised manner that does not require any training data at all. Extensive experimental evaluation is provided in a wide variety of synthetic and real scenarios, including rigid and non-rigid categories with small and large deformations. We obtain state-of-the-art solutions in terms of 3D reconstruction accuracy, while also providing grouping results that allow splitting the input images into object instances and their associated type of deformation. Peer Reviewed. Postprint (author's final draft)
BLiRF: Bandlimited Radiance Fields for Dynamic Scene Modeling
Reasoning about the 3D structure of a non-rigid dynamic scene from a single moving
camera is an under-constrained problem. Inspired by the remarkable progress of
neural radiance fields (NeRFs) in photo-realistic novel view synthesis of
static scenes, extensions have been proposed for dynamic settings. These
methods heavily rely on neural priors in order to regularize the problem. In
this work, we take a step back and reinvestigate how current implementations
may entail deleterious effects, including limited expressiveness, entanglement
of light and density fields, and sub-optimal motion localization. As a remedy,
we advocate for a bridge between classic non-rigid-structure-from-motion
(NRSfM) and NeRF, enabling the well-studied priors of the former to constrain
the latter. To this end, we propose a framework that factorizes time and space
by formulating a scene as a composition of bandlimited, high-dimensional
signals. We demonstrate compelling results across complex dynamic scenes that
involve changes in lighting, texture and long-range dynamics.
Matching and recovering 3D people from multiple views
© 2022 IEEE. This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces an uncertainty geometric consistency. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, which is robust against partial observations as well as bad detections and does not assume any prior on the number of people in the scene. After that, the multi-view correspondences are used to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies perform different activities involving complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction. Peer Reviewed. Postprint (author's final draft)