A bayesian approach to simultaneously recover camera pose and non-rigid shape from monocular images

Author: Moreno-Noguer Francesc
Porta Pleite Josep Maria
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

In this paper we bring the tools of the Simultaneous Localization and Map Building (SLAM) problem from a rigid to a deformable domain and use them to simultaneously recover the 3D shape of non-rigid surfaces and the sequence of poses of a moving camera. Under the assumption that the surface shape may be represented as a weighted sum of deformation modes, we show that the problem of estimating the modal weights along with the camera poses can be probabilistically formulated as a maximum a posteriori estimate and solved using an iterative least squares optimization. In addition, the probabilistic formulation we propose is very general and allows introducing different constraints without requiring any extra complexity. As a proof of concept, we show that local inextensibility constraints that prevent the surface from stretching can be easily integrated. An extensive evaluation on synthetic and real data demonstrates that our method has several advantages over current non-rigid shape from motion approaches. In particular, we show that our solution is robust to large amounts of noise and outliers and that it does not need to track points over the whole sequence nor to use an initialization close to the ground truth. Peer Reviewed. Postprint (author's final draft). This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/
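
The weighted-sum shape model and the inextensibility idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `modal_shape` and `stretch_violation`, the flat coordinate layout, and the toy numbers are all assumptions made for illustration only.

```python
import math

# Hypothetical helpers for illustration; none of these names come from the paper.

def modal_shape(mean_shape, modes, weights):
    """Return mean_shape + sum_k weights[k] * modes[k], element-wise.

    mean_shape and each mode are flat lists of coordinates of equal length.
    """
    shape = list(mean_shape)
    for weight, mode in zip(weights, modes):
        for i, value in enumerate(mode):
            shape[i] += weight * value
    return shape


def stretch_violation(p, q, rest_length):
    """Amount by which the distance between p and q exceeds rest_length.

    Returns 0.0 when the two points are no farther apart than rest_length,
    which is one simple way to express "the surface must not stretch".
    """
    return max(0.0, math.dist(p, q) - rest_length)


# Toy example: one 3D point, two deformation modes.
shape = modal_shape([0.0, 0.0, 1.0],
                    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
                    [0.5, -1.0])
print(shape)
print(stretch_violation((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 4.0))
```

In a full estimator of the kind the abstract describes, the modal weights would be unknowns solved for jointly with the camera poses; here they are simply given.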

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Incremental Non-Rigid Structure-from-Motion with Unknown Focal Length

Author: A Bartoli
A Chhatkuli
C Russell
D Nistér
EW Dijkstra
H Longuet-Higgins
I Akhter
J Fayad
L Torresani
M Innmann
M Perriollat
M Salzmann
N Sundaram
PF Gotardo
RI Hartley
S Vicente
TD Ngo
Publication date: 01/01/2018

The perspective camera and the isometric surface prior have recently gathered increased attention for Non-Rigid Structure-from-Motion (NRSfM). Despite the recent progress, several challenges remain, particularly the computational complexity and the unknown camera focal length. In this paper we present a method for incremental NRSfM with the perspective camera model and the isometric surface prior with unknown focal length. In the template-based case, we provide a method to estimate four parameters of the camera intrinsics. For the template-less scenario of NRSfM, we propose a method to upgrade reconstructions obtained for one focal length to another based on local rigidity and the so-called Maximum Depth Heuristics (MDH). On its basis we propose a method to simultaneously recover the focal length and the non-rigid shapes. We further solve the problem of incorporating a large number of points and adding more views in MDH-based NRSfM and efficiently solve them with Second-Order Cone Programming (SOCP). This does not require any shape initialization and produces results orders of magnitude faster than many methods. We provide evaluations on standard sequences with ground-truth and qualitative reconstructions on challenging YouTube videos. These evaluations show that our method performs better in both speed and accuracy than the state of the art. Comment: ECCV 201
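
To make the role of the focal length in the perspective camera model concrete, here is a minimal pinhole-projection sketch. It is a generic textbook model, not the method of the paper above; the function name `project`, the principal-point parameter, and the example values are assumptions for illustration.

```python
# Hypothetical helper for illustration; a standard pinhole model, not the paper's code.

def project(point, focal_length, principal_point=(0.0, 0.0)):
    """Project a 3D point (x, y, z) in camera coordinates to pixel coordinates.

    Uses u = f * x / z + cx and v = f * y / z + cy, so the same 3D point
    lands on different pixels when the focal length f changes.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    cx, cy = principal_point
    return (focal_length * x / z + cx, focal_length * y / z + cy)


# The same point seen with two different focal lengths.
print(project((1.0, 2.0, 4.0), 800.0, (320.0, 240.0)))
print(project((1.0, 2.0, 4.0), 400.0, (320.0, 240.0)))
```

Because the image position depends on the ratio of focal length to depth, a reconstruction computed under one assumed focal length does not directly carry over to another, which is the kind of ambiguity the abstract's "upgrade" step addresses.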

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Blending Learning and Inference in Structured Prediction

Author: Hazan Tamir
McAllester David
Schwing Alexander
Urtasun Raquel
Publication date: 30/08/2013

In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models. This algorithm blends the learning and inference tasks, which results in a significant speedup over traditional approaches, such as conditional random fields and structured support vector machines. For this purpose we utilize the structures of the predictors to describe a low dimensional structured prediction task which encourages local consistencies within the different structures while learning the parameters of the model. Convexity of the learning task provides the means to enforce the consistencies between the different parts. The inference-learning blending algorithm that we propose is guaranteed to converge to the optimum of the low dimensional primal and dual programs. Unlike many of the existing approaches, the inference-learning blending allows us to efficiently learn high-order graphical models over regions of any size and with a very large number of parameters. We demonstrate the effectiveness of our approach, while presenting state-of-the-art results in stereo estimation, semantic segmentation, shape reconstruction, and indoor scene understanding.

arXiv.org e-Print Archive

CiteSeerX

A Benchmark and Evaluation of Non-Rigid Structure from Motion

Author: Aanæs Henrik
Del Bue Alessio
Doest Mads Emil Brix
Jensen Sebastian Hoppe Nesgaard
Publication date: 26/04/2018

Non-rigid structure from motion (NRSfM) is a long-standing and central problem in computer vision, allowing us to obtain 3D information from multiple images when the scene is dynamic. A main issue regarding the further development of this important computer vision topic is the lack of high-quality data sets. We here address this issue by presenting a data set compiled for this purpose, which is made publicly available and is considerably larger than the previous state of the art. To validate the applicability of this data set, and to provide an investigation into the state of the art of NRSfM, including potential directions forward, we here present a benchmark and a scrupulous evaluation using this data set. This benchmark evaluates 16 different methods with available code, which we argue reasonably spans the state of the art in NRSfM. We also hope that the presented public data set and evaluation will provide benchmark tools for further development in this field.

arXiv.org e-Print Archive

Online Research Database In Technology

Linear Local Models for Monocular Reconstruction of Deformable Surfaces

Author: Fua Pascal
Salzmann Mathieu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/05/2011

Recovering the 3D shape of a nonrigid surface from a single viewpoint is known to be both ambiguous and challenging. Resolving the ambiguities typically requires prior knowledge about the most likely deformations that the surface may undergo. It often takes the form of a global deformation model that can be learned from training data. While effective, this approach suffers from the fact that a new model must be learned for each new surface, which means acquiring new training data and may be impractical. In this paper, we replace the global models by linear local ones for surface patches, which can be assembled to represent arbitrary surface shapes as long as they are made of the same material. Not only do they eliminate the need to retrain the model for different surface shapes, they also let us formulate 3D shape reconstruction from correspondences as either an algebraic problem that can be solved in closed form or a convex optimization problem whose solution can be found using standard numerical packages. We present quantitative results on synthetic data, as well as qualitative ones on real images.

Infoscience - École polytechnique fédérale de Lausanne

Single View Reconstruction for Human Face and Motion with Priors

Author: Wang Xianwang
Publication venue: UKnowledge
Publication date: 01/01/2010

Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches to model human face and motion with model priors that restrict the space of possible solutions. First, we develop a novel approach to recover the 3D shape from a single view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed by employing the techniques of non-linear manifold embedding and alignment. Specifically, the local image models for each patch of facial images and the local surface models for each patch of 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Local models successfully remove the dependence on large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least squares. Unfortunately, this learning-based approach cannot be successfully applied to the problem of human motion modeling due to the internal and external variations in single view video-based marker-less motion capture. Therefore, we introduce a new model-based approach for capturing human motion using a stream of depth images from a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor, instead of a camera array, results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and view-point variations, and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is used to search the template motion database for a best match. The best match body configuration as well as its corresponding surface mesh model are deformed to fit the input depth map, filling in the part that is occluded from the input and compensating for differences in pose and body-size between the input image and the template. Our approach does not require any markers, user-interaction, or appearance-based tracking. Experiments show that our approaches can achieve good modeling results for human face and motion, and are capable of dealing with a variety of challenges in single view reconstruction, e.g., occlusion.

University of Kentucky

Generalizations of the projective reconstruction theorem

Author: Nasihatkon Behrooz
Publication date: 21/11/2018

We present generalizations of the classic theorem of projective reconstruction as a tool for the design and analysis of the projective reconstruction algorithms. Our main focus is algorithms such as bundle adjustment and factorization-based techniques, which try to solve the projective equations directly for the structure points and projection matrices, rather than the so-called tensor-based approaches. First, we consider the classic case of 3D to 2D projections. Our new theorem shows that projective reconstruction is possible under a much weaker restriction than requiring, a priori, that all estimated projective depths are nonzero. By completely specifying possible forms of wrong configurations when some of the projective depths are allowed to be zero, the theory enables us to present a class of depth constraints under which any reconstruction of cameras and points projecting into given image points is projectively equivalent to the true camera-point configuration. This is very useful for the design and analysis of different factorization-based algorithms. Here, we analyse several constraints used in the literature using our theory, and also demonstrate how our theory can be used for the design of new constraints with desirable properties. The next part of the thesis is devoted to projective reconstruction in arbitrary dimensions, which is important due to its applications in the analysis of dynamical scenes. The current theory, due to Hartley and Schaffalitzky, is based on the Grassmann tensor, generalizing the notions of Fundamental matrix, trifocal tensor and quadrifocal tensor used for 3D to 2D projections. We extend their work by giving a theory whose point of departure is the projective equations rather than the Grassmann tensor. First, we prove the uniqueness of the Grassmann tensor corresponding to each set of image points, a question that remained open in the work of Hartley and Schaffalitzky. Then, we show that projective equivalence follows from the set of projective equations, provided that the depths are all nonzero. Finally, we classify possible wrong solutions to the projective factorization problem, where not all the projective depths are restricted to be nonzero. We test our theory experimentally by running the factorization-based algorithms for rigid structure and motion in the case of 3D to 2D projections. We further run simulations for projections from higher dimensions. In each case, we present examples demonstrating how the algorithm can converge to the degenerate solutions introduced in the earlier chapters. We also show how the use of proper constraints can result in a better performance in terms of finding a correct solution.

The Australian National University

Vision-based Manipulation In-the-Wild

Author: Chi Cheng
Publication date: 01/01/2024

Deploying robots in real-world environments involves immense engineering complexity, potentially surpassing the resources required for autonomous vehicles due to the increased dimensionality and task variety. To maximize the chances of successful real-world deployment, finding a simple solution that minimizes engineering complexity at every level, from hardware to algorithm to operations, is crucial. In this dissertation, we consider a vision-based manipulation system that can be deployed in-the-wild when trained to imitate sufficient quantity and diversity of human demonstration data on the desired task. At deployment time, the robot is driven by a single diffusion-based visuomotor policy, with raw RGB images as input and robot end-effector pose as output. Compared to existing policy representations, Diffusion Policy handles multimodal action distributions gracefully, being scalable to high-dimensional action spaces and exhibiting impressive training stability. These properties allow a single software system to be used for multiple tasks, with data collected by multiple demonstrators, deployed to multiple robot embodiments, and without significant hyper-parameter tuning. We developed a Universal Manipulation Interface (UMI), a portable, low-cost, and information-rich data collection system to enable direct manipulation skill learning from in-the-wild human demonstrations. UMI provides an intuitive interface for non-expert users by using hand-held grippers with mounted GoPro cameras. Compared to existing robotic data collection systems, UMI enables robotic data collection without needing a robot, drastically reducing the engineering and operational complexity. Trained with UMI data, the resulting diffusion policies can be deployed across multiple robot platforms in unseen environments for novel objects and to complete dynamic, bimanual, precise, and long-horizon tasks. 
The Diffusion Policy and UMI combination provides a simple full-stack solution to many manipulation problems. The turn-around time of building a single-task manipulation system (such as object tossing and cloth folding) can be reduced from a few months to a few days.

Columbia University Academic Commons

From light rays to 3D models

Author: Donné Simon
Publication venue: Faculty of Engineering and Architecture Ghent University
Publication date: 01/01/2018

Ghent University Academic Bibliography

Applied 3D Vision - An Empirical Study.

Author: Jensen Sebastian Hoppe Nesgaard
Publication venue: DTU Compute
Publication date: 01/01/2018

Online Research Database In Technology