Towards Reliable and Accurate Global Structure-from-Motion
Reconstruction of objects or scenes from sparse point detections across multiple views is one of the most tackled problems in computer vision. Given the coordinates of 2D points tracked in multiple images, the problem consists of estimating the corresponding 3D points and cameras' calibrations (intrinsic and pose), and can be solved by minimizing reprojection errors using bundle adjustment. However, given bundle adjustment's nonlinear objective function and iterative nature, a good starting guess is required to converge to global minima. Global and Incremental Structure-from-Motion methods appear as ways to provide good initializations to bundle adjustment, each with different properties. While Global Structure-from-Motion has been shown to result in more accurate reconstructions compared to Incremental Structure-from-Motion, the latter has better scalability by starting with a small subset of images and sequentially adding new views, allowing reconstruction of sequences with millions of images. Additionally, both Global and Incremental Structure-from-Motion methods rely on accurate models of the scene or object, and under noisy conditions or high model uncertainty might result in poor initializations for bundle adjustment. Recently, pOSE, a class of matrix factorization methods, has been proposed as an alternative to conventional Global SfM methods. These methods use VarPro - a second-order optimization method - to minimize a linear combination of an approximation of reprojection errors and a regularization term based on an affine camera model, and have been shown to converge to global minima with a high rate even when starting from random camera calibration estimations. This thesis aims at improving the reliability and accuracy of global SfM through different approaches.
First, by studying conditions for global optimality of point set registration, a point cloud averaging method that can be used when (incomplete) 3D point clouds of the same scene in different coordinate systems are available. Second, by extending pOSE methods to different Structure-from-Motion problem instances, such as Non-Rigid SfM or radial-distortion-invariant SfM. Third and finally, by replacing the regularization term of pOSE methods with an exponential regularization on the projective depth of the 3D point estimates, resulting in a loss that achieves reconstructions with accuracy close to bundle adjustment.
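As a minimal illustration of the reprojection error that bundle adjustment minimizes, the sketch below projects a 3D point with a pinhole camera and measures its pixel distance to an observed 2D point. The function names `project` and `reprojection_error`, the nested-list matrix layout, and all numeric values are assumptions made for this example and do not come from the thesis.

```python
import math

def project(K, R, t, X):
    """Pinhole projection of 3D point X using intrinsics K, rotation R, translation t."""
    # Transform the point into the camera frame: Xc = R @ X + t
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    # Apply the intrinsics, then divide by depth to get pixel coordinates
    x = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    return (x[0] / x[2], x[1] / x[2])

def reprojection_error(K, R, t, X, observed):
    """Euclidean pixel distance between the projected point and its observation."""
    u, v = project(K, R, t, X)
    return math.hypot(u - observed[0], v - observed[1])

# Hypothetical camera: focal length 100 px, principal point (50, 50), identity pose
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
X = [1.0, 2.0, 10.0]

print(project(K, R, t, X))                           # (60.0, 70.0)
print(reprojection_error(K, R, t, X, (63.0, 74.0)))  # 5.0
```

Bundle adjustment sums such errors (usually squared) over all observed point-camera pairs and optimizes the cameras and 3D points jointly.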
Deformable and articulated 3D reconstruction from monocular video sequences
This thesis addresses the problem of deformable and articulated structure from motion from
monocular uncalibrated video sequences. Structure from motion is defined as the problem of
recovering information about the 3D structure of scenes imaged by a camera in a video sequence.
Our study aims at the challenging problem of non-rigid shapes (e.g. a beating heart or a smiling
face). Non-rigid structures appear constantly in our everyday life: think of a bicep curling, a
torso twisting or a smiling face. Our research seeks a general method to perform 3D shape
recovery purely from data, without having to rely on a pre-computed model or training data.
Open problems in the field are the difficulty of the non-linear estimation, the lack of a real-time
system, large amounts of missing data in real-world video sequences, measurement noise and
strong deformations. Solving these problems would take us far beyond the current state of the
art in non-rigid structure from motion. This dissertation presents our contributions in the field
of non-rigid structure from motion, detailing a novel algorithm that enforces the exact metric
structure of the problem at each step of the minimisation by projecting the motion matrices
onto the correct deformable or articulated metric motion manifolds respectively. An important
advantage of this new algorithm is its ability to handle missing data which becomes crucial
when dealing with real video sequences. We present a generic bilinear estimation framework,
which improves convergence and makes use of the manifold constraints. Finally, we demonstrate
a sequential, frame-by-frame estimation algorithm, which provides a 3D model and camera
parameters for each video frame, while simultaneously building a model of object deformation.
Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective
This paper addresses the task of dense non-rigid structure-from-motion
(NRSfM) using multiple images. State-of-the-art methods for this problem are
often hindered by poor scalability, expensive computations, and noisy measurements.
Further, recent methods for NRSfM usually either assume a small number of sparse
feature points or ignore local non-linearities of shape deformations, and thus
cannot reliably model complex non-rigid deformations. To address these issues,
in this paper, we propose a new approach for dense NRSfM by modeling the
problem on a Grassmann manifold. Specifically, we assume the complex non-rigid
deformations lie on a union of local linear subspaces both spatially and
temporally. This naturally allows for a compact representation of the complex
non-rigid deformation over frames. We provide experimental results on several
synthetic and real benchmark datasets. The results obtained clearly demonstrate
that our method, apart from being scalable and more accurate than
state-of-the-art methods, is also more robust to noise and generalizes to
highly non-linear deformations.
Comment: 10 pages, 7 figures, 4 tables. Accepted for publication in Conference
on Computer Vision and Pattern Recognition (CVPR), 2018, typos fixed and
acknowledgement added
Statistical Models and Optimization Algorithms for High-Dimensional Computer Vision Problems
Data-driven and computational approaches are showing significant promise in solving several challenging problems in various fields such as bioinformatics, finance and many branches of engineering. In this dissertation, we explore the potential of these approaches, specifically statistical data models and optimization algorithms, for solving several challenging problems in computer vision. In doing so, we contribute to the literature of both statistical data models and computer vision. In the context of statistical data models, we propose principled approaches for solving robust regression problems, both linear and kernel, and the missing data matrix factorization problem. In computer vision, we propose statistically optimal and efficient algorithms for solving the remote face recognition and structure from motion (SfM) problems.
The goal of robust regression is to estimate the functional relation between two variables from a given data set which might be contaminated with outliers. Under the reasonable assumption that there are fewer outliers than inliers in a data set, we formulate the robust linear regression problem as a sparse learning problem, which can be solved using efficient polynomial-time algorithms. We also provide sufficient conditions under which the proposed algorithms correctly solve the robust regression problem. We then extend our robust formulation to the case of kernel regression, specifically to propose a robust version for relevance vector machine (RVM) regression.
Matrix factorization is used for finding a low-dimensional representation for data embedded in a high-dimensional space. Singular value decomposition is the standard algorithm for solving this problem. However, when the matrix has many missing elements this is a hard problem to solve. We formulate the missing data matrix factorization problem as a low-rank semidefinite programming problem (essentially a rank constrained SDP), which allows us to find accurate and efficient solutions for large-scale factorization problems.
Face recognition from remotely acquired images is a challenging problem because of variations due to blur and illumination. Using the convolution model for blur, we show that the set of all images obtained by blurring a given image forms a convex set. We then use convex optimization techniques to find the distances between a given blurred (probe) image and the gallery images to find the best match. Further, using a low-dimensional linear subspace model for illumination variations, we extend our theory in a similar fashion to recognize blurred and poorly illuminated faces.
Bundle adjustment is the final optimization step of the SfM problem where the goal is to obtain the 3-D structure of the observed scene and the camera parameters from multiple images of the scene. The traditional bundle adjustment algorithm, based on minimizing the l_2 norm of the image re-projection error, has cubic complexity in the number of unknowns. We propose an algorithm, based on minimizing the l_infinity norm of the re-projection error, that has quadratic complexity in the number of unknowns. This is achieved by reducing the large-scale optimization problem into many small-scale sub-problems, each of which can be solved using second-order cone programming.
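To make the difference between the two objectives in this abstract concrete, the sketch below evaluates an l_2-style cost (sum of squared residual norms) and an l_infinity-style cost (largest residual norm) on the same set of 2D re-projection residuals. Taking the l_infinity cost as the maximum over per-observation residual norms is an assumption about the intended definition, and the function names and residual values are hypothetical.

```python
import math

def l2_cost(residuals):
    """Sum of squared Euclidean norms of the 2D residuals."""
    return sum(rx * rx + ry * ry for rx, ry in residuals)

def linf_cost(residuals):
    """Largest Euclidean norm among the 2D residuals."""
    return max(math.hypot(rx, ry) for rx, ry in residuals)

# Hypothetical re-projection residuals in pixels: two small ones and one large one
residuals = [(0.5, 0.0), (0.0, -0.5), (3.0, 4.0)]

print(l2_cost(residuals))    # 25.5
print(linf_cost(residuals))  # 5.0
```

The first cost accumulates every residual, while the second is determined entirely by the single largest one.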
Widening the basin of convergence for the bundle adjustment type of problems in computer vision
Bundle adjustment is the process of simultaneously optimizing camera poses and 3D structure
given image point tracks. In structure-from-motion, it is typically used as the final refinement
step due to the nonlinearity of the problem, meaning that it requires sufficiently good
initialization. Contrary to this belief, recent literature showed that useful solutions can
be obtained even from arbitrary initialization for fixed-rank matrix factorization problems,
including bundle adjustment with affine cameras. This property of wide convergence basin of
high quality optima is desirable for any nonlinear optimization algorithm since obtaining good
initial values can often be non-trivial. The aim of this thesis is to find the key factor behind the
success of these recent matrix factorization algorithms and explore the potential applicability
of the findings to bundle adjustment, which is closely related to matrix factorization.
The thesis begins by unifying a handful of matrix factorization algorithms and comparing
similarities and differences between them. The theoretical analysis shows that the set
of successful algorithms actually stems from the same root of the optimization method
called variable projection (VarPro). The investigation then extends to address why VarPro
outperforms the joint optimization technique, which is widely used in computer vision. This
algorithmic comparison of these methods yields a larger unification, leading to a conclusion
that VarPro benefits from an unequal trust region assumption between two matrix factors.
The thesis then explores ways to incorporate VarPro into bundle adjustment problems
using projective and perspective cameras. Unfortunately, the added nonlinearity causes
a substantial decrease in the convergence basin of VarPro, and therefore a bootstrapping
strategy is proposed to bypass this issue. Experimental results show that it is possible to
yield feasible metric reconstructions and pose estimations from arbitrary initialization given
relatively clean point tracks, taking one step towards initialization-free structure-from-motion.
Microsoft; Toshiba Research Europe
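The core idea behind variable projection, as referred to in the abstract above, is that one matrix factor can be eliminated in closed form, leaving a reduced cost in the other factor only. The sketch below shows just that elimination step for a rank-1 factorization with missing entries. It is not the second-order solver studied in the thesis, and the names `eliminate_v` and `reduced_cost`, the 0/1 weight matrix `W`, and the data are assumptions made for illustration.

```python
def eliminate_v(M, W, u):
    """For a fixed u, return the v minimizing sum_ij W[i][j] * (M[i][j] - u[i]*v[j])**2."""
    rows, cols = len(M), len(M[0])
    v = []
    for j in range(cols):
        # Each v[j] is an independent weighted least-squares problem
        num = sum(W[i][j] * M[i][j] * u[i] for i in range(rows))
        den = sum(W[i][j] * u[i] ** 2 for i in range(rows))
        v.append(num / den if den > 0 else 0.0)
    return v

def reduced_cost(M, W, u):
    """Cost as a function of u alone, with v replaced by its optimal value."""
    v = eliminate_v(M, W, u)
    return sum(
        W[i][j] * (M[i][j] - u[i] * v[j]) ** 2
        for i in range(len(M))
        for j in range(len(M[0]))
    )

# Hypothetical data: M = outer([1, 2, 3], [4, 5]) with two entries unobserved.
# Unobserved entries hold a 0.0 placeholder and are masked out by W.
M = [[4.0, 5.0], [8.0, 0.0], [0.0, 15.0]]
W = [[1, 1], [1, 0], [0, 1]]

print(eliminate_v(M, W, [1.0, 2.0, 3.0]))   # [4.0, 5.0]
print(reduced_cost(M, W, [1.0, 2.0, 3.0]))  # 0.0
print(reduced_cost(M, W, [1.0, 1.0, 1.0]))  # 58.0
```

A VarPro-style method would then minimize `reduced_cost` over `u` with a Gauss-Newton or Levenberg-Marquardt type iteration, whereas joint optimization updates `u` and `v` together.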
Simulation Guidée par l’Image pour la Réalité Augmentée durant la Chirurgie Hépatique (Image-Guided Simulation for Augmented Reality during Hepatic Surgery)
The main objective of this thesis is to provide surgeons with tools for pre- and intra-operative decision support during minimally invasive hepatic surgery. These interventions are usually based on laparoscopic techniques or, more recently, flexible endoscopy. During such operations, the surgeon tries to remove an often significant number of liver tumors while preserving the functional role of the liver. This involves defining an optimal hepatectomy, i.e. ensuring that the volume of the post-operative liver is at least 55% of the original liver while preserving the hepatic vasculature as well as possible. Although intervention planning can now be considered on the basis of preoperative patient-specific data, significant movements of the liver and its deformations during surgery make this planning very difficult to use in practice. The work proposed in this thesis aims to provide augmented reality tools to be used in intra-operative conditions in order to visualize the position of tumors and hepatic vascular networks at any time.
Bilinear Factorization via Augmented Lagrange Multipliers
This paper presents a unified approach to solve different bilinear factorization problems in Computer Vision in the presence of missing data in the measurements. The problem is formulated as a constrained optimization problem where one of the factors is constrained to lie on a specific manifold. To achieve this, we introduce an equivalent reformulation of the bilinear factorization problem. This reformulation decouples the core bilinear aspect from the manifold specificity. We then tackle the resulting constrained optimization problem with Bilinear factorization via Augmented Lagrange Multipliers (BALM). The mechanics of our algorithm are such that only a projector onto the manifold constraint is needed. That is the strength and the novelty of our approach: it can handle seamlessly different Computer Vision problems. We present experiments and results for two popular factorization problems: Non-rigid Structure from Motion and Photometric Stereo.