The Quadric Reference Surface: Theory and Applications
The conceptual component of this work concerns "reference surfaces," the dual of the reference frames often used for shape representation. The theoretical component addresses whether one can find a unique (and simple) mapping that aligns two arbitrary perspective views of an opaque textured quadric surface in 3D, given (i) a few corresponding points in the two views, or (ii) the outline conic of the surface in one view only and a few corresponding points in the two views. The practical component applies these theoretical results as tools for achieving full correspondence between views of arbitrary objects.
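The quadric surfaces at the heart of this work admit a simple algebraic representation: a homogeneous point x lies on the quadric when x^T Q x = 0 for a symmetric 4x4 matrix Q. As an illustration (not the paper's alignment method), the following sketch fits such an implicit quadric to 3D points by taking the null space of a design matrix over the ten independent coefficients:

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of an implicit quadric x^T Q x = 0
    (Q symmetric 4x4) to a set of 3D points."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    ones = np.ones_like(X)
    # Monomials: x^2, y^2, z^2, xy, xz, yz, x, y, z, 1
    A = np.stack([X*X, Y*Y, Z*Z, X*Y, X*Z, Y*Z, X, Y, Z, ones], axis=1)
    # Coefficients = right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d, e, f, g, h, i, j = Vt[-1]
    Q = np.array([[a,   d/2, e/2, g/2],
                  [d/2, b,   f/2, h/2],
                  [e/2, f/2, c,   i/2],
                  [g/2, h/2, i/2, j  ]])
    return Q

# Points sampled on the unit sphere x^2 + y^2 + z^2 - 1 = 0
rng = np.random.default_rng(0)
p = rng.normal(size=(50, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
Q = fit_quadric(p)
# Residuals of the homogeneous points under the fitted quadric
ph = np.hstack([p, np.ones((50, 1))])
res = np.einsum('ni,ij,nj->n', ph, Q, ph)
print(np.max(np.abs(res)))  # near zero
```

For exact data on the sphere, the recovered Q is (up to scale) diag(1, 1, 1, -1), so the residuals vanish to machine precision.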
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
We present Magic123, a two-stage coarse-to-fine approach for generating
high-quality, textured 3D meshes from a single unposed image in the wild using
both 2D and 3D priors. In the first stage, we optimize a neural radiance field
to produce a coarse geometry. In the second stage, we adopt a memory-efficient
differentiable mesh representation to yield a high-resolution mesh with a
visually appealing texture. In both stages, the 3D content is learned through
reference view supervision and novel views guided by a combination of 2D and 3D
diffusion priors. We introduce a single trade-off parameter between the 2D and
3D priors to control exploration (more imaginative) and exploitation (more
precise) of the generated geometry. Additionally, we employ textual inversion
and monocular depth regularization to encourage consistent appearances across
views and to prevent degenerate solutions, respectively. Magic123 demonstrates
a significant improvement over previous image-to-3D techniques, as validated
through extensive experiments on synthetic benchmarks and diverse real-world
images. Our code, models, and generated 3D assets are available at
https://github.com/guochengqian/Magic123.
Comment: webpage: https://guochengqian.github.io/project/magic123
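The single trade-off parameter described above can be pictured as a scalar weight blending the two guidance gradients before each optimization step. The sketch below is illustrative only; the names and the exact weighting scheme are assumptions, not the paper's implementation:

```python
import numpy as np

def guided_update(params, grad_2d, grad_3d, lam_2d, lr=0.01):
    """One optimisation step where novel-view supervision blends a 2D
    diffusion-prior gradient (exploration, more imaginative) with a 3D
    diffusion-prior gradient (exploitation, more precise) through a
    single trade-off weight lam_2d. Illustrative names and scheme."""
    g = lam_2d * grad_2d + grad_3d
    return params - lr * g

params = np.zeros(3)
g2d = np.array([1.0, 0.0, 0.0])   # stand-in for a 2D SDS gradient
g3d = np.array([0.0, 1.0, 0.0])   # stand-in for a 3D prior gradient
out = guided_update(params, g2d, g3d, lam_2d=0.5)
print(out)
```

Raising lam_2d pushes the geometry toward what the 2D prior imagines; lowering it keeps the solution closer to the 3D prior's view-consistent predictions.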
How to turn your camera into a perfect pinhole model
Camera calibration is a first and fundamental step in various computer vision
applications. Despite being an active field of research, Zhang's method remains
widely used for camera calibration due to its implementation in popular
toolboxes. However, this method initially assumes a pinhole model with
oversimplified distortion models. In this work, we propose a novel approach
that involves a pre-processing step to remove distortions from images by means
of Gaussian processes. Our method does not need to assume any distortion model
and can be applied to severely warped images, even in the case of multiple
distortion sources, e.g., a fisheye image of a curved mirror reflection. The
Gaussian processes capture all distortions and camera imperfections, resulting
in virtual images as though taken by an ideal pinhole camera with square
pixels. Furthermore, this ideal GP-camera only needs one image of a square grid
calibration pattern. This model allows for a serious upgrade of many algorithms
and applications that are designed in a pure projective geometry setting but
with a performance that is very sensitive to nonlinear lens distortions. We
demonstrate the effectiveness of our method by simplifying Zhang's calibration
method, reducing the number of parameters and eliminating both the distortion
parameters and the iterative optimization. We validate on synthetic data
and real-world images. The contributions of this work include the construction
of a virtual ideal pinhole camera using Gaussian processes, a simplified
calibration method, and lens distortion removal.
Comment: 15 pages, 3 figures, conference CIAR
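The core idea above, learning a smooth, model-free map from observed (distorted) coordinates of a square-grid pattern back to ideal pinhole coordinates, can be sketched with a minimal Gaussian-process regression. This toy version uses a hand-rolled RBF-kernel GP and a synthetic radial distortion; it is an assumption-laden illustration, not the authors' pipeline:

```python
import numpy as np

def gp_fit_predict(X_train, y_train, X_test, length=0.5, noise=1e-10):
    """Minimal GP regression with an RBF kernel: learns the smooth map
    from observed (distorted) coordinates to ideal pinhole coordinates,
    one GP per output coordinate (here both at once via a 2D target)."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf(X_test, X_train) @ np.linalg.solve(K, y_train)

# Toy radial distortion applied to a square calibration grid
lin = np.linspace(-1.0, 1.0, 5)
ideal = np.stack(np.meshgrid(lin, lin), -1).reshape(-1, 2)  # square grid
r2 = (ideal ** 2).sum(1, keepdims=True)
distorted = ideal * (1 + 0.1 * r2)       # what the real lens observes
# The GP maps distorted observations back to ideal pinhole coordinates
pred = gp_fit_predict(distorted, ideal, distorted)
err = np.max(np.abs(pred - ideal))
print(err)  # small
```

Because the GP makes no parametric assumption about the distortion, the same fit would absorb fisheye or mirror-induced warps as readily as this toy radial model.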
Homography-Based Positioning and Planar Motion Recovery
Planar motion is an important and frequently occurring situation in mobile robotics applications. This thesis concerns estimation of the ego-motion and pose of a single downward-oriented camera under the assumptions of planar motion and known internal camera parameters. The so-called essential matrix (or its uncalibrated counterpart, the fundamental matrix) is frequently used in computer vision applications to compute a 3D reconstruction of the camera locations and the observed scene. However, if the observed points are expected to lie on a plane, e.g. the ground plane, the determination of these matrices becomes an ill-posed problem. Instead, methods based on homographies are better suited to this situation.

One part of this thesis is concerned with the extraction of the camera pose and ego-motion from such homographies. We present both a direct SVD-based method and an iterative method that solve this problem. The iterative method is extended to allow simultaneous determination of the camera tilt from several homographies obeying the same planar motion model. This extension improves the robustness of the original method and provides consistent tilt estimates for the frames used in the estimation. The methods are evaluated in experiments on both real and synthetic data.

Another part of the thesis deals with the problem of computing the homographies from point correspondences. Conventional homography estimation methods produce a homography of too general a class, which is not guaranteed to be compatible with the planar motion assumption. For this reason, we enforce the planar motion model at the homography estimation stage with a new homography solver based on a number of polynomial constraints on the entries of the homography matrix.
In addition to yielding a homography of the right type, this method uses only 2.5 point correspondences instead of the conventional four, which is advantageous, e.g., when used in a RANSAC framework for outlier removal.
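For context, the "conventional four-point" baseline mentioned above is the standard DLT homography solver, which the thesis's 2.5-point planar-motion solver improves upon. A minimal DLT sketch (not the thesis's solver) for exact correspondences, verified on a synthetic planar motion (rotation about the vertical axis plus translation):

```python
import numpy as np

def homography_dlt(src, dst):
    """Standard four-point DLT homography estimation: each
    correspondence contributes two rows; the solution is the
    null vector of the stacked system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u*x, u*y, u])
        A.append([0, 0, 0, -x, -y, -1, v*x, v*y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Ground-truth homography of planar-motion type: in-plane rotation + translation
th = 0.3
H_true = np.array([[np.cos(th), -np.sin(th),  0.2],
                   [np.sin(th),  np.cos(th), -0.1],
                   [0.0,         0.0,         1.0]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
sh = np.hstack([src, np.ones((4, 1))]) @ H_true.T
dst = sh[:, :2] / sh[:, 2:]
H = homography_dlt(src, dst)
print(np.max(np.abs(H - H_true)))  # near zero
```

Note that the DLT has eight degrees of freedom and will happily return a homography outside the planar-motion class for noisy data, which is exactly the shortcoming the constrained solver addresses.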
Geometric Registration with Multiple Prototypes (Recalage géométrique avec plusieurs prototypes)
Projet SYNTIM. We describe a general-purpose method for the accurate and robust interpretation of a data set of p-dimensional points by several deformable prototypes. This method is based on the fusion of two algorithms: a Generalization of the Iterative Closest Point algorithm (GICP) to different types of deformations for registration purposes, and a fuzzy clustering algorithm (FCM) to handle multiple prototypes. Our method always converges monotonically to the nearest local minimum of a mean-square distance metric, and experiments show that the convergence is fast during the first few iterations. We therefore propose a scheme for choosing the initial solution so as to converge to an "interesting" local minimum. The method presented is very generic and can be applied: to shapes or objects in a p-dimensional space; to many shape patterns, such as polyhedra, quadrics, polynomial functions, and snakes; and to many possible shape deformations, such as rigid displacements, similitudes, and affine and homographic transforms. Consequently, our method has important applications in registration against an ideal model known a priori, i.e. for interpreting 2D or 3D sensed data obtained from calibrated or uncalibrated sensors. Experimental results illustrate some capabilities of our method.
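The rigid-displacement case of the registration step underlying ICP-style methods has a closed form: given tentative correspondences, the least-squares rotation and translation follow from an SVD (the Kabsch/Procrustes solution). A minimal sketch of that single alignment step, as an illustration rather than the paper's full GICP/FCM fusion:

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q
    for known correspondences (Kabsch / Procrustes); this is the inner
    alignment step that ICP-style methods iterate."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

rng = np.random.default_rng(1)
P = rng.normal(size=(30, 3))
th = 0.4
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
t_true = np.array([0.5, -0.2, 0.1])
Q = P @ R_true.T + t_true
R, t = rigid_align(P, Q)
print(np.max(np.abs(R - R_true)), np.max(np.abs(t - t_true)))
```

The full method alternates this kind of alignment (generalized to other deformation classes) with fuzzy reassignment of points to prototypes, which is what yields the monotone convergence claimed above.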
Robust SLAM and motion segmentation under long-term dynamic large occlusions
Visual sensors are key to robot perception: they not only help robot localisation but also enable robots to interact with the environment. However, in new environments, robots can fail to distinguish the static and dynamic components of the visual input. Consequently, robots are unable to track objects or localise themselves. Methods often require precise robot proprioception to compensate for camera movement and separate the static background from the visual input. However, robot proprioception, such as IMU or wheel odometry, usually suffers from drift accumulation. The state-of-the-art methods demonstrate promising performance but either (1) require semantic segmentation, which is inaccessible in unknown environments, or (2) treat dynamic components as outliers, which is infeasible when dynamic objects occupy a large proportion of the visual input.
This research work systematically unifies the camera tracking and multi-object tracking problems in indoor environments by proposing a multi-motion tracking system, enabling robots to differentiate the static and dynamic components of the visual input through an understanding of their own movements and actions. Detailed evaluation in both simulation environments and on robotic platforms suggests that the proposed method outperforms state-of-the-art dynamic SLAM methods when the majority of the camera view is occluded by multiple unmodeled objects over a long period of time.
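The static/dynamic differentiation described above can be pictured in its simplest form: once the camera's own motion is estimated, points whose displacement is explained by that motion are static, while large residuals flag independently moving objects. The toy 2D sketch below is illustrative of this principle only, not the thesis's multi-motion tracking pipeline:

```python
import numpy as np

def segment_motion(prev_pts, curr_pts, R, t, thresh=0.05):
    """Toy static/dynamic split: a point is static if its observed
    motion is explained by the estimated camera motion (R, t);
    large residuals indicate independently moving objects."""
    predicted = prev_pts @ R.T + t
    residual = np.linalg.norm(curr_pts - predicted, axis=1)
    return residual < thresh

rng = np.random.default_rng(2)
pts = rng.uniform(-1, 1, size=(20, 2))
th = 0.1
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
t = np.array([0.05, 0.0])
curr = pts @ R.T + t
curr[:5] += 0.5            # first five points move independently (dynamic)
static = segment_motion(pts, curr, R, t)
print(static.sum())  # 15 static points
```

The hard part the thesis addresses is precisely that this residual test breaks down when dynamic objects dominate the view and no reliable camera-motion estimate is available in the first place.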