
    The Quadric Reference Surface: Theory and Applications

    The conceptual component of this work is about "reference surfaces", which are the dual of the reference frames often used for shape representation purposes. The theoretical component addresses the question of whether one can find a unique (and simple) mapping that aligns two arbitrary perspective views of an opaque textured quadric surface in 3D, given (i) a few corresponding points in the two views, or (ii) the outline conic of the surface in one view (only) and a few corresponding points in the two views. The practical component is concerned with applying the theoretical results as tools for the task of achieving full correspondence between views of arbitrary objects.

    Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

    We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild, using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference-view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123; the project page is at https://guochengqian.github.io/project/magic123.
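The single trade-off parameter between the 2D and 3D priors described above can be read as a convex blend of the two guidance losses. The following is a minimal sketch of that idea; the function name and the exact weighting scheme are illustrative assumptions, not the paper's implementation.

```python
def combined_prior_loss(loss_2d, loss_3d, lam):
    """Blend 2D and 3D diffusion-prior losses with a single trade-off weight.

    lam = 0.0 relies only on the 2D prior (exploration: more imaginative
    geometry); lam = 1.0 relies only on the 3D prior (exploitation: more
    precise, view-consistent geometry). This convex combination is an
    illustrative assumption about how one scalar can interpolate the two.
    """
    return (1.0 - lam) * loss_2d + lam * loss_3d
```

Sweeping `lam` between 0 and 1 then traces out the exploration/exploitation continuum the abstract refers to.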

    Projective Invariants from Multiple Images: A Direct and Linear Method


    How to turn your camera into a perfect pinhole model

    Camera calibration is a first and fundamental step in various computer vision applications. Despite being an active field of research, Zhang's method remains widely used for camera calibration due to its implementation in popular toolboxes. However, this method initially assumes a pinhole model with oversimplified distortion models. In this work, we propose a novel approach that involves a pre-processing step to remove distortions from images by means of Gaussian processes. Our method does not need to assume any distortion model and can be applied to severely warped images, even in the case of multiple distortion sources, e.g., a fisheye image of a curved mirror reflection. The Gaussian processes capture all distortions and camera imperfections, resulting in virtual images as though taken by an ideal pinhole camera with square pixels. Furthermore, this ideal GP-camera needs only one image of a square grid calibration pattern. This model allows for a serious upgrade of many algorithms and applications that are designed in a pure projective geometry setting but whose performance is very sensitive to nonlinear lens distortions. We demonstrate the effectiveness of our method by simplifying Zhang's calibration method, reducing the number of parameters and getting rid of the distortion parameters and iterative optimization. We validate the approach on synthetic data and real-world images. The contributions of this work include the construction of a virtual ideal pinhole camera using Gaussian processes, a simplified calibration method, and lens distortion removal. (15 pages, 3 figures, conference CIAR)
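The core idea above, regressing from distorted pixel positions of a known calibration grid to their ideal pinhole positions, can be sketched with plain Gaussian process regression. The abstract does not specify the kernel or hyperparameters, so the squared-exponential kernel, the independent-GP-per-coordinate setup, and the function names below are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between point sets A (n, 2) and B (m, 2).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def fit_gp_undistortion(distorted, ideal, length_scale=1.0, noise=1e-6):
    """Fit one GP per output coordinate mapping the distorted pixel
    positions of calibration-grid corners to their ideal pinhole positions.

    A minimal sketch of the abstract's idea under assumed hyperparameters;
    the paper's actual kernel and training procedure may differ.
    """
    K = rbf_kernel(distorted, distorted, length_scale)
    alpha = np.linalg.solve(K + noise * np.eye(len(distorted)), ideal)

    def undistort(pts):
        # Posterior mean of the GP at new (distorted) pixel positions.
        return rbf_kernel(pts, distorted, length_scale) @ alpha

    return undistort
```

Once fitted on a single grid image, `undistort` can be applied to every pixel to synthesize the virtual pinhole image the abstract describes.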

    Homography-Based Positioning and Planar Motion Recovery

    Planar motion is an important and frequently occurring situation in mobile robotics applications. This thesis concerns estimation of the ego-motion and pose of a single downward-oriented camera under the assumptions of planar motion and known internal camera parameters. The so-called essential matrix (or its uncalibrated counterpart, the fundamental matrix) is frequently used in computer vision applications to compute a 3D reconstruction of the camera locations and the observed scene. However, if the observed points are expected to lie on a plane, e.g. the ground plane, determining these matrices becomes an ill-posed problem. Instead, methods based on homographies are better suited to this situation.

    One section of this thesis is concerned with the extraction of the camera pose and ego-motion from such homographies. We present both a direct SVD-based method and an iterative method that solve this problem. The iterative method is extended to allow simultaneous determination of the camera tilt from several homographies obeying the same planar motion model. This extension improves the robustness of the original method and provides consistent tilt estimates for the frames used in the estimation. The methods are evaluated in experiments on both real and synthetic data.

    Another part of the thesis deals with the problem of computing the homographies from point correspondences. Conventional homography estimation methods yield a homography of too general a class, which is not guaranteed to be compatible with the planar motion assumption. For this reason, we enforce the planar motion model at the homography estimation stage with the help of a new homography solver using a number of polynomial constraints on the entries of the homography matrix. In addition to giving a homography of the right type, this method uses only 2.5 point correspondences instead of the conventional four, which is advantageous e.g. when used in a RANSAC framework for outlier removal.
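In the idealized zero-tilt case (a calibrated camera looking straight down at the ground plane from a known height), the ground-plane homography reduces to a 2D similarity, and the planar motion can be read off in closed form. The sketch below covers only that special case; the thesis's methods also recover the camera tilt, which is not attempted here, and the function names are illustrative.

```python
import numpy as np

def planar_motion_homography(theta, tx, ty, h):
    """Ground-plane homography for a calibrated, perfectly downward camera
    at height h undergoing planar motion: rotation theta about the vertical
    axis plus translation (tx, ty). In this zero-tilt case the homography
    is a 2D similarity; the general tilted case handled in the thesis is
    not covered by this sketch."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx / h],
                     [s,  c, ty / h],
                     [0.0, 0.0, 1.0]])

def recover_planar_motion(H, h):
    # Invert the construction above: the angle comes from the rotation
    # block, the translation from the last column scaled by camera height.
    H = H / H[2, 2]  # homographies are defined up to scale
    theta = np.arctan2(H[1, 0], H[0, 0])
    return theta, H[0, 2] * h, H[1, 2] * h
```

Because the homography is only defined up to scale, `recover_planar_motion` first normalizes by the bottom-right entry before reading off the parameters.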

    Geometric Registration with Multiple Prototypes

    Projet SYNTIM. We describe a general-purpose method for the accurate and robust interpretation of a data set of p-dimensional points by several deformable prototypes. This method is based on the fusion of two algorithms: a Generalization of the Iterative Closest Point algorithm (GICP) to different types of deformations for registration purposes, and a fuzzy clustering algorithm (FCM). Our method always converges monotonically to the nearest local minimum of a mean-square distance metric, and experiments show that convergence is fast during the first few iterations. We therefore propose a scheme for choosing the initial solution so as to converge to an "interesting" local minimum. The method is very generic and can be applied to shapes or objects in a p-dimensional space; to many shape patterns such as polyhedra, quadrics, polynomial functions, and snakes; and to many possible shape deformations such as rigid displacements, similarities, affine and homographic transforms. Consequently, our method has important applications in registration against an ideal model known a priori, i.e. for interpreting 2D or 3D sensed data obtained from calibrated or uncalibrated sensors. Experimental results illustrate some capabilities of our method.
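The GICP/FCM fusion described above alternates a fuzzy membership update with a per-prototype registration step. Below is a minimal numpy sketch of those two ingredients, restricted to 2D rigid transforms estimated by weighted Procrustes; the function names and this particular decomposition are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def fcm_memberships(dists, m=2.0):
    """Standard fuzzy c-means memberships from the (n_points, n_prototypes)
    matrix of point-to-prototype distances, with fuzzifier m > 1."""
    inv = 1.0 / np.maximum(dists, 1e-12) ** (2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def weighted_rigid_fit(src, dst, w):
    """Weighted least-squares rigid transform (R, t) aligning src to dst:
    the registration step for one prototype, with w holding that
    prototype's fuzzy membership weights (a weighted-Procrustes sketch)."""
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(0)     # weighted centroids
    mu_d = (w[:, None] * dst).sum(0)
    S = (src - mu_s).T @ ((dst - mu_d) * w[:, None])
    U, _, Vt = np.linalg.svd(S)
    # Reflection guard: force det(R) = +1 so R is a proper rotation.
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s
```

In the full alternation, each iteration would recompute point-to-prototype distances, refresh the memberships, and re-fit each prototype's transform with its own weight column, which is what yields the monotone decrease of the weighted mean-square cost.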

    Robust SLAM and motion segmentation under long-term dynamic large occlusions

    Visual sensors are key to robot perception: they not only help robot localisation but also enable robots to interact with the environment. However, in new environments, robots can fail to distinguish the static and dynamic components in the visual input, and consequently are unable to track objects or localise themselves. Methods often require precise robot proprioception to compensate for camera movement and separate the static background from the visual input. However, robot proprioception, such as IMU or wheel odometry, usually suffers from drift accumulation. State-of-the-art methods demonstrate promising performance but either (1) require semantic segmentation, which is inaccessible in unknown environments, or (2) treat dynamic components as outliers, which is infeasible when dynamic objects occupy a large proportion of the visual input. This research work systematically unifies the camera and multi-object tracking problems in indoor environments by proposing a multi-motion tracking system, enabling robots to differentiate the static and dynamic components in the visual input based on an understanding of their own movements and actions. Detailed evaluation in both simulation environments and on robotic platforms suggests that the proposed method outperforms state-of-the-art dynamic SLAM methods when the majority of the camera view is occluded by multiple unmodeled objects over a long period of time.