5,784 research outputs found

    Robust Multiple-View Geometry Estimation Based on GMM

    Get PDF
    Given three partially overlapping views of a scene from which a set of point or line correspondences has been extracted, the 3D structure and camera motion parameters can be represented by the trifocal tensor, which is key to many three-view problems in computer vision. Unlike conventional methods, in which the residual value is the only criterion for rejecting outliers with large residuals, we build a Gaussian mixture model (GMM) under the assumption that the residuals of the inliers come from Gaussian distributions different from that of the residuals of the outliers. The Bayesian rule of minimal risk is then employed to classify all the correspondences using the parameters estimated from the GMM. Experiments with both synthetic data and real images show that our method is more robust and precise than other typical methods because it efficiently detects and removes bad correspondences, including both badly located points and false matches.
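The residual-classification idea in this abstract can be sketched in a few lines: fit a two-component Gaussian mixture to the residuals by EM and label each correspondence with its maximum-posterior component (the minimum-risk decision under 0-1 loss). This is a toy numpy sketch on synthetic residuals, not the authors' implementation; the distributions, seed, and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic residuals: inliers ~ N(0, 0.1), outliers ~ N(3, 1) (hypothetical data)
r = np.concatenate([rng.normal(0.0, 0.1, 80), rng.normal(3.0, 1.0, 20)])

# EM for a two-component 1-D Gaussian mixture over the residuals
mu = np.array([r.min(), r.max()])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: posterior responsibility of each component for each residual
    p = pi * np.exp(-0.5 * (r[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    g = p / p.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixture weights, means, and variances
    n = g.sum(axis=0)
    pi, mu = n / len(r), (g * r[:, None]).sum(axis=0) / n
    var = (g * (r[:, None] - mu) ** 2).sum(axis=0) / n

# maximum-posterior classification; inliers belong to the low-mean component
inlier = g.argmax(axis=1) == mu.argmin()
```

With well-separated residual populations, the posterior split recovers the inlier set far more reliably than a single residual threshold.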

    Multiple View Geometry For Video Analysis And Post-production

    Get PDF
    Multiple view geometry is the foundation of an important class of computer vision techniques for the simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area, including video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, the topic addressed in this dissertation, computer analysis of the camera motion can replace the manual methods currently used for correctly aligning an artificially inserted object in a scene. However, existing single-view methods typically require multiple vanishing points and therefore fail when only one vanishing point is available. In addition, current multiple-view techniques, which make use of either epipolar geometry or the trifocal tensor, do not fully exploit the properties of constant or known camera motion. Finally, there is no general solution to the problem of synchronizing N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is a necessary step for video post-production across different input videos. This dissertation proposes several advancements that overcome these limitations and uses them to develop an efficient framework for video analysis and post-production with multiple cameras. In the first part of the dissertation, novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state of the art in single-view geometry to situations where only one vanishing point is available. The property of constant or known camera motion is also exploited for applications such as the calibration of a network of cameras in video surveillance systems and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus.
We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking, and zooming) and complex (e.g., hand-held) camera motions. The accuracy of these results is demonstrated by applying our approach to video post-production tasks such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of the camera geometry, the position and orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data.

    Multiple View Geometry Transformers for 3D Human Pose Estimation

    Full text link
    In this work, we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation. Recent works have focused on end-to-end learning-based Transformer designs, which struggle to resolve geometric information accurately, particularly under occlusion. Instead, we propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in an iterative manner. The geometry modules are learning-free and handle all viewpoint-dependent 3D tasks geometrically, which notably improves the model's generalization ability. The appearance modules are learnable and are dedicated to estimating 2D poses from image signals end-to-end, which enables them to achieve accurate estimates even when occlusion occurs, leading to a model that is both accurate and generalizable to new cameras and geometries. We evaluate our approach in both in-domain and out-of-domain settings, where our model consistently outperforms state-of-the-art methods, especially by a significant margin in the out-of-domain setting. We will release the code and models at https://github.com/XunshanMan/MVGFormer. Comment: 14 pages, 8 figures.

    SuperPoint: Self-Supervised Interest Point Detection and Description

    Full text link
    This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT, and ORB. Comment: camera-ready version for the CVPR 2018 Deep Learning for Visual SLAM Workshop (DL4VSLAM2018).
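The core of Homographic Adaptation is simple to sketch: run the detector on many homography-warped copies of an image, unwarp the response maps, and aggregate them. The following numpy toy uses a gradient-magnitude stand-in for the detector and nearest-neighbour warping; it illustrates only the aggregation loop, not SuperPoint's actual network or homography sampling scheme, and every numeric choice here is an assumption.

```python
import numpy as np

def warp_nearest(img, Hmat):
    """Warp img by homography Hmat via inverse nearest-neighbour mapping."""
    h, w = img.shape
    vs, us = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pts = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])
    src = np.linalg.inv(Hmat) @ pts  # source coords for each output pixel
    su = np.round(src[0] / src[2]).astype(int)
    sv = np.round(src[1] / src[2]).astype(int)
    ok = (su >= 0) & (su < w) & (sv >= 0) & (sv < h)
    out = np.zeros(h * w)
    out[ok] = img[sv[ok], su[ok]]
    return out.reshape(h, w)

def detect(im):
    gy, gx = np.gradient(im)
    return np.hypot(gx, gy)  # toy "interest point" response

rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0  # synthetic image: one bright square

acc = detect(img)  # response on the unwarped image
N = 16             # number of sampled homographies (toy choice)
for _ in range(N):
    Hm = np.eye(3)
    Hm[:2, :2] += rng.normal(0.0, 0.05, (2, 2))  # mild shear/rotation
    Hm[:2, 2] = rng.normal(0.0, 3.0, 2)          # small translation
    resp = detect(warp_nearest(img, Hm))
    acc += warp_nearest(resp, np.linalg.inv(Hm))  # map response back
heatmap = acc / (N + 1)  # aggregated, more repeatable detection map
```

Responses that survive many warps accumulate in `heatmap`, which is the repeatability-boosting effect the abstract describes.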

    A Light Touch Approach to Teaching Transformers Multi-view Geometry

    Full text link
    Transformers are powerful visual learners, in large part due to their conspicuous lack of manually specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility) and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines, since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval without needing pose information at test time.
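The epipolar guidance can be sketched directly: given the fundamental matrix between the query and target views, each query pixel induces a line l = F x in the target image, and attention mass falling far from that line is penalized. A minimal numpy illustration follows; the fundamental matrix, grid stride, bandwidth, and soft-mask form are all assumptions made for the sketch, and the paper applies this idea as a training-time signal on the Transformer's cross-attention maps rather than as the standalone loss below.

```python
import numpy as np

# hypothetical fundamental matrix between the query and target views
F = np.array([[0.0, -1e-3, 1e-2],
              [1e-3,  0.0, -2e-2],
              [-1e-2, 3e-2,  1.0]])
x = np.array([120.0, 80.0, 1.0])  # a query pixel (homogeneous coordinates)

l = F @ x  # epipolar line: l[0]*u + l[1]*v + l[2] = 0 in the target image
H, W = 60, 80  # attention-map resolution (assumed downsampled feature grid)
vs, us = np.meshgrid(np.arange(H) * 4.0, np.arange(W) * 4.0, indexing="ij")
# distance of every attention location from the epipolar line
d = np.abs(l[0] * us + l[1] * vs + l[2]) / np.hypot(l[0], l[1])
mask = np.exp(-0.5 * (d / 8.0) ** 2)  # soft band around the line (8 px, assumed)
attn = np.random.default_rng(0).random((H, W))  # stand-in cross-attention map
loss = (attn * (1.0 - mask)).mean()  # penalizes attention mass off the line
```

Minimizing such a loss pushes attention toward the epipolar band, where geometrically plausible matches lie.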

    A minimalistic approach to appearance-based visual SLAM

    Get PDF
    This paper presents a vision-based approach to SLAM in indoor/outdoor environments with minimalistic sensing and computational requirements. The approach is based on a graph representation of robot poses, using a relaxation algorithm to obtain a globally consistent map. Each link corresponds to a relative measurement of the spatial relation between the two nodes it connects, and describes the likelihood distribution of the relative pose as a Gaussian distribution. To estimate the covariance matrix for links obtained from an omni-directional vision sensor, a novel method is introduced based on the relative similarity of neighbouring images. This new method does not require determining distances to image features using, for example, multiple view geometry. Combined indoor and outdoor experiments demonstrate that the approach can handle qualitatively different environments (without modification of the parameters), that it can cope with violations of the "flat floor assumption" to some degree, and that it scales well with increasing size of the environment, producing topologically correct and geometrically accurate maps at low computational cost. Further experiments demonstrate that the approach is also suitable for combining multiple overlapping maps, e.g. for solving the multi-robot SLAM problem with unknown initial poses.
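The graph relaxation idea can be illustrated in one dimension: each link stores a Gaussian relative measurement, and repeatedly moving every node to the information-weighted mean implied by its links converges to a globally consistent estimate. A toy numpy sketch with 1-D poses and made-up measurements; the paper's method operates on 2D/3D poses with covariances estimated from image similarity, which this sketch does not attempt.

```python
import numpy as np

# toy 1-D pose graph; links: (i, j, measured offset pose_j - pose_i, variance)
links = [(0, 1, 1.1, 0.1), (1, 2, 0.9, 0.1), (2, 3, 1.2, 0.1),
         (0, 3, 3.0, 0.05)]  # last link plays the role of a loop closure

x = np.zeros(4)  # initial pose estimates; node 0 is fixed at the origin
for _ in range(200):  # Gauss-Seidel-style relaxation
    for k in range(1, 4):
        num, den = 0.0, 0.0
        for i, j, z, var in links:
            if j == k:    # link predicts pose_k = pose_i + z
                num += (x[i] + z) / var
                den += 1.0 / var
            elif i == k:  # link predicts pose_k = pose_j - z
                num += (x[j] - z) / var
                den += 1.0 / var
        x[k] = num / den  # move node to the information-weighted mean
```

The odometry chain alone would place node 3 at 3.2; the tighter loop-closure link pulls the relaxed estimate back toward 3.0, which is exactly the global-consistency effect the abstract describes.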

    Why do we optimize what we optimize in multiple view geometry?

    Get PDF
    For a computer to understand the 3D geometry of its environment, we need to derive the geometric relationships between 2D images and the 3D world. Multiple view geometry is the research area that studies this problem. Most existing methods solve small parts of this large problem by minimizing a particular objective function, usually composed of algebraic or geometric errors that represent deviations from the observation model. In short, we generally try to recover the 3D structure of the world and the motion of the camera by finding the model that minimizes the discrepancy with respect to the observations. This thesis focuses mainly on two aspects of multi-view reconstruction problems: error criteria and robustness. First, we study the error criteria used in various geometric problems and ask "Why do we optimize what we optimize?" Specifically, we analyze their pros and cons and propose novel methods that either combine existing criteria or adopt a better alternative. Second, we aim for state-of-the-art robustness against outliers and challenging scenarios, which are often encountered in practice. To this end, we propose multiple novel ideas that can be incorporated into optimization-based methods. Specifically, we study the following problems: monocular SLAM, two-view and multiple-view triangulation, single and multiple rotation averaging, rotation-only bundle adjustment, robust averaging of numbers, and quantitative evaluation of trajectory estimation.
For monocular SLAM, we propose a novel hybrid approach that combines the strengths of direct and feature-based methods. Direct methods minimize the photometric errors between corresponding pixels across images, while feature-based methods minimize the reprojection errors. Our method loosely couples direct odometry with feature-based SLAM, and we show that it improves robustness in challenging scenarios, as well as accuracy when the camera motion involves frequent revisits. For two-view triangulation, we propose optimal methods that minimize the angular reprojection errors in closed form. Since the angular error is rotationally invariant, these methods can be used for perspective, fisheye, or omnidirectional cameras; moreover, they are much faster than the existing optimal methods in the literature. Another two-view triangulation method we propose takes a completely different approach: we slightly modify the classical midpoint method and show that it provides a superior balance of 2D and 3D accuracy, even though it is not optimal. For multiple-view triangulation, we propose a robust and efficient method based on two-view RANSAC. We present several early termination criteria for two-view RANSAC using the midpoint method and show that they improve efficiency when the outlier ratio is high. In addition, we show that the uncertainty of a triangulated point can be modeled as a function of three factors: the number of cameras, the mean reprojection error, and the maximum parallax angle. By learning this model, the uncertainty can be interpolated for each individual case. For single rotation averaging, we propose a robust method based on the Weiszfeld algorithm. The main idea is to start from a robust initialization and to perform an implicit outlier rejection scheme inside the Weiszfeld algorithm to further increase robustness. Furthermore, we use an approximation of the chordal median in SO(3) that yields a significant speed-up of the method.
For multiple rotation averaging, we propose HARA, a novel approach that incrementally initializes the rotation graph based on a hierarchy of triplet compatibility. Essentially, we build a spanning tree by prioritizing the edges with many strong triplet supports and gradually adding those with fewer and weaker supports. As a result, we reduce the risk of adding outliers to the initial solution, which allows us to filter out outliers prior to nonlinear optimization. Moreover, we show that the results can be improved using the smoothed L0+ function in the local refinement step. Next, we propose rotation-only bundle adjustment, a novel method for estimating the absolute rotations of multiple views independently of the translations and the scene structure. The key is to minimize a specially designed cost function based on the normalized epipolar error, which is closely related to the optimal L1 angular reprojection error, among other geometric quantities. Our approach brings multiple benefits, such as complete immunity to inaccurate translations and triangulations, robustness against pure rotations and planar scenes, and improved accuracy when used after the rotation averaging step described above. We also propose RODIAN, a robust method for averaging a set of numbers contaminated by a large proportion of outliers. In our method, we assume that the outliers are uniformly distributed within the range of the data, and we search for the region that is least likely to contain only outliers; we then take the median of the data within this region. Our method is fast, robust, and deterministic, and it does not rely on a known inlier error bound. Finally, for quantitative trajectory evaluation, we point out a weakness of the commonly used Absolute Trajectory Error (ATE) and propose a novel alternative called the Discernible Trajectory Error (DTE). In the presence of even a few outliers, the ATE loses its sensitivity both to the trajectory error of the inliers and to the number of outliers. The DTE overcomes this weakness by aligning the estimated trajectory with the ground truth using a robust method based on several different types of medians. Using similar ideas, we also propose a rotation-only metric called the Discernible Rotation Error (DRE). In addition, we propose a simple method for calibrating the camera-to-marker rotation, which is a prerequisite for computing the DTE and the DRE.
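The single rotation averaging method in this thesis builds on the Weiszfeld algorithm, the classic fixed-point iteration for the geometric (L1) median; the SO(3) variant applies the same iteration in the tangent space via logarithm and exponential maps. A minimal Euclidean sketch of the base iteration, with illustrative data and iteration count only:

```python
import numpy as np

def weiszfeld(points, iters=100, eps=1e-9):
    """Geometric (L1) median of row-vector points by Weiszfeld iteration."""
    y = points.mean(axis=0)  # centroid start; the thesis instead advocates
                             # a robust initialization for extra robustness
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(d, eps)  # inverse-distance weights
        y = (w[:, None] * points).sum(axis=0) / w.sum()
    return y

# three clustered samples and one gross outlier
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
m = weiszfeld(pts)  # stays near the cluster, unlike the mean
```

The inverse-distance reweighting is what makes the estimate robust: far-away samples contribute with bounded influence, which is why the median, not the mean, is the natural building block for robust rotation averaging.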

    New Rank Deficiency Conditions for Multiple View Geometry of Point Features

    Get PDF
    In this paper, a new rank deficiency condition for multiple images of a point is presented. It is shown that a set of m images corresponds to a unique 3-D pre-image point if and only if an associated 3(m-1) x 2 matrix, the so-called H matrix, has rank at most 1. If the rank is 0 (i.e., the matrix is zero), then the pre-image is only determined up to a line on which all the camera centers must lie. This condition is shown to be equivalent to all the multilinear constraints, but it tremendously simplifies the derivation and proof of all the algebraic relationships among bilinear, trilinear, and quadrilinear constraints. Since rank deficiency is a purely linear algebraic condition, it gives rise to a set of natural linear algorithms for purposes such as matching feature points, mapping images to a new view, and motion estimation from images of multiple points. These linear algorithms use all available data simultaneously without specifying a particular choice of pairwise, triple-wise, or quadruple-wise images. Hence, to a large extent, such algorithms allow us to bypass the use of trifocal or quadrifocal tensors for similar purposes. The proposed rank deficiency condition is believed to be a more concise and universal way of describing the algebraic relationship among multiple images. Although only point features are discussed in this paper, a similar condition is studied in a companion paper for line features, as well as how the duality between points and lines is reflected in such rank deficiency conditions. ONR / N00014-00-1-0621
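The rank condition lends itself to a quick numerical check: for a point seen in m views with relative motions (R_i, T_i) from the first view, stack the 3x2 blocks [x̂_i R_i x_1, x̂_i T_i] for i = 2..m (x̂ denotes the skew-symmetric cross-product matrix) and inspect the singular values. A numpy sketch with a synthetic point and hypothetical camera motions; this follows one standard form of the multiple-view matrix for points, and the specific cameras and threshold are assumptions.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rodrigues(axis, angle):
    """Rotation matrix from an axis-angle via the Rodrigues formula."""
    K = skew(axis / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

rng = np.random.default_rng(0)
X = np.array([0.3, -0.2, 4.0])  # 3D point in the first camera frame
x1 = X / X[2]                   # its normalized image in view 1

blocks = []
for i in (1, 2, 3):             # three additional views (m = 4)
    R = rodrigues(rng.normal(size=3), 0.1 * i)  # hypothetical motions
    T = 0.5 * rng.normal(size=3)
    Xi = R @ X + T
    xi = Xi / Xi[2]             # image of the same point in view i+1
    blocks.append(np.column_stack([skew(xi) @ R @ x1, skew(xi) @ T]))
M = np.vstack(blocks)  # 3(m-1) x 2 matrix; rank <= 1 iff the images
                       # share a unique 3-D pre-image point
s = np.linalg.svd(M, compute_uv=False)
rank = int((s > 1e-9 * s[0]).sum())
```

The depth of the point in the first view, together with a trailing 1, spans the null space of M, which is why consistent correspondences force the rank down to 1.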

    Theorems and algorithms for multiple view geometry with applications to electron tomography

    Get PDF
    The thesis considers both theory and algorithms for geometric computer vision. The framework of the work is built around the application of autonomous transmission electron microscope image registration. The theoretical part of the thesis first develops a consistent robust estimator that is evaluated by estimating two-view geometry with both affine and projective camera models. The uncertainty of the fundamental matrix is likewise estimated robustly, and the earlier observation that the covariance matrix of the fundamental matrix contains disparity information about the scene is explained and its utilization in matching is discussed. For point tracking, a reliable wavelet-based matching technique and two EM algorithms for maximum likelihood affine reconstruction under missing data are proposed. The thesis additionally discusses the identification of degeneracy as well as affine bundle adjustment. The application part of the thesis considers transmission electron microscope image registration, first with fiducial gold markers and thereafter without markers. Both methods utilize the techniques proposed in the theoretical part of the thesis and, in addition, a graph matching method is proposed for matching gold markers. Alignment without markers, in contrast, is performed by tracking interest points on the intensity surface of the images. At the present level of development, the former method is more accurate, but the latter is appropriate for situations where fiducial markers cannot be used. Perhaps the most significant result of the thesis is the proposed robust estimator, because of its consistency proof and its many application areas, which are not limited to computer vision. The other algorithms may be useful in multiple-view applications in computer vision that have to deal with uncertainty, matching, tracking, and reconstruction.
From the viewpoint of image registration, the thesis further achieves its aims, since two accurate image alignment methods are proposed for obtaining the most exact reconstructions in electron tomography.