13 research outputs found

    Geometric and Algebraic Aspects of 3D Affine and Projective Structures from Perspective 2D Views

    Get PDF
    We investigate the differences --- conceptually and algorithmically --- between affine and projective frameworks for the tasks of visual recognition and reconstruction from perspective views. It is shown that an affine invariant exists between any view and a fixed view chosen as a reference view. This implies that for tasks for which a reference view can be chosen, such as in alignment schemes for visual recognition, projective invariants are not really necessary. We then use the affine invariant to derive new algebraic connections between perspective views. It is shown that three perspective views of an object are connected by certain algebraic functions of image coordinates alone (no structure or camera geometry needs to be involved)

    An Efficient Solution to the Homography-Based Relative Pose Problem With a Common Reference Direction

    Get PDF
    International audienceIn this paper, we propose a novel approach to two-view minimal-case relative pose problems based on homography with a common reference direction. We explore the rank-1 constraint on the difference between the Euclidean homog-raphy matrix and the corresponding rotation, and propose an efficient two-step solution for solving both the calibrated and partially calibrated (unknown focal length) problems. We derive new 3.5-point, 3.5-point, 4-point solvers for two cameras such that the two focal lengths are unknown but equal, one of them is unknown, and both are unknown and possibly different, respectively. We present detailed analyses and comparisons with existing 6-and 7-point solvers, including results with smart phone images

    THREE-DIMENSIONAL VISION FOR STRUCTURE AND MOTION ESTIMATION

    Get PDF
    1997/1998Questa tesi, intitolata Visione Tridimensionale per la stima di Struttura e Moto, tratta di tecniche di Visione Artificiale per la stima delle proprietà geometriche del mondo tridimensionale a partire da immagini numeriche. Queste proprietà sono essenziali per il riconoscimento e la classificazione di oggetti, la navigazione di veicoli mobili autonomi, il reverse engineering e la sintesi di ambienti virtuali. In particolare, saranno descritti i moduli coinvolti nel calcolo della struttura della scena a partire dalle immagini, e verranno presentati contributi originali nei seguenti campi. Rettificazione di immagini steroscopiche. Viene presentato un nuovo algoritmo per la rettificazione, il quale trasforma una coppia di immagini stereoscopiche in maniera che punti corrispondenti giacciano su linee orizzontali con lo stesso indice. Prove sperimentali dimostrano il corretto comportamento del metodo, come pure la trascurabile perdita di accuratezza nella ricostruzione tridimensionale quando questa sia ottenuta direttamente dalle immagini rettificate. Calcolo delle corrispondenze in immagini stereoscopiche. Viene analizzato il problema della stereovisione e viene presentato un un nuovo ed efficiente algoritmo per l'identificazione di coppie di punti corrispondenti, capace di calcolare in modo robusto la disparità stereoscopica anche in presenza di occlusioni. L'algoritmo, chiamato SMW, usa uno schema multi-finestra adattativo assieme al controllo di coerenza destra-sinistra per calcolare la disparità e l'incertezza associata. Gli esperimenti condotti con immagini sintetiche e reali mostrano che SMW sortisce un miglioramento in accuratezza ed efficienza rispetto a metodi simili Inseguimento di punti salienti. L'inseguitore di punti salienti di Shi-Tomasi- Kanade viene migliorato introducendo uno schema automatico per lo scarto di punti spuri basato sulla diagnostica robusta dei campioni periferici ( outliers ). Gli esperimenti con immagini sintetiche e reali confermano il miglioramento rispetto al metodo originale, sia qualitativamente che quantitativamente. Ricostruzione non calibrata. Viene presentata una rassegna ragionata dei metodi per la ricostruzione di un modello tridimensionale della scena, a partire da una telecamera che si muove liberamente e di cui non sono noti i parametri interni. Il contributo consiste nel fornire una visione critica e unificata delle più recenti tecniche. Una tale rassegna non esiste ancora in letterarura. Moto tridimensionale. Viene proposto un algoritmo robusto per registrate e calcolare le corrispondenze in due insiemi di punti tridimensionali nei quali vi sia un numero significativo di elementi mancanti. Il metodo, chiamato RICP, sfrutta la stima robusta con la Minima Mediana dei Quadrati per eliminare l'effetto dei campioni periferici. Il confronto sperimentale con una tecnica simile, ICP, mostra la superiore robustezza e affidabilità di RICP.This thesis addresses computer vision techniques estimating geometrie properties of the 3-D world /rom digital images. Such properties are essential for object recognition and classification, mobile robots navigation, reverse engineering and synthesis of virtual environments. In particular, this thesis describes the modules involved in the computation of the structure of a scene given some images, and offers original contributions in the following fields. Stereo pairs rectification. A novel rectification algorithm is presented, which transform a stereo pair in such a way that corresponding points in the two images lie on horizontal lines with the same index. Experimental tests prove the correct behavior of the method, as well as the negligible decrease oLthe accuracy of 3-D reconstruction if performed from the rectified images directly. Stereo matching. The problem of computational stereopsis is analyzed, and a new, efficient stereo matching algorithm addressing robust disparity estimation in the presence of occlusions is presented. The algorithm, called SMW, is an adaptive, multi-window scheme using left-right consistency to compute disparity and its associated uncertainty. Experiments with both synthetic and real stereo pairs show how SMW improves on closely related techniques for both accuracy and efficiency. Features tracking. The Shi-Tomasi-Kanade feature tracker is improved by introducing an automatic scheme for rejecting spurious features, based on robust outlier diagnostics. Experiments with real and synthetic images confirm the improvement over the original tracker, both qualitatively and quantitatively. 111 Uncalibrated vision. A review on techniques for computing a three-dimensional model of a scene from a single moving camera, with unconstrained motion and unknown parameters is presented. The contribution is to give a critical, unified view of some of the most promising techniques. Such review does not yet exist in the literature. 3-D motion. A robust algorithm for registering and finding correspondences in two sets of 3-D points with significant percentages of missing data is proposed. The method, called RICP, exploits LMedS robust estimation to withstand the effect of outliers. Experimental comparison with a closely related technique, ICP, shows RICP's superior robustness and reliability.XI Ciclo1968Versione digitalizzata della tesi di dottorato cartacea

    A Model-Selection Framework for Multibody Structure-and-Motion of Image Sequences

    Get PDF
    Given an image sequence of a scene consisting of multiple rigidly moving objects, multi-body structure-and-motion (MSaM) is the task to segment the image feature tracks into the different rigid objects and compute the multiple-view geometry of each object. We present a framework for multibody structure-and-motion based on model selection. In a recover-and-select procedure, a redundant set of hypothetical scene motions is generated. Each subset of this pool of motion candidates is regarded as a possible explanation of the image feature tracks, and the most likely explanation is selected with model selection. The framework is generic and can be used with any parametric camera model, or with a combination of different models. It can deal with sets of correspondences, which change over time, and it is robust to realistic amounts of outliers. The framework is demonstrated for different camera and scene model

    Using Geometric Constraints for Camera Calibration and Positioning and 3D Scene Modelling

    Get PDF
    International audienceThis work concerns the incorporation of geometric information in camera calibration and 3D modelling. Using geometric constraints enables stabler results and allows to perform tasks with fewer images. Our approach is interactive; the user defines geometric primitives and constraints between them. It is based on the observation that constraints such as coplanarity, parallelism or orthogonality, are easy to delineate by a user, and are well adapted to model the main structure of e.g. architectural scenes. We propose methods for camera calibration, camera position estimation and 3D scene reconstruction, all based on such geometric constraints. Various approaches exist for calibration and positioning from constraints, often based on vanishing points. We generalize this by considering composite primitives based on triplets of vanishing points. These are frequent in architectural scenes and considering composites of vanishing points makes computations more stable. They are defined by depicting in the images points belonging to parallelepipedic structures (e.g. appropriate points on two connected walls). Constraints on angles or length ratios on these structures can then be easily imposed. A method is proposed that "collects" all these data for all considered images, and computes simultaneously the calibration and pose of all cameras via matrix factorization. 3D scene reconstruction is then performed using many more geometric constraints, i.e. not only those encapsulated by parallelepipedic structures. A method is proposed that reconstructs the whole scene in iterations, solving a linear equation system at each iteration, and which includes an analysis of the parts of the scene that can/cannot be reconstructed at the current stage. The complete approach is validated by various experimental results, for cases where a single or several views are available

    View-Invariance in Visual Human Motion Analysis

    Get PDF
    This thesis makes contributions towards the solutions to two problems in the area of visual human motion analysis: human action recognition and human body pose estimation. Although there has been a substantial amount of research addressing these two problems in the past, the important issue of viewpoint invariance in the representation and recognition of poses and actions has received relatively scarce attention, and forms a key goal of this thesis. Drawing on results from 2D projective invariance theory and 3D mutual invariants, we present three different approaches of varying degrees of generality, for human action representation and recognition. A detailed analysis of the approaches reveals key challenges, which are circumvented by enforcing spatial and temporal coherency constraints. An extensive performance evaluation of the approaches on 2D projections of motion capture data and manually segmented real image sequences demonstrates that in addition to viewpoint changes, the approaches are able to handle well, varying speeds of execution of actions (and hence different frame rates of the video), different subjects and minor variabilities in the spatiotemporal dynamics of the action. Next, we present a method for recovering the body-centric coordinates of key joints and parts of a canonically scaled human body, given an image of the body and the point correspondences of specific body joints in an image. This problem is difficult to solve because of body articulation and perspective effects. To make the problem tractable, previous researchers have resorted to restricting the camera model or requiring an unrealistic number of point correspondences, both of which are more restrictive than necessary. We present a solution for the general case of a perspective uncalibrated camera. Our method requires that the torso does not twist considerably, an assumption that is usually satisfied for many poses of the body. We evaluate the quantitative performance of the method on synthetic data and the qualitative performance of the method on real images taken with unknown cameras and viewpoints. Both these evaluations show the effectiveness of the method at recovering the pose of the human body

    Visual guidance of unmanned aerial manipulators

    Get PDF
    The ability to fly has greatly expanded the possibilities for robots to perform surveillance, inspection or map generation tasks. Yet it was only in recent years that research in aerial robotics was mature enough to allow active interactions with the environment. The robots responsible for these interactions are called aerial manipulators and usually combine a multirotor platform and one or more robotic arms. The main objective of this thesis is to formalize the concept of aerial manipulator and present guidance methods, using visual information, to provide them with autonomous functionalities. A key competence to control an aerial manipulator is the ability to localize it in the environment. Traditionally, this localization has required external infrastructure of sensors (e.g., GPS or IR cameras), restricting the real applications. Furthermore, localization methods with on-board sensors, exported from other robotics fields such as simultaneous localization and mapping (SLAM), require large computational units becoming a handicap in vehicles where size, load, and power consumption are important restrictions. In this regard, this thesis proposes a method to estimate the state of the vehicle (i.e., position, orientation, velocity and acceleration) by means of on-board, low-cost, light-weight and high-rate sensors. With the physical complexity of these robots, it is required to use advanced control techniques during navigation. Thanks to their redundancy on degrees-of-freedom, they offer the possibility to accomplish not only with mobility requirements but with other tasks simultaneously and hierarchically, prioritizing them depending on their impact to the overall mission success. In this work we present such control laws and define a number of these tasks to drive the vehicle using visual information, guarantee the robot integrity during flight, and improve the platform stability or increase arm operability. The main contributions of this research work are threefold: (1) Present a localization technique to allow autonomous navigation, this method is specifically designed for aerial platforms with size, load and computational burden restrictions. (2) Obtain control commands to drive the vehicle using visual information (visual servo). (3) Integrate the visual servo commands into a hierarchical control law by exploiting the redundancy of the robot to accomplish secondary tasks during flight. These tasks are specific for aerial manipulators and they are also provided. All the techniques presented in this document have been validated throughout extensive experimentation with real robotic platforms.La capacitat de volar ha incrementat molt les possibilitats dels robots per a realitzar tasques de vigilància, inspecció o generació de mapes. Tot i això, no és fins fa pocs anys que la recerca en robòtica aèria ha estat prou madura com per començar a permetre interaccions amb l’entorn d’una manera activa. Els robots per a fer-ho s’anomenen manipuladors aeris i habitualment combinen una plataforma multirotor i un braç robòtic. L’objectiu d’aquesta tesi és formalitzar el concepte de manipulador aeri i presentar mètodes de guiatge, utilitzant informació visual, per dotar d’autonomia aquest tipus de vehicles. Una competència clau per controlar un manipulador aeri és la capacitat de localitzar-se en l’entorn. Tradicionalment aquesta localització ha requerit d’infraestructura sensorial externa (GPS, càmeres IR, etc.), limitant així les aplicacions reals. Pel contrari, sistemes de localització exportats d’altres camps de la robòtica basats en sensors a bord, com per exemple mètodes de localització i mapejat simultànis (SLAM), requereixen de gran capacitat de còmput, característica que penalitza molt en vehicles on la mida, pes i consum elèctric son grans restriccions. En aquest sentit, aquesta tesi proposa un mètode d’estimació d’estat del robot (posició, velocitat, orientació i acceleració) a partir de sensors instal·lats a bord, de baix cost, baix consum computacional i que proporcionen mesures a alta freqüència. Degut a la complexitat física d’aquests robots, és necessari l’ús de tècniques de control avançades. Gràcies a la seva redundància de graus de llibertat, aquests robots ens ofereixen la possibilitat de complir amb els requeriments de mobilitat i, simultàniament, realitzar tasques de manera jeràrquica, ordenant-les segons l’impacte en l’acompliment de la missió. En aquest treball es presenten aquestes lleis de control, juntament amb la descripció de tasques per tal de guiar visualment el vehicle, garantir la integritat del robot durant el vol, millorar de l’estabilitat del vehicle o augmentar la manipulabilitat del braç. Aquesta tesi es centra en tres aspectes fonamentals: (1) Presentar una tècnica de localització per dotar d’autonomia el robot. Aquest mètode està especialment dissenyat per a plataformes amb restriccions de capacitat computacional, mida i pes. (2) Obtenir les comandes de control necessàries per guiar el vehicle a partir d’informació visual. (3) Integrar aquestes accions dins una estructura de control jeràrquica utilitzant la redundància del robot per complir altres tasques durant el vol. Aquestes tasques son específiques per a manipuladors aeris i també es defineixen en aquest document. Totes les tècniques presentades en aquesta tesi han estat avaluades de manera experimental amb plataformes robòtiques real

    Multiple View Geometry For Video Analysis And Post-production

    Get PDF
    Multiple view geometry is the foundation of an important class of computer vision techniques for simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area. Examples include video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, which is the topic being addressed in this dissertation, computer analysis of the motion of the camera can replace the currently used manual methods for correctly aligning an artificially inserted object in a scene. However, existing single view methods typically require multiple vanishing points, and therefore would fail when only one vanishing point is available. In addition, current multiple view techniques, making use of either epipolar geometry or trifocal tensor, do not exploit fully the properties of constant or known camera motion. Finally, there does not exist a general solution to the problem of synchronization of N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is the necessary step for video post-production among different input videos. This dissertation proposes several advancements that overcome these limitations. These advancements are used to develop an efficient framework for video analysis and post-production in multiple cameras. In the first part of the dissertation, the novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state-of-the-art in single view geometry techniques to situations where only one vanishing point is available. The property of constant or known camera motion is also described in this dissertation for applications such as calibration of a network of cameras in video surveillance systems, and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus. We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking and zooming) and complex (e.g. hand-held) camera motions. Accuracy of these results is demonstrated by applying our approach to video post-production applications such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of camera geometry, the position and the orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data
    corecore