325 research outputs found

Lunar Crater Identification in Digital Images

Author: Christian John A.
Derksen Harm
Watkins Ryan
Publication date: 14/09/2020

It is often necessary to identify a pattern of observed craters in a single image of the lunar surface and without any prior knowledge of the camera's location. This so-called "lost-in-space" crater identification problem is common in both crater-based terrain relative navigation (TRN) and in automatic registration of scientific imagery. Past work on crater identification has largely been based on heuristic schemes, with poor performance outside of a narrowly defined operating regime (e.g., nadir pointing images, small search areas). This work provides the first mathematically rigorous treatment of the general crater identification problem. It is shown when it is (and when it is not) possible to recognize a pattern of elliptical crater rims in an image formed by perspective projection. For the cases when it is possible to recognize a pattern, descriptors are developed using invariant theory that provably capture all of the viewpoint invariant information. These descriptors may be pre-computed for known crater patterns and placed in a searchable index for fast recognition. New techniques are also developed for computing pose from crater rim observations and for evaluating crater rim correspondences. These techniques are demonstrated on both synthetic and real images
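
The abstract above refers to viewpoint-invariant descriptors for patterns of elliptical crater rims. As a minimal sketch of the general idea, and not necessarily the descriptor used in the listed paper, the classical projective invariant of a pair of coplanar conics can be checked numerically. The helper names (`circle_conic`, `transform_conic`, `pair_invariant`) and the test homography are assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse via the adjugate; assumes det3(m) != 0."""
    d = det3(m)
    adj = [
        [m[1][1] * m[2][2] - m[1][2] * m[2][1],
         m[0][2] * m[2][1] - m[0][1] * m[2][2],
         m[0][1] * m[1][2] - m[0][2] * m[1][1]],
        [m[1][2] * m[2][0] - m[1][0] * m[2][2],
         m[0][0] * m[2][2] - m[0][2] * m[2][0],
         m[0][2] * m[1][0] - m[0][0] * m[1][2]],
        [m[1][0] * m[2][1] - m[1][1] * m[2][0],
         m[0][1] * m[2][0] - m[0][0] * m[2][1],
         m[0][0] * m[1][1] - m[0][1] * m[1][0]],
    ]
    return [[v / d for v in row] for row in adj]

def circle_conic(cx, cy, r):
    """Symmetric matrix C with x^T C x = 0 for points on the circle."""
    return [[1.0, 0.0, -cx],
            [0.0, 1.0, -cy],
            [-cx, -cy, cx * cx + cy * cy - r * r]]

def transform_conic(c, h):
    """Image of conic c under the point map x' = h x: C' = h^-T C h^-1."""
    h_inv = inv3(h)
    return matmul(matmul(transpose(h_inv), c), h_inv)

def pair_invariant(c1, c2):
    """trace(C1^-1 C2) * cbrt(det C1 / det C2).

    The cube-root factor cancels the arbitrary scale of each conic
    matrix, so the value is unchanged by any homography.
    """
    ratio = det3(c1) / det3(c2)
    cbrt = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)
    m = matmul(inv3(c1), c2)
    return (m[0][0] + m[1][1] + m[2][2]) * cbrt

# Hypothetical example: two coplanar circles seen through an arbitrary homography.
c1, c2 = circle_conic(0.0, 0.0, 1.0), circle_conic(3.0, 1.0, 2.0)
h = [[1.0, 0.2, 3.0], [-0.1, 0.9, 1.0], [0.001, 0.002, 1.0]]
before = pair_invariant(c1, c2)
after = pair_invariant(transform_conic(c1, h), transform_conic(c2, h))
print(before, after)  # the two values agree up to rounding
```

Because such values do not depend on the viewpoint, they can be pre-computed for known conic pairs and stored in a searchable index, which is the kind of lookup the abstract describes.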

arXiv.org e-Print Archive

Vision-Based Object Recognition and 3-D Pose Estimation Using Conic Features

Author: 김헌희
Publication venue: 한국해양대학교 대학원
Publication date: 01/02/2012

This thesis deals with monocular vision-based object recognition and 3-D pose estimation based on conic features. Conic features, including circles and ellipses, are frequently observed in man-made objects in the real world and have the potential merit of robust feature extraction in vision-based applications. Although the 3-D pose estimation problem for conic features in 3-D space has been studied extensively since 1990, previous work has not provided a complete, unique solution for the full 3-D pose parameters (i.e., three orientation and three position parameters), owing to the high nonlinearity of a general conic. This thesis therefore revisits conic features from a new perspective on geometric invariants in both 3-D space and 2-D projective space, incorporating other geometric features with conics. First, as the most essential step in dealing with conics, this thesis shows that the pose parameters of a circular feature in 3-D space can be derived analytically by incorporating a coplanar point. A procedure for pose parameter recovery is described in detail, and its performance is evaluated and discussed in terms of pose estimation error and sensitivity. Second, it is shown that the pose of an elliptic feature can be resolved when two coplanar points are incorporated, on the basis of the polarity of two points with respect to a conic in 2-D projective space. This thesis proposes a series of algorithms to determine the 3-D pose parameters uniquely, and evaluates the proposed method through measures of estimation performance and sensitivity as a function of point locations. Third, a pair of conics is treated, which extends the incorporation scheme from point features to another conic feature. Using the polarity concept, this thesis proves that the problem involving a pair of conics can be formulated as the problem of one ellipse with two points, so that its solution takes the same form as in the ellipse case.
In order to treat two or more conic objects and to address object recognition, the rest of the thesis concentrates on the theoretical foundation of multiple object recognition. First, some effective modeling approaches are described. A general object model is designed to model multiple objects for object recognition and pose recovery in terms of spatial geometry. In particular, this thesis defines a pairwise conic model that describes the geometric relation between two conics invariantly in 2-D projective space; it consists of a pairwise conic (PC), a pairwise conic invariant (PCI), and a pairwise conic pole (PCP). Based on these two kinds of models, an object learning and recognition system is proposed as a general framework for multiple object recognition. For simplicity and flexibility in the object learning stage, this thesis introduces a semi-automatic learning scheme that constructs the multiple object model from a model image in a single pass. To use geometric relations among multiple objects effectively in object recognition, this thesis specifies feature functions based on the pairwise conic model, and then describes an object recognition method formulated as a linear-chain conditional random field (CRF). As a post-refinement step, a geometric alignment procedure is also proposed in algorithmic detail to improve recognition performance under noisy conditions. Last, the multiple object recognition method is evaluated intensively in two practical applications for service robots: place recognition and elevator button recognition. A series of experimental results supports the effectiveness of the proposed method, which maintains reliable performance under noisy conditions in the presence of perspective distortion and partial object occlusion.

Contents

Abstract
Contents
List of Tables
List of Figures
1 Introduction
  1.1 Background
  1.2 Observations
  1.3 Research objective and expected contribution
  1.4 Organization of thesis
2 3-D Pose Estimation of a Circular Feature
  2.1 Introduction
    2.1.1 Background
    2.1.2 Problem formulation
    2.1.3 Related work
    2.1.4 Notations
  2.2 Preliminaries: an elliptic cone in 3-D space and its homogeneous representation in 2-D projective space
    2.2.1 Homogeneous representation
    2.2.2 Principal planes of a cone versus diagonalization of a conic matrix Q
  2.3 3-D interpretation of a circular feature for 3-D pose estimation
    2.3.1 3-D orientation estimation
    2.3.2 3-D position estimation
    2.3.3 Composition of homogeneous transformation and discrimination for the unique solution
  2.4 Experiment results
    2.4.1 A numerical example
    2.4.2 Evaluation of pose estimation performance
  2.5 Summary
3 3-D Pose Estimation of an Elliptic Feature
  3.1 Introduction
    3.1.1 Background
    3.1.2 Problem statement
    3.1.3 Related work
  3.2 Interpretation of an elliptic feature with coplanar points in 2-D projective space
    3.2.1 The minimal number of points for pose estimation
    3.2.2 Analysis of possible constraints for relative positions of two points to an ellipse
    3.2.3 Feature selection scheme for stable homography estimation
  3.3 3-D pose estimation algorithm
    3.3.1 Extraction of triangular features from an elliptic object
    3.3.2 Homography decomposition
    3.3.3 Composition of homogeneous transformation matrix with unique solution
  3.4 Experiment results
    3.4.1 Experimental setup
    3.4.2 Evaluation of the proposed method
  3.5 Summary
4 3-D Pose Estimation of a Pair of Conic Features
  4.1 Introduction
  4.2 3-D pose estimation of a conic feature incorporated with line features
  4.3 3-D pose estimation of a conic feature incorporated with another conic feature
    4.3.1 Some examples of self-polar triangle and invariants
    4.3.2 3-D pose estimation of a pair of coplanar conics
    4.3.3 Examples of 3-D pose estimation of a conic feature incorporated with another conic feature
  4.4 Summary
5 Multiple Object Recognition Based on Pairwise Conic Model
  5.1 Introduction
  5.2 Learning of geometric relation of multiple objects
  5.3 Pairwise conic model
    5.3.1 Definitions
  5.4 Multiple object recognition based on pairwise conic model and conditional random fields
    5.4.1 Graphical model for multiple object recognition
    5.4.2 Linear-chain conditional random field
    5.4.3 Determination of low-level feature functions for multiple object recognition
    5.4.4 Range selection trick for efficiently computing the costs of low-level feature functions
    5.4.5 Evaluation of observation sequence
    5.4.6 Object recognition based on hierarchical CRF
  5.5 Geometric alignment algorithm
  5.6 Summary
6 Application to Place Recognition for Service Robots
  6.1 Introduction
    6.1.1 Background
    6.1.2 Problem statement
  6.2 Feature extraction
    6.2.1 Detection of 2-D geometric shapes
    6.2.2 Examples of shape feature extraction
  6.3 Object modeling
    6.3.1 A place model that describes multiple landmark objects
    6.3.2 Pairwise conic model
    6.3.3 Incorporation of non-conic features with a pairwise conic model
  6.4 Place learning and recognition system
    6.4.1 HCRF-based recognition
  6.5 Experiment results
    6.5.1 Experimental setup
    6.5.2 Performance evaluation
  6.6 Summary
7 Application to Elevator Button Recognition
  7.1 Introduction
    7.1.1 Background
    7.1.2 Problem statement
    7.1.3 Related work
  7.2 Object modeling
    7.2.1 Geometric model for multiple button objects
    7.2.2 Pairwise conic model
  7.3 Learning and recognition system
    7.3.1 Button object learning
    7.3.2 CRF-based recognition
  7.4 Experiment results
    7.4.1 Experimental setup
    7.4.2 Performance evaluation
  7.5 Summary
8 Concluding remarks
  8.1 Summary
  8.2 Further work
References
Summary (in Korean)

한국해양대학교(KMOU)

Angular variation as a monocular cue for spatial perception

Author: Navarro Toro Agustín Alfonso
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2009

Monocular cues are spatial sensory inputs that are picked up exclusively by one eye. They are mostly static features that provide depth information and are used extensively in graphic art to create realistic representations of a scene. Since the spatial information contained in these cues is picked up from the retinal image, a link between it and the theory of direct perception can be conveniently assumed. According to this theory, spatial information about an environment is directly contained in the optic array. This assumption therefore makes it possible to model visual perception processes through computational approaches. In this thesis, angular variation is considered as a monocular cue, and the concept of direct perception is adopted in a computer vision approach that treats it as a suitable principle from which innovative techniques for calculating spatial information can be developed. The spatial information expected from this monocular cue is the position and orientation of an object with respect to the observer, which in computer vision is a well-known field of research called 2D-3D pose estimation. In this thesis, angular variation is established as a monocular cue, and a computational approach to direct perception is achieved, through the development of a set of pose estimation methods. Starting from conventional strategies for solving the pose estimation problem, a first approach imposes constraint equations to relate object and image features. Two algorithms based on a simple analysis of line rotation motion were developed. These algorithms successfully provide pose information; however, they depend strongly on scene data conditions. To overcome this limitation, a second approach, inspired by the biological processes performed by the human visual system, was developed. It is based on the content of the image itself and defines a computational approach to direct perception.
The set of developed algorithms analyzes the visual properties provided by angular variations. The aim is to gather valuable data from which spatial information can be obtained and used to emulate a visual perception process by establishing a 2D-3D metric relation. Since this relation is considered fundamental to visual-motor coordination and consequently essential for interacting with the environment, applying the developed computational approach in technology-mediated environments produces a significant cognitive effect. In this work, this cognitive effect is demonstrated by an experimental study in which a number of participants were asked to complete an action-perception task. The main purpose of the study was to analyze visually guided behavior in teleoperation and the cognitive effect caused by the addition of 3D information. The results showed a significant influence of the 3D aid on skill improvement, as well as an enhancement of the sense of presence.

Methods for Recognizing Pose and Action of Articulated Objects with Collection of Planes in Motion

Author: Foroosh Hassan
Shen Yuping
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 17/06/2014

The invention comprises an improved system, method, and computer-readable instructions for recognizing pose and action of articulated objects with a collection of planes in motion. The method starts with a video sequence and a database of reference sequences corresponding to different known actions. The method identifies the reference sequence in which the subject performs the action closest to the one observed. The method compares actions by comparing pose transitions. The cross-homography invariant may be used for view-invariant recognition of human body pose transitions and actions.

Study Of Human Activity In Video Data With An Emphasis On View-invariance

Author: Ashraf Nazim
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2012

The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in various applications such as surveillance, HCI, ergonomics, etc. In this thesis, we focus on the recognition of actions in the case of varying viewpoints and different and unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoints, anthropometric variations, and the large degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. The current solutions to action recognition usually assume that there is a huge dataset of actions available so that a classifier can be trained. However, this means that in order to define a new action, the user has to record a number of videos from different viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action from any viewpoint with any intrinsic camera parameters. Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent an action as a sequence of poses, and decompose each pose into triplets. The pose transition is therefore broken down into a set of movements of body point planes. In this way, we transform the non-rigid motion of the body points into a rigid motion of body point planes. We use the fact that the family of homographies associated with two identical poses would have rank 4 to gauge the similarity of the pose between two subjects, observed by different perspective cameras and from different viewpoints. This method requires only one instance of the action. We then show that it is possible to extend the concept of triplets to line segments.
In particular, we establish that if we look at the movement of line segments instead of triplets, we have more redundancy in the data, thus leading to better results. We demonstrate this concept on “fundamental ratios.” We decompose a human body pose into line segments instead of triplets and look at the set of movements of the line segments. This method needs only three instances of the action. If a larger dataset is available, we can also apply weighting on line segments for better accuracy. The last method is based on the concept of “projective depth.” Given a plane, we can find the depth of a point relative to that plane. We propose three different ways of using “projective depth”: (i) Triplets - the three points of a triplet along with the epipole define the plane, and the movement of points relative to these body planes can be used to recognize actions; (ii) Ground plane - if we are able to extract the ground plane, we can find the “projective depth” of the body points with respect to it, so that the problem of action recognition translates to curve matching; and (iii) Mirror person - we can use the mirror view of the person to extract mirror symmetric planes. This method also needs only one instance of the action. Extensive experiments are reported on testing view invariance, robustness to noisy localization and occlusions of body points, and action recognition. The experimental results are very promising and demonstrate the efficiency of our proposed invariants.

Multiple View Geometry For Video Analysis And Post-production

Author: Cao Xiaochun
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2006

Multiple view geometry is the foundation of an important class of computer vision techniques for simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area. Examples include video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, which is the topic addressed in this dissertation, computer analysis of the motion of the camera can replace the currently used manual methods for correctly aligning an artificially inserted object in a scene. However, existing single view methods typically require multiple vanishing points, and therefore fail when only one vanishing point is available. In addition, current multiple view techniques, making use of either epipolar geometry or the trifocal tensor, do not fully exploit the properties of constant or known camera motion. Finally, there is no general solution to the problem of synchronizing N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is a necessary step for video post-production among different input videos. This dissertation proposes several advancements that overcome these limitations. These advancements are used to develop an efficient framework for video analysis and post-production with multiple cameras. In the first part of the dissertation, novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state of the art in single view geometry techniques to situations where only one vanishing point is available. The property of constant or known camera motion is also described in this dissertation for applications such as calibration of a network of cameras in video surveillance systems, and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus.
We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking and zooming) and complex (e.g. hand-held) camera motions. Accuracy of these results is demonstrated by applying our approach to video post-production applications such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of camera geometry, the position and the orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data
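
The abstract above relies on vanishing points without showing how one is obtained. As a minimal sketch of the standard homogeneous-coordinates construction, and not the dissertation's own method, a vanishing point can be computed as the intersection of the images of two parallel scene lines. The function names and the pixel coordinates are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(line_a, line_b, eps=1e-12):
    """Intersection of two image lines, or None if they are parallel
    in the image (the intersection then lies at infinity)."""
    v = cross(line_a, line_b)
    if abs(v[2]) < eps:
        return None
    return (v[0] / v[2], v[1] / v[2])

# Hypothetical image segments of two parallel scene edges.
edge_a = line_through((0.0, 0.0), (50.0, 25.0))
edge_b = line_through((0.0, 100.0), (50.0, 75.0))
print(vanishing_point(edge_a, edge_b))
```

With noisy edge detections, more than two lines are usually available and the intersection is estimated in a least-squares sense; the two-line case is kept here only to show the geometry.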

Vision-assisted modeling for model-based video representations

Author: Becker Shawn C. (Shawn Carter)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1997

Thesis (Ph.D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1997. Includes bibliographical references (leaves 134-145). By Shawn C. Becker. Ph.D.

Calibration and Metrology Using Still and Video Images

Author: Guo Feng
Publication date: 03/08/2007

Metrology, the measurement of real world metrics, has been investigated extensively in computer vision for many applications. The prevalence of video cameras and sequences has led to the demand for fully automated systems. Most of the existing video metrology methods are simple extensions of still-image algorithms, which have certain limitations, requiring constraints such as parallelism of lines. New techniques are needed in order to achieve accurate results for broader applications. An important preprocessing step and a closely related topic to metrology is calibration using planar patterns. Existing approaches lack flexibility and robustness when extended to video sequences. This dissertation advances the state of the art in calibration and video metrology in three directions: (1) the concept of partial rectification is proposed along with new calibration techniques using a circle with diverse types of constraints; (2) new calibration methods for video sequences using planar patterns undergoing planar motion are proposed; and (3) new algorithms to extend video metrology to a wide range of applications are presented. A fully automated system using the new technique has been built for measuring the wheelbases of vehicles.

Image Based View Synthesis

Author: Xiao Jiangjian
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2004

This dissertation deals with the image-based approach to synthesize a virtual scene using sparse images or a video sequence without the use of 3D models. In our scenario, a real dynamic or static scene is captured by a set of un-calibrated images from different viewpoints. After automatically recovering the geometric transformations between these images, a series of photo-realistic virtual views can be rendered and a virtual environment covered by these several static cameras can be synthesized. This image-based approach has applications in object recognition, object transfer, video synthesis and video compression. In this dissertation, I have contributed to several sub-problems related to image based view synthesis. Before image-based view synthesis can be performed, images need to be segmented into individual objects. Assuming that a scene can approximately be described by multiple planar regions, I have developed a robust and novel approach to automatically extract a set of affine or projective transformations induced by these regions, correctly detect the occlusion pixels over multiple consecutive frames, and accurately segment the scene into several motion layers. First, a number of seed regions using correspondences in two frames are determined, and the seed regions are expanded and outliers are rejected employing the graph cuts method integrated with level set representation. Next, these initial regions are merged into several initial layers according to the motion similarity. Third, the occlusion order constraints on multiple frames are explored, which guarantee that the occlusion area increases with the temporal order in a short period and effectively maintains segmentation consistency over multiple consecutive frames. Then the correct layer segmentation is obtained by using a graph cuts algorithm, and the occlusions between the overlapping layers are explicitly determined. 
Several experimental results show that our approach is effective and robust. Recovering the geometrical transformations among images of a scene is a prerequisite step for image-based view synthesis. I have developed a wide baseline matching algorithm to identify the correspondences between two un-calibrated images, and to further determine the geometric relationship between images, such as epipolar geometry or projective transformation. In our approach, a set of salient features, edge-corners, are detected to provide robust and consistent matching primitives. Then, based on the Singular Value Decomposition (SVD) of an affine matrix, we effectively quantize the search space into two independent subspaces for rotation angle and scaling factor, and then we use a two-stage affine matching algorithm to obtain robust matches between these two frames. The experimental results on a number of wide baseline images strongly demonstrate that our matching method outperforms state-of-the-art algorithms even under significant camera motion, illumination variation, occlusion, and self-similarity. Given the wide baseline matches among images, I have developed a novel method for dynamic view morphing. Dynamic view morphing deals with scenes containing moving objects in the presence of camera motion. The objects can be rigid or non-rigid, and each of them can move in any orientation or direction. The proposed method can generate a series of continuous and physically accurate intermediate views from only two reference images without any 3D knowledge. The procedure consists of three steps: segmentation, morphing and post-warping. Given a boundary connection constraint, the source and target scenes are segmented into several layers for morphing. Based on the decomposition of the affine transformation between corresponding points, we uniquely determine a physically correct path for post-warping by the least distortion method.
I have successfully generalized the dynamic scene synthesis problem from the simple scene with only rotation to the dynamic scene containing non-rigid objects. My method can handle dynamic rigid or non-rigid objects, including complicated objects such as humans. Finally, I have also developed a novel algorithm for tri-view morphing. This is an efficient image-based method to navigate a scene based on only three wide-baseline un-calibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images using our wide baseline matching method, an accurate trifocal plane is extracted from the trifocal tensor implied in these three images. Next, employing a trinocular-stereo algorithm and barycentric blending technique, we generate an arbitrary novel view to navigate the scene in a 2D space. Furthermore, after self-calibration of the cameras, a 3D model can also be correctly augmented into this virtual environment synthesized by the tri-view morphing algorithm. We have applied our view morphing framework to several interesting applications: 4D video synthesis, automatic target recognition, multi-view morphing
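
The abstract above mentions using the SVD of an affine matrix to separate rotation angle from scaling factor. A minimal sketch of that separation for the 2x2 linear part of an affine map follows, using a closed-form SVD rather than whatever procedure the dissertation employs; the function names and sample matrices are assumptions for illustration.

```python
import math

def decompose_linear(a, b, c, d):
    """Closed-form SVD of [[a, b], [c, d]] as R(phi) * diag(sx, sy) * R(theta).

    Returns (phi, sx, sy, theta). sy is negative when the matrix has a
    negative determinant (i.e. it contains a reflection).
    """
    e, f = (a + d) / 2.0, (a - d) / 2.0
    g, h = (c + b) / 2.0, (c - b) / 2.0
    q, r = math.hypot(e, h), math.hypot(f, g)
    sx, sy = q + r, q - r
    a1, a2 = math.atan2(g, f), math.atan2(h, e)
    theta, phi = (a2 - a1) / 2.0, (a2 + a1) / 2.0
    return phi, sx, sy, theta

def rot(t):
    """2x2 counter-clockwise rotation by angle t (radians)."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def mul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def recompose(phi, sx, sy, theta):
    """Rebuild the matrix from its factors, as a consistency check."""
    return mul2(mul2(rot(phi), [[sx, 0.0], [0.0, sy]]), rot(theta))

# Hypothetical linear part of an affine map between two image patches.
phi, sx, sy, theta = decompose_linear(1.2, -0.7, 0.5, 0.9)
print("net rotation (rad):", phi + theta, "scales:", sx, sy)
print(recompose(phi, sx, sy, theta))
```

Here `phi + theta` is the net rotation and `sx`, `sy` are the scale factors along the principal axes, so a matcher can search over angle and scale independently, which is the kind of split the abstract describes.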