6 research outputs found

    Uncalibrated and Self-calibrated Cameras

    Coherent spatial and temporal occlusion generation

    3D coarse-to-fine reconstruction from multiple image sequences.

    Ip Che Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 119-127). Abstracts in English and Chinese.
    Contents:
    1 Introduction: 1.1 Motivation; 1.2 Previous Work (1.2.1 Reconstruction for Architecture Scene; 1.2.2 Super-resolution; 1.2.3 Coarse-to-Fine Approach); 1.3 Proposed Solution; 1.4 Contribution; 1.5 Publications; 1.6 Layout of the Thesis
    2 Background Techniques: 2.1 Interest Point Detectors (2.1.1 Scale-space; 2.1.2 Harris Corner Detectors; 2.1.3 Other Kinds of Interest Point Detectors; 2.1.4 Summary); 2.2 Steerable Filters (2.2.1 Orientation Estimation); 2.3 Point Descriptors (2.3.1 Image Derivatives under Illumination Change; 2.3.2 Image Derivatives under Geometric Scale Change; 2.3.3 An Example of a Point Descriptor; 2.3.4 Other Examples); 2.4 Feature Tracking Techniques (2.4.1 Kanade-Lucas-Tomasi (KLT) Tracker; 2.4.2 Guided Tracking Algorithm); 2.5 RANSAC; 2.6 Structure-from-Motion (SFM) Algorithm (2.6.1 Factorization Methods; 2.6.2 Epipolar Geometry; 2.6.3 Bundle Adjustment; 2.6.4 Summary)
    3 Hierarchical Registration of 3D Models: 3.1 Overview (3.1.1 The Arrangement of Image Sequences; 3.1.2 The Framework); 3.2 3D Model Reconstruction for Each Sequence; 3.3 Multi-scale Image Matching (3.3.1 Scale-space Interest Point Detection; 3.3.2 Point Descriptor; 3.3.3 Point-to-Point Matching; 3.3.4 Image Transformation Estimation; 3.3.5 Multi-level Image Matching); 3.4 Linkage Establishment; 3.5 3D Model Registration; 3.6 VRML Modelling
    4 Experiment: 4.1 Synthetic Experiments (4.1.1 Study on Rematching Algorithm; 4.1.2 Comparison between Affine and Metric Transformations for 3D Registration); 4.2 Real Scene Experiments
    5 Conclusion: 5.1 Future Work
    A Camera Parameters: A.1 Intrinsic Parameters; A.2 Extrinsic Parameters
    Bibliography

    Video Stabilization Using SIFT Features, Fuzzy Clustering, and Kalman Filtering

    Video stabilization removes unwanted motion from video sequences, often caused by vibrations or other instabilities. This improves video viewability and can aid in detection and tracking in computer vision algorithms. We have developed a digital video stabilization process using scale-invariant feature transform (SIFT) features for tracking motion between frames. These features provide information about location and orientation in each frame. The orientation information is generally not available with other features, so we employ this knowledge directly in motion estimation. We use a fuzzy clustering scheme to separate the SIFT features representing camera motion from those representing the motion of moving objects in the scene. Each frame's translation and rotation is accumulated over time, and a Kalman filter is applied to estimate the desired motion. We provide experimental results from several video sequences using peak signal-to-noise ratio (PSNR) and qualitative analysis to demonstrate the results of each design decision we made in the development of this video stabilization method.
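
    A minimal sketch of this style of pipeline is shown below in Python with OpenCV. It is not the authors' implementation: RANSAC inlier selection stands in for the fuzzy clustering that separates camera motion from object motion, the smoothing is reduced to a 1-D random-walk Kalman filter per motion component, and the function names are illustrative.

```python
import cv2
import numpy as np

def frame_motion(prev_gray, curr_gray, sift, matcher):
    """Estimate inter-frame translation (dx, dy) and rotation from SIFT matches."""
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.7 * n.distance]        # Lowe's ratio test
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    # Similarity transform; RANSAC rejects features on independently moving
    # objects (a stand-in for the paper's fuzzy clustering step).
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M[0, 2], M[1, 2], np.arctan2(M[1, 0], M[0, 0])

def kalman_smooth(traj, q=1e-3, r=1e-1):
    """Smooth each column of an accumulated (N, 3) trajectory of (dx, dy, da)."""
    out = np.zeros_like(traj)
    for j in range(traj.shape[1]):
        x, p = 0.0, 1.0
        for i, z in enumerate(traj[:, j]):
            p += q                                   # predict (random-walk model)
            k = p / (p + r)                          # Kalman gain
            x, p = x + k * (z - x), (1.0 - k) * p    # update
            out[i, j] = x
    return out
```

    With sift = cv2.SIFT_create() and matcher = cv2.BFMatcher(), per-frame motions are accumulated over time (e.g. with np.cumsum), smoothed, and each frame is then warped by the smoothed-minus-raw difference, e.g. via cv2.warpAffine.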

    Stereo vision based on compressed feature correlation and graph cut

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2005. Includes bibliographical references (p. 131-145). This dissertation develops a fast and robust algorithm that solves the dense correspondence problem with good performance in untextured regions by merging Sparse Array Correlation from the computational fluids community with graph cut from the computer vision community. The proposed methodology consists of two independent modules. The first module, named Compressed Feature Correlation, originates from Particle Image Velocimetry (PIV). The algorithm uses an image compression scheme that retains pixel values in high-intensity-gradient areas while eliminating pixels with little correlation information in smooth surface regions, resulting in a highly reduced image dataset. In addition, by utilizing an error correlation function, pixel comparisons are made through single integer calculations, eliminating time-consuming multiplication and floating-point arithmetic. Unlike the traditional fixed-window sorting scheme, adaptive correlation window positioning is implemented by dynamically placing strong features at the center of each correlation window. A confidence measure is developed to validate correlation outputs. The sparse depth map generated by this ultra-fast Compressed Feature Correlation may either serve as input to global methods or be interpolated into a dense depth map when object boundaries are clearly defined. The second module enables a modified graph cut algorithm with an improved energy model that accepts prior information by fixing data energy penalties. Image pixels with known disparity values stabilize and speed up global optimization; as a result, fewer iterations are necessary and sensitivity to parameters is reduced. An efficient hybrid approach is implemented based on the above two modules. By coupling a simpler and much less expensive algorithm, Compressed Feature Correlation, with a more expensive algorithm, graph cut, the computational expense of the hybrid calculation is one third of performing the entire calculation using the more expensive of the two algorithms, while accuracy and robustness are improved at the same time. Qualitative and quantitative results on both simulated disparities and real stereo images are presented. by Sheng Sarah Tan. Ph.D.
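
    The first module's core idea can be sketched in a few lines of NumPy, under my own simplifying assumptions (fixed square windows rather than the thesis's adaptive window placement, an integer sum-of-absolute-differences standing in for its error correlation function, and a crude ratio test as the confidence measure): compression keeps only high-gradient pixels, and matching at those pixels uses pure integer arithmetic.

```python
import numpy as np

def compress(img, grad_thresh=20):
    """Keep only pixels in high-intensity-gradient areas; return their coordinates."""
    gy, gx = np.gradient(img.astype(np.int32))
    mag = np.abs(gx) + np.abs(gy)                 # L1 gradient magnitude
    return np.nonzero(mag > grad_thresh)

def sparse_disparity(left, right, max_disp=32, w=5):
    """Integer SAD matching at compressed feature locations only."""
    left, right = left.astype(np.int32), right.astype(np.int32)
    h, width = left.shape
    disp = {}
    for y, x in zip(*compress(left)):
        if y < w or y >= h - w or x < w + max_disp or x >= width - w:
            continue                              # skip windows falling off the image
        ref = left[y - w:y + w + 1, x - w:x + w + 1]
        errs = [np.abs(ref - right[y - w:y + w + 1, x - d - w:x - d + w + 1]).sum()
                for d in range(max_disp)]
        best = int(np.argmin(errs))
        # crude confidence measure: best cost must clearly beat the runner-up
        if errs[best] < 0.8 * sorted(errs)[1]:
            disp[(y, x)] = best
    return disp
```

    The resulting sparse, validated disparities are exactly the kind of prior the second module exploits: fixing their data energy penalties anchors the graph cut optimization, so fewer iterations are needed.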

    Visual system identification: learning physical parameters and latent spaces from pixels

    In this thesis, we develop machine learning systems that are able to leverage the knowledge of equations of motion (scene-specific or scene-agnostic) to perform object discovery, physical parameter estimation, position and velocity estimation, camera pose estimation, and learn structured latent spaces that satisfy physical dynamics rules. These systems are unsupervised, learning from unlabelled videos, and use as inductive biases the general equations of motion followed by the objects of interest in the scene. This is an important task, as in many complex real-world environments ground-truth states are not available, although there is physical knowledge of the underlying system. Our goals with this approach, i.e. the integration of physics knowledge with unsupervised learning models, are to improve vision-based prediction, enable new forms of control, increase data-efficiency and provide model interpretability, all of which are key areas of interest in machine learning.
    With the above goals in mind, we start by asking the following question: given a scene in which the objects' motions are known up to some physical parameters (e.g. a ball bouncing off the floor with unknown restitution coefficient), how do we build a model that uses such knowledge to discover the objects in the scene and estimate these physical parameters? Our first model, PAIG (Physics-as-Inverse-Graphics), approaches this problem from a vision-as-inverse-graphics perspective, describing the visual scene as a composition of objects defined by their location and appearance, which are rendered onto the frame in a graphics manner. This is a known approach in the unsupervised learning literature, where the fundamental problem then becomes that of derendering, that is, inferring and discovering these locations and appearances for each object. In PAIG we introduce a key rendering component, the Coordinate-Consistent Decoder, which enables the integration of the known equations of motion with an inverse-graphics autoencoder architecture (trainable end-to-end), to perform simultaneous object discovery and physical parameter estimation. Although trained on simple simulated 2D scenes, we show that knowledge of the physical equations of motion of the objects in the scene can be used to greatly improve future prediction and provide physical scene interpretability.
    Our second model, V-SysId, tackles the limitations shown by the PAIG architecture, namely the training difficulty, the restriction to simulated 2D scenes, and the need for noiseless scenes without distractors. Here, we approach the problem from first principles by asking the question: are neural networks a necessary component to solve this problem? Can we use simpler ideas from classical computer vision instead? With V-SysId, we approach the problem of object discovery and physical parameter estimation from a keypoint extraction, tracking and selection perspective, composed of 3 separate stages: proposal keypoint extraction and tracking, 3D equation fitting and camera pose estimation from 2D trajectories, and entropy-based trajectory selection. Since all the stages use lightweight algorithms and optimisers, V-SysId is able to perform joint object discovery, physical parameter and camera pose estimation from even a single video, drastically improving data-efficiency. Additionally, because it does not use a rendering/derendering approach, it can be used in real 3D scenes with many distractor objects. We show that this approach enables a number of interesting applications, such as vision-based robot end-effector localisation and remote breath rate measurement.
    Finally, we move into the area of structured recurrent variational models from vision, where we are motivated by the following observation: in existing models, applying a force in the direction from a start point to an end point (in latent space) does not result in a movement from the start point towards the end point, even in the simplest unconstrained environments. This means that the latent space learned by these models does not follow Newton's law, where the acceleration vector has the same direction as the force vector (in point-mass systems), which prevents the use of PID controllers, the simplest and best-understood type of controller. We solve this problem by building inductive biases from Newtonian physics into the latent variable model, which we call NewtonianVAE. Crucially, Newtonian correctness in the latent space brings about the ability to perform proportional (or PID) control, as opposed to the more computationally expensive model predictive control (MPC). PID controllers are ubiquitous in industrial applications, but have thus far lacked integration with unsupervised vision models. We show that the NewtonianVAE learns physically correct latent spaces in simulated 2D and 3D control systems, which can be used to perform goal-based discovery and control in imitation learning, and path following via Dynamic Motion Primitives.
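
    As a heavily stripped-down illustration of the idea shared by PAIG and V-SysId, the sketch below (my own toy setup, not code from the thesis) recovers a physical parameter when the equations of motion are known up to that parameter: a 1-D bouncing ball is observed directly in state space rather than from pixels, and its restitution coefficient is estimated by grid search over the trajectory reconstruction error. The real models additionally handle object discovery, appearance, and camera pose.

```python
import numpy as np

G, DT = 9.81, 0.01  # assumed gravity and integration step

def simulate(restitution, steps=500, y0=1.0, v0=0.0):
    """Euler-integrate a ball bouncing off the floor with the given restitution."""
    y, v, traj = y0, v0, []
    for _ in range(steps):
        v -= G * DT
        y += v * DT
        if y < 0.0:                      # bounce: reflect, damped by restitution
            y, v = -y, -restitution * v
        traj.append(y)
    return np.array(traj)

# "Observed" trajectory generated from an unknown true restitution, plus noise.
rng = np.random.default_rng(0)
true_e = 0.73
observed = simulate(true_e) + rng.normal(0.0, 0.005, 500)

# Recover the parameter by minimising trajectory reconstruction error.
candidates = np.linspace(0.0, 1.0, 1001)
errors = [np.mean((simulate(e) - observed) ** 2) for e in candidates]
est_e = candidates[int(np.argmin(errors))]
print(f"true restitution {true_e:.3f}, estimated {est_e:.3f}")
```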