197 research outputs found

    Synchronization Problems in Computer Vision

    The goal of “synchronization” is to infer the unknown states of a network of nodes, where only the ratio (or difference) between pairs of states can be measured. Typically, states are represented by elements of a group, such as the Symmetric Group or the Special Euclidean Group. The former can represent local labels of a set of features, which refer to the multi-view matching application, whereas the latter can represent camera reference frames, in which case we are in the context of structure from motion, or local coordinates where 3D points are represented, in which case we are dealing with multiple point-set registration. A related problem is that of “bearing-based network localization” where each node is located at a fixed (unknown) position in 3-space and pairs of nodes can measure the direction of the line joining their locations. In this thesis we are interested in global techniques where all the measures are considered at once, as opposed to incremental approaches that grow a solution by adding pieces iteratively.
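To make the problem statement above concrete, here is a minimal sketch of synchronization in the simplest possible setting: scalar states under addition, where each measurement is a difference z_ij ≈ x_i - x_j. This is an illustration only, not the method of the thesis, which deals with groups such as the Symmetric Group and the Special Euclidean Group. The function name `synchronize_differences`, the choice of pinning node 0 to zero, and the sample data are all assumptions made for this example. It uses every measurement in each sweep (a Gauss-Seidel least-squares solve), in the spirit of the global techniques the abstract mentions.

```python
def synchronize_differences(num_nodes, measurements, iterations=500):
    """Estimate scalar states x_i from pairwise differences z_ij ~ x_i - x_j.

    measurements: list of (i, j, z_ij) tuples on a connected graph.
    Node 0 is pinned to 0.0 because differences only determine the
    states up to a common offset (hypothetical gauge choice).
    """
    # For each node, store (neighbour, offset) pairs so that
    # x_node ~ x_neighbour + offset.
    neighbours = [[] for _ in range(num_nodes)]
    for i, j, z in measurements:
        neighbours[i].append((j, z))    # x_i ~ x_j + z
        neighbours[j].append((i, -z))   # x_j ~ x_i - z

    x = [0.0] * num_nodes
    for _ in range(iterations):
        # Gauss-Seidel sweep: each free node moves to the average of the
        # values its neighbours predict, which minimises the sum of
        # squared residuals with respect to that node.
        for i in range(1, num_nodes):
            if neighbours[i]:
                x[i] = sum(x[j] + z for j, z in neighbours[i]) / len(neighbours[i])
    return x


# Made-up, noise-free measurements generated from the states [0, 1, 3, 6].
measurements = [(1, 0, 1.0), (2, 1, 2.0), (3, 2, 3.0), (3, 0, 6.0), (2, 0, 3.0)]
estimates = synchronize_differences(4, measurements)
```

With noisy measurements the same sweep converges to the least-squares estimate rather than to the exact states.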

    Computer Vision without Vision : Methods and Applications of Radio and Audio Based SLAM

    The central problem of this thesis is estimating receiver-sender node positions from measured receiver-sender distances or equivalent measurements. This problem arises in many applications such as microphone array calibration, radio antenna array calibration, mapping and positioning using ultra-wideband, and mapping and positioning using round-trip-time measurements between mobile phones and Wi-Fi units. Previous research has explored some of these problems, creating minimal solvers for instance, but these solutions lack real-world implementation. Because different media are used, finding reliable receiver-sender distances is difficult, with many of the measurements being erroneous or, worse, missing. Therefore, in this thesis we explore using minimal solvers to create robust solutions that tolerate small measurement errors and work around missing and grossly erroneous measurements. This thesis focuses mainly on Time-of-Arrival measurements using radio technologies such as Two-Way Ranging in Ultra-Wideband and a new IEEE standard, 802.11mc, found on many WiFi modules. The methods investigated are also related to Computer Vision problems such as Structure-from-Motion. As part of this thesis, a range of new commercial radio technologies are characterised in terms of ranging in real-world environments. In doing so, we have shown how these technologies can be used as a more accurate alternative to the Global Positioning System in indoor environments. In addition to these solutions, further methods are proposed for large-scale problems in which multiple users collect the data, commonly known as Big Data. For these cases, more data is not always better, so a method is proposed to find the relevant data for calibrating large systems.

    Towards Efficient 3D Reconstructions from High-Resolution Satellite Imagery

    Recent years have witnessed the rapid growth of commercial satellite imagery. Compared with other imaging products, such as aerial or streetview imagery, modern satellite images are captured at high resolution and with multiple spectral bands, and thus provide unique viewing angles, global coverage, and frequent updates of the Earth's surface. With automated processing and intelligent analysis algorithms, satellite images can enable global-scale 3D modeling applications. This dissertation explores computer vision algorithms to reconstruct 3D models from satellite images at different levels: geometric, semantic, and parametric reconstructions. However, reconstructing satellite imagery is particularly challenging for the following reasons: 1) Satellite images typically contain an enormous number of raw pixels. Efficient algorithms are needed to minimize the substantial computational burden. 2) The ground sampling distances of satellite images are comparatively low. Visual entities, such as buildings, appear visually small and cluttered, thus posing difficulties for 3D modeling. 3) Satellite images usually have complex camera models and inaccurate vendor-provided camera calibrations. Rational polynomial coefficients (RPC) camera models, although widely used, need to be appropriately handled to ensure high-quality reconstructions. To obtain geometric reconstructions efficiently, we propose an edge-aware interpolation-based algorithm to obtain 3D point clouds from satellite image pairs. Initial 2D pixel matches are first established and triangulated to compensate for the RPC calibration errors. Noisy dense correspondences can then be estimated by interpolating the inlier matches in an edge-aware manner. After refining the correspondence map with a fast bilateral solver, we can obtain dense 3D point clouds via triangulation. Pixel-wise semantic classification results for satellite images are usually noisy due to the neglect of spatial neighborhood information. Thus, we propose to aggregate multiple corresponding observations of the same 3D point to obtain high-quality semantic models. Instead of just leveraging geometric reconstructions to provide such correspondences, we formulate geometric modeling and semantic reasoning in a joint Markov Random Field (MRF) model. Our experiments show that both tasks can benefit from the joint inference. Finally, we propose a novel deep learning based approach to perform single-view parametric reconstructions from satellite imagery. By parametrizing buildings as 3D cuboids, our method simultaneously localizes building instances visible in the image and estimates their corresponding cuboid models. Aerial LiDAR and vectorized GIS maps are utilized as supervision. Our network upsamples CNN features to detect small but cluttered building instances. In addition, we estimate building contours through a separate fully convolutional network to avoid overlapping building cuboids.

    Novel Camera Architectures for Localization and Mapping on Intelligent Mobile Platforms

    Self-localization and environment mapping play a very important role in many robotics applications such as autonomous driving and mixed reality consumer products. Although the most powerful solutions rely on a multitude of sensors including lidars and cameras, the community maintains a high interest in developing cost-effective, purely vision-based localization and mapping approaches. The core problem of standard vision-only solutions is accuracy and robustness, especially in challenging visual conditions. The thesis aims to introduce new solutions to localization and mapping problems on intelligent mobile devices by taking advantage of novel camera architectures. The thesis investigates the use of surround-view multi-camera systems, which combine the benefits of omni-directional measurements with a sufficient baseline for producing measurements in metric scale, and event cameras, which perform well under challenging illumination conditions and have high temporal resolution. The thesis starts by looking into the motion estimation framework with multi-perspective camera systems. The framework can be divided into two sub-parts: a front-end module that initializes motion and estimates absolute pose after bootstrapping, and a back-end module that refines the estimate over a larger-scale sequence. First, the thesis proposes a complete real-time pipeline for visual odometry with non-overlapping, multi-perspective camera systems, and in particular presents a solution to the scale initialization problem, in order to solve the unobservability of metric scale under degenerate cases with such systems. Second, the thesis focuses on the further improvement of front-end relative pose estimation for vehicle-mounted surround-view multi-camera systems. It presents a new, reliable solution able to handle all kinds of relative displacements in the plane despite the possibly non-holonomic characteristics, and furthermore introduces a novel two-view optimization scheme which minimizes a geometrically relevant error without relying on optimization variables related to 3D points. Third, the thesis explores continuous-time parametrization for exact modelling of non-holonomic ground vehicle trajectories in the back-end optimization of a visual SLAM pipeline. It demonstrates the use of B-splines for an exact imposition of smooth, non-holonomic trajectories inside the 6 DoF bundle adjustment, and shows that a significant improvement in robustness and accuracy in degrading visual conditions can be achieved. In order to deal with challenges in scenarios with high dynamics, low texture distinctiveness, or challenging illumination conditions, the thesis focuses on the solution to the localization and mapping problem on an Autonomous Ground Vehicle (AGV) using event cameras. Inspired by the time-continuous parametrizations of image warping functions introduced by previous works, the thesis proposes two new algorithms to tackle several motion estimation problems by means of contrast maximization. It first looks at the fronto-parallel motion estimation of an event camera; in stark contrast to the prior art, a globally optimal solution to this motion estimation problem is derived using a branch-and-bound optimization scheme. Then, the thesis introduces a new solution to handle the localization and mapping problem of a single event camera by continuous ray warping and volumetric contrast maximization, which can perform joint optimization over motion and structure for cameras undergoing both translational and rotational displacements in an arbitrarily structured environment. The present thesis thus makes important contributions to both the front-end and back-end of SLAM pipelines based on novel, promising camera architectures.

    Local Accuracy and Global Consistency for Efficient SLAM

    This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM) using visual data only. Given the video stream of a moving camera, we wish to estimate the structure of the environment and the motion of the device as accurately as possible and in real time. Two effective approaches were presented in the past. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods rely on the optimisation approach of bundle adjustment, but for computational reasons must select only a small number of past frames to process. We perform a rigorous comparison between the two approaches for visual SLAM. Specifically, we show that accuracy comes from a large number of points, while the number of intermediate frames only has a minor impact. We conclude that keyframe bundle adjustment is superior to filtering due to a smaller computational cost. Based on these experimental results, we develop an efficient framework for large-scale visual SLAM using the keyframe strategy. We demonstrate that SLAM using a single camera drifts not only in rotation and translation, but also in scale. In particular, we perform large-scale loop closure correction using a novel variant of pose-graph optimisation which also takes scale drift into account. Starting from this two-stage approach, which tackles local motion estimation and loop closures separately, we develop a unified framework for real-time visual SLAM. By employing a novel double window scheme, we present a constant-time approach which enables the local accuracy of bundle adjustment while ensuring global consistency. Furthermore, we suggest a new scheme for local registration using metric loop closures and present several improvements for the visual front-end of SLAM. Our contributions are evaluated exhaustively on a number of synthetic experiments and real-image data sets from single cameras and range imaging devices.
