    Exploiting Structural Regularities and Beyond: Vision-based Localization and Mapping in Man-Made Environments

    Image-based estimation of camera motion, known as visual odometry (VO), plays an important role in many robotic applications such as control and navigation of unmanned mobile robots, especially when no external navigation reference signal is available. The core problem of VO is estimating the camera's ego-motion (i.e., tracking), either between successive frames (relative pose estimation) or with respect to a global map (absolute pose estimation). This thesis aims to develop efficient, accurate and robust VO solutions by taking advantage of structural regularities in man-made environments, such as piece-wise planar structures, the Manhattan World and, more generally, contours and edges. Furthermore, to handle challenging scenarios that are beyond the limits of classical sensor-based VO solutions, we investigate a recently emerging sensor, the event camera, and study event-based mapping, one of the key problems in event-based VO/SLAM. The main achievements are summarized as follows. First, we revisit an old topic in relative pose estimation: accurately and robustly estimating the fundamental matrix given a collection of independently estimated homographies. Three classical methods are reviewed, and we then show that a simple but nontrivial two-step normalization within the direct linear method achieves performance similar to that of the less attractive and more computationally intensive hallucinated-points-based method. Second, an efficient 3D rotation estimation algorithm for depth cameras in piece-wise planar environments is presented. It shows that, by using surface normal vectors as input, planar modes in the corresponding density distribution function can be discovered and continuously tracked using efficient non-parametric estimation techniques. The relative rotation can then be estimated by registering entire bundles of planar modes using robust L1-norm minimization. 
Third, an efficient alternative to the iterative closest point algorithm for real-time tracking of modern depth cameras in Manhattan Worlds is developed. We exploit the common orthogonal structure of man-made environments in order to decouple the estimation of the rotation from that of the three degrees of freedom of the translation. The derived camera orientation is absolute and thus free of long-term drift, which in turn benefits the accuracy of the translation estimation as well. Fourth, we look into a more general structural regularity: edges. A real-time VO system that uses Canny edges is proposed for RGB-D cameras. Two novel alternatives to classical distance transforms are developed, with properties that significantly improve on classical Euclidean distance field based methods in terms of efficiency, accuracy and robustness. Finally, to deal with challenging scenarios that go beyond what standard RGB/RGB-D cameras can handle, we investigate the recently emerging event camera and focus on the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping
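
As a reminder of why plane-induced homographies constrain the fundamental matrix (the first contribution listed in the abstract), the block below states a standard result from two-view geometry: a homography H that maps points x to x' = Hx is compatible with F exactly when H^T F is skew-symmetric. The symbols H_i, F and k are notation assumed here for illustration. This is one common direct linear formulation, and the thesis's two-step normalization is not reproduced.

```latex
\[
  H_i^{\top} F + F^{\top} H_i = 0, \qquad i = 1, \dots, k .
\]
```

Each homography contributes six linear equations in the nine entries of F, since the left-hand side is a symmetric 3x3 matrix. Stacking the equations from several homographies and taking the least-squares null vector gives a linear estimate of F up to scale.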

    Iterative Multi-Planar Camera Calibration: Improving stability using Model Selection

    International peer-reviewed conference with proceedings. Tracking, or camera pose determination, is the main technical challenge in numerous computer vision applications, especially in Augmented Reality. However, pose computation processes commonly exhibit fluctuations and a lack of precision in the estimated parameters. This leads to unpleasant visual impressions when augmented scenes are considered. In this paper, we propose an efficient and reliable method for real-time camera tracking which avoids unpleasant statistical fluctuations. This method is based on knowledge of a piecewise planar structure in the scene and makes use of model selection to reduce fluctuations. Videos attached to this paper demonstrate the effectiveness of our approach

    Model-based camera tracking for augmented reality

    Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Master's), Bilkent University, 2014. Includes bibliographical references (leaves 45-49). Augmented reality (AR) is the enhancement of real scenes with virtual entities. It is used to enhance user experience and interaction in various ways. Educational applications, architectural visualizations, military training scenarios and pure entertainment-based applications are often enhanced by augmented reality to provide a more immersive and interactive experience for users. With hand-held devices getting more powerful and cheaper, such applications are becoming very popular. To provide natural AR experiences, extrinsic camera parameters (position and rotation) must be calculated in an accurate, robust and efficient way so that virtual entities can be overlaid onto real environments correctly. Estimating extrinsic camera parameters in real time is a challenging task. In most camera tracking frameworks, visual tracking serves as the main method for estimating the camera pose. In visual tracking systems, keypoint and edge features are often used for pose estimation. For richly textured environments, keypoint-based methods work quite well and are heavily used. Edge-based tracking, on the other hand, is preferable when the environment is rich in geometry but has little or no visible texture. Pose estimation for edge-based tracking systems generally depends on control points that are assigned on the model edges. For accurate tracking, the visibility of these control points must be determined correctly. Control point visibility determination is a computationally expensive process. We propose a method to reduce the computational cost of edge-based tracking by preprocessing the visibility information of the control points. 
For that purpose, we use persistent control points, which are generated in world space during a preprocessing step. Additionally, we use a more accurate adaptive projection algorithm for persistent control points to provide a more uniform control point distribution in screen space. We test our camera tracker in different environments to show the effectiveness and performance of the proposed algorithm. The preprocessed visibility information enables constant-time calculation of control point visibility while preserving the accuracy of the tracker. We demonstrate a sample AR application with user interaction to present our AR framework, which is developed for a commercially available and widely used game engine. Aman, Aytek (M.S.)

    Trifocal Relative Pose from Lines at Points and its Efficient Solution

    We present a new minimal problem for relative pose estimation mixing point features with lines incident at points observed in three views, together with its efficient homotopy continuation solver. We demonstrate the generality of the approach by analyzing and solving an additional problem with mixed point and line correspondences in three views. The minimal problems include correspondences of (i) three points and one line and (ii) three points and two lines through two of the points, which is reported and analyzed here for the first time. These are difficult to solve, as they have 216 and, as shown here, 312 solutions, but they cover important practical situations in which line and point features appear together, e.g., in urban scenes or when observing curves. We demonstrate that even such difficult problems can be solved robustly using a suitable homotopy continuation technique, and we provide an implementation optimized for minimal problems that can be integrated into engineering applications. Our simulated and real experiments demonstrate our solvers in the camera geometry computation task in structure from motion. We show that the new solvers allow for reconstructing challenging scenes where the standard two-view initialization of structure from motion fails. Comment: This material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while most authors were in residence at Brown University's Institute for Computational and Experimental Research in Mathematics (ICERM), in Providence, R

    Mathematical Modeling of the Design of New Working Tools

    The possibility of using real-time computer vision on video sequences gives a system many opportunities to interact with its environment. Possible forms of interaction include augmented reality, as in the MATRIS project, where the purpose is to add new objects to the video sequence, and surveillance, where the purpose is to find abnormal events. The increase in computer speed over recent years has simplified this process, and it is now possible to use at least some of the more advanced computer vision algorithms that are available. Computational speed is, however, still a problem: an efficient real-time system requires efficient code and methods. This thesis deals with both problems: one part is about efficient implementations using single instruction multiple data (SIMD) instructions, and one part is about robust tracking. An efficient real-time system requires efficient implementations of the computer vision methods used. Efficient implementations require knowledge of the CPU and the possibilities it offers. In this thesis, one such method, SIMD, is explained. SIMD is useful when the same operation is applied to multiple data, which is usually the case in computer vision, where the same operation is executed on each pixel. Following the position of a feature or object in a video sequence is called tracking. Tracking can be used for a number of applications. The application in this thesis is to use tracking for pose estimation. One way to do tracking is to cut out a small region around the feature, creating a patch, and find the position of this patch in the other frames. To find the position, a measure of the difference between the patch and the image at a given position is used. This thesis thoroughly investigates the sum of absolute differences (SAD) error measure. The investigation involves different ways to improve the robustness and to decrease the average error. 
A method to estimate the average error, namely the covariance of the position error, is proposed. An estimate of the average error is needed when different measurements are combined. Finally, a system for camera pose estimation is presented. The computer vision part of this system is based on the results in this thesis. The presentation also contains a discussion of the results of this system. Report code: LIU-TEK-LIC-2007:5. The report code in the thesis is incorrect.
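
The patch-tracking idea described above (cut out a patch, then find the position in another frame where the difference measure is lowest) can be sketched as follows. This is a minimal, unoptimized illustration in plain Python rather than the thesis's SIMD implementation. The function names `sad` and `track_patch`, the exhaustive search, and the toy frame are all assumptions made for the example.

```python
def sad(patch, image, top, left):
    """Sum of absolute differences between patch and the image window at (top, left)."""
    return sum(
        abs(p - image[top + r][left + c])
        for r, row in enumerate(patch)
        for c, p in enumerate(row)
    )


def track_patch(patch, image):
    """Exhaustively search the image for the window with the lowest SAD.

    Returns (top, left, score) of the best match.
    """
    ph, pw = len(patch), len(patch[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - ph + 1):
        for left in range(iw - pw + 1):
            score = sad(patch, image, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best


# Toy 4x5 grayscale frame; the 2x2 patch was cut from rows 1-2, columns 2-3.
frame = [
    [10, 10, 10, 10, 10],
    [10, 10, 80, 90, 10],
    [10, 10, 70, 60, 10],
    [10, 10, 10, 10, 10],
]
patch = [[80, 90], [70, 60]]
print(track_patch(patch, frame))  # (1, 2, 0)
```

The inner loop applies the same subtract-and-accumulate operation to every pixel, which is the access pattern that makes SAD a natural fit for SIMD instructions.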

    Low Power Depth Estimation of Rigid Objects for Time-of-Flight Imaging

    Depth sensing is useful in a variety of applications that range from augmented reality to robotics. Time-of-flight (TOF) cameras are appealing because they obtain dense depth measurements with minimal latency. However, for many battery-powered devices, the illumination source of a TOF camera is power hungry and can limit the battery life of the device. To address this issue, we present an algorithm that lowers the power for depth sensing by reducing the usage of the TOF camera and estimating depth maps using concurrently collected images. Our technique also adaptively controls the TOF camera and enables it when an accurate depth map cannot be estimated. To ensure that the overall system power for depth sensing is reduced, we design our algorithm to run on a low power embedded platform, where it outputs 640x480 depth maps at 30 frames per second. We evaluate our approach on several RGB-D datasets, where it produces depth maps with an overall mean relative error of 0.96% and reduces the usage of the TOF camera by 85%. When used with commercial TOF cameras, we estimate that our algorithm can lower the total power for depth sensing by up to 73%
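
The abstract reports accuracy as a mean relative error but does not spell out the formula. The sketch below uses one common definition, the mean of |estimated - reference| / reference over pixels with a valid reference depth, which is an assumption and may differ from the paper's exact metric. The function name and the sample depth values are made up for illustration.

```python
def mean_relative_error(estimated, reference):
    """Mean of |estimated - reference| / reference over valid reference depths."""
    errors = [
        abs(e - r) / r
        for e, r in zip(estimated, reference)
        if r > 0  # skip pixels with no valid reference depth
    ]
    if not errors:
        raise ValueError("no valid reference depths")
    return sum(errors) / len(errors)


# Hypothetical depths in meters; the last pixel has no valid reference.
reference = [1.00, 2.00, 4.00, 0.0]
estimated = [1.01, 1.98, 4.04, 3.0]
print(f"{100 * mean_relative_error(estimated, reference):.2f}%")  # 1.00%
```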

    Planar PØP: feature-less pose estimation with applications in UAV localization

    We present a featureless pose estimation method that, in contrast to current Perspective-n-Point (PnP) approaches, does not require n point correspondences to obtain the camera pose, allowing for pose estimation from natural shapes that do not necessarily have distinguished features like corners or intersecting edges. Instead of using n correspondences (e.g., extracted with a feature detector), we use the raw polygonal representation of the observed shape and directly estimate the pose in the pose space of the camera. Compared with a general PnP method, this method requires neither n point correspondences nor a priori knowledge of the object model (except the scale), which is registered with a picture taken from a known robot pose. Moreover, we achieve higher precision because all the information of the shape contour is used to minimize the area between the projected and the observed shape contours. To emphasize the non-use of n point correspondences between the projected template and the observed contour shape, we call the method Planar PØP. The method is demonstrated both in simulation and in a real application consisting of UAV localization, where comparisons with a precise ground truth are provided. Peer reviewed. Postprint (author's final draft

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world
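
To make the working principle described above concrete, here is a minimal sketch of how one pixel could turn a brightness signal into events that carry time, location and sign. It assumes the commonly used idealized model in which an event fires whenever the log-brightness moves by a fixed contrast threshold from its level at the previous event. The function name, the threshold value of 0.2 and the sample signal are assumptions for illustration, and real sensors add noise and latency that are ignored here.

```python
import math


def generate_events(samples, x, y, threshold=0.2):
    """Turn (time, brightness) samples at one pixel into (t, x, y, polarity) events.

    An event fires each time the log-brightness moves by `threshold` away from
    the level stored at the previous event (idealized, noise-free model).
    """
    events = []
    _, first_brightness = samples[0]
    ref = math.log(first_brightness)
    for t, brightness in samples[1:]:
        level = math.log(brightness)
        # A large change between two samples produces several events.
        while abs(level - ref) >= threshold:
            polarity = 1 if level > ref else -1
            ref += polarity * threshold
            events.append((t, x, y, polarity))
    return events


# Hypothetical signal: timestamps in microseconds, brightness in arbitrary units.
signal = [(0, 100.0), (1000, 100.0), (2000, 150.0), (3000, 60.0)]
for event in generate_events(signal, x=5, y=7):
    print(event)
```

No events are produced while the brightness stays constant, which reflects why the output is sparse and asynchronous rather than a fixed-rate sequence of frames.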