7 research outputs found

    Monocular SLAM with locally planar landmarks via geometric Rao-Blackwellized particle filtering on Lie groups

    We propose a novel geometric Rao-Blackwellized particle filtering framework for monocular SLAM with locally planar landmarks. We represent the states for the camera pose and the landmark plane normal as elements of SE(3) and SO(3), respectively, which are both Lie groups. The measurement error is likewise represented on another Lie group, SL(3), corresponding to the space of homography matrices. We then formulate the unscented transformation on Lie groups for optimal importance sampling and for landmark estimation via an unscented Kalman filter. The feasibility of our framework is demonstrated via various experiments.
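    A rough, hedged sketch (not the authors' code) of the Rao-Blackwellized structure described above: each particle carries an SE(3) camera pose sampled on the group via the exponential map, and would additionally hold a per-landmark unscented Kalman filter over the SO(3) plane normal. The helper functions, noise parameters, and names below are illustrative assumptions.

```python
import numpy as np
from dataclasses import dataclass, field

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def se3_exp(xi):
    """Approximate SE(3) exponential: exact rotation, first-order translation."""
    T = np.eye(4)
    T[:3, :3] = so3_exp(xi[3:])
    T[:3, 3] = xi[:3]
    return T

@dataclass
class Particle:
    pose: np.ndarray                               # camera pose in SE(3), 4x4
    landmarks: list = field(default_factory=list)  # per-landmark (SO(3) mean, covariance) pairs for a UKF
    weight: float = 1.0

def propagate(particles, motion_mean, motion_cov):
    """Importance sampling of each particle's pose: perturb the nominal motion in the se(3) tangent space."""
    for p in particles:
        xi = np.random.multivariate_normal(motion_mean, motion_cov)
        p.pose = p.pose @ se3_exp(xi)
    # A full implementation would continue with a UKF update of each particle's
    # landmark plane normals from the SL(3) homography measurements, then reweighting.
```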

    ์‹ค๋‚ด ์„œ๋น„์Šค๋กœ๋ด‡์„ ์œ„ํ•œ ์ „๋ฐฉ ๋‹จ์•ˆ์นด๋ฉ”๋ผ ๊ธฐ๋ฐ˜ SLAM ์‹œ์Šคํ…œ

    Thesis (Ph.D.) -- Seoul National University, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Advisor: ์กฐ๋™์ผ.

    This dissertation presents a new forward-viewing monocular vision-based simultaneous localization and mapping (SLAM) method. The method is developed to run in real time on a low-cost embedded system for indoor service robots. The developed system uses a cost-effective mono-camera as its primary sensor, with robot wheel encoders and a gyroscope as supplementary sensors. The proposed method is robust in various challenging indoor settings that contain low-textured areas, moving people, or changes in the environment. In this work, vanishing point (VP) and line features are used as landmarks for SLAM. The orientation of the robot is estimated directly from the direction of the VP, and the estimation models for the robot position and the line landmarks are then derived as simple linear equations. Using these models, the camera poses and landmark positions are efficiently corrected by a novel local map correction method. To achieve high accuracy in long-term exploration, a probabilistic loop detection procedure and a pose correction procedure are performed when the robot revisits previously mapped areas. The performance of the proposed method is demonstrated in various challenging environments through dataset-based experiments on a desktop computer and real-time experiments on a low-cost embedded system. The experimental environments include a real home-like setting and a dedicated space equipped with a Vicon motion-tracking system, both containing low-textured areas, moving people, or changing conditions. The proposed method is also tested on the RAWSEEDS benchmark dataset.

    Table of contents: Chapter 1 Introduction (Background and Motivation; Objectives; Contributions; Organization). Chapter 2 Previous works. Chapter 3 Methodology (System overview; Manhattan grid and system initialization; Vanishing point based robot orientation estimation; Line landmark position estimation; Camera position estimation; Local map correction; Loop closing: extracting multiple BRIEF-Gist descriptors, data structure for fast comparison, Bayesian filtering based loop detection, global pose correction). Chapter 4 Experiments (Home environment dataset; Vicon dataset; Benchmark dataset in a large-scale indoor environment; Embedded real-time SLAM in a home environment). Chapter 5 Conclusion. Appendix: performance evaluation of various loop detection methods in a home environment. References.
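    A hedged illustration (not the thesis code) of the vanishing-point orientation step described in the dissertation abstract above: the VP of lines parallel to a Manhattan-grid axis is back-projected through assumed camera intrinsics, and the robot yaw is read off under a level-camera assumption. The intrinsics and sign conventions are ours.

```python
import numpy as np

def heading_from_vanishing_point(vp_pixel, K, axis_yaw=0.0):
    """Back-project the VP to a 3D direction and read off the yaw relative to the grid axis."""
    vp_h = np.array([vp_pixel[0], vp_pixel[1], 1.0])
    d = np.linalg.inv(K) @ vp_h            # direction of the parallel lines in the camera frame
    d /= np.linalg.norm(d)                 # (sign ambiguity of the VP direction is ignored here)
    # Camera frame: x right, y down, z forward; yaw measured about the vertical axis.
    return axis_yaw - np.arctan2(d[0], d[2])

# Example with assumed intrinsics: a VP 80 px right of the principal point
# implies the camera is rotated roughly 9 degrees away from the corridor axis.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
print(np.degrees(heading_from_vanishing_point((400.0, 240.0), K)))
```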

    A Unified Hybrid Formulation for Visual SLAM

    Visual Simultaneous Localization and Mapping (VSLAM) is the process of estimating the six-degree-of-freedom ego-motion of a camera, from its video feed, while simultaneously constructing a 3D model of the observed environment. Extensive research in the field over the past two decades has yielded real-time and efficient algorithms for VSLAM, allowing various interesting applications in augmented reality, cultural heritage, robotics and the automotive industry, to name a few. The underlying formulation behind VSLAM is a mixture of image processing, geometry, graph theory, optimization and machine learning; the theoretical and practical development of these building blocks led to a wide variety of algorithms, each leveraging different assumptions to achieve superiority under the presumed conditions of operation. An exhaustive survey on the topic outlined seven main components in a generic VSLAM pipeline, namely: the matching paradigm, visual initialization, data association, pose estimation, topological/metric map generation, optimization, and global localization.

    Before claiming VSLAM a solved problem, numerous challenging subjects pertaining to robustness in each of the aforementioned components have to be addressed, namely: resilience to a wide variety of scenes (poorly textured or self-repeating scenarios), resilience to dynamic changes (moving objects), and scalability for long-term operation (awareness and management of computational resources). Furthermore, current state-of-the-art VSLAM pipelines are tailored towards static, basic point cloud reconstructions, an impediment to perception applications such as path planning, obstacle avoidance and object tracking.

    To address these limitations, this work proposes a hybrid scene representation, where different sources of information extracted solely from the video feed are fused in a hybrid VSLAM system. The proposed pipeline allows for seamless integration of data from pixel-based intensity measurements and geometric entities to produce and make use of a coherent scene representation. The goal is threefold: 1) increase camera tracking accuracy under challenging motions, 2) improve robustness to challenging, poorly textured environments and varying illumination conditions, and 3) ensure scalability and long-term operation by efficiently maintaining a global reusable map representation.
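    One way the hybrid scene representation described above could be organized, sketched below purely as an illustration under our own assumptions (not the thesis implementation): photometric patches and geometric features live side by side in each keyframe, so the tracker can fuse whichever residuals the scene provides.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PhotometricPatch:
    center_px: tuple            # pixel location in the host keyframe
    inverse_depth: float        # estimated inverse depth of the patch center
    intensities: np.ndarray     # raw intensities used for direct (photometric) alignment

@dataclass
class GeometricPoint:
    xyz: np.ndarray             # triangulated 3D position in world coordinates
    descriptor: np.ndarray      # feature descriptor for wide-baseline matching

@dataclass
class Keyframe:
    pose_cw: np.ndarray                          # camera-from-world transform, 4x4
    patches: list = field(default_factory=list)  # pixel-intensity entities
    points: list = field(default_factory=list)   # geometric entities

@dataclass
class HybridMap:
    keyframes: list = field(default_factory=list)

    def residual_sources(self, frame_pose_guess):
        """Hand the tracker both photometric and geometric landmarks so their residuals can be fused."""
        return [(kf.pose_cw, kf.patches, kf.points) for kf in self.keyframes]
```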

    ํ†ตํ•ฉ์‹œ์Šคํ…œ์„์ด์šฉํ•œ ๋‹ค์‹œ์ ์Šคํ…Œ๋ ˆ์˜ค ๋งค์นญ๊ณผ์˜์ƒ๋ณต์›

    Thesis (Ph.D.) -- Seoul National University, Department of Electrical and Computer Engineering, February 2017. Advisor: ์ด๊ฒฝ๋ฌด.

    Estimating camera pose and scene structure from seriously degraded images is a challenging problem. Most existing multi-view stereo algorithms assume high-quality input images and therefore produce unreliable results for blurred, noisy, or low-resolution images. Experimental results show that using off-the-shelf image reconstruction algorithms as independent preprocessing is generally ineffective or sometimes even counterproductive. This is because naive frame-wise image reconstruction methods fundamentally ignore the consistency between images, even though they may produce visually plausible results. In this thesis, building on the fact that image reconstruction and multi-view stereo are interrelated problems, we present a unified framework to solve them jointly. The validity of this approach is empirically verified for four different problems: dense depth map reconstruction, camera pose estimation, super-resolution, and deblurring from images obtained by a single moving camera. By reflecting the physical imaging process, we cast our objective as a cost minimization problem and obtain the solution using alternating optimization techniques. Experiments show that the proposed method can restore high-quality depth maps from seriously degraded images for both synthetic and real video, whereas simple multi-view stereo methods fail. Our algorithm also produces superior super-resolution and deblurring results compared to simple preprocessing with conventional super-resolution and deblurring techniques. Moreover, we show that the proposed framework can be generalized to handle more common scenarios; for example, it can solve image reconstruction and multi-view stereo problems for multi-view single-shot images captured by a light field camera. Using the information in calibrated multi-view images, it recovers the motions of individual objects in the input image as well as the unknown camera motion during the shutter time. The contribution of this thesis is a new, integrated perspective on these existing computer vision problems: by solving interrelated problems jointly, we obtain physically more plausible solutions and better performance, especially when the input images are challenging. The proposed optimization algorithm also makes our approach more practical in terms of computational complexity.

    Table of contents: 1 Introduction (Outline of Dissertation). 2 Background. 3 Generalized Imaging Model (Camera Projection Model; Depth and Warping Operation; Representation of Camera Pose in SE(3); Proposed Imaging Model). 4 Rendering Synthetic Datasets (Making Blurred Image Sequences using Depth-based Image Rendering; Making Blurred Image Sequences using Blender). 5 A Unified Framework for Single-shot Multi-view Images (Introduction; Related Works; Deblurring with 4D Light Fields: Motion Blur Formulation in Light Fields, Initialization; Joint Estimation: Energy Formulation, Update Latent Image, Update Camera Pose and Depth Map; Experimental Results: Synthetic Data, Real Data; Conclusion). 6 A Unified Framework for a Monocular Image Sequence (Introduction; Related Works; Modeling Imaging Process; Unified Energy Formulation: Matching Term, Self-consistency Term, Regularization Term; Optimization: Update of the Depth Maps and Camera Poses, Update of the Latent Images, Initialization, Occlusion Handling; Experimental Results: Synthetic Datasets, Real Datasets, The Effect of Parameters; Conclusion). 7 A Unified Framework for SLAM (Motivation; Baseline; Proposed Method; Experimental Results: Quantitative Comparison, Qualitative Results, Runtime; Conclusion). 8 Conclusion (Summary and Contribution of the Dissertation; Future Works). Bibliography. Abstract (in Korean).
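    A hedged sketch of the kind of joint energy the abstract and table of contents point to (matching, self-consistency, and regularization terms over latent images L, depth maps D, and camera poses T); the notation and operators below are our own, not necessarily the thesis':

```latex
% Our illustrative notation: I_i observed frames, L_i latent images, D_i depth maps,
% T_i camera poses; W_ij warps view j into view i using (D_i, T_i, T_j); B_i applies
% the physical degradation (blur/downsampling) implied by the camera motion.
\begin{equation}
E(\{L_i\}, \{D_i\}, \{T_i\}) =
\underbrace{\sum_{i,j} \big\| W_{ij}(L_j; D_i, T_i, T_j) - L_i \big\|_1}_{\text{matching term}}
+ \underbrace{\lambda_s \sum_i \big\| B_i(L_i; D_i, T_i) - I_i \big\|_2^2}_{\text{self-consistency term}}
+ \underbrace{\lambda_r \sum_i \big( \|\nabla L_i\|_1 + \|\nabla D_i\|_1 \big)}_{\text{regularization term}}
\end{equation}
% Alternating optimization: fix (D, T) and update the latent images L,
% then fix L and update the depth maps and camera poses.
```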

    ์›€์ง์ด๋Š” ๋‹จ์ผ ์นด๋ฉ”๋ผ๋ฅผ ์ด์šฉํ•œ 3์ฐจ์› ๋ณต์›๊ณผ ๋””๋ธ”๋Ÿฌ๋ง, ์ดˆํ•ด์ƒ๋„ ๋ณต์›์˜ ๋™์‹œ์  ์ˆ˜ํ–‰ ๊ธฐ๋ฒ•

    Thesis (Ph.D.) -- Seoul National University, Department of Electrical and Computer Engineering, August 2013. Advisor: ์ด๊ฒฝ๋ฌด.

    Vision-based 3D reconstruction is one of the fundamental problems in computer vision, and it has been researched intensively over the last decades. In particular, 3D reconstruction using a single camera, which has a wide range of applications such as autonomous robot navigation and augmented reality, shows great potential in its reconstruction accuracy, scale of reconstruction coverage, and computational efficiency. However, until recently, the performance of most algorithms has been tested only with carefully recorded, high-quality input sequences. In practical situations, input images for 3D reconstruction can be severely distorted by factors such as pixel noise and motion blur, and the resolution of the images may not be high enough to achieve accurate camera localization and scene reconstruction. Although various high-performance image enhancement methods have been proposed in many studies, their high computational costs prevent applying them to 3D reconstruction systems where real-time capability is an important issue.

    In this dissertation, novel single-camera 3D reconstruction methods that are combined with image enhancement are studied to improve the accuracy and reliability of 3D reconstruction. To this end, two critical image degradations, motion blur and low image resolution, are addressed for both sparse and dense 3D reconstruction systems, and novel integrated enhancement methods for these degradations are presented. Using the relationship between the observed images and the 3D geometry of the camera and scene, the image formation process, including image degradations, is modeled in terms of the camera and scene geometry. By taking these image degradation factors into consideration, accurate 3D reconstruction is achieved. Furthermore, the information required for image enhancement, such as blur kernels for deblurring and pixel correspondences for super-resolution, is obtained simultaneously while reconstructing the 3D scene, which makes the image enhancement much simpler and faster. The proposed methods have the advantage that, by solving these problems simultaneously, the results of 3D reconstruction and image enhancement improve each other. Experimental evaluations demonstrate the effectiveness of the proposed 3D reconstruction and image enhancement methods.

    Table of contents: 1. Introduction. 2. Sparse 3D Reconstruction and Image Deblurring. 3. Sparse 3D Reconstruction and Image Super-Resolution. 4. Dense 3D Reconstruction and Image Deblurring. 5. Dense 3D Reconstruction and Image Super-Resolution. 6. Dense 3D Reconstruction, Image Deblurring, and Super-Resolution. 7. Conclusion.
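    A hedged reading of the geometry-aware degradation model the abstract describes, in our own notation (not necessarily the dissertation's): the observed frame is the latent sharp image warped by the camera motion during the shutter, averaged over the exposure, downsampled, and corrupted by noise.

```latex
% x: pixel, I_i: observed (blurred, low-resolution) frame, L: latent sharp image,
% T_i(t): camera pose during the exposure of length tau, D: depth map,
% pi: depth-dependent projective warp, S: downsampling operator, n: noise.
\begin{equation}
I_i(\mathbf{x}) =
S\!\left[ \frac{1}{\tau} \int_{0}^{\tau}
L\big( \pi( T_i(t), D, \mathbf{x} ) \big)\, dt \right] + n(\mathbf{x})
\end{equation}
% Estimating T_i(t) and D as part of 3D reconstruction yields the per-pixel blur
% kernels and sub-pixel correspondences that deblurring and super-resolution need,
% which is why solving the problems jointly is cheaper than separate preprocessing.
```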

    Semi-dense filter-based visual odometry for automotive augmented reality applications

    In order to integrate virtual objects convincingly into a real scene, Augmented Reality (AR) systems typically need to solve two problems. Firstly, the movement and position of the AR system within the environment need to be known, so that the motion of the AR system can be compensated and virtual objects can be placed stably and correctly relative to the real world. Secondly, an AR system needs to have a notion of the geometry of the real environment to be able to properly integrate virtual objects into the real scene, via techniques such as determining the occlusion relation between real and virtual objects or context-aware positioning of virtual content.

    To solve the second problem, two approaches have emerged: a simple solution is to create a map of the real scene a priori by whatever means and then to use this map during real-time operation of the AR system; a more challenging, but also more flexible, solution is to create a map of the environment dynamically from real-time sensor data of the AR system. Our target applications are Augmented Reality in-car infotainment systems in which the video of a forward-facing camera is augmented. Using map data to determine the geometry of the vehicle's environment is limited by the fact that currently available digital maps only provide a rather coarse and abstract picture of the world; furthermore, map coverage and the amount of detail vary greatly regionally and between different maps. Hence, the objective of the presented thesis is to obtain the geometry of the environment in real time from vehicle sensors. More specifically, the aim is to obtain the scene geometry by triangulating it from the camera images at different camera positions (i.e. stereo computation) while the vehicle moves. The problem of estimating geometry from camera images where the camera positions are not (exactly) known is investigated in the (overlapping) fields of visual odometry (VO) and structure from motion (SfM).

    Since Augmented Reality applications have tight latency requirements, it is necessary to obtain an estimate of the current scene geometry for each frame of the video stream without delay. Furthermore, Augmented Reality applications need detailed information about the scene geometry, which means dense (or semi-dense) depth estimation, that is, one depth estimate per pixel. The capability of low-latency geometry estimation is currently only found in filter-based VO methods, which model the depth estimates of the pixels as the state vector of a probabilistic filter (e.g. a Kalman filter). However, such filters maintain a covariance matrix for the uncertainty of the pixel depth estimates whose size is quadratic in the number of estimated pixel depths, which makes dense depth estimation computationally infeasible. To resolve this conflict, the (full) covariance matrix is replaced by a matrix requiring only linear complexity in processing and storage. This way, filter-based VO methods can be combined with dense estimation techniques and efficiently scaled up to arbitrarily large image sizes while allowing easy parallelization.

    For treating the covariance matrix of the filter state, two methods are introduced and discussed. These methods are implemented as modifications to the (existing) VO method LSD-SLAM, yielding the "continuous" variant C-LSD-SLAM. In the first method, a diagonal matrix is used as the covariance matrix; that is, the correlation between different scene point estimates is neglected. For stabilizing the resulting VO method in forward motion, a reweighting scheme is introduced based on how far scene point estimates move when reprojected from one frame to the next. This prevents erroneous scene point estimates from causing the VO method to diverge. The second method models the correlation of the scene point estimates caused by camera pose uncertainty, by approximating the combined influence of all camera pose estimates in a small subspace of the scene point estimates. This subspace has fixed dimension 15, which keeps the complexity of this covariance replacement linear in the number of scene point estimates.
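    A hedged sketch of the first covariance treatment described above (a diagonal covariance matrix, i.e. cross-pixel correlations are neglected): only per-pixel inverse-depth means and variances are stored, and each pixel receives an independent scalar Kalman update. The class name and parameters are illustrative and not taken from C-LSD-SLAM.

```python
import numpy as np

class DiagonalDepthFilter:
    """Semi-dense inverse-depth filter whose state covariance is kept diagonal."""

    def __init__(self, height, width, prior_inv_depth=0.5, prior_var=1.0):
        self.mu = np.full((height, width), prior_inv_depth)   # inverse-depth means
        self.var = np.full((height, width), prior_var)        # per-pixel variances only (no full covariance)

    def update(self, obs_inv_depth, obs_var, mask):
        """Scalar Kalman update per pixel; storage and processing stay linear in the pixel count."""
        gain = self.var[mask] / (self.var[mask] + obs_var[mask])
        self.mu[mask] += gain * (obs_inv_depth[mask] - self.mu[mask])
        self.var[mask] *= (1.0 - gain)

# Usage: each new frame yields small-baseline stereo observations for a subset of
# pixels (mask); the filter fuses them frame by frame without ever forming the
# quadratic-size covariance matrix that a full Kalman filter would require.
```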