398 research outputs found

    Implementation and experimental verification of a 3D-to-2D Visual Odometry Algorithm for real-time measurements of a vehicle pose

    Get PDF
    This work describes the implementation of software for real-time vehicle pose reconstruction using stereo visual odometry algorithms. We first compute an initial motion estimate with a linear 3D-to-2D method, solving a PnP problem embedded within a RANSAC process to remove outliers. A maximum-likelihood motion estimate is then obtained by non-linear minimization. GPU implementations of the feature extraction and matching algorithms are used. We demonstrate an accuracy of better than 1% over 1350 mm of travel.
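    The two-stage scheme described here (linear PnP inside RANSAC, then non-linear refinement) maps closely onto standard OpenCV calls. A minimal sketch, assuming OpenCV >= 4.1; this is an illustration, not the authors' implementation:

```python
import numpy as np
import cv2

def estimate_pose(pts3d, pts2d, K, dist=None):
    """pts3d: (N, 3) float32 landmarks from the previous stereo pair.
       pts2d: (N, 2) float32 matched pixels in the current image."""
    dist = np.zeros(5) if dist is None else dist
    # Stage 1: linear PnP inside RANSAC to reject outlier correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, dist, flags=cv2.SOLVEPNP_EPNP,
        iterationsCount=200, reprojectionError=2.0)
    if not ok:
        raise RuntimeError("PnP failed")
    # Stage 2: maximum-likelihood refinement, i.e. Levenberg-Marquardt
    # on the reprojection error over the inlier set only.
    idx = inliers.ravel()
    rvec, tvec = cv2.solvePnPRefineLM(pts3d[idx], pts2d[idx], K, dist, rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # camera motion from previous to current frame
```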

    Learning and Searching Methods for Robust, Real-Time Visual Odometry.

    Full text link
    Accurate position estimation provides a critical foundation for mobile robot perception and control. While well-studied, it remains difficult to provide timely, precise, and robust position estimates for applications that operate in uncontrolled environments, such as robotic exploration and autonomous driving. Continuous, high-rate egomotion estimation is possible using cameras and Visual Odometry (VO), which tracks the movement of sparse scene content known as image keypoints or features. However, high update rates, often 30 Hz or greater, leave little computation time per frame, while variability in scene content stresses robustness. Due to these challenges, implementing an accurate and robust visual odometry system remains difficult. This thesis investigates fundamental improvements throughout all stages of a visual odometry system, and has three primary contributions: The first contribution is a machine learning method for feature detector design. This method considers end-to-end motion estimation accuracy during learning. Consequently, accuracy and robustness are improved across multiple challenging datasets in comparison to state-of-the-art alternatives. The second contribution is a proposed feature descriptor, TailoredBRIEF, that builds upon recent advances in fast, low-memory descriptor extraction and matching. TailoredBRIEF is an in-situ descriptor learning method that improves feature matching accuracy by efficiently customizing descriptor structures on a per-feature basis. Further, a common asymmetry in vision system design between reference and query images is described and exploited, enabling approaches that would otherwise exceed runtime constraints. The final contribution is a new algorithm for visual motion estimation: Perspective Alignment Search (PAS). Many vision systems depend on the unique appearance of features during matching, despite a large quantity of non-unique features in otherwise barren environments. A search-based method, PAS, is proposed to employ features that lack unique appearance through descriptorless matching. This method simplifies visual odometry pipelines, defining one method that subsumes feature matching, outlier rejection, and motion estimation. Throughout this work, evaluations of the proposed methods and systems are carried out on ground-truth datasets, often generated with custom experimental platforms in challenging environments. Particular focus is placed on preserving runtimes compatible with real-time operation, as is necessary for deployment in the field.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113365/1/chardson_1.pd
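    TailoredBRIEF itself is not available in standard libraries; as context, the baseline binary-descriptor pipeline it customizes (BRIEF-family extraction plus Hamming matching) looks roughly like this generic OpenCV sketch, with placeholder image filenames:

```python
import cv2

# Generic ORB (oriented BRIEF) pipeline; TailoredBRIEF customizes the
# descriptor's sampling pattern per feature, which stock OpenCV does not
# expose, so this shows only the baseline it improves on.
# "reference.png" / "query.png" are placeholder filenames.
orb = cv2.ORB_create(nfeatures=1000)
img_ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
img_qry = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
kp_ref, des_ref = orb.detectAndCompute(img_ref, None)
kp_qry, des_qry = orb.detectAndCompute(img_qry, None)

# Binary descriptors are compared by Hamming distance (popcount of XOR),
# which is what makes BRIEF-family matching fast and low-memory.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_qry), key=lambda m: m.distance)
```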

    Visual SLAM in Changing Environments

    Get PDF
    This thesis investigates the problem of Visual Simultaneous Localization and Mapping (vSLAM) in changing environments. The vSLAM problem is to sequentially estimate the pose of a device with mounted cameras within a map generated from images taken with those cameras. vSLAM algorithms face two main challenges in changing environments: moving objects and temporal appearance changes. Moving objects cause problems in pose estimation if they are mistaken for static objects. They also cause problems for loop closure detection (LCD), the problem of detecting whether a previously visited place has been revisited: the same moving object observed in two different places may cause false loop closures to be detected. Temporal appearance changes, such as those brought about by time of day or weather, cause long-term data association errors for LCD, making it difficult to recognize previously visited places after their appearance has changed. Focus is placed on LCD, which turns out to be the part of vSLAM most affected by changing environments. In addition, several techniques and algorithms for Visual Place Recognition (VPR) in challenging conditions that could be used in the context of LCD are surveyed, and the performance of two state-of-the-art VPR algorithms in changing environments is assessed experimentally to measure their applicability to LCD. The most severe performance-degrading appearance changes are found to be those caused by changes in season and illumination. The survey identifies several algorithms and techniques that perform well in loop-closure-related tasks under specific environmental conditions. Finally, a limited experiment on the Nordland dataset suggests that the tested VPR algorithms are usable as-is, or can be modified, for long-term LCD. As part of the experiment, a new simple neighborhood consistency check was also developed, evaluated, and found to be effective at reducing false positives output by the tested VPR algorithms.
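    The thesis's consistency check is not reproduced in this abstract; purely as a hypothetical illustration of the idea (names and thresholds are my own), a neighborhood check over a place-similarity matrix might look like:

```python
import numpy as np

def neighborhood_consistent(sim, q, m, radius=2, thresh=0.6):
    """Accept the loop-closure candidate (query q, map place m) only if
    temporally neighboring queries also match temporally neighboring map
    places. A moving object seen in two unrelated places produces a lone
    high-similarity cell, which this check rejects.
    sim: (num_queries, num_places) similarity matrix with values in [0, 1]."""
    votes = 0
    for d in range(-radius, radius + 1):
        if 0 <= q + d < sim.shape[0] and 0 <= m + d < sim.shape[1]:
            votes += sim[q + d, m + d] > thresh
    return votes >= radius + 1  # require a majority of the neighborhood
```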

    Virtual Testbed for Monocular Visual Navigation of Small Unmanned Aircraft Systems

    Full text link
    Monocular visual navigation methods have seen significant advances in the last decade, recently producing several real-time solutions for autonomously navigating small unmanned aircraft systems without relying on GPS. This is critical for military operations, which may involve environments where GPS signals are degraded or denied. However, testing and comparing visual navigation algorithms remains a challenge, since visual data is expensive to gather. Conducting flight tests in a virtual environment is an attractive solution prior to committing to outdoor testing. This work presents a virtual testbed for conducting simulated flight tests over real-world terrain and analyzing the real-time performance of visual navigation algorithms at 31 Hz. This tool was created ultimately to find a visual odometry algorithm appropriate for further GPS-denied navigation research on fixed-wing aircraft, even though all of the algorithms were designed for other modalities. The testbed was used to evaluate three current state-of-the-art, open-source monocular visual odometry algorithms on a fixed-wing platform: Direct Sparse Odometry, Semi-Direct Visual Odometry, and ORB-SLAM2 (with loop closures disabled).
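    As illustration only (none of this comes from the testbed itself), a frame-replay harness for checking whether a VO wrapper keeps up with a 31 Hz camera could be as simple as the following; `vo_process_frame` and `frames` are hypothetical stand-ins:

```python
import time

def benchmark_vo(vo_process_frame, frames, target_hz=31.0):
    """Replay pre-rendered frames into a VO callback and report the
    fraction of frames whose processing exceeded the camera period."""
    budget = 1.0 / target_hz
    late = 0
    for frame in frames:
        t0 = time.perf_counter()
        vo_process_frame(frame)  # wrapper around DSO, SVO, ORB-SLAM2, ...
        if time.perf_counter() - t0 > budget:
            late += 1
    return late / len(frames)
```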

    Development of a Visual Odometry System as a Location Aid for Self-Driving Cars

    Get PDF
    Knowing the exact position occupied by a robot and the trajectory it describes is essential in the automotive field. The sensors and techniques developed over the years for this purpose are reviewed in this work. In this project, two cameras on board the vehicle are used as environment-perception sensors. The proposed algorithm is based solely on visual odometry: using only the sequence of images captured by the cameras, without prior knowledge of the environment and without other sensors, it estimates the real position and orientation of the vehicle. The proposal has been validated on the KITTI benchmark and compared with other visual odometry techniques from the state of the art.
    Grado en Ingeniería en Electrónica y Automática Industrial
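    The abstract reports validation on KITTI; for reference, a simplified single-length variant of the KITTI translational drift metric (my sketch, not the official devkit) can be computed as:

```python
import numpy as np

def translation_drift(gt, est, length=100.0):
    """gt, est: lists of aligned 4x4 homogeneous poses. Returns the mean
    translation error (%) over all sub-trajectories of the given length."""
    dists = [0.0]
    for a, b in zip(gt[:-1], gt[1:]):
        dists.append(dists[-1] + np.linalg.norm(b[:3, 3] - a[:3, 3]))
    errs = []
    for i in range(len(gt)):
        # first pose at least `length` meters of driven path after pose i
        j = next((k for k in range(i, len(gt)) if dists[k] - dists[i] >= length), None)
        if j is None:
            break
        gt_rel = np.linalg.inv(gt[i]) @ gt[j]
        est_rel = np.linalg.inv(est[i]) @ est[j]
        err = np.linalg.inv(est_rel) @ gt_rel   # residual relative motion
        errs.append(np.linalg.norm(err[:3, 3]) / length)
    return 100.0 * float(np.mean(errs)) if errs else float("nan")
```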

    Large-Scale Textured 3D Scene Reconstruction

    Get PDF
    The creation of three-dimensional environment models is a fundamental task in computer vision. Reconstructions are useful for a range of applications, such as surveying, the preservation of cultural heritage, or the creation of virtual worlds in the entertainment industry. In the field of automated driving, they help address a variety of challenges, including localization, the annotation of large datasets, and the fully automatic generation of simulation scenarios. The challenge in 3D reconstruction is the joint estimation of sensor poses and an environment model. Redundant and potentially erroneous measurements from different sensors must be integrated into a common representation of the world in order to obtain a metrically and photometrically correct model. At the same time, the method must use resources efficiently to achieve runtimes that make practical use possible. In this work we present a reconstruction method capable of creating photorealistic 3D reconstructions of large areas extending over several kilometers. Range measurements from laser scanners and stereo camera systems are fused using a volumetric reconstruction method. Loop closures are detected and introduced as additional constraints in order to obtain a globally consistent map. The resulting mesh is textured from camera images, with the individual observations weighted by their quality. For a seamless appearance, the unknown exposure times and the parameters of the optical system are estimated jointly and the images corrected accordingly. We evaluate our method on synthetic data, real sensor data from our test vehicle, and publicly available datasets. We show qualitative results for large inner-city areas as well as quantitative evaluations of the vehicle trajectory and the reconstruction quality. Finally, we present several applications, demonstrating the usefulness of our method for automated driving.
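    The thesis's fusion code is not part of this abstract; as a generic illustration of the volumetric fusion idea it builds on, a toy TSDF (truncated signed distance function) update over a voxel grid might look like the following, assuming the grid starts at the world origin:

```python
import numpy as np

def integrate_depth(tsdf, weights, depth, K, T_wc, voxel_size=0.1, trunc=0.3):
    """Project every voxel center into the depth image and blend its
    truncated signed distance into a running weighted average."""
    nx, ny, nz = tsdf.shape
    idx = np.indices((nx, ny, nz)).reshape(3, -1).T           # (N, 3) voxel indices
    pts_w = (idx + 0.5) * voxel_size                          # voxel centers, world frame
    T_cw = np.linalg.inv(T_wc)                                # world -> camera
    pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]
    z = pts_c[:, 2]
    front = z > 1e-6                                          # in front of the camera
    uvw = pts_c[front] @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = depth.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sel = np.flatnonzero(front)[inside]                       # voxels seen by the camera
    d = depth[v[inside], u[inside]]
    sdf = d - z[sel]                                          # distance along the ray
    keep = (d > 0) & (sdf > -trunc)                           # valid depth, near surface
    sel, sdf = sel[keep], np.clip(sdf[keep] / trunc, -1.0, 1.0)
    t, wgt = tsdf.reshape(-1), weights.reshape(-1)            # views into the grids
    t[sel] = (t[sel] * wgt[sel] + sdf) / (wgt[sel] + 1.0)     # running weighted average
    wgt[sel] += 1.0
```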

    Stereo visual simultaneous localisation and mapping for an outdoor wheeled robot: a front-end study

    Get PDF
    For many mobile robotic systems, navigating an environment is a crucial step towards autonomy, and Visual Simultaneous Localisation and Mapping (vSLAM) has seen increasingly effective use in this capacity. However, vSLAM is strongly dependent on the context in which it is applied, often relying on heuristics and special cases to provide efficiency and robustness. It is thus crucial to identify the important parameters and factors of a particular context, as these heavily influence the algorithms, processes, and hardware required for the best results. In this body of work, a generic front-end stereo vSLAM pipeline is tested in the context of a small-scale outdoor wheeled robot occupying less than 1 m³ of volume. The scale of the vehicle constrained the available processing power, Field of View (FOV), actuation systems, and the image distortions present. A dataset was collected with a custom platform consisting of a Point Grey Bumblebee (discontinued) stereo camera and an Nvidia Jetson TK1 processor. A stereo front-end feature-tracking framework was described and evaluated both in simulation and experimentally where appropriate. It was found that the small scale adversely affected the lighting conditions, FOV, baseline, and available processing power, all crucial factors to improve upon. The stereo constraint was effective for robustness, but ineffective in terms of processing power and metric reconstruction. An overall absolute odometry error of 0.25-3 m was produced on the dataset, but the system was unable to run in real time.
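    For context on why the stereo constraint aids robustness: on a rectified pair, a correct match lies on (nearly) the same image row at positive disparity, so geometric pre-filtering discards most candidates before any descriptor work. A hypothetical sketch of that pre-filter, not the study's code:

```python
def stereo_candidates(kp_left, kp_right, max_row_diff=1.0, min_disp=0.5):
    """kp_left, kp_right: lists of (u, v) pixel coordinates from a rectified
    pair. Returns, per left keypoint, the right keypoints surviving the
    geometric tests; descriptor comparison then picks among the survivors."""
    out = {}
    for i, (ul, vl) in enumerate(kp_left):
        cands = [j for j, (ur, vr) in enumerate(kp_right)
                 if abs(vl - vr) <= max_row_diff  # same row: epipolar constraint
                 and (ul - ur) >= min_disp]       # positive disparity only
        if cands:
            out[i] = cands
    return out
```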

    An Effective Deep-Learning-Based Method for Improving Visual Odometry

    Get PDF
    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Beom-Hee Lee.
    Understanding the three-dimensional environment is one of the most important problems in robotics and computer vision. For this purpose, sensors such as lidar, ultrasound, infrared devices, inertial measurement units (IMUs), and cameras are used, individually or together through sensor fusion. Among these sensors, recent years have seen active research into visual sensors, which can obtain a large amount of information at low cost. Understanding the 3D environment using cameras includes depth restoration, optical/scene flow estimation, and visual odometry (VO). Among these, VO estimates the location of a camera and maps the surrounding environment while a camera-equipped robot or person travels. This capability must precede other tasks such as path planning and collision avoidance, and it can be applied to practical problems such as autonomous driving, augmented reality (AR), unmanned aerial vehicle (UAV) control, and 3D modeling. Many VO algorithms have been proposed to date. Early VO research filtered robot poses and map features; because filtering is computationally heavy and accumulates errors, keyframe-based methods were studied instead. Traditional VO can be divided into feature-based and direct methods. Feature-based methods obtain the pose transformation between two images through feature extraction and matching, while direct methods compare image pixel intensities directly to obtain the pose that minimizes the sum of photometric errors. Recently, with the development of deep learning, many studies have applied deep learning to VO. Deep learning-based VO, like other fields applying deep learning to images, first extracts convolutional neural network (CNN) features and then computes the pose transformation between images. It can be divided into supervised and unsupervised approaches: supervised methods train a neural network using ground-truth poses, while unsupervised methods learn poses using only image sequences, without ground-truth values. While existing papers show decent performance, the image datasets used in these studies all consist of high-quality, sharp images obtained with expensive cameras. There are also algorithms that operate only if non-image information, such as exposure times, nonlinear response functions, and camera parameters, is provided. For VO to be applied more widely to real-world problems, odometry estimation should work even when datasets are imperfect. Therefore, this dissertation proposes two methods to improve VO performance using deep learning. First, I adopt a super-resolution (SR) technique to improve the performance of VO on low-resolution, noisy images. Existing SR techniques have mainly focused on increasing image resolution rather than on execution time; however, real-time operation is essential for VO, so the SR network is designed with execution time, resolution gain, and noise reduction in mind. Running VO on images passed through this SR network yields higher performance than using the original images. Experimental results on the TUM dataset show that the proposed method outperforms conventional VO and other SR methods.
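    As illustration only, a toy stand-in for such an SR preprocessing stage, kept shallow enough to fit a real-time VO frame budget (the actual network architecture is not described in the abstract; everything below is an assumption):

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Toy SR stage: a few convolutions plus pixel-shuffle upsampling,
    deliberately shallow so the SR pass leaves time for VO each frame."""
    def __init__(self, scale=2, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # (B, s*s, H, W) -> (B, 1, s*H, s*W)
        )

    def forward(self, lr_gray):      # (B, 1, H, W) low-res, noisy frame
        return self.body(lr_gray)    # upscaled frame handed to the VO front end
```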
    Second, I propose a fully unsupervised learning-based VO that performs odometry estimation, single-view depth estimation, and camera intrinsic parameter estimation simultaneously, using a dataset consisting only of image sequences. Existing unsupervised learning-based VO requires both the images and the intrinsic parameters of the camera that captured them. Building on this approach, I propose a method that additionally estimates the camera parameters with a deep intrinsic network. Naive estimates from such a network tend to collapse to zero or diverge; the intrinsic parameters are therefore estimated under two assumptions derived from the properties of camera parameters. Experiments using the KITTI dataset show results comparable to those of the conventional method, which is given the intrinsic parameters.
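    The abstract does not state what the two assumptions are; purely as a hypothetical sketch, constraints of the following kind (positive focal lengths, principal point inside the image; both are my guesses) would keep a regressed intrinsic matrix from collapsing to zero or diverging:

```python
import torch
import torch.nn as nn

class IntrinsicsHead(nn.Module):
    """Hypothetical intrinsics regressor. The dissertation's two assumptions
    are not spelled out in the abstract; the constraints below are my own
    guesses at the kind of priors that stabilize the estimate."""
    def __init__(self, feat_dim):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)  # fx, fy, cx, cy (normalized)

    def forward(self, feat, width, height):
        fx, fy, cx, cy = self.fc(feat).unbind(-1)
        fx = nn.functional.softplus(fx) * width    # guess 1: focal lengths > 0
        fy = nn.functional.softplus(fy) * height
        cx = torch.sigmoid(cx) * width             # guess 2: principal point in image
        cy = torch.sigmoid(cy) * height
        K = torch.zeros(feat.shape[0], 3, 3, device=feat.device)
        K[:, 0, 0], K[:, 1, 1] = fx, fy
        K[:, 0, 2], K[:, 1, 2] = cx, cy
        K[:, 2, 2] = 1.0
        return K
```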