958 research outputs found

    3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection

    Cameras are a crucial exteroceptive sensor for self-driving cars as they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field-of-view around the car. In this way, we avoid blind spots which can otherwise lead to accidents. To minimize the number of cameras needed for surround perception, we utilize fisheye cameras. Consequently, standard vision pipelines for 3D mapping, visual localization, obstacle detection, etc. need to be adapted to take full advantage of the availability of multiple cameras rather than treat each camera individually. In addition, processing of fisheye images has to be supported. In this paper, we describe the camera calibration and subsequent processing pipeline for multi-fisheye-camera systems developed as part of the V-Charge project. This project seeks to enable automated valet parking for self-driving cars. Our pipeline is able to precisely calibrate multi-camera systems, build sparse 3D maps for visual navigation, visually localize the car with respect to these maps, generate accurate dense maps, and detect obstacles based on real-time depth map extraction.

    A minimalistic approach to appearance-based visual SLAM

    This paper presents a vision-based approach to SLAM in indoor/outdoor environments with minimalistic sensing and computational requirements. The approach is based on a graph representation of robot poses, using a relaxation algorithm to obtain a globally consistent map. Each link corresponds to a relative measurement of the spatial relation between the two nodes it connects. The links describe the likelihood distribution of the relative pose as a Gaussian distribution. To estimate the covariance matrix for links obtained from an omni-directional vision sensor, a novel method is introduced based on the relative similarity of neighbouring images. This new method does not require determining distances to image features, for example by using multiple view geometry. Combined indoor and outdoor experiments demonstrate that the approach can handle qualitatively different environments (without modification of the parameters), that it can cope with violations of the “flat floor assumption” to some degree, and that it scales well with increasing size of the environment, producing topologically correct and geometrically accurate maps at low computational cost. Further experiments demonstrate that the approach is also suitable for combining multiple overlapping maps, e.g. for solving the multi-robot SLAM problem with unknown initial poses.
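
The graph relaxation idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes translation-only 2D poses and a scalar variance per link instead of full poses with covariance matrices, and the function name `relax_pose_graph` is hypothetical.

```python
def relax_pose_graph(positions, links, iterations=100):
    """Iteratively relax a translation-only pose graph (illustrative sketch).

    positions: dict node_id -> (x, y) initial guess; node 0 is the fixed anchor.
    links: list of (i, j, (dx, dy), variance), meaning pos[j] - pos[i] ~ (dx, dy).
    Assumption: scalar variances stand in for the covariance matrices
    mentioned in the abstract.
    """
    pos = {k: tuple(v) for k, v in positions.items()}
    for _ in range(iterations):
        for n in pos:
            if n == 0:
                continue  # keep the anchor node fixed
            sx = sy = sw = 0.0
            for i, j, (dx, dy), var in links:
                w = 1.0 / var  # more certain links get more weight
                if j == n:
                    px, py = pos[i][0] + dx, pos[i][1] + dy
                elif i == n:
                    px, py = pos[j][0] - dx, pos[j][1] - dy
                else:
                    continue
                sx += w * px
                sy += w * py
                sw += w
            if sw > 0:
                # move the node to the weighted mean of its neighbours' predictions
                pos[n] = (sx / sw, sy / sw)
    return pos


# Three nodes with consistent measurements but a poor initial guess.
links = [(0, 1, (1.0, 0.0), 1.0), (1, 2, (1.0, 0.0), 1.0), (0, 2, (2.0, 0.0), 1.0)]
relaxed = relax_pose_graph({0: (0.0, 0.0), 1: (3.0, 1.0), 2: (-1.0, 4.0)}, links)
```

Each node is repeatedly moved to the variance-weighted mean of the positions its neighbours predict for it, which is one simple way to reach a globally consistent layout.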

    Structureless Camera Motion Estimation of Unordered Omnidirectional Images

    This work aims to provide a novel camera motion estimation pipeline for large collections of unordered omnidirectional images. In order to keep the pipeline as general and flexible as possible, cameras are modelled as unit spheres, which allows any central camera type to be incorporated. For each camera, an unprojection lookup is generated from the intrinsics. This lookup is called a P2S-map (Pixel-to-Sphere-map) and maps pixels to their corresponding positions on the unit sphere. Consequently, the camera geometry becomes independent of the underlying projection model. The pipeline also generates P2S-maps from world map projections known from cartography, which exhibit fewer distortion effects. Using P2S-maps from camera calibration and world map projection makes it possible to convert omnidirectional camera images to an appropriate world map projection in order to apply standard feature extraction and matching algorithms for data association. The proposed estimation pipeline combines the flexibility of SfM (Structure from Motion), which handles unordered image collections, with the efficiency of PGO (Pose Graph Optimization), which is used as the back-end in graph-based Visual SLAM (Simultaneous Localization and Mapping) approaches to optimize camera poses from large image sequences. SfM uses BA (Bundle Adjustment) to jointly optimize camera poses (motion) and 3D feature locations (structure), which becomes computationally expensive for large-scale scenarios. In contrast, PGO solves for camera poses (motion) from measured transformations between cameras, keeping the optimization manageable. The proposed estimation algorithm combines both worlds. It obtains up-to-scale transformations between image pairs using two-view constraints, which are jointly scaled using trifocal constraints. A pose graph is generated from scaled two-view transformations and solved by PGO to obtain camera motion efficiently even for large image collections. 
The results obtained can be used as initial pose estimates for further 3D reconstruction, e.g. to build a sparse structure from feature correspondences in an SfM or SLAM framework with further refinement via BA. The pipeline also incorporates fixed extrinsic constraints from multi-camera setups as well as depth information provided by RGBD sensors. The entire camera motion estimation pipeline does not need to generate a sparse 3D structure of the captured environment and is thus called SCME (Structureless Camera Motion Estimation).
Contents:
1 Introduction 1.1 Motivation 1.1.1 Increasing Interest of Image-Based 3D Reconstruction 1.1.2 Underground Environments as Challenging Scenario 1.1.3 Improved Mobile Camera Systems for Full Omnidirectional Imaging 1.2 Issues 1.2.1 Directional versus Omnidirectional Image Acquisition 1.2.2 Structure from Motion versus Visual Simultaneous Localization and Mapping 1.3 Contribution 1.4 Structure of this Work
2 Related Work 2.1 Visual Simultaneous Localization and Mapping 2.1.1 Visual Odometry 2.1.2 Pose Graph Optimization 2.2 Structure from Motion 2.2.1 Bundle Adjustment 2.2.2 Structureless Bundle Adjustment 2.3 Corresponding Issues 2.4 Proposed Reconstruction Pipeline
3 Cameras and Pixel-to-Sphere Mappings with P2S-Maps 3.1 Types 3.2 Models 3.2.1 Unified Camera Model 3.2.2 Polynomial Camera Model 3.2.3 Spherical Camera Model 3.3 P2S-Maps - Mapping onto Unit Sphere via Lookup Table 3.3.1 Lookup Table as Color Image 3.3.2 Lookup Interpolation 3.3.3 Depth Data Conversion
4 Calibration 4.1 Overview of Proposed Calibration Pipeline 4.2 Target Detection 4.3 Intrinsic Calibration 4.3.1 Selected Examples 4.4 Extrinsic Calibration 4.4.1 3D-2D Pose Estimation 4.4.2 2D-2D Pose Estimation 4.4.3 Pose Optimization 4.4.4 Uncertainty Estimation 4.4.5 PoseGraph Representation 4.4.6 Bundle Adjustment 4.4.7 Selected Examples
5 Full Omnidirectional Image Projections 5.1 Panoramic Image Stitching 5.2 World Map Projections 5.3 World Map Projection Generator for P2S-Maps 5.4 Conversion between Projections based on P2S-Maps 5.4.1 Proposed Workflow 5.4.2 Data Storage Format 5.4.3 Real World Example
6 Relations between Two Camera Spheres 6.1 Forward and Backward Projection 6.2 Triangulation 6.2.1 Linear Least Squares Method 6.2.2 Alternative Midpoint Method 6.3 Epipolar Geometry 6.4 Transformation Recovery from Essential Matrix 6.4.1 Cheirality 6.4.2 Standard Procedure 6.4.3 Simplified Procedure 6.4.4 Improved Procedure 6.5 Two-View Estimation 6.5.1 Evaluation Strategy 6.5.2 Error Metric 6.5.3 Evaluation of Estimation Algorithms 6.5.4 Concluding Remarks 6.6 Two-View Optimization 6.6.1 Epipolar-Based Error Distances 6.6.2 Projection-Based Error Distances 6.6.3 Comparison between Error Distances 6.7 Two-View Translation Scaling 6.7.1 Linear Least Squares Estimation 6.7.2 Non-Linear Least Squares Optimization 6.7.3 Comparison between Initial and Optimized Scaling Factor 6.8 Homography to Identify Degeneracies 6.8.1 Homography for Spherical Cameras 6.8.2 Homography Estimation 6.8.3 Homography Optimization 6.8.4 Homography and Pure Rotation 6.8.5 Homography in Epipolar Geometry
7 Relations between Three Camera Spheres 7.1 Three View Geometry 7.2 Crossing Epipolar Planes Geometry 7.3 Trifocal Geometry 7.4 Relation between Trifocal, Three-View and Crossing Epipolar Planes 7.5 Translation Ratio between Up-To-Scale Two-View Transformations 7.5.1 Structureless Determination Approaches 7.5.2 Structure-Based Determination Approaches 7.5.3 Comparison between Proposed Approaches
8 Pose Graphs 8.1 Optimization Principle 8.2 Solvers 8.2.1 Additional Graph Solvers 8.2.2 False Loop Closure Detection 8.3 Pose Graph Generation 8.3.1 Generation of Synthetic Pose Graph Data 8.3.2 Optimization of Synthetic Pose Graph Data
9 Structureless Camera Motion Estimation 9.1 SCME Pipeline 9.2 Determination of Two-View Translation Scale Factors 9.3 Integration of Depth Data 9.4 Integration of Extrinsic Camera Constraints
10 Camera Motion Estimation Results 10.1 Directional Camera Images 10.2 Omnidirectional Camera Images
11 Conclusion 11.1 Summary 11.2 Outlook and Future Work
Appendices A.1 Additional Extrinsic Calibration Results A.2 Linear Least Squares Scaling A.3 Proof Rank Deficiency A.4 Alternative Derivation Midpoint Method A.5 Simplification of Depth Calculation A.6 Relation between Epipolar and Circumferential Constraint A.7 Covariance Estimation A.8 Uncertainty Estimation from Epipolar Geometry A.9 Two-View Scaling Factor Estimation: Uncertainty Estimation A.10 Two-View Scaling Factor Optimization: Uncertainty Estimation A.11 Depth from Adjoining Two-View Geometries A.12 Alternative Three-View Derivation A.12.1 Second Derivation Approach A.12.2 Third Derivation Approach A.13 Relation between Trifocal Geometry and Alternative Midpoint Method A.14 Additional Pose Graph Generation Examples A.15 Pose Graph Solver Settings A.16 Additional Pose Graph Optimization Examples
Bibliography
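
The pixel-to-sphere lookup described in the abstract of this entry can be sketched for one simple case. The example below assumes an equirectangular image and an axis convention chosen purely for illustration (x forward, z up); the function name `equirect_p2s_map` is hypothetical, and the thesis's own P2S-map format and camera models are not reproduced here.

```python
import math


def equirect_p2s_map(width, height):
    """Build a pixel-to-sphere lookup table for an equirectangular image.

    Returns table[v][u] = (x, y, z), a unit vector for the centre of pixel (u, v).
    Assumptions (not from the source): longitude spans [-pi, pi) left to right,
    latitude spans +pi/2 (top row) to -pi/2 (bottom row), x forward, z up.
    """
    table = []
    for v in range(height):
        lat = math.pi / 2 - (v + 0.5) / height * math.pi
        row = []
        for u in range(width):
            lon = (u + 0.5) / width * 2 * math.pi - math.pi
            row.append((
                math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat),
            ))
        table.append(row)
    return table


# A tiny 8x4 lookup; a real image would use its full resolution.
p2s = equirect_p2s_map(8, 4)
```

Once every pixel has a direction on the unit sphere, later geometry can work on those directions alone, which is why the abstract describes the camera geometry as independent of the projection model.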

    SurfelMeshing: Online Surfel-Based Mesh Reconstruction

    We address the problem of mesh reconstruction from live RGB-D video, assuming a calibrated camera and poses provided externally (e.g., by a SLAM system). In contrast to most existing approaches, we do not fuse depth measurements in a volume but in a dense surfel cloud. We asynchronously (re)triangulate the smoothed surfels to reconstruct a surface mesh. This novel approach makes it possible to maintain a dense surface representation of the scene during SLAM which can quickly adapt to loop closures. This is possible by deforming the surfel cloud and asynchronously remeshing the surface where necessary. The surfel-based representation also naturally supports strongly varying scan resolution. In particular, it reconstructs colors at the input camera's resolution. Moreover, in contrast to many volumetric approaches, ours can reconstruct thin objects since objects do not need to enclose a volume. We demonstrate our approach in a number of experiments, showing that it produces reconstructions that are competitive with the state-of-the-art, and we discuss its advantages and limitations. The algorithm (excluding loop closure functionality) is available as open source at https://github.com/puzzlepaint/surfelmeshing. Comment: Version accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

    Creation and maintenance of visual incremental maps and hierarchical localization.

    Over the last few years, the presence of mobile robotics has increased considerably in a wide variety of environments. It is common to find robots that carry out repetitive and specific applications; they can also be used to work in dangerous environments and to perform precise tasks. These robots can be found in a variety of social environments, such as industrial, household, educational and health scenarios. For that reason, they require specific and continuous research and improvement work. Specifically, autonomous mobile robots require very precise technology to perform tasks without human assistance. To perform tasks autonomously, the robots must be able to navigate in an unknown environment. Autonomous mobile robots must therefore be able to address the mapping and localization tasks: they must create a model of the environment and estimate their position and orientation. This PhD thesis proposes and analyses different methods to carry out the map creation and localization tasks in indoor environments. To address these tasks, only visual information is used, specifically omnidirectional images with a 360º field of view. Throughout the chapters of this document, solutions for autonomous navigation tasks are proposed; they are solved using transformations of the images captured by a vision system mounted on the robot. Firstly, the thesis focuses on the study of global appearance descriptors in the localization task. Global appearance descriptors are algorithms that transform an entire image into a single vector. A thorough comparative study is performed: in the experiments, different global appearance descriptors are used along with omnidirectional images and the results are compared. The main goal is to obtain an optimized algorithm to estimate the robot position and orientation in real indoor environments. 
The experiments take place under real conditions, so visual changes in the scenes can occur, such as camera defects, movement of furniture or people, and changes in the lighting conditions. The computational cost is also studied: the robot has to localize itself accurately, but it also has to be fast enough. Additionally, a second application, whose goal is to carry out incremental mapping in indoor environments, is presented. This application uses the best global appearance descriptors from the localization task, but this time they are used to solve the mapping problem with an incremental clustering technique. The application clusters images that are visually similar; every group of images, or cluster, is expected to identify a zone of the environment. The shape and size of the clusters can vary while the robot visits the different rooms. Different algorithms can currently be used to obtain the clusters, but these solutions usually work properly only ‘offline’, starting from the whole set of data to cluster. The main idea of this study is to obtain the map incrementally while the robot explores the new environment. Carrying out the mapping incrementally while the robot is still visiting the area is valuable because a map separated into nodes with similarity relationships between them can subsequently be used for hierarchical localization tasks and to recognize environments already visited in the model. Finally, this PhD thesis includes an analysis of deep learning techniques for localization tasks. In particular, siamese networks have been studied. Siamese networks are based on classic convolutional networks, but they permit evaluating two images simultaneously. These networks output a similarity value between the input images, and that information can be used for the localization tasks. 
Throughout this work, the technique is presented, the possible architectures are analysed, and the experimental results are shown and compared. Using siamese networks, localization under real operating conditions and environments is solved, with a focus on improving performance under illumination changes in the scene. During the experiments, the room retrieval problem, hierarchical localization, and absolute localization have been solved.
Spanish abstract (translated): In recent years, the presence of mobile robotics has increased substantially across a wide variety of environments and scenarios. It is common to find robots used for repetitive and specific applications, as well as for tasks in dangerous environments or tasks whose results must be very precise. Such robots can be found in industrial, household, educational and health settings; they therefore require specific, continuous research and improvement work. In particular, autonomous mobile robots require precise technology to carry out tasks without human help. To perform tasks autonomously, robots must be able to navigate an environment that is unknown a priori. Autonomous mobile robots must therefore be able to perform the mapping task, creating a model of the environment, and the localization task, that is, estimating their position and orientation. This thesis presents the design and analysis of different methods for map creation and localization in indoor environments. Only visual information is used for these tasks, specifically omnidirectional images with a 360º field of view. The chapters of this work propose solutions to the robot's autonomous navigation tasks by means of transformations of the images it captures. 
Regarding the work carried out, a study of global appearance descriptors in localization tasks is presented first. Global appearance descriptors are transformations that produce a single vector describing an image globally. This work carries out an exhaustive study of different global appearance methods, adapting their use to omnidirectional images. The aim is to obtain an optimized algorithm to estimate the robot's position and orientation in real office environments, where visual changes can arise, such as movements of the camera or furniture, or lighting changes in the scene. The time taken to perform this estimation is also evaluated, since a robot's work must be precise but also feasible in terms of computation time. In addition, a second application is presented in which the study focuses on building maps of indoor environments incrementally. This application uses the global appearance descriptors studied for the localization task, but here they are used to build maps with an incremental clustering technique. In this application, sets of visually similar images are grouped into a single cluster. The shape and number of clusters vary as the robot advances through the environment. Different algorithms currently exist for separating an environment into nodes, but the effective solutions work ‘off-line’, that is, after the fact, once all the images have been captured. The work presented allows this task to be carried out incrementally while the robot explores the new environment. Doing so while the rest of the environment is being visited can be very useful, since a map separated into nodes with proximity relations between them can be used for hierarchical localization tasks as it is built. 
It is also possible to recognize environments already visited or similar to past nodes. Finally, the thesis also includes a study of deep learning techniques for localization tasks. Specifically, it studies siamese networks, a technique little explored in mobile robotics, which is based on classic convolutional networks but evaluates two images at the same time. These networks give a similarity value for the pair of input images, which makes visual localization possible. This work describes the technique and presents the structures these networks can have and the results of the experiments. The localization task is evaluated in heterogeneous environments in which the main problem is changes in scene lighting. Siamese networks are used to address the room retrieval problem, the hierarchical localization problem, and the absolute localization problem.
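
The incremental clustering of global appearance descriptors described in this entry can be sketched as follows. The thesis's actual clustering technique is not specified in the abstract, so this is an assumed, simplified variant: a distance threshold decides whether a new image descriptor joins the nearest existing cluster or starts a new one, and the names `incremental_clusters` and `threshold` are hypothetical.

```python
import math


def incremental_clusters(descriptors, threshold):
    """Assign descriptors to clusters one at a time, as they arrive.

    descriptors: iterable of equal-length numeric vectors (one per image).
    threshold: assumed maximum Euclidean distance to join an existing cluster.
    Returns (labels, centroids).
    """
    centroids, counts, labels = [], [], []
    for d in descriptors:
        best, best_dist = None, float("inf")
        for k, c in enumerate(centroids):
            dist = math.dist(d, c)
            if dist < best_dist:
                best, best_dist = k, dist
        if best is None or best_dist > threshold:
            # no cluster is close enough: open a new node of the map
            centroids.append(list(d))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # update the running mean of the matched cluster
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], d)]
            labels.append(best)
    return labels, centroids


# Toy 2D "descriptors" standing in for image descriptors from two rooms.
labels, centroids = incremental_clusters(
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], threshold=1.0
)
```

Because each descriptor is processed as it arrives, the set of clusters grows while the robot explores, in contrast to the offline methods the abstract mentions, which need the whole image set first.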