1,228 research outputs found

    Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal

    Full text link
    Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots is made publicly available at rl-navigation.github.io/deployable

    Automatic Dense 3D Scene Mapping from Non-overlapping Passive Visual Sensors for Future Autonomous Systems

    Get PDF
    The ever increasing demand for higher levels of autonomy for robots and vehicles means there is an ever greater need for such systems to be aware of their surroundings. Whilst solutions already exist for creating 3D scene maps, many are based on active scanning devices such as laser scanners and depth cameras that are either expensive, unwieldy, or do not function well under certain environmental conditions. As a result passive cameras are a favoured sensor due their low cost, small size, and ability to work in a range of lighting conditions. In this work we address some of the remaining research challenges within the problem of 3D mapping around a moving platform. We utilise prior work in dense stereo imaging, Stereo Visual Odometry (SVO) and extend Structure from Motion (SfM) to create a pipeline optimised for on vehicle sensing. Using forward facing stereo cameras, we use state of the art SVO and dense stereo techniques to map the scene in front of the vehicle. With significant amounts of prior research in dense stereo, we addressed the issue of selecting an appropriate method by creating a novel evaluation technique. Visual 3D mapping of dynamic scenes from a moving platform result in duplicated scene objects. We extend the prior work on mapping by introducing a generalized dynamic object removal process. Unlike other approaches that rely on computationally expensive segmentation or detection, our method utilises existing data from the mapping stage and the findings from our dense stereo evaluation. We introduce a new SfM approach that exploits our platform motion to create a novel dense mapping process that exceeds the 3D data generation rate of state of the art alternatives. Finally, we combine dense stereo, SVO, and our SfM approach to automatically align point clouds from non-overlapping views to create a rotational and scale consistent global 3D model

    Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

    Get PDF
    Simultaneous Localization and Mapping (SLAM)consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues, that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved

    On-Manifold Preintegration for Real-Time Visual-Inertial Odometry

    Get PDF
    Current approaches for visual-inertial odometry (VIO) are able to attain highly accurate state estimation via nonlinear optimization. However, real-time optimization quickly becomes infeasible as the trajectory grows over time, this problem is further emphasized by the fact that inertial measurements come at high rate, hence leading to fast growth of the number of variables in the optimization. In this paper, we address this issue by preintegrating inertial measurements between selected keyframes into single relative motion constraints. Our first contribution is a \emph{preintegration theory} that properly addresses the manifold structure of the rotation group. We formally discuss the generative measurement model as well as the nature of the rotation noise and derive the expression for the \emph{maximum a posteriori} state estimator. Our theoretical development enables the computation of all necessary Jacobians for the optimization and a-posteriori bias correction in analytic form. The second contribution is to show that the preintegrated IMU model can be seamlessly integrated into a visual-inertial pipeline under the unifying framework of factor graphs. This enables the application of incremental-smoothing algorithms and the use of a \emph{structureless} model for visual measurements, which avoids optimizing over the 3D points, further accelerating the computation. We perform an extensive evaluation of our monocular \VIO pipeline on real and simulated datasets. The results confirm that our modelling effort leads to accurate state estimation in real-time, outperforming state-of-the-art approaches.Comment: 20 pages, 24 figures, accepted for publication in IEEE Transactions on Robotics (TRO) 201

    Towards Live 3D Reconstruction from Wearable Video: An Evaluation of V-SLAM, NeRF, and Videogrammetry Techniques

    Full text link
    Mixed reality (MR) is a key technology which promises to change the future of warfare. An MR hybrid of physical outdoor environments and virtual military training will enable engagements with long distance enemies, both real and simulated. To enable this technology, a large-scale 3D model of a physical environment must be maintained based on live sensor observations. 3D reconstruction algorithms should utilize the low cost and pervasiveness of video camera sensors, from both overhead and soldier-level perspectives. Mapping speed and 3D quality can be balanced to enable live MR training in dynamic environments. Given these requirements, we survey several 3D reconstruction algorithms for large-scale mapping for military applications given only live video. We measure 3D reconstruction performance from common structure from motion, visual-SLAM, and photogrammetry techniques. This includes the open source algorithms COLMAP, ORB-SLAM3, and NeRF using Instant-NGP. We utilize the autonomous driving academic benchmark KITTI, which includes both dashboard camera video and lidar produced 3D ground truth. With the KITTI data, our primary contribution is a quantitative evaluation of 3D reconstruction computational speed when considering live video.Comment: Accepted to 2022 Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), 13 page

    Stereo Visual SLAM for Mobile Robots Navigation

    Get PDF
    Esta tesis está enfocada a la combinación de los campos de la robótica móvil y la visión por computador, con el objetivo de desarrollar métodos que permitan a un robot móvil localizarse dentro de su entorno mientras construye un mapa del mismo, utilizando como única entrada un conjunto de imágenes. Este problema se denomina SLAM visual (por las siglas en inglés de "Simultaneous Localization And Mapping") y es un tema que aún continúa abierto a pesar del gran esfuerzo investigador realizado en los últimos años. En concreto, en esta tesis utilizamos cámaras estéreo para capturar, simultáneamente, dos imágenes desde posiciones ligeramente diferentes, proporcionando así información 3D de forma directa. De entre los problemas de localización de robots, en esta tesis abordamos dos de ellos: el seguimiento de robots y la localización y mapeado simultáneo (o SLAM). El primero de ellos no tiene en cuenta el mapa del entorno sino que calcula la trayectoria del robot mediante la composición incremental de las estimaciones de su movimiento entre instantes de tiempo consecutivos. Cuando se usan imágenes para calcular esta trayectoria, el problema toma el nombre de "odometría visual", y su resolución es más sencilla que la del SLAM visual. De hecho, a menudo se integra como parte de un sistema de SLAM completo. Esta tesis contribuye con la propuesta de dos sistemas de odometría visual. Uno de ellos está basado en un solución cerrada y eficiente mientras que el otro está basado en un proceso de optimización no-lineal que implementa un nuevo método de detección y eliminación rápida de espurios. Los métodos de SLAM, por su parte, también abordan la construcción de un mapa del entorno con el objetivo de mejorar sensiblemente la localización del robot, evitando de esta forma la acumulación de error en la que incurre la odometría visual. Además, el mapa construido puede ser empleado para hacer frente a situaciones exigentes como la recuperación de la localización tras la pérdida del robot o realizar localización global. En esta tesis se presentan dos sistemas completos de SLAM visual. Uno de ellos se ha implementado dentro del marco de los filtros probabilísticos no parámetricos, mientras que el otro está basado en un método nuevo de "bundle adjustment" relativo que ha sido integrado con algunas técnicas recientes de visión por computador. Otra contribución de esta tesis es la publicación de dos colecciones de datos que contienen imágenes estéreo capturadas en entornos urbanos sin modificar, así como una estimación del camino real del robot basada en GPS (denominada "ground truth"). Estas colecciones sirven como banco de pruebas para validar métodos de odometría y SLAM visual
    corecore