
    Interest point detectors for visual SLAM

    In this paper we present several interest point detectors and analyze their suitability when used as landmark extractors for vision-based simultaneous localization and mapping (vSLAM). For this purpose, we evaluate the detectors according to their repeatability under changes in viewpoint and scale, which are the desired requirements for visual landmarks. Several experiments were carried out using sequences of images captured with high precision. The sequences represent planar objects as well as 3D scenes.
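
    As a concrete illustration of the repeatability criterion, the sketch below (hypothetical, not the authors' code) detects keypoints in a reference image and in a second view related by a known homography H, projects the reference detections into the second view, and counts how many are re-detected within a small pixel tolerance; SIFT is used only as a stand-in detector.

    import cv2
    import numpy as np

    def repeatability(img_ref, img_new, H, tol=2.0):
        """Fraction of reference keypoints re-detected in the new view (toy measure)."""
        detector = cv2.SIFT_create()              # any interest point detector under test
        kp_ref = detector.detect(img_ref, None)
        kp_new = detector.detect(img_new, None)
        if not kp_ref or not kp_new:
            return 0.0

        # Project reference keypoints into the new view with the ground-truth homography.
        pts_ref = np.float32([k.pt for k in kp_ref]).reshape(-1, 1, 2)
        pts_proj = cv2.perspectiveTransform(pts_ref, H).reshape(-1, 2)
        pts_new = np.float32([k.pt for k in kp_new])

        # A projected point counts as repeated if some detection lies within tol pixels.
        dists = np.linalg.norm(pts_proj[:, None, :] - pts_new[None, :, :], axis=2)
        return float((dists.min(axis=1) <= tol).sum()) / len(kp_ref)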

    Local descriptors for visual SLAM

    We present a comparison of several local image descriptors in the context of visual Simultaneous Localization and Mapping (SLAM). In visual SLAM, a set of points in the environment is extracted from images and used as landmarks. The points are represented by local descriptors, which are used to resolve the association between landmarks. In this paper, we study the class separability of several descriptors under changes in viewpoint and scale. Several experiments were carried out using sequences of images of 2D and 3D scenes.
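
    To make the role of the descriptors concrete, here is a minimal data-association sketch under common assumptions (float descriptors such as SIFT, brute-force nearest-neighbor matching with a ratio test); it is illustrative only and not the matching scheme evaluated in the paper.

    import cv2

    def associate(obs_descs, landmark_descs, ratio=0.8):
        """Match observed descriptors to stored landmark descriptors.

        Both inputs are float32 arrays (N x D); returns (observation, landmark) index pairs."""
        matcher = cv2.BFMatcher(cv2.NORM_L2)              # L2 distance for float descriptors
        matches = matcher.knnMatch(obs_descs, landmark_descs, k=2)
        pairs = []
        for candidates in matches:
            if len(candidates) < 2:
                continue
            best, second = candidates
            if best.distance < ratio * second.distance:   # keep only unambiguous associations
                pairs.append((best.queryIdx, best.trainIdx))
        return pairs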

    The use of visual cues for vehicle control and navigation

    At least three levels of control are required to operate most vehicles: (1) inner-loop control to counteract the momentary effects of disturbances on vehicle position; (2) intermittent maneuvers to avoid obstacles; and (3) outer-loop control to maintain a planned route. Operators monitor dynamic optical relationships in their immediate surroundings to estimate momentary changes in forward, lateral, and vertical position, rates of change in speed and direction of motion, and distance from obstacles. The process of searching the external scene to find landmarks (for navigation) is intermittent and deliberate, while monitoring and responding to subtle changes in the visual scene (for vehicle control) is relatively continuous and 'automatic'. However, since operators may perform both tasks simultaneously, the dynamic optical cues available for a vehicle control task may be determined by the operator's direction of gaze for wayfinding. An attempt to relate the visual processes involved in vehicle control and wayfinding is presented. The frames of reference and information used by different operators (e.g., automobile drivers, airline pilots, and helicopter pilots) are reviewed, with particular emphasis on the special problems encountered by helicopter pilots flying nap-of-the-earth (NOE). The goal of this overview is to describe the context within which different vehicle control tasks are performed and to suggest ways in which the use of visual cues for geographical orientation might influence visually guided control activities.

    A comparative evaluation of interest point detectors and local descriptors for visual SLAM

    In this paper we compare the behavior of different interest point detectors and descriptors under the conditions required for their use as landmarks in vision-based simultaneous localization and mapping (SLAM). We evaluate the repeatability of the detectors, as well as the invariance and distinctiveness of the descriptors, under different perceptual conditions using sequences of images representing planar objects as well as 3D scenes. We believe that this information will be useful when selecting an appropriate combination of detector and descriptor for visual SLAM.
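
    One simple way to quantify the distinctiveness mentioned above is a separability score between descriptor distances for corresponding and non-corresponding points; the sketch below is a hypothetical d-prime-style measure under the assumption that desc_a[i] and desc_b[i] describe the same physical point in two views, and it is not the paper's evaluation protocol.

    import numpy as np

    def separability(desc_a, desc_b, num_negative_pairs=1000, rng=None):
        """desc_a and desc_b are (N x D) arrays; row i of each describes the same point."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(desc_a)

        # Distances for true correspondences (small if the descriptor is invariant).
        pos = np.linalg.norm(desc_a - desc_b, axis=1)

        # Distances for randomly paired, non-corresponding points (should be large).
        i = rng.integers(0, n, num_negative_pairs)
        j = rng.integers(0, n, num_negative_pairs)
        keep = i != j
        neg = np.linalg.norm(desc_a[i[keep]] - desc_b[j[keep]], axis=1)

        # Normalized gap between the two distance distributions.
        return (neg.mean() - pos.mean()) / np.sqrt(0.5 * (neg.var() + pos.var()))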

    PlaNet - Photo Geolocation with Convolutional Neural Networks

    Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
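
    The classification formulation can be illustrated with a toy partition of the globe: label each geotagged training photo with the index of the cell containing it and train an ordinary image classifier over cell indices. The uniform latitude/longitude grid below is only a stand-in for the adaptive multi-scale cells described in the abstract.

    def cell_id(lat, lon, cells_per_degree=0.5):
        """Map a coordinate to a coarse grid cell index (toy partition, 2-degree cells)."""
        rows = int(180 * cells_per_degree)
        cols = int(360 * cells_per_degree)
        r = min(int((lat + 90.0) * cells_per_degree), rows - 1)
        c = min(int((lon + 180.0) * cells_per_degree), cols - 1)
        return r * cols + c

    def cell_center(cid, cells_per_degree=0.5):
        """Inverse mapping used at test time: predicted cell index -> representative location."""
        cols = int(360 * cells_per_degree)
        r, c = divmod(cid, cols)
        return (r + 0.5) / cells_per_degree - 90.0, (c + 0.5) / cells_per_degree - 180.0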

    Hybrid Scene Compression for Visual Localization

    Localizing an image with respect to a 3D scene model represents a core task for many computer vision applications. An increasing number of real-world applications of visual localization on mobile devices, e.g., Augmented Reality or autonomous robots such as drones or self-driving cars, demand localization approaches to minimize storage and bandwidth requirements. Compressing the 3D models used for localization thus becomes a practical necessity. In this work, we introduce a new hybrid compression algorithm that uses a given memory limit in a more effective way. Rather than treating all 3D points equally, it represents a small set of points with full appearance information and an additional, larger set of points with compressed information. This enables our approach to obtain a more complete scene representation without increasing the memory requirements, leading to a superior performance compared to previous compression schemes. As part of our contribution, we show how to handle ambiguous matches arising from point compression during RANSAC. Besides outperforming previous compression techniques in terms of pose accuracy under the same memory constraints, our compression scheme itself is also more efficient. Furthermore, the localization rates and accuracy obtained with our approach are comparable to state-of-the-art feature-based methods, while using a small fraction of the memory.
    Comment: Published at CVPR 2019
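
    The hybrid idea (full appearance information for a small set of points, compressed information for a larger set) can be sketched as a simple budgeted selection. The visibility-ranking heuristic and the per-point byte costs below are illustrative assumptions, not the paper's algorithm.

    def hybrid_select(points, budget_bytes, full_cost=128, compressed_cost=4, full_fraction=0.3):
        """points: list of (point_id, num_observing_cameras); returns (full_set, compressed_set)."""
        ranked = sorted(points, key=lambda p: p[1], reverse=True)   # most visible points first

        full_budget = budget_bytes * full_fraction
        full_set, compressed_set, used = [], [], 0.0
        for pid, _ in ranked:
            if used + full_cost <= full_budget:
                full_set.append(pid)                      # keep the full descriptor
                used += full_cost
            elif used + compressed_cost <= budget_bytes:
                compressed_set.append(pid)                # keep only a compact code (e.g. a word id)
                used += compressed_cost
            else:
                break
        return full_set, compressed_set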

    Attention and Anticipation in Fast Visual-Inertial Navigation

    We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to estimate its state using an on-board camera and an inertial sensor, without any prior knowledge of the external environment. We consider the case in which the robot can allocate limited resources to VIN, due to tight computational constraints. Therefore, we answer the following question: under limited resources, what are the most relevant visual cues to maximize the performance of visual-inertial navigation? Our approach has four key ingredients. First, it is task-driven, in that the selection of the visual cues is guided by a metric quantifying the VIN performance. Second, it exploits the notion of anticipation, since it uses a simplified model for forward-simulation of robot dynamics, predicting the utility of a set of visual cues over a future time horizon. Third, it is efficient and easy to implement, since it leads to a greedy algorithm for the selection of the most relevant visual cues. Fourth, it provides formal performance guarantees: we leverage submodularity to prove that the greedy selection cannot be far from the optimal (combinatorial) selection. Simulations and real experiments on agile drones show that our approach ensures state-of-the-art VIN performance while maintaining a lean processing time. In the easy scenarios, our approach outperforms appearance-based feature selection in terms of localization errors. In the most challenging scenarios, it enables accurate visual-inertial navigation while appearance-based feature selection fails to track the robot's motion during aggressive maneuvers.
    Comment: 20 pages, 7 figures, 2 tables
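
    The greedy, submodularity-backed selection can be sketched as follows: each candidate visual cue contributes an information matrix, and cues are picked one at a time by the largest marginal gain in log-determinant, the property that licenses the near-optimality guarantee for greedy selection. The log-det utility and the per-feature matrices are stand-ins for the task-driven VIN metric described above.

    import numpy as np

    def greedy_select(feature_infos, k):
        """feature_infos: list of d x d positive semidefinite information matrices."""
        d = feature_infos[0].shape[0]
        acc = np.eye(d) * 1e-6                            # small prior keeps log-det finite
        chosen, remaining = [], list(range(len(feature_infos)))

        for _ in range(min(k, len(feature_infos))):
            base = np.linalg.slogdet(acc)[1]
            # Marginal gain in log-det for each remaining candidate cue.
            gains = [np.linalg.slogdet(acc + feature_infos[i])[1] - base for i in remaining]
            best = remaining[int(np.argmax(gains))]
            chosen.append(best)
            acc = acc + feature_infos[best]
            remaining.remove(best)
        return chosen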

    Real-time Monocular Object SLAM

    We present a real-time object-based SLAM system that leverages the largest object database to date. Our approach comprises two main components: 1) a monocular SLAM algorithm that exploits object rigidity constraints to improve the map and find its real scale, and 2) a novel object recognition algorithm based on bags of binary words, which provides live detections with a database of 500 3D objects. The two components work together and benefit each other: the SLAM algorithm accumulates information from the observations of the objects, anchors object features to special map landmarks, and sets constraints on the optimization. At the same time, objects partially or fully located within the map are used as a prior to guide the recognition algorithm, achieving higher recall. We evaluate our proposal in five real environments, showing improvements in the accuracy of the map and in efficiency with respect to other state-of-the-art techniques.
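
    Recognition with a bag of binary words can be sketched by quantizing ORB descriptors to their nearest vocabulary word under Hamming distance and comparing the resulting frame histogram against per-object histograms. The vocabulary, the object database, and the dot-product scoring below are illustrative assumptions, not the pipeline described here.

    import cv2
    import numpy as np

    def bow_histogram(image, vocabulary):
        """vocabulary: (W x 32) uint8 array of 256-bit binary words."""
        orb = cv2.ORB_create()
        _, desc = orb.detectAndCompute(image, None)
        hist = np.zeros(len(vocabulary), dtype=np.float32)
        if desc is None:
            return hist
        for d in desc:
            # Hamming distance from this descriptor to every vocabulary word.
            dists = np.unpackbits(np.bitwise_xor(vocabulary, d), axis=1).sum(axis=1)
            hist[int(np.argmin(dists))] += 1.0
        return hist / max(hist.sum(), 1.0)

    def recognize(image, vocabulary, object_histograms):
        """object_histograms: dict object_id -> normalized histogram; returns best object id."""
        query = bow_histogram(image, vocabulary)
        return max(object_histograms, key=lambda oid: float(query @ object_histograms[oid]))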