
    Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection

    This paper proposes a novel method to estimate the global scale of a 3D reconstructed model within a Kalman-filtering-based monocular SLAM algorithm. Our Bayesian framework integrates height priors over detected objects belonging to a set of broad predefined classes, building on recent advances in fast generic object detection. Each observation is produced on a single frame, so no data association across video frames is needed: the height priors are associated with image region sizes wherever projected map features fall within the object detection regions. We present very promising results obtained in several experiments with different object classes. Comment: Int. Workshop on Visual Odometry, CVPR (July 2017)
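
    As a rough illustration of the idea (a toy sketch, not the paper's exact filter), a single detection with a class height prior induces one scale observation by comparing the metric prior against the unscaled SLAM-map height of the features inside the detection box; these observations can be fused with a scalar Kalman update. All names, prior values, and the variance propagation below are assumptions for illustration.

        import numpy as np

        # Hypothetical class height priors (metres): mean and std per object class.
        HEIGHT_PRIORS = {"person": (1.7, 0.2), "car": (1.5, 0.25)}

        def fuse_scale_observation(scale_mean, scale_var, cls, slam_height, slam_height_var):
            """One scalar Kalman update of the global scale estimate.

            slam_height: height of the detected object measured in the (unscaled)
            SLAM map, from map features projecting inside the detection box.
            """
            prior_h, prior_h_std = HEIGHT_PRIORS[cls]
            # A single detection suggests scale z = metric height / SLAM-map height.
            z = prior_h / slam_height
            # First-order propagation of both height uncertainties to the ratio.
            z_var = (prior_h_std / slam_height) ** 2 \
                  + (prior_h * np.sqrt(slam_height_var) / slam_height ** 2) ** 2
            k = scale_var / (scale_var + z_var)            # Kalman gain
            scale_mean = scale_mean + k * (z - scale_mean)
            scale_var = (1.0 - k) * scale_var
            return scale_mean, scale_var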

    Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation

    How do computers and intelligent agents view the world around them? Feature extraction and representation constitutes one the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is no ``one size fits all'' approach that satisfies all requirements. In recent years, the rising popularity of deep learning has resulted in a myriad of end-to-end solutions to many computer vision problems. These approaches, while successful, tend to lack scalability and can't easily exploit information learned by other systems. Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. This is achieved by employing sparse relative labels indicating relationships of similarity/dissimilarity between image locations. The nature of these labels results in an almost infinite set of dissimilar examples to choose from. We demonstrate how the selection of negative examples during training can be used to modify the feature space and vary it's properties. To demonstrate the generality of this approach, we apply the proposed features to a multitude of tasks, each requiring different properties. This includes disparity estimation, semantic segmentation, self-localisation and SLAM. In all cases, we show how incorporating SAND features results in better or comparable results to the baseline, whilst requiring little to no additional training. Code can be found at: https://github.com/jspenmar/SAND_featuresComment: CVPR201
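
    The abstract hinges on sparse relative labels: pairs of image locations marked similar or dissimilar, with negatives drawn from an almost unlimited pool. A minimal sketch of such a pixel-embedding contrastive loss follows; the margin, the pair format, and the function name are assumptions, not the paper's actual formulation.

        import numpy as np

        def contrastive_pixel_loss(feat_a, feat_b, pos_pairs, neg_pairs, margin=1.0):
            """Hinge-style contrastive loss on two dense feature maps.

            feat_a, feat_b: (H, W, C) feature maps of two images.
            pos_pairs / neg_pairs: lists of ((ya, xa), (yb, xb)) pixel pairs
            labelled similar / dissimilar. Negatives can be sampled almost
            freely, which is what lets their selection shape the feature space.
            """
            loss = 0.0
            for (ya, xa), (yb, xb) in pos_pairs:   # pull matching pixels together
                d = np.linalg.norm(feat_a[ya, xa] - feat_b[yb, xb])
                loss += d ** 2
            for (ya, xa), (yb, xb) in neg_pairs:   # push non-matches past a margin
                d = np.linalg.norm(feat_a[ya, xa] - feat_b[yb, xb])
                loss += max(0.0, margin - d) ** 2
            return loss / max(1, len(pos_pairs) + len(neg_pairs))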

    Eliminating Scale Drift in Monocular SLAM Using Depth from Defocus

    © 2017 IEEE. This letter presents a novel approach to correcting errors caused by accumulated scale drift in monocular SLAM. It is shown that the metric scale can be estimated using information gathered through monocular SLAM together with image blur due to defocus. A nonlinear least squares optimization problem is formulated to integrate depth estimates from defocus into monocular SLAM. An algorithm is presented that processes the keyframe and feature location estimates generated by a monocular SLAM algorithm to correct for scale drift in selected local regions of the environment. The proposed algorithm is experimentally evaluated by processing the output of ORB-SLAM, obtaining accurate metric-scale maps from a monocular camera without any prior knowledge about the scene.
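
    The core fitting step can be illustrated with a reduced version of the problem: if a local region has SLAM depths d_i (arbitrary scale) and defocus-based metric depths z_i, the scale s minimizing a weighted residual Σ w_i (s·d_i − z_i)² has a closed form. The sketch below shows only that toy subproblem; the paper's actual optimization is nonlinear and jointly refines more variables.

        import numpy as np

        def fit_local_scale(slam_depths, defocus_depths, weights=None):
            """Closed-form weighted least-squares scale for one local map region.

            Minimises sum_i w_i * (s * d_i - z_i)^2 over the scalar s, where d_i
            are up-to-scale SLAM depths and z_i metric depths from defocus blur.
            """
            d = np.asarray(slam_depths, dtype=float)
            z = np.asarray(defocus_depths, dtype=float)
            w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
            # d(loss)/ds = 0  =>  s = sum(w*d*z) / sum(w*d*d)
            return np.sum(w * d * z) / np.sum(w * d * d)

        # Example: SLAM depths 1, 2, 3 (unitless); defocus says 2.1, 3.9, 6.2 m.
        print(fit_local_scale([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]))  # ~2.0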

    Estimation of Absolute Scale in Monocular SLAM Using Synthetic Data

    This paper addresses the problem of scale estimation in monocular SLAM by estimating absolute distances between the camera centers of consecutive image frames. These estimates would improve the overall performance of classical (not deep) SLAM systems and allow metric feature locations to be recovered from a single monocular camera. We propose several network architectures that improve scale estimation accuracy over the state of the art. In addition, we exploit the possibility of training the neural network only with synthetic data derived from a computer graphics simulator. Our key insight is that, using only synthetic training inputs, we can achieve scale estimation accuracy similar to that obtained from real data. This indicates that fully annotated simulated data is a viable alternative to existing deep-learning-based SLAM systems trained on real (unlabeled) data. Our experiments with unsupervised domain adaptation also show that the difference in visual appearance between simulated and real data does not affect scale estimation results. Our method operates on low-resolution images (0.03 MP), which makes it practical for real-time SLAM applications with a monocular camera.
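
    To see how per-frame distance predictions might be consumed downstream, consider a monocular trajectory known only up to scale: each predicted metric distance between consecutive camera centers yields one scale sample, and a robust aggregate of those samples rescales the whole trajectory. This is a hypothetical use sketched under our own assumptions, not a step the paper specifies.

        import numpy as np

        def rescale_trajectory(positions, predicted_dists):
            """Rescale an up-to-scale trajectory with predicted metric step lengths.

            positions: (N, 3) camera centers from monocular SLAM (arbitrary scale).
            predicted_dists: (N-1,) network-predicted metric distances between
            consecutive camera centers.
            """
            positions = np.asarray(positions, dtype=float)
            predicted_dists = np.asarray(predicted_dists, dtype=float)
            steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
            valid = steps > 1e-6                    # skip (near-)stationary frames
            samples = predicted_dists[valid] / steps[valid]  # one sample per step
            scale = np.median(samples)              # median is robust to outliers
            return positions * scale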

    J-MOD²: Joint Monocular Obstacle Detection and Depth Estimation

    In this work, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most existing approaches rely either on visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles; however, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multi-task architectures that perform both scene understanding and depth estimation. We follow this track and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, while maintaining compatibility with a global SLAM system if needed. The network architecture is devised to couple the obstacle detection task, which produces more reliable bounding boxes, with the depth estimation task, increasing the robustness of both to scenario changes. We call this architecture J-MOD². We test the effectiveness of our approach on sequences with different appearance and focal lengths and compare it to state-of-the-art multi-task methods that jointly perform semantic segmentation and depth estimation. In addition, we show the integration in a full system through a set of simulated navigation experiments in which a MAV explores an unknown scenario and plans safe trajectories using our detection model.
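
    The architectural idea, a shared encoder feeding both an obstacle detection head and a dense depth head so that each task regularizes the other, can be sketched as below. Layer sizes, the box parameterization, and all names are placeholders of our own, not the published J-MOD² design.

        import torch
        import torch.nn as nn

        class JointObstacleDepthNet(nn.Module):
            """Toy two-headed network: shared trunk, obstacle boxes + dense depth."""

            def __init__(self, num_boxes=5):
                super().__init__()
                self.encoder = nn.Sequential(        # shared convolutional trunk
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                )
                self.depth_head = nn.Sequential(     # dense depth, upsampled back
                    nn.Conv2d(64, 1, 3, padding=1),
                    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
                )
                self.obstacle_head = nn.Sequential(  # fixed-size set of boxes
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(64, num_boxes * 5),    # (x, y, w, h, confidence) each
                )

            def forward(self, img):
                feats = self.encoder(img)
                return self.depth_head(feats), self.obstacle_head(feats)

        # A joint loss would simply sum a depth regression term and a detection term.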

    Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames

    Monocular simultaneous localization and mapping (SLAM) is emerging in advanced driver assistance systems and autonomous driving, because a single camera is cheap and easy to install. Conventional monocular SLAM faces two major challenges that lead to inaccurate localization and mapping. First, it is difficult to estimate scale in localization and mapping. Second, conventional monocular SLAM maps inappropriate elements such as dynamic objects and low-parallax areas. This paper proposes an improved real-time monocular SLAM that resolves these challenges by efficiently using deep-learning-based semantic segmentation. To achieve real-time execution, we apply semantic segmentation only to downsampled keyframes, in parallel with the mapping processes. In addition, the proposed method corrects the scale of camera poses and three-dimensional (3D) points, using a ground plane estimated from road-labeled 3D points and the real camera height. The proposed method also removes inappropriate corner features labeled as moving objects or lying in low-parallax areas. Experiments with eight video sequences demonstrate that the proposed monocular SLAM system achieves trajectory tracking accuracy significantly better than existing state-of-the-art monocular SLAM systems and comparable to stereo SLAM systems. The proposed system achieves real-time tracking on a standard CPU, potentially with standard GPU support, whereas existing segmentation-aided monocular SLAM does not.
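
    The scale correction step described here can be made concrete with a small sketch: fit a plane to road-labeled 3D map points, take the camera's distance to that plane as its height in map units, and rescale by the known real camera height. The plane fit below (plain least squares via SVD, no outlier rejection) is a simplification under our own assumptions, not necessarily the paper's procedure.

        import numpy as np

        def scale_from_ground_plane(road_points, camera_center, real_height):
            """Estimate the metric scale factor from road-labeled map points.

            road_points: (N, 3) points labeled 'road' by semantic segmentation,
            in the unscaled SLAM frame. camera_center: (3,) camera position.
            real_height: known metric height of the camera above the road.
            """
            road_points = np.asarray(road_points, dtype=float)
            centroid = road_points.mean(axis=0)
            # Plane normal = right singular vector of the smallest singular value.
            _, _, vt = np.linalg.svd(road_points - centroid)
            normal = vt[-1] / np.linalg.norm(vt[-1])
            # Distance from camera center to the fitted plane = map-frame height.
            map_height = abs(np.dot(camera_center - centroid, normal))
            return real_height / map_height   # multiply poses/points by this factor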

    Visual SLAM in Changing Environments

    This thesis investigates the problem of visual simultaneous localization and mapping (vSLAM) in changing environments. The vSLAM problem is to sequentially estimate the pose of a device with mounted cameras within a map generated from images taken by those cameras. vSLAM algorithms face two main challenges in changing environments: moving objects and temporal appearance changes. Moving objects cause problems in pose estimation if they are mistaken for static objects. They also cause problems for loop closure detection (LCD), the problem of detecting whether a previously visited place has been revisited: the same moving object observed in two different places may cause false loop closures to be detected. Temporal appearance changes, such as those brought about by time of day or weather, cause long-term data association errors for LCD, making it difficult to recognize previously visited places after their appearance has changed. Focus is placed on LCD, which turns out to be the part of vSLAM that changing environments affect the most. In addition, several techniques and algorithms for Visual Place Recognition (VPR) in challenging conditions that could be used in the context of LCD are surveyed, and the performance of two state-of-the-art VPR algorithms in changing environments is assessed in an experiment to measure their applicability to LCD. The most severe performance-degrading appearance changes are found to be those caused by changes in season and illumination. The survey identifies several algorithms and techniques that perform well in loop-closure-related tasks under specific environmental conditions. Finally, a limited experiment on the Nordland dataset suggests that the tested VPR algorithms can be used as-is, or modified, for long-term LCD. As part of the experiment, a new simple neighborhood consistency check was also developed, evaluated, and found to be effective at reducing false positives output by the tested VPR algorithms.
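
    The abstract does not spell out the neighborhood consistency check, but a common form of the idea is easy to sketch: accept a place match only if temporally neighboring query frames also match correspondingly neighboring database frames. The sketch below is one plausible reading under our own assumptions; the window size, tolerance, and threshold are placeholders, not the thesis's parameters.

        def neighborhood_consistent(matches, query_idx, window=2, min_support=2):
            """Filter a candidate loop closure by checking its temporal neighbors.

            matches: sequence where matches[q] is the best-matching database
            frame index for query frame q (as returned by some VPR algorithm).
            Accepts the match for query_idx only if at least `min_support`
            neighboring queries match database frames roughly the same offset away.
            """
            target = matches[query_idx]
            support = 0
            for off in range(-window, window + 1):
                if off == 0 or not (0 <= query_idx + off < len(matches)):
                    continue
                # A consistent neighbor should map about `off` frames away too.
                if abs(matches[query_idx + off] - (target + off)) <= 1:
                    support += 1
            return support >= min_support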