
    Motion and Saliency Based Monocular SLAM

    Feature extraction is a key component of a Monocular Simultaneous Localization and Mapping (Monocular SLAM) system, since it determines which features can be reliably tracked across frames. This paper proposes a novel approach for Monocular SLAM that uses information about the camera displacement and image saliency to extract stable features that are likely to produce the sufficient parallax essential for precise localization and mapping. Results obtained from real data show that the proposed method outperforms the state-of-the-art method in both precision and computational speed.
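
    As a rough illustration of the idea of selecting features by expected parallax, the sketch below keeps only points whose predicted image motion under a known camera displacement exceeds a pixel threshold. The pinhole model, focal length and threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

def expected_parallax(pt_3d, t, f=500.0):
    """Approximate parallax (pixels) a 3D point produces under a pure
    camera translation t, for a pinhole camera with focal length f.
    Small-displacement approximation; f and the model are assumed."""
    depth = pt_3d[2]
    # Translation perpendicular to the viewing ray drives parallax.
    t_perp = np.linalg.norm(t[:2])
    return f * t_perp / depth

def select_stable_features(points_3d, cam_translation, min_px=2.0):
    """Keep only features expected to yield enough parallax for
    reliable triangulation (threshold min_px is an assumption)."""
    return [p for p in points_3d
            if expected_parallax(p, cam_translation) >= min_px]

pts = [np.array([0.5, 0.2, 4.0]), np.array([1.0, -0.3, 60.0])]
kept = select_stable_features(pts, cam_translation=np.array([0.05, 0.0, 0.0]))
print(len(kept), "of", len(pts), "features kept")  # the distant point is rejected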

    Real-time monocular SLAM: Why filter?

    While the most accurate solution to off-line structure from motion (SFM) problems is undoubtedly to extract as much correspondence information as possible and perform global optimisation, sequential methods suitable for live video streams must approximate this to fit within fixed computational bounds. Two quite different approaches to real-time SFM — also called monocular SLAM (Simultaneous Localisation and Mapping) — have proven successful, but they sparsify the problem in different ways. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods retain the optimisation approach of global bundle adjustment, but computationally must select only a small number of past frames to process. In this paper we perform the first rigorous analysis of the relative advantages of filtering and sparse optimisation for sequential monocular SLAM. A series of experiments in simulation, as well as with a real-image SLAM system, were performed by means of covariance propagation and Monte Carlo methods, and comparisons were made using a combined cost/accuracy measure. With some well-discussed reservations, we conclude that while filtering may have a niche in systems with low processing resources, in most modern applications keyframe optimisation gives the most accuracy per unit of computing time.
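
    As an aside on what "marginalising out past poses" involves, the sketch below removes a variable from a Gaussian in information form via the Schur complement, the core operation behind the filtering approach. The toy matrix and block layout are assumptions for illustration.

```python
import numpy as np

def marginalize_block(H, b, idx):
    """Marginalise the variables at indices `idx` out of a Gaussian in
    information form (H, b) using the Schur complement. This is the
    operation filtering SLAM applies to past poses."""
    keep = np.setdiff1d(np.arange(H.shape[0]), idx)
    Haa = H[np.ix_(keep, keep)]
    Hab = H[np.ix_(keep, idx)]
    Hbb = H[np.ix_(idx, idx)]
    Hbb_inv = np.linalg.inv(Hbb)
    H_marg = Haa - Hab @ Hbb_inv @ Hab.T
    b_marg = b[keep] - Hab @ Hbb_inv @ b[idx]
    return H_marg, b_marg

# Toy 3-variable system; marginalise the middle variable.
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 0.5, -0.2])
H2, b2 = marginalize_block(H, b, np.array([1]))
print(H2)  # note the fill-in coupling variables 0 and 2
```

    The fill-in visible in the toy output is why marginalisation densifies the remaining problem over time, which is part of the cost trade-off the paper analyses.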

    Large-scale monocular SLAM by local bundle adjustment and map joining

    This paper first demonstrates an interesting property of bundle adjustment (BA), "scale drift correction". Here "scale drift correction" means that BA can converge to the correct solution (up to a scale) even if the initial values of the camera pose translations and point feature positions are calculated using very different scale factors. This property, together with other properties of BA, makes it the best approach for monocular Simultaneous Localization and Mapping (SLAM) when computational complexity is set aside. This naturally leads to the idea, proposed in this paper, of using local BA and map joining to solve the large-scale monocular SLAM problem. The local maps are built using the Scale-Invariant Feature Transform (SIFT) for feature detection and matching, the random sample consensus (RANSAC) paradigm at different levels for robust outlier removal, and BA for optimization. To reduce the computational cost of large-scale map building, the features in each local map are judiciously selected, and the local maps are then combined using a recently developed 3D map joining algorithm. The proposed large-scale monocular SLAM algorithm is evaluated using a publicly available dataset with centimeter-level ground truth.
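
    The scale ambiguity that map joining must handle can be pictured with a closed-form similarity alignment between overlapping features of two local maps built at different scales. The Umeyama-style solver below is a generic sketch, not the specific 3D map joining algorithm the paper uses.

```python
import numpy as np

def similarity_align(src, dst):
    """Closed-form estimate of scale s, rotation R, translation t
    minimising ||dst - (s R src + t)||^2 over corresponding points,
    following Umeyama (1991). Rows of src/dst are matched 3D points."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # guard against reflections
    R = U @ S @ Vt
    var_s = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

# Overlapping features of map B expressed at half the scale of map A.
rng = np.random.default_rng(0)
map_a = rng.normal(size=(20, 3))
map_b = 0.5 * map_a                        # same points, different scale
s, R, t = similarity_align(map_b, map_a)
print(round(s, 3))  # ~2.0: the scale correction map joining must recover
```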

    Monocular graph SLAM with complexity reduction

    We present a graph-based SLAM approach, using monocular vision and odometry, designed to operate on computationally constrained platforms. When computation and memory are limited, visual tracking becomes difficult or impossible, and map representation and update costs must remain low. Our system constructs a map of structured views using only weak temporal assumptions, and performs recognition and relative pose estimation over the set of views. Visual observations are fused with differential sensors in an incrementally optimized graph representation. Using variable elimination and constraint pruning, the graph complexity and storage are kept linear in explored space rather than in time. We evaluate performance on sequences with ground truth, and also compare to a standard graph SLAM approach.
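
    One way to picture "linear in explored space rather than in time" is a graph that only grows when new ground is covered; the distance-threshold policy in the sketch below is a hedged stand-in for the authors' variable elimination and constraint pruning.

```python
import numpy as np

class SpatialGraph:
    """Toy pose graph whose node count grows with explored area rather
    than elapsed time: revisits reuse the nearest existing node. This
    policy is illustrative; the paper achieves its bound through
    variable elimination and constraint pruning."""
    def __init__(self, radius=1.0):
        self.radius = radius
        self.nodes = []            # 2D node positions
        self.edges = set()         # undirected (i, j) constraints

    def add_pose(self, xy, prev=None):
        xy = np.asarray(xy, dtype=float)
        for i, n in enumerate(self.nodes):
            if np.linalg.norm(n - xy) < self.radius:  # revisit: reuse node
                if prev is not None and prev != i:
                    self.edges.add(tuple(sorted((prev, i))))
                return i
        self.nodes.append(xy)                          # new ground: new node
        i = len(self.nodes) - 1
        if prev is not None:
            self.edges.add(tuple(sorted((prev, i))))
        return i

g = SpatialGraph(radius=1.0)
idx = None
for lap in range(3):                 # drive the same square three times
    for xy in [(0, 0), (2, 0), (2, 2), (0, 2)]:
        idx = g.add_pose(xy, idx)
print(len(g.nodes), len(g.edges))    # 4 nodes, 4 edges: linear in space
```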

    Non-Linearity Analysis of Depth and Angular Indexes for Optimal Stereo SLAM

    In this article, we present a real-time 6DoF egomotion estimation system for indoor environments using a wide-angle stereo camera as the only sensor. The stereo camera is carried in hand by a person walking at normal walking speeds (3–5 km/h). We present the basis for a vision-based system that would assist the navigation of the visually impaired by either providing information about their current position and orientation or guiding them to their destination through different sensing modalities. Our sensor combines two different types of feature parametrization, inverse depth and 3D, in order to provide orientation and depth information at the same time. Natural landmarks are extracted from the image and are stored as 3D or inverse depth points, depending on a depth threshold. This depth threshold is used to switch between the two parametrizations and is computed by means of a non-linearity analysis of the stereo sensor. The main steps of our approach are presented, along with an analysis of the optimal way to calculate the depth threshold. When each landmark is initialized, the normal of the patch surface is computed using the information of the stereo pair. In order to improve long-term tracking, patch warping is performed using the normal vector information. Experimental results in indoor environments and conclusions are presented.
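
    A minimal sketch of the switching rule described above: landmarks nearer than a depth threshold are stored as Euclidean 3D points, distant ones in inverse depth. The threshold value here is arbitrary, whereas the article derives it from a non-linearity analysis of the stereo sensor.

```python
import numpy as np

DEPTH_THRESHOLD = 5.0  # metres; illustrative only — the article computes
                       # this from a non-linearity analysis of the stereo rig

def parametrize_landmark(ray_dir, depth):
    """Store a landmark as a Euclidean 3D point when it is close (depth
    well conditioned) or as an inverse-depth point when far (uncertainty
    closer to Gaussian in rho = 1/depth)."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    if depth < DEPTH_THRESHOLD:
        return ("xyz", ray_dir * depth)
    return ("inverse_depth", (ray_dir, 1.0 / depth))

print(parametrize_landmark(np.array([0.0, 0.0, 1.0]), 2.0)[0])   # xyz
print(parametrize_landmark(np.array([0.1, 0.0, 1.0]), 40.0)[0])  # inverse_depth
```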

    A multisensor SLAM for dense maps of large scale environments under poor lighting conditions

    This thesis describes the development and implementation of a multisensor large scale autonomous mapping system for surveying tasks in underground mines. The hazardous nature of the underground mining industry has resulted in a push towards autonomous solutions to the most dangerous operations, including surveying tasks. Many existing autonomous mapping techniques rely on approaches to the Simultaneous Localization and Mapping (SLAM) problem which are not suited to the extreme characteristics of active underground mining environments. Our proposed multisensor system has been designed from the outset to address the unique challenges associated with underground SLAM. The robustness, self-containment and portability of the system maximize the potential applications. The multisensor mapping solution proposed as a result of this work is based on a fusion of omnidirectional bearing-only vision-based localization and 3D laser point cloud registration. By combining these two SLAM techniques it is possible to achieve some of the advantages of both approaches – the real-time attributes of vision-based SLAM and the dense, high precision maps obtained through 3D lasers. The result is a viable autonomous mapping solution suitable for application in challenging underground mining environments. A further improvement to the robustness of the proposed multisensor SLAM system is a consequence of incorporating colour information into vision-based localization. Underground mining environments are often dominated by dynamic sources of illumination which can cause inconsistent feature motion during localization. Colour information is utilized to identify and remove features resulting from illumination artefacts and to improve the monochrome based feature matching between frames. Finally, the proposed multisensor mapping system is implemented and evaluated in both above ground and underground scenarios. The resulting large scale maps contained a maximum offset error of ±30 mm for mapping tasks with lengths over 100 m.
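
    As a loose illustration of using colour to reject illumination artefacts, the sketch below discards feature patches dominated by very bright, weakly saturated pixels, as direct light sources tend to be. The heuristic and its thresholds are assumptions, not the thesis's actual method.

```python
import numpy as np

def is_illumination_artifact(patch_rgb, v_thresh=0.9, s_thresh=0.15):
    """Heuristic: patches dominated by very bright, low-saturation
    pixels are likely direct light sources or specular glare rather
    than stable scene texture. Thresholds are illustrative."""
    rgb = patch_rgb.reshape(-1, 3).astype(float) / 255.0
    v = rgb.max(axis=1)                                     # HSV value
    s = (v - rgb.min(axis=1)) / np.maximum(v, 1e-9)         # HSV saturation
    bright_unsaturated = (v > v_thresh) & (s < s_thresh)
    return bright_unsaturated.mean() > 0.5

lamp = np.full((8, 8, 3), 250, dtype=np.uint8)                          # white glare
rock = np.random.default_rng(1).integers(30, 120, (8, 8, 3), dtype=np.uint8)
print(is_illumination_artifact(lamp), is_illumination_artifact(rock))   # True False
```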

    Mixed Reality and Remote Sensing Application of Unmanned Aerial Vehicle in Fire and Smoke Detection

    This paper proposes the development of a system incorporating an inertial measurement unit (IMU), a consumer-grade digital camera and a fire detection algorithm on board a nano Unmanned Aerial Vehicle (UAV) for inspection purposes. Video streams are collected through the monocular camera, and navigation relies on a state-of-the-art indoor/outdoor Simultaneous Localisation and Mapping (SLAM) system. The system is implemented on the Robot Operating System (ROS) and uses computer vision algorithms to provide robust, accurate inter-frame motion estimation. The collected onboard data are communicated to the ground station, where the SLAM system generates a map of the environment. Robust and efficient re-localization is performed to recover from tracking failure, motion blur, and lost frames in the received data. The fire detection algorithm is based on colour, movement attributes, the temporal variation of fire intensity and its accumulation around a point. A cumulative time derivative matrix is used to analyse frame-by-frame changes and to detect areas with high-frequency luminance flicker (a random characteristic of flames). Colour, surface coarseness, boundary roughness, and skewness features are perceived as the quadrotor flies autonomously within cluttered and congested areas. A Mixed Reality system is adopted to visualize and test the proposed system in a physical environment, with virtual simulation conducted in the Unity game engine. The results show that the UAV can successfully detect fire and flame, autonomously fly towards and hover around it, communicate with the ground station and simultaneously generate a map of the environment. There was a slight error between the real and virtual UAV calibration due to the ground truth data and the complexity of correlating the real and virtual camera coordinate frames.
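
    The cumulative time derivative matrix can be pictured as a decaying accumulator of frame-to-frame luminance changes, which stays high where flames flicker. The sketch below is a schematic reading of that idea; the decay factor and detection threshold are assumptions.

```python
import numpy as np

def flicker_accumulator(frames, decay=0.9):
    """Accumulate absolute frame-to-frame luminance differences with
    exponential decay; flame regions flicker at high frequency, so their
    accumulated derivative stays high (decay value assumed)."""
    acc = np.zeros_like(frames[0], dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        acc = decay * acc + np.abs(cur - prev)
    return acc

# Synthetic sequence: a flickering 4x4 'flame' patch in a static scene.
rng = np.random.default_rng(2)
frames = []
for _ in range(30):
    f = np.full((16, 16), 80.0)
    f[4:8, 4:8] = 180 + rng.normal(0, 40, (4, 4))  # random flicker
    frames.append(f)
acc = flicker_accumulator(frames)
candidate = acc > acc.mean() + 2 * acc.std()
print(candidate[5, 5], candidate[0, 0])  # True inside the flame, False outside
```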

    A Unified Hybrid Formulation for Visual SLAM

    Visual Simultaneous Localization and Mapping (VSLAM) is the process of estimating the six degrees of freedom ego-motion of a camera, from its video feed, while simultaneously constructing a 3D model of the observed environment. Extensive research in the field over the past two decades has yielded real-time and efficient algorithms for VSLAM, allowing various interesting applications in augmented reality, cultural heritage, robotics and the automotive industry, to name a few. The underlying formula behind VSLAM is a mixture of image processing, geometry, graph theory, optimization and machine learning; the theoretical and practical development of these building blocks has led to a wide variety of algorithms, each leveraging different assumptions to achieve superiority under its presumed conditions of operation. An exhaustive survey on the topic outlined seven main components in a generic VSLAM pipeline, namely: the matching paradigm, visual initialization, data association, pose estimation, topological/metric map generation, optimization, and global localization. Before VSLAM can be claimed a solved problem, numerous challenging subjects pertaining to robustness in each of the aforementioned components have to be addressed, namely: resilience to a wide variety of scenes (poorly textured or self-repeating scenarios), resilience to dynamic changes (moving objects), and scalability for long-term operation (computational resource awareness and management). Furthermore, current state-of-the-art VSLAM pipelines are tailored towards static, basic point cloud reconstructions, an impediment to perception applications such as path planning, obstacle avoidance and object tracking. To address these limitations, this work proposes a hybrid scene representation, where different sources of information extracted solely from the video feed are fused in a hybrid VSLAM system. The proposed pipeline allows for seamless integration of data from pixel-based intensity measurements and geometric entities to produce and make use of a coherent scene representation. The goal is threefold: 1) increase camera tracking accuracy under challenging motions, 2) improve robustness to challenging poorly textured environments and varying illumination conditions, and 3) ensure scalability and long-term operation by efficiently maintaining a global reusable map representation.
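
    A schematic of what fusing pixel-based intensity measurements with geometric entities into one objective can look like: a joint cost that sums a photometric residual and a reprojection residual under assumed weights. This is an illustrative composition, not the thesis's actual formulation.

```python
import numpy as np

def photometric_residual(I_ref, I_cur, u_ref, u_cur):
    """Direct term: intensity constancy between corresponding pixels."""
    return float(I_ref[u_ref] - I_cur[u_cur])

def reprojection_residual(uv_observed, uv_projected):
    """Indirect term: geometric error of a projected landmark."""
    return float(np.linalg.norm(np.asarray(uv_observed) - np.asarray(uv_projected)))

def hybrid_cost(photo_terms, geo_terms, w_photo=1.0, w_geo=1.0):
    """Single objective over both information sources; the relative
    weights (assumed here) balance their different units and noise."""
    return (w_photo * sum(r ** 2 for r in photo_terms)
            + w_geo * sum(r ** 2 for r in geo_terms))

I0 = np.array([[10.0, 12.0], [11.0, 13.0]])
I1 = np.array([[10.5, 12.0], [11.0, 13.5]])
photo = [photometric_residual(I0, I1, (0, 0), (0, 0))]
geo = [reprojection_residual((100.0, 50.0), (100.8, 49.5))]
print(hybrid_cost(photo, geo))  # one scalar cost over both modalities
```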

    Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

    In this paper we propose a novel Semantic Bundle Adjustment framework whereby known rigid stationary objects are detected while tracking the camera and mapping the environment. The system builds on established tracking and mapping techniques to exploit incremental 3D reconstruction in order to validate hypotheses on the presence and pose of sought objects. Then, detected objects are explicitly taken into account for a global semantic optimization of both camera and object poses. Thus, unlike all systems proposed so far, our approach allows for solving jointly the detection and SLAM problems, so as to achieve object detection together with improved SLAM accuracy.
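
    The joint optimisation described above can be sketched as a bundle adjustment objective extended with object terms: reprojection residuals of known model points, parameterised by both the camera and the object pose. The residual stacking below is a hedged illustration, not the paper's exact formulation.

```python
import numpy as np

def project(K, T_cw, X_w):
    """Pinhole projection of world point X_w under camera pose T_cw."""
    Xc = T_cw[:3, :3] @ X_w + T_cw[:3, 3]
    uvw = K @ Xc
    return uvw[:2] / uvw[2]

def semantic_ba_residuals(K, T_cw, map_pts, map_obs, T_ow, model_pts, model_obs):
    """Stack plain map residuals with object residuals. Object model
    points live in the object frame and are mapped through the object
    pose T_ow, coupling camera and object poses in one optimisation."""
    res = []
    for X, uv in zip(map_pts, map_obs):        # standard SLAM terms
        res.append(project(K, T_cw, X) - uv)
    for Xo, uv in zip(model_pts, model_obs):   # semantic object terms
        Xw = T_ow[:3, :3] @ Xo + T_ow[:3, 3]
        res.append(project(K, T_cw, Xw) - uv)
    return np.concatenate(res)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
T_cw = np.eye(4)
T_ow = np.eye(4); T_ow[:3, 3] = [0, 0, 5]
r = semantic_ba_residuals(K, T_cw,
                          [np.array([0.2, 0.1, 4.0])], [np.array([345.0, 252.0])],
                          T_ow, [np.array([0.1, 0.0, 0.0])], [np.array([330.0, 240.0])])
print(r.shape)  # one 2D residual per observation, map and object alike
```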

    Local Accuracy and Global Consistency for Efficient SLAM

    This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM) using visual data only. Given the video stream of a moving camera, we wish to estimate the structure of the environment and the motion of the device most accurately and in real-time. Two effective approaches were presented in the past. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods rely on the optimisation approach of bundle adjustment, but computationally must select only a small number of past frames to process. We perform a rigorous comparison between the two approaches for visual SLAM. In particular, we show that accuracy comes from a large number of points, while the number of intermediate frames has only a minor impact. We conclude that keyframe bundle adjustment is superior to filtering due to its smaller computational cost. Based on these experimental results, we develop an efficient framework for large-scale visual SLAM using the keyframe strategy. We demonstrate that SLAM using a single camera drifts not only in rotation and translation, but also in scale. In particular, we perform large-scale loop closure correction using a novel variant of pose-graph optimisation which also takes scale drift into account. Starting from this two-stage approach, which tackles local motion estimation and loop closures separately, we develop a unified framework for real-time visual SLAM. By employing a novel double window scheme, we present a constant-time approach which enables the local accuracy of bundle adjustment while ensuring global consistency. Furthermore, we suggest a new scheme for local registration using metric loop closures and present several improvements for the visual front-end of SLAM. Our contributions are evaluated exhaustively on a number of synthetic experiments and real-image datasets from single cameras and range imaging devices.
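
    The scale drift that the thesis's pose-graph variant corrects can be pictured with similarity transforms: composing Sim(3) elements multiplies their scales, so drift accumulated around a loop appears as a residual scale factor. The sketch below is a numerical toy, not the thesis's optimiser.

```python
import numpy as np

def sim3(R, t, s):
    """Represent a similarity transform as a tuple (R, t, s)."""
    return (np.asarray(R, float), np.asarray(t, float), float(s))

def compose(a, b):
    """Sim(3) composition: scales multiply, which is why monocular
    drift accumulates multiplicatively around a loop."""
    Ra, ta, sa = a
    Rb, tb, sb = b
    return (Ra @ Rb, sa * (Ra @ tb) + ta, sa * sb)

# Four relative edges around a loop, each carrying ~2% scale drift.
edge = sim3(np.eye(3), [1.0, 0.0, 0.0], 1.02)
loop = sim3(np.eye(3), [0.0, 0.0, 0.0], 1.0)
for _ in range(4):
    loop = compose(loop, edge)
print(round(loop[2], 4))  # ~1.0824: residual scale a Sim(3) pose graph corrects
```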