
    2D-3D Pose Tracking with Multi-View Constraints

    Camera localization in 3D LiDAR maps has gained increasing attention due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing methods mostly focus on bridging the cross-modal gap and estimate camera poses frame by frame without considering the relationship between adjacent frames, which makes pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences of adjacent frames through 2D-2D feature matching, establishing multi-view geometric constraints for estimating multiple camera poses simultaneously. Specifically, we propose a new 2D-3D pose tracking framework consisting of a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss that incorporates the multi-view constraints during both training and inference. We evaluate the proposed framework on the KITTI and Argoverse datasets. Experimental results demonstrate its superior performance compared with existing frame-by-frame 2D-3D pose tracking methods and state-of-the-art vision-only pose tracking algorithms. More online pose tracking videos are available at https://youtu.be/yfBRdg7gw5M. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
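The abstract does not spell out the back-end cost, and its actual loss is cross-modal and consistency-based. As a generic illustration only, the simplest way several frames can be constrained jointly is to sum the reprojection errors of shared 2D-3D correspondences over all of them. This is a minimal sketch assuming a pinhole camera and world-to-camera poses (x_cam = R X + t); `project` and `multi_view_cost` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Vec3]  # world-to-camera rotation R and translation t


def project(K: Mat3, pose: Pose, X: Vec3) -> Tuple[float, float]:
    """Project world point X with a pinhole model: x_cam = R X + t, then apply K."""
    R, t = pose
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def multi_view_cost(
    K: Mat3,
    poses: List[Pose],
    observations: List[Tuple[int, Vec3, Tuple[float, float]]],
) -> float:
    """Sum of squared reprojection errors of 2D-3D correspondences over all frames.

    Each observation is (frame index, 3D map point, observed pixel). A map point
    seen in several adjacent frames contributes one term per frame, so the poses
    of those frames are constrained jointly rather than one at a time.
    """
    total = 0.0
    for frame, X, (u_obs, v_obs) in observations:
        u, v = project(K, poses[frame], X)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
poses = [(I3, (0.0, 0.0, 0.0)), (I3, (-1.0, 0.0, 0.0))]  # second camera moved 1 m along +x
X = (0.0, 0.0, 5.0)
print(project(K, poses[0], X), project(K, poses[1], X))  # (320.0, 240.0) (220.0, 240.0)
print(multi_view_cost(K, poses, [(0, X, (320.0, 240.0)), (1, X, (220.0, 240.0))]))  # 0.0
```

In a tracker, a nonlinear least-squares solver would adjust all poses together to minimize such a cost; here the observations are exact, so the cost is zero.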

    Visual SLAM in Changing Environments

    This thesis investigates the problem of Visual Simultaneous Localization and Mapping (vSLAM) in changing environments. The vSLAM problem is to sequentially estimate the pose of a device with mounted cameras within a map built from the images taken by those cameras. vSLAM algorithms face two main challenges in changing environments: moving objects and temporal appearance changes. Moving objects cause problems in pose estimation if they are mistaken for static objects. They also cause problems for loop closure detection (LCD), the problem of detecting whether a previously visited place has been revisited: the same moving object observed in two different places may cause false loop closures to be detected. Temporal appearance changes, such as those brought about by the time of day or the weather, cause long-term data association errors for LCD, making it difficult to recognize previously visited places after their appearance has changed. The focus is placed on LCD, which turns out to be the part of vSLAM most affected by changing environments. In addition, several techniques and algorithms for Visual Place Recognition (VPR) in challenging conditions that could be used for LCD are surveyed, and the performance of two modern state-of-the-art VPR algorithms in changing environments is assessed experimentally to measure their applicability to LCD. The appearance changes that degrade performance most severely are found to be those caused by changes in season and illumination. The survey identifies several algorithms and techniques that perform well in loop-closure-related tasks under specific environmental conditions. Finally, a limited experiment on the Nordland dataset suggests that the tested VPR algorithms can be used as is, or modified, for long-term LCD.
As part of the experiment, a new simple neighborhood consistency check was also developed, evaluated, and found to be effective at reducing the false positives output by the tested VPR algorithms.
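The abstract does not describe how the neighborhood consistency check works. One plausible form, sketched here purely as an assumption and not as the thesis's method, exploits the fact that consecutive query images along a route should match consecutive database images: a match is kept only if enough of its temporal neighbours agree with it. The function name and the `window`, `tol`, and `min_support` parameters are hypothetical.

```python
from typing import List


def neighborhood_consistent(
    matches: List[int], window: int = 2, tol: int = 2, min_support: int = 2
) -> List[bool]:
    """Keep a match only if enough temporal neighbours agree with it.

    matches[i] is the index of the best-matching database image for query i.
    A neighbour i + d supports match j if it matched roughly database image j + d,
    i.e. both sequences advance together.
    """
    n = len(matches)
    keep = []
    for i, j in enumerate(matches):
        support = 0
        for d in range(-window, window + 1):
            if d == 0 or not 0 <= i + d < n:
                continue  # skip the match itself and out-of-sequence neighbours
            if abs(matches[i + d] - (j + d)) <= tol:
                support += 1
        keep.append(support >= min_support)
    return keep


print(neighborhood_consistent([10, 11, 12, 80, 14, 15]))
# [True, True, True, False, True, False]
```

The isolated match to image 80 is rejected as a likely false positive. The last query is also rejected, because at the end of the sequence only one of its neighbours supports it; a real system would need a policy for sequence boundaries.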

    Depth Estimation Using 2D RGB Images

    Single-image depth estimation is an ill-posed problem: it is not mathematically possible to uniquely recover the third dimension (depth) from a single 2D image. Hence, additional constraints must be incorporated to regularize the solution space. Accordingly, the first part of this dissertation explores constraining the model for more accurate depth estimation by exploiting the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene. Although deep-learning-based methods are very successful in computer vision and handle noise very well, they generalize poorly when the training and test distributions differ. Geometric methods, by contrast, do not suffer from this generalization problem, since they exploit temporal information in an unsupervised manner; they are, however, sensitive to noise. At the same time, explicitly modeling dynamic scenes and flexible objects is a major challenge for traditional computer vision methods. Considering the advantages and disadvantages of each approach, a hybrid method that benefits from both is proposed, extending the ability of traditional geometric models to handle flexible and dynamic objects in the scene. This is achieved by relaxing the geometric rules from one motion model per region of the scene to one motion model per pixel. This enables the model to detect even small, flexible, floating debris in a dynamic scene. However, it makes the optimization under-constrained. To make the optimization over-constrained again while preserving the model's flexibility, a "moving object detection loss" and a "synchrony loss" are designed. The algorithm is trained in an unsupervised fashion. The initial results are far from comparable to the current state of the art, and because training is so slow, such a comparison is difficult to make.
Also, the algorithm lacks stability, and the optical flow model is extremely noisy and naive. Finally, some solutions to these issues are suggested.
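Why a single image cannot determine depth can be seen directly from the pinhole camera model, assumed here as the imaging model: every point on the same viewing ray projects to the same pixel, so the pixel alone cannot distinguish a near point from a farther one. The intrinsics below are arbitrary illustrative values.

```python
from typing import Tuple


def project(fx: float, fy: float, cx: float, cy: float,
            X: float, Y: float, Z: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (X, Y, Z) with Z > 0."""
    return fx * X / Z + cx, fy * Y / Z + cy


near = project(500.0, 500.0, 320.0, 240.0, 1.0, 2.0, 4.0)
far = project(500.0, 500.0, 320.0, 240.0, 2.0, 4.0, 8.0)  # same ray, twice as deep
print(near, far)  # (445.0, 490.0) (445.0, 490.0)
```

Scaling a point by any positive factor leaves its projection unchanged, which is why extra constraints (such as the RGB/depth edge similarity or temporal information described above) are needed.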

    A Comprehensive Introduction of Visual-Inertial Navigation

    In this article, a tutorial introduction to visual-inertial navigation (VIN) is presented. Visual and inertial perception are two complementary sensing modalities, whose corresponding sensors are cameras and inertial measurement units (IMUs). The low cost and light weight of camera-IMU combinations make them ubiquitous in robotic navigation. Visual-inertial navigation is a state estimation problem that estimates the ego-motion and local environment of the sensor platform. This paper presents visual-inertial navigation within the classical state estimation framework, first illustrating the estimation problem in terms of state variables and system models, including the representations (parameterizations) of the related quantities, the IMU dynamic and camera measurement models, and the corresponding general probabilistic graphical models (factor graphs). Second, we investigate existing model-based estimation methodologies, which involve filter-based and optimization-based frameworks and the related on-manifold operations. We also discuss the calibration of relevant parameters and the initialization of the states of interest in optimization-based frameworks. Then, the evaluation and improvement of VIN in terms of accuracy, efficiency, and robustness are discussed. Finally, we briefly mention recent learning-based methods that may become alternatives to traditional model-based methods. Comment: 35 pages, 10 figures
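The most basic of the on-manifold operations mentioned above is the SO(3) exponential map, which turns a rotation vector into a rotation matrix and is what IMU orientation propagation relies on. This is a minimal pure-Python sketch of Rodrigues' formula and gyroscope integration, assuming bias-free, noise-free angular rates in the body frame and simple first-order (one-step-per-sample) integration; the function names are illustrative, not taken from the article.

```python
import math
from typing import List, Sequence

Mat3 = List[List[float]]


def hat(w: Sequence[float]) -> Mat3:
    """Skew-symmetric matrix [w]x such that [w]x v equals the cross product w x v."""
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]


def matmul(A: Mat3, B: Mat3) -> Mat3:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def so3_exp(w: Sequence[float]) -> Mat3:
    """Rodrigues' formula: Exp(w) = I + a [w]x + b [w]x^2 with theta = |w|."""
    theta = math.sqrt(sum(c * c for c in w))
    W = hat(w)
    W2 = matmul(W, W)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos t)/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * W[i][j] + b * W2[i][j] for j in range(3)]
            for i in range(3)]


def integrate_gyro(R: Mat3, omegas: Sequence[Sequence[float]], dt: float) -> Mat3:
    """Propagate orientation with body-frame angular rates: R <- R Exp(omega dt)."""
    for w in omegas:
        R = matmul(R, so3_exp([c * dt for c in w]))
    return R


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# 100 samples at 100 Hz of a constant pi/2 rad/s yaw rate: a 90 degree turn about z.
R = integrate_gyro(I3, [(0.0, 0.0, math.pi / 2)] * 100, 0.01)
print([[round(x, 6) for x in row] for row in R])  # approximately a 90 degree rotation about z
```

Because each increment is mapped through the exponential, the result stays a valid rotation matrix (up to floating-point error), which is the point of operating on the manifold rather than adding angle increments.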

    Distributed scene reconstruction from multiple mobile platforms

    Recent research on mobile robotics has produced new designs that provide household robots with omnidirectional motion. The image sensors embedded in these devices motivate the application of 3D vision techniques for navigation and mapping. In addition, distributed cheap sensing systems acting as a single entity have recently emerged as an efficient alternative to expensive mobile equipment. In this work, we present an implementation of a visual reconstruction method, structure from motion (SfM), on a low-budget omnidirectional mobile platform, and extend it to distributed 3D scene reconstruction with several instances of such a platform. Our approach overcomes the challenges posed by the platform. The unprecedented levels of noise produced by the platform's typical image compression are handled by our feature filtering methods, which ensure populations of feature matches suitable for epipolar geometry estimation through strict quality-based feature selection. The robust pose estimation algorithms implemented, together with a novel feature tracking system, enable our incremental SfM approach to deal with the ill-conditioned inter-image configurations caused by omnidirectional motion. The feature tracking system efficiently manages the feature scarcity produced by noise and outputs high-quality feature tracks, which allow robust 3D mapping of a scene even when, due to noise, the tracks are shorter than is usually assumed for stable 3D reconstruction. Distributed reconstruction from multiple SfM instances is achieved by applying loop-closing techniques. Our multiple-reconstruction system merges individual 3D structures and resolves the global scale problem with minimal overlap, whereas in the literature 3D mapping is obtained from overlapping stretches of sequences. The performance of this system is demonstrated in the two-session case.
The noise management, the stability against ill-conditioned configurations, and the robustness of our SfM system are validated in a number of experiments and compared with state-of-the-art approaches. Possible future research areas are also discussed.
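The abstract relies on epipolar geometry estimation without detailing it. The constraint that every correct match must satisfy is x2^T E x1 = 0 with E = [t]x R, and this is what makes outlier matches detectable. This sketch is not the thesis's quality-based feature selection. It assumes calibrated (normalized) image coordinates and a known relative pose, and shows only the geometric test; in practice E is itself estimated robustly, for example with RANSAC. All names are illustrative.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def essential(R: Sequence[Sequence[float]], t: Vec3) -> List[List[float]]:
    """E = [t]x R for the convention x2 = R x1 + t (camera-1 to camera-2)."""
    tx = [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]
    return [[sum(tx[i][k] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def epipolar_residual(E: Sequence[Sequence[float]], x1: Vec3, x2: Vec3) -> float:
    """Algebraic residual x2^T E x1; zero for a geometrically consistent match."""
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))


def filter_matches(E, matches: List[Tuple[Vec3, Vec3]], thresh: float = 1e-3):
    """Keep matches (normalized homogeneous image coordinates) with a small residual."""
    return [(x1, x2) for x1, x2 in matches if abs(epipolar_residual(E, x1, x2)) < thresh]


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = essential(I3, (-1.0, 0.0, 0.0))        # second camera shifted 1 unit along +x
good = ((0.25, 0.5, 1.0), (0.0, 0.5, 1.0))  # point (1, 2, 4) seen from both cameras
bad = ((0.25, 0.5, 1.0), (0.0, 0.8, 1.0))   # vertical offset violates the constraint
print(len(filter_matches(E, [good, bad])))  # 1
```

For a purely sideways camera shift, epipolar lines are horizontal, so the mismatched point with a vertical offset is rejected.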

    PERCEPTION FOR SURVEILLANCE: LEARNING SELF-LOCALISATION AND INTRUDERS DETECTION FROM MONOCULAR IMAGES OF AN AERIAL ROBOT IN OUTDOOR URBAN ENVIRONMENTS

    Unmanned aerial vehicles (UAVs), more commonly called drones, are among the most versatile robotic platforms thanks to their high mobility and low-cost design, and they have therefore been applied to numerous civil applications. These robots can generally complete autonomous or semi-autonomous missions by performing complex computations on their autopilot system, based on sensor observations, to control their attitude and speed and to plan and track a trajectory through a possibly unknown environment without human intervention. However, to enable higher degrees of autonomy, the perception system is paramount for extracting the knowledge that allows interaction with the external world. This thesis therefore aims to solve the core perception challenges of an autonomous surveillance application carried out by an aerial robot in an outdoor urban environment. We address a simplified use case: patrolling missions that monitor a confined area around buildings assumed to be under access restriction. From this application context, we identify the main research questions. On the one hand, the drone has to localise itself in a controlled navigation environment, keep track of its pose while flying, and understand the geometric structure of the 3D scene around it. On the other hand, the surveillance mission entails detecting and localising people in the monitored area. We develop several methodologies to address these challenging questions. Constraining the UAV's sensor array to a monocular RGB camera, we approach these problems with computer vision algorithms. First, we train a neural network with an unsupervised learning paradigm to predict the drone's ego-motion and the geometric structure of the scene. We then introduce a novel algorithm that integrates a model-free epipolar method to correct, online, the rotational drift of the trajectory estimated by the trained pose network.
Second, we employ an efficient Convolutional Neural Network (CNN) architecture to regress the UAV's global metric pose directly from a single colour image, and we investigate how dynamic objects in the camera's field of view affect the localisation performance of this approach. Next, we discuss the implementation of an object detection network and derive the equations to find the 3D position of the detected people in a reconstructed environment. We then describe the theory behind structure-from-motion and use it to create a 3D model from a dataset recorded with a drone at the University of Luxembourg's Belval campus. Ultimately, we perform multiple experiments to validate our proposed algorithms and evaluate them against other state-of-the-art methodologies. The results show the superiority of our methods on different metrics. Our analysis also determines the limitations and highlights the benefits of the adopted strategies compared with other approaches. Finally, the introduced dataset provides an additional tool for benchmarking perception algorithms and for future application development.
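The thesis derives its own equations for locating detected people in a reconstructed environment, and they are not given in the abstract. A common simplification, assumed for this sketch, is that a person's feet (for example, the bottom-centre of the detection box) touch a flat ground plane z = 0. The 3D position is then where that pixel's viewing ray meets the plane. Pinhole intrinsics and the camera pose are assumed known, for example from the localisation step; the function name and example values are illustrative.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_ground(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
                    R_wc: Sequence[Sequence[float]], C: Vec3) -> Optional[Vec3]:
    """Intersect the viewing ray of pixel (u, v) with the ground plane z = 0.

    R_wc rotates camera-frame directions into the world frame; C is the camera
    centre in world coordinates. Returns None if the ray never hits the ground.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)  # K^-1 [u, v, 1]^T
    d = [sum(R_wc[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d[2]) < 1e-12:
        return None  # ray parallel to the ground
    lam = -C[2] / d[2]
    if lam <= 0:
        return None  # intersection would lie behind the camera
    return (C[0] + lam * d[0], C[1] + lam * d[1], C[2] + lam * d[2])


# Camera 10 m above the ground looking straight down (camera z-axis maps to world -z).
R_down = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
print(pixel_to_ground(420.0, 240.0, 500.0, 500.0, 320.0, 240.0, R_down, (0.0, 0.0, 10.0)))
# approximately (2.0, 0.0, 0.0)
```

A pixel 100 px right of the principal point, seen from 10 m with a 500 px focal length, lands 2 m from the point below the camera. Uneven terrain would require intersecting the ray with the reconstructed 3D model instead of a plane.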

    Autonomous Navigation in Complex Indoor and Outdoor Environments with Micro Aerial Vehicles

    Micro aerial vehicles (MAVs) are ideal platforms for surveillance and search and rescue in confined indoor and outdoor environments due to their small size, superior mobility, and hover capability. In such missions, it is essential that the MAV be capable of autonomous flight to minimize operator workload. Despite recent successes in the commercialization of GPS-based autonomous MAVs, autonomous navigation in complex and possibly GPS-denied environments gives rise to challenging engineering problems that require an integrated approach to perception, estimation, planning, control, and high-level situational awareness. Among these, state estimation is the first and most critical component for autonomous flight, especially because of the inherently fast dynamics of MAVs and the possibly unknown environmental conditions. In this thesis, we present methodologies and system designs, with a focus on state estimation, that enable a lightweight off-the-shelf quadrotor MAV to autonomously navigate complex unknown indoor and outdoor environments using only onboard sensing and computation. We start by developing laser- and vision-based state estimation methodologies for indoor autonomous flight. We then investigate the fusion of heterogeneous sensors to improve robustness and enable operation in complex indoor and outdoor environments. We further propose estimation algorithms for on-the-fly initialization and online failure recovery. Finally, we present planning, control, and environment coverage strategies for integrated high-level autonomy behaviors. Extensive online experimental results are presented throughout the thesis. We conclude by proposing future research opportunities.
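The abstract does not specify the estimators used for sensor fusion. The core idea of fusing heterogeneous sensors can nonetheless be shown with the simplest case, a one-dimensional Kalman filter: predict with a motion input, then correct with each sensor in turn, each weighted by its own noise variance. This sketch assumes a linear 1D state (for example, altitude) and Gaussian noise. The sensor roles and all numbers are hypothetical.

```python
from typing import Tuple


def kf_predict(x: float, P: float, velocity: float, dt: float, q: float) -> Tuple[float, float]:
    """Propagate the state with a motion input (e.g. an IMU-derived velocity); variance grows by q."""
    return x + velocity * dt, P + q


def kf_update(x: float, P: float, z: float, r: float) -> Tuple[float, float]:
    """Fuse a measurement z with variance r; the gain weights it by relative confidence."""
    K = P / (P + r)
    return x + K * (z - x), (1.0 - K) * P


x, P = 0.0, 1.0
x, P = kf_predict(x, P, velocity=0.5, dt=2.0, q=0.0)  # prediction: x = 1.0
x, P = kf_update(x, P, z=2.0, r=1.0)  # hypothetical noisier sensor A: x = 1.5, P = 0.5
x, P = kf_update(x, P, z=2.5, r=0.5)  # hypothetical more precise sensor B
print(x, P)  # 2.0 0.25
```

Each update shrinks the variance, and a sensor with a smaller r would pull the estimate harder. A real MAV estimator tracks a full 3D pose and velocity with a nonlinear filter, but the weighting principle is the same.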

    A hybrid visual-based SLAM architecture: local filter-based SLAM with keyframe-based global mapping

    This work presents a hybrid visual-based SLAM architecture that aims to exploit the strengths of each of the two main methodologies currently available for implementing visual-based SLAM systems, while minimizing some of their drawbacks. The main idea is to implement a local SLAM process using a filter-based technique, and to delegate the building and maintenance of a consistent global map of the environment, including loop closure, to processes implemented with optimization-based techniques. Different variants of visual-based SLAM systems can be implemented with the proposed architecture. As an implementation case, this work also presents a full monocular SLAM system for unmanned aerial vehicles that integrates additional sensory inputs. Experiments with real data obtained from the sensors of a quadrotor are presented to validate the feasibility of the proposed approach. Postprint (published version)

    (LC)²: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition

    Localization has been a challenging task for autonomous navigation. A loop detection algorithm must overcome environmental changes for the place recognition and re-localization of robots. Therefore, deep learning has been extensively studied for consistently transforming measurements into localization descriptors. Street-view images are easily accessible; however, images are vulnerable to appearance changes. LiDAR can robustly provide precise structural information; however, constructing a point cloud database is expensive, and point clouds exist only in limited places. Unlike previous works that train networks to produce a shared embedding directly between the 2D image and the 3D point cloud, we transform both kinds of data into 2.5D depth images for matching. In this work, we propose a novel cross-matching method, called (LC)², for achieving LiDAR localization without a prior point cloud map. To this end, LiDAR measurements are expressed as range images before matching, to reduce the modality discrepancy. The network is then trained to extract localization descriptors from disparity and range images. Next, the best matches are employed as loop factors in a pose graph. Using public datasets that include multiple sessions under significantly different lighting conditions, we demonstrate that LiDAR-based navigation systems can be optimized from image databases and vice versa. Comment: 8 pages, 11 figures, Accepted to IEEE Robotics and Automation Letters (RA-L)
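The abstract states that LiDAR measurements are expressed as range images before matching, but it does not give the projection. A common choice, assumed here, is a spherical projection: each point's azimuth selects the image column, its elevation selects the row, and the pixel stores its range. The image size and vertical field of view below are assumptions, with values typical of a 64-beam spinning scanner rather than values taken from the paper.

```python
import math
from typing import Iterable, List, Tuple


def to_range_image(points: Iterable[Tuple[float, float, float]], H: int = 64, W: int = 1024,
                   fov_up_deg: float = 3.0, fov_down_deg: float = -25.0) -> List[List[float]]:
    """Spherically project LiDAR points into an H x W range image (0.0 = no return).

    When several points fall into one pixel, the closest range is kept.
    """
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    fov = fov_up - fov_down
    img = [[0.0] * W for _ in range(H)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi]
        pitch = math.asin(z / r)  # elevation
        u = int(math.floor(0.5 * (1.0 - yaw / math.pi) * W))
        v = int(math.floor((1.0 - (pitch - fov_down) / fov) * H))
        u = min(W - 1, max(0, u))  # clamp to the image bounds
        v = min(H - 1, max(0, v))
        if img[v][u] == 0.0 or r < img[v][u]:
            img[v][u] = r
    return img


img = to_range_image([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 10.0, -1.0)])
print(img[6][512])  # 5.0: the two points straight ahead share a pixel; the closer one wins
```

The result is a dense 2.5D image in which range plays the role that depth or disparity plays for a camera, which is what lets both modalities be fed to similar descriptor networks.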

    Localization of Autonomous Vehicles in Urban Environments

    The future of applications such as last-mile delivery, infrastructure inspection, and surveillance relies heavily on small autonomous drones and ground robots operating in cluttered urban settings, where precise positioning is critical. However, when navigating close to buildings, GPS-based localisation of robotic platforms is noisy due to obstructed reception and multipath reflections. Localisation methods using introspective sensors, such as monocular and stereo cameras mounted on the platforms, offer a better alternative, as they are suitable for both indoor and outdoor operation. However, the inherent drift in the estimated trajectory is often evident across the 7 degrees of freedom capturing scale, rotation, and translation, and it needs to be corrected. The theme of the thesis is to use a pre-existing 3D model to supplement the pose estimates of a visual navigation system, reducing incremental drift and thereby improving localisation accuracy. The novel framework developed for the monocular camera first extracts the geometric relationship between the pixels of the calibrated camera and the 3D points of the model. These geometric constraints, used in addition to the relative pose constraints typical of Simultaneous Localisation and Mapping (SLAM) algorithms, yield superior trajectory estimates. Further, scale drift correction is proposed using a novel Sim(3) optimisation procedure and successfully demonstrated on a unique dataset that embodies many urban localisation challenges. The techniques developed for stereo camera localisation align textured 3D stereo scans with a 3D model and estimate the associated camera pose. The idea is to solve the image registration problem between the projection of the 3D scan and images whose poses are accurately known with respect to the 3D model. The 2D motion parameters are then mapped to 3D space for camera pose estimation.
Novel image registration techniques are developed that combine image edge information with traditional approaches and show successful results.
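Scale drift correction works in Sim(3), the group of 7-DoF similarity transforms (scale, rotation, translation). The thesis's optimisation procedure is not reproduced here. This sketch, with illustrative names, shows only the group operations such an optimisation builds on: applying, composing, and inverting a similarity transform. Rotations are assumed to be valid 3x3 rotation matrices.

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]
Sim3 = Tuple[float, Mat3, Vec3]  # (scale s, rotation R, translation t)


def matvec(R: Sequence[Sequence[float]], p: Sequence[float]) -> Vec3:
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))


def transpose(R: Sequence[Sequence[float]]) -> Mat3:
    return [[R[j][i] for j in range(3)] for i in range(3)]


def apply(T: Sim3, p: Vec3) -> Vec3:
    """Similarity transform: p' = s R p + t."""
    s, R, t = T
    Rp = matvec(R, p)
    return tuple(s * Rp[i] + t[i] for i in range(3))


def compose(A: Sim3, B: Sim3) -> Sim3:
    """A o B (apply B first): (sa sb, Ra Rb, sa Ra tb + ta)."""
    sa, Ra, ta = A
    sb, Rb, tb = B
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    Rtb = matvec(Ra, tb)
    return sa * sb, R, tuple(sa * Rtb[i] + ta[i] for i in range(3))


def inverse(T: Sim3) -> Sim3:
    """Inverse of p' = s R p + t: p = (1/s) R^T p' - (1/s) R^T t."""
    s, R, t = T
    Rt = transpose(R)
    Rtt = matvec(Rt, t)
    return 1.0 / s, Rt, tuple(-Rtt[i] / s for i in range(3))


Rz = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
drift = (2.0, Rz, (1.0, 0.0, 0.0))  # hypothetical drift that doubles the scale
p = (1.0, 1.0, 1.0)
q = apply(drift, p)
print(q, apply(inverse(drift), q))  # (-1.0, 2.0, 2.0) (1.0, 1.0, 1.0)
```

Unlike an SE(3) correction, a Sim(3) correction can also undo the scale error that accumulates in monocular trajectories, which is why scale drift is corrected in this group.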