
    Attention and Anticipation in Fast Visual-Inertial Navigation

    We study a Visual-Inertial Navigation (VIN) problem in which a robot needs to estimate its state using an on-board camera and an inertial sensor, without any prior knowledge of the external environment. We consider the case in which the robot can allocate only limited resources to VIN, due to tight computational constraints. Therefore, we answer the following question: under limited resources, what are the most relevant visual cues to maximize the performance of visual-inertial navigation? Our approach has four key ingredients. First, it is task-driven, in that the selection of the visual cues is guided by a metric quantifying VIN performance. Second, it exploits the notion of anticipation, since it uses a simplified model for forward-simulation of robot dynamics, predicting the utility of a set of visual cues over a future time horizon. Third, it is efficient and easy to implement, since it leads to a greedy algorithm for the selection of the most relevant visual cues. Fourth, it provides formal performance guarantees: we leverage submodularity to prove that the greedy selection cannot be far from the optimal (combinatorial) selection. Simulations and real experiments on agile drones show that our approach ensures state-of-the-art VIN performance while maintaining a lean processing time. In easy scenarios, our approach outperforms appearance-based feature selection in terms of localization errors. In the most challenging scenarios, it enables accurate visual-inertial navigation while appearance-based feature selection fails to track the robot's motion during aggressive maneuvers.
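
    A minimal Python sketch of the greedy, submodularity-backed selection described above. The log-det utility over stacked feature Jacobians is an assumed stand-in for the paper's actual VIN performance metric, and the Jacobians are random placeholders:

        import numpy as np

        def greedy_select(features, k, utility):
            """Greedily pick k features maximizing a monotone submodular
            utility; submodularity bounds the gap to the optimal set."""
            selected = []
            for _ in range(k):
                best = max((f for f in features if f not in selected),
                           key=lambda f: utility(selected + [f]))
                selected.append(best)
            return selected

        # Illustrative utility: log-det of the information contributed by
        # each feature's (hypothetical) 2x6 measurement Jacobian.
        rng = np.random.default_rng(0)
        jacobians = {i: rng.normal(size=(2, 6)) for i in range(50)}

        def logdet_utility(subset):
            info = 1e-3 * np.eye(6)                # prior information
            for f in subset:
                J = jacobians[f]
                info = info + J.T @ J
            return np.linalg.slogdet(info)[1]

        print(greedy_select(list(jacobians), k=10, utility=logdet_utility))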

    Information-Driven Direct RGB-D Odometry

    This paper presents an information-theoretic approach to point selection for direct RGB-D odometry. The aim is to select only the most informative measurements, in order to reduce the size of the optimization problem with minimal impact on accuracy. It is common practice in visual odometry/SLAM to track several hundred points, achieving real-time performance on high-end desktop PCs. Reducing their computational footprint will facilitate the implementation of odometry and SLAM on low-end platforms such as small robots and AR/VR glasses. Our experimental results show that our novel information-based selection criteria allow us to reduce the number of tracked points by an order of magnitude (down to only 24 of them), achieving accuracy similar to the state of the art (sometimes outperforming it) while reducing the computational demand by 10×.
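
    A minimal sketch of the point-selection idea, assuming each candidate point is scored from its photometric Jacobian; the paper's exact information-theoretic criterion may differ:

        import numpy as np

        def select_informative_points(jacobians, n_keep=24):
            """Keep the n_keep highest-scoring points. Illustrative score:
            the squared norm of each point's 1x6 photometric Jacobian, i.e.
            its trace contribution to the pose Fisher information."""
            scores = np.einsum('ij,ij->i', jacobians, jacobians)
            return np.argsort(scores)[-n_keep:]

        rng = np.random.default_rng(1)
        J = rng.normal(size=(500, 6))        # 500 candidates, 6-DoF pose
        print(select_informative_points(J))  # indices of the 24 kept points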

    Ultrasound-Augmented Laparoscopy

    Laparoscopic surgery is perhaps the most common minimally invasive procedure for many diseases in the abdomen. Since the laparoscopic camera provides only a surface view of the internal organs, in many procedures surgeons use laparoscopic ultrasound (LUS) to visualize deep-seated surgical targets. Conventionally, the 2D LUS image is visualized in a display spatially separate from the one that displays the laparoscopic video. Reasoning about the geometry of hidden targets therefore requires mentally solving the spatial alignment and resolving the modality differences, which is cognitively very challenging. Moreover, the mental representation of hidden targets in space acquired through such cognitive mediation may be error-prone and cause incorrect actions to be performed. To remedy this, advanced visualization strategies are required in which the US information is visualized in the context of the laparoscopic video. To this end, efficient computational methods are required to accurately align the US image coordinate system with the camera-centric coordinate system, and to render the registered image information in the context of the camera view such that surgeons perceive the geometry of hidden targets accurately. In this thesis, such a visualization pipeline is described. A novel method to register US images with a camera-centric coordinate system is detailed, with an experimental investigation into its accuracy bounds. An improved method to blend US information with the surface view is also presented, with an experimental investigation into the accuracy with which target locations in space are perceived.
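
    The register-and-render idea (express a US-image target in the camera-centric frame, then project it onto the laparoscopic video) reduces to a transform chain. All calibration and intrinsics values below are invented for illustration:

        import numpy as np

        def to_homogeneous(R, t):
            T = np.eye(4)
            T[:3, :3], T[:3, 3] = R, t
            return T

        # Hypothetical calibration: US-image frame -> camera frame.
        T_cam_us = to_homogeneous(np.eye(3), np.array([0.02, -0.01, 0.10]))

        # A target localized in the US image (scan-converted to metres).
        p_us = np.array([0.015, 0.0, 0.04, 1.0])

        # Express the target in the camera-centric frame, then project it
        # with a pinhole model to find where to draw it on the video.
        p_cam = T_cam_us @ p_us
        K = np.array([[800., 0., 320.],
                      [0., 800., 240.],
                      [0., 0., 1.]])         # assumed camera intrinsics
        uv = K @ (p_cam[:3] / p_cam[2])
        print(uv[:2])                        # overlay pixel for the target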

    Visual SLAM for Measurement and Augmented Reality in Laparoscopic Surgery

    In spite of the great advances in laparoscopic surgery, this type of surgery still presents difficulties, caused mainly by its complex maneuvers and, above all, by the loss of depth perception. Unlike classical open surgery (laparotomy), where surgeons have direct contact with organs and full 3D perception, laparoscopy is carried out by means of specialized instruments and a monocular camera (laparoscope), in which the 3D scene is projected onto a 2D image plane. The main goal of this thesis is to address this loss of depth perception by making use of Simultaneous Localization and Mapping (SLAM) algorithms developed in the fields of robotics and computer vision over recent years. These algorithms make it possible to localize, in real time (25-30 frames per second), a camera that moves freely inside an unknown rigid environment while, at the same time, building a map of this environment from the images gathered by that camera. These algorithms have been extensively validated both in man-made environments (buildings, rooms, ...) and outdoors, showing robustness to occlusions, sudden camera motions, and clutter. This thesis extends the use of these algorithms to laparoscopic surgery. Due to the intrinsic nature of internal body images (they suffer from deformations, specularities, variable illumination conditions, limited movements, ...), applying this type of algorithm to laparoscopy poses a real challenge. Knowing the location of the camera (laparoscope) with respect to the scene (abdominal cavity), together with a 3D map of that scene, opens up interesting new possibilities in the surgical field. This knowledge enables augmented-reality annotations directly on the laparoscopic images (e.g. alignment of preoperative 3D CT models), intracavity 3D distance measurements, and photorealistic 3D reconstructions of the abdominal cavity that synthetically recover the lost depth. These new facilities add safety and speed to surgical procedures without disturbing the classical procedure workflow; the tools are available in the surgeon's armory, and it is the surgeon who decides whether to use them. Additionally, knowledge of the camera location with respect to the patient's abdominal cavity is fundamental for the future development of robots that can operate autonomously since, knowing this location, a robot will be able to localize the tools it controls with respect to the patient. In detail, the contributions of this thesis are:
    - To demonstrate the feasibility of applying SLAM algorithms to laparoscopy, showing experimentally that using robust data association is a must.
    - To robustify one of these algorithms, in particular the monocular EKF-SLAM algorithm, by adapting a relocalization system and improving data association with a robust matching algorithm.
    - To develop a robust matching method (the 1-Point RANSAC algorithm), sketched after this list.
    - To develop a new surgical procedure that eases the use of visual SLAM in laparoscopy.
    - To validate extensively the robust EKF-SLAM (EKF + relocalization + 1-Point RANSAC), obtaining millimetric errors and real-time operation in both simulations and real human surgeries; the selected operation was ventral hernia repair.
    - To demonstrate the potential of these algorithms in laparoscopy: they synthetically recover the depth of the operative field, which is lost when using monocular laparoscopes, enable the insertion of augmented-reality annotations, and allow distance measurements using only a laparoscopic tool (to define the real scale) and laparoscopic images.
    - To perform a clinical validation showing that these algorithms shorten operation times and add safety to surgical procedures.
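
    A minimal sketch of the 1-Point RANSAC idea used above to robustify data association: the EKF motion prior predicts where each map feature should appear, a single match hypothesizes a correction, and consensus decides the inliers. Reducing the correction to a 2D pixel offset is a deliberate simplification of the EKF partial update:

        import numpy as np

        def one_point_ransac(predicted, measured, n_iters=50, thresh=2.0):
            """predicted/measured are Nx2 pixel arrays of EKF-predicted and
            observed feature positions; returns a boolean inlier mask."""
            rng = np.random.default_rng(0)
            best = np.zeros(len(measured), dtype=bool)
            for _ in range(n_iters):
                i = rng.integers(len(measured))
                offset = measured[i] - predicted[i]    # 1-point hypothesis
                r = np.linalg.norm(measured - (predicted + offset), axis=1)
                if (r < thresh).sum() > best.sum():
                    best = r < thresh
            return best

        rng = np.random.default_rng(2)
        pred = rng.uniform(0, 640, size=(100, 2))
        meas = pred + [3.0, -1.0] + rng.normal(0, 0.5, size=(100, 2))
        meas[:20] = rng.uniform(0, 640, size=(20, 2))  # 20 bad matches
        print(one_point_ransac(pred, meas).sum(), "inliers kept")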

    Self-Localization for Autonomous Driving Using Vector Maps and Multi-Modal Odometry

    One of the fundamental requirements in automated driving is accurate vehicle localization, because modules such as motion planning and control require the accurate location and heading of the ego-vehicle to navigate safely within the drivable region. Global Navigation Satellite Systems (GNSS) can provide the geolocation of the vehicle in different outdoor environments. However, they suffer from poor observability and even signal loss in GNSS-denied environments such as urban canyons. Map-based self-localization systems are another tool for estimating the pose of the vehicle in known environments. The main purpose of this research is to design a real-time self-localization system for autonomous driving. To provide short-term constraints for the self-localization system, a multi-modal vehicle odometry algorithm is developed that fuses an Inertial Measurement Unit (IMU), a camera, a Lidar, and a GNSS receiver through an Error-State Kalman Filter (ESKF). Additionally, a Machine-Learning (ML)-based odometry algorithm is developed to compensate for self-localization unavailability, using kernel-based regression models that fuse the IMU, wheel encoders, and a steering sensor along with recent historical measurement data. Simulation and experimental results demonstrate that the vehicle odometry can be estimated with good accuracy. Following the main objective of the thesis, a novel, computationally efficient self-localization algorithm is developed that uses geospatial information from High-Definition (HD) maps along with observations of nearby landmarks. This approach uses situation- and uncertainty-aware attention mechanisms to select "suitable" landmarks at any drivable location within the known environment, based on their observability and level of uncertainty. By using landmarks that are invariant to seasonal changes and knowing "where to look" proactively, robustness and computational efficiency are improved. The developed localization system is implemented and experimentally evaluated on WATonoBus, the University of Waterloo's autonomous shuttle. The experimental results confirm excellent computational efficiency and good accuracy.
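
    A skeleton of the Error-State Kalman Filter at the core of the multi-modal fusion; the matrices below are toy placeholders, not the thesis's actual IMU/camera/Lidar/GNSS models:

        import numpy as np

        class ESKF:
            """Minimal error-state Kalman filter: the nominal state is
            propagated by dead reckoning (e.g. IMU integration) while the
            error state carries the uncertainty and absorbs corrections."""

            def __init__(self, P0):
                self.P = P0                    # error-state covariance

            def predict(self, F, Q):
                self.P = F @ self.P @ F.T + Q  # propagate uncertainty

            def update(self, residual, H, R):
                S = H @ self.P @ H.T + R
                K = self.P @ H.T @ np.linalg.inv(S)
                self.P = (np.eye(len(self.P)) - K @ H) @ self.P
                return K @ residual            # injected into nominal state

        # Toy usage: 3-state position error corrected by one GNSS fix.
        f = ESKF(P0=np.eye(3))
        f.predict(F=np.eye(3), Q=0.01 * np.eye(3))
        dx = f.update(residual=np.array([1.0, 2.0, 0.0]),
                      H=np.eye(3), R=0.25 * np.eye(3))
        print(dx)                              # position correction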

    From images to augmented 3D models: improved visual SLAM and augmented point cloud modeling

    This thesis investigates the problem of using monocular image sequences to generate augmented models. The problem is decomposed into two subproblems: monocular visual simultaneous localization and mapping (VSLAM), and point cloud data (PCD) modeling. Accordingly, the thesis comprises two major parts. The first part, comprising Chapters 2, 3 and 4, aims to leverage system observability theory to improve VSLAM accuracy. In Chapter 2, a piecewise-linear system is developed to model VSLAM, and two necessary conditions are proved to make VSLAM completely observable. Based on the first condition, an instantaneous condition for complete observability, the "Optimally Observable and Minimal Cardinality (OOMC) VSLAM" is presented in Chapter 3. The OOMC algorithm selects the feature subset of minimal required cardinality that forms the strongest observable VSLAM subsystem. The selected feature subset is further used to improve data association in VSLAM. Based on the second condition, a temporal condition for complete observability, the "Good Features (GF) to Track for VSLAM" is presented in Chapter 4. The GF algorithm ranks individual features according to their contributions to system observability. Benchmarking experiments on both the OOMC and GF algorithms demonstrate improvements in VSLAM performance. The second part, comprising Chapters 5 and 6, aims to solve the PCD modeling problem in a geometry-driven manner. Chapter 5 presents an algorithm to model PCDs with planar patches via a sparsity-inducing optimization. Chapter 6 extends PCD modeling to models based on quadratic surface primitives. A method is further developed to retrieve high-level semantic information about the model components. Evaluation on PCDs generated from VSLAM demonstrates the effectiveness of these geometry-driven PCD modeling approaches.
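
    A sketch of the "Good Features to Track" idea of ranking features by their contribution to observability. The leave-one-out score on the smallest singular value of the stacked observability blocks is an illustrative stand-in for the thesis's exact ranking:

        import numpy as np

        def rank_by_observability(blocks):
            """blocks[i]: feature i's block of the linearized observability
            matrix. Score each feature by how much the weakest observable
            direction (smallest singular value) drops when it is removed."""
            sigma_all = np.linalg.svd(np.vstack(blocks),
                                      compute_uv=False)[-1]
            scores = []
            for i in range(len(blocks)):
                rest = np.vstack([b for j, b in enumerate(blocks) if j != i])
                scores.append(sigma_all -
                              np.linalg.svd(rest, compute_uv=False)[-1])
            return np.argsort(scores)[::-1]    # strongest features first

        rng = np.random.default_rng(3)
        blocks = [rng.normal(size=(2, 6)) for _ in range(12)]
        print(rank_by_observability(blocks))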

    Multi-camera simultaneous localization and mapping

    In this thesis, we study two aspects of simultaneous localization and mapping (SLAM) for multi-camera systems: minimal solution methods for the scaled motion of non-overlapping and partially overlapping two-camera systems, and enabling online, real-time mapping of large areas using the parallelism inherent in the visual simultaneous localization and mapping (VSLAM) problem. We present the only existing minimal solution method for six-degree-of-freedom structure and motion estimation using a non-overlapping, rigid two-camera system with known intrinsic and extrinsic calibration. One example application of our method is the three-dimensional reconstruction of urban scenes from video. Because our method does not require the cameras' fields of view to overlap, we are able to maximize coverage of the scene and avoid processing redundant, overlapping imagery. Additionally, we developed a minimal solution method for partially overlapping stereo camera systems that overcomes degeneracies inherent to non-overlapping two-camera systems while still providing a wide total field of view. The method takes two stereo images as its input. It uses one feature visible in all four views and three features visible across two temporal view pairs to constrain the camera system's motion. We show in synthetic experiments that our method produces rotation and translation estimates that are more accurate than those of the perspective three-point method as the overlap in the stereo camera's fields of view is reduced. A final part of this thesis is the development of an online, real-time visual SLAM system that achieves real-time speed by exploiting the parallelism inherent in the VSLAM problem. We show that feature tracking, relative pose estimation, and global mapping operations such as loop detection and loop correction can be effectively parallelized. Additionally, we demonstrate that a combination of short-baseline, differentially tracked corner features, which can be tracked at high frame rates, and wide-baseline-matchable but slower-to-compute features, such as the scale-invariant feature transform (SIFT), can facilitate high-speed visual odometry while at the same time supporting location recognition for loop detection and global geometric error correction.
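
    The parallelism argument (keep fast feature tracking on the critical path, run mapping and loop closure off it) can be sketched as a producer-consumer pair of threads; this is a schematic, not the thesis implementation:

        import queue
        import threading

        def tracker(frames, keyframes):
            """Front end: track short-baseline corner features at frame rate
            and hand selected keyframes to the mapping back end."""
            for i, frame in enumerate(frames):
                pose = f"pose_{i}"           # placeholder pose estimate
                if i % 5 == 0:               # naive keyframe policy
                    keyframes.put((i, frame, pose))
            keyframes.put(None)              # end-of-sequence sentinel

        def mapper(keyframes):
            """Back end: global mapping, loop detection, and correction run
            concurrently, off the real-time tracking path."""
            while (item := keyframes.get()) is not None:
                idx, frame, pose = item
                print(f"mapping keyframe {idx} at {pose}")

        q = queue.Queue(maxsize=8)
        t1 = threading.Thread(target=tracker,
                              args=([f"frame_{i}" for i in range(20)], q))
        t2 = threading.Thread(target=mapper, args=(q,))
        t1.start(); t2.start(); t1.join(); t2.join()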

    Informed Data Selection For Dynamic Multi-Camera Clusters

    Traditional multi-camera systems require a fixed calibration between cameras to provide solutions at the correct scale, which places many limitations on their performance. This thesis investigates the calibration of dynamic camera clusters, or DCCs, in which one or more of the cluster cameras is mounted on an actuated mechanism, such as a gimbal or robotic manipulator. Our novel calibration approach parameterizes the actuated mechanism using the Denavit-Hartenberg convention, then determines the calibration parameters that allow the time-varying extrinsic transformations between the static and dynamic camera frames to be estimated. A degeneracy analysis is also presented, which identifies redundant parameters of the DCC calibration system. To automate the calibration process, this thesis also presents two information-theoretic methods that select optimal calibration viewpoints using a next-best-view strategy. The first strategy minimizes the entropy of the calibration parameters, while the second selects the viewpoints that maximize the mutual information between the joint-angle input and the calibration parameters. Finally, the effective selection of keyframes is an essential aspect of robust visual navigation algorithms, as it ensures metrically consistent mapping solutions while reducing the computational complexity of the bundle adjustment process. To that end, we propose two entropy-based methods that aim to insert keyframes that directly improve the system's ability to localize. The first approach inserts keyframes based on the cumulative point-entropy reduction in the existing map, while the second uses the predicted point-flow discrepancy to select keyframes that best initialize new features for the camera to track against in the future. The DCC calibration methods are verified both in simulation and on physical hardware consisting of a 5-DOF Fanuc manipulator and a 3-DOF Aeryon SkyRanger gimbal. We demonstrate that the proposed methods achieve high-quality calibrations as measured by RMSE pixel error, as well as through analysis of the estimator covariance matrix. The keyframe insertion methods are implemented within the Multi-Camera Parallel Tracking and Mapping (MCPTAM) framework, and we confirm the effectiveness of these approaches using high-quality ground truth collected with an indoor positioning system.
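
    A minimal sketch of entropy-driven next-best-view selection for the calibration: for a Gaussian, entropy grows with log det(P), so each candidate viewpoint is scored by the post-update log-determinant of the parameter covariance and the minimizer is chosen. The Jacobians and noise below are placeholders:

        import numpy as np

        def next_best_view(P, candidates, R):
            """candidates maps viewpoint id -> measurement Jacobian H; pick
            the viewpoint whose EKF-style update leaves the calibration
            parameters with the lowest entropy (log det of covariance)."""
            best, best_logdet = None, np.inf
            for v, H in candidates.items():
                K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
                P_post = (np.eye(len(P)) - K @ H) @ P
                logdet = np.linalg.slogdet(P_post)[1]
                if logdet < best_logdet:
                    best, best_logdet = v, logdet
            return best

        rng = np.random.default_rng(4)
        P0 = np.eye(6)                       # calibration-parameter cov
        cands = {v: rng.normal(size=(2, 6)) for v in range(10)}
        print("next viewpoint:", next_best_view(P0, cands, 0.1 * np.eye(2)))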