43 research outputs found

    Kernelized Locality-Sensitive Hashing for Fast Image Landmark Association

    Get PDF
    As the concept of war has evolved, navigation in urban environments where GPS may be degraded is increasingly becoming more important. Two existing solutions are vision-aided navigation and vision-based Simultaneous Localization and Mapping (SLAM). The problem, however, is that vision-based navigation techniques can require excessive amounts of memory and increased computational complexity resulting in a decrease in speed. This research focuses on techniques to improve such issues by speeding up and optimizing the data association process in vision-based SLAM. Specifically, this work studies the current methods that algorithms use to associate a current robot pose to that of one previously seen and introduce another method to the image mapping arena for comparison. The current method, kd-trees, is effcient in lower dimensions, but does not narrow the search space enough in higher dimensional datasets. In this research, Kernelized Locality-Sensitive Hashing (KLSH) is implemented to conduct the aforementioned pose associations. Results on KLSH shows that fewer image comparisons are required for location identification than that of other methods. This work can then be extended into a vision-SLAM implementation to subsequently produce a map

    Simultaneous localization and mapping for inspection robots in water and sewer pipe networks: a review

    Get PDF
    At the present time, water and sewer pipe networks are predominantly inspected manually. In the near future, smart cities will perform intelligent autonomous monitoring of buried pipe networks, using teams of small robots. These robots, equipped with all necessary computational facilities and sensors (optical, acoustic, inertial, thermal, pressure and others) will be able to inspect pipes whilst navigating, selflocalising and communicating information about the pipe condition and faults such as leaks or blockages to human operators for monitoring and decision support. The predominantly manual inspection of pipe networks will be replaced with teams of autonomous inspection robots that can operate for long periods of time over a large spatial scale. Reliable autonomous navigation and reporting of faults at this scale requires effective localization and mapping, which is the estimation of the robot’s position and its surrounding environment. This survey presents an overview of state-of-the-art works on robot simultaneous localization and mapping (SLAM) with a focus on water and sewer pipe networks. It considers various aspects of the SLAM problem in pipes, from the motivation, to the water industry requirements, modern SLAM methods, map-types and sensors suited to pipes. Future challenges such as robustness for long term robot operation in pipes are discussed, including how making use of prior knowledge, e.g. geographic information systems (GIS) can be used to build map estimates, and improve the multi-robot SLAM in the pipe environmen

    Satellite and UAV Platforms, Remote Sensing for Geographic Information Systems

    Get PDF
    The present book contains ten articles illustrating the different possible uses of UAVs and satellite remotely sensed data integration in Geographical Information Systems to model and predict changes in both the natural and the human environment. It illustrates the powerful instruments given by modern geo-statistical methods, modeling, and visualization techniques. These methods are applied to Arctic, tropical and mid-latitude environments, agriculture, forest, wetlands, and aquatic environments, as well as further engineering-related problems. The present Special Issue gives a balanced view of the present state of the field of geoinformatics

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people‟s movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot‟s interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]

    User-oriented markerless augmented reality framework based on 3D reconstruction and loop closure detection

    Get PDF
    An augmented reality (AR) system needs to track the user-view to perform an accurate augmentation registration. The present research proposes a conceptual marker-less, natural feature-based AR framework system, the process for which is divided into two stages - an offline database training session for the application developers, and an online AR tracking and display session for the final users. In the offline session, two types of 3D reconstruction application, RGBD-SLAM and SfM are integrated into the development framework for building the reference template of a target environment. The performance and applicable conditions of these two methods are presented in the present thesis, and the application developers can choose which method to apply for their developmental demands. A general developmental user interface is provided to the developer for interaction, including a simple GUI tool for augmentation configuration. The present proposal also applies a Bag of Words strategy to enable a rapid "loop-closure detection" in the online session, for efficiently querying the application user-view from the trained database to locate the user pose. The rendering and display process of augmentation is currently implemented within an OpenGL window, which is one result of the research that is worthy of future detailed investigation and development

    Structureless Camera Motion Estimation of Unordered Omnidirectional Images

    Get PDF
    This work aims at providing a novel camera motion estimation pipeline from large collections of unordered omnidirectional images. In oder to keep the pipeline as general and flexible as possible, cameras are modelled as unit spheres, allowing to incorporate any central camera type. For each camera an unprojection lookup is generated from intrinsics, which is called P2S-map (Pixel-to-Sphere-map), mapping pixels to their corresponding positions on the unit sphere. Consequently the camera geometry becomes independent of the underlying projection model. The pipeline also generates P2S-maps from world map projections with less distortion effects as they are known from cartography. Using P2S-maps from camera calibration and world map projection allows to convert omnidirectional camera images to an appropriate world map projection in oder to apply standard feature extraction and matching algorithms for data association. The proposed estimation pipeline combines the flexibility of SfM (Structure from Motion) - which handles unordered image collections - with the efficiency of PGO (Pose Graph Optimization), which is used as back-end in graph-based Visual SLAM (Simultaneous Localization and Mapping) approaches to optimize camera poses from large image sequences. SfM uses BA (Bundle Adjustment) to jointly optimize camera poses (motion) and 3d feature locations (structure), which becomes computationally expensive for large-scale scenarios. On the contrary PGO solves for camera poses (motion) from measured transformations between cameras, maintaining optimization managable. The proposed estimation algorithm combines both worlds. It obtains up-to-scale transformations between image pairs using two-view constraints, which are jointly scaled using trifocal constraints. A pose graph is generated from scaled two-view transformations and solved by PGO to obtain camera motion efficiently even for large image collections. Obtained results can be used as input data to provide initial pose estimates for further 3d reconstruction purposes e.g. to build a sparse structure from feature correspondences in an SfM or SLAM framework with further refinement via BA. The pipeline also incorporates fixed extrinsic constraints from multi-camera setups as well as depth information provided by RGBD sensors. The entire camera motion estimation pipeline does not need to generate a sparse 3d structure of the captured environment and thus is called SCME (Structureless Camera Motion Estimation).:1 Introduction 1.1 Motivation 1.1.1 Increasing Interest of Image-Based 3D Reconstruction 1.1.2 Underground Environments as Challenging Scenario 1.1.3 Improved Mobile Camera Systems for Full Omnidirectional Imaging 1.2 Issues 1.2.1 Directional versus Omnidirectional Image Acquisition 1.2.2 Structure from Motion versus Visual Simultaneous Localization and Mapping 1.3 Contribution 1.4 Structure of this Work 2 Related Work 2.1 Visual Simultaneous Localization and Mapping 2.1.1 Visual Odometry 2.1.2 Pose Graph Optimization 2.2 Structure from Motion 2.2.1 Bundle Adjustment 2.2.2 Structureless Bundle Adjustment 2.3 Corresponding Issues 2.4 Proposed Reconstruction Pipeline 3 Cameras and Pixel-to-Sphere Mappings with P2S-Maps 3.1 Types 3.2 Models 3.2.1 Unified Camera Model 3.2.2 Polynomal Camera Model 3.2.3 Spherical Camera Model 3.3 P2S-Maps - Mapping onto Unit Sphere via Lookup Table 3.3.1 Lookup Table as Color Image 3.3.2 Lookup Interpolation 3.3.3 Depth Data Conversion 4 Calibration 4.1 Overview of Proposed Calibration Pipeline 4.2 Target Detection 4.3 Intrinsic Calibration 4.3.1 Selected Examples 4.4 Extrinsic Calibration 4.4.1 3D-2D Pose Estimation 4.4.2 2D-2D Pose Estimation 4.4.3 Pose Optimization 4.4.4 Uncertainty Estimation 4.4.5 PoseGraph Representation 4.4.6 Bundle Adjustment 4.4.7 Selected Examples 5 Full Omnidirectional Image Projections 5.1 Panoramic Image Stitching 5.2 World Map Projections 5.3 World Map Projection Generator for P2S-Maps 5.4 Conversion between Projections based on P2S-Maps 5.4.1 Proposed Workflow 5.4.2 Data Storage Format 5.4.3 Real World Example 6 Relations between Two Camera Spheres 6.1 Forward and Backward Projection 6.2 Triangulation 6.2.1 Linear Least Squares Method 6.2.2 Alternative Midpoint Method 6.3 Epipolar Geometry 6.4 Transformation Recovery from Essential Matrix 6.4.1 Cheirality 6.4.2 Standard Procedure 6.4.3 Simplified Procedure 6.4.4 Improved Procedure 6.5 Two-View Estimation 6.5.1 Evaluation Strategy 6.5.2 Error Metric 6.5.3 Evaluation of Estimation Algorithms 6.5.4 Concluding Remarks 6.6 Two-View Optimization 6.6.1 Epipolar-Based Error Distances 6.6.2 Projection-Based Error Distances 6.6.3 Comparison between Error Distances 6.7 Two-View Translation Scaling 6.7.1 Linear Least Squares Estimation 6.7.2 Non-Linear Least Squares Optimization 6.7.3 Comparison between Initial and Optimized Scaling Factor 6.8 Homography to Identify Degeneracies 6.8.1 Homography for Spherical Cameras 6.8.2 Homography Estimation 6.8.3 Homography Optimization 6.8.4 Homography and Pure Rotation 6.8.5 Homography in Epipolar Geometry 7 Relations between Three Camera Spheres 7.1 Three View Geometry 7.2 Crossing Epipolar Planes Geometry 7.3 Trifocal Geometry 7.4 Relation between Trifocal, Three-View and Crossing Epipolar Planes 7.5 Translation Ratio between Up-To-Scale Two-View Transformations 7.5.1 Structureless Determination Approaches 7.5.2 Structure-Based Determination Approaches 7.5.3 Comparison between Proposed Approaches 8 Pose Graphs 8.1 Optimization Principle 8.2 Solvers 8.2.1 Additional Graph Solvers 8.2.2 False Loop Closure Detection 8.3 Pose Graph Generation 8.3.1 Generation of Synthetic Pose Graph Data 8.3.2 Optimization of Synthetic Pose Graph Data 9 Structureless Camera Motion Estimation 9.1 SCME Pipeline 9.2 Determination of Two-View Translation Scale Factors 9.3 Integration of Depth Data 9.4 Integration of Extrinsic Camera Constraints 10 Camera Motion Estimation Results 10.1 Directional Camera Images 10.2 Omnidirectional Camera Images 11 Conclusion 11.1 Summary 11.2 Outlook and Future Work Appendices A.1 Additional Extrinsic Calibration Results A.2 Linear Least Squares Scaling A.3 Proof Rank Deficiency A.4 Alternative Derivation Midpoint Method A.5 Simplification of Depth Calculation A.6 Relation between Epipolar and Circumferential Constraint A.7 Covariance Estimation A.8 Uncertainty Estimation from Epipolar Geometry A.9 Two-View Scaling Factor Estimation: Uncertainty Estimation A.10 Two-View Scaling Factor Optimization: Uncertainty Estimation A.11 Depth from Adjoining Two-View Geometries A.12 Alternative Three-View Derivation A.12.1 Second Derivation Approach A.12.2 Third Derivation Approach A.13 Relation between Trifocal Geometry and Alternative Midpoint Method A.14 Additional Pose Graph Generation Examples A.15 Pose Graph Solver Settings A.16 Additional Pose Graph Optimization Examples Bibliograph

    Understanding the Structure of 3D Shapes

    Get PDF
    Compact representations of three dimensional objects are very often used in computer graphics to create effective ways to analyse, manipulate and transmit 3D models. Their ability to abstract from the concrete shapes and expose their structure is important in a number of applications, spanning from computer animation, to medicine, to physical simulations. This thesis will investigate new methods for the generation of compact shape representations. In the first part, the problem of computing optimal PolyCube base complexes will be considered. PolyCubes are orthogonal polyhedra used in computer graphics to map both surfaces and volumes. Their ability to resemble the original models and at the same time expose a very simple and regular structure is important in a number of applications, such as texture mapping, spline fitting and hex-meshing. The second part will focus on medial descriptors. In particular, two new algorithms for the generation of curve-skeletons will be presented. These methods are completely based on the visual appearance of the input, therefore they are independent from the type, number and quality of the primitives used to describe a shape, determining, thus, an advancement to the state of the art in the field

    Understanding the Structure of 3D Shapes

    Get PDF
    Compact representations of three dimensional objects are very often used in computer graphics to create effective ways to analyse, manipulate and transmit 3D models. Their ability to abstract from the concrete shapes and expose their structure is important in a number of applications, spanning from computer animation, to medicine, to physical simulations. This thesis will investigate new methods for the generation of compact shape representations. In the first part, the problem of computing optimal PolyCube base complexes will be considered. PolyCubes are orthogonal polyhedra used in computer graphics to map both surfaces and volumes. Their ability to resemble the original models and at the same time expose a very simple and regular structure is important in a number of applications, such as texture mapping, spline fitting and hex-meshing. The second part will focus on medial descriptors. In particular, two new algorithms for the generation of curve-skeletons will be presented. These methods are completely based on the visual appearance of the input, therefore they are independent from the type, number and quality of the primitives used to describe a shape, determining, thus, an advancement to the state of the art in the field

    3D Scene Geometry Estimation from 360∘^\circ Imagery: A Survey

    Full text link
    This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360∘^\circ, spherical or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. The classical stereo matching is then revised on the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extrapolated for multiple view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit indicated for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends.Comment: Published in ACM Computing Survey

    Medical SLAM in an autonomous robotic system

    Get PDF
    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft-tissues. This information is prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This thesis addresses the ambitious goal of achieving surgical autonomy, through the study of the anatomical environment by Initially studying the technology present and what is needed to analyze the scene: vision sensors. A novel endoscope for autonomous surgical task execution is presented in the first part of this thesis. Which combines a standard stereo camera with a depth sensor. This solution introduces several key advantages, such as the possibility of reconstructing the 3D at a greater distance than traditional endoscopes. Then the problem of hand-eye calibration is tackled, which unites the vision system and the robot in a single reference system. Increasing the accuracy in the surgical work plan. In the second part of the thesis the problem of the 3D reconstruction and the algorithms currently in use were addressed. In MIS, simultaneous localization and mapping (SLAM) can be used to localize the pose of the endoscopic camera and build ta 3D model of the tissue surface. Another key element for MIS is to have real-time knowledge of the pose of surgical tools with respect to the surgical camera and underlying anatomy. Starting from the ORB-SLAM algorithm we have modified the architecture to make it usable in an anatomical environment by adding the registration of the pre-operative information of the intervention to the map obtained from the SLAM. Once it has been proven that the slam algorithm is usable in an anatomical environment, it has been improved by adding semantic segmentation to be able to distinguish dynamic features from static ones. All the results in this thesis are validated on training setups, which mimics some of the challenges of real surgery and on setups that simulate the human body within Autonomous Robotic Surgery (ARS) and Smart Autonomous Robotic Assistant Surgeon (SARAS) projects
    corecore