    Visual simultaneous localisation and mapping for sewer pipe networks leveraging cylindrical regularity

    This work proposes a novel visual Simultaneous Localisation and Mapping (vSLAM) approach for robots in sewer pipe networks. One problem of vSLAM in pipes is that the scale drifts and accuracy degrades. We propose the use of structural information to mitigate this problem via cylindrical regularity. The main novelty is a cylinder detection approach that is more robust than previous methods in non-smooth sewer pipe environments. Cylindrical regularity is then incorporated into both local bundle adjustment and pose graph optimisation. The approach adopts a minimal cylinder representation with only five parameters, avoiding additional constraints during the optimisation in vSLAM. A further novelty is that the estimated cylinder is part of the scale drift estimation, enabling a correction to the translation estimate that further improves accuracy. The approach, termed Cylindrical Regularity ORB-SLAM (CRORB), is benchmarked against the leading visual SLAM algorithms ORB-SLAM2 and direct sparse odometry (DSO), as well as a vSLAM algorithm with cylindrical regularity developed for gas pipes, using real sewer pipe data and synthetic data generated with the Gazebo modelling software. The results demonstrate that CRORB improves substantially over the competitors, with a reduction of approximately 70% in error on real data.
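
    As an illustration of the minimal representation mentioned above, the sketch below encodes an infinite cylinder with five parameters: two spherical angles for the axis direction, two offsets in the plane orthogonal to the axis (translation along the axis is unobservable), and the radius. This is a generic parameterization under our own naming, not necessarily the one used by CRORB, together with a hypothetical point-to-surface residual of the kind a regularity term in bundle adjustment could use.

    import numpy as np

    def cylinder_from_params(theta, phi, x0, y0, r):
        """Decode a hypothetical minimal 5-parameter cylinder.

        (theta, phi): spherical angles giving the unit axis direction a.
        (x0, y0):     axis offset in the plane orthogonal to a (translation
                      along the axis is unobservable, so it is not encoded).
        r:            cylinder radius.
        """
        a = np.array([np.sin(theta) * np.cos(phi),
                      np.sin(theta) * np.sin(phi),
                      np.cos(theta)])            # unit axis direction
        # Build an orthonormal basis (u, v) for the plane orthogonal to a.
        u = np.cross(a, [0.0, 0.0, 1.0])
        if np.linalg.norm(u) < 1e-8:             # axis parallel to z: pick x
            u = np.array([1.0, 0.0, 0.0])
        u /= np.linalg.norm(u)
        v = np.cross(a, u)
        c = x0 * u + y0 * v                      # a point on the axis
        return a, c, r

    def point_to_cylinder_residual(p, a, c, r):
        """Unsigned distance from a 3D point to the cylinder surface,
        usable as a cylindrical-regularity residual in bundle adjustment."""
        d = p - c
        radial = d - np.dot(d, a) * a            # component orthogonal to axis
        return np.linalg.norm(radial) - r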

    PVI-DSO: Leveraging Planar Regularities for Direct Sparse Visual-Inertial Odometry

    Monocular Visual-Inertial Odometry (VIO) based on the direct method can leverage all available pixels in the image to estimate the camera motion and reconstruct the environment. The denser map reconstruction provides more information about the environment, making it easier to extract structure and planar regularities. In this paper, we propose PVI-DSO, a monocular direct sparse visual-inertial odometry system that exploits planar regularities. Our system detects coplanar information from 3D meshes generated from 3D point clouds and uses coplanar parameters to introduce coplanar constraints. To reduce computation and improve compactness, the plane-distance cost is used directly as prior information on the plane parameters. We conduct ablation experiments on public datasets and compare our system with other state-of-the-art algorithms. The experimental results verify that leveraging plane information improves the accuracy of a direct-method VIO system.
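
    A minimal sketch of what a coplanar constraint of this kind can look like, assuming a Hesse normal form plane with a two-angle minimal parameterization of the normal; the function names are ours and the exact cost in PVI-DSO may differ.

    import numpy as np

    def plane_normal(theta, phi):
        """Unit normal from two spherical angles (minimal rotation part
        of the plane parameterization)."""
        return np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

    def coplanar_residuals(points, theta, phi, d):
        """Signed point-to-plane distances for points labelled coplanar.
        Plane in Hesse normal form: n(theta, phi) . x = d. Summed and
        squared in the optimiser, this acts as the coplanar constraint;
        a quadratic penalty on (theta, phi, d) themselves can stand in
        for the plane-distance prior the abstract mentions."""
        n = plane_normal(theta, phi)
        return points @ n - d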

    Neural Radiance Fields for Manhattan Scenes with Unknown Manhattan Frame

    Novel view synthesis and 3D modeling using implicit neural field representations are shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors; they cannot benefit from high-level, coarse scene priors provided as scene descriptions. In this work, we aim to leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume only that the indoor scene under investigation is Manhattan, with no additional information whatsoever and an unknown Manhattan coordinate frame. This high-level prior is used to self-supervise the surface normals derived explicitly from the implicit neural fields. Our modeling allows us to group the derived normals and exploit their orthogonality constraints for self-supervision. Our exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over the established baselines.
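
    The sketch below illustrates one way such a normal-based self-supervision term can be written, assuming the unknown Manhattan frame is kept as a rotation estimate that is optimised jointly with the field; the names and details are illustrative rather than the paper's exact loss.

    import numpy as np

    def manhattan_loss(normals, R):
        """Self-supervision loss on surface normals for a Manhattan scene.

        normals: (N, 3) unit normals derived from the implicit field.
        R:       (3, 3) rotation holding the current estimate of the
                 unknown Manhattan frame (columns are the three axes).

        Each normal is assigned to its closest frame axis (up to sign)
        and penalised for deviating from it; minimising over both the
        field and R aligns the normals with some orthogonal frame
        without knowing that frame in advance.
        """
        align = np.abs(normals @ R)          # |cosine| to each axis, (N, 3)
        # a perfectly Manhattan normal is parallel to exactly one axis
        return np.mean(1.0 - align.max(axis=1))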

    Online Synthesis Of Speculative Building Information Models For Robot Motion Planning

    Autonomous mobile robots today still lack the understanding of indoor environments needed to make informed decisions about the state of the world beyond their immediate field of view. As a result, they are forced to make conservative and often inaccurate assumptions about unexplored space, inhibiting the degree of performance increasingly expected of them in high-speed navigation and mission planning. To address this limitation, this thesis explores the use of Building Information Models (BIMs) for providing the existing ecosystem of local and global planning algorithms with informative, compact, higher-level representations of indoor environments. Although BIMs have long been used in architecture, engineering, and construction for a number of different purposes, to our knowledge this is the first instance of their use in robotics. Given the technical constraints of this domain, including a limited and incomplete set of observations that grows over time, the systems we present are designed such that together they produce BIMs capable of explaining both explored and unexplored space in an online fashion. The first is a SLAM system that uses the structural regularity of buildings to mitigate drift and provide the simplest explanation of architectural features such as floors, walls, and ceilings. The generated planar model is then passed to a secondary system that reasons about the mutual relationships of these features in order to provide a water-tight model of the observed and inferred freespace. Our experimental results demonstrate this to be an accurate and efficient approach towards this end.
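
    As a toy illustration of extracting architectural features from such a planar model, the sketch below labels a plane as floor, wall, or ceiling from its unit normal in a gravity-aligned frame; this is our own simplification, not the thesis' actual pipeline.

    import numpy as np

    UP = np.array([0.0, 0.0, 1.0])  # assumed gravity-aligned map frame

    def label_plane(normal, angle_tol_deg=10.0):
        """Assign an architectural label to a plane from its unit normal:
        floors and ceilings are horizontal surfaces, walls are vertical."""
        c = float(np.dot(normal / np.linalg.norm(normal), UP))
        tol = np.cos(np.radians(angle_tol_deg))
        if c >= tol:
            return "floor"        # normal points up
        if c <= -tol:
            return "ceiling"      # normal points down
        if abs(c) <= np.sin(np.radians(angle_tol_deg)):
            return "wall"         # normal roughly horizontal
        return "other"            # slanted surface, left unlabelled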

    3D Reconstruction of Indoor Corridor Models Using Single Imagery and Video Sequences

    In recent years, 3D indoor modeling has gained more attention due to its role in the decision-making processes of maintaining the status and managing the security of building indoor spaces. In this thesis, the problem of continuous indoor corridor space modeling is tackled through two approaches. The first approach develops a modeling method based on middle-level perceptual organization. The second approach develops a visual Simultaneous Localisation and Mapping (SLAM) system with model-based loop closure. In the first approach, the image space was searched for a corridor layout that can be converted into a geometrically accurate 3D model. The Manhattan-world assumption was adopted, and indoor corridor layout hypotheses were generated through a random, rule-based intersection of physical image line segments and virtual rays of orthogonal vanishing points. Volumetric reasoning, correspondences to physical edges, the orientation map, and the geometric context of an image are all considered when scoring layout hypotheses. This approach provides physically plausible solutions even when facing objects or occlusions in a corridor scene. In the second approach, Layout SLAM is introduced. Layout SLAM performs camera localization while mapping layout corners and normal point features in 3D space. Here, a new feature matching cost function is proposed that considers both local and global context information. In addition, a rotation compensation variable makes Layout SLAM robust against the accumulation of camera orientation errors. Moreover, layout model matching of keyframes ensures accurate loop closures that prevent mis-association of newly visited landmarks with previously visited scene parts. Comparison of the generated single-image-based 3D models to ground-truth models showed average ratio differences in width, height, and length of 1.8%, 3.7%, and 19.2%, respectively. Layout SLAM performed with a maximum absolute trajectory error of 2.4 m in position and 8.2 degrees in orientation over an approximately 318 m path on the RAWSEEDS data set. Loop closing performed strongly for Layout SLAM and provided 3D indoor corridor layouts with displacement errors of less than 1.05 m in length and less than 20 cm in width and height over an approximately 315 m path on the York University data set. The proposed methods can successfully generate 3D indoor corridor models compared to their major counterparts.
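
    The hypothesis generation step can be illustrated with a small sketch: a virtual ray is the image line through a vanishing point and a point sampled on a physical line segment, and a candidate layout corner is the intersection of two such rays from orthogonal vanishing points. The code below is a generic version of this construction using homogeneous coordinates, with names of our own choosing rather than the thesis' implementation.

    import numpy as np

    def vp_ray(vp, p):
        """Homogeneous image line through vanishing point vp and point p
        (both given as 2D pixel coordinates)."""
        a = np.array([vp[0], vp[1], 1.0])
        b = np.array([p[0], p[1], 1.0])
        return np.cross(a, b)

    def corner_hypothesis(vp_x, seg_point_x, vp_y, seg_point_y):
        """Propose a layout corner as the intersection of two virtual
        rays, one per orthogonal vanishing point, each cast through a
        point sampled on a physical line segment."""
        l1 = vp_ray(vp_x, seg_point_x)
        l2 = vp_ray(vp_y, seg_point_y)
        c = np.cross(l1, l2)              # homogeneous intersection
        if abs(c[2]) < 1e-9:              # rays (nearly) parallel: no corner
            return None
        return c[:2] / c[2]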

    Structure from Motion with Higher-level Environment Representations

    Computer vision is an important area focusing on understanding, extracting, and using the information from vision-based sensors. It has many applications, such as vision-based 3D reconstruction, simultaneous localization and mapping (SLAM), and data-driven understanding of the real world. Vision is a fundamental sensing modality in many different fields of application. While traditional structure from motion mostly uses sparse point-based features, this thesis explores the possibility of using higher-order feature representations. It starts with joint work that uses straight lines as the feature representation and performs bundle adjustment with a straight-line parameterization. We then move to an even higher-order representation, using Bézier splines for parameterization. We start with a simple case in which all contours lie on a plane, use Bézier splines to parametrize the curves in the background, and optimize over both the camera positions and the Bézier splines. As an application, we present a complete end-to-end pipeline that produces meaningful dense 3D models from natural data of a 3D object: the target object is placed on a structured but unknown planar background that is modeled with splines, and the data is captured using only a hand-held monocular camera. This application is, however, limited to planar scenarios, so we extend the parameterization to full 3D. Following the potential of this idea, we introduce a more flexible higher-order extension of points that provides a general model for structural edges in the environment, whether straight or curved. Our model relies on linked Bézier curves, whose geometric intuition proves greatly beneficial during parameter initialization and regularization. We present the first fully automatic pipeline able to generate spline-based representations without any human supervision. Besides a full graphical formulation of the problem, we introduce both geometric and photometric cues as well as higher-level concepts such as overall curve visibility and viewing-angle restrictions to automatically manage the correspondences in the graph. Results show that curve-based structure from motion with splines is able to outperform state-of-the-art sparse feature-based methods, as well as to model curved edges in the environment.
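
    As a sketch of the geometric cue behind spline-based structure from motion, the code below evaluates a cubic Bézier segment and forms reprojection residuals between its projected samples and observed 2D edge points; in the full problem the control points and camera poses would be optimised jointly. The formulation and names are illustrative assumptions, not the thesis' exact implementation.

    import numpy as np

    def cubic_bezier(ctrl, t):
        """Evaluate a cubic Bezier curve with control points ctrl (4, 3)
        at parameters t (M,), returning (M, 3) curve points."""
        t = np.asarray(t)[:, None]
        basis = [(1 - t) ** 3,
                 3 * t * (1 - t) ** 2,
                 3 * t ** 2 * (1 - t),
                 t ** 3]
        return sum(w * c for w, c in zip(basis, ctrl))

    def reprojection_residuals(ctrl, t, K, R, trans, observed_2d):
        """Geometric cue: differences between observed 2D edge samples
        and the projections of curve points B(t) under intrinsics K and
        camera pose (R, trans)."""
        X = cubic_bezier(ctrl, t)                 # (M, 3) world points
        Xc = R @ X.T + trans[:, None]             # camera frame, (3, M)
        uv = K @ Xc                               # homogeneous pixels
        uv = (uv[:2] / uv[2]).T                   # (M, 2)
        return uv - observed_2d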