26 research outputs found
Visual simultaneous localisation and mapping for sewer pipe networks leveraging cylindrical regularity
This work proposes a novel visual Simultaneous Localisation and Mapping (vSLAM) approach for robots in sewer pipe networks. One problem of vSLAM in pipes is that scale drifts and accuracy degrades. We propose the use of structural information to mitigate this problem via cylindrical regularity. The main novelty is a cylinder-detection approach that is more robust than previous methods in non-smooth sewer pipe environments. Cylindrical regularity is then incorporated into both local bundle adjustment and pose graph optimisation. The approach adopts a minimal cylinder representation with only five parameters, avoiding constraints during the optimisation in vSLAM. A further novelty is that the estimated cylinder forms part of the scale-drift estimation, which enables a correction to the translation estimate and further improves accuracy. The approach, termed Cylindrical Regularity ORB-SLAM (CRORB), is benchmarked against the leading visual SLAM algorithms ORB-SLAM2 and direct sparse odometry (DSO), as well as a vSLAM algorithm with cylindrical regularity developed for gas pipes, using real sewer pipe data and synthetic data generated with the Gazebo modelling software. The results demonstrate that CRORB improves substantially over the competitors, with a reduction of approximately 70% in error on real data.
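A minimal five-parameter cylinder (two axis angles, a two-dof axis offset, and a radius) admits an unconstrained point-to-surface residual. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; the function names and the exact choice of parameterisation are ours, and CRORB's may differ:

```python
import numpy as np

def cylinder_from_params(theta, phi, a, b, r):
    """Recover axis direction, an axis point, and radius from 5 parameters."""
    # Axis direction from two spherical angles (2 dof).
    d = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    # Orthonormal basis (u, v) spanning the plane orthogonal to the axis.
    u = np.cross(d, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-8:            # axis parallel to z: use x instead
        u = np.cross(d, [1.0, 0.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(d, u)
    # A point on the axis, expressed by a 2-dof offset in that plane.
    c = a * u + b * v
    return d, c, r

def point_to_cylinder_residual(p, theta, phi, a, b, r):
    """Signed distance of 3D point p to the cylinder surface."""
    d, c, r = cylinder_from_params(theta, phi, a, b, r)
    w = p - c
    radial = w - np.dot(w, d) * d           # component orthogonal to the axis
    return np.linalg.norm(radial) - r       # zero for points on the surface
```

Because all five parameters are free, such a residual can be dropped into bundle adjustment or pose graph optimisation without extra equality constraints, which is the benefit the abstract attributes to the minimal representation.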
PVI-DSO: Leveraging Planar Regularities for Direct Sparse Visual-Inertial Odometry
Monocular Visual-Inertial Odometry (VIO) based on the direct method can leverage all the available pixels in the image to estimate the camera motion and reconstruct the environment. The denser map reconstruction provides more information about the environment, making it easier to extract structural and planar regularities. In this paper, we propose a monocular direct sparse visual-inertial odometry that exploits planar regularities (PVI-DSO). Our system detects coplanar information from 3D meshes generated from 3D point clouds and uses coplanar parameters to introduce coplanar constraints. To reduce computation and improve compactness, the plane-distance cost is used directly as the prior information on the plane parameters. We conduct ablation experiments on public datasets and compare our system with other state-of-the-art algorithms. The experimental results verify that leveraging the plane information can improve the accuracy of a direct-method VIO system.
Neural Radiance Fields for Manhattan Scenes with Unknown Manhattan Frame
Novel view synthesis and 3D modeling using implicit neural field representations have been shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors, and cannot benefit from vague, high-level scene priors provided in terms of scene descriptions. In this work, we aim to leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume that the only available knowledge is that the indoor scene under investigation is Manhattan, with no additional information whatsoever and with an unknown Manhattan coordinate frame. This high-level prior is used to self-supervise the surface normals derived explicitly from the implicit neural fields. Our modeling allows us to group the derived normals and exploit their orthogonality constraints for self-supervision. Our exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over the established baselines.
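The orthogonality self-supervision can be sketched as an alignment loss between the derived normals and an estimated orthonormal Manhattan frame. This NumPy sketch conveys the idea only; the paper's grouping and loss formulation may differ, and in the actual method the frame would be optimised jointly with the radiance field:

```python
import numpy as np

def manhattan_alignment_loss(normals, R):
    """Self-supervision loss: each unit surface normal should align with
    one of the three (initially unknown) Manhattan axes.

    normals : (N, 3) unit normals derived from the implicit field.
    R       : (3, 3) rotation whose columns are the current estimate of
              the Manhattan frame axes.
    """
    # |cos| of the angle between every normal and every frame axis;
    # the absolute value makes +x and -x count as the same wall direction.
    cos = np.abs(normals @ R)                 # (N, 3)
    # Each normal is softly assigned to its best-matching axis and
    # penalised for deviating from it; zero for a perfectly Manhattan scene.
    return float(np.mean(1.0 - cos.max(axis=1)))
```

Because the loss depends only on the normals and a single rotation, it requires no labels, matching the abstract's claim that the Manhattan prior alone suffices for self-supervision.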
Online Synthesis Of Speculative Building Information Models For Robot Motion Planning
Autonomous mobile robots today still lack the necessary understanding of indoor environments for making informed decisions about the state of the world beyond their immediate field of view. As a result, they are forced to make conservative and often inaccurate assumptions about unexplored space, inhibiting the level of performance increasingly expected of them in high-speed navigation and mission planning. To address this limitation, this thesis explores the use of Building Information Models (BIMs) for providing the existing ecosystem of local and global planning algorithms with informative, compact, higher-level representations of indoor environments. Although BIMs have long been used in architecture, engineering, and construction for a number of different purposes, to our knowledge this is the first instance of their use in robotics. Given the technical constraints of this domain, including a limited and incomplete set of observations that grows over time, the systems we present are designed so that together they produce BIMs capable of explaining both the explored and unexplored space in an online fashion. The first is a SLAM system that uses the structural regularity of buildings to mitigate drift and provide the simplest explanation of architectural features such as floors, walls, and ceilings. The planar model it generates is then passed to a secondary system, which reasons about the mutual relationships of these features to produce a watertight model of the observed and inferred free space. Our experimental results demonstrate this to be an accurate and efficient approach towards this end.
3D Reconstruction of Indoor Corridor Models Using Single Imagery and Video Sequences
In recent years, 3D indoor modeling has gained more attention due to its role in the decision-making processes of maintaining and securing building indoor spaces. In this thesis, the problem of continuous indoor corridor space modeling has been tackled through two approaches. The first approach develops a modeling method based on middle-level perceptual organization. The second approach develops a visual Simultaneous Localisation and Mapping (SLAM) system with model-based loop closure.
In the first approach, the image space was searched for a corridor layout that can be converted into a geometrically accurate 3D model. The Manhattan-world assumption was adopted, and indoor corridor layout hypotheses were generated through a random rule-based intersection of physical image line segments and virtual rays of orthogonal vanishing points. Volumetric reasoning, correspondences to physical edges, and the orientation map and geometric context of an image are all considered when scoring layout hypotheses. This approach provides physically plausible solutions even in the presence of objects or occlusions in the corridor scene.
In the second approach, Layout SLAM is introduced. Layout SLAM performs camera localization while mapping layout corners and normal point features in 3D space. Here, a new feature-matching cost function was proposed that considers both local and global context information. In addition, a rotation-compensation variable makes Layout SLAM robust against accumulated camera orientation errors. Moreover, layout model matching of keyframes ensures accurate loop closures, preventing mis-association of newly visited landmarks with previously visited scene parts.
The comparison of the generated single-image-based 3D models to ground-truth models showed that the average ratio differences in widths, heights, and lengths were 1.8%, 3.7%, and 19.2%, respectively. Moreover, Layout SLAM performed with a maximum absolute trajectory error of 2.4 m in position and 8.2 degrees in orientation over an approximately 318 m path on the RAWSEEDS data set. Loop closing performed reliably for Layout SLAM and provided 3D indoor corridor layouts with displacement errors of less than 1.05 m in length and less than 20 cm in width and height over an approximately 315 m path on the York University data set. The proposed methods can successfully generate 3D indoor corridor models compared to their major counterparts.
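The absolute trajectory error quoted above is conventionally computed by rigidly aligning the estimated trajectory to ground truth and taking position residuals. The sketch below uses the standard Kabsch/Horn closed form (rotation plus translation, metric scale assumed); it is a generic evaluation sketch, not the thesis's own code:

```python
import numpy as np

def max_absolute_trajectory_error(est, gt):
    """Maximum absolute trajectory error in position.

    est, gt : (N, 3) arrays of time-associated camera positions.
    Rigidly aligns est to gt (Kabsch closed-form solution), then
    returns the largest per-pose position residual.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: keep det(R) = +1.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return float(np.max(np.linalg.norm(aligned - gt, axis=1)))
```

A trajectory that differs from ground truth only by a rigid motion yields an error of zero, so the residual measures genuine drift rather than choice of reference frame.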
Structure from Motion with Higher-level Environment Representations
Computer vision is an important area focused on understanding, extracting, and using information from vision-based sensors. It has many applications, such as vision-based 3D reconstruction, simultaneous localization and mapping (SLAM), and data-driven understanding of the real world. Vision is a fundamental sensing modality in many different fields of application.
While traditional structure from motion mostly uses sparse point-based features, this thesis explores the possibility of using higher-order feature representations. It starts with joint work that uses straight lines for feature representation and performs bundle adjustment with a straight-line parameterization. We then move to an even higher-order representation, using Bézier splines for parameterization. We start with a simple case in which all contours lie on a plane, parametrizing the curves in the background with Bézier splines and optimizing both the camera poses and the splines. As an application, we present a complete end-to-end pipeline that produces meaningful dense 3D models from natural data of a 3D object: the target object is placed on a structured but unknown planar background that is modeled with splines, and the data is captured using only a hand-held monocular camera.
However, this application is limited to planar scenarios, and we therefore push the parameterization into full 3D. Following the potential of this idea, we introduce a more flexible higher-order extension of points that provides a general model for structural edges in the environment, whether straight or curved. Our model relies on linked Bézier curves, whose geometric intuition proves greatly beneficial during parameter initialization and regularization. We present the first fully automatic pipeline able to generate spline-based representations without any human supervision. Besides a full graphical formulation of the problem, we introduce both geometric and photometric cues, as well as higher-level concepts such as overall curve visibility and viewing-angle restrictions, to automatically manage the correspondences in the graph. Results show that curve-based structure from motion with splines is able to outperform state-of-the-art sparse feature-based methods, as well as to model curved edges in the environment.
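The linked Bézier curves above build on the standard cubic evaluation. As a minimal illustration (our sketch using De Casteljau's algorithm, not the thesis code), a single segment is evaluated as:

```python
import numpy as np

def bezier_point(control, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] with
    De Casteljau's algorithm.

    control : (4, 3) array of 3D control points. Linked curves as used in
    the thesis would share endpoint control points between consecutive
    segments, giving C0 continuity along a structural edge.
    """
    pts = np.asarray(control, dtype=float)
    # Repeated linear interpolation between consecutive control points
    # collapses the polygon to a single point on the curve.
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]
```

In a curve-based structure-from-motion setting, such evaluated points would be reprojected into the images and compared against detected edges, with the control points entering the optimization alongside the camera poses.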