CNN-SVO: improving the mapping in semi-direct visual odometry using single-image depth prediction
Reliable feature correspondence between frames is a critical step in visual odometry (VO) and visual simultaneous localization and mapping (V-SLAM) algorithms. Compared with existing VO and V-SLAM algorithms, semi-direct visual odometry (SVO) has two main advantages that lead to state-of-the-art frame-rate camera motion estimation: direct pixel correspondence and an efficient implementation of a probabilistic mapping method. This paper improves the SVO mapping by initializing the mean and the variance of the depth at a feature location according to the output of a single-image depth prediction network. Significantly reducing the depth uncertainty of the initialized map point (i.e., a small variance centred about the depth prediction) brings two benefits: reliable feature correspondence between views and fast convergence to the true depth when creating new map points. We evaluate our method on two outdoor datasets: the KITTI dataset and the Oxford RobotCar dataset. The experimental results indicate that the improved SVO mapping increases robustness and camera tracking accuracy. The implementation of this work is available at https://github.com/yan99033/CNN-SVO
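The effect of a tight, prediction-centred prior can be illustrated with a minimal sketch. It assumes a plain Gaussian belief over depth and product-of-Gaussians fusion, which is a simplification and not the probabilistic mapping method used by SVO or the code in the linked repository. The names `DepthEstimate`, `init_from_prediction`, `fuse`, `measurements_until_converged` and the parameter `rel_sigma` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DepthEstimate:
    """Gaussian belief over the depth of one map point (hypothetical type)."""
    mean: float      # metres
    variance: float  # metres squared


def init_from_prediction(predicted_depth: float, rel_sigma: float = 0.1) -> DepthEstimate:
    """Centre the prior on a predicted depth with a small variance.

    rel_sigma is an assumed relative standard deviation, not a value from the paper.
    """
    sigma = rel_sigma * predicted_depth
    return DepthEstimate(mean=predicted_depth, variance=sigma * sigma)


def fuse(prior: DepthEstimate, measured_depth: float, measured_variance: float) -> DepthEstimate:
    """Fuse one depth measurement into the belief (product of two Gaussians)."""
    total = prior.variance + measured_variance
    mean = (prior.mean * measured_variance + measured_depth * prior.variance) / total
    variance = prior.variance * measured_variance / total
    return DepthEstimate(mean, variance)


def measurements_until_converged(estimate: DepthEstimate, measured_depth: float,
                                 measured_variance: float, threshold: float,
                                 max_steps: int = 1000):
    """Count how many identical measurements are fused before variance <= threshold."""
    steps = 0
    while estimate.variance > threshold and steps < max_steps:
        estimate = fuse(estimate, measured_depth, measured_variance)
        steps += 1
    return steps, estimate
```

Under these assumptions, a prior built with `init_from_prediction(10.5, rel_sigma=0.05)` reaches a given variance threshold in fewer fusion steps than a wide prior such as `DepthEstimate(25.0, 400.0)`, which mirrors the fast-convergence argument in the abstract.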
Continuous Pose for Monocular Cameras in Neural Implicit Representation
In this paper, we showcase the effectiveness of optimizing monocular camera
poses as a continuous function of time. The camera poses are represented using
an implicit neural function which maps the given time to the corresponding
camera pose. The mapped camera poses are then used for the downstream tasks
where joint camera pose optimization is also required. While doing so, the
network parameters -- that implicitly represent camera poses -- are optimized.
We exploit the proposed method in four diverse experimental settings, namely,
(1) NeRF from noisy poses; (2) NeRF from asynchronous Events; (3) Visual
Simultaneous Localization and Mapping (vSLAM); and (4) vSLAM with IMUs. In all
four settings, the proposed method performs significantly better than the
compared baselines and the state-of-the-art methods. Additionally, under the
assumption of continuous motion, we find that changes in pose may actually lie
on a manifold with fewer than 6 degrees of freedom (DOF). We call this low-DOF
motion representation the intrinsic motion and use it in vSLAM settings,
showing impressive camera tracking performance
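The idea of representing poses as a continuous function of time can be sketched as a small network that maps a timestamp to a rotation and a translation. This is an illustrative sketch only: the class `PoseMLP`, its layer sizes, the random initialisation and the axis-angle output parameterisation are assumptions, not the architecture from the paper, and no training loop is shown.

```python
import math
import random


def rodrigues(w):
    """Convert an axis-angle vector (3 floats) to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in w)
    # Skew-symmetric matrix of the unit rotation axis.
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    R = []
    for i in range(3):
        row = []
        for j in range(3):
            k2 = sum(K[i][m] * K[m][j] for m in range(3))
            # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
            row.append((1.0 if i == j else 0.0) + s * K[i][j] + c * k2)
        R.append(row)
    return R


class PoseMLP:
    """One-hidden-layer network mapping time t to a pose (hypothetical model)."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(6)]
        self.b2 = [0.0] * 6

    def __call__(self, t):
        """Return (rotation matrix, translation vector) for timestamp t."""
        h = [math.tanh(w * t + b) for w, b in zip(self.w1, self.b1)]
        out = [sum(wi * hi for wi, hi in zip(row, h)) + b
               for row, b in zip(self.w2, self.b2)]
        # First three outputs: axis-angle rotation; last three: translation.
        return rodrigues(out[:3]), out[3:]
```

Because the pose is a smooth function of `t`, it can be queried at any timestamp, for example `R, trans = PoseMLP()(0.5)`. In a joint optimization the weights `w1`, `b1`, `w2`, `b2` would be the variables being optimized in place of per-frame poses.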
Learned Camera Gain and Exposure Control for Improved Visual Feature Detection and Matching
Successful visual navigation depends upon capturing images that contain
sufficient useful information. In this paper, we explore a data-driven approach
to account for environmental lighting changes, improving the quality of images
for use in visual odometry (VO) or visual simultaneous localization and mapping
(SLAM). We train a deep convolutional neural network model to predictively
adjust camera gain and exposure time parameters such that consecutive images
contain a maximal number of matchable features. The training process is fully
self-supervised: our training signal is derived from an underlying VO or SLAM
pipeline and, as a result, the model is optimized to perform well with that
specific pipeline. We demonstrate through extensive real-world experiments that
our network can anticipate and compensate for dramatic lighting changes (e.g.,
transitions into and out of road tunnels), maintaining a substantially higher
number of inlier feature matches than competing camera parameter control
algorithms.
Comment: Accepted to IEEE Robotics and Automation Letters and to the IEEE
International Conference on Robotics and Automation (ICRA) 202
SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization
Current techniques in Visual Simultaneous Localization and Mapping (VSLAM)
estimate camera displacement by comparing image features of consecutive scenes.
These algorithms depend on scene continuity and hence require frequent camera
inputs. However, processing images frequently can lead to significant memory
usage and computation overhead. In this study, we introduce SemanticSLAM, an
end-to-end visual-inertial odometry system that utilizes semantic features
extracted from an RGB-D sensor. This approach enables the creation of a
semantic map of the environment and ensures reliable camera localization.
SemanticSLAM is scene-agnostic, which means it doesn't require retraining for
different environments. It operates effectively in indoor settings, even with
infrequent camera input, without prior knowledge. The strength of SemanticSLAM
lies in its ability to gradually refine the semantic map and improve pose
estimation. This is achieved by a convolutional long-short-term-memory
(ConvLSTM) network, trained to correct errors during map construction. Compared
to existing VSLAM algorithms, SemanticSLAM improves pose estimation by 17%. The
resulting semantic map provides interpretable information about the environment
and can be easily applied to various downstream tasks, such as path planning,
obstacle avoidance, and robot navigation. The code will be publicly available
at https://github.com/Leomingyangli/SemanticSLAM
Comment: 2023 IEEE Symposium Series on Computational Intelligence (SSCI) 6
page
Image-Based Localization Using Deep Neural Networks
Image-based localization, or camera relocalization, is a fundamental problem in computer vision and robotics, and it refers to estimating camera pose from an image. It is a key component of many computer vision applications such as navigating autonomous vehicles and mobile robotics, simultaneous localization and mapping (SLAM), and augmented reality.
Many image-based localization methods have been proposed in the literature. Most state-of-the-art approaches are based on hand-crafted local features, such as SIFT, ORB, or SURF, and efficient 2D-to-3D matching against a 3D model. However, the limitations of hand-crafted feature detectors and descriptors become the bottleneck of these approaches. Recently, some promising deep neural network based localization approaches have been proposed. These approaches either formulate 6 DoF pose estimation directly as a regression problem or use neural networks to generate 2D-3D correspondences, so no explicit feature extraction or feature matching is required.
In this thesis, we first review two state-of-the-art approaches for image-based localization. The first (Active Search) is based on conventional hand-crafted local features, and the second (DSAC) is based on deep neural networks. Building on the idea of DSAC, we then examine the use of conventional RANSAC and introduce a novel full-frame Coordinate CNN. We evaluate these methods on the Microsoft Research 7-Scenes dataset and make extensive comparisons. The results show that our modifications to the original DSAC pipeline lead to better performance than the two state-of-the-art approaches
DeepSLAM: A Robust Monocular SLAM System with Unsupervised Deep Learning
In this paper, we propose DeepSLAM, a novel unsupervised deep learning-based visual Simultaneous Localization and Mapping (SLAM) system. DeepSLAM training is fully unsupervised, since it requires only stereo imagery rather than annotated ground-truth poses. At test time it takes a monocular image sequence as input, so it is a monocular SLAM paradigm. DeepSLAM consists of several essential components, including Mapping-Net, Tracking-Net, Loop-Net and a graph optimization unit. Specifically, the Mapping-Net is an encoder-decoder architecture for describing the 3D structure of the environment, while the Tracking-Net is a Recurrent Convolutional Neural Network (RCNN) architecture for capturing the camera motion. The Loop-Net is a pre-trained binary classifier for detecting loop closures. DeepSLAM can simultaneously generate a pose estimate, a depth map and an outlier rejection mask. We evaluate its performance on various datasets and find that DeepSLAM achieves good pose estimation accuracy and is robust in some challenging scenes
Deep Semantic 3D Visual Metric Reconstruction Using Wall-Climbing Robot
This project introduces an inspection method, carried out by a wall-climbing robot, that uses a deep neural network to detect crack and spalling defects on concrete structures. First, we create a pixel-level semantic dataset which includes 820 labeled images. Second, we propose an inspection method that obtains 3D metric measurements using RGB-D camera-based visual simultaneous localization and mapping (SLAM), which generates pose-coupled key-frames with depth information. The semantic inspection results can therefore be registered in the 3D model of the concrete structure for condition assessment and monitoring. Third, we present our new-generation wall-climbing robot, which performs the inspection task on both horizontal and vertical surfaces
Network Uncertainty Informed Semantic Feature Selection for Visual SLAM
In order to facilitate long-term localization using a visual simultaneous
localization and mapping (SLAM) algorithm, careful feature selection can help
ensure that reference points persist over long durations and the runtime and
storage complexity of the algorithm remain consistent. We present SIVO
(Semantically Informed Visual Odometry and Mapping), a novel
information-theoretic feature selection method for visual SLAM which
incorporates semantic segmentation and neural network uncertainty into the
feature selection pipeline. Our algorithm selects points which provide the
highest reduction in Shannon entropy between the entropy of the current state
and the joint entropy of the state, given the addition of the new feature with
the classification entropy of the feature from a Bayesian neural network. Each
selected feature significantly reduces the uncertainty of the vehicle state and
has been detected to be a static object (building, traffic sign, etc.)
repeatedly with a high confidence. This selection strategy generates a sparse
map which can facilitate long-term localization. The KITTI odometry dataset is
used to evaluate our method, and we also compare our results against ORB_SLAM2.
Overall, SIVO performs comparably to the baseline method while reducing the map
size by almost 70%.
Comment: Published in: 2019 16th Conference on Computer and Robot Vision (CRV
- …
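The selection criterion described in the SIVO abstract can be sketched roughly as follows. This is one possible reading of the abstract and not the authors' exact formulation: it assumes a Gaussian state with diagonal covariance, scores each candidate as the drop in state entropy minus the classification entropy, and keeps candidates whose score exceeds a threshold. The function names, the candidate tuple layout and the `min_gain` parameter are hypothetical.

```python
import math


def classification_entropy(probs):
    """Shannon entropy (in nats) of a class probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reduction(prior_vars, post_vars):
    """Entropy drop of a Gaussian state, assuming diagonal covariances.

    For a Gaussian, H = 0.5 * ln((2*pi*e)^n * det(Sigma)), so the difference
    between prior and posterior reduces to 0.5 * ln(det(prior) / det(post)).
    """
    return 0.5 * sum(math.log(a / b) for a, b in zip(prior_vars, post_vars))


def select_features(candidates, min_gain):
    """Keep candidates whose net information gain exceeds min_gain.

    Each candidate is (name, prior_vars, post_vars, class_probs), where
    post_vars are the state variances after adding that feature.
    """
    chosen = []
    for name, prior_vars, post_vars, class_probs in candidates:
        gain = entropy_reduction(prior_vars, post_vars) - classification_entropy(class_probs)
        if gain > min_gain:
            chosen.append((name, gain))
    # Highest net gain first.
    return sorted(chosen, key=lambda item: item[1], reverse=True)
```

With this scoring, a feature that sharply reduces state uncertainty but has an uncertain class label is penalised, while a confidently classified feature with the same geometric benefit scores higher.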