RGBDTAM: A Cost-Effective and Accurate RGB-D Tracking and Mapping System
Simultaneous Localization and Mapping using RGB-D cameras has been a fertile
research topic in the last decade, due to the suitability of such sensors for
indoor robotics. In this paper we propose a direct RGB-D SLAM algorithm with
state-of-the-art accuracy and robustness at a low cost. Our experiments in the
RGB-D TUM dataset [34] effectively show a better accuracy and robustness in CPU
real time than direct RGB-D SLAM systems that make use of the GPU. The key
ingredients of our approach are mainly two. Firstly, the combination of a
semi-dense photometric and dense geometric error for the pose tracking (see
Figure 1), which we demonstrate to be the most accurate alternative. And
secondly, a model of the multi-view constraints and their errors in the mapping
and tracking threads, which adds extra information over other approaches. We
release the open-source implementation of our approach. The reader is
referred to a video with our results for a more illustrative visualization of
its performance.
Jacobian Computation for Cumulative B-Splines on SE(3) and Application to Continuous-Time Object Tracking
In this paper we propose a method that estimates the continuous SE(3) trajectories (orientation and translation) of the dynamic rigid objects present in a scene, from multiple RGB-D views. Specifically, we fit the object trajectories to cumulative B-spline curves, which allow us to interpolate, at any intermediate time stamp, not only their poses but also their linear and angular velocities and accelerations. Additionally, we derive in this work the analytical SE(3) Jacobians needed by the optimization, which are applicable to any other approach that uses this type of curve. To the best of our knowledge this is the first work that proposes 6-DoF continuous-time object tracking, which we support with a significant reduction in computational cost thanks to our analytical derivations. We evaluate our proposal on synthetic data and on a public benchmark, showing competitive results in localization and significant improvements in velocity estimation in comparison to discrete-time approaches. © 2016 IEEE
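As a rough illustration of why cumulative B-splines yield velocities as well as poses, the sketch below evaluates one segment of a uniform cubic cumulative B-spline in a plain vector space. This is a simplification assumed here for brevity: on SE(3) the control-point differences become logarithms of relative poses and the weighted sum becomes a product of exponentials. The names `cumulative_basis` and `eval_cumulative_bspline`, and the uniform knot spacing `dt`, are hypothetical and not taken from the paper.

```python
def cumulative_basis(u):
    """Cumulative cubic B-spline weights and their derivatives w.r.t. u in [0, 1)."""
    b = ((5 + 3 * u - 3 * u**2 + u**3) / 6,
         (1 + 3 * u + 3 * u**2 - 2 * u**3) / 6,
         u**3 / 6)
    db = ((3 - 6 * u + 3 * u**2) / 6,
          (3 + 6 * u - 6 * u**2) / 6,
          (3 * u**2) / 6)
    return b, db


def eval_cumulative_bspline(ctrl, u, dt):
    """Position and velocity on the segment defined by four control points.

    ctrl: four control points given as equal-length lists of floats.
    u:    normalized time inside the segment, in [0, 1).
    dt:   knot spacing in seconds (assumed uniform).
    """
    b, db = cumulative_basis(u)
    dim = len(ctrl[0])
    pos = list(ctrl[0])          # start from the first control point
    vel = [0.0] * dim
    for j in range(3):           # accumulate weighted control-point increments
        for k in range(dim):
            delta = ctrl[j + 1][k] - ctrl[j][k]
            pos[k] += b[j] * delta
            vel[k] += db[j] * delta / dt
    return pos, vel


pos, vel = eval_cumulative_bspline(
    [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]], u=0.25, dt=0.5)
```

For control points equally spaced along a line, the spline reproduces constant-velocity motion, which gives a quick sanity check of the basis functions.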
Using superpixels in monocular SLAM
Monocular SLAM systems have been traditionally based on finding point correspondences in highly-textured image areas. Large textureless regions, usually found in indoor and urban environments, are difficult to reconstruct by these systems. In this paper we augment for the first time the traditional point-based monocular SLAM maps with superpixels. Superpixels are middle-level features consisting of image regions of homogeneous texture. We propose a novel scheme for superpixel matching, 3D initialization and optimization that overcomes the difficulties of salient point-based approaches in these areas of homogeneous texture. Our experimental results show the validity of our approach. First, we compare our proposal with a state-of-the-art multiview stereo system and are able to reconstruct the textureless regions that the latter cannot. Second, we present experimental results of our algorithm integrated with the point-based PTAM [1], now estimating the superpixel textureless areas in real time. Finally, we show the accuracy of the presented algorithm with a quantitative analysis of the estimation error.
SfM-TTR: Using Structure from Motion for Test-Time Refinement of Single-View Depth Networks
Estimating a dense depth map from a single view is geometrically ill-posed,
and state-of-the-art methods rely on learning depth's relation with visual
appearance using deep neural networks. On the other hand, Structure from Motion
(SfM) leverages multi-view constraints to produce very accurate but sparse
maps, as accurate matching across images is limited by locally discriminative
texture. In this work, we combine the strengths of both approaches by proposing
a novel test-time refinement (TTR) method, denoted as SfM-TTR, that boosts the
performance of single-view depth networks at test time using SfM multi-view
cues. Specifically, and differently from the state of the art, we use sparse
SfM point clouds as test-time self-supervisory signal, fine-tuning the network
encoder to learn a better representation of the test scene. Our results show
how the addition of SfM-TTR to several state-of-the-art self-supervised and
supervised networks significantly improves their performance, outperforming
previous TTR baselines mainly based on photometric multi-view consistency.
Optimal Transport Aggregation for Visual Place Recognition
The task of Visual Place Recognition (VPR) aims to match a query image
against references from an extensive database of images from different places,
relying solely on visual cues. State-of-the-art pipelines focus on the
aggregation of features extracted from a deep backbone, in order to form a
global descriptor for each image. In this context, we introduce SALAD (Sinkhorn
Algorithm for Locally Aggregated Descriptors), which reformulates NetVLAD's
soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations
and we also introduce a 'dustbin' cluster, designed to selectively discard
features deemed non-informative, enhancing the overall descriptor quality.
Additionally, we leverage and fine-tune DINOv2 as a backbone, which provides
enhanced description power for the local features, and dramatically reduces the
required training time. As a result, our single-stage method not only surpasses
single-stage baselines in public VPR datasets, but also surpasses two-stage
methods that add a re-ranking with significantly higher cost. Code and models
are available at https://github.com/serizba/salad
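The idea of treating feature-to-cluster assignment as optimal transport with a dustbin can be sketched in a few lines. What follows is a minimal, dependency-free illustration and not the SALAD implementation: the function name `sinkhorn_with_dustbin`, the fixed (rather than learned) dustbin score, and the uniform column marginals are all assumptions made here for brevity.

```python
import math


def sinkhorn_with_dustbin(scores, dustbin_score=0.0, n_iters=200):
    """Soft-assign n features to m clusters plus one dustbin column.

    scores: n x m list of feature-to-cluster scores.
    Returns an n x (m + 1) transport plan; the last column is the dustbin.
    Assumption: each feature carries mass 1 and the total mass n is split
    uniformly across the m + 1 columns.
    """
    n, m = len(scores), len(scores[0])
    # Append the dustbin score to every row and exponentiate (Gibbs kernel).
    kernel = [[math.exp(s) for s in row + [dustbin_score]] for row in scores]
    row_mass = [1.0] * n
    col_mass = [n / (m + 1)] * (m + 1)
    u = [1.0] * n
    v = [1.0] * (m + 1)
    for _ in range(n_iters):
        # Alternate row and column normalization (Sinkhorn iterations).
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m + 1))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m + 1)]
            for i in range(n)]


# Feature 3 scores poorly against both clusters, so its largest share of
# mass ends up in the dustbin column.
plan = sinkhorn_with_dustbin(
    [[1.0, -1.0], [-1.0, 1.0], [0.8, -0.9], [-1.0, -1.0]])
```

Because rows and columns are normalized in turn, the plan reflects both feature-to-cluster and cluster-to-feature relations, whereas a row-wise softmax only captures the former.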
DAC: Detector-Agnostic Spatial Covariances for Deep Local Features
Current deep visual local feature detectors do not model the spatial
uncertainty of detected features, producing suboptimal results in downstream
applications. In this work, we propose two post-hoc covariance estimates that
can be plugged into any pretrained deep feature detector: a simple, isotropic
covariance estimate that uses the predicted score at a given pixel location,
and a full covariance estimate via the local structure tensor of the learned
score maps. Both methods are easy to implement and can be applied to any deep
feature detector. We show that these covariances are directly related to errors
in feature matching, leading to improvements in downstream tasks, including
solving the perspective-n-point problem and motion-only bundle adjustment. Code
is available at https://github.com/javrtg/DA
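To make the two estimates more concrete, here is a small sketch in plain Python. It assumes that the isotropic covariance is the inverse of the score times the identity, and that the full covariance is the inverse of the structure tensor accumulated over a small window using central differences. These details, the function names, and the `radius` and `eps` parameters are illustrative assumptions and may differ from the released code.

```python
import math


def isotropic_covariance(score, eps=1e-9):
    """2x2 isotropic covariance: higher score means lower uncertainty."""
    var = 1.0 / (score + eps)
    return [[var, 0.0], [0.0, var]]


def structure_tensor_covariance(score_map, row, col, radius=1, eps=1e-9):
    """2x2 covariance (x, y order) from the local structure tensor of a score map.

    score_map: 2D list of floats indexed as score_map[row][col].
    The tensor sums gradient outer products over a (2*radius+1)^2 window,
    skipping border pixels where central differences are undefined.
    """
    h, w = len(score_map), len(score_map[0])
    sxx = sxy = syy = 0.0
    for r in range(max(1, row - radius), min(h - 1, row + radius + 1)):
        for c in range(max(1, col - radius), min(w - 1, col + radius + 1)):
            gx = 0.5 * (score_map[r][c + 1] - score_map[r][c - 1])
            gy = 0.5 * (score_map[r + 1][c] - score_map[r - 1][c])
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    sxx += eps  # small regularizer keeps the tensor invertible in flat regions
    syy += eps
    det = sxx * syy - sxy * sxy
    # Closed-form inverse of the symmetric 2x2 tensor.
    return [[syy / det, -sxy / det], [-sxy / det, sxx / det]]


# Synthetic score peak that is sharp along x and broad along y.
score_map = [[math.exp(-((c - 2) ** 2) / 2.0 - ((r - 2) ** 2) / 8.0)
              for c in range(5)] for r in range(5)]
cov = structure_tensor_covariance(score_map, row=2, col=2)
```

In this synthetic example the score falls off quickly along x and slowly along y, so the resulting covariance is smaller along x than along y, matching the intuition that a sharper peak localizes the feature better.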
Loosely-Coupled Semi-Direct Monocular SLAM
We propose a novel semi-direct approach for monocular simultaneous
localization and mapping (SLAM) that combines the complementary strengths of
direct and feature-based methods. The proposed pipeline loosely couples direct
odometry and feature-based SLAM to perform three levels of parallel
optimizations: (1) photometric bundle adjustment (BA) that jointly optimizes
the local structure and motion, (2) geometric BA that refines keyframe poses
and associated feature map points, and (3) pose graph optimization to achieve
global map consistency in the presence of loop closures. This is achieved in
real-time by limiting the feature-based operations to marginalized keyframes
from the direct odometry module. Exhaustive evaluation on two benchmark
datasets demonstrates that our system outperforms the state-of-the-art
monocular odometry and SLAM systems in terms of overall accuracy and
robustness.
Comment: Accepted for publication in IEEE Robotics and Automation Letters.
Watch video demo at: https://youtu.be/j7WnU7ZpZ8