2,455 research outputs found
Efficient 2D-3D Matching for Multi-Camera Visual Localization
Visual localization, i.e., determining the position and orientation of a
vehicle with respect to a map, is a key problem in autonomous driving. We
present a multi-camera visual-inertial localization algorithm for large-scale
environments. To efficiently and effectively match features against a pre-built
global 3D map, we propose a prioritized feature matching scheme for
multi-camera systems. In contrast to existing works, designed for monocular
cameras, we (1) tailor the prioritization function to the multi-camera setup
and (2) run feature matching and pose estimation in parallel. This
significantly accelerates the matching and pose estimation stages and allows us
to dynamically adapt the matching efforts based on the surrounding environment.
In addition, we show how pose priors can be integrated into the localization
system to increase efficiency and robustness. Finally, we extend our algorithm
by fusing the absolute pose estimates with motion estimates from a multi-camera
visual-inertial odometry (VIO) pipeline. This results in a system that provides
reliable and drift-free pose estimation. Extensive experiments show that our
localization is fast and robust under varying conditions, and that our
extended algorithm enables reliable real-time pose estimation. Comment: 7 pages, 5 figures
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
Vision-Based Monocular SLAM in Micro Aerial Vehicle
Micro Aerial Vehicles (MAVs) are popular for their efficiency, agility, and light weight. They can navigate in dynamic environments that cannot be accessed by humans or traditional aircraft. These MAVs rely on GPS, which becomes unreliable in GPS-denied areas where the signal is obstructed by buildings and other obstacles. Simultaneous Localization and Mapping (SLAM) in an unknown environment can solve the aforementioned problems faced by flying robots. ORB-SLAM, a rotation- and scale-invariant vision-based solution built on Oriented FAST and Rotated BRIEF (ORB) features, is one of the best solutions for localization and mapping using monocular vision.
In this paper, ORB-SLAM3 is used to localize a Tello micro aerial vehicle and to map an unknown environment. The effectiveness of ORB-SLAM3 was tested in a variety of indoor environments. An integrated adaptive controller was used for autonomous flight, relying on the 3D map produced by ORB-SLAM3 and on our proposed novel technique for robust initialization of the SLAM system during flight. The results show that ORB-SLAM3 can provide accurate localization and mapping for flying robots, even in challenging scenarios with fast motion, large camera movements, and dynamic environments. Furthermore, our results show that the proposed system is capable of navigating and mapping challenging indoor situations.
Using Image Sequences for Long-Term Visual Localization
Estimating the pose of a camera in a known scene, i.e., visual localization, is a core task for applications such as self-driving cars. In many scenarios, image sequences are available, and existing work on combining single-image localization with odometry offers to unlock their potential for improving localization performance. Still, the largest part of the literature focuses on single-image localization and ignores the availability of sequence data. The goal of this paper is to demonstrate the potential of image sequences in challenging scenarios, e.g., under day-night or seasonal changes. Combining ideas from the literature, we describe a sequence-based localization pipeline that combines odometry with both a coarse and a fine localization module. Experiments on long-term localization datasets show that combining single-image global localization against a prebuilt map with a visual odometry / SLAM pipeline improves performance to a level where the extended CMU Seasons dataset can be considered solved. We show that SIFT features can perform on par with modern state-of-the-art features in our framework, despite being much weaker and an order of magnitude faster to compute. Our code is publicly available at github.com/rulllars
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing conditions, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net. Comment: Accepted to CVPR 2018 as a spotlight
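As an illustration of what "6DOF camera pose estimation accuracy" typically means in practice, the sketch below computes a position error (Euclidean distance between camera centers) and a rotation error (angle of the relative rotation) between an estimated and a ground-truth pose. This is a minimal sketch of a commonly used pose-error measure, not the benchmark's own evaluation code; the function names and the helper `rot_z` are hypothetical and introduced only for this example.

```python
import math

def transpose(m):
    """Transpose a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_error_deg(r_est, r_gt):
    """Angle (degrees) of the relative rotation R_gt^T * R_est."""
    rel = matmul(transpose(r_gt), r_est)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def position_error(c_est, c_gt):
    """Euclidean distance between estimated and true camera centers."""
    return math.dist(c_est, c_gt)

def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

# Example: estimate is off by 10 degrees about z and 2 units along z.
rot_err = rotation_error_deg(rot_z(10.0), rot_z(0.0))
pos_err = position_error((1.0, 2.0, 3.0), (1.0, 2.0, 5.0))
```

A pose is then usually counted as correctly localized when both errors fall below chosen thresholds; any specific threshold values would be an assumption here and are therefore left out.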
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression
Visual-inertial localization is a key problem in computer vision and robotics
applications such as virtual reality, self-driving cars, and aerial vehicles.
The goal is to estimate an accurate pose of an object when either the
environment or the dynamics are known. Recent methods directly regress the pose
using convolutional and spatio-temporal networks. Absolute pose regression
(APR) techniques predict the absolute camera pose from an image input in a
known scene. Odometry methods perform relative pose regression (RPR) that
predicts the relative pose from a known object dynamic (visual or inertial
inputs). The localization task can be improved by retrieving information of
both data sources for a cross-modal setup, which is a challenging problem due
to contradictory tasks. In this work, we conduct a benchmark to evaluate deep
multimodal fusion based on PGO and attention networks. Auxiliary and Bayesian
learning are integrated for the APR task. We show accuracy improvements for the
RPR-aided APR task and for the RPR-RPR task for aerial vehicles and hand-held
devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets, and
record a novel industry dataset. Comment: Under review
NeRF-VINS: A Real-time Neural Radiance Field Map-based Visual-Inertial Navigation System
Achieving accurate, efficient, and consistent localization within an a priori
environment map remains a fundamental challenge in robotics and computer
vision. Conventional map-based keyframe localization often suffers from
sub-optimal viewpoints due to limited field of view (FOV), thus degrading its
performance. To address this issue, in this paper, we design a real-time
tightly-coupled Neural Radiance Fields (NeRF)-aided visual-inertial navigation
system (VINS), termed NeRF-VINS. By effectively leveraging NeRF's potential to
synthesize novel views, essential for addressing limited viewpoints, the
proposed NeRF-VINS optimally fuses IMU and monocular image measurements along
with synthetically rendered images within an efficient filter-based framework.
This tightly coupled integration enables 3D motion tracking with bounded error.
We extensively compare the proposed NeRF-VINS against state-of-the-art
methods that use prior map information and show that it achieves superior
performance. We also demonstrate that the proposed method is able to perform
real-time estimation at 15 Hz, on a resource-constrained Jetson AGX Orin
embedded platform with impressive accuracy. Comment: 6 pages, 7 figures
Four years of multi-modal odometry and mapping on the rail vehicles
Precise, seamless, and efficient train localization as well as long-term
railway environment monitoring are essential for reliability,
availability, maintainability, and safety (RAMS) engineering of railroad
systems. Simultaneous localization and mapping (SLAM) is right at the core of
solving the two problems concurrently. To this end, we propose a
high-performance and versatile multi-modal framework in this paper, targeted
for the odometry and mapping task for various rail vehicles. Our system is
built atop an inertial-centric state estimator that tightly couples light
detection and ranging (LiDAR), visual, optionally satellite navigation and
map-based localization information with the convenience and extendibility of
loosely coupled methods. The inertial sensors, IMU and wheel encoder, are treated
as the primary sensors, which receive the observations from subsystems to
constrain the accelerometer and gyroscope biases. Compared to point-only
LiDAR-inertial methods, our approach leverages more geometry information by
introducing both track plane and electric power pillars into state estimation.
The visual-inertial subsystem also utilizes the environmental structure
information by employing both lines and points. Besides, the method is capable
of handling sensor failures through automatic reconfiguration that bypasses
failed modules. Our proposed method has been extensively tested in long-duration
railway environments over four years, including general-speed, high-speed, and
metro lines; both passenger and freight traffic are investigated. Further, we aim to
share, in an open way, the experience, problems, and successes of our group
with the robotics community so that those that work in such environments can
avoid these errors. In this view, we open source some of the datasets to
benefit the research community.