Image-Based Localization Using Context
The image-based localization problem consists of estimating the 6 DoF camera pose by matching an image to a 3D point cloud (or equivalent) representing a 3D environment. The robustness and accuracy of current solutions are not objectively quantified. We have completed a comparative analysis of the main state-of-the-art approaches, namely Brute Force Matching, Approximate Nearest Neighbour Matching, Embedded Ferns Classification, ACG Localizer (using a visual vocabulary) and the Keyframe Matching Approach. The results of the study revealed major deficiencies in each approach, mainly in search space reduction, clustering, feature matching and sensitivity to where the query image was taken. We then chose to focus on one common major problem: reducing the search space. We propose a new image-based localization approach that reduces the search space by using global descriptors to find candidate keyframes in the database, then searching only against the 3D points seen from these candidates using local descriptors stored in a 3D cloud map.
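The two-stage idea described above, shortlisting keyframes with global descriptors and then matching local descriptors only against the 3D points those keyframes see, can be sketched as follows. This is a minimal illustration with random data; the descriptor dimensions, database layout and `top_k` value are assumptions for the sketch, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database: 50 keyframes, each with a global descriptor (GIST-like)
# and a list of 3D point ids visible from that keyframe.
n_keyframes, n_points = 50, 2000
global_descs = rng.normal(size=(n_keyframes, 512))           # one global descriptor per keyframe
visibility = [rng.choice(n_points, size=60, replace=False)   # point ids seen by each keyframe
              for _ in range(n_keyframes)]
local_descs = rng.normal(size=(n_points, 128))               # one local descriptor per 3D point

def localize_candidates(query_global, query_locals, top_k=5):
    """Stage 1: shortlist keyframes by global-descriptor distance.
       Stage 2: match local descriptors only against points seen by the shortlist."""
    dists = np.linalg.norm(global_descs - query_global, axis=1)
    candidates = np.argsort(dists)[:top_k]
    candidate_points = np.unique(np.concatenate([visibility[k] for k in candidates]))
    # Nearest-neighbour matching restricted to the reduced point set.
    sub = local_descs[candidate_points]
    nn = np.argmin(np.linalg.norm(sub[None] - query_locals[:, None], axis=2), axis=1)
    matches = candidate_points[nn]           # 2D feature i <-> 3D point matches[i]
    return candidates, matches

query_global = rng.normal(size=512)
query_locals = rng.normal(size=(40, 128))
cands, matches = localize_candidates(query_global, query_locals)
print(len(cands), len(matches))
```

The resulting 2D-3D matches would then feed a standard PnP/RANSAC pose solver; the point is that matching runs against roughly `top_k * 60` points instead of the full cloud of 2000.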
SANet: Scene agnostic network for camera localization
This thesis presents a scene-agnostic neural architecture for camera localization, where model parameters and scenes are independent of each other. Despite recent advances in learning-based methods with scene coordinate regression, most approaches require training for each scene one by one, which is not applicable to online applications such as SLAM and robotic navigation, where a model must be built on-the-fly. Our approach learns to build a hierarchical scene representation and predicts a dense scene coordinate map of a query RGB image on-the-fly given an arbitrary scene. The 6 DoF camera pose of the query image can then be estimated from the predicted scene coordinate map. Additionally, the dense prediction can be used for other online robotic and AR applications such as obstacle avoidance. We demonstrate the effectiveness and efficiency of our method on both indoor and outdoor benchmarks, achieving state-of-the-art performance among methods that work for arbitrary scenes without retraining or adaptation.
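Given a predicted scene coordinate map, the 6 DoF pose follows from standard 2D-3D pose estimation; in practice this is PnP inside a RANSAC loop, and SANet's actual solver is not reproduced here. A minimal noiseless sketch in normalized camera coordinates (K = I assumed) using the direct linear transform:

```python
import numpy as np

def dlt_pose(pts3d, pts2d):
    """Recover P = [R|t] (normalized coordinates, K = I) from exact 2D-3D
    correspondences via the direct linear transform."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)
    # Fix the sign so points have positive depth, then factor out scale.
    if P[2] @ np.append(pts3d[0], 1.0) < 0:
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt                        # nearest rotation matrix
    t = P[:, 3] / S.mean()            # undo the DLT's arbitrary scale
    return R, t

# Synthetic check: known pose, points in front of the camera.
rng = np.random.default_rng(1)
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([0.2, -0.1, 0.5])
pts3d = rng.uniform(-1, 1, size=(12, 3)) + np.array([0, 0, 4])
cam = pts3d @ R_true.T + t_true            # x_cam = R X + t
pts2d = cam[:, :2] / cam[:, 2:3]           # perspective projection
R_est, t_est = dlt_pose(pts3d, pts2d)
print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))
```

With noisy predicted coordinates, the same equations would be solved only over RANSAC inliers rather than all correspondences.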
Random Ferns for Semantic Segmentation of PolSAR Images
Random Ferns, a lesser-known example of ensemble learning, have been successfully applied in many computer vision applications ranging from keypoint matching to object detection. This paper extends the Random Fern framework to the semantic segmentation of polarimetric synthetic aperture radar images. By using internal projections defined over the space of Hermitian matrices, the proposed classifier can be applied directly to the polarimetric covariance matrices without the need to explicitly compute predefined image features. Furthermore, two distinct optimization strategies are proposed: the first is based on pre-selection and grouping of internal binary features before the creation of the classifier, and the second on iteratively improving the properties of a given Random Fern. Both strategies are able to boost performance by filtering features that are either redundant or have low information content, and by grouping correlated features to best fulfil the independence assumptions made by the Random Fern classifier. Experiments show that results can be achieved that are similar to a more complex Random Forest model and competitive with a deep learning baseline.
Comment: This is the author's version of the article as accepted for publication in IEEE Transactions on Geoscience and Remote Sensing, 2021. Link to original: https://ieeexplore.ieee.org/document/962798
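The Random Fern classifier works as a semi-naive Bayes ensemble: each fern is a small group of binary tests whose joint outcome indexes a class-posterior table, and ferns are combined by summing log posteriors. The sketch below uses simple axis-aligned threshold tests on real-valued feature vectors; the paper's actual internal projections over Hermitian covariance matrices are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

class RandomFerns:
    """Toy Random Fern ensemble with threshold-based binary tests."""

    def __init__(self, n_ferns=10, depth=4, n_classes=2):
        self.n_ferns, self.depth, self.n_classes = n_ferns, depth, n_classes

    def _leaf(self, X):
        # Each binary test contributes one bit of the fern's leaf index.
        bits = (X[:, self.dims] > self.thrs).astype(int)      # (n, n_ferns, depth)
        return (bits << np.arange(self.depth)).sum(axis=2)    # (n, n_ferns)

    def fit(self, X, y):
        lo, hi = X.min(), X.max()
        self.dims = rng.integers(0, X.shape[1], size=(self.n_ferns, self.depth))
        self.thrs = rng.uniform(lo, hi, size=(self.n_ferns, self.depth))
        # Class counts per leaf, with a +1 Dirichlet prior to avoid log(0).
        counts = np.ones((self.n_ferns, 2 ** self.depth, self.n_classes))
        leaves = self._leaf(X)
        for f in range(self.n_ferns):
            np.add.at(counts[f], (leaves[:, f], y), 1)
        self.logpost = np.log(counts / counts.sum(axis=2, keepdims=True))
        return self

    def predict(self, X):
        leaves = self._leaf(X)
        # Semi-naive Bayes: ferns are assumed independent, so log posteriors add.
        scores = sum(self.logpost[f, leaves[:, f]] for f in range(self.n_ferns))
        return scores.argmax(axis=1)

# Toy two-class problem: two well-separated Gaussian blobs in 5 dimensions.
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 5)),
               rng.normal(5.0, 0.5, size=(100, 5))])
y = np.repeat([0, 1], 100)
model = RandomFerns().fit(X, y)
acc = (model.predict(X) == y).mean()
print(acc)
```

The feature pre-selection and grouping strategies described in the paper would operate on which tests enter each fern, leaving this posterior-table machinery unchanged.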
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression
Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles. The goal is to estimate an accurate pose of an object when either the environment or the dynamics are known. Recent methods directly regress the pose using convolutional and spatio-temporal networks. Absolute pose regression (APR) techniques predict the absolute camera pose from an image input in a known scene. Odometry methods perform relative pose regression (RPR), predicting the relative pose from known object dynamics (visual or inertial inputs). The localization task can be improved by retrieving information from both data sources in a cross-modal setup, which is a challenging problem due to the contradictory nature of the two tasks. In this work, we conduct a benchmark to evaluate deep multimodal fusion based on pose graph optimization (PGO) and attention networks. Auxiliary and Bayesian learning are integrated for the APR task. We show accuracy improvements for the RPR-aided APR task and for the RPR-RPR task for aerial vehicles and hand-held devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets, and record a novel industry dataset.
Comment: Under review
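The principle behind odometry-aided absolute pose regression can be shown with a tiny linear pose graph: APR supplies noisy absolute measurements, RPR supplies accurate relative ones, and a weighted least-squares fusion ties them together. This is a 1D toy with assumed noise levels; the benchmark's actual PGO operates on full 6 DoF poses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D trajectory: APR gives noisy absolute positions, RPR gives
# accurate relative displacements between consecutive frames.
n = 20
truth = np.cumsum(rng.uniform(0.5, 1.5, size=n))
apr = truth + rng.normal(0, 0.5, size=n)                 # noisy absolute measurements
rpr = np.diff(truth) + rng.normal(0, 0.01, size=n - 1)   # accurate relative measurements

# Stack weighted residuals: w_a*(p_i - apr_i) and w_r*(p_{i+1} - p_i - rpr_i).
w_a, w_r = 1.0 / 0.5, 1.0 / 0.01                         # inverse std-dev weights
A = np.vstack([w_a * np.eye(n),
               w_r * (np.eye(n)[1:] - np.eye(n)[:-1])])
b = np.concatenate([w_a * apr, w_r * rpr])
fused, *_ = np.linalg.lstsq(A, b, rcond=None)

apr_err = np.abs(apr - truth).mean()
fused_err = np.abs(fused - truth).mean()
print(apr_err, fused_err)
```

Because the relative constraints rigidly chain the poses together, the fused estimate effectively averages the absolute noise over the whole trajectory, which is why RPR-aided APR outperforms APR alone.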
Efficient Image-Based Localization Using Context
Image-Based Localization (IBL) is the problem of computing the position and orientation of a camera with respect to a geometric representation of the scene. A fundamental building block of IBL is searching a saved 3D representation of the scene for correspondences to a query image. The robustness and accuracy of the IBL approaches in the literature are not objectively quantified.
First, this thesis presents a detailed description and study of three different 3D modeling packages based on Structure from Motion (SfM) for reconstructing a 3D map of an environment. The packages tested are VSFM, Bundler and PTAM. The objective is to assess the mapping ability of each technique and choose the best one for reconstructing the IBL 3D map. The study shows that image matching, which is the bottleneck of SfM, SLAM and IBL, is the decisive factor, and that it favours VSFM: poor matching introduces wrong correspondences into the 3D map. Since it is crucial for IBL to choose the software that provides the best quality of points, i.e. the largest number of correct 3D points, VSFM is chosen to reconstruct the 3D maps for IBL.
Second, this work presents a comparative study of the main approaches, namely Brute Force Matching, the Tree-Based Approach, Embedded Ferns Classification, ACG Localizer, the Keyframe Approach, Decision Forests, Worldwide Pose Estimation and MPEG Search Space Reduction. The objective of the comparative analysis was to uncover the specifics of each technique and thereby understand its advantages and disadvantages. Testing was performed on the Dubrovnik dataset, where localization is determined with respect to a 3D cloud map computed using a Structure-from-Motion approach. The results show that current state-of-the-art IBL solutions still face challenges in search space reduction, feature matching and clustering, and that solution quality is not consistent across all query images.
Third, this work addresses the search space problem in order to solve the IBL problem. The Gist-based Search Space Reduction (GSSR), an efficient alternative to the available search space solutions, is proposed. It relies on GIST descriptors to considerably reduce the search space and computational time, while at the same time exceeding the state of the art in localization accuracy. Experiments on the 7-Scenes datasets of Microsoft Research reveal considerable speedups for GSSR versus tree-based approaches, running four times faster on the Heads dataset and reducing the search space by an average of 92% while maintaining better accuracy.
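The reported search space reduction can be made concrete: once candidate keyframes are retrieved, only the union of 3D points they observe is searched, and the excluded fraction of the cloud is the reduction. A toy calculation with assumed map sizes (the 92% figure above comes from the thesis experiments, not from these numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy map: 100 keyframes, 10000 3D points; each keyframe sees 150 points.
n_keyframes, n_points, pts_per_kf = 100, 10_000, 150
visibility = [rng.choice(n_points, size=pts_per_kf, replace=False)
              for _ in range(n_keyframes)]

def search_space_reduction(candidate_ids):
    """Fraction of the 3D cloud excluded when matching is restricted to the
    points seen by the candidate keyframes."""
    kept = np.unique(np.concatenate([visibility[k] for k in candidate_ids]))
    return 1.0 - len(kept) / n_points

top_k = 5
candidates = rng.choice(n_keyframes, size=top_k, replace=False)
print(f"search space reduced by {100 * search_space_reduction(candidates):.1f}%")
```

With 5 candidates seeing at most 750 of 10000 points, the reduction is at least 92.5% in this toy setup; the real figure depends on how much the candidate keyframes' views overlap.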