Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets
We propose a distributed bundle adjustment (DBA) method using the exact
Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most of the
existing methods partition the global map into smaller submaps and conduct
bundle adjustment within each submap. To fit the parallel framework, they use
approximate solutions instead of the LM algorithm, which often yields
sub-optimal results. In contrast, we utilize the exact LM
algorithm to conduct global bundle adjustment where the formation of the
reduced camera system (RCS) is actually parallelized and executed in a
distributed way. To store the large RCS, we compress it with a block-based
sparse matrix compression format (BSMC), which fully exploits its block
feature. The BSMC format also enables the distributed storage and updating of
the global RCS. The proposed method is extensively evaluated and compared with
the state-of-the-art pipelines using both synthetic and real datasets.
Preliminary results demonstrate the efficient memory usage and vast scalability
of the proposed method compared with the baselines. For the first time, we
conducted parallel bundle adjustment using the LM algorithm on a real dataset
with 1.18 million images and a synthetic dataset with 10 million images (about
500 times the size handled by the state-of-the-art LM-based BA) on a
distributed computing system.

Comment: camera-ready version for ICCV202
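The abstract's key storage idea is that the reduced camera system (RCS) is sparse at the granularity of fixed-size camera-pair blocks, so only nonzero blocks need to be kept. The sketch below illustrates that idea with a dict keyed by camera-pair indices; the block contents, the co-visibility pattern, and the function names are all illustrative assumptions, not the paper's actual BSMC format.

```python
import numpy as np

def build_block_sparse(num_cams, block, pairs, rng):
    """Store only the nonzero camera-pair blocks of a reduced camera
    system in a dict keyed by (row_cam, col_cam). Toy stand-in for a
    block-based sparse format; real RCS blocks come from Schur-
    complementing points out of the LM normal equations."""
    blocks = {}
    for i in range(num_cams):              # diagonal blocks always exist
        blocks[(i, i)] = rng.standard_normal((block, block))
    for i, j in pairs:                     # off-diagonals only for co-visible cameras
        blocks[(i, j)] = rng.standard_normal((block, block))
    return blocks

def to_dense(blocks, num_cams, block):
    """Expand the block dict to a dense matrix (for comparison only)."""
    H = np.zeros((num_cams * block, num_cams * block))
    for (i, j), B in blocks.items():
        H[i*block:(i+1)*block, j*block:(j+1)*block] = B
    return H

rng = np.random.default_rng(0)
n, b = 100, 9                              # 100 cameras, 9x9 pose blocks (assumed sizes)
pairs = [(i, i + 1) for i in range(n - 1)] # chain-like co-visibility
blocks = build_block_sparse(n, b, pairs, rng)

sparse_floats = len(blocks) * b * b        # floats actually stored
dense_floats = (n * b) ** 2                # floats a dense RCS would need
print(sparse_floats, dense_floats)
```

Because each stored unit is a whole block keyed by camera indices, a distributed system can shard the dict across workers and update blocks independently, which is the property the abstract attributes to BSMC.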
Visual Geometry Grounded Deep Structure From Motion
Structure-from-motion (SfM) is a long-standing problem in the computer vision
community, which aims to reconstruct the camera poses and 3D structure of a
scene from a set of unconstrained 2D images. Classical frameworks solve this
problem in an incremental manner by detecting and matching keypoints,
registering images, triangulating 3D points, and conducting bundle adjustment.
Recent research efforts have predominantly revolved around harnessing the power
of deep learning techniques to enhance specific elements (e.g., keypoint
matching), but are still based on the original, non-differentiable pipeline.
Instead, we propose a new deep pipeline VGGSfM, where each component is fully
differentiable and thus can be trained in an end-to-end manner. To this end, we
introduce new mechanisms and simplifications. First, we build on recent
advances in deep 2D point tracking to extract reliable pixel-accurate tracks,
which eliminates the need for chaining pairwise matches. Furthermore, we
recover all cameras simultaneously based on the image and track features
instead of gradually registering cameras. Finally, we optimise the cameras and
triangulate 3D points via a differentiable bundle adjustment layer. We attain
state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism,
and ETH3D.

Comment: 8 figures. Project page: https://vggsfm.github.io
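The final step the abstract describes, refining cameras and 3D points by minimising reprojection error, can be illustrated on a toy scale. The sketch below runs Gauss-Newton on a single camera translation under a pinhole model with a finite-difference Jacobian; the paper's layer is fully differentiable and jointly optimises all cameras and points, so this is only a loose analogue with assumed names and a simplified camera model.

```python
import numpy as np

def project(points, t, f=1.0):
    """Pinhole projection of 3D points after translating by camera offset t."""
    p = points + t
    return f * p[:, :2] / p[:, 2:3]

def refine_translation(points, obs, t0, iters=10):
    """Gauss-Newton refinement of the camera translation alone, a tiny
    slice of what a bundle adjustment layer does. Jacobian by finite
    differences for brevity; a differentiable layer would use autodiff."""
    t = t0.astype(float).copy()
    eps = 1e-6
    for _ in range(iters):
        r = (project(points, t) - obs).ravel()
        J = np.zeros((r.size, 3))
        for k in range(3):                      # finite-difference Jacobian column
            dt = np.zeros(3); dt[k] = eps
            J[:, k] = ((project(points, t + dt) - obs).ravel() - r) / eps
        t -= np.linalg.solve(J.T @ J + 1e-9 * np.eye(3), J.T @ r)
    return t

rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 6], size=(20, 3))   # synthetic scene points
t_true = np.array([0.3, -0.2, 0.1])
obs = project(X, t_true)                                 # noiseless observations
t_hat = refine_translation(X, obs, np.zeros(3))
print(np.round(t_hat, 4))
```

In the differentiable setting, gradients of the post-refinement reprojection error flow back through these update steps into the track and camera predictors, which is what enables the end-to-end training the abstract claims.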
Unsupervised Domain Adaptation within Deep Foundation Latent Spaces
The vision transformer-based foundation models, such as ViT or Dino-V2, are
aimed at solving problems with little or no finetuning of features. Using a
setting of prototypical networks, we analyse to what extent such foundation
models can solve unsupervised domain adaptation without finetuning over the
source or target domain. Through quantitative analysis, as well as qualitative
interpretations of decision making, we demonstrate that the suggested method
can improve upon existing baselines, and also showcase the limitations of
such an approach that remain to be solved.
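The prototypical-network setting the abstract mentions reduces to a simple procedure over frozen features: average the source features per class to form prototypes, then assign each target feature to its nearest prototype. The sketch below shows that procedure on synthetic vectors; in the paper the features would come from a frozen foundation model such as ViT or DINOv2, not random draws, and the distance/normalisation choices here are assumptions.

```python
import numpy as np

def prototypes(features, labels, num_classes):
    """Class prototype = mean frozen-backbone feature per source class."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(features, protos):
    """Assign each target feature to its nearest prototype (Euclidean)."""
    d = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Synthetic stand-ins for foundation-model embeddings: three source
# classes, plus a shifted target domain (the "domain gap").
rng = np.random.default_rng(0)
src = np.concatenate([rng.normal(c * 3.0, 0.5, size=(50, 8)) for c in range(3)])
src_y = np.repeat(np.arange(3), 50)
tgt = np.concatenate([rng.normal(c * 3.0 + 0.4, 0.5, size=(30, 8)) for c in range(3)])
tgt_y = np.repeat(np.arange(3), 30)

protos = prototypes(src, src_y, 3)
acc = (classify(tgt, protos) == tgt_y).mean()
print(acc)
```

No parameters are updated anywhere in this pipeline, which is exactly the "no finetuning over the source or target domain" condition the abstract analyses.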
RADA: Robust Adversarial Data Augmentation for Camera Localization in Challenging Conditions
Camera localization is a fundamental problem for many applications in computer vision, robotics, and autonomy. Despite recent deep learning-based approaches, a lack of robustness in challenging conditions persists due to changes in appearance caused by texture-less planes, repeating structures, reflective surfaces, motion blur, and illumination changes. Data augmentation is an attractive solution, but standard image perturbation methods fail to improve localization robustness. To address this, we propose RADA, which concentrates on perturbing the most vulnerable pixels, generating comparatively small image perturbations that still perplex the network. Our method outperforms previous augmentation techniques, achieving up to twice the accuracy of state-of-the-art models even under 'unseen' challenging weather conditions. Videos of our results can be found at https://youtu.be/niOv7-fJeCA. The source code for RADA is publicly available at https://github.com/jialuwang123321/RAD
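The abstract's core idea, perturbing only the most vulnerable pixels, can be sketched as a sparse, gradient-guided perturbation: rank pixels by loss-gradient magnitude and nudge only the top k in the gradient's sign direction. This is a hedged illustration of the concept (a sparse FGSM-like step); RADA's actual selection and update rules are not specified in the abstract and may differ.

```python
import numpy as np

def sparse_perturb(image, grad, eps=0.05, k=50):
    """Perturb only the k pixels with the largest loss-gradient magnitude
    (a stand-in for "most vulnerable"), in the gradient's sign direction,
    then clip back to the valid intensity range."""
    flat = np.abs(grad).reshape(-1)
    idx = np.argpartition(flat, -k)[-k:]        # indices of top-k magnitudes
    out = image.reshape(-1).copy()
    out[idx] += eps * np.sign(grad.reshape(-1)[idx])
    return np.clip(out, 0.0, 1.0).reshape(image.shape)

rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32))                # toy grayscale image in [0, 1]
g = rng.standard_normal((32, 32))               # stand-in for dLoss/dPixel
adv = sparse_perturb(img, g)
changed = int((adv != img).sum())
print(changed)
```

Restricting the budget to a few high-impact pixels keeps the augmented images close to the originals while still stressing the localizer, which matches the abstract's "small perturbations that perplex the network" framing.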
Extracting Semantic Information from Visual Data: A Survey
The traditional environment maps built by mobile robots are either metric or topological. These maps are navigation-oriented and not adequate for service robots to interact with or serve human users, who normally rely on conceptual knowledge or the semantic contents of the environment. Therefore, the construction of semantic maps becomes necessary for building an effective human-robot interface for service robots. This paper reviews recent research and development in the field of vision-based semantic mapping. The main focus is placed on how to extract semantic information from visual data in terms of feature extraction, object/place recognition, and semantic representation methods.
- …