Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets
We propose a distributed bundle adjustment (DBA) method using the exact
Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most of the
existing methods partition the global map into small submaps and conduct bundle
adjustment within them. To fit the parallel framework, they use approximate
solutions instead of the LM algorithm; however, those methods often give
sub-optimal results. Unlike them, we utilize the exact LM
algorithm to conduct global bundle adjustment where the formation of the
reduced camera system (RCS) is actually parallelized and executed in a
distributed way. To store the large RCS, we compress it with a block-based
sparse matrix compression format (BSMC), which fully exploits its block
feature. The BSMC format also enables the distributed storage and updating of
the global RCS. The proposed method is extensively evaluated and compared with
the state-of-the-art pipelines using both synthetic and real datasets.
Preliminary results demonstrate the efficient memory usage and vast scalability
of the proposed method compared with the baselines. For the first time, we
conducted parallel bundle adjustment using the LM algorithm on a real dataset
with 1.18 million images and a synthetic dataset with 10 million images (about
500 times the size handled by state-of-the-art LM-based BA) on a distributed
computing system.
Comment: camera ready version for ICCV202
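The block structure that the abstract exploits can be illustrated with a toy sketch: storing only the nonzero b-by-b blocks of a matrix, keyed by block coordinates, mirrors the block pattern of a reduced camera system. The `BlockSparse` class below is a hypothetical illustration, not the paper's actual BSMC format.

```python
# Toy sketch of block-based sparse matrix storage (hypothetical API; the
# paper's BSMC format is not specified here). Only nonzero b*b blocks are
# kept, indexed by (block_row, block_col).

class BlockSparse:
    def __init__(self, n_block_rows, n_block_cols, b):
        self.shape = (n_block_rows, n_block_cols)
        self.b = b            # block edge length
        self.blocks = {}      # (i, j) -> b*b block as list of lists

    def add_block(self, i, j, block):
        """Accumulate a dense b*b block at block coordinates (i, j)."""
        if (i, j) in self.blocks:
            old = self.blocks[(i, j)]
            self.blocks[(i, j)] = [[old[r][c] + block[r][c]
                                    for c in range(self.b)]
                                   for r in range(self.b)]
        else:
            self.blocks[(i, j)] = [row[:] for row in block]

    def to_dense(self):
        """Expand to a dense matrix (for checking only)."""
        n, m, b = self.shape[0] * self.b, self.shape[1] * self.b, self.b
        dense = [[0.0] * m for _ in range(n)]
        for (i, j), blk in self.blocks.items():
            for r in range(b):
                for c in range(b):
                    dense[i * b + r][j * b + c] = blk[r][c]
        return dense

    def nnz_blocks(self):
        return len(self.blocks)
```

Because only populated blocks are stored, memory scales with the number of nonzero blocks rather than with the full matrix size, which is what makes distributed storage and updating of a large RCS feasible.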
Discrete Visual Perception
Computational vision and biomedical image analysis have made tremendous progress over the past decade. This is mostly due to the development of efficient learning and inference algorithms which allow better, faster and richer modeling of visual perception tasks. Graph-based representations are among the most prominent tools to address such perception through the casting of perception as a graph optimization problem. In this paper, we briefly introduce the interest of such representations, discuss their strengths and limitations, and present their application to a variety of problems in computer vision and biomedical image analysis.
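The graph-optimization view can be made concrete with a toy example: labeling a one-dimensional chain of pixels under unary data costs and a Potts smoothness term, solved exactly by dynamic programming. All cost values and the Potts model itself are illustrative assumptions, not taken from the paper.

```python
# Toy sketch: perception cast as graph optimization on a chain MRF.
# unary[i][l] is the data cost of assigning label l to node i; a Potts
# penalty w is paid whenever two neighbours take different labels.
# Dynamic programming (Viterbi) finds the exact minimum-cost labeling.

def chain_mrf_labeling(unary, w):
    n, L = len(unary), len(unary[0])
    cost = list(unary[0])   # best cost of each label at the first node
    back = []               # backpointers for recovering the labeling
    for i in range(1, n):
        new_cost = [0.0] * L
        ptrs = [0] * L
        for l in range(L):
            k = min(range(L),
                    key=lambda k: cost[k] + (0 if k == l else w))
            ptrs[l] = k
            new_cost[l] = unary[i][l] + cost[k] + (0 if k == l else w)
        cost = new_cost
        back.append(ptrs)
    # Backtrack from the cheapest final label.
    labels = [min(range(L), key=lambda l: cost[l])]
    for ptrs in reversed(back):
        labels.append(ptrs[labels[-1]])
    return labels[::-1]
```

On grids rather than chains the problem is no longer exactly solvable by this recursion, which is where the richer graph-based inference machinery surveyed in the paper comes in.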
Visual Geometry Grounded Deep Structure From Motion
Structure-from-motion (SfM) is a long-standing problem in the computer vision
community, which aims to reconstruct the camera poses and 3D structure of a
scene from a set of unconstrained 2D images. Classical frameworks solve this
problem in an incremental manner by detecting and matching keypoints,
registering images, triangulating 3D points, and conducting bundle adjustment.
Recent research efforts have predominantly revolved around harnessing the power
of deep learning techniques to enhance specific elements (e.g., keypoint
matching), but are still based on the original, non-differentiable pipeline.
Instead, we propose a new deep pipeline VGGSfM, where each component is fully
differentiable and thus can be trained in an end-to-end manner. To this end, we
introduce new mechanisms and simplifications. First, we build on recent
advances in deep 2D point tracking to extract reliable pixel-accurate tracks,
which eliminates the need for chaining pairwise matches. Furthermore, we
recover all cameras simultaneously based on the image and track features
instead of gradually registering cameras. Finally, we optimise the cameras and
triangulate 3D points via a differentiable bundle adjustment layer. We attain
state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism,
and ETH3D.
Comment: 8 figures. Project page: https://vggsfm.github.io
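The bundle-adjustment step at the heart of such pipelines is a damped least-squares (Levenberg-Marquardt) optimization. As a minimal sketch, assuming a single scalar parameter instead of full camera and point parameters, one LM loop might look like:

```python
# Toy Levenberg-Marquardt sketch: fit the scalar slope a minimizing
# sum_i (a*x_i - y_i)^2. The damping lam interpolates between
# Gauss-Newton (small lam) and gradient descent (large lam).

def lm_fit_slope(xs, ys, a0=0.0, lam=1e-3, iters=20):
    def cost(a):
        return sum((a * x - y) ** 2 for x, y in zip(xs, ys))
    a = a0
    for _ in range(iters):
        JtJ = sum(x * x for x in xs)                       # J^T J
        Jtr = sum(x * (a * x - y) for x, y in zip(xs, ys)) # J^T r
        step = -Jtr / (JtJ + lam)   # damped normal equations
        if cost(a + step) < cost(a):
            a += step
            lam *= 0.5              # model fits well: less damping
        else:
            lam *= 10.0             # step rejected: more damping
    return a
```

In a real BA layer the same update is computed over all camera and point parameters, and making each step differentiable is what allows gradients to flow back through the optimization into the upstream tracking network.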
Large-Scale Mapping of Small Roads in Lidar Images Using Deep Convolutional Neural Networks
Detailed and complete mapping of forest roads is important for the forest industry since they are used for timber transport by trucks with long trailers. This paper proposes a new automatic method for large-scale mapping of forest roads from airborne laser scanning data. The method is based on a fully convolutional neural network that performs end-to-end segmentation. To train the network, a large set of image patches with corresponding road labels is used. The final network is then applied to detect and map forest roads from lidar data covering the Etnedal municipality in Norway. The results show that we are able to map the forest roads with an overall accuracy of 97.2%. We conclude that the method has strong potential for large-scale operational mapping of forest roads.
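Overall accuracy, the metric reported above, is simply the fraction of pixels whose predicted class matches the reference. A minimal sketch (not the paper's evaluation code):

```python
# Toy sketch of the overall-accuracy metric for a segmentation map,
# with predictions and ground truth flattened to label sequences.

def overall_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the reference."""
    assert len(pred) == len(truth)
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(pred)
```

Note that with a rare class such as roads, overall accuracy can look high even when many road pixels are missed, so class-wise metrics are often reported alongside it.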
DIOR: Dataset for Indoor-Outdoor Reidentification -- Long Range 3D/2D Skeleton Gait Collection Pipeline, Semi-Automated Gait Keypoint Labeling and Baseline Evaluation Methods
In recent times, there is an increased interest in the identification and
re-identification of people at long distances, such as from rooftop cameras,
UAV cameras, street cams, and others. Such recognition needs to go beyond face
and use whole-body markers such as gait. However, datasets to train and test
such recognition algorithms are not widely prevalent, and fewer are labeled.
This paper introduces DIOR -- a framework for data collection, semi-automated
annotation, and also provides a dataset with 14 subjects and 1.649 million RGB
frames with 3D/2D skeleton gait labels, including 200 thousand frames from a
long-range camera. Our approach leverages advanced 3D computer vision
techniques to attain pixel-level accuracy in indoor settings with motion
capture systems. Additionally, for outdoor long-range settings, we remove the
dependency on motion capture systems and adopt a low-cost, hybrid 3D computer
vision and learning pipeline with only 4 low-cost RGB cameras, successfully
achieving precise skeleton labeling on far-away subjects, even when their
height is limited to a mere 20-25 pixels within an RGB frame. On publication,
we will make our pipeline open for others to use.
HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration
Outdoor LiDAR point clouds are typically large-scale and complexly
distributed. To achieve efficient and accurate registration, it is crucial to
emphasize the similarity among local regions and to prioritize global
local-to-local matching, after which accuracy can be further enhanced through
cost-effective fine registration. In this paper, a novel hierarchical neural
network with double attention, named HDMNet, is proposed for large-scale outdoor
LiDAR point cloud registration. Specifically, a novel feature consistency
enhanced double-soft matching network is introduced to achieve two-stage
matching with high flexibility while enlarging the receptive field with high
efficiency in a patch-to-patch manner, which significantly improves the
registration performance. Moreover, to further utilize the sparse matching
information from the deeper layer, we develop a novel trainable embedding mask
that incorporates the confidence scores of correspondences obtained from the
pose estimation of the deeper layer, eliminating additional computations. The
high-confidence keypoints in the sparser point cloud of the deeper layer
correspond to high-confidence spatial neighborhood regions in the shallower
layer, which receive more attention, while the features of non-key regions are
masked. Extensive experiments are conducted on two large-scale outdoor LiDAR
point cloud datasets to demonstrate the high accuracy and efficiency of the
proposed HDMNet.
Comment: Accepted by WACV202
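The confidence-propagation idea, where deep-layer keypoints gate attention over shallow-layer neighborhoods, can be sketched as nearest-neighbour mask transfer. The function name, threshold, and data layout below are illustrative assumptions, not HDMNet's actual mechanism.

```python
# Toy sketch: each shallow-layer (fine) point inherits the confidence of
# its nearest deep-layer (coarse) point; regions below the threshold are
# masked out so subsequent matching focuses on high-confidence areas.

def propagate_confidence_mask(coarse_pts, coarse_conf, fine_pts, thresh=0.5):
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    mask = []
    for f in fine_pts:
        j = min(range(len(coarse_pts)), key=lambda j: d2(f, coarse_pts[j]))
        mask.append(coarse_conf[j] >= thresh)
    return mask
```

Reusing confidences already computed at the deeper layer in this way avoids rescoring every fine point, which is in the spirit of the "eliminating additional computations" claim in the abstract.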