Fast multi-image matching via density-based clustering
We consider the problem of finding consistent matches
across multiple images. Previous state-of-the-art solutions
use constraints on cycles of matches together with convex
optimization, leading to computationally intensive iterative
algorithms. In this paper, we propose a clustering-based
formulation. We first rigorously show its equivalence with
the previous one, and then propose QuickMatch, a novel
algorithm that identifies multi-image matches from a density
function in feature space. We use the density to order the
points in a tree, and then extract the matches by breaking this
tree using feature distances and measures of distinctiveness.
Our algorithm outperforms previous state-of-the-art methods
(such as MatchALS) in accuracy, is significantly faster
(up to 62 times faster on some benchmarks), and scales to
large datasets (with more than twenty thousand features).
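The density-based grouping described above can be illustrated in a few lines. The following is a hypothetical quickshift-style sketch, not the authors' QuickMatch code: every feature links to its nearest neighbor of higher estimated density, links are broken when they are too long or join two features from the same image, and each surviving subtree is one multi-image match. Names and thresholds (`bandwidth`, `break_dist`) are illustrative.

```python
import numpy as np

def multi_image_matches(feats, img_ids, bandwidth=0.5, break_dist=1.0):
    """Group features from several images into multi-image matches (sketch).

    Estimate a kernel density in feature space, link every feature to its
    nearest neighbor of higher density (building a tree), and break links
    longer than `break_dist` or joining two features of the same image.
    """
    feats, img_ids = np.asarray(feats, float), np.asarray(img_ids)
    n = len(feats)
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    density = np.exp(-((d / bandwidth) ** 2)).sum(axis=1)  # kernel density estimate
    parent = np.full(n, -1)
    for i in range(n):
        denser = np.where(density > density[i])[0]
        if denser.size:
            j = denser[np.argmin(d[i, denser])]            # nearest denser feature
            if d[i, j] <= break_dist and img_ids[i] != img_ids[j]:
                parent[i] = j                              # keep the tree edge

    def root(i):                                           # follow links to the top
        while parent[i] != -1:
            i = parent[i]
        return i

    return np.array([root(i) for i in range(n)])           # one label per match
```

Features that end up with the same root label form one multi-image match.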
GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints
Learned local descriptors based on Convolutional Neural Networks (CNNs) have
achieved significant improvements on patch-based benchmarks, but have not
demonstrated strong generalization ability on recent benchmarks of image-based
3D reconstruction. In this paper, we mitigate this limitation by proposing a
novel local descriptor learning approach that integrates geometry constraints
from multi-view reconstructions, which benefits the learning process in terms
of data generation, data sampling and loss computation. We refer to the
proposed descriptor as GeoDesc, and demonstrate its superior performance on
various large-scale benchmarks, and in particular show its great success on
challenging reconstruction tasks. Moreover, we provide guidelines towards
practical integration of learned descriptors in Structure-from-Motion (SfM)
pipelines, showing the good trade-off between accuracy and efficiency that
GeoDesc delivers to 3D reconstruction tasks.
Comment: Accepted to ECCV'1
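Descriptor learning of this kind typically optimizes an in-batch hard-negative loss over correspondences harvested from multi-view reconstructions. A minimal sketch follows; it is not GeoDesc's actual loss, whose data sampling and geometry-derived weighting are more elaborate.

```python
import numpy as np

def batch_hard_loss(desc_a, desc_b, margin=0.5):
    """In-batch hard-negative triplet-style loss for descriptors (sketch).

    desc_a[i] and desc_b[i] describe the same 3D point seen in two views
    (positive pairs, e.g. harvested from a multi-view reconstruction);
    every other row in the batch acts as a negative.
    """
    desc_a, desc_b = np.asarray(desc_a, float), np.asarray(desc_b, float)
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    pos = np.diag(d)                          # distance of each positive pair
    masked = d + np.eye(len(d)) * 1e9         # hide positives on the diagonal
    hardest = np.minimum(masked.min(axis=1), masked.min(axis=0))
    return np.maximum(pos - hardest + margin, 0.0).mean()
```

The margin penalizes any positive pair that is not closer than its hardest in-batch negative by at least `margin`.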
End2End Multi-View Feature Matching with Differentiable Pose Optimization
Erroneous feature matches have severe impact on subsequent camera pose
estimation and often require additional, time-costly measures, like RANSAC, for
outlier rejection. Our method tackles this challenge by addressing feature
matching and pose optimization jointly. To this end, we propose a graph
attention network to predict image correspondences along with confidence
weights. The resulting matches serve as weighted constraints in a
differentiable pose estimation. Training feature matching with gradients from
pose optimization naturally learns to down-weight outliers and boosts pose
estimation on image pairs compared to SuperGlue by 6.7% on ScanNet. At the same
time, it reduces the pose estimation time by over 50% and renders RANSAC
iterations unnecessary. Moreover, we integrate information from multiple views
by spanning the graph across multiple frames to predict the matches all at
once. Multi-view matching combined with end-to-end training improves the pose
estimation metrics on Matterport3D by 18.5% compared to SuperGlue.
Comment: ICCV 2023, project page:
https://barbararoessle.github.io/e2e_multi_view_matching , video:
https://youtu.be/uuLb6GfM9C
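The role of the confidence weights can be made concrete with a weighted closed-form rigid alignment. This is an illustrative stand-in for the paper's differentiable pose optimization, not its actual code: a near-zero weight silently removes an outlier with no RANSAC loop, and every step (weighted centroids, SVD) is differentiable, so pose-error gradients can flow back into whatever network predicted the weights.

```python
import numpy as np

def weighted_kabsch(P, Q, w):
    """Confidence-weighted rigid alignment of 3D matches (sketch).

    Finds R, t minimizing the weighted error sum_i w[i] * ||R P[i] + t - Q[i]||^2.
    """
    P, Q, w = np.asarray(P, float), np.asarray(Q, float), np.asarray(w, float)
    w = w / w.sum()
    p0, q0 = (w[:, None] * P).sum(0), (w[:, None] * Q).sum(0)    # weighted centroids
    H = (w[:, None] * (P - p0)).T @ (Q - q0)                     # weighted covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # reflection guard
    R = Vt.T @ S @ U.T
    t = q0 - R @ p0                                              # Q[i] ≈ R @ P[i] + t
    return R, t
```

With one gross outlier down-weighted to near zero, the recovered pose matches the ground truth despite no explicit outlier-rejection step.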
The Atlas Structure of Images
Many operations of vision require image regions to be isolated and inter-related. This is challenging when they are different in detail and extent. Practical methods of Computer Vision approach this through the tools of downsampling, pyramids, cropping and patches. In this paper we develop an ideal geometric structure for this, compatible with the existing scale space model of image measurement. Its elements are apertures which view the image like fuzzy-edged portholes of frosted glass. We establish containment and cause/effect relations between apertures, and show that these link them into cross-scale atlases. Atlases formed of Gaussian apertures are shown to be a continuous version of the image pyramid used in Computer Vision, and allow various types of image description to be expressed naturally within their framework. We show that views through Gaussian apertures are approximately equivalent to the jets of derivative-of-Gaussian filter responses that form part of standard Scale Space theory. This supports a view of the simple cells of mammalian V1 as implementing a system of local views of the retinal image of varying extent and resolution. As a worked example, we develop a keypoint descriptor scheme that outperforms previous schemes that do not make use of learning.
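The connection between Gaussian apertures and local jets can be sketched numerically. The snippet below computes the 0th- and 1st-order responses of a Gaussian window and its derivative filters at a point; it is an illustration of the standard Scale Space construction only, and far simpler than the paper's atlases.

```python
import numpy as np

def gaussian_jet(img, x0, y0, sigma):
    """0th/1st-order local jet of Gaussian-derivative responses at (x0, y0).

    The image is weighted by a Gaussian aperture of scale sigma centred on the
    point; the weighted moments equal the responses of a Gaussian filter and
    its x/y derivatives at that location.
    """
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    g = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
    g /= g.sum()                                       # normalized aperture
    L = (g * img).sum()                                # smoothed intensity
    Lx = ((x - x0) / sigma ** 2 * g * img).sum()       # d/dx response
    Ly = ((y - y0) / sigma ** 2 * g * img).sum()       # d/dy response
    return L, Lx, Ly
```

On a linear intensity ramp, the jet recovers the local value and unit gradient, as expected of first-order derivative-of-Gaussian filters.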
NCP: Neural Correspondence Prior for Effective Unsupervised Shape Matching
We present Neural Correspondence Prior (NCP), a new paradigm for computing
correspondences between 3D shapes. Our approach is fully unsupervised and can
lead to high-quality correspondences even in challenging cases such as sparse
point clouds or non-isometric meshes, where current methods fail. Our first key
observation is that, in line with neural priors observed in other domains,
recent network architectures on 3D data, even without training, tend to produce
pointwise features that induce plausible maps between rigid or non-rigid
shapes. Secondly, we show that given a noisy map as input, training a feature
extraction network with the input map as supervision tends to remove artifacts
from the input and can act as a powerful correspondence denoising mechanism,
both between individual pairs and within a collection. With these observations
in hand, we propose a two-stage unsupervised paradigm for shape matching by (i)
performing unsupervised training by adapting an existing approach to obtain an
initial set of noisy matches, and (ii) using these matches to train a network
in a supervised manner. We demonstrate that this approach significantly
improves the accuracy of the maps, especially when trained within a collection.
We show that NCP is data-efficient, fast, and achieves state-of-the-art results
on many tasks. Our code can be found online: https://github.com/pvnieo/NCP.
Comment: NeurIPS 2022, 10 pages, 9 figures
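The denoising effect of training against a noisy map can be shown in miniature. The sketch below is only an analogy for stage (ii) above: the noisy correspondences supervise a model too rigid to reproduce per-point errors (here a simple affine fit, whereas NCP trains a neural feature extractor), and re-extracting nearest neighbours of the model's predictions yields a cleaner map.

```python
import numpy as np

def denoise_map(X, Y, noisy_map):
    """Refine a noisy pointwise map X -> Y by fitting a constrained model to it.

    Fits an affine transform to the noisy correspondences, then re-extracts
    the map as nearest neighbours of the fitted predictions.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    Xh = np.hstack([X, np.ones((len(X), 1))])              # homogeneous coords
    A, *_ = np.linalg.lstsq(Xh, Y[noisy_map], rcond=None)  # supervise on noisy map
    pred = Xh @ A                                          # smoothed targets
    d = np.linalg.norm(pred[:, None, :] - Y[None, :, :], axis=-1)
    return d.argmin(axis=1)                                # denoised map
```

With a few scrambled entries in an otherwise correct map, the fit cannot reproduce the per-point corruption, and the re-extracted map is exact.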
Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms
Recent advances in computer vision and deep learning have shown promising
performance in estimating rigid/similarity transformation between unregistered
point clouds of complex objects and scenes. However, their performances are
mostly evaluated using a limited number of datasets from a single sensor (e.g.
Kinect or RealSense cameras), lacking a comprehensive overview of their
applicability in photogrammetric 3D mapping scenarios. In this work, we provide
a comprehensive review of the state-of-the-art (SOTA) point cloud registration
methods, where we analyze and evaluate these methods using a diverse set of
point cloud data from indoor to satellite sources. The quantitative analysis
allows for exploring the strengths, applicability, challenges, and future
trends of these methods. In contrast to existing analysis works that introduce
point cloud registration as a holistic process, our experimental analysis is
based on its inherent two-step process to better comprehend these approaches
including feature/keypoint-based initial coarse registration and dense fine
registration through cloud-to-cloud (C2C) optimization. More than ten methods,
including classic hand-crafted, deep-learning-based feature correspondence, and
robust C2C methods were tested. We observed that the success rate of most of
the algorithms is below 40% on the datasets we tested, and that there is still
a large margin for improvement over existing algorithms concerning 3D sparse
correspondence search and the ability to register point clouds with complex
geometry and occlusions. With the evaluated statistics on three
datasets, we identify the best-performing methods for each step, provide our
recommendations, and outline future efforts.
Comment: 7 figures
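The second, dense fine-registration step can be illustrated with a minimal point-to-point ICP loop. This is a plain illustration of the cloud-to-cloud idea, not one of the methods evaluated here; it assumes the first, correspondence-based step has already brought the source cloud coarsely near the target.

```python
import numpy as np

def icp(src, dst, iters=10):
    """Minimal point-to-point ICP: alternate closest-point matching with a
    closed-form (Kabsch) rigid update, accumulating the total transform."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=-1)
        Q = dst[d.argmin(axis=1)]                 # closest-point correspondences
        p0, q0 = cur.mean(0), Q.mean(0)
        H = (cur - p0).T @ (Q - q0)               # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ S @ U.T                        # best rigid rotation this round
        t = q0 - R @ p0
        cur = cur @ R.T + t                       # apply the update
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Because ICP only converges locally, its success in practice depends heavily on the quality of the coarse initialization, which is precisely why the two-step analysis above treats the steps separately.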