LF-Net: Learning Local Features from Images
We present a novel deep architecture and a training strategy to learn a local
feature pipeline from scratch, using collections of images without the need for
human supervision. To do so we exploit depth and relative camera pose cues to
create a virtual target that the network should achieve on one image, provided
the outputs of the network for the other image. While this process is
inherently non-differentiable, we show that we can optimize the network in a
two-branch setup by confining it to one branch, while preserving
differentiability in the other. We train our method on both indoor and outdoor
datasets, with depth data from 3D sensors for the former, and depth estimates
from an off-the-shelf Structure-from-Motion solution for the latter. Our models
outperform the state of the art on sparse feature matching on both datasets,
while running at 60+ fps for QVGA images.
Comment: NIPS 201
Is there anything new to say about SIFT matching?
SIFT is a classical hand-crafted, histogram-based descriptor that has deeply influenced research on image matching for more than a decade. In this paper, a critical review of the aspects that affect SIFT matching performance is carried out, and novel descriptor design strategies are introduced and individually evaluated. These encompass quantization, binarization and hierarchical cascade filtering as means to reduce data storage and increase matching efficiency, with no significant loss of accuracy. An original contextual matching strategy, based on a symmetrical variant of the usual nearest-neighbor ratio and able to increase the discriminative power of any descriptor, is discussed as well. The paper then undertakes a comprehensive experimental evaluation of state-of-the-art hand-crafted and data-driven descriptors, including the most recent deep descriptors. Comparisons are carried out according to several performance parameters, including accuracy and space-time efficiency. Results are provided for both planar and non-planar scenes, the latter being evaluated with a new benchmark based on the concept of approximated patch overlap. Experimental evidence shows that, despite their age, SIFT and other hand-crafted descriptors, once enhanced through the proposed strategies, are ready to meet future image matching challenges. We also believe that the lessons learned from this work will inspire the design of better hand-crafted and data-driven descriptors.
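For orientation, the "usual nearest-neighbor ratio" mentioned in the abstract can be sketched as below. This is a minimal illustration, not the paper's symmetrical variant: it applies the standard ratio test in both directions and keeps only the matches on which the two directions agree. The function names, the 0.8 threshold, and the use of Euclidean distance are assumptions made for the example.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Standard nearest-neighbor ratio test from desc_a to desc_b.

    desc_a, desc_b: float arrays of shape (N, D) and (M, D), with M >= 2.
    Returns a dict mapping an index in desc_a to its accepted match in desc_b.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    # Accept a match only if the best distance is clearly below the second best.
    keep = dists[rows, best] < ratio * dists[rows, second]
    return {int(i): int(best[i]) for i in rows[keep]}

def mutual_ratio_matches(desc_a, desc_b, ratio=0.8):
    """Keep matches that pass the ratio test in both directions (assumed check)."""
    a_to_b = ratio_test_matches(desc_a, desc_b, ratio)
    b_to_a = ratio_test_matches(desc_b, desc_a, ratio)
    return [(i, j) for i, j in a_to_b.items() if b_to_a.get(j) == i]

# Toy descriptors: two clear correspondences and one ambiguous distractor.
desc_a = np.array([[0.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[0.1, 0.0], [10.0, 10.1], [5.0, 5.0]])
matches = mutual_ratio_matches(desc_a, desc_b)
```

In this toy case the distractor at (5, 5) is equidistant from both descriptors in `desc_a`, so it fails the ratio test and does not produce a match.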
Image Matching across Wide Baselines: From Paper to Practice
We introduce a comprehensive benchmark for local features and robust
estimation algorithms, focusing on the downstream task -- the accuracy of the
reconstructed camera pose -- as our primary metric. Our pipeline's modular
structure allows easy integration, configuration, and combination of different
methods and heuristics. This is demonstrated by embedding dozens of popular
algorithms and evaluating them, from seminal works to the cutting edge of
machine learning research. We show that with proper settings, classical
solutions may still outperform the perceived state of the art.
Besides establishing the actual state of the art, the conducted experiments
reveal unexpected properties of Structure from Motion (SfM) pipelines that can
help improve their performance, for both algorithmic and learned methods. Data
and code are online at https://github.com/vcg-uvic/image-matching-benchmark,
providing an easy-to-use and flexible framework for the benchmarking of local
features and robust estimation methods, both alongside and against
top-performing methods. This work provides a basis for the Image Matching
Challenge https://vision.uvic.ca/image-matching-challenge.
Comment: Added: KeyNet-SOSNet, AffNet-HardNet, TFeat, MKD from korni
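The abstract names the accuracy of the reconstructed camera pose as the primary metric. One common way to quantify such accuracy, sketched here under assumptions and not taken from the benchmark's code, is the angular error of the estimated rotation and of the estimated translation direction (translation scale is not recoverable from two views). The function names below are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def translation_error_deg(t_est, t_gt):
    """Angle in degrees between two translation directions (scale is ignored)."""
    u = np.asarray(t_est, dtype=float)
    v = np.asarray(t_gt, dtype=float)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Example: a 90-degree rotation about the z axis compared with the identity.
R_z90 = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
rot_err = rotation_error_deg(R_z90, np.eye(3))
same_dir_err = translation_error_deg([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
ortho_err = translation_error_deg([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A pair of images could then be counted as correctly registered when both errors fall below a chosen threshold; the threshold itself is a free parameter of this sketch.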