RGM: A Robust Generalist Matching Model
Finding corresponding pixels within a pair of images is a fundamental
computer vision task with various applications. Due to the specific
requirements of different tasks like optical flow estimation and local feature
matching, previous works are primarily categorized into dense matching and
sparse feature matching, focusing on specialized architectures along with
task-specific datasets, which may somewhat hinder the generalization
performance of specialized models. In this paper, we propose a deep model for
sparse and dense matching, termed RGM (Robust Generalist Matching). In
particular, we design a cascaded GRU module for refinement that iteratively
explores geometric similarity at multiple scales, followed by an additional
uncertainty estimation module for sparsification. To narrow the gap
between synthetic training samples and real-world scenarios, we build a new,
large-scale dataset with sparse correspondence ground truth by generating
optical flow supervision with greater intervals. As such, we are able to mix up
various dense and sparse matching datasets, significantly improving the
training diversity. The generalization capacity of our proposed RGM is greatly
improved by learning the matching and uncertainty estimation in a two-stage
manner on the large, mixed data. Superior performance is achieved for zero-shot
matching and downstream geometry estimation across multiple datasets,
outperforming the previous methods by a large margin.
Comment: 17 pages. Fixed typo in the first two equations. Code is available
at: https://github.com/aim-uofa/RG
Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance
"Like night and day" is a commonly used expression to imply that two things
are completely different. Unfortunately, this tends to be the case for current
visual feature representations of the same scene across varying seasons or
times of day. The aim of this paper is to provide a dense feature
representation that can be used to perform localization, sparse matching or
image retrieval, regardless of the current seasonal or temporal appearance.
Recently, there have been several proposed methodologies for deep learning
dense feature representations. These methods make use of ground truth
pixel-wise correspondences between pairs of images and focus on the spatial
properties of the features. As such, they don't address temporal or seasonal
variation. Furthermore, obtaining the required pixel-wise correspondence data
to train in cross-seasonal environments is highly complex in most scenarios.
We propose Deja-Vu, a weakly supervised approach to learning season invariant
features that does not require pixel-wise ground truth data. The proposed
system only requires coarse labels indicating if two images correspond to the
same location or not. From these labels, the network is trained to produce
"similar" dense feature maps for corresponding locations despite environmental
changes. Code will be made available at:
https://github.com/jspenmar/DejaVu_Feature
SCNet: Learning Semantic Correspondence
This paper addresses the problem of establishing semantic correspondences
between images depicting different instances of the same object or scene
category. Previous approaches focus on either combining a spatial regularizer
with hand-crafted features, or learning a correspondence model for appearance
only. We propose instead a convolutional neural network architecture, called
SCNet, for learning a geometrically plausible model for semantic
correspondence. SCNet uses region proposals as matching primitives, and
explicitly incorporates geometric consistency in its loss function. It is
trained on image pairs obtained from the PASCAL VOC 2007 keypoint dataset, and
a comparative evaluation on several standard benchmarks demonstrates that the
proposed approach substantially outperforms both recent deep learning
architectures and previous methods based on hand-crafted features.
Comment: ICCV 201