Attention-based graph neural networks have made great progress in learned
feature matching. However, the literature offers little insight into how the
attention mechanism works for feature matching. In this paper, we rethink cross-
and self-attention from the viewpoint of traditional feature matching and
filtering. To facilitate the learning of matching and filtering, we
inject the similarity of descriptors and relative positions into the cross- and
self-attention scores, respectively. In this way, the attention can focus on
learning residual matching and filtering functions with reference to the basic
functions of measuring visual and spatial correlation. Moreover, we mine intra-
and inter-neighbors according to the similarity of descriptors and relative
positions. Sparse attention for each point can then be performed only within
its neighborhoods to achieve higher computational efficiency. Feature matching
networks equipped with our full and sparse residual attention learning
strategies are termed ResMatch and sResMatch, respectively. Extensive
experiments, including feature matching, pose estimation and visual
localization, confirm the superiority of our networks.
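
The residual idea described above can be illustrated with a minimal NumPy sketch. It is not the implementation of ResMatch or sResMatch: the function name `residual_attention`, the arguments `prior` and `top_k`, and the use of cosine similarity and negative squared distance as the injected correlation terms are all assumptions made for illustration. The sketch adds a fixed correlation term to the usual scaled dot-product logits, so that the learned part only has to model a residual, and optionally restricts each point to its `top_k` most correlated neighbors. The restriction is implemented here with a dense mask, so it shows the behavior of sparse attention but not its efficiency gain.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) receive weight 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def residual_attention(q, k, v, prior, top_k=None):
    """Attention whose logits are a learned term plus a fixed prior.

    q: (N, d) queries; k, v: (M, d) keys and values.
    prior: (N, M) precomputed correlation (hypothetical choice of the caller).
    top_k: if set, each query attends only to its top_k keys ranked by prior.
    """
    d = q.shape[-1]
    # Learned scaled dot-product term plus the injected prior term.
    logits = q @ k.T / np.sqrt(d) + prior
    if top_k is not None:
        # Column indices of the top_k largest prior values in each row.
        idx = np.argpartition(-prior, top_k - 1, axis=1)[:, :top_k]
        mask = np.full_like(logits, -np.inf)
        np.put_along_axis(mask, idx, 0.0, axis=1)
        logits = logits + mask  # keys outside the neighborhood are dropped
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

# Toy data: 5 keypoints in image A, 7 in image B, 8-dimensional features.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 8))
desc_b = rng.normal(size=(7, 8))
pos_a = rng.uniform(size=(5, 2))

# Cross-attention prior (assumption): cosine similarity of descriptors.
na = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
nb = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
cross_prior = na @ nb.T

# Self-attention prior (assumption): negative squared distance between
# keypoint positions, a simple stand-in for a relative-position term.
diff = pos_a[:, None, :] - pos_a[None, :, :]
self_prior = -np.sum(diff ** 2, axis=-1)

cross_out, cross_w = residual_attention(desc_a, desc_b, desc_b, cross_prior)
self_out, self_w = residual_attention(desc_a, desc_a, desc_a, self_prior, top_k=3)
```

In this sketch, if the learned term is zero the attention weights reduce to a softmax over the prior alone, which is one way to read the statement that the network learns residual functions relative to basic measures of visual and spatial correlation.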