Extracting discriminative local features that are invariant to imaging
variations is an integral part of establishing correspondences between images.
In this work, we introduce a self-supervised learning framework to extract
discriminative rotation-invariant descriptors using group-equivariant CNNs.
By employing group-equivariant CNNs, our method learns rotation-equivariant
features and their orientations explicitly, without relying on sophisticated
data augmentation. The resultant features and
their orientations are further processed by group aligning, a novel invariant
mapping technique that shifts the group-equivariant features by their
orientations along the group dimension. Our group aligning technique achieves
rotation-invariance without any collapse of the group dimension and thus
eschews loss of discriminability. The proposed method is trained end-to-end in
a self-supervised manner, where we use an orientation alignment loss for the
orientation estimation and a contrastive descriptor loss to make the local
descriptors robust to geometric and photometric variations. Our method demonstrates
state-of-the-art matching accuracy among existing rotation-invariant
descriptors under varying rotation and also shows competitive results when
transferred to the tasks of keypoint matching and camera pose estimation.

Comment: Accepted to CVPR 2023. Project webpage:
http://cvlab.postech.ac.kr/research/REL
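
The group aligning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single keypoint whose feature is a C x G array (C channels, G discrete rotations), that an image rotation by r group steps acts as a cyclic shift along the group dimension, and that the orientation is taken as the argmax of a hypothetical per-keypoint score vector of length G. The names group_align, rotate_group, and cyclic_shift are illustrative only.

```python
import random


def cyclic_shift(row, k):
    """Cyclically shift a list so that the element at index k moves to index 0."""
    k %= len(row)
    return row[k:] + row[:k]


def rotate_group(row, r):
    """Simulate a rotation by r group steps: index j moves to (j + r) mod G."""
    r %= len(row)
    return row[-r:] + row[:-r]


def group_align(features, orientation_scores):
    """Shift every channel by the dominant orientation, then flatten.

    features: C rows of length G (assumed rotation-equivariant).
    orientation_scores: length-G scores; their argmax is the assumed orientation.
    The group dimension is kept, so the descriptor has C * G entries.
    """
    k = max(range(len(orientation_scores)), key=orientation_scores.__getitem__)
    aligned = [cyclic_shift(row, k) for row in features]
    return [value for row in aligned for value in row]


random.seed(0)
C, G = 4, 8
features = [[random.gauss(0.0, 1.0) for _ in range(G)] for _ in range(C)]
scores = [random.random() for _ in range(G)]

descriptor = group_align(features, scores)

# Rotating the input shifts features and scores together; aligning undoes the shift.
invariant = all(
    group_align([rotate_group(row, r) for row in features], rotate_group(scores, r))
    == descriptor
    for r in range(G)
)
```

In this toy setting the aligned descriptor is identical for every simulated rotation while still holding all C * G values, in contrast to pooling over the group dimension, which would reduce it to C values.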