Although deep neural networks endow the downsampled superpoints with
discriminative feature representations, direct matching of superpoints is rarely
used on its own in state-of-the-art methods, mainly for two reasons. First, the
correspondences are inevitably noisy, so RANSAC-like refinement is usually
adopted. Such ad hoc postprocessing, however, is slow and not differentiable,
so it cannot be jointly optimized with feature learning. Second, superpoints
are sparse and thus more RANSAC iterations are needed. Existing approaches use
a coarse-to-fine strategy to propagate the superpoint correspondences to the
point level, but the resulting point-level correspondences are not discriminative
enough, which further necessitates postprocessing refinement. In this paper, we
present a simple yet effective
approach to extract correspondences by directly matching superpoints using a
global softmax layer in an end-to-end manner; these correspondences are then used
to determine the rigid transformation between the source and target point clouds.
Compared with
methods that directly predict corresponding points, by leveraging the rich
information from the superpoint matchings, we can obtain a more accurate
estimate of the transformation and effectively filter out outliers without
any postprocessing refinement. As a result, our approach is not only fast, but
also achieves state-of-the-art results on the challenging ModelNet and 3DMatch
benchmarks. Our code and model weights will be publicly released.
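
To make the pipeline above concrete, the following is a minimal NumPy sketch of
one plausible reading of it, not the released implementation. It assumes that
"global softmax" means a single softmax over all entries of the superpoint
similarity matrix, that the highest-scoring pairs are kept as correspondences,
and that the rigid transformation is recovered with a weighted Kabsch (SVD)
solve. The function names, the temperature, and the top_k cutoff are all
hypothetical choices made for illustration.

```python
import numpy as np


def global_softmax_matching(feat_src, feat_tgt, temperature=0.1):
    """Score every (source, target) superpoint pair with one softmax.

    Assumption: the softmax is normalized over the whole N x M similarity
    matrix, so all pairs compete with each other.
    """
    sim = feat_src @ feat_tgt.T / temperature
    sim = sim - sim.max()          # subtract the max for numerical stability
    prob = np.exp(sim)
    return prob / prob.sum()       # entries sum to 1 over all pairs


def weighted_kabsch(src, tgt, weights):
    """Closed-form rigid fit: find R, t minimizing sum_i w_i |R src_i + t - tgt_i|^2."""
    w = weights / weights.sum()
    c_src = w @ src                # weighted centroids
    c_tgt = w @ tgt
    # 3 x 3 weighted cross-covariance of the centered point sets
    H = (src - c_src).T @ (w[:, None] * (tgt - c_tgt))
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_tgt - R @ c_src
    return R, t


def register(src_xyz, tgt_xyz, feat_src, feat_tgt, top_k=32):
    """Match superpoints, keep the top_k most confident pairs, solve for R, t."""
    prob = global_softmax_matching(feat_src, feat_tgt)
    k = min(top_k, prob.size)
    flat = np.argsort(prob, axis=None)[-k:]        # indices of the k best pairs
    i, j = np.unravel_index(flat, prob.shape)
    # Matching scores double as correspondence weights, down-weighting outliers
    return weighted_kabsch(src_xyz[i], tgt_xyz[j], prob[i, j])
```

Every step in this sketch is a differentiable tensor operation except the
discrete top_k selection, which is the property that would allow such a
pipeline to be trained jointly with the feature extractor, in contrast to
RANSAC-style refinement.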