Spatial attention has been widely used to improve the performance of
convolutional neural networks by allowing them to focus on important
information. However, it has certain limitations. In this paper, we offer a
new perspective on the effectiveness of spatial attention: it can alleviate
the problem of convolutional kernel parameter sharing. Even so, the
information contained in the attention map generated by spatial attention is
insufficient for large-size convolutional kernels. We therefore introduce a
new attention mechanism called Receptive-Field Attention (RFA). While previous
attention mechanisms such as the Convolutional Block Attention Module (CBAM)
and Coordinate Attention (CA) only focus on spatial features, they cannot fully
address the issue of convolutional kernel parameter sharing. In contrast, RFA
not only focuses on the receptive-field spatial feature but also provides
effective attention weights for large-size convolutional kernels. The
Receptive-Field Attention convolutional operation (RFAConv), derived from RFA,
offers a new way to replace the standard convolution operation. It adds an
almost negligible overhead in computational cost and parameters, while
significantly improving network performance. We conducted a series of
experiments on ImageNet-1k, MS COCO, and VOC datasets, which demonstrated the
superiority of our approach in various tasks including classification, object
detection, and semantic segmentation. Importantly, we believe it is time for
current spatial attention mechanisms to shift their focus from spatial
features to receptive-field spatial features, thereby further improving
network performance. The code and pre-trained models for the relevant tasks
can be found at
https://github.com/Liuchen1997/RFAConv. (14 pages, 5 figures)
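The core idea of receptive-field attention can be sketched as follows. This is a simplified plain-NumPy illustration, not the paper's implementation: the attention score for each position in a receptive field is assumed here to come from a channel-wise mean, and the loop-based convolution is for clarity only.

```python
import numpy as np

def rfa_conv2d(x, weights, k=3):
    """Sketch of a receptive-field attention convolution.

    x: input of shape (C, H, W); weights: kernel of shape (C_out, C, k, k).
    Each k x k receptive field receives its own softmax attention weights,
    so the features entering the kernel are no longer identically shared
    across spatial positions.
    """
    C, H, W = x.shape
    C_out = weights.shape[0]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((C_out, H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + k, j:j + k]          # (C, k, k) receptive field
            # hypothetical scoring: summarize each of the k*k positions
            scores = patch.mean(axis=0).reshape(-1)  # (k*k,)
            attn = np.exp(scores - scores.max())
            attn = (attn / attn.sum()).reshape(k, k) # softmax over positions
            weighted = patch * attn                  # attention-weighted features
            out[:, i, j] = np.tensordot(
                weights, weighted, axes=([1, 2, 3], [0, 1, 2])
            )
    return out
```

Because the attention weights differ from one receptive field to the next, the effective parameters applied at each position are no longer fully shared, which is the limitation of standard convolution that RFA targets.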