We revisit the relationship between attention mechanisms and large-kernel
ConvNets in vision transformers and propose a new spatial attention mechanism,
Large Kernel Convolutional Attention (LKCA). It simplifies the attention operation by
replacing it with a single large kernel convolution. LKCA combines the
advantages of convolutional neural networks and vision transformers: a large
receptive field, locality, and parameter sharing. We explain the superiority of
LKCA from both the convolution and attention perspectives, providing
equivalent code implementations for each view. Experiments confirm that the
convolutional and attention implementations of LKCA perform equivalently. We
conduct extensive experiments on the LKCA variant of ViT in both classification
and segmentation tasks, which demonstrate that LKCA achieves competitive
performance in visual tasks. Our code will be
made publicly available at https://github.com/CatworldLee/LKCA.
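
To make the core idea concrete, below is a minimal PyTorch sketch of the convolutional view: the spatial mixing normally done by self-attention in a ViT block is replaced by a single large-kernel depthwise convolution over the token grid. This is an illustrative sketch under our own assumptions (a square token grid without a class token; the module name `LKCASketch` and the kernel size of 13 are hypothetical choices), not the authors' released implementation, which is available at the repository above.

```python
import torch
import torch.nn as nn


class LKCASketch(nn.Module):
    """Sketch: spatial attention replaced by one large-kernel depthwise conv."""

    def __init__(self, dim: int, kernel_size: int = 13):
        super().__init__()
        # Depthwise convolution: one large kernel per channel, shared across
        # all spatial positions (parameter sharing + large receptive field).
        self.conv = nn.Conv2d(
            dim, dim,
            kernel_size=kernel_size,
            padding=kernel_size // 2,
            groups=dim,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) token sequence from a ViT-style block.
        b, n, c = x.shape
        h = w = int(n ** 0.5)  # assumes a square grid, no class token
        x = x.transpose(1, 2).reshape(b, c, h, w)  # tokens -> 2D feature map
        x = self.conv(x)                           # large-kernel spatial mixing
        return x.reshape(b, c, n).transpose(1, 2)  # back to token sequence


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 64)     # e.g. a 14x14 patch grid, 64-dim
    print(LKCASketch(64)(tokens).shape)  # torch.Size([2, 196, 64])
```

Because the depthwise kernel is shared across all spatial positions, this view makes the locality, parameter-sharing, and large-receptive-field properties explicit; the attention view described in the abstract expresses the same operation and, per the experiments, performs equivalently.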