Spatial attention mechanisms have been widely incorporated into deep
convolutional neural networks (CNNs) to capture long-range dependencies,
significantly boosting performance in computer vision, but they may perform
poorly in medical imaging. Unfortunately, existing efforts often overlook
that long-range dependency capturing is limited in highlighting subtle
lesion regions, and neglect to exploit the potential of multi-scale pixel
context information to improve the representational capability of CNNs. In this
paper, we propose a practical yet lightweight architectural unit, the Pyramid
Pixel Context Recalibration (PPCR) module, which exploits multi-scale pixel
context information to adaptively recalibrate each pixel position in a
pixel-independent manner. PPCR first applies a cross-channel pyramid pooling to
aggregate multi-scale pixel context information, then eliminates the
inconsistency among scales with a well-designed pixel normalization, and finally
estimates per-pixel attention weights via pixel context integration. PPCR can be flexibly plugged
into modern CNNs with negligible overhead. Extensive experiments on five
medical image datasets and CIFAR benchmarks empirically demonstrate the
superiority and generalization of PPCR over state-of-the-art attention methods.
The in-depth analyses explain the inherent behavior of PPCR in the
decision-making process, improving the interpretability of CNNs.
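The three steps described above (cross-channel pyramid pooling, pixel normalization, and pixel context integration) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the channel squeeze, the neighborhood sizes in `kernels`, the spatial standardization, and the plain mean used for integration (in place of any learned combination) are all assumptions made for illustration.

```python
import numpy as np

def local_avg(m, k):
    """Average-pool a (H, W) map over a k x k neighborhood, stride 1,
    with zero padding (a simple stand-in for multi-scale context pooling)."""
    if k == 1:
        return m
    H, W = m.shape
    p = k // 2  # assumes odd k so the window stays centered
    padded = np.pad(m, p)
    out = np.zeros_like(m)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def ppcr_sketch(x, kernels=(1, 3, 5)):
    """Hedged sketch of Pyramid Pixel Context Recalibration.

    x: feature map of shape (C, H, W). `kernels` are hypothetical
    neighborhood sizes for the multi-scale pixel context.
    """
    # Cross-channel squeeze to one spatial map (assumption).
    squeezed = x.mean(axis=0)
    # 1) Pyramid pooling: gather pixel context at several scales.
    ctx = np.stack([local_avg(squeezed, k) for k in kernels])
    # 2) Pixel normalization: standardize each scale over space so
    #    the scales are comparable before integration.
    mu = ctx.mean(axis=(1, 2), keepdims=True)
    sd = ctx.std(axis=(1, 2), keepdims=True) + 1e-5
    ctx = (ctx - mu) / sd
    # 3) Context integration: combine scales (plain mean here) and
    #    squash to a per-pixel attention weight in (0, 1).
    attn = 1.0 / (1.0 + np.exp(-ctx.mean(axis=0)))  # (H, W)
    # Recalibrate: broadcast the per-pixel weight over all channels.
    return x * attn
```

Because the per-pixel weight is computed once and broadcast across channels, the recalibration is pixel-independent and adds negligible overhead, matching the plug-in role described above.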