Advances in adapting deep convolutional architectures for Spiking Neural
Networks (SNNs) have significantly enhanced image classification performance
and reduced computational burdens. However, Multiplication-Free Inference
(MFI) is incompatible with attention and transformer mechanisms, which are
critical to strong performance on high-resolution vision tasks, and this
limits these gains. To address this, we explore a different path, drawing
inspiration from recent progress in Multi-Layer Perceptrons (MLPs). We
propose a spiking MLP
architecture that uses batch normalization to retain MFI compatibility and
introduces a spiking patch encoding layer to reinforce local feature extraction
capabilities. As a result, we establish an efficient multi-stage spiking MLP
network that effectively blends global receptive fields with local feature
extraction for comprehensive spike-based computation. Without relying on
pre-training or sophisticated SNN training techniques, our network secures a
top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly
trained spiking ResNet-34 by 2.67 percentage points, while also reducing
computational cost, model capacity, and simulation steps. A larger version of
our network rivals the spiking VGG-16 network with 71.64% top-1
accuracy while using a model capacity 2.1 times smaller. Our
findings highlight the potential of our deep SNN architecture to
integrate global and local learning abilities. Interestingly, the trained
receptive field in our network mirrors the activity patterns of cortical cells.

Comment: 11 pages, 6 figures
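
To illustrate why batch normalization can coexist with MFI, the following is a minimal sketch, not the authors' implementation. It assumes a linear layer followed by inference-time batch normalization with fixed statistics, and binary (0/1) input spikes. Under those assumptions the normalization can be folded into the layer's weights and bias offline, so that inference on spikes needs only additions. All function names and numeric values are hypothetical and chosen for illustration.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into the preceding linear layer.

    This runs once, offline, and may use multiplications freely.
    """
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b


def spike_linear(weights, bias, spikes):
    """Apply a linear layer to binary spikes using additions only."""
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for w, s in zip(row, spikes):
            if s:  # a spike is 0 or 1, so the product w * s is w or 0
                acc += w
        out.append(acc)
    return out


def linear_then_batch_norm(weights, bias, gamma, beta, mean, var, spikes, eps=1e-5):
    """Reference path: explicit linear layer followed by batch norm."""
    out = []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        z = sum(w * s for w, s in zip(row, spikes)) + b
        out.append(g * (z - m) / math.sqrt(v + eps) + bt)
    return out


if __name__ == "__main__":
    # Hypothetical layer with 3 inputs and 2 outputs.
    W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
    b = [0.1, -0.2]
    gamma, beta = [1.2, 0.8], [0.0, 0.3]
    mean, var = [0.4, -0.1], [0.9, 1.6]
    spikes = [1, 0, 1]

    W_f, b_f = fold_batch_norm(W, b, gamma, beta, mean, var)
    print(spike_linear(W_f, b_f, spikes))
    print(linear_then_batch_norm(W, b, gamma, beta, mean, var, spikes))
```

The two printed vectors should agree up to floating-point rounding. The folding step itself uses multiplications, but it runs once before deployment; the per-timestep path in `spike_linear` only accumulates weights selected by spikes.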