Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements
in energy consumption and latency when performing inference on deep learning
workloads. Error backpropagation is presently regarded as the most effective
method for training SNNs but, in a twist of irony, training SNNs on modern
graphics processing units (GPUs) is more expensive than training non-spiking
networks. Graphcore's Intelligence Processing Units (IPUs) balance the
parallelized nature of deep learning workloads with the sequential, reusable,
and sparsified operations prevalent in SNN training. IPUs adopt
multi-instruction multi-data (MIMD) parallelism by
running individual processing threads on smaller data blocks, which is a
natural fit for the sequential, non-vectorized steps required to solve spiking
neuron dynamical state equations. We present an IPU-optimized release of our
custom SNN Python package, snnTorch, which exploits fine-grained parallelism by
using low-level, pre-compiled custom operations to accelerate the irregular and
sparse data access patterns characteristic of SNN training workloads.
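To make the sequential nature of these state updates concrete, the following is
a minimal sketch of a discrete-time leaky integrate-and-fire simulation loop
written with snnTorch; the layer sizes, decay rate, and number of time steps are
illustrative placeholders rather than values taken from this work.

import torch
import torch.nn as nn
import snntorch as snn

# Illustrative sizes and constants (placeholders, not values from the paper)
num_inputs, num_hidden, num_steps = 784, 128, 25
beta = 0.9  # membrane potential decay factor per time step

fc = nn.Linear(num_inputs, num_hidden)
lif = snn.Leaky(beta=beta)  # leaky integrate-and-fire neuron layer

x = torch.rand(num_steps, num_inputs)  # placeholder input over time
mem = lif.init_leaky()                 # initial membrane potential
spk_rec = []

# The membrane state carried from step to step makes this loop inherently
# sequential: each update depends on the result of the previous one.
for step in range(num_steps):
    cur = fc(x[step])          # dense, vectorizable part
    spk, mem = lif(cur, mem)   # stateful, step-by-step part
    spk_rec.append(spk)

spk_rec = torch.stack(spk_rec)  # shape: [num_steps, num_hidden]

The dense matrix multiply vectorizes well on GPUs, while the stateful update is
the kind of fine-grained, sequential work that the abstract argues maps
naturally onto the IPU's MIMD processing threads.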
We provide a rigorous performance assessment across a suite of commonly used
spiking neuron models, and propose methods to further reduce training run-time
via half-precision training. By amortizing the cost of sequential processing
into vectorizable population codes, we ultimately demonstrate the potential for
integrating domain-specific accelerators with the next generation of neural
networks.
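As a rough sketch of the population-coding idea mentioned above, assume a
hypothetical read-out in which each class is represented by a population of
output neurons and predictions are decoded from aggregate spike counts; the
population size and decoding scheme here are illustrative assumptions, not the
exact configuration used in the paper.

import torch

# Hypothetical configuration: pop_size output neurons per class. Widening the
# output layer adds vectorizable work per time step, which can offset the cost
# of the sequential loop by allowing fewer time steps overall.
num_classes, pop_size, num_steps = 10, 50, 25

# Placeholder recorded output spikes, shape [num_steps, num_classes * pop_size]
spk_out = torch.rand(num_steps, num_classes * pop_size).bernoulli()

counts = spk_out.sum(dim=0)                                # spike count per output neuron
class_counts = counts.view(num_classes, pop_size).sum(-1)  # aggregate within each population
prediction = class_counts.argmax()                         # class with the most spikes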