MINT: Multiplier-less Integer Quantization for Spiking Neural Networks
We propose Multiplier-less INTeger (MINT) quantization, an efficient uniform
quantization scheme for the weights and membrane potentials in spiking neural
networks (SNNs). Unlike prior SNN quantization works, MINT quantizes the
memory-hungry membrane potentials to extremely low precision (2-bit) to
significantly reduce the total memory footprint. Additionally, MINT
quantization shares the quantization scaling factor between the weights and
membrane potentials, eliminating the need for multipliers that are necessary
for vanilla uniform quantization. Experimental results demonstrate that our
proposed method achieves accuracy that matches the full-precision models and
other state-of-the-art SNN quantization works while outperforming them on total
memory footprint and hardware cost at deployment. For instance, our 2-bit MINT VGG-16 achieves 90.6% accuracy on CIFAR-10 with an approximately 93.8% reduction in total memory footprint relative to the full-precision model, while reducing computation energy by 90% compared to vanilla uniform quantization at deployment.
Comment: 6 pages. Accepted to the 29th Asia and South Pacific Design Automation Conference (ASP-DAC 2024).
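The multiplier-less property follows from the shared scaling factor: once weights and membrane potentials live on the same integer grid, the LIF accumulation can stay in integer arithmetic without a rescaling multiply. The NumPy sketch below illustrates that idea only; the max-based calibration, threshold value, and reset scheme are assumptions, not the exact MINT procedure.

```python
import numpy as np

def quantize(x, scale, num_bits):
    """Uniform signed-integer quantization: round(x / scale), clipped to range."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(8, 16))    # synaptic weights of one layer
u = rng.normal(0.0, 0.1, size=8)          # membrane potentials of 8 neurons

# Shared scaling factor for weights AND membrane potentials (the key idea in
# the abstract); this max-based calibration is an assumption for illustration.
scale = max(np.abs(w).max(), np.abs(u).max()) / (2 ** (2 - 1) - 1)

w_q = quantize(w, scale, num_bits=2)      # 2-bit integer weights
u_q = quantize(u, scale, num_bits=2)      # 2-bit integer membrane potentials

# One LIF time step in pure integer arithmetic: because w_q and u_q share the
# same scale, accumulation needs no rescaling multiplier, unlike vanilla
# uniform quantization where each operand carries its own scale.
spikes_in = rng.integers(0, 2, size=16)               # binary input spikes
u_acc = u_q + w_q @ spikes_in                         # integer-only accumulation
theta_q = 1                                           # hypothetical integer threshold
spikes_out = (u_acc >= theta_q).astype(np.int32)      # fire
u_q = np.clip(np.where(spikes_out == 1, 0, u_acc), -2, 1)   # reset and re-clip to 2 bits
```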
Sharing Leaky-Integrate-and-Fire Neurons for Memory-Efficient Spiking Neural Networks
Spiking Neural Networks (SNNs) have gained increasing attention as
energy-efficient neural networks owing to their binary and asynchronous
computation. However, their non-linear activation, the
Leaky-Integrate-and-Fire (LIF) neuron, requires additional memory to store a
membrane voltage that captures the temporal dynamics of spikes. Although the
memory cost of LIF neurons increases significantly as the input
dimension grows, techniques to reduce this memory have not
been explored so far. To address this, we propose a simple and effective
solution, EfficientLIF-Net, which shares the LIF neurons across different
layers and channels. Our EfficientLIF-Net achieves accuracy comparable to
standard SNNs while bringing up to ~4.3X forward memory efficiency and ~21.9X
backward memory efficiency for LIF neurons. We conduct experiments on various
datasets including CIFAR10, CIFAR100, TinyImageNet, ImageNet-100, and
N-Caltech101. Furthermore, we show that our approach also offers advantages on
Human Activity Recognition (HAR) datasets, which heavily rely on temporal
information.
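As a rough illustration of the sharing idea, the sketch below keeps one membrane potential per group of channels instead of one per channel, so the stored LIF state shrinks by the group size. The group size, leak, threshold, and the sum-based grouping rule are assumptions for illustration, not the exact EfficientLIF-Net design.

```python
import numpy as np

class SharedLIF:
    """LIF layer whose membrane state is shared across groups of channels,
    so only num_channels / group_size membrane values are stored (a sketch
    of channel-wise sharing; hyperparameters are illustrative assumptions)."""

    def __init__(self, num_channels, group_size, leak=0.9, threshold=1.0):
        assert num_channels % group_size == 0
        self.group_size = group_size
        self.leak = leak
        self.threshold = threshold
        self.u = np.zeros(num_channels // group_size)   # shared membrane state

    def step(self, x):
        # x: per-channel input current at one time step, shape (num_channels,).
        grouped = x.reshape(-1, self.group_size).sum(axis=1)   # fold each group onto its shared neuron
        self.u = self.leak * self.u + grouped
        spikes = (self.u >= self.threshold).astype(np.float32)
        self.u = np.where(spikes > 0, 0.0, self.u)              # hard reset after firing
        return np.repeat(spikes, self.group_size)               # broadcast spikes back to all channels

# 8 channels share 2 membrane values -> 4x less LIF state to store.
lif = SharedLIF(num_channels=8, group_size=4)
out = lif.step(np.random.default_rng(1).normal(0.5, 0.2, size=8))
```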
Workload-Balanced Pruning for Sparse Spiking Neural Networks
Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental
methodology for deploying deep SNNs on resource-constrained edge devices.
Though the existing pruning methods can provide extremely high weight sparsity
for deep SNNs, the high weight sparsity brings a workload imbalance problem.
Specifically, workload imbalance happens when different numbers of
non-zero weights are assigned to hardware units running in parallel, which
results in low hardware utilization and thus imposes longer latency and higher
energy costs. In preliminary experiments, we show that sparse SNNs (98%
weight sparsity) can suffer utilization as low as 59%. To alleviate the
workload imbalance problem, we propose u-Ticket, where we monitor and adjust
the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based
pruning, thus guaranteeing that the final ticket achieves optimal utilization
when deployed onto the hardware. Experiments indicate that our u-Ticket can
guarantee up to 100% hardware utilization, reducing latency by up to 76.9% and
energy cost by up to 63.8% compared to the non-utilization-aware LTH method.
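The utilization problem and the adjustment step can be pictured with a small NumPy sketch: rows of a sparse weight mask are assigned to parallel units, the per-unit non-zero counts give the utilization, and overloaded or underloaded units are pruned or regrown toward the mean load. The round-robin row mapping and the prune-and-regrow rule below are assumptions for illustration, not the exact u-Ticket procedure.

```python
import numpy as np

def unit_loads(mask, num_units):
    """Non-zero weight count per parallel unit, assuming rows of the weight
    mask are assigned to units round-robin (the mapping is an assumption)."""
    return np.array([int(mask[u::num_units].sum()) for u in range(num_units)])

def rebalance(mask, num_units, rng):
    """Prune extra weights on overloaded units and regrow weights on
    underloaded ones until every unit holds roughly the mean load; a sketch
    of the adjust-during-pruning idea, not the exact u-Ticket rule."""
    mask = mask.copy()
    target = int(round(mask.sum() / num_units))
    for u in range(num_units):
        rows = mask[u::num_units]                       # this unit's share of the mask (a view)
        nz, z = np.argwhere(rows == 1), np.argwhere(rows == 0)
        if len(nz) > target:                            # overloaded: prune the surplus
            drop = rng.choice(len(nz), len(nz) - target, replace=False)
            rows[tuple(nz[drop].T)] = 0
        elif len(nz) < target:                          # underloaded: regrow connections
            grow = rng.choice(len(z), target - len(nz), replace=False)
            rows[tuple(z[grow].T)] = 1
    return mask

rng = np.random.default_rng(0)
mask = (rng.random((64, 64)) > 0.98).astype(np.int8)    # ~98% weight sparsity
for m in (mask, rebalance(mask, 8, rng)):
    loads = unit_loads(m, num_units=8)
    print("per-unit loads:", loads, "utilization:", loads.mean() / loads.max())
```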
Rethinking skip connections in Spiking Neural Networks with Time-To-First-Spike coding
Time-To-First-Spike (TTFS) coding in Spiking Neural Networks (SNNs) offers significant advantages in terms of energy efficiency, closely mimicking the behavior of biological neurons. In this work, we delve into the role of skip connections, a widely used concept in Artificial Neural Networks (ANNs), within the domain of SNNs with TTFS coding. Our focus is on two distinct types of skip connection architectures: (1) addition-based skip connections, and (2) concatenation-based skip connections. We find that addition-based skip connections introduce an additional delay in terms of spike timing. On the other hand, concatenation-based skip connections circumvent this delay but produce time gaps between the post-convolution and skip-connection paths, thereby restricting the effective mixing of information from these two paths. To mitigate these issues, we propose a novel approach involving a learnable delay for skip connections in the concatenation-based architecture. This approach successfully bridges the time gap between the convolutional and skip branches, facilitating improved information mixing. We conduct experiments on public datasets including MNIST and Fashion-MNIST, illustrating the advantage of skip connections in TTFS coding architectures. Additionally, we demonstrate the applicability of TTFS coding to tasks beyond image recognition, extending it to scientific machine-learning tasks and broadening the potential uses of SNNs.
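The timing issue and the proposed fix can be sketched with toy spike times: in TTFS coding a value is carried by a single spike time, the skip branch fires much earlier than the convolutional branch, and a (normally learnable) delay added to the skip branch closes the gap before concatenation. The spike times and the hand-set delay below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# In TTFS coding, each value is carried by the time of a single spike: stronger
# inputs fire earlier. Toy spike times (in time steps) at a concatenation point.
conv_times = np.array([5.0, 7.0, 6.0, 8.0])   # branch that passed through a conv layer (later)
skip_times = np.array([1.0, 2.0, 1.5, 2.5])   # skip branch copied from the input (much earlier)

# Concatenating the two branches directly leaves a large time gap, so a
# downstream neuron integrates them in different time windows.
print("time gap without delay:", conv_times.mean() - skip_times.mean())

# A learnable delay d on the skip branch shifts its spikes so both branches
# arrive together; d would be trained by gradient descent, and the value below
# is set by hand purely for illustration.
d = np.full_like(skip_times, 4.5)
aligned = np.concatenate([conv_times, skip_times + d])
print("time gap with delay:", conv_times.mean() - (skip_times + d).mean())
print("concatenated spike times:", aligned)
```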
Efficient human activity recognition with spatio-temporal spiking neural networks
In this study, we explore Human Activity Recognition (HAR), a task that aims to predict individuals' daily activities utilizing time series data obtained from wearable sensors for health-related applications. Although recent research has predominantly employed end-to-end Artificial Neural Networks (ANNs) for feature extraction and classification in HAR, these approaches impose a substantial computational load on wearable devices and exhibit limitations in temporal feature extraction due to their activation functions. To address these challenges, we propose the application of Spiking Neural Networks (SNNs), an architecture inspired by the characteristics of biological neurons, to HAR tasks. SNNs accumulate input activation as presynaptic potential charges and generate a binary spike upon surpassing a predetermined threshold. This unique property facilitates spatio-temporal feature extraction and confers the advantage of low-power computation attributable to binary spikes. We conduct rigorous experiments on three distinct HAR datasets using SNNs, demonstrating that our approach attains competitive or superior performance relative to ANNs, while concurrently reducing energy consumption by up to 94%.
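The mechanism described above, accumulating inputs as membrane charge and emitting a binary spike once a threshold is crossed, can be sketched for a windowed sensor stream as follows. The layer sizes, leak, and threshold are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def lif_forward(x, w, leak=0.9, threshold=1.0):
    """Run one fully connected LIF layer over a wearable-sensor window.
    x: (T, D) sensor readings, w: (D, H) weights. Returns (T, H) binary spikes.
    The leak, threshold, and hard reset are assumptions for illustration."""
    T, H = x.shape[0], w.shape[1]
    u = np.zeros(H)                                # membrane potential keeps temporal state
    spikes = np.zeros((T, H))
    for t in range(T):
        u = leak * u + x[t] @ w                    # accumulate presynaptic charge
        spikes[t] = (u >= threshold)               # emit a binary spike above threshold
        u = np.where(spikes[t] > 0, 0.0, u)        # reset neurons that fired
    return spikes

rng = np.random.default_rng(0)
window = rng.normal(0.0, 0.5, size=(100, 6))       # e.g. 100 steps of accelerometer + gyroscope data
out = lif_forward(window, rng.normal(0.0, 0.3, size=(6, 32)))
print("output firing rate:", out.mean())
```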