Efficient Real Time Recurrent Learning through combined activity and parameter sparsity
Backpropagation through time (BPTT) is the standard algorithm for training
recurrent neural networks (RNNs), which requires separate simulation phases for
the forward and backward passes for inference and learning, respectively.
Moreover, BPTT requires storing the complete history of network states between
phases, with memory consumption growing proportional to the input sequence
length. This makes BPTT unsuited for online learning and presents a challenge
for implementation on low-resource real-time systems. Real-Time Recurrent
Learning (RTRL) allows online learning, and the growth of required memory is
independent of sequence length. However, RTRL suffers from exceptionally high
computational costs that grow proportional to the fourth power of the state
size, making RTRL computationally intractable for all but the smallest of
networks. In this work, we show that recurrent networks exhibiting high
activity sparsity can reduce the computational cost of RTRL. Moreover,
combining activity and parameter sparsity can lead to significant enough
savings in computational and memory costs to make RTRL practical. Unlike
previous work, this improvement in the efficiency of RTRL can be achieved
without using any approximations for the learning process.
Comment: Published as a workshop paper at the ICLR 2023 Workshop on Sparsity in Neural Networks.
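To make the cost argument concrete, below is a minimal NumPy sketch of exact RTRL for a vanilla RNN: the influence tensor P[i, j, k] = ∂h_t[i]/∂W[j, k] is carried forward online, its recursion costs O(n^4) per step in the dense case, and rows belonging to units with zero activation derivative (activity sparsity) as well as pruned recurrent weights (parameter sparsity) drop out of the update. The thresholded activation, weight mask, and toy loss are illustrative assumptions, not the architecture studied in the paper.

```python
# Minimal sketch of exact RTRL for a vanilla RNN, showing where the O(n^4) cost
# comes from and how activity and parameter sparsity remove work. All choices
# here (activation, mask, loss) are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 32, 16, 10                       # hidden size, input size, sequence length
theta = 0.5                                # activation threshold -> sparse activity

W = rng.normal(0, 0.3, (n, n)) * (rng.random((n, n)) < 0.2)   # ~80% pruned recurrent weights
U = rng.normal(0, 0.3, (n, d))

def f(a):                                  # thresholded ReLU: most units stay silent
    return np.where(a > theta, a - theta, 0.0)

def df(a):                                 # derivative is exactly 0 for silent units
    return (a > theta).astype(float)

h = np.zeros(n)
P = np.zeros((n, n, n))                    # P[i, j, k] = d h_t[i] / d W[j, k]
grad_W = np.zeros((n, n))

for t in range(T):
    x = rng.normal(0, 1.0, d)
    a = W @ h + U @ x
    D = df(a)                              # 0/1 per unit: only active units matter

    # Influence recursion: P_t[i,j,k] = D[i] * (W[i,:] @ P_{t-1}[:,j,k] + [i==j] * h_{t-1}[k]).
    # Rows with D[i] = 0 vanish, and pruned entries of W contribute nothing.
    P = np.einsum('i,im,mjk->ijk', D, W, P)          # O(n^4) when dense, far less when sparse
    active = np.nonzero(D)[0]
    P[active, active, :] += h                        # immediate term, active rows only

    h = f(a)
    grad_W += np.einsum('i,ijk->jk', h, P)           # online gradient of L_t = 0.5 * ||h_t||^2

print(f"active units (last step): {D.mean():.2%}, weight density: {np.mean(W != 0):.2%}")
print("accumulated online gradient norm:", float(np.linalg.norm(grad_W)))
```

In a genuinely event-driven implementation only the active rows of P and the non-zero entries of W would be stored and touched; the dense einsum above is kept for readability.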
Evaluating modular neuroevolution in robotic keepaway soccer
Keepaway is a simpler subtask of robot soccer in which three 'keepers' attempt to keep possession of the ball while a 'taker' tries to steal it from them. This is a less complex task than full robot soccer and lends itself well as a testbed for multi-agent systems. This thesis presents a comprehensive evaluation of various learning methods using neuroevolution with Enforced Sub-Populations (ESP) in the RoboCup soccer simulator. Both single- and multi-component ESP are evaluated using various learning methods on homogeneous and heterogeneous teams of agents. In particular, the effectiveness of modularity and task decomposition for evolving keepaway teams is evaluated. It is shown that in the RoboCup soccer simulator, homogeneous agents controlled by monolithic networks perform the best. More complex learning approaches like layered learning, concurrent layered learning and co-evolution decrease the performance, as does making the agents heterogeneous. The results are also compared with previous results in the keepaway domain.
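For readers unfamiliar with ESP, the sketch below illustrates the core loop as commonly described: each hidden neuron is evolved in its own subpopulation, candidate networks are assembled by drawing one chromosome per subpopulation, and the resulting fitness is credited back to the chromosomes that participated. The keepaway task and the RoboCup simulator are replaced by a toy XOR fitness purely for brevity; the network size, mutation noise, and selection scheme are illustrative choices, not those of the thesis.

```python
# Toy sketch of Enforced Sub-Populations (ESP): one subpopulation per hidden
# neuron, networks assembled by drawing one chromosome from each subpopulation,
# fitness credited back to the drawn chromosomes. XOR stands in for keepaway.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, pop_size, gens = 2, 4, 20, 60

# Each chromosome encodes one hidden neuron: input weights, bias, output weight.
subpops = [rng.normal(0, 1.0, (pop_size, n_in + 2)) for _ in range(n_hidden)]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)

def evaluate(neurons):
    """Assemble a network from one chromosome per subpopulation and score it."""
    Win = np.array([c[:n_in] for c in neurons])        # (n_hidden, n_in)
    bias = np.array([c[n_in] for c in neurons])        # (n_hidden,)
    Wout = np.array([c[n_in + 1] for c in neurons])    # (n_hidden,)
    out = np.tanh(X @ Win.T + bias) @ Wout
    return -np.mean((out - y) ** 2)                    # higher is better

for gen in range(gens):
    fitness = [np.zeros(pop_size) for _ in range(n_hidden)]
    counts = [np.zeros(pop_size) for _ in range(n_hidden)]
    for _ in range(10 * pop_size):                     # random trials per generation
        picks = [rng.integers(pop_size) for _ in range(n_hidden)]
        score = evaluate([subpops[i][p] for i, p in enumerate(picks)])
        for i, p in enumerate(picks):
            fitness[i][p] += score
            counts[i][p] += 1
    for i in range(n_hidden):                          # evolve each subpopulation separately
        avg = np.where(counts[i] > 0, fitness[i] / np.maximum(counts[i], 1), -np.inf)
        order = np.argsort(avg)[::-1]
        elite = subpops[i][order[: pop_size // 4]]
        n_child = pop_size - len(elite)
        children = elite[rng.integers(len(elite), size=n_child)] \
                   + rng.normal(0, 0.2, (n_child, n_in + 2))
        subpops[i] = np.vstack([elite, children])

best = [sp[0] for sp in subpops]
print("negative MSE of best assembled XOR network:", round(evaluate(best), 4))
```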
Fast learning without synaptic plasticity in spiking neural networks
Spiking neural networks are of high current interest, both from the perspective of modelling neural networks of the brain and for porting their fast learning capability and energy efficiency into neuromorphic hardware. But so far we have not been able to reproduce the fast learning capabilities of the brain in spiking neural networks. Biological data suggest that a synergy of synaptic plasticity on a slow time scale with network dynamics on a faster time scale is responsible for fast learning capabilities of the brain. We show here that a suitable orchestration of this synergy between synaptic plasticity and network dynamics does in fact reproduce fast learning capabilities of generic recurrent networks of spiking neurons. This points to the important role of recurrent connections in spiking networks, since these are necessary for enabling salient network dynamics. We show more specifically that the proposed synergy enables synaptic weights to encode more general information such as priors and task structures, since moment-to-moment processing of new information can be delegated to the network dynamics.
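The following toy simulation illustrates only the underlying idea that network dynamics alone can carry recent information when synaptic weights are frozen: a recurrent network of leaky integrate-and-fire neurons is driven by one of two brief cues, and the network state remains cue-dependent after the input has ended. All parameters (network size, time constants, thresholds) are illustrative assumptions, unrelated to the trained models of the paper.

```python
# Recurrent LIF network with frozen weights (no plasticity). Two different brief
# cues leave the network in distinguishable states after the cues end, showing
# information carried purely by network dynamics. Toy parameters throughout.
import numpy as np

n, steps, dt = 200, 150, 1.0               # neurons, time steps, ms per step
tau_mem, v_thresh, v_reset = 20.0, 1.0, 0.0
rng = np.random.default_rng(2)

W = rng.normal(0, 1.0 / np.sqrt(n), (n, n))    # fixed recurrent weights, never updated
np.fill_diagonal(W, 0.0)
cues = rng.normal(0, 1.0, (2, n))              # two different input patterns

def run(cue_vec, noise_seed=7):
    noise_rng = np.random.default_rng(noise_seed)   # identical noise across runs
    v, spikes, trace = np.zeros(n), np.zeros(n), []
    for t in range(steps):
        drive = cue_vec if 20 <= t < 40 else 0.0    # brief cue, then silence
        i_syn = W @ spikes + drive + noise_rng.normal(0, 0.05, n)
        v = v + dt / tau_mem * (-v + i_syn)         # leaky membrane integration
        spikes = (v >= v_thresh).astype(float)
        v = np.where(spikes > 0, v_reset, v)        # reset after a spike
        trace.append(v.copy())
    return np.array(trace)

a, b = run(cues[0]), run(cues[1])
for t in (19, 45, 60, 80):                          # before the cue and after its offset
    print(f"t={t:3d}  state difference between cue A and cue B: {np.linalg.norm(a[t] - b[t]):.3f}")
```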
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Artificial neural networks open up unprecedented machine learning
capabilities at the cost of ever growing computational requirements.
Sparsifying the parameters, often achieved through weight pruning, has been
identified as a powerful technique to reduce the number of model parameters
and the computational operations of neural networks. Yet, sparse
activations, while omnipresent in both biological neural networks and deep
learning systems, have not been fully utilized as a compression technique in
deep learning. Moreover, the interaction between sparse activations and weight
pruning is not fully understood. In this work, we demonstrate that activity
sparsity can compose multiplicatively with parameter sparsity in a recurrent
neural network model based on the GRU that is designed to be activity sparse.
We achieve up to reduction of computation while maintaining
perplexities below on the Penn Treebank language modeling task. This
magnitude of reduction has not been achieved previously with solely sparsely
connected LSTMs, and the language modeling performance of our model has not
been achieved previously with any sparsely activated recurrent neural networks
or spiking neural networks. Neuromorphic computing devices are especially good
at taking advantage of the dynamic activity sparsity, and our results provide
strong evidence that making deep learning models activity sparse and porting
them to neuromorphic devices can be a viable strategy that does not compromise
on task performance. Our results also drive further convergence of methods from
deep learning and neuromorphic computing for efficient machine learning.
Comment: Accepted to the First MLNCP Workshop @ NeurIPS 2023.
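The multiplicative composition of the two sparsities can be seen with a simple operation count: in an event-driven recurrent layer, a unit's output is propagated only when it emits an event, and only through the connections that survive pruning, so the expected recurrent MACs per step scale with the product of activity density and weight density. The sketch below checks this accounting for illustrative densities; the figures are not the paper's reported numbers.

```python
# Back-of-the-envelope sketch of how activity sparsity and weight sparsity
# compose multiplicatively in a recurrent layer. Densities are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 1024                                    # hidden units in the recurrent layer
weight_density = 0.2                        # 80% of recurrent weights pruned
activity = 0.1                              # 10% of units emit an event per step

mask = rng.random((n, n)) < weight_density  # fixed pruning mask
active = rng.random(n) < activity           # which units fired this step

dense_macs = n * n
# Event-driven update: each active unit j accumulates its output only into the
# targets it is still connected to (the surviving entries of column j).
event_macs = int(mask[:, active].sum())

print(f"dense MACs/step:        {dense_macs:,}")
print(f"event-driven MACs/step: {event_macs:,}")
print(f"measured reduction:     {dense_macs / event_macs:.1f}x "
      f"(expected ~{1 / (weight_density * activity):.0f}x)")
```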
Pattern representation and recognition with accelerated analog neuromorphic systems
Despite being originally inspired by the central nervous system, artificial
neural networks have diverged from their biological archetypes as they have
been remodeled to fit particular tasks. In this paper, we review several
possibilities for reverse-mapping these architectures to biologically more realistic
spiking networks with the aim of emulating them on fast, low-power neuromorphic
hardware. Since many of these devices employ analog components, which cannot be
perfectly controlled, finding ways to compensate for the resulting effects
represents a key challenge. Here, we discuss three different strategies to
address this problem: the addition of auxiliary network components for
stabilizing activity, the utilization of inherently robust architectures and a
training method for hardware-emulated networks that functions without perfect
knowledge of the system's dynamics and parameters. For all three scenarios, we
corroborate our theoretical considerations with experimental results on
accelerated analog neuromorphic platforms.
Comment: Accepted at ISCAS 2017.
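A common strategy in this setting, sketched here generically rather than as the specific method of the paper, is chip-in-the-loop training: the forward pass is executed on the imperfect analog substrate, and weight updates are computed from the measured activations with an idealized gradient model, so no exact knowledge of the device's gains, offsets, or noise is required. The "device" below is a toy software stand-in with hidden mismatch parameters.

```python
# Hedged sketch of chip-in-the-loop training: forward passes run on a simulated
# "analog device" with unknown per-neuron gain/offset mismatch and noise, while
# updates use only measured outputs and an idealized gradient. Toy task/model.
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, n_samples = 8, 3, 512

# Fixed but unknown device mismatch (never exposed to the training loop).
gain = rng.normal(1.0, 0.15, n_out)
offset = rng.normal(0.0, 0.10, n_out)

def device_forward(W, x):
    """Pretend analog substrate: distorted, noisy affine layer + saturation."""
    a = gain * (W @ x) + offset + rng.normal(0, 0.02, n_out)
    return np.tanh(a)

# Toy regression task: match the outputs of a clean target network.
W_target = rng.normal(0, 1.0, (n_out, n_in))
X = rng.normal(0, 1.0, (n_samples, n_in))
Y = np.tanh(X @ W_target.T)

W = rng.normal(0, 0.1, (n_out, n_in))
lr = 0.05
for epoch in range(30):
    err_total = 0.0
    for x, y in zip(X, Y):
        y_meas = device_forward(W, x)               # measured on the "device"
        err = y_meas - y
        # Idealized gradient: assumes unit gain and tanh'(a) ~ 1 - y_meas^2,
        # using only measured quantities, with no knowledge of gain or offset.
        W -= lr * np.outer(err * (1 - y_meas ** 2), x)
        err_total += float(np.mean(err ** 2))
    if epoch % 10 == 0 or epoch == 29:
        print(f"epoch {epoch:2d}  mean device MSE: {err_total / n_samples:.4f}")
```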
Block-local learning with probabilistic latent representations
The ubiquitous backpropagation algorithm requires sequential updates across
blocks of a network, introducing a locking problem. Moreover, backpropagation
relies on the transpose of weight matrices to calculate updates, introducing a
weight transport problem across blocks. Both these issues prevent efficient
parallelisation and horizontal scaling of models across devices. We propose a
new method that introduces a twin network that propagates information backwards
from the targets to the input to provide auxiliary local losses. Forward and
backward propagation can work in parallel and with different sets of weights,
addressing the problems of weight transport and locking. Our approach derives
from a statistical interpretation of end-to-end training which treats
activations of network layers as parameters of probability distributions. The
resulting learning framework uses these parameters locally to assess the
matching between forward and backward information. Error backpropagation is
then performed locally within each block, leading to `block-local' learning.
Several previously proposed alternatives to error backpropagation emerge as
special cases of our model. We present results on various tasks and
architectures, including transformers, demonstrating state-of-the-art
performance using block-local learning. These results provide a new principled
framework to train very large networks in a distributed setting and can also be
applied in neuromorphic systems.
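The sketch below illustrates the general mechanism in its simplest least-squares form, not the paper's probabilistic formulation: a backward twin network maps targets back toward the input to supply each forward block with a local target, each block minimizes a purely local loss, and no gradients or transposed weight matrices cross block boundaries. The block sizes and the toy regression task are arbitrary choices for illustration.

```python
# Generic sketch of block-local learning with a backward "twin": the twin maps
# the target back to a local target for the first block, so every update uses
# only local quantities. A plain least-squares variant, not the paper's method.
import numpy as np

rng = np.random.default_rng(5)
d_in, d_mid, d_out, n = 16, 32, 4, 2000

# Toy regression task defined by a random teacher network.
T1 = rng.normal(0, 1.0, (d_mid, d_in))
T2 = rng.normal(0, 1.0, (d_out, d_mid))
X = rng.normal(0, 1.0, (n, d_in))
Y = np.tanh(np.tanh(X @ T1.T) @ T2.T)

F1 = rng.normal(0, 0.1, (d_mid, d_in))   # forward block 1
F2 = rng.normal(0, 0.1, (d_out, d_mid))  # forward block 2
B2 = rng.normal(0, 0.1, (d_mid, d_out))  # backward twin for block 2
lr = 0.05

for epoch in range(200):
    H = np.tanh(X @ F1.T)                 # block 1 forward
    O = np.tanh(H @ F2.T)                 # block 2 forward
    H_tgt = np.tanh(Y @ B2.T)             # twin propagates the target backwards

    # Purely local losses and updates: nothing crosses a block boundary.
    e2 = (O - Y) * (1 - O ** 2)           # block 2 matches the true target
    e1 = (H - H_tgt) * (1 - H ** 2)       # block 1 matches its local target
    F2 -= lr * e2.T @ H / n
    F1 -= lr * e1.T @ X / n

    # The twin is trained locally to map block 2's outputs back to H.
    R = np.tanh(O @ B2.T)
    eb = (R - H) * (1 - R ** 2)
    B2 -= lr * eb.T @ O / n

    if epoch % 50 == 0 or epoch == 199:
        print(f"epoch {epoch:3d}   task MSE: {np.mean((O - Y) ** 2):.4f}")
```

Because the forward and backward paths use separate weights (B2 is not the transpose of F2) and each block's update depends only on its own inputs and its local target, the blocks could be trained in parallel on different devices.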
Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models
Activity and parameter sparsity are two standard methods of making neural
networks computationally more efficient. Event-based architectures such as
spiking neural networks (SNNs) naturally exhibit activity sparsity, and many
methods exist to sparsify their connectivity by pruning weights. While the
effect of weight pruning on feed-forward SNNs has been previously studied for
computer vision tasks, the effects of pruning for complex sequence tasks like
language modeling are less well studied since SNNs have traditionally struggled
to achieve meaningful performance on these tasks. Using a recently published
SNN-like architecture that works well on small-scale language modeling, we
study the effects of weight pruning when combined with activity sparsity.
Specifically, we study the trade-off between the multiplicative efficiency
gains the combination affords and its effect on task performance for language
modeling. To dissect the effects of the two sparsities, we conduct a
comparative analysis between densely activated models and sparsely activated
event-based models across varying degrees of connectivity sparsity. We
demonstrate that sparse activity and sparse connectivity complement each other
without a proportional drop in task performance for an event-based neural
network trained on the Penn Treebank and WikiText-2 language modeling datasets.
Our results suggest sparsely connected event-based neural networks are
promising candidates for effective and efficient sequence modeling.
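As a rough illustration of the kind of comparative analysis described above, the sketch below magnitude-prunes the recurrent weights of a simple thresholded, event-emitting recurrent cell at several sparsity levels, measures the resulting activity sparsity on random input, and reports the combined reduction in recurrent operations. The cell, thresholds, and data are illustrative stand-ins, not the published architecture or datasets.

```python
# Magnitude pruning combined with measured activity sparsity in a toy
# event-emitting recurrent cell; reports expected recurrent MACs per step.
import numpy as np

rng = np.random.default_rng(6)
n, d, T, theta = 256, 64, 200, 1.0          # hidden size, input size, steps, event threshold

W_h = rng.normal(0, 1.0 / np.sqrt(n), (n, n))
W_x = rng.normal(0, 1.0 / np.sqrt(d), (n, d))
X = rng.normal(0, 1.0, (T, d))

def run(weight_sparsity):
    # Magnitude pruning: drop the smallest recurrent weights.
    k = int(weight_sparsity * W_h.size)
    thresh = np.sort(np.abs(W_h).ravel())[k] if k > 0 else 0.0
    Wp = np.where(np.abs(W_h) >= thresh, W_h, 0.0)

    h, events, event_frac = np.zeros(n), np.zeros(n), []
    for x in X:
        # Units only communicate when their state crosses the threshold.
        h = 0.9 * h + np.tanh(Wp @ events + W_x @ x)
        events = np.where(h > theta, h, 0.0)
        h = np.where(h > theta, h - theta, h)        # soft reset after an event
        event_frac.append(np.mean(events > 0))

    act_density = float(np.mean(event_frac))
    w_density = float(np.mean(Wp != 0))
    eff_ops = act_density * w_density * n * n        # expected recurrent MACs per step
    return w_density, act_density, eff_ops

dense_ops = n * n
print(" weight density | activity | recurrent MACs/step | reduction")
for s in (0.0, 0.5, 0.8, 0.9):
    wd, ad, ops = run(s)
    print(f"     {wd:5.2f}      |  {ad:5.2f}   |      {ops:9.0f}      | {dense_ops / max(ops, 1):5.1f}x")
```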
