Efficient Real Time Recurrent Learning through combined activity and parameter sparsity
Backpropagation through time (BPTT) is the standard algorithm for training
recurrent neural networks (RNNs), which requires separate simulation phases for
the forward and backward passes for inference and learning, respectively.
Moreover, BPTT requires storing the complete history of network states between
phases, with memory consumption growing proportional to the input sequence
length. This makes BPTT unsuited for online learning and presents a challenge
for implementation on low-resource real-time systems. Real-Time Recurrent
Learning (RTRL) allows online learning, and the growth of required memory is
independent of sequence length. However, RTRL suffers from exceptionally high
computational costs that grow proportional to the fourth power of the state
size, making RTRL computationally intractable for all but the smallest of
networks. In this work, we show that recurrent networks exhibiting high
activity sparsity can reduce the computational cost of RTRL. Moreover,
combining activity and parameter sparsity can lead to significant enough
savings in computational and memory costs to make RTRL practical. Unlike
previous work, this improvement in the efficiency of RTRL can be achieved
without using any approximations for the learning process.
Comment: Published as a workshop paper at the ICLR 2023 Workshop on Sparsity in Neural Networks.
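To make the cost argument concrete, below is a minimal NumPy sketch of exact RTRL for a vanilla RNN: the influence tensor P[i, j, k] = ∂h_t[i]/∂W[j, k] is carried forward online, its recursion costs O(n^4) per step in the dense case, and rows belonging to units with zero activation derivative (activity sparsity) as well as pruned recurrent weights (parameter sparsity) drop out of the update. The thresholded activation, weight mask, and toy loss are illustrative assumptions, not the architecture studied in the paper.

```python
# Minimal sketch of exact RTRL for a vanilla RNN, showing where the O(n^4) cost
# comes from and how activity and parameter sparsity remove work. All choices
# here (activation, mask, loss) are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 32, 16, 10                       # hidden size, input size, sequence length
theta = 0.5                                # activation threshold -> sparse activity

W = rng.normal(0, 0.3, (n, n)) * (rng.random((n, n)) < 0.2)   # ~80% pruned recurrent weights
U = rng.normal(0, 0.3, (n, d))

def f(a):                                  # thresholded ReLU: most units stay silent
    return np.where(a > theta, a - theta, 0.0)

def df(a):                                 # derivative is exactly 0 for silent units
    return (a > theta).astype(float)

h = np.zeros(n)
P = np.zeros((n, n, n))                    # P[i, j, k] = d h_t[i] / d W[j, k]
grad_W = np.zeros((n, n))

for t in range(T):
    x = rng.normal(0, 1.0, d)
    a = W @ h + U @ x
    D = df(a)                              # 0/1 per unit: only active units matter

    # Influence recursion: P_t[i,j,k] = D[i] * (W[i,:] @ P_{t-1}[:,j,k] + [i==j] * h_{t-1}[k]).
    # Rows with D[i] = 0 vanish, and pruned entries of W contribute nothing.
    P = np.einsum('i,im,mjk->ijk', D, W, P)          # O(n^4) when dense, far less when sparse
    active = np.nonzero(D)[0]
    P[active, active, :] += h                        # immediate term, active rows only

    h = f(a)
    grad_W += np.einsum('i,ijk->jk', h, P)           # online gradient of L_t = 0.5 * ||h_t||^2

print(f"active units (last step): {D.mean():.2%}, weight density: {np.mean(W != 0):.2%}")
print("accumulated online gradient norm:", float(np.linalg.norm(grad_W)))
```

In a genuinely event-driven implementation only the active rows of P and the non-zero entries of W would be stored and touched; the dense einsum above is kept for readability.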
Evaluating modular neuroevolution in robotic keepaway soccer
Keepaway is a simpler subtask of robot soccer in which three 'keepers' attempt to keep possession of the ball while a 'taker' tries to steal it from them. This is a less complex task than full robot soccer and lends itself well as a testbed for multi-agent systems. This thesis presents a comprehensive evaluation of various learning methods using neuroevolution with Enforced Sub-Populations (ESP) in the RoboCup soccer simulator. Both single- and multi-component ESP are evaluated using various learning methods on homogeneous and heterogeneous teams of agents. In particular, the effectiveness of modularity and task decomposition for evolving keepaway teams is evaluated. It is shown that in the RoboCup soccer simulator, homogeneous agents controlled by monolithic networks perform the best. More complex learning approaches like layered learning, concurrent layered learning and co-evolution decrease the performance, as does making the agents heterogeneous. The results are also compared with previous results in the keepaway domain.
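For readers unfamiliar with ESP, the sketch below illustrates the core loop as commonly described: each hidden neuron is evolved in its own subpopulation, candidate networks are assembled by drawing one chromosome per subpopulation, and the resulting fitness is credited back to the chromosomes that participated. The keepaway task and the RoboCup simulator are replaced by a toy XOR fitness purely for brevity; the network size, mutation noise, and selection scheme are illustrative choices, not those of the thesis.

```python
# Toy sketch of Enforced Sub-Populations (ESP): one subpopulation per hidden
# neuron, networks assembled by drawing one chromosome from each subpopulation,
# fitness credited back to the drawn chromosomes. XOR stands in for keepaway.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, pop_size, gens = 2, 4, 20, 60

# Each chromosome encodes one hidden neuron: input weights, bias, output weight.
subpops = [rng.normal(0, 1.0, (pop_size, n_in + 2)) for _ in range(n_hidden)]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)

def evaluate(neurons):
    """Assemble a network from one chromosome per subpopulation and score it."""
    Win = np.array([c[:n_in] for c in neurons])        # (n_hidden, n_in)
    bias = np.array([c[n_in] for c in neurons])        # (n_hidden,)
    Wout = np.array([c[n_in + 1] for c in neurons])    # (n_hidden,)
    out = np.tanh(X @ Win.T + bias) @ Wout
    return -np.mean((out - y) ** 2)                    # higher is better

for gen in range(gens):
    fitness = [np.zeros(pop_size) for _ in range(n_hidden)]
    counts = [np.zeros(pop_size) for _ in range(n_hidden)]
    for _ in range(10 * pop_size):                     # random trials per generation
        picks = [rng.integers(pop_size) for _ in range(n_hidden)]
        score = evaluate([subpops[i][p] for i, p in enumerate(picks)])
        for i, p in enumerate(picks):
            fitness[i][p] += score
            counts[i][p] += 1
    for i in range(n_hidden):                          # evolve each subpopulation separately
        avg = np.where(counts[i] > 0, fitness[i] / np.maximum(counts[i], 1), -np.inf)
        order = np.argsort(avg)[::-1]
        elite = subpops[i][order[: pop_size // 4]]
        n_child = pop_size - len(elite)
        children = elite[rng.integers(len(elite), size=n_child)] \
                   + rng.normal(0, 0.2, (n_child, n_in + 2))
        subpops[i] = np.vstack([elite, children])

best = [sp[0] for sp in subpops]
print("negative MSE of best assembled XOR network:", round(evaluate(best), 4))
```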
Fast learning without synaptic plasticity in spiking neural networks
Spiking neural networks are of high current interest, both from the perspective of modelling neural networks of the brain and for porting their fast learning capability and energy efficiency into neuromorphic hardware. But so far we have not been able to reproduce the fast learning capabilities of the brain in spiking neural networks. Biological data suggest that a synergy of synaptic plasticity on a slow time scale with network dynamics on a faster time scale is responsible for fast learning capabilities of the brain. We show here that a suitable orchestration of this synergy between synaptic plasticity and network dynamics does in fact reproduce fast learning capabilities of generic recurrent networks of spiking neurons. This points to the important role of recurrent connections in spiking networks, since these are necessary for enabling salient network dynamics. We show more specifically that the proposed synergy enables synaptic weights to encode more general information such as priors and task structures, since moment-to-moment processing of new information can be delegated to the network dynamics.
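The following toy simulation illustrates only the underlying idea that network dynamics alone can carry recent information when synaptic weights are frozen: a recurrent network of leaky integrate-and-fire neurons is driven by one of two brief cues, and the network state remains cue-dependent after the input has ended. All parameters (network size, time constants, thresholds) are illustrative assumptions, unrelated to the trained models of the paper.

```python
# Recurrent LIF network with frozen weights (no plasticity). Two different brief
# cues leave the network in distinguishable states after the cues end, showing
# information carried purely by network dynamics. Toy parameters throughout.
import numpy as np

n, steps, dt = 200, 150, 1.0               # neurons, time steps, ms per step
tau_mem, v_thresh, v_reset = 20.0, 1.0, 0.0
rng = np.random.default_rng(2)

W = rng.normal(0, 1.0 / np.sqrt(n), (n, n))    # fixed recurrent weights, never updated
np.fill_diagonal(W, 0.0)
cues = rng.normal(0, 1.0, (2, n))              # two different input patterns

def run(cue_vec, noise_seed=7):
    noise_rng = np.random.default_rng(noise_seed)   # identical noise across runs
    v, spikes, trace = np.zeros(n), np.zeros(n), []
    for t in range(steps):
        drive = cue_vec if 20 <= t < 40 else 0.0    # brief cue, then silence
        i_syn = W @ spikes + drive + noise_rng.normal(0, 0.05, n)
        v = v + dt / tau_mem * (-v + i_syn)         # leaky membrane integration
        spikes = (v >= v_thresh).astype(float)
        v = np.where(spikes > 0, v_reset, v)        # reset after a spike
        trace.append(v.copy())
    return np.array(trace)

a, b = run(cues[0]), run(cues[1])
for t in (19, 45, 60, 80):                          # before the cue and after its offset
    print(f"t={t:3d}  state difference between cue A and cue B: {np.linalg.norm(a[t] - b[t]):.3f}")
```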
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Artificial neural networks open up unprecedented machine learning
capabilities at the cost of ever growing computational requirements.
Sparsifying the parameters, often achieved through weight pruning, has been
identified as a powerful technique to reduce the number of model parameters
and the computational operations of neural networks. Yet, sparse
activations, while omnipresent in both biological neural networks and deep
learning systems, have not been fully utilized as a compression technique in
deep learning. Moreover, the interaction between sparse activations and weight
pruning is not fully understood. In this work, we demonstrate that activity
sparsity can compose multiplicatively with parameter sparsity in a recurrent
neural network model based on the GRU that is designed to be activity sparse.
We achieve up to reduction of computation while maintaining
perplexities below on the Penn Treebank language modeling task. This
magnitude of reduction has not been achieved previously with solely sparsely
connected LSTMs, and the language modeling performance of our model has not
been achieved previously with any sparsely activated recurrent neural networks
or spiking neural networks. Neuromorphic computing devices are especially good
at taking advantage of the dynamic activity sparsity, and our results provide
strong evidence that making deep learning models activity sparse and porting
them to neuromorphic devices can be a viable strategy that does not compromise
on task performance. Our results also drive further convergence of methods from
deep learning and neuromorphic computing for efficient machine learning.
Comment: Accepted to the First MLNCP Workshop @ NeurIPS 2023.
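The multiplicative composition of the two sparsities can be seen with a simple operation count: in an event-driven recurrent layer, a unit's output is propagated only when it emits an event, and only through the connections that survive pruning, so the expected recurrent MACs per step scale with the product of activity density and weight density. The sketch below checks this accounting for illustrative densities; the figures are not the paper's reported numbers.

```python
# Back-of-the-envelope sketch of how activity sparsity and weight sparsity
# compose multiplicatively in a recurrent layer. Densities are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 1024                                    # hidden units in the recurrent layer
weight_density = 0.2                        # 80% of recurrent weights pruned
activity = 0.1                              # 10% of units emit an event per step

mask = rng.random((n, n)) < weight_density  # fixed pruning mask
active = rng.random(n) < activity           # which units fired this step

dense_macs = n * n
# Event-driven update: each active unit j accumulates its output only into the
# targets it is still connected to (the surviving entries of column j).
event_macs = int(mask[:, active].sum())

print(f"dense MACs/step:        {dense_macs:,}")
print(f"event-driven MACs/step: {event_macs:,}")
print(f"measured reduction:     {dense_macs / event_macs:.1f}x "
      f"(expected ~{1 / (weight_density * activity):.0f}x)")
```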
Pattern representation and recognition with accelerated analog neuromorphic systems
Despite being originally inspired by the central nervous system, artificial
neural networks have diverged from their biological archetypes as they have
been remodeled to fit particular tasks. In this paper, we review several
possibilities for reverse-mapping these architectures to biologically more realistic
spiking networks with the aim of emulating them on fast, low-power neuromorphic
hardware. Since many of these devices employ analog components, which cannot be
perfectly controlled, finding ways to compensate for the resulting effects
represents a key challenge. Here, we discuss three different strategies to
address this problem: the addition of auxiliary network components for
stabilizing activity, the utilization of inherently robust architectures and a
training method for hardware-emulated networks that functions without perfect
knowledge of the system's dynamics and parameters. For all three scenarios, we
corroborate our theoretical considerations with experimental results on
accelerated analog neuromorphic platforms.
Comment: Accepted at ISCAS 2017.
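A common strategy in this setting, sketched here generically rather than as the specific method of the paper, is chip-in-the-loop training: the forward pass is executed on the imperfect analog substrate, and weight updates are computed from the measured activations with an idealized gradient model, so no exact knowledge of the device's gains, offsets, or noise is required. The "device" below is a toy software stand-in with hidden mismatch parameters.

```python
# Hedged sketch of chip-in-the-loop training: forward passes run on a simulated
# "analog device" with unknown per-neuron gain/offset mismatch and noise, while
# updates use only measured outputs and an idealized gradient. Toy task/model.
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, n_samples = 8, 3, 512

# Fixed but unknown device mismatch (never exposed to the training loop).
gain = rng.normal(1.0, 0.15, n_out)
offset = rng.normal(0.0, 0.10, n_out)

def device_forward(W, x):
    """Pretend analog substrate: distorted, noisy affine layer + saturation."""
    a = gain * (W @ x) + offset + rng.normal(0, 0.02, n_out)
    return np.tanh(a)

# Toy regression task: match the outputs of a clean target network.
W_target = rng.normal(0, 1.0, (n_out, n_in))
X = rng.normal(0, 1.0, (n_samples, n_in))
Y = np.tanh(X @ W_target.T)

W = rng.normal(0, 0.1, (n_out, n_in))
lr = 0.05
for epoch in range(30):
    err_total = 0.0
    for x, y in zip(X, Y):
        y_meas = device_forward(W, x)               # measured on the "device"
        err = y_meas - y
        # Idealized gradient: assumes unit gain and tanh'(a) ~ 1 - y_meas^2,
        # using only measured quantities, with no knowledge of gain or offset.
        W -= lr * np.outer(err * (1 - y_meas ** 2), x)
        err_total += float(np.mean(err ** 2))
    if epoch % 10 == 0 or epoch == 29:
        print(f"epoch {epoch:2d}  mean device MSE: {err_total / n_samples:.4f}")
```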
Block-local learning with probabilistic latent representations
The ubiquitous backpropagation algorithm requires sequential updates across
blocks of a network, introducing a locking problem. Moreover, backpropagation
relies on the transpose of weight matrices to calculate updates, introducing a
weight transport problem across blocks. Both these issues prevent efficient
parallelisation and horizontal scaling of models across devices. We propose a
new method that introduces a twin network that propagates information backwards
from the targets to the input to provide auxiliary local losses. Forward and
backward propagation can work in parallel and with different sets of weights,
addressing the problems of weight transport and locking. Our approach derives
from a statistical interpretation of end-to-end training which treats
activations of network layers as parameters of probability distributions. The
resulting learning framework uses these parameters locally to assess the
matching between forward and backward information. Error backpropagation is
then performed locally within each block, leading to `block-local' learning.
Several previously proposed alternatives to error backpropagation emerge as
special cases of our model. We present results on various tasks and
architectures, including transformers, demonstrating state-of-the-art
performance using block-local learning. These results provide a new principled
framework to train very large networks in a distributed setting and can also be
applied in neuromorphic systems.
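The sketch below illustrates the general mechanism in its simplest least-squares form, not the paper's probabilistic formulation: a backward twin network maps targets back toward the input to supply each forward block with a local target, each block minimizes a purely local loss, and no gradients or transposed weight matrices cross block boundaries. The block sizes and the toy regression task are arbitrary choices for illustration.

```python
# Generic sketch of block-local learning with a backward "twin": the twin maps
# the target back to a local target for the first block, so every update uses
# only local quantities. A plain least-squares variant, not the paper's method.
import numpy as np

rng = np.random.default_rng(5)
d_in, d_mid, d_out, n = 16, 32, 4, 2000

# Toy regression task defined by a random teacher network.
T1 = rng.normal(0, 1.0, (d_mid, d_in))
T2 = rng.normal(0, 1.0, (d_out, d_mid))
X = rng.normal(0, 1.0, (n, d_in))
Y = np.tanh(np.tanh(X @ T1.T) @ T2.T)

F1 = rng.normal(0, 0.1, (d_mid, d_in))   # forward block 1
F2 = rng.normal(0, 0.1, (d_out, d_mid))  # forward block 2
B2 = rng.normal(0, 0.1, (d_mid, d_out))  # backward twin for block 2
lr = 0.05

for epoch in range(200):
    H = np.tanh(X @ F1.T)                 # block 1 forward
    O = np.tanh(H @ F2.T)                 # block 2 forward
    H_tgt = np.tanh(Y @ B2.T)             # twin propagates the target backwards

    # Purely local losses and updates: nothing crosses a block boundary.
    e2 = (O - Y) * (1 - O ** 2)           # block 2 matches the true target
    e1 = (H - H_tgt) * (1 - H ** 2)       # block 1 matches its local target
    F2 -= lr * e2.T @ H / n
    F1 -= lr * e1.T @ X / n

    # The twin is trained locally to map block 2's outputs back to H.
    R = np.tanh(O @ B2.T)
    eb = (R - H) * (1 - R ** 2)
    B2 -= lr * eb.T @ O / n

    if epoch % 50 == 0 or epoch == 199:
        print(f"epoch {epoch:3d}   task MSE: {np.mean((O - Y) ** 2):.4f}")
```

Because the forward and backward paths use separate weights (B2 is not the transpose of F2) and each block's update depends only on its own inputs and its local target, the blocks could be trained in parallel on different devices.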
Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models
Activity and parameter sparsity are two standard methods of making neural
networks computationally more efficient. Event-based architectures such as
spiking neural networks (SNNs) naturally exhibit activity sparsity, and many
methods exist to sparsify their connectivity by pruning weights. While the
effect of weight pruning on feed-forward SNNs has been previously studied for
computer vision tasks, the effects of pruning for complex sequence tasks like
language modeling are less well studied since SNNs have traditionally struggled
to achieve meaningful performance on these tasks. Using a recently published
SNN-like architecture that works well on small-scale language modeling, we
study the effects of weight pruning when combined with activity sparsity.
Specifically, we study the trade-off between the multiplicative efficiency
gains the combination affords and its effect on task performance for language
modeling. To dissect the effects of the two sparsities, we conduct a
comparative analysis between densely activated models and sparsely activated
event-based models across varying degrees of connectivity sparsity. We
demonstrate that sparse activity and sparse connectivity complement each other
without a proportional drop in task performance for an event-based neural
network trained on the Penn Treebank and WikiText-2 language modeling datasets.
Our results suggest sparsely connected event-based neural networks are
promising candidates for effective and efficient sequence modeling.
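As a rough illustration of the kind of comparative analysis described above, the sketch below magnitude-prunes the recurrent weights of a simple thresholded, event-emitting recurrent cell at several sparsity levels, measures the resulting activity sparsity on random input, and reports the combined reduction in recurrent operations. The cell, thresholds, and data are illustrative stand-ins, not the published architecture or datasets.

```python
# Magnitude pruning combined with measured activity sparsity in a toy
# event-emitting recurrent cell; reports expected recurrent MACs per step.
import numpy as np

rng = np.random.default_rng(6)
n, d, T, theta = 256, 64, 200, 1.0          # hidden size, input size, steps, event threshold

W_h = rng.normal(0, 1.0 / np.sqrt(n), (n, n))
W_x = rng.normal(0, 1.0 / np.sqrt(d), (n, d))
X = rng.normal(0, 1.0, (T, d))

def run(weight_sparsity):
    # Magnitude pruning: drop the smallest recurrent weights.
    k = int(weight_sparsity * W_h.size)
    thresh = np.sort(np.abs(W_h).ravel())[k] if k > 0 else 0.0
    Wp = np.where(np.abs(W_h) >= thresh, W_h, 0.0)

    h, events, event_frac = np.zeros(n), np.zeros(n), []
    for x in X:
        # Units only communicate when their state crosses the threshold.
        h = 0.9 * h + np.tanh(Wp @ events + W_x @ x)
        events = np.where(h > theta, h, 0.0)
        h = np.where(h > theta, h - theta, h)        # soft reset after an event
        event_frac.append(np.mean(events > 0))

    act_density = float(np.mean(event_frac))
    w_density = float(np.mean(Wp != 0))
    eff_ops = act_density * w_density * n * n        # expected recurrent MACs per step
    return w_density, act_density, eff_ops

dense_ops = n * n
print(" weight density | activity | recurrent MACs/step | reduction")
for s in (0.0, 0.5, 0.8, 0.9):
    wd, ad, ops = run(s)
    print(f"     {wd:5.2f}      |  {ad:5.2f}   |      {ops:9.0f}      | {dense_ops / max(ops, 1):5.1f}x")
```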
