
    Learning Long Sequences in Spiking Neural Networks

    Spiking neural networks (SNNs) take inspiration from the brain to enable energy-efficient computations. Since the advent of Transformers, SNNs have struggled to compete with artificial networks on modern sequential tasks, as they inherit limitations from recurrent neural networks (RNNs), with the added challenge of training with non-differentiable binary spiking activations. However, recent renewed interest in efficient alternatives to Transformers has given rise to a family of state-of-the-art recurrent architectures known as state space models (SSMs). This work systematically investigates, for the first time, the intersection of state-of-the-art SSMs with SNNs for long-range sequence modelling. Results suggest that SSM-based SNNs can outperform the Transformer on all tasks of a well-established long-range sequence modelling benchmark. It is also shown that SSM-based SNNs can outperform current state-of-the-art SNNs with fewer parameters on sequential image classification. Finally, a novel feature mixing layer is introduced, improving SNN accuracy while challenging assumptions about the role of binary activations in SNNs. This work paves the way for deploying powerful SSM-based architectures, such as large language models, on neuromorphic hardware for energy-efficient long-range sequence modelling.
    Comment: 18 pages, 10 figures/tables
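
    To make the combination concrete, the following is a minimal PyTorch sketch of one spiking SSM layer: a stable real-valued diagonal state-space recurrence whose linear read-out is binarised by a Heaviside spike with a surrogate gradient. The class names, parameterisation, and boxcar surrogate are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikeFn(torch.autograd.Function):
    """Heaviside step forward; boxcar surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0.0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Pass gradients only in a window around the firing threshold.
        return grad_out * (v.abs() < 0.5).float()

class SpikingSSM(nn.Module):
    """Channel-wise real diagonal SSM whose read-out drives a binary spiking
    nonlinearity. Input u has shape (batch, time, dim)."""
    def __init__(self, dim, state_dim=16):
        super().__init__()
        # Parameterise the diagonal so that 0 < a < 1 (stable recurrence).
        self.p = nn.Parameter(torch.randn(dim, state_dim))
        self.b = nn.Parameter(0.1 * torch.randn(dim, state_dim))
        self.c = nn.Parameter(0.1 * torch.randn(dim, state_dim))
        self.threshold = 1.0

    def forward(self, u):
        a = torch.exp(-F.softplus(self.p))              # (dim, state_dim)
        x = u.new_zeros(u.shape[0], *a.shape)           # hidden state
        spikes = []
        for t in range(u.shape[1]):
            x = a * x + self.b * u[:, t].unsqueeze(-1)  # x_t = a x_{t-1} + b u_t
            v = (self.c * x).sum(-1)                    # per-channel linear read-out
            spikes.append(SpikeFn.apply(v - self.threshold))
        return torch.stack(spikes, dim=1)               # binary, (batch, time, dim)
```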

    Optimization of Deep Convolutional Neural Network with the Integrated Batch Normalization and Global pooling

    Deep convolutional neural networks (DCNNs) have made significant progress in a wide range of applications in recent years, including image recognition, audio recognition, and machine translation. However, because of the large number of parameters and floating-point operations, deployment on machine terminals remains difficult. To address this issue, the convolutions in the DCNN are optimized: the characteristics of the network are adjusted so that information loss is minimized and performance is enriched. First, batch normalization is applied; then, instead of pooling over local neighbourhoods, each full feature map is reduced to a single value using global pooling. Standard convolution is split into depthwise and pointwise convolutions to reduce model size and computation. The performance of the optimized convolution-based DCNN is evaluated in terms of accuracy and error rate, and it outperforms existing state-of-the-art techniques.
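
    The recipe described above (depthwise plus pointwise convolution with batch normalization, and global average pooling in place of large fully connected layers) is a standard one; a minimal PyTorch sketch is given below, with layer sizes and names chosen for illustration rather than taken from the paper.

```python
import torch.nn as nn

class SeparableBlock(nn.Module):
    """Depthwise + pointwise convolution followed by batch normalization."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)  # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class OptimizedDCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(SeparableBlock(3, 32),
                                      SeparableBlock(32, 64))
        # Global average pooling collapses each feature map to a single value,
        # replacing large fully connected layers.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)).flatten(1))
```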

    Learning hard quantum distributions with variational autoencoders

    Studying general quantum many-body systems is one of the major challenges in modern physics because it requires an amount of computational resources that scales exponentially with the size of the system. Simulating the evolution of a state, or even storing its description, rapidly becomes intractable for exact classical algorithms. Recently, machine learning techniques, in the form of restricted Boltzmann machines, have been proposed as a way to efficiently represent certain quantum states, with applications in state tomography and ground state estimation. Here, we introduce a new representation of states based on variational autoencoders. Variational autoencoders are a type of generative model in the form of a neural network. We probe the power of this representation by encoding probability distributions associated with states from different classes. Our simulations show that deep networks give a better representation for states that are hard to sample from, while providing no benefit for random states. This suggests that the probability distributions associated with hard quantum states might have a compositional structure that can be exploited by layered neural networks. Specifically, we consider the learnability of a class of quantum states introduced by Fefferman and Umans. Such states are provably hard to sample for classical computers, but not for quantum ones, under plausible computational complexity assumptions. The good level of compression achieved for hard states suggests these methods can be suitable for characterising states of the size expected in first-generation quantum hardware.
    Comment: v2: 9 pages, 3 figures; journal version with major edits with respect to v1 (rewriting of the section "hard and easy quantum states", extended discussion on the comparison with tensor networks)
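
    A minimal sketch of the kind of variational autoencoder described, here modelling n-bit measurement outcomes with a Bernoulli decoder and the standard evidence lower bound; the layer sizes and names are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """VAE over n-bit measurement outcomes with a Bernoulli decoder."""
    def __init__(self, n_bits=8, latent=4, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bits, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_bits))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick
        return self.dec(z), mu, logvar

def negative_elbo(logits, x, mu, logvar):
    # Bernoulli reconstruction term plus the analytic KL divergence to N(0, I).
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```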

    Investigating the weight initialization of deep neural network acoustic models

    In automatic speech recognition, deep neural networks have become practically the dominant approach to acoustic modelling. The literature offers many recommendations on how to set the various parameters when training DNN acoustic models, yet little attention is usually paid to how the network weights should be initialized. Meanwhile, this is a very active area of machine learning research, and several strategies for setting the initial weights of a DNN have appeared recently. In this work, we test three such procedures in deep neural network acoustic models, using three different activation functions (sigmoid, ReLU, and softplus). Our results indicate that it is definitely worth applying some dedicated weight initialization procedure; at the same time, we found no significant difference between the phoneme-level error rates achieved with the three strategies examined (Glorot, He, and Edge of Chaos).
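
    A rough NumPy sketch of the three initialization schemes compared. Glorot and He use their standard variance formulas; the edge-of-chaos variance depends on the activation's mean-field map, and the values below are the well-known critical points for ReLU and for tanh with zero bias, given as assumptions. The paper's sigmoid and softplus activations would require their own numerically computed critical points.

```python
import numpy as np

def init_weights(fan_in, fan_out, scheme, rng=None):
    """Draw a Gaussian weight matrix under one of the compared schemes."""
    rng = rng or np.random.default_rng(0)
    if scheme == "glorot":
        std = np.sqrt(2.0 / (fan_in + fan_out))
    elif scheme == "he":
        std = np.sqrt(2.0 / fan_in)
    elif scheme == "eoc_relu":
        std = np.sqrt(2.0 / fan_in)   # for ReLU the critical point matches He
    elif scheme == "eoc_tanh":
        std = np.sqrt(1.0 / fan_in)   # critical point for tanh with zero bias
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rng.normal(0.0, std, size=(fan_out, fan_in))
```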

    Learning to process with spikes and to localise pulses

    In the last few decades, deep learning with artificial neural networks (ANNs) has emerged as one of the most widely used techniques in tasks such as classification and regression, achieving competitive results and in some cases even surpassing human-level performance. Nonetheless, as ANN architectures are optimised towards empirical results and depart from their biological precursors, exactly how human brains process information using these short electrical pulses called spikes remains a mystery. Hence, in this thesis, we explore the problem of learning to process with spikes and to localise pulses. We first consider spiking neural networks (SNNs), a type of ANN that more closely mimics biological neural networks in that neurons communicate with one another using spikes. This unique architecture allows us to look into the role of heterogeneity in learning. Since it is conjectured that information is encoded in the timing of spikes, we are particularly interested in the heterogeneity of the time constants of neurons. We trained SNNs for classification tasks on a range of visual and auditory neuromorphic datasets, which contain streams of events (spike times) instead of conventional frame-based data, and show that overall performance is improved by allowing the neurons to have different time constants, especially on tasks with richer temporal structure. We also find that the learned time constants are distributed similarly to those experimentally observed in some mammalian cells. In addition, we demonstrate that learning with heterogeneity improves robustness against hyperparameter mistuning. These results suggest that heterogeneity may be more than a byproduct of noisy processes and perhaps serves a key role in learning in changing environments, yet it has been overlooked in basic artificial models.
    While neuromorphic datasets, which are often captured by neuromorphic devices that closely model the corresponding biological systems, have enabled us to explore the more biologically plausible SNNs, there still exists a gap in understanding how spike times encode information in actual biological neural networks like human brains, as such data is difficult to acquire due to the trade-off between timing precision and the number of cells that can be recorded electrically at the same time. Instead, what we usually obtain are low-rate discrete samples of trains of filtered spikes. Hence, in the second part of the thesis, we focus on a different type of problem involving pulses, that is, retrieving the precise pulse locations from these low-rate samples. We make use of finite rate of innovation (FRI) sampling theory, which states that perfect reconstruction is possible for classes of continuous non-bandlimited signals that have a small number of free parameters. However, existing FRI methods break down under very noisy conditions due to the so-called subspace swap event. Thus, we present two novel model-based learning architectures: Deep Unfolded Projected Wirtinger Gradient Descent (Deep Unfolded PWGD) and the FRI Encoder-Decoder Network (FRIED-Net). The former is based on an existing iterative denoising algorithm for subspace-based methods, while the latter directly models the relationship between the samples and the locations of the pulses using an autoencoder-like network. Using a stream of K Diracs as an example, we show that both algorithms are able to overcome the breakdown inherent in existing subspace-based methods.
Moreover, we extend our FRIED-Net framework beyond conventional FRI methods by considering the case where the pulse shape is unknown. We show that the pulse shape can be learned using backpropagation. This coincides with the application of spike detection from real-world calcium imaging data, where we achieve competitive results. Finally, we explore beyond canonical FRI signals and demonstrate that FRIED-Net is able to reconstruct streams of pulses with different shapes.
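
    As a concrete illustration of the heterogeneity studied in the first part of the thesis, the following is a minimal PyTorch sketch of a leaky integrate-and-fire layer in which each neuron's membrane time constant is a separate learnable parameter, trained through a fast-sigmoid surrogate gradient. The parameter names, initial time-constant range, and surrogate are assumptions for illustration, not the thesis's exact model.

```python
import torch
import torch.nn as nn

class FastSigmoidSpike(torch.autograd.Function):
    """Heaviside spike forward; fast-sigmoid surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0.0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out / (1.0 + 10.0 * v.abs()) ** 2

class HeterogeneousLIF(nn.Module):
    """LIF layer with a learnable per-neuron membrane time constant.
    Input current has shape (batch, time, dim)."""
    def __init__(self, dim, dt=1e-3):
        super().__init__()
        self.dt = dt
        # One time constant per neuron, initialised between 5 ms and 50 ms.
        self.log_tau = nn.Parameter(torch.log(torch.empty(dim).uniform_(5e-3, 50e-3)))

    def forward(self, current):
        alpha = torch.exp(-self.dt / self.log_tau.exp())  # per-neuron membrane decay
        v = current.new_zeros(current.shape[0], current.shape[2])
        out = []
        for t in range(current.shape[1]):
            v = alpha * v + current[:, t]        # leaky integration of input current
            s = FastSigmoidSpike.apply(v - 1.0)  # fire when v crosses threshold 1
            v = v * (1.0 - s)                    # reset the membrane after a spike
            out.append(s)
        return torch.stack(out, dim=1)
```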