Learning Long Sequences in Spiking Neural Networks
Spiking neural networks (SNNs) take inspiration from the brain to enable
energy-efficient computations. Since the advent of Transformers, SNNs have
struggled to compete with artificial networks on modern sequential tasks, as
they inherit limitations from recurrent neural networks (RNNs), with the added
challenge of training with non-differentiable binary spiking activations.
However, a recent renewed interest in efficient alternatives to Transformers
has given rise to state-of-the-art recurrent architectures named state space
models (SSMs). This work systematically investigates, for the first time, the
intersection of state-of-the-art SSMs with SNNs for long-range sequence
modelling. Results suggest that SSM-based SNNs can outperform the Transformer
on all tasks of a well-established long-range sequence modelling benchmark. It
is also shown that SSM-based SNNs can outperform current state-of-the-art SNNs
with fewer parameters on sequential image classification. Finally, a novel
feature mixing layer is introduced, improving SNN accuracy while challenging
assumptions about the role of binary activations in SNNs. This work paves the
way for deploying powerful SSM-based architectures, such as large language
models, to neuromorphic hardware for energy-efficient long-range sequence
modelling.
Comment: 18 pages, 10 figures/tables
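The recurrence the abstract alludes to can be sketched minimally: a diagonal linear state-space update followed by a binary (spiking) readout. The dimensions, matrices, and threshold below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# x[t] = A * x[t-1] + B * u[t]        (diagonal, element-wise transition)
# s[t] = 1 if C @ x[t] > theta else 0 (binary spike readout)
rng = np.random.default_rng(0)
n_state, T = 8, 50
A = np.exp(-rng.uniform(0.01, 0.5, n_state))    # stable per-dimension decay
B = rng.normal(0.0, 1.0, n_state)
C = rng.normal(0.0, 1.0 / np.sqrt(n_state), n_state)
theta = 0.5                                     # spiking threshold (assumed)

u = rng.normal(0.0, 1.0, T)                     # a random input sequence
x = np.zeros(n_state)
spikes = np.zeros(T)
for t in range(T):
    x = A * x + B * u[t]                        # element-wise state update
    spikes[t] = float(C @ x > theta)            # emit a binary spike
```

In training, the non-differentiable threshold would be handled with a surrogate gradient; only the forward pass is shown here.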
Optimization of a Deep Convolutional Neural Network with Integrated Batch Normalization and Global Pooling
Deep convolutional neural networks (DCNNs) have made significant progress in a wide range of applications in recent years, including image identification, audio recognition, and machine translation, supporting machine intelligence in a variety of ways. However, their large number of parameters and floating-point operations make deployment on machine terminals difficult. To address this issue, the convolution in the DCNN is optimized: the characteristics of the network are adjusted so that information loss is minimized and performance is enriched. First, batch normalization is applied; then, instead of reducing neighbourhood values, each full feature map is reduced to a single value using global pooling. Traditional convolution is split into depthwise and pointwise convolutions to decrease the model size and computation. The performance of the optimized convolution-based DCNN is evaluated in terms of accuracy and error rate. The optimized DCNN is compared with existing state-of-the-art techniques and outperforms them.
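The parameter saving from splitting a standard convolution into depthwise and pointwise parts can be checked with simple arithmetic; the channel counts and kernel size below are hypothetical, not those of the paper's network.

```python
# Weight counts for a standard k x k convolution versus its
# depthwise-separable factorisation (illustrative values only).
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k          # standard k x k convolution

def separable_params(c_in, c_out, k):
    depthwise = c_in * k * k             # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution mixing channels
    return depthwise + pointwise

c_in, c_out, k = 128, 256, 3
std = conv_params(c_in, c_out, k)        # 294912
sep = separable_params(c_in, c_out, k)   # 33920
print(std, sep, std / sep)               # roughly 8.7x fewer parameters
```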
Learning hard quantum distributions with variational autoencoders
Studying general quantum many-body systems is one of the major challenges in
modern physics because it requires an amount of computational resources that
scales exponentially with the size of the system. Simulating the evolution of a
state, or even storing its description, rapidly becomes intractable for exact
classical algorithms. Recently, machine learning techniques, in the form of
restricted Boltzmann machines, have been proposed as a way to efficiently
represent certain quantum states with applications in state tomography and
ground state estimation. Here, we introduce a new representation of states
based on variational autoencoders. Variational autoencoders are a type of
generative model in the form of a neural network. We probe the power of this
representation by encoding probability distributions associated with states
from different classes. Our simulations show that deep networks give a better
representation for states that are hard to sample from, while providing no
benefit for random states. This suggests that the probability distributions
associated to hard quantum states might have a compositional structure that can
be exploited by layered neural networks. Specifically, we consider the
learnability of a class of quantum states introduced by Fefferman and Umans.
Such states are provably hard to sample for classical computers, but not for
quantum ones, under plausible computational complexity assumptions. The good
level of compression achieved for hard states suggests these methods can be
suitable for characterising states of the size expected in first generation
quantum hardware.
Comment: v2: 9 pages, 3 figures; journal version with major edits with respect to v1 (rewriting of the section "hard and easy quantum states", extended discussion of the comparison with tensor networks)
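The motivation can be made concrete with back-of-envelope arithmetic: exact storage of an n-qubit state needs 2^n amplitudes, whereas a fixed-width network decoder has a parameter count that grows only linearly in n. The two-layer decoder shape below is a hypothetical illustration, not the architecture used in the paper.

```python
# Exact description: 2**n complex amplitudes for an n-qubit state.
def amplitudes(n_qubits):
    return 2 ** n_qubits

# Hypothetical two-layer decoder: latent -> hidden -> n_qubits outputs
# (weights plus biases); the count grows linearly in n_qubits.
def decoder_params(n_qubits, hidden=64, latent=16):
    return latent * hidden + hidden + hidden * n_qubits + n_qubits

for n in (10, 20, 30):
    print(n, amplitudes(n), decoder_params(n))
```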
Investigating the Weight Initialisation of Deep Neural Network Acoustic Models
In automatic speech recognition, deep neural networks have become practically the sole dominant approach to acoustic modelling. The literature offers many recommendations on how to set the various parameters when training DNN acoustic models, but relatively little attention is usually paid to how the network weights should be initialised. Meanwhile, this is a very active area of machine learning research; several strategies for setting the initial weights of a DNN have recently been published. In this work, we test three such procedures in deep neural network acoustic models, using three different activation functions (sigmoid, ReLU and softplus). Based on our results, it is definitely worth applying some dedicated weight initialisation procedure; at the same time, we found no significant difference between the phoneme-level error rates achieved with the three strategies examined (Glorot, He and Edge of Chaos).
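Two of the three initialisation schemes compared (Glorot and He) have simple closed forms, sketched below; the Edge of Chaos scheme depends on the activation function and is omitted, and the layer sizes are illustrative.

```python
import numpy as np

# Glorot/Xavier: variance 2 / (fan_in + fan_out), suited to sigmoid-like units.
def glorot_init(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, (fan_in, fan_out))

# He: variance 2 / fan_in, derived for ReLU units.
def he_init(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, (fan_in, fan_out))

rng = np.random.default_rng(0)
W = he_init(1024, 1024, rng)
print(W.std())   # close to sqrt(2/1024), about 0.044
```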
Learning to process with spikes and to localise pulses
In the last few decades, deep learning with artificial neural networks (ANNs) has emerged as one of the most widely used techniques in tasks such as classification and regression, achieving competitive results and in some cases even surpassing human-level performance. Nonetheless, as ANN architectures are optimised towards empirical results and depart from their biological precursors, exactly how human brains process information using these short electrical pulses called spikes remains a mystery. Hence, in this thesis, we explore the problem of learning to process with spikes and to localise pulses.
We first consider spiking neural networks (SNNs), a type of ANN that more closely mimics biological neural networks in that neurons communicate with one another using spikes. This unique architecture allows us to look into the role of heterogeneity in learning. Since it is conjectured that information is encoded by the timing of spikes, we are particularly interested in the heterogeneity of the time constants of neurons. We train SNNs for classification tasks on a range of visual and auditory neuromorphic datasets, which contain streams of events (spike times) instead of conventional frame-based data, and show that overall performance is improved by allowing the neurons to have different time constants, especially on tasks with richer temporal structure. We also find that the learned time constants are distributed similarly to those experimentally observed in some mammalian cells. In addition, we demonstrate that learning with heterogeneity improves robustness against hyperparameter mistuning. These results suggest that heterogeneity may be more than the byproduct of noisy processes and perhaps serves a key role in learning in changing environments, yet heterogeneity has been overlooked in basic artificial models.
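The heterogeneity discussed above can be illustrated with a minimal leaky integrate-and-fire simulation in which each neuron carries its own membrane time constant; all constants below are illustrative, not values learned in the thesis.

```python
import numpy as np

# LIF neurons with heterogeneous (per-neuron) membrane time constants.
rng = np.random.default_rng(1)
n, T, dt = 4, 100, 1.0
tau = rng.uniform(5.0, 50.0, n)       # per-neuron time constants (ms, assumed)
alpha = np.exp(-dt / tau)             # per-neuron membrane leak factor
v_th = 1.0                            # spiking threshold

v = np.zeros(n)
spike_counts = np.zeros(n, dtype=int)
for t in range(T):
    I = rng.uniform(0.0, 0.5, n)      # random input current each step
    v = alpha * v + I                 # leaky integration, neuron-specific leak
    spiked = v >= v_th
    spike_counts += spiked
    v[spiked] = 0.0                   # reset membrane after a spike
print(spike_counts)                   # slower-decaying neurons tend to spike more
```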
While neuromorphic datasets, which are often captured by neuromorphic devices that closely model the corresponding biological systems, have enabled us to explore the more biologically plausible SNNs, there still exists a gap in understanding how spike times encode information in actual biological neural networks like human brains, as such data is difficult to acquire due to the trade-off between timing precision and the number of cells that can be recorded electrically at once. Instead, what we usually obtain is low-rate discrete samples of trains of filtered spikes. Hence, in the second part of the thesis, we focus on a different type of problem involving pulses, that is, retrieving the precise pulse locations from these low-rate samples. We make use of finite rate of innovation (FRI) sampling theory, which states that perfect reconstruction is possible for classes of continuous non-bandlimited signals that have a small number of free parameters. However, existing FRI methods break down under very noisy conditions due to the so-called subspace swap event. Thus, we present two novel model-based learning architectures: Deep Unfolded Projected Wirtinger Gradient Descent (Deep Unfolded PWGD) and FRI Encoder-Decoder Network (FRIED-Net). The former is based on an existing iterative denoising algorithm for subspace-based methods, while the latter directly models the relationship between the samples and the locations of the pulses using an autoencoder-like network. Using a stream of K Diracs as an example, we show that both algorithms are able to overcome the breakdown inherent in existing subspace-based methods. Moreover, we extend our FRIED-Net framework beyond conventional FRI methods by considering the case where the pulse shape is unknown, and show that the shape can be learned using backpropagation. This coincides with the application of spike detection in real-world calcium imaging data, where we achieve competitive results.
Finally, we explore beyond canonical FRI signals and demonstrate that FRIED-Net is able to reconstruct streams of pulses with different shapes.
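The FRI setting can be illustrated by its forward model alone: K Diracs filtered by a known sampling kernel and observed at a low rate, leaving 2K free parameters (locations and amplitudes) to recover. The Gaussian kernel and all values below are assumptions for illustration, not the thesis's setup.

```python
import numpy as np

# Assumed Gaussian sampling kernel (FRI theory uses specific kernel classes).
def phi(t, width=0.5):
    return np.exp(-0.5 * (t / width) ** 2)

K = 2
t_k = np.array([1.3, 3.7])      # unknown pulse locations...
a_k = np.array([1.0, 0.6])      # ...and amplitudes: 2K free parameters in total

T_s = 0.5                       # low-rate sampling period
n = np.arange(10)
y = sum(a * phi(n * T_s - t) for a, t in zip(a_k, t_k))
print(np.round(y, 3))           # the samples from which (t_k, a_k) are recovered
```

In the noiseless case, FRI theory shows that a number of samples on the order of 2K suffices; the thesis's networks target the noisy regime where subspace-based recovery breaks down.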