
    Deep spiking neural networks with applications to human gesture recognition

    Spiking neural networks (SNNs), often described as the third generation of artificial neural networks (ANNs), are a class of event-driven neuromorphic algorithms with a potentially wide range of application domains, and they map naturally onto a variety of extremely low-power neuromorphic hardware. The work presented in this thesis addresses the challenges of human gesture recognition using novel SNN algorithms. It discusses the design of these algorithms for both visual- and auditory-domain human gesture recognition, as well as event-based pre-processing toolkits for audio signals. On the visual side, a novel SNN-based event-driven hand gesture recognition system is proposed. Its spiking recurrent convolutional neural network (SCRNN) design combines a purpose-designed convolution operation with recurrent connectivity to maintain spatial and temporal relations in address-event-representation (AER) data, and is shown to be effective in a hand gesture recognition experiment. The SCRNN architecture can achieve arbitrary temporal resolution, allowing it to exploit temporal correlations between event collections, and its backpropagation-based training algorithm does not suffer from vanishing or exploding gradients. On the audio side, a novel end-to-end spiking speech emotion recognition (SER) system is proposed. It employs mel-frequency cepstral coefficients (MFCCs) as its main speech features, together with a self-designed latency coding algorithm that efficiently converts the raw signal into AER input suitable for SNNs. A two-layer spiking recurrent architecture is proposed to capture temporal correlations between spike trains. The system is evaluated on several open public datasets, demonstrating state-of-the-art recognition accuracy alongside a significant reduction in network size, computational cost, and training time. In addition to contributing directly to neuromorphic SER, this thesis proposes a novel speech coding algorithm based on the working mechanism of the human auditory system. The algorithm mimics the functionality of the cochlea and provides an alternative method of event-data acquisition for audio data. It is then further simplified and extended into a speech enhancement application used jointly with the proposed SER system. This speech enhancement method uses a lateral inhibition mechanism as a frequency coincidence detector to remove uncorrelated noise in the time-frequency spectrum, and experiments show it to be effective for up to six types of noise.
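    The thesis's own latency coding algorithm is not reproduced in this abstract, but the general idea of time-to-first-spike coding is simple enough to sketch: larger feature magnitudes fire earlier within a fixed coding window. The minimal Python sketch below illustrates that idea only; the function name, the per-frame normalisation, and the window length are illustrative assumptions, not the thesis's design.

    ```python
    import numpy as np

    def latency_encode(features, t_max=100.0):
        """Time-to-first-spike coding: larger feature values fire earlier.

        features : 1-D array of non-negative feature magnitudes (e.g. one
                   frame of rectified spectral features); t_max is the
                   coding window in ms.
        Returns one spike time per channel; silent channels get np.inf.
        """
        f = np.asarray(features, dtype=float)
        peak = f.max()
        if peak <= 0:
            return np.full(f.shape, np.inf)  # nothing to encode
        norm = f / peak                      # scale to [0, 1]
        times = t_max * (1.0 - norm)         # strongest input -> t = 0
        times[norm == 0] = np.inf            # no drive, no spike
        return times

    # Example: encode one frame of four feature channels
    frame = np.array([0.9, 0.1, 0.5, 0.0])
    print(latency_encode(frame))  # approx. [ 0.  88.9  44.4  inf] (ms)
    ```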

    Computational modelling of neural mechanisms underlying natural speech perception

    Humans are highly skilled at the analysis of complex auditory scenes. In particular, the human auditory system is characterized by incredible robustness to noise and can nearly effortlessly isolate the voice of a specific talker from even the busiest of mixtures. However, the neural mechanisms underlying these remarkable properties remain poorly understood, mainly due to the inherent complexity of speech signals and the multi-stage, intricate processing performed in the human auditory system. Understanding the neural mechanisms underlying speech perception is of interest for clinical practice, brain-computer interfacing, and automatic speech processing systems. In this thesis, we developed computational models characterizing neural speech processing across different stages of the human auditory pathways. In particular, we studied the active role of slow cortical oscillations in speech-in-noise comprehension through a spiking neural network model for encoding spoken sentences. The neural dynamics of the model during noisy speech encoding reflected the speech comprehension of young, normal-hearing adults. The proposed theoretical model was validated by predicting the effects of non-invasive brain stimulation on speech comprehension in an experimental study involving a cohort of volunteers. Moreover, we developed a modelling framework for detecting the early, high-frequency neural response to uninterrupted speech in non-invasive neural recordings. We applied the method to investigate top-down modulation of this response by the listener's selective attention and by linguistic properties of different words from a spoken narrative. We found that in both cases the detected responses, of predominantly subcortical origin, were significantly modulated, which supports the functional role of feedback between higher and lower stages of the auditory pathways in speech perception. The proposed computational models shed light on some of the poorly understood neural mechanisms underlying speech perception, and the developed methods can be readily employed in future studies involving a range of experimental paradigms beyond those considered in this thesis.
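    The spiking models described above are beyond the scope of a short sketch, but the core intuition, that slow neural dynamics can lock onto the syllable-rate amplitude envelope of speech, can be illustrated with a single leaky integrate-and-fire neuron. The sketch below is a toy illustration under assumed parameters, not the model developed in the thesis.

    ```python
    import numpy as np

    def lif_spikes(envelope, dt=1e-3, tau=0.02, v_th=1.0, gain=30.0):
        """Leaky integrate-and-fire neuron driven by a speech envelope.

        envelope : amplitude envelope sampled every dt seconds.
        Returns the indices of the time bins in which the neuron spikes.
        """
        v, spikes = 0.0, []
        for t, e in enumerate(envelope):
            v += (-v + gain * e) * (dt / tau)   # leaky integration of the drive
            if v >= v_th:                       # threshold crossing
                spikes.append(t)
                v = 0.0                         # reset after each spike
        return np.array(spikes)

    # Toy input: a 4 Hz amplitude modulation standing in for syllable rhythm
    t = np.arange(0.0, 1.0, 1e-3)
    env = 0.5 * (1.0 + np.sin(2.0 * np.pi * 4.0 * t))
    spikes = lif_spikes(env)
    print(len(spikes), "spikes, clustered around the envelope peaks")
    ```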

    Neuromorphic Engineering Editors' Pick 2021

    This collection showcases well-received spontaneous articles from the past couple of years, specially handpicked by our Chief Editors, Profs. André van Schaik and Bernabé Linares-Barranco. The work presented here highlights the broad diversity of research performed across the section and aims to put a spotlight on its main areas of interest. All research presented here displays strong advances in theory, experiment, and methodology with applications to compelling problems. This collection aims to further support Frontiers' strong community by recognizing highly deserving authors.

    Neurobiological mechanisms for language, symbols and concepts: Clues from brain-constrained deep neural networks

    Neural networks are successfully used to imitate and model cognitive processes. However, to provide clues about the neurobiological mechanisms enabling human cognition, these models need to mimic the structure and function of real brains. Brain-constrained networks differ from classic neural networks by implementing brain similarities at different scales, ranging from the micro- and mesoscopic levels of neuronal function, local neuronal links, and circuit interaction to large-scale anatomical structure and between-area connectivity. This review shows how brain-constrained neural networks can be applied to study, in silico, the formation of mechanisms for symbol and concept processing and to work towards neurobiological explanations of specifically human cognitive abilities. These include verbal working memory and the learning of large vocabularies of symbols, semantic binding carried by specific areas of cortex, attention focusing and modulation driven by symbol type, and the acquisition of concrete and abstract concepts partly influenced by symbols. Neuronal assembly activity in the networks is analyzed to deliver putative mechanistic correlates of higher cognitive processes and to develop candidate explanations founded in established neurobiological principles.
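    The review itself is narrative, but the cell-assembly principle at its core is compact enough to illustrate. The sketch below shows Hebbian co-activation binding a set of cells so that a partial cue later reactivates the whole assembly; the network size, learning rate, and recall threshold are illustrative assumptions, not taken from any model in the review.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50                                    # neurons in one model "area"
    W = np.zeros((n, n))                      # plastic synaptic weights
    pattern = rng.random(n) < 0.3             # cells co-activated by a stimulus

    # Hebbian learning: repeated co-activation strengthens mutual links
    for _ in range(20):
        x = pattern.astype(float)
        W += 0.05 * np.outer(x, x)            # "fire together, wire together"
        np.fill_diagonal(W, 0.0)              # no self-connections
    W = np.clip(W, 0.0, 1.0)

    # Pattern completion: half the assembly serves as a cue for the rest
    cue = pattern.copy()
    cue[np.flatnonzero(pattern)[: pattern.sum() // 2]] = False
    drive = W @ cue.astype(float)
    recalled = drive > 0.5 * cue.sum()        # illustrative recall threshold
    print("full assembly recalled:", np.array_equal(recalled, pattern))
    ```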

    Binaural sound source localization using machine learning with spiking neural networks features extraction

    Human and animal binaural hearing systems are able to take advantage of a variety of cues to localise sound sources in 3D space using only two sensors. This work presents a bionic system that utilises aspects of binaural hearing in an automated source localisation task. A head and torso emulator (KEMAR) is used to acquire binaural signals, and a spiking neural network is used to compare the signals from the two sensors. The firing rates of coincidence neurons in the spiking neural network model provide information about the location of a sound source. Previous methods have used a winner-takes-all approach, where the location of the coincidence neuron with the maximum firing rate indicates the likely azimuth and elevation. This was shown to be accurate for single sources, but the accuracy drops significantly when multiple sources are present. To improve the robustness of the methodology, an alternative approach is developed in which the spiking neural network is used as a feature pre-processor: the firing rates of all coincidence neurons serve as inputs to a machine learning model trained to predict source location for both single and multiple sources. In this way, spiking neural networks are applied as a binaural feature extraction method, and the resulting features are processed by deep neural networks to localise multi-source sound signals emitted from different locations. Results show that the proposed bionic binaural emulator can accurately localise sources, including multiple and complex sources, with 99% of angles correctly predicted by the single-source localisation model and 91% by the multi-source localisation model. The impact of background noise on localisation performance was also investigated and shows significant degradation. The multi-source localisation model was therefore trained with multi-condition background noise at SNRs of 10 dB, 0 dB, and -10 dB and tested at controlled SNRs; the findings demonstrate improved performance compared with training on noise-free data.
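    The coincidence-neuron features at the heart of this approach follow the classic delay-line (Jeffress-style) picture: each neuron compares the two ears' spike trains at a different relative delay, and its firing rate peaks when its delay matches the interaural time difference. A minimal sketch of such a feature extractor is given below; the spike-train statistics and lag range are illustrative assumptions, not the thesis configuration.

    ```python
    import numpy as np

    def coincidence_rates(left, right, max_lag=20):
        """Firing rates of a bank of delay-line coincidence neurons.

        left, right : binary spike trains (0/1 arrays) from the two ears.
        Neuron k delays the left train by k bins and fires whenever both
        inputs are active in the same bin; its total count is one feature.
        """
        lags = np.arange(-max_lag, max_lag + 1)
        rates = np.array([np.sum(np.roll(left, k) * right) for k in lags])
        return lags, rates

    # Toy example: the right ear receives the same spikes 5 bins later
    rng = np.random.default_rng(1)
    left = (rng.random(1000) < 0.05).astype(int)
    right = np.roll(left, 5)
    lags, rates = coincidence_rates(left, right)
    print("best lag:", lags[np.argmax(rates)])  # 5: left ear leads the right
    ```

    The winner-takes-all baseline corresponds to taking the argmax over the rates, as in the last line; the approach developed here instead feeds the entire rate vector to a machine learning model, which is what makes multi-source localisation tractable.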

    Noise processing in the auditory system with applications in speech enhancement

    The auditory system is extremely efficient at extracting auditory information in the presence of background noise. However, speech enhancement algorithms, aimed at removing the background noise from a degraded speech signal, do not achieve results near the efficacy of the auditory system. The purpose of this study is thus first to investigate how noise affects the spiking activity of neurons in the auditory system, and then to use brain activity in the presence of noise to design better speech enhancement algorithms. To investigate how noise affects the spiking activity of neurons, we first design a generalized linear model that relates the spiking activity of neurons to intrinsic and extrinsic covariates that can affect their activity, such as noise. From this model, we extract two metrics: one that shows the effect of noise on the spiking activity, and another that shows the relative effect of vocalization compared to noise. We use these metrics to analyze neural data, recorded from a structure of the auditory system called the inferior colliculus (IC), while presenting noisy vocalizations. We studied the effect of different kinds of noise (non-stationary, white, and natural stationary), different vocalizations, different input sound levels, and signal-to-noise ratios (SNRs). We found that the presence of non-stationary noise increases the spiking activity of neurons, regardless of the SNR, input level, or vocalization type. The presence of white or natural stationary noise, however, causes a great diversity of responses, where the activity of recording sites could increase, decrease, or remain unchanged. This shows that the noise invariance previously reported in the IC depends on the noise conditions, which had not been observed before. We then address the problem of speech enhancement using information from the brain's processing in the presence of noise. It has been shown that the brain waves of a listener strongly correlate with the speaker to whom the listener attends. Given this, we design two speech enhancement algorithms with a denoising autoencoder structure, namely the Brain Enhanced Speech Denoiser (BESD) and the U-shaped Brain Enhanced Speech Denoiser (U-BESD). These algorithms take advantage of the attended auditory information present in the brain activity of the listener to denoise multi-talker speech. The U-BESD is built upon the BESD with the addition of skip connections and dilated convolutions. Compared to previously proposed approaches, BESD and U-BESD are each trained as a single neural architecture, lowering the complexity of the algorithm. We investigate two experimental settings: in the first, the attended speaker is known (the speaker-specific setting); in the second, no prior information is available about the attended speaker (the speaker-independent setting). In the speaker-specific setting, we show that both the BESD and U-BESD algorithms surpass a similar denoising autoencoder. Moreover, we show that in the speaker-independent setting, U-BESD surpasses the performance of the only known approach that also uses the brain's activity.
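    The generalized linear model described above relates spike counts to covariates through an exponential link. A minimal sketch of fitting such a model by gradient ascent on the Poisson log-likelihood is shown below; the covariate names, simulated data, and optimiser settings are illustrative assumptions, not the recorded IC data or the exact model of the thesis.

    ```python
    import numpy as np

    def fit_poisson_glm(X, y, lr=0.05, n_iter=5000):
        """Fit a Poisson GLM, y ~ Poisson(exp(X @ w)), by gradient ascent.

        X : (trials, covariates) design matrix; y : spike counts per trial.
        The Poisson log-likelihood is concave, so plain gradient ascent
        converges to the unique maximum-likelihood weights.
        """
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            rate = np.exp(X @ w)                 # predicted rate per trial
            w += lr * X.T @ (y - rate) / len(y)  # d log-likelihood / d w
        return w

    # Simulated trials with a constant term plus two illustrative covariates
    rng = np.random.default_rng(2)
    n = 500
    X = np.column_stack([np.ones(n), rng.random(n), rng.random(n)])
    true_w = np.array([0.5, 1.0, 0.8])    # [baseline, vocalization, noise]
    y = rng.poisson(np.exp(X @ true_w))
    print(fit_poisson_glm(X, y))          # approx. [0.5, 1.0, 0.8]
    ```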

    2022 roadmap on neuromorphic computing and engineering

    Modern computation based on the von Neumann architecture is now a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks that interchange data intensively and continuously, and this data transfer is responsible for a large part of the power consumption. The next generation of computer technology is expected to solve problems at the exascale, with 10^18 calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann-type architectures they will consume between 20 and 30 megawatts of power and will not have intrinsic, physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems, which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving control from data centers to edge devices. The aim of this roadmap is to present a snapshot of the present state of neuromorphic technology and to provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The roadmap is a collection of perspectives in which leading researchers in the neuromorphic community provide their own view of the current state and future challenges of each research area. We hope that this roadmap will be a useful resource, providing a concise yet comprehensive introduction for readers outside this field and for those who are just entering it, as well as future perspectives for those who are well established in the neuromorphic computing community.