100 research outputs found

    Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

    Automatic speech recognition (ASR) has recently become an important challenge when using deep learning (DL). It requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. In addition, there are situations where real data is challenging or expensive to gather, or occurs only rarely, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models using real datasets that are small or slightly different from, but related to, the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to inform the state-of-the-art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Moving on, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.

    Advances in Nonnegative Matrix Decomposition with Application to Cluster Analysis

    Nonnegative Matrix Factorization (NMF) has found a wide variety of applications in machine learning and data mining. NMF seeks to approximate a nonnegative data matrix by a product of several low-rank factorizing matrices, some of which are constrained to be nonnegative. This additive nature often results in a parts-based representation of the data, which is a desired property, especially for cluster analysis. This thesis presents advances in NMF with application to cluster analysis. It reviews a class of higher-order NMF methods called Quadratic Nonnegative Matrix Factorization (QNMF). QNMF differs from most existing NMF methods in that some of its factorizing matrices occur twice in the approximation. The thesis also reviews a structural matrix decomposition method based on the Data-Cluster-Data (DCD) random walk. DCD goes beyond matrix factorization and has a solid probabilistic interpretation because it forms the approximation with cluster-assigning probabilities only. In addition, the Kullback-Leibler divergence adopted by DCD is advantageous in handling sparse similarities for cluster analysis. Multiplicative update algorithms have been commonly used for optimizing NMF objectives, since they naturally maintain the nonnegativity constraint of the factorizing matrix and require no user-specified parameters. In this work, an adaptive multiplicative update algorithm is proposed to increase the convergence speed of QNMF objectives. Initialization conditions play a key role in cluster analysis. In this thesis, a comprehensive initialization strategy is proposed to improve the clustering performance by combining a set of base clustering methods. The proposed method can better accommodate clustering methods that need careful initialization, such as DCD. The proposed methods have been tested on various real-world datasets, such as text documents, face images, and protein data. In particular, the proposed approach has been applied to the cluster analysis of emotional data.

    Smart Bagged Tree-based Classifier optimized by Random Forests (SBT-RF) to Classify Brain-Machine Interface Data

    Brain-Computer Interface (BCI) is a new technology that uses electrodes and sensors to connect machines and computers with the human brain to improve a person's mental performance. Also, human intentions and thoughts are analyzed and recognized using BCI, which is then translated into Electroencephalogram (EEG) signals. However, certain brain signals may contain redundant information, making classification ineffective. Therefore, relevant characteristics are essential for enhancing classification performance. Thus, feature selection has been employed to eliminate redundant data before sorting to reduce computation time. BCI Competition III Dataset IVa was used to investigate the efficacy of the proposed system. A Smart Bagged Tree-based Classifier (SBT-RF) technique is presented to determine the importance of the features for selecting and classifying the data. As a result, SBT-RF is better at improving the mean accuracy of the dataset. It also decreases computation cost and training time and increases prediction speed. Furthermore, fewer features mean fewer electrodes, thus lowering the risk of damage to the brain. The proposed algorithm has the highest average accuracy, ~98%, compared to other relevant algorithms in the literature. SBT-RF is compared to state-of-the-art algorithms based on the following performance metrics: Confusion Matrix, ROC-AUC, F1-Score, Training Time, Prediction Speed, and Accuracy.

    Decomposition and classification of electroencephalography data


    Altered EEG Oscillatory Brain Networks During Music-Listening in Major Depression

    To examine the electrophysiological underpinnings of the functional networks involved in music listening, approaches based on spatial independent component analysis (ICA) have recently been applied to ongoing electroencephalography (EEG) and magnetoencephalography (MEG). However, those studies focused on healthy subjects and did not examine group-level comparisons during music listening. Here, we combined group-level spatial Fourier ICA with acoustic feature extraction to enable group comparisons in frequency-specific brain networks of musical feature processing. The method was then applied to healthy subjects and subjects with major depressive disorder (MDD). The music-induced oscillatory brain patterns were determined by permutation correlation analysis between individual time courses of Fourier-ICA components and musical features. We found that (1) three components, including a beta sensorimotor network, a beta auditory network and an alpha medial visual network, were involved in music processing among most healthy subjects; and that (2) one alpha lateral component located in the left angular gyrus was engaged in music perception in most individuals with MDD. The proposed method allowed statistical group comparison, and we found that (1) the alpha lateral component was activated more strongly in healthy subjects than in the MDD individuals, and that (2) the derived frequency-dependent networks of musical feature processing seemed to be altered in MDD participants compared to healthy subjects. The proposed pipeline appears to be valuable for studying disrupted brain oscillations in psychiatric disorders during naturalistic paradigms.

    Interpretable Machine Learning for Electro-encephalography

    While behavioral, genetic and psychological markers can provide important information about brain health, research in that area over the last decades has focused largely on imaging techniques such as magnetic resonance imaging (MRI) to provide non-invasive information about cognitive processes. Unfortunately, MRI-based approaches, which capture the slow changes in blood oxygenation levels, cannot capture electrical brain activity, which plays out on a time scale up to three orders of magnitude faster. Electroencephalography (EEG), which has been available in clinical settings for over 60 years, measures brain activity based on rapidly changing electrical potentials recorded non-invasively on the scalp. Compared to MRI-based research into neurodegeneration, EEG-based research has, over the last decade, received much less interest from the machine learning community. Yet EEG in combination with sophisticated machine learning offers great potential, so neglecting this source of information, compared to MRI or genetics, is not warranted. In collaborating with clinical experts, the ability to link any results provided by machine learning to the existing body of research is especially important, as it ultimately provides an intuitive or interpretable understanding. Here, interpretable means the possibility for medical experts to translate the insights provided by a statistical model into a working hypothesis relating to brain function. To this end, we propose in our first contribution a method for ultra-sparse regression, which is applied to EEG data in order to identify a small subset of important diagnostic markers highlighting the main differences between healthy brains and brains affected by Parkinson's disease. Our second contribution builds on the idea that in Parkinson's disease impaired functioning of the thalamus causes changes in the complexity of the EEG waveforms. The thalamus is a small region in the center of the brain that is affected early in the course of the disease. Furthermore, it is believed that the thalamus functions as a pacemaker, akin to the conductor of an orchestra, such that changes in complexity are expressed and quantifiable based on EEG. We use these changes in complexity to show their association with future cognitive decline. In our third contribution we propose an extension of archetypal analysis embedded into a deep neural network. This generative version of archetypal analysis allows learning an appropriate representation in which every sample of a data set can be decomposed into a weighted sum of extreme representatives, the so-called archetypes. This opens up an interesting possibility of interpreting a data set relative to its most extreme representatives. In contrast, clustering algorithms describe a data set relative to its most average representatives. For Parkinson's disease, we show, based on deep archetypal analysis, that healthy brains produce archetypes which are different from those produced by brains affected by neurodegeneration.

    On the encoding of natural music in computational models and human brains

    This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.

    Brain connectivity analysis: a short survey

    This short survey reviews recent literature on brain connectivity studies. It encompasses all forms of static and dynamic connectivity, whether anatomical, functional, or effective. The last decade has seen an ever-increasing number of studies devoted to deducing functional or effective connectivity, mostly from functional neuroimaging experiments. Resting state conditions have become a dominant experimental paradigm, and a number of resting state networks, among them the prominent default mode network, have been identified. Graphical models represent a convenient vehicle to formalize experimental findings and to closely and quantitatively characterize the various networks identified. Underlying these abstract concepts are anatomical networks, the so-called connectome, which can be investigated by functional imaging techniques as well. Future studies have to bridge the gap between anatomical neuronal connections and related functional or effective connectivities.