
    A Review of Accent-Based Automatic Speech Recognition Models for E-Learning Environment

    The adoption of electronic learning (e-learning) as a method of disseminating knowledge in the global educational system is growing rapidly, and has shifted knowledge acquisition from conventional classrooms and tutors to a distributed e-learning technique that enables access to various learning resources far more conveniently and flexibly. However, notwithstanding the adaptive advantages of learner-centric e-learning content, the distributed e-learning environment has unconsciously adopted a few international languages as the languages of communication among participants, despite the varied accents (mother-tongue influence) among these participants. Adjusting to and accommodating these various accents has motivated the introduction of accent-based automatic speech recognition into e-learning to resolve the effects of accent differences. This paper reviews over 50 research papers to determine the progress made in the design and implementation of accent-based automatic speech recognition models for e-learning between 2001 and 2021. The analysis shows that 50% of the reviewed models adopted the English language, 46.50% adopted the major Chinese and Indian languages, and 3.50% adopted the Swedish language as the mode of communication. The review therefore finds that the majority of ASR models are centred on European, American and Asian accents, while unconsciously excluding the accent peculiarities associated with less technologically resourced continents.

    AUTOMATED ARTIFACT REMOVAL AND DETECTION OF MILD COGNITIVE IMPAIRMENT FROM SINGLE CHANNEL ELECTROENCEPHALOGRAPHY SIGNALS FOR REAL-TIME IMPLEMENTATIONS ON WEARABLES

    Electroencephalography (EEG) is a technique for recording the asynchronous activation of neuronal firing inside the brain with non-invasive scalp electrodes. The EEG signal is well studied for evaluating cognitive state and detecting brain disorders such as epilepsy, dementia, coma and autism spectrum disorder (ASD). In this dissertation, the EEG signal is studied for the early detection of Mild Cognitive Impairment (MCI). MCI is the preliminary stage of dementia that may ultimately lead to Alzheimer's disease (AD) in elderly people. Our goal is to develop a minimalistic MCI detection system that could be integrated into wearable sensors. This contribution has three major aspects: 1) cleaning the EEG signal, 2) detecting MCI, and 3) predicting the severity of the MCI using the data obtained from a single-channel EEG electrode. Artifacts such as eye-blink activity can corrupt EEG signals. We investigate unsupervised and effective removal of the ocular artifact (OA) from single-channel streaming raw EEG data. Wavelet transform (WT) decomposition was systematically evaluated for the effectiveness of OA removal in a single-channel EEG system. The Discrete Wavelet Transform (DWT) and the Stationary Wavelet Transform (SWT) were studied with four WT basis functions: haar, coif3, sym3, and bior4.4. The performance of the artifact removal algorithm was evaluated by correlation coefficients (CC), mutual information (MI), signal-to-artifact ratio (SAR), normalized mean square error (NMSE), and time-frequency analysis. It is demonstrated that the WT can be an effective tool for unsupervised OA removal from single-channel EEG data in real-time applications. For MCI detection from the cleaned EEG data, we collected scalp EEG while the subjects were stimulated with five auditory speech signals. We extracted 590 features from the Event-Related Potential (ERP) of the collected EEG signals, which included time- and spectral-domain characteristics of the response.
The top 25 features, ranked by the random forest method, were used in classification models to identify subjects with MCI. The robustness of our model was tested using leave-one-out cross-validation while training the classifiers. The best results (leave-one-out cross-validation accuracy 87.9%, sensitivity 84.8%, specificity 95%, and F score 85%) were obtained using the support vector machine (SVM) method with a radial basis function (RBF) kernel (sigma = 10, cost = 102). Similar performance was also observed with logistic regression (LR), further validating the results. Our results suggest that single-channel EEG could provide a robust biomarker for early detection of MCI. We also developed a single-channel EEG-based MCI severity monitoring algorithm that generates Montreal Cognitive Assessment (MoCA) scores from the features extracted from EEG. We performed multi-trial and single-trial analyses for the development of the MCI severity monitoring algorithm. We studied Multivariate Regression (MR), Ensemble Regression (ER), Support Vector Regression (SVR), and Ridge Regression (RR) for the multi-trial analysis, and deep neural regression for the single-trial analysis. In the multi-trial case, the best result was obtained with ER. In the single-trial analysis, we constructed a time-frequency image from each trial and fed it to a convolutional neural network (CNN). The performance of the regression models was evaluated by the RMSE and residual analysis. We obtained the best accuracy with the deep neural regression method.
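The wavelet-based ocular-artifact removal described in this abstract can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' implementation: it uses a hand-rolled Haar DWT (one of the four basis functions the dissertation evaluates) and clips the approximation band, where slow, large-amplitude blink deflections concentrate, with a robust median-based threshold. The `levels` and `k` parameters are assumptions chosen for the example.

```python
import numpy as np

def haar_dwt(x):
    # One level of the Haar DWT: approximation (low-pass) and detail (high-pass).
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    # Exact inverse of haar_dwt for even-length inputs.
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def remove_ocular_artifact(eeg, levels=4, k=1.5):
    """Unsupervised blink suppression on a single channel.

    Signal length must be divisible by 2**levels.
    """
    details, a = [], eeg
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    # Blinks dominate the coarse approximation band; clip coefficients
    # that exceed a robust (median-based) amplitude estimate.
    thr = k * np.median(np.abs(a)) / 0.6745
    a = np.clip(a, -thr, thr)
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a
```

In a streaming setting the same routine would run on successive fixed-length buffers, which is what makes this family of methods attractive for wearables.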

    Wavelet-based techniques for speech recognition

    In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short-time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for the time-frequency transformation of these signals. In addition, it has compactly supported basis functions, thereby reducing the amount of computation compared with the STFT, where an overlapping window is needed. [Continues.]
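A toy version of the idea makes the contrast with STFT features concrete: decompose a speech frame with a multilevel wavelet transform and use the log-energy of each subband as a compact feature vector. This is a minimal sketch, assuming a Haar mother wavelet and five decomposition levels; it is not the thesis's actual feature set.

```python
import numpy as np

def haar_dwt(x):
    # One level of the Haar DWT: approximation and detail coefficients.
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_features(frame, levels=5):
    """Log subband energies of one speech frame (length divisible by 2**levels)."""
    feats, a = [], frame
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats.append(np.log(np.sum(d ** 2) + 1e-12))   # detail band energy
    feats.append(np.log(np.sum(a ** 2) + 1e-12))        # residual approximation band
    return np.array(feats)
```

Unlike fixed-window STFT bins, each subband here comes from basis functions whose time support halves as frequency doubles, which is the property the thesis exploits for non-stationary speech.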

    Wavelet methods in speech recognition

    In this thesis, novel wavelet techniques are developed to improve parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, efficient representation of the non-stationarity inherent in speech can be achieved in a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics efficiently. Wavelet theory offers an alternative insight into the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced which increase the flexibility of the transform. This approach is observed to be yet more suitable for the highly variable nature of speech signals in that it results in a time-frequency sampled grid that is well adapted to irregularities and transients. These adapted representations yield a corresponding reduction in the misclassification rate of the recognition system. However, this is necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem.
The classifier then performs dimensionality reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted with an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application.
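The "adaptive library" idea is usually realised with a best-basis search over a wavelet-packet tree: split every node recursively, then keep a split only when it lowers an additive cost such as coefficient entropy. The sketch below is a simplified Coifman–Wickerhauser-style search over a Haar packet tree; it illustrates the mechanism rather than reproducing the thesis's libraries, and the depth and cost function are illustrative choices.

```python
import numpy as np

def haar_split(x):
    # Split a packet node into its low-pass and high-pass children.
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def entropy(c):
    # Shannon-like additive cost on normalised coefficient energies.
    p = c ** 2 / (np.sum(c ** 2) + 1e-12)
    return -np.sum(p * np.log(p + 1e-12))

def best_basis(x, depth):
    """Return (leaf coefficient arrays, total cost) of the best packet basis."""
    if depth == 0:
        return [x], entropy(x)
    a, d = haar_split(x)
    leaves_a, cost_a = best_basis(a, depth - 1)
    leaves_d, cost_d = best_basis(d, depth - 1)
    if cost_a + cost_d < entropy(x):   # split only if the children are cheaper
        return leaves_a + leaves_d, cost_a + cost_d
    return [x], entropy(x)
```

Because the cost is additive over nodes, the bottom-up comparison finds the globally optimal tiling of the time-frequency grid for that cost, which is exactly what adapts the transform to transients.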

    Interactive speech-driven facial animation

    One of the fastest developing areas in the entertainment industry is digital animation. Television programmes and movies frequently use 3D animations to enhance or replace actors and scenery. With the increase in computing power, research is also being done to apply these animations in an interactive manner. Two of the biggest obstacles to the success of these undertakings are control (manipulating the models) and realism. This text describes many of the ways to improve the control and realism aspects, such that interactive animation becomes possible. Specifically, lip-synchronisation (driven by human speech) and various modelling and rendering techniques are discussed. A prototype that shows that interactive animation is feasible is also described.
    Mr. A. Hardy; Prof. S. von Solm

    Some Commonly Used Speech Feature Extraction Algorithms

    Speech is a complex, naturally acquired human motor ability. In adults it is characterized by the production of about 14 different sounds per second via the harmonized actions of roughly 100 muscles. Speaker recognition is the capability of software or hardware to receive a speech signal, identify the speaker present in it, and recognize that speaker afterwards. Feature extraction is accomplished by converting the speech waveform into a parametric representation at a relatively low data rate for subsequent processing and analysis. Acceptable classification therefore depends on excellent, high-quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, giving them a high level of reliability and acceptability. Researchers have made several modifications to the techniques discussed above to make them less susceptible to noise, more robust and less time-consuming. In conclusion, none of the methods is superior to the others; the area of application determines which method to select.
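Of the techniques listed, MFCC is the most widely used, and its per-frame pipeline (pre-emphasis, windowing, power spectrum, triangular mel filterbank, log, DCT-II) can be written out in plain NumPy. The following is a minimal sketch; the parameter values (0.97 pre-emphasis, 26 filters, 13 coefficients) are common defaults assumed for illustration, not prescriptions from the chapter.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters spaced evenly on the mel scale, applied to rfft bins.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for j in range(lo, c):
            fb[i - 1, j] = (j - lo) / max(c - lo, 1)
        for j in range(c, hi):
            fb[i - 1, j] = (hi - j) / max(hi - c, 1)
    return fb

def mfcc(frame, fs, n_filters=26, n_ceps=13):
    """MFCCs of a single speech frame."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
    frame = frame * np.hamming(len(frame))
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n_fft
    log_e = np.log(mel_filterbank(n_filters, n_fft, fs) @ power + 1e-12)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return basis @ log_e
```

A full front end would slide this over overlapping 20–30 ms frames and often append delta coefficients; the single-frame version above captures the core transform chain.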

    Discrete Wavelet Transform Based Cancelable Biometric System for Speaker Recognition

    The security of biometric templates and the privacy of their owners are challenging issues. To resolve such limitations, cancelable biometric systems have been introduced. In this paper, an efficient cancelable biometric system based on a cryptosystem is presented. It depends on permutation using a chaotic Baker map and substitution using masks in various transform domains. The feature extraction phase of the proposed cancelable system is based on cepstral analysis of the encrypted speech signal in the time domain, combined with the encrypted speech signal in the discrete wavelet transform (DWT) domain. The resulting features are then passed to an artificial neural network for classification. Furthermore, wavelet denoising is used at the receiver side to enhance the proposed system. The cryptosystem provides a robust level of protection for the speech template. This speech template can be replaced and recertified if it is breached. Our proposed system enables the generation of various templates from the same speech signal under the constraint of unlinkability between them. The simulation results confirmed that the proposed cancelable biometric system achieved a higher level of performance than traditional biometric systems, attaining a 97.5% recognition rate at a low signal-to-noise ratio (SNR) of -25 dB and 100% at -15 dB and above.
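The core trick of a cancelable template — a key-dependent, repeatable, revocable scrambling of the features — can be illustrated with any chaotic map. The paper uses a chaotic Baker map; the sketch below substitutes a simpler logistic-map-driven permutation as a stand-in to show the mechanism, and the key value and map parameter are hypothetical choices, not values from the paper.

```python
import numpy as np

def logistic_permutation(n, x0=0.37, r=3.99, burn=100):
    # Iterate the logistic map x <- r*x*(1-x), then rank the trajectory
    # values; the ranks form a key-dependent permutation of 0..n-1.
    x = x0
    for _ in range(burn):          # discard transient iterations
        x = r * x * (1 - x)
    traj = np.empty(n)
    for i in range(n):
        x = r * x * (1 - x)
        traj[i] = x
    return np.argsort(traj)

def cancel_template(features, key=0.37):
    """Scramble a feature vector with a chaotic, key-dependent ordering.

    Re-issuing ("cancelling") a breached template only requires a new key;
    the underlying biometric never needs to be stored in the clear.
    """
    perm = logistic_permutation(len(features), x0=key)
    return features[perm]
```

Because the permutation is deterministic given the key, enrolment and verification scramble features identically, while different keys yield templates that should be unlinkable to one another.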

    Non-linear dynamical analysis of biosignals

    Biosignals are physiological signals recorded from various parts of the body. Some of the major biosignals are electromyograms (EMG), electroencephalograms (EEG) and electrocardiograms (ECG). These signals are of great clinical and diagnostic importance, and are analysed to understand their behaviour and to extract maximum information from them. However, they tend to be random and unpredictable (non-linear) in nature, so conventional linear methods of analysis are insufficient. Hence, analysis using non-linear dynamical system theory, chaos theory and fractal dimensions is proving to be very beneficial. In this project, ECG signals are of interest. Changes in the normal rhythm of a human heart may result in different cardiac arrhythmias, which may be fatal or cause irreparable damage to the heart when sustained over long periods of time. Hence the ability to identify arrhythmias from ECG recordings is important for clinical diagnosis and treatment, and also for understanding the electrophysiological mechanisms of arrhythmias. To achieve this aim, algorithms were developed with the help of MATLAB® software. The classical logic of correlation was used in the development of algorithms to place signals into the various categories of cardiac arrhythmias. A sample set of 35 known ECG signals was obtained from the PhysioNet website for testing purposes. Later, 5 unknown ECG signals were used to determine the efficiency of the algorithms. A peak detection algorithm was written to detect the QRS complex. This complex is the most prominent waveform within an ECG signal, and its shape, duration and time of occurrence provide valuable information about the current state of the heart. The peak detection algorithm, developed using classical linear techniques, gave excellent results with very good accuracy for all the downloaded ECG signals. Later, a peak detection algorithm using the discrete wavelet transform (DWT) was implemented.
This code was developed using nonlinear techniques and was amenable to implementation. Also, the time required for execution was reduced, making this code ideal for real-time processing. Finally, algorithms were developed to calculate the Kolmogorov complexity and Lyapunov exponent, which are nonlinear descriptors and enable the randomness and chaotic nature of ECG signals to be estimated. These measures of randomness and chaotic nature enable us to apply the correct interrogative methods to the signal to extract maximum information. The codes developed gave fair results. It was possible to differentiate between normal ECGs and ECGs with ventricular fibrillation. The results show that the Kolmogorov complexity measure increases with an increase in pathology, from approximately 12.90 for normal ECGs to 13.87–14.39 for ECGs with ventricular fibrillation and ventricular tachycardia. Similar results were obtained for the Lyapunov exponent, with a notable difference between normal ECGs (0–0.0095) and ECGs with ventricular fibrillation (0.1114–0.1799). However, it was difficult to differentiate between different types of arrhythmias.
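A wavelet-based QRS detector of the kind this abstract describes can be sketched compactly. The version below is an illustrative Python reconstruction, not the thesis's MATLAB code: it computes an undecimated Haar detail at a QRS-width scale, squares it to get an energy envelope, and picks thresholded local maxima with a refractory period. The ~25 ms scale, the 4× mean-energy threshold and the 250 ms refractory window are assumed illustrative values.

```python
import numpy as np

def haar_detail(x, scale):
    # Undecimated Haar detail at the given scale: difference of two box sums.
    k = np.concatenate([np.ones(scale), -np.ones(scale)]) / (2.0 * scale)
    return np.convolve(x, k, mode='same')

def detect_qrs(ecg, fs):
    """Return sample indices of detected QRS complexes."""
    d = haar_detail(ecg, scale=max(int(0.025 * fs), 1))  # ~25 ms: QRS-width scale
    e = d ** 2                       # energy envelope emphasises sharp deflections
    thr = 4.0 * np.mean(e)           # crude global threshold
    refractory = int(0.25 * fs)      # suppress re-triggers within ~250 ms
    peaks, last = [], -refractory - 1
    for i in range(1, len(e) - 1):
        if (e[i] > thr and e[i] >= e[i - 1] and e[i] >= e[i + 1]
                and i - last > refractory):
            peaks.append(i)
            last = i
    return peaks
```

Since the detail filter is a short difference of box sums, the whole detector is a single pass over the signal, which is consistent with the real-time suitability the thesis reports for its DWT-based version.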