
1,316 research outputs found

Some Commonly Used Speech Feature Extraction Algorithms

Author: Alim Sabur Ajibola
Rashid Nahrul Khair Alang
Publication venue: 'IntechOpen'
Publication date: 12/12/2018

Speech is a complex, naturally acquired human motor ability. In adults it is characterized by the production of about 14 different sounds per second via the harmonized actions of roughly 100 muscles. Speaker recognition is the capability of software or hardware to receive a speech signal, identify the speaker present in it and recognize the speaker afterwards. Feature extraction is accomplished by changing the speech waveform to a parametric representation at a relatively low data rate for subsequent processing and analysis. Acceptable classification therefore depends on high-quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, giving them a high level of reliability and acceptability. Researchers have made several modifications to these techniques to make them less susceptible to noise, more robust and less time-consuming. In conclusion, none of the methods is superior to the others; the area of application determines which method to select.

IntechOpen

Crossref

CNN AND LSTM FOR THE CLASSIFICATION OF PARKINSON'S DISEASE BASED ON THE GTCC AND MFCC

Author: BELHOUSSINE DRISSI Taoufiq
BOUALOULOU Nouhaila
NSIRI Benayad
Publication venue: Lublin University of Technology
Publication date: 30/06/2023

Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it is a rapidly growing neurodegenerative disorder. Since about 90 percent of people with Parkinson's disease have some form of early speech impairment, recent studies on telediagnosis of Parkinson's disease have focused on recognizing voice impairments from vowel phonations or the subjects' discourse. In this paper, we present a new approach for detecting Parkinson's disease from speech sounds. It is based on CNN and LSTM and uses two categories of features, Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals with comparative EMD-DWT and DWT-EMD analysis. The proposed model is divided into three stages. In the first stage, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second stage, the GTCC and MFCC are extracted from the enhanced audio signals. In the third stage, classification is carried out by feeding these features into the LSTM and CNN models, which are designed to capture sequential information from the extracted features. The experiments are performed on the PC-GITA and Sakar datasets with 10-fold cross-validation. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, the accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that GTCC features are more appropriate and accurate than MFCC for the assessment of PD.

Lublin University of Technology Journals


An end-to-end framework for real-time automatic sleep stage classification.

Author: Ancoli-Israel Sonia
Chee Michael WL
Gooley Joshua J
Ong Ju Lynn
Patanaik Amiya
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

Sleep staging is a fundamental but time-consuming process in any sleep laboratory. To greatly speed up sleep staging without compromising accuracy, we developed a novel framework for performing real-time automatic sleep stage classification. The client-server architecture adopted here provides an end-to-end solution for anonymizing and efficiently transporting polysomnography data from the client to the server and for receiving sleep stages in an interoperable fashion. The framework partitions the sleep staging task between the client and server so that multiple low-end clients can work with one server, and it can be deployed both locally and over the cloud. The framework was tested on four datasets comprising ≈1700 polysomnography records (≈12000 hr of recordings) collected from adolescents, young adults, and old adults, involving healthy persons as well as those with medical conditions. We used two independent validation datasets: one comprising patients from a sleep disorders clinic and the other incorporating patients with Parkinson's disease. Using this system, an entire night's sleep was staged with an accuracy on par with expert human scorers but much faster (≈5 s compared with 30-60 min). To illustrate the utility of such real-time sleep staging, we used it to facilitate the automatic delivery of acoustic stimuli at a targeted phase of slow-sleep oscillations to enhance slow-wave sleep.

eScholarship - University of California

Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech

Author: Pradhan Gayadhar
Sahu Laxmi Priya
Singh Jyoti Prakash
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 16/12/2022

The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used in the computation of cepstral coefficients smooths out this fine-level information in the higher frequency regions because of the larger bandwidth of the filters there. To capture the sub-band information, in this paper a four-level discrete wavelet transform (DWT) decomposition is first performed to decompose the input speech signal into approximation and detail coefficients at each level. For a particular input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectra of each reconstructed speech signal using a 30-channel Mel-filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and a discrete cosine transform is performed to obtain the cepstral feature, here termed the discrete wavelet transform reconstructed (DWTR) Mel frequency cepstral coefficient (MFCC). The i-vector based dysarthric level assessment system developed on the universal access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. The use of DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case to 60.094 % from the 56.646 % MFCC baseline. Further analysis of the confusion matrices shows that confusion among different dysarthric classes is quite different for MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing the discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813 % in the text- and speaker-independent test case.
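
The decomposition and reconstruction step described in this abstract can be illustrated with a minimal sketch. It assumes a Haar wavelet (the abstract does not name the wavelet used) and a signal whose length is divisible by 2^4; the function names `wavedec`, `waverec` and `subband_signals` are hypothetical helpers written for this illustration, not the authors' code. The Mel-filterbank, pooling and DCT stages are not shown.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return a, d

def haar_idwt(a, d):
    """One Haar synthesis step: invert haar_dwt."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / SQRT2)
        x.append((ai - di) / SQRT2)
    return x

def wavedec(x, levels):
    """Multi-level decomposition, returned as [cA_L, cD_L, ..., cD_1]."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    return [a] + details[::-1]

def waverec(coeffs):
    """Invert wavedec."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = haar_idwt(a, d)
    return a

def subband_signals(x, levels=4):
    """Reconstruct one time-domain signal per coefficient set.

    Each signal is obtained by zeroing every other coefficient set before
    the inverse transform, so a four-level decomposition gives five signals.
    """
    coeffs = wavedec(x, levels)
    bands = []
    for keep in range(len(coeffs)):
        masked = [c if i == keep else [0.0] * len(c)
                  for i, c in enumerate(coeffs)]
        bands.append(waverec(masked))
    return bands

# Toy input standing in for a speech signal (length 32 = 2 * 2**4).
signal = [float(i % 7) for i in range(32)]
bands = subband_signals(signal, levels=4)
```

Because the transform is linear, the five reconstructed signals add back up to the input, which is a convenient sanity check before computing filterbank energies on each of them.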

Re-UNIR

Hand Geometry Techniques: A Review

Author: Bakshe R. C. (Rahul)
Patil A. M. (A)
Publication venue: 'Engineering Research Publication ERP'
Publication date: 01/11/2014

Volume 2 Issue 11 (November 2014)

Neliti

Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners

Author: Aguiar G. J.
Barbon S.
Guido R. C.
Patil H. A.
Proenca M. L.
Santana E. J.
Publication venue
Publication date: 01/01/2023

Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as being either healthy or pathologically-affected. Nevertheless, the latter state is not always a result of a sole laryngeal issue, i.e., multiple disorders might exist, demanding multi-label classification procedures for effective diagnoses. Consequently, the objective of this paper is to investigate the application of five multi-label classification methods based on problem transformation to play the role of base-learners, i.e., Label Powerset, Binary Relevance, Nested Stacking, Classifier Chains, and Dependent Binary Relevance with Random Forest (RF) and Support Vector Machine (SVM), in addition to a Deep Neural Network (DNN) from an algorithm adaptation method, to detect multiple voice disorders, i.e., Dysphonia, Laryngitis, Reinke's Edema, Vox Senilis, and Central Laryngeal Motion Disorder. Receiving as input three handcrafted features, i.e., signal energy (SE), zero-crossing rates (ZCRs), and signal entropy (SH), which allow for interpretable descriptors in terms of speech analysis, production, and perception, we observed that the DNN-based approach powered with SE-based feature vectors presented the best values of F1-score among the tested methods, i.e., 0.943, as the averaged value from all the balancing scenarios, under Saarbrücken Voice Database (SVD) and considering 20% of balancing rate with Synthetic Minority Over-sampling Technique (SMOTE). Finally, our findings of most false negatives for laryngitis may explain the reason why its detection is a serious issue in speech technology. The results we report provide an original contribution, allowing for the consistent detection of multiple speech pathologies and advancing the state-of-the-art in the field of handcrafted acoustic-based non-invasive diagnosis of voice disorders
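
As a rough illustration of the three handcrafted features named in this abstract, the sketch below computes signal energy, zero-crossing rate and a histogram-based Shannon entropy for a single frame. These are common textbook definitions and are an assumption here; the paper's exact formulations, normalization and framing may differ, and the function names and the `bins` parameter are hypothetical.

```python
import math

def signal_energy(frame):
    """Sum of squared sample values in the frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def signal_entropy(frame, bins=16):
    """Shannon entropy (bits) of the amplitude histogram of the frame."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0  # a constant frame carries no amplitude uncertainty
    counts = [0] * bins
    for s in frame:
        idx = min(int((s - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(frame)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# Toy frame standing in for a short voiced segment.
frame = [1.0, -1.0, 1.0, -1.0]
features = (signal_energy(frame),
            zero_crossing_rate(frame),
            signal_entropy(frame, bins=2))
```

In practice each feature would be computed over successive frames of a recording and the resulting vectors passed to the classifier.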

Archivio istituzionale della ricerca - Università di Trieste

Techniques of EMG signal analysis: detection, processing, classification and applications

Author: A Boca del
A Hamilton-Wright
A Merlo
A Moser
AJ McComas
AJ Thexton
AM Syeed
AR Ismail
CI Christodoulou
CJ Luca de
CL Nikias
CS Pattichis
D Farina
D Gabor
D Graupe
D Zennaro
DA Winter
DK Kumar
DW Stashuk
F Laterza
F. Mohd-Yasin
FHY Chan
G Cheron
G Hefftner
GB Giannakis
H Piper
HJ Yeom
J Duchene
JR Cram
JV Basmajian
K Kanosue
K Tohru
K Yana
KB Englehart
KR Wheeler
L Bernatos
L Cohen
LA Major
LQ Zhang
M. B. I. Reaz
M. S. Hussain
NR Lorente de
P Bornato
P Rosenfalck
P Wellig
PA Kaplanis
PC Doerschuk
R Boualem
RFM Kleissen
S Karlsson
S Micera
S Shahid
SD Nandedkar
V Stanford
W Martin
W Peasgood
X Lanyi
X Zhengquan
Y Zhou
YT Zhang
Publication venue: Biological Procedures Online
Publication date: 01/01/2006

Electromyography (EMG) signals can be used for clinical/biomedical applications, Evolvable Hardware Chip (EHW) development, and modern human-computer interaction. EMG signals acquired from muscles require advanced methods for detection, decomposition, processing, and classification. The purpose of this paper is to illustrate the various methodologies and algorithms for EMG signal analysis to provide efficient and effective ways of understanding the signal and its nature. We further point out some of the hardware implementations using EMG, focusing on applications related to prosthetic hand control, grasp recognition, and human-computer interaction. A comparison study is also given to show the performance of various EMG signal analysis methods. This paper gives researchers a good understanding of the EMG signal and its analysis procedures. This knowledge will help them develop more powerful, flexible, and efficient applications.

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

SHDL@MMU Digital Repository

Special issue on signal processing and machine learning for biomedical data

Author: Cascio D.
Raso G.
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

This Special Issue is focused on advanced techniques in signal processing, analysis, modelling, and classification, applied to a variety of medical diagnostic problems. Biomedical data play a fundamental role in many fields of research and clinical practice. Very often the complexity of these data and their large volume make it necessary to develop advanced analysis techniques and systems. Furthermore, the introduction of new techniques and methodologies for diagnostic purposes, especially in the field of medical imaging, requires new signal processing and machine learning methods. The recent progress in machine learning techniques, and in particular deep learning, revolutionized various fields of artificial vision, significantly pushing the state of the art of artificial vision systems into a wide range of high-level tasks. Such progress can help address problems in the analysis of biomedical data. This Special Issue placed particular emphasis on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis. The works that make up this Special Issue show a remarkable variety of applications for the detection and classification of medical imaging problems. In particular, these works can be divided, on the basis of the types of techniques used, into three categories: signal processing (SP) methods, traditional machine learning (ML) methods, and deep learning (DL) methods.

Archivio istituzionale della ricerca - Università di Palermo

Voice pathologies : the most comum features and classification tools

Author: Fernandes Joana Filipa
Freitas Diamantino
Teixeira João Paulo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

Speech pathologies are quite common in society; however, the existing exams are invasive, which makes them uncomfortable for patients, and they depend on the experience of the clinician who performs the assessment. Hence the need to develop non-invasive methods that allow objective and efficient analysis. With this need in mind, this work identified the most promising features and classifiers. The features identified were jitter, shimmer, HNR, LPC, PLP, and MFCC, and the classifiers were CNN, RNN and LSTM. This study intends to develop a device to support medical decisions; this article already presents the system interface.

Biblioteca Digital do IPB