
    Assessment of vocal cord nodules: A case study in speech processing by using Hilbert-Huang Transform

    Vocal cord nodules are a pathological condition in which unnatural masses grow on the vocal folds. Among other effects, changes in the vocal cords' overall mass and stiffness alter their vibratory behaviour, thus changing the vocal emission they generate. This causes dysphonia, i.e. abnormalities in the patient's voice, which can be analysed and inspected via audio signals. However, the evaluation of voice condition through speech processing is not a trivial task, as standard methods based on the Fourier Transform fail to fit the non-stationary nature of vocal signals. In this study, four audio tracks, provided by a volunteer patient whose vocal fold nodules have been surgically removed, were analysed using a relatively new technique: the Hilbert-Huang Transform (HHT) via Empirical Mode Decomposition (EMD), specifically using the CEEMDAN (Complete Ensemble EMD with Adaptive Noise) algorithm. This method has been applied here to speech signals, recorded before removal surgery and during convalescence, to investigate specific trends. The possibilities offered by the HHT are presented, and some limitations of decomposing the signals into so-called intrinsic mode functions (IMFs) are highlighted. The results of these preliminary studies are intended as a basis for developing viable alternatives to the software currently used for the analysis and evaluation of pathological voice.

    CNN AND LSTM FOR THE CLASSIFICATION OF PARKINSON'S DISEASE BASED ON THE GTCC AND MFCC

    Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it represents a rapidly growing neurodegenerative disorder. Since about 90 percent of Parkinson's disease sufferers have some form of early speech impairment, recent studies on telediagnosis of Parkinson's disease have focused on the recognition of voice impairments from vowel phonations or the subjects' discourse. In this paper, we present a new approach for Parkinson's disease detection from speech sounds. It is based on CNN and LSTM models and uses two categories of features, Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals, with a comparative analysis of EMD-DWT and DWT-EMD denoising. The proposed model is divided into three stages. In the first stage, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second stage, the GTCC and MFCC are extracted from the enhanced audio signals. In the third stage, classification is carried out by feeding these features into the LSTM and CNN models, which are designed to capture sequential information from the extracted features. The experiments are performed using the PC-GITA and Sakar datasets with 10-fold cross-validation. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, the accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that GTCC features are more appropriate and accurate for the assessment of PD than MFCC.

    Machine Learning-Based Classification of Pulmonary Diseases through Real-Time Lung Sounds

    The study presents a computer-based automated system that employs machine learning to classify pulmonary diseases using lung sound data collected from hospitals. Denoising techniques, such as discrete wavelet transform and variational mode decomposition, are applied to enhance classifier performance. The system combines cepstral features, such as Mel-frequency cepstrum coefficients and gammatone frequency cepstral coefficients, for classification. Four machine learning classifiers, namely the decision tree, k-nearest neighbor, linear discriminant analysis, and random forest, are compared. Evaluation metrics such as accuracy, recall, specificity, and f1 score are employed. This study includes patients affected by chronic obstructive pulmonary disease, asthma, bronchiectasis, and healthy individuals. The results demonstrate that the random forest classifier outperforms the others, achieving an accuracy of 99.72% along with 100% recall, specificity, and f1 scores. The study suggests that the computer-based system serves as a decision-making tool for classifying pulmonary diseases, especially in resource-limited settings.
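
    The evaluation metrics named in this abstract (accuracy, recall, specificity, F1 score) can all be derived from confusion-matrix counts. The following is a minimal sketch, not the study's own code: it treats one class against the rest as a binary problem, and the function name `binary_metrics` and the counts in the usage lines are hypothetical values chosen only for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, specificity and F1 from confusion counts.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative and
    false-negative counts for one class treated against all the others.
    A ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

    For a multi-class task such as the four-group setting described above, one common choice is to compute these values per class and then average them; the abstract does not say which averaging the authors used.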

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, which is held every two years, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, in support of clinical diagnosis and classification of vocal pathologies.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease

    Vocal performance degradation is a common symptom for the vast majority of Parkinson's disease (PD) subjects, who typically follow personalized one-to-one periodic rehabilitation meetings with speech experts over a long-term period. Recently, a novel computer program called Lee Silverman voice treatment (LSVT) Companion was developed to allow PD subjects to independently progress through a rehabilitative treatment session. This study is part of the assessment of the LSVT Companion, aiming to investigate the potential of using sustained vowel phonations towards objectively and automatically replicating the speech experts' assessments of PD subjects' voices as “acceptable” (a clinician would allow persisting during in-person rehabilitation treatment) or “unacceptable” (a clinician would not allow persisting during in-person rehabilitation treatment). We characterize each of the 156 sustained vowel /a/ phonations with 309 dysphonia measures, select a parsimonious subset using a robust feature selection algorithm, and automatically distinguish the two cohorts (acceptable versus unacceptable) with about 90% overall accuracy. Moreover, we illustrate the potential of the proposed methodology as a probabilistic decision support tool to speech experts to assess a phonation as “acceptable” or “unacceptable.” We envisage the findings of this study being a first step towards improving the effectiveness of an automated rehabilitative speech assessment tool

    Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease

    There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD

    Introducing non-linear analysis into sustained speech characterization to improve sleep apnea detection

    We present a novel approach for detecting severe obstructive sleep apnea (OSA) cases by introducing non-linear analysis into sustained speech characterization. The proposed scheme was designed to provide additional information to our baseline system, built on top of state-of-the-art cepstral domain modeling techniques, with the aim of improving accuracy rates. This new information is lightly correlated with our previous MFCC modeling of sustained speech and uncorrelated with the information in our continuous speech modeling scheme. Tests have been performed to evaluate the improvement for our detection task, based on sustained speech alone as well as combined with a continuous speech classifier, resulting in a 10% relative reduction in classification error for the first and a 33% relative reduction for the fused scheme. These results encourage us to consider the existence of non-linear effects in OSA patients' voices, and to think about tools that could be used to improve short-time analysis.
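
    The relative reductions quoted in this abstract can be read as follows, assuming they refer to the classification error rate. This is a small worked sketch: the helper name `relative_reduction` and the error rates in the usage lines are hypothetical, since the abstract does not give the absolute figures.

```python
def relative_reduction(baseline_error, new_error):
    """Return the relative reduction of an error rate.

    Both arguments are error rates in the range (0, 1]. The result is the
    fraction of the baseline error that was removed, for example 0.10 for a
    10% relative reduction.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates, not reported in the abstract.
sustained_only = relative_reduction(baseline_error=0.20, new_error=0.18)
fused_scheme = relative_reduction(baseline_error=0.30, new_error=0.20)
print(f"sustained speech only: {sustained_only:.1%}")
print(f"fused scheme: {fused_scheme:.1%}")
```

    A relative reduction differs from an absolute one: going from a 20% to an 18% error rate is a 2-point absolute drop but a 10% relative reduction.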