
    A software system for pathological voice acoustic analysis

    A software system for pathological voice analysis using only the resources of a personal computer with a sound card is proposed. The system is built on specific methods and algorithms for pathological voice analysis and allows evaluation of: 1) pitch period (T0); 2) degree of unvoiceness (DUV); 3) pitch perturbation quotient (PPQ) and amplitude perturbation quotient (APQ); 4) dissimilarity of the surfaces of the pitch pulses; 5) ratio of aperiodic to periodic components in the cepstrum; 6) ratio of the energy in the cepstral pitch pulse to the total cepstral energy; 7) harmonics-to-noise ratio (HNR); 8) degree of hoarseness (DH); 9) ratio of low- to high-frequency energies; 10) glottal closing quotient (CQ). The voices of 400 persons were analyzed: 100 normal speakers (50 females/50 males) and 300 patients (100 females/200 males). The statistical analysis shows very significant changes in PPQ, DH, DPP, DUV, APR, HNR and PECM, and significant changes in APQ and CQ.
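    For concreteness, here is a minimal sketch of how two of the listed perturbation measures (PPQ and APQ) can be computed, assuming cycle-level pitch periods and peak amplitudes have already been extracted by a pitch-marking step; the function names and the k-point window sizes are illustrative, not taken from the described system.

```python
import numpy as np

def pitch_perturbation_quotient(periods, k=5):
    """k-point Pitch Perturbation Quotient (PPQ) in percent.

    periods: 1-D array of consecutive pitch-period lengths in seconds.
    Each period is compared with its local k-point moving average.
    """
    periods = np.asarray(periods, dtype=float)
    half = k // 2
    local_mean = np.convolve(periods, np.ones(k) / k, mode="valid")
    centre = periods[half:len(periods) - half]
    return 100.0 * np.mean(np.abs(centre - local_mean)) / np.mean(periods)

def amplitude_perturbation_quotient(amplitudes, k=11):
    """k-point Amplitude Perturbation Quotient (APQ) in percent."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    half = k // 2
    local_mean = np.convolve(amplitudes, np.ones(k) / k, mode="valid")
    centre = amplitudes[half:len(amplitudes) - half]
    return 100.0 * np.mean(np.abs(centre - local_mean)) / np.mean(amplitudes)

# Synthetic cycle data for a voice with slight perturbation (~125 Hz pitch)
rng = np.random.default_rng(0)
T0 = 0.008 + 0.0001 * rng.standard_normal(200)   # pitch-period lengths (s)
amp = 1.0 + 0.02 * rng.standard_normal(200)      # cycle peak amplitudes
print(f"PPQ = {pitch_perturbation_quotient(T0):.2f} %")
print(f"APQ = {amplitude_perturbation_quotient(amp):.2f} %")
```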

    A Parametric Approach for Classification of Distortions in Pathological Voices

    In biomedical acoustics, distortion in voice signals, commonly introduced during acquisition and transmission, adversely affects the acoustic features extracted from pathological voices. Information on the type of distortion can help in compensating for its effects. This paper proposes a new approach to detecting four major types of distortion commonly encountered in remote analysis of pathological voice, namely background noise, reverberation, clipping and coding. In this approach, by applying factor analysis to Gaussian mixture model mean supervectors, distortions in variable-duration recordings are modeled by fixed-length, low-dimensional channel vectors. Linear discriminant analysis (LDA) is then used to remove the remaining nuisance effects in the channel vectors. Finally, two different classifiers, namely support vector machines and probabilistic LDA, classify the different types of distortion. Experimental results obtained using Parkinson's voices, as an example of pathological voice, show an 11.4% relative improvement in performance over systems that use acoustic features directly for distortion classification.
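    A rough sketch of the supervector-plus-classifier pipeline described above, built from scikit-learn components; the crude MAP-style mean adaptation, the use of FactorAnalysis as a stand-in for the paper's factor-analysis channel modelling, and the toy features and model sizes are all simplifying assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

def gmm_mean_supervector(ubm, frames, relevance=16.0):
    """Crude MAP-style mean adaptation of a UBM to one recording,
    returning the concatenated (stacked) component means."""
    resp = ubm.predict_proba(frames)                 # (T, C) posteriors
    n_c = resp.sum(axis=0)                           # soft counts per component
    f_c = resp.T @ frames                            # (C, D) first-order stats
    alpha = (n_c / (n_c + relevance))[:, None]
    adapted = alpha * (f_c / np.maximum(n_c[:, None], 1e-8)) \
              + (1.0 - alpha) * ubm.means_
    return adapted.ravel()

# Toy data: each "recording" is a (T, D) array of MFCC-like frames;
# labels index four distortion classes (noise / reverb / clipping / coding).
rng = np.random.default_rng(1)
recordings = [rng.standard_normal((rng.integers(200, 400), 13)) + c
              for c in range(4) for _ in range(20)]
labels = np.repeat(np.arange(4), 20)

ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(np.vstack(recordings))
X = np.array([gmm_mean_supervector(ubm, r) for r in recordings])

# Factor analysis -> fixed-length, low-dimensional "channel vectors",
# LDA to suppress residual nuisance directions, linear SVM on top.
clf = make_pipeline(FactorAnalysis(n_components=10, random_state=0),
                    LinearDiscriminantAnalysis(n_components=3),
                    SVC(kernel="linear"))
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```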

    Assessment of severe apnoea through voice analysis, automatic speech, and speaker recognition techniques

    This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques to the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment, and effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and non-nasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry. The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-02 project.
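    A minimal sketch of GMM-based detection on spectral features, assuming MFCCs as the spectral representation and two class-conditional GMMs (apnoea vs. control) compared by average frame log-likelihood; the file lists, model sizes and the use of librosa are assumptions, not details from the study.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Frame-level MFCCs, shape (T, n_mfcc), for one recording."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_gmm(paths, n_components=16):
    """Fit one class-conditional GMM on the pooled frames of a file list."""
    frames = np.vstack([mfcc_frames(p) for p in paths])
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag", random_state=0).fit(frames)

def classify(path, gmm_apnoea, gmm_control):
    """Average frame log-likelihood ratio; positive values favour apnoea."""
    x = mfcc_frames(path)
    return gmm_apnoea.score(x) - gmm_control.score(x)

# Hypothetical file lists; replace with the actual corpus partitions.
# gmm_a = train_gmm(apnoea_training_wavs)
# gmm_c = train_gmm(control_training_wavs)
# score = classify("test_speaker.wav", gmm_a, gmm_c)
```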

    Improved Algorithm for Pathological and Normal Voices Identification

    Many papers address automatic classification of normal and pathological voices, but they lack an estimate of the severity of the identified voice disorders. We build a model for identifying pathological and normal voices that can also evaluate the degree of severity of the identified voice disorders among students. In the present work, we present an automatic classifier using acoustical measurements on recorded sustained vowels /a/ and pattern recognition tools based on neural networks. The training set was constructed by classifying students' recorded voices according to thresholds from the literature. We retrieve the pitch, jitter, shimmer and harmonics-to-noise ratio values of the speech utterance /a/, which constitute the input vector of the neural network. The degree of severity is estimated to evaluate how far the parameters are from the standard values, based on the percentage of normal and pathological values. The database used for testing the proposed neural network algorithm is formed by healthy and pathological voices from a German database of voice disorders. The performance of the proposed algorithm is evaluated in terms of accuracy (97.9%), sensitivity (1.6%), and specificity (95.1%). The classification rate is 90% for the normal class and 95% for the pathological class.
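    A compact sketch of the classification step, assuming the pitch, jitter, shimmer and HNR values have already been measured on the sustained /a/ with an acoustic analysis tool; the network size, the made-up feature rows and the illustrative normal-range thresholds in the severity helper are assumptions, not the paper's trained model.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Each row: [mean F0 (Hz), jitter (%), shimmer (%), HNR (dB)] measured on a
# sustained /a/.  The values below are made up for illustration only.
X = np.array([
    [120.0, 0.40, 2.5, 22.0],   # normal-sounding male voice
    [210.0, 0.35, 2.1, 24.0],   # normal-sounding female voice
    [115.0, 2.10, 8.4, 9.0],    # hoarse, pathological-sounding voice
    [190.0, 1.80, 7.2, 11.0],   # pathological-sounding voice
])
y = np.array([0, 0, 1, 1])      # 0 = normal, 1 = pathological

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                  random_state=0))
clf.fit(X, y)
print(clf.predict([[130.0, 1.5, 6.0, 12.0]]))   # a hoarse-sounding test voice

def severity_percent(jitter, shimmer, hnr, thresholds=(1.04, 3.81, 20.0)):
    """Share of parameters outside commonly cited normal ranges
    (thresholds are illustrative literature values, not the paper's)."""
    flags = [jitter > thresholds[0], shimmer > thresholds[1],
             hnr < thresholds[2]]
    return 100.0 * sum(flags) / len(flags)

print(severity_percent(1.5, 6.0, 12.0), "% of parameters out of range")
```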

    Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato

    On average, the lack of biological markers causes a one-year diagnostic delay in detecting amyotrophic lateral sclerosis (ALS). To improve the diagnostic process, an automatic voice assessment based on acoustic analysis can be used. The purpose of this work was to verify the suitability of the sustained vowel phonation test for automatic detection of patients with ALS. We proposed an enhanced procedure for separating the voice signal into fundamental periods, which is required for the calculation of perturbation measures (such as jitter and shimmer). We also proposed a method for quantitative assessment of pathological vibrato manifestations in sustained vowel phonation. The study's experiments show that, using the proposed acoustic analysis methods, a classifier based on linear discriminant analysis attains 90.7% accuracy with 86.7% sensitivity and 92.2% specificity. Comment: Proc. of the International Conference on Signal Processing Algorithms, Architectures, Arrangements, and Applications (SPA 2019).
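    A small sketch of the evaluation stage, assuming per-recording jitter, shimmer and vibrato measures are already available; it shows how an LDA classifier's accuracy, sensitivity and specificity can be estimated by cross-validation on synthetic feature vectors, not the authors' data or their period-separation procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import confusion_matrix

def evaluate_lda(X, y, n_splits=5):
    """Cross-validated accuracy, sensitivity and specificity for a
    binary ALS / healthy classifier (y: 1 = ALS, 0 = healthy)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    y_pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=cv)
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    return {"accuracy": (tp + tn) / len(y),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Toy feature matrix: columns stand in for jitter, shimmer and a vibrato
# measure per sustained-vowel recording (synthetic values only).
rng = np.random.default_rng(2)
X_healthy = rng.normal([0.5, 3.0, 0.2], 0.2, size=(40, 3))
X_als = rng.normal([1.5, 6.0, 0.8], 0.4, size=(40, 3))
X = np.vstack([X_healthy, X_als])
y = np.array([0] * 40 + [1] * 40)
print(evaluate_lda(X, y))
```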

    A Study on how Pre-whitening Influences Fundamental Frequency Estimation


    Cross-lingual dysphonic speech detection using pretrained speaker embeddings

    In this study, cross-lingual binary classification and severity estimation of dysphonic speech have been carried out. Hand-crafted acoustic feature extraction is replaced by the speaker embedding techniques used in speaker verification. Two state-of-the-art deep learning methods for speaker verification have been used: the X-vector and ECAPA-TDNN. Embeddings are extracted from speech samples in the Hungarian and Dutch languages and used to train a Support Vector Machine (SVM) and a Support Vector Regressor (SVR) for binary classification and severity estimation, in a cross-language manner. When the models were trained on Hungarian samples and evaluated on Dutch samples, our results were competitive with manual feature engineering in the binary classification of dysphonic speech and outperformed it in estimating the severity level, with our model achieving Spearman and Pearson correlations of 0.769 and 0.771. Our results in both classification and regression were also superior to the manual feature extraction technique when the models were trained on Dutch samples and evaluated on Hungarian samples, where only a limited number of samples was available for training. An accuracy of 86.8% was reached with features extracted from the embedding methods, while the maximum accuracy using hand-crafted acoustic features was 66.8%. Overall, the results show that Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) performs better than the earlier X-vector in both tasks.
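    A sketch of the embedding-plus-SVM/SVR setup under stated assumptions: the publicly available SpeechBrain ECAPA-TDNN speaker-verification model is used as an example embedding extractor, and the corpus file lists, labels and severity scores are hypothetical placeholders rather than the paper's actual data or training recipe.

```python
import numpy as np
import torchaudio
from speechbrain.pretrained import EncoderClassifier   # ECAPA-TDNN speaker model
from sklearn.svm import SVC, SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Pretrained ECAPA-TDNN speaker-verification model (the hub identifier is an
# assumption, not taken from the paper).
encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

def ecapa_embedding(wav_path):
    """One fixed-length speaker embedding per recording."""
    signal, _ = torchaudio.load(wav_path)
    return encoder.encode_batch(signal).squeeze().detach().numpy()

# Cross-lingual setup: train on one language, evaluate on the other.
# hungarian_wavs / dutch_wavs and the label arrays are placeholders.
# X_train = np.vstack([ecapa_embedding(p) for p in hungarian_wavs])
# X_test = np.vstack([ecapa_embedding(p) for p in dutch_wavs])
#
# svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
# svm.fit(X_train, y_train_binary)        # healthy vs. dysphonic
# print("accuracy:", svm.score(X_test, y_test_binary))
#
# svr = make_pipeline(StandardScaler(), SVR())
# svr.fit(X_train, y_train_severity)      # continuous severity level
# pred = svr.predict(X_test)              # correlate with clinician ratings
```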

    Speech outcome in tongue cancer surgery: objective evaluation by acoustic analysis software

    BACKGROUND. Cancer of the oral cavity is one of the most common malignancies, of which 60% affect the tongue. Carcinoma of the tongue causes significant alterations of the articulatory and swallowing functions. The gold standard of care remains primary surgical resection with or without postoperative adjuvant therapy. Whereas T1 and T2 tongue tumors can be treated with more conservative surgeries, such as partial glossectomies, larger tumors require total and aggressive glossectomies which increase survival but, on the other hand, often make speech, chewing and swallowing impossible. MATERIAL AND METHODS. Our study was performed on a total of 21 patients with squamous cell carcinoma of the tongue who underwent either partial resection or hemiglossectomy. Each subject (surgical patients and controls alike) was asked to pronounce the vowels /a/, /e/, /i/, /u/, and all signals were evaluated separately by two operators. Acoustic (F0, jitter, shimmer, NHR) and vowel-metric (the ratio F2i/F2u, tVSA, qVSA, FCR) features were extracted. In order to assess speech intelligibility, all patients were evaluated by two doctors and one speech therapist, and all patients completed the Speech Handicap Index (SHI), translated into Italian, before recording. RESULTS. No statistically significant variations were observed, regardless of gender, between controls and surgically resected patients when tumor staging was T1-T2. On the contrary, when patients had to undergo more extensive surgical resection due to the presence of a T3-T4 tumor, a dramatic increase of F2u could be observed. This change, together with a decrease of F2i, led to a highly significant reduction of the F2i/F2u parameter in surgically resected patients as compared to controls. The other parameters that were reduced in a statistically significant manner in T3-T4 surgically resected patients were tVSA and qVSA, whereas two parameters increased in a statistically significant manner: FCR and SHI. Again, none of the above-mentioned parameters was altered in a statistically significant manner in early-stage resected patients, regardless of gender. CONCLUSION. For the first time, we used a series of newly developed formant parameters, introduced by various authors for the study of articulatory undershoot of the tongue in various neurodegenerative diseases. The statistical analysis of our results highlighted a strong, significant correlation of each of our parameters (F2i/F2u, FCR, tVSA, qVSA) with the TNM stage, and therefore with the extent of the surgical resection, and in parallel with the loss of speech intelligibility, which proportionally reaches higher values in the advanced stages of the disease, as can be deduced from the SHI trend.
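    For reference, a short sketch of the vowel-metric features named above, using the standard formant-based definitions from the articulation literature (shoelace vowel space area, Sapir-style formant centralization ratio); the example formant values are illustrative, not study data.

```python
import numpy as np

def f2i_f2u_ratio(f2_i, f2_u):
    """F2/i/ : F2/u/ ratio; it decreases with articulatory undershoot."""
    return f2_i / f2_u

def fcr(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Formant Centralization Ratio; it rises as vowels centralize."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

def polygon_area(vertices):
    """Shoelace area of a polygon given (F1, F2) vertices in order."""
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def tvsa(a, i, u):
    """Triangular vowel space area from (F1, F2) of /a/, /i/, /u/ (Hz^2)."""
    return polygon_area([a, i, u])

def qvsa(a, e, i, u):
    """Quadrilateral vowel space area from (F1, F2) of /a/, /e/, /i/, /u/;
    the vertices are ordered so that they trace the quadrilateral."""
    return polygon_area([i, e, a, u])

# Illustrative (F1, F2) values in Hz for a control-like speaker.
a, e, i, u = (800, 1300), (500, 1900), (300, 2300), (350, 800)
print("F2i/F2u:", round(f2i_f2u_ratio(i[1], u[1]), 2))
print("FCR:", round(fcr(a[0], a[1], i[0], i[1], u[0], u[1]), 2))
print("tVSA:", round(tvsa(a, i, u)), "Hz^2")
print("qVSA:", round(qvsa(a, e, i, u)), "Hz^2")
```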

    A CNN-Based Approach to Identification of Degradations in Speech Signals
