32 research outputs found
A software system for pathological voice acoustic analysis
A software system for pathological voice analysis using only the resources of a personal computer with a sound card is proposed. The system is built on specific methods and algorithms for pathological voice analysis and allows evaluation of: 1) pitch period (T0); 2) degree of unvoiceness; 3) pitch perturbation and amplitude perturbation quotients; 4) dissimilarity of the surfaces of the pitch pulses; 5) ratio of aperiodic to periodic components in cepstra; 6) ratio of the energy in the cepstral pitch pulse to the total cepstral energy; 7) harmonics-to-noise ratio; 8) degree of hoarseness; 9) ratio of low- to high-frequency energies; 10) glottal closing quotient. The voices of 400 persons were analyzed: 100 normal speakers (50 female/50 male) and 300 patients (100 female/200 male). The statistical analysis shows very significant changes in PPQ, DH, DPP, DUV, APR, HNR and PECM, and significant changes in APQ and CQ.
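Several of the measures listed above are short-term perturbation quotients over the extracted pitch periods and pulse amplitudes. As a minimal illustrative sketch (not the system's actual algorithms), a k-point PPQ and APQ could be computed like this:

```python
import numpy as np

def jitter_ppq(periods, k=5):
    """k-point Pitch Perturbation Quotient: mean absolute deviation of each
    pitch period from its local k-point average, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    half = k // 2
    smoothed = np.convolve(p, np.ones(k) / k, mode="valid")
    dev = np.abs(p[half:len(p) - half] - smoothed)
    return dev.mean() / p.mean()

def shimmer_apq(amps, k=5):
    """k-point Amplitude Perturbation Quotient: the same idea applied to the
    peak amplitudes of successive pitch pulses."""
    a = np.asarray(amps, dtype=float)
    half = k // 2
    smoothed = np.convolve(a, np.ones(k) / k, mode="valid")
    return np.abs(a[half:len(a) - half] - smoothed).mean() / a.mean()
```

A perfectly periodic, constant-amplitude signal yields zero for both quotients; pathological voices typically show elevated values.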
A Parametric Approach for Classification of Distortions in Pathological Voices
In biomedical acoustics, distortion in voice signals, commonly introduced during acquisition and transmission, adversely affects the acoustic features extracted from pathological voice. Information on the type of distortion can help in compensating for its effects. This paper proposes a new approach to detecting four major types of distortion commonly encountered in remote analysis of pathological voice, namely background noise, reverberation, clipping and coding. In this approach, by applying factor analysis to Gaussian mixture model mean supervectors, distortions in variable-duration recordings are modeled by fixed-length, low-dimensional channel vectors. Linear discriminant analysis (LDA) is then used to remove the remaining nuisance effects in the channel vectors. Finally, two different classifiers, namely support vector machines and probabilistic LDA, classify the different types of distortion. Experimental results obtained using Parkinson's voices, as an example of pathological voice, show an 11.4% relative improvement in performance over systems which use acoustic features directly for distortion classification.
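The final two stages of the pipeline described above (LDA dimensionality reduction followed by an SVM over fixed-length channel vectors) can be sketched with scikit-learn. The data here are random stand-ins for the paper's channel vectors, and the dimensions and class layout are assumptions for illustration only:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# synthetic stand-ins for low-dimensional channel vectors,
# 50 recordings for each of 4 distortion classes
X = rng.normal(size=(200, 20)) + np.repeat(np.arange(4), 50)[:, None] * 1.0
y = np.repeat(np.arange(4), 50)  # noise / reverberation / clipping / coding

# LDA projects to at most (n_classes - 1) discriminant axes before the SVM
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=3),
                    SVC(kernel="linear"))
clf.fit(X, y)
```

In the actual system the input vectors come from factor analysis of GMM mean supervectors rather than raw features, which is what makes them fixed-length regardless of recording duration.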
Assessment of severe apnoea through voice analysis, automatic speech, and speaker recognition techniques
The electronic version of this article is the complete one and can be found online at http://asp.eurasipjournals.com/content/2009/1/982531.
This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry. The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-02 Project.
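GMM pattern recognition of the kind described above is typically done by fitting one mixture model per population and comparing average per-frame log-likelihoods. A minimal sketch on synthetic data (the feature dimensions, component counts, and class separation are assumptions, not the study's settings):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# toy stand-ins for per-frame spectral feature vectors of the two populations
healthy_frames = rng.normal(0.0, 1.0, size=(500, 12))
apnoea_frames = rng.normal(0.8, 1.0, size=(500, 12))

# one GMM per population, trained on that population's frames
gmm_healthy = GaussianMixture(n_components=4, random_state=0).fit(healthy_frames)
gmm_apnoea = GaussianMixture(n_components=4, random_state=0).fit(apnoea_frames)

def classify(frames):
    # score() returns the mean per-frame log-likelihood; the higher-scoring
    # model determines the hypothesized class of the utterance
    if gmm_apnoea.score(frames) > gmm_healthy.score(frames):
        return "apnoea"
    return "healthy"
```

The same likelihood-ratio idea extends to the nasal versus non-nasal vowel-context modelling mentioned in the abstract.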
Improved Algorithm for Pathological and Normal Voices Identification
Many papers address automatic classification of normal versus pathological voices, but they lack an estimation of the severity of the identified voice disorders. The goal of this work is to build a model for identifying pathological and normal voices that can also evaluate the degree of severity of the identified voice disorders among students. We present an automatic classifier that applies acoustic measurements to recorded sustained vowels /a/ and pattern recognition tools based on neural networks. The training set was built by classifying students' recorded voices according to thresholds from the literature. We retrieve the pitch, jitter, shimmer and harmonics-to-noise ratio values of the speech utterance /a/, which constitute the input vector of the neural network. The degree of severity is estimated by evaluating how far the parameters are from standard values, based on the percentage of normal and pathological values. The data used for testing the proposed neural network algorithm consist of healthy and pathological voices from a German database of voice disorders. The performance of the proposed algorithm is evaluated in terms of accuracy (97.9%), sensitivity (1.6%), and specificity (95.1%). The classification rate is 90% for the normal class and 95% for the pathological class.
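A classifier of the shape described above, with a four-dimensional input vector (pitch, jitter, shimmer, HNR), can be sketched as follows. The feature distributions and the literature thresholds used here are illustrative assumptions, not the paper's values:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200
# hypothetical feature vectors [pitch_Hz, jitter_%, shimmer_%, HNR_dB];
# the distributions below are illustrative, not taken from the paper
normal = np.column_stack([rng.normal(120, 20, n), rng.normal(0.5, 0.2, n),
                          rng.normal(3.0, 1.0, n), rng.normal(22, 3, n)])
path = np.column_stack([rng.normal(140, 40, n), rng.normal(2.0, 0.8, n),
                        rng.normal(9.0, 3.0, n), rng.normal(12, 4, n)])
X, y = np.vstack([normal, path]), np.array([0] * n + [1] * n)

net = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                  random_state=0)).fit(X, y)

def severity(v):
    """Fraction of perturbation parameters outside assumed normal thresholds
    (jitter > 1.04%, shimmer > 3.81%, HNR < 20 dB) -- illustrative values."""
    _, jitter, shimmer, hnr = v
    return ((jitter > 1.04) + (shimmer > 3.81) + (hnr < 20.0)) / 3
```

The `severity` helper mirrors the abstract's idea of grading severity by the proportion of parameters deviating from standard values; the thresholds are placeholders.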
Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato
On average, the lack of biological markers causes a one-year delay in diagnosing
amyotrophic lateral sclerosis (ALS). To improve the diagnostic process, an
automatic voice assessment based on acoustic analysis can be used. The purpose
of this work was to verify the suitability of the sustained vowel phonation
test for automatic detection of patients with ALS. We proposed an enhanced
procedure for separating the voice signal into fundamental periods, which is
required for the calculation of perturbation measurements (such as jitter and
shimmer). We also proposed a method for quantitative assessment of pathological
vibrato manifestations in sustained vowel phonation. The study's experiments
show that, using the proposed acoustic analysis methods, a classifier based on
linear discriminant analysis attains 90.7% accuracy with 86.7% sensitivity
and 92.2% specificity.
Comment: Proc. of International Conference on Signal Processing: Algorithms,
Architectures, Arrangements, and Applications (SPA 2019)
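The paper's exact vibrato measure is not reproduced here, but as an illustration, one simple way to quantify vibrato-like modulation of a sustained phonation is to measure how much of the F0 contour's modulation energy falls in the typical 4-9 Hz vibrato band (the band edges and frame rate below are assumptions):

```python
import numpy as np

def vibrato_extent(f0, fs=100.0, band=(4.0, 9.0)):
    """Fraction of the F0 contour's modulation energy inside the assumed
    vibrato band. `f0` is an F0 track in Hz sampled at `fs` frames/second."""
    f0 = np.asarray(f0, dtype=float)
    f0 = f0 - f0.mean()                       # remove the carrier (mean F0)
    spec = np.abs(np.fft.rfft(f0)) ** 2       # modulation power spectrum
    freqs = np.fft.rfftfreq(len(f0), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return spec[in_band].sum() / spec[1:].sum()

# a synthetic 5 Hz, +/-3 Hz vibrato on a 120 Hz phonation, tracked at 100 fps
t = np.arange(0, 2, 0.01)
track = 120 + 3 * np.sin(2 * np.pi * 5 * t)
```

A steady phonation with pure 5 Hz modulation puts essentially all modulation energy in the band, so the measure approaches 1; irregular pathological tremor spreads energy elsewhere.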
Cross-lingual dysphonic speech detection using pretrained speaker embeddings
In this study, cross-lingual binary classification and severity estimation of dysphonic speech have been carried out. Hand-crafted acoustic feature extraction is replaced by the speaker embedding techniques used in speaker verification. Two state-of-the-art deep learning methods for speaker verification have been used: the X-vector and ECAPA-TDNN. Embeddings are extracted from speech samples in Hungarian and Dutch and used to train a Support Vector Machine (SVM) and Support Vector Regressor (SVR) for binary classification and severity estimation, in a cross-language manner. When the models were trained on Hungarian samples and evaluated on Dutch samples, our results were competitive with manual feature engineering in the binary classification of dysphonic speech and outperformed it in estimating the severity level; our model achieved Spearman and Pearson correlations of 0.769 and 0.771, respectively. Our results in both classification and regression were also superior to the manual feature extraction technique when models were trained on Dutch samples and evaluated on Hungarian samples, where only a limited number of samples was available for training. An accuracy of 86.8% was reached with features extracted from the embedding methods, while the maximum accuracy using hand-crafted acoustic features was 66.8%. Overall, the results show that Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) performs better than the earlier X-vector in both tasks.
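The severity-estimation half of this setup (SVR over fixed-length embeddings, evaluated with Spearman correlation) can be sketched as follows. The embeddings here are random stand-ins, the 192-dimensional size simply mirrors a common ECAPA-TDNN embedding width, and the linear kernel is an assumption:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import SVR

rng = np.random.default_rng(3)
# stand-ins for fixed-length speaker embeddings and clinician severity scores;
# in the study these would come from X-vector or ECAPA-TDNN extractors
X = rng.normal(size=(120, 192))
severity = X[:, 0] * 2.0 + rng.normal(scale=0.3, size=120)

reg = SVR(kernel="linear").fit(X, severity)
rho, _ = spearmanr(severity, reg.predict(X))
```

In the cross-lingual setting, `fit` would use one language's samples and the Spearman/Pearson correlations would be computed on the other language's held-out samples.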
Speech outcome in tongue cancer surgery: objective evaluation by acoustic analysis software
BACKGROUND. Cancer of the oral cavity is one of the most common malignancies, and 60% of cases affect the tongue.
Carcinoma of the tongue causes significant alterations of the articulatory and swallowing functions. The gold standard
of care remains primary surgical resection, with or without postoperative adjuvant therapy. Whereas T1 and T2 tongue
tumors can be treated with more conservative surgeries, such as partial glossectomies, larger tumors require total,
aggressive glossectomies that increase survival but often make speech, chewing and swallowing impossible.
MATERIAL AND METHODS. Our study was performed on a total of 21 patients with squamous cell carcinoma of the
tongue who underwent either partial resection or hemiglossectomy. Each subject (surgical patient or control) was asked
to pronounce the vowels /a/, /e/, /i/, /u/, and all signals were evaluated separately by two operators. Acoustic features
(F0, jitter, shimmer, NHR) and vowel-metric features (the F2i/F2u ratio, tVSA, qVSA, FCR) were extracted. To assess
speech intelligibility, all patients were evaluated by two doctors and one speech therapist, and all patients completed
the Speech Handicap Index (SHI), translated into Italian, before recording.
RESULTS. No statistically significant variations were observed, regardless of gender, between controls and surgically
resected patients when tumor staging was T1-T2. By contrast, when patients underwent more extensive surgical
resection due to a T3-T4 tumor, a dramatic increase of F2u was observed. This change, together with a decrease of F2i,
led to a highly significant reduction of the F2i/F2u parameter in surgically resected patients as compared to controls.
The other parameters reduced in a statistically significant manner in T3-T4 surgically resected patients were tVSA and
qVSA, whereas two parameters increased in a statistically significant manner in these patients: FCR and SHI. Again,
none of the above-mentioned parameters was altered in a statistically significant manner in patients resected at an
early tumor stage, regardless of gender.
CONCLUSION. For the first time, we used a series of newly developed formant parameters, introduced by various authors
for the study of articulatory undershoot of the tongue in various neurodegenerative diseases. The statistical analysis
of our results clearly highlighted a strong, significant correlation of each of our parameters (F2i/F2u, FCR, tVSA,
qVSA) with the TNM stage, and therefore with the extent of the surgical resection, and in parallel with the loss of
speech intelligibility, which proportionally reaches higher values in the advanced stages of the disease, as can be
deduced from the SHI trend.
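The formant-derived metrics used in this study can be computed directly from corner-vowel formant frequencies, using the definitions common in the articulation-metrics literature (FCR as a ratio of centralizing to peripheral formants, tVSA as the area of the /a/-/i/-/u/ triangle). The formant values in the example below are illustrative adult-male figures, not data from the study:

```python
def fcr(f1a, f2a, f1i, f2i, f1u, f2u):
    """Formant Centralization Ratio: rises as vowel articulation centralizes.
    Arguments are F1/F2 of the corner vowels /a/, /i/, /u/ in Hz."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def tvsa(f1a, f2a, f1i, f2i, f1u, f2u):
    """Triangular Vowel Space Area of /a/, /i/, /u/ in the (F1, F2) plane,
    via the shoelace formula; shrinks with articulatory undershoot (Hz^2)."""
    return 0.5 * abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a))

# illustrative adult-male formant values in Hz (assumed, not from the study)
example = dict(f1a=750, f2a=1200, f1i=300, f2i=2300, f1u=320, f2u=800)
```

On these assumed values, a T3-T4 resection pattern (F2u up, F2i down) would raise FCR and shrink tVSA, matching the direction of the changes reported in the results.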