658 research outputs found

    Cross-linguistic study of vocal pathology: perceptual features of spasmodic dysphonia in French-speaking subjects

    Clinical characterisation of Spasmodic Dysphonia of the adductor type (SD) in French speakers by Klap and colleagues (1993) appears to differ from that of SD in English. This perceptual analysis aims to describe the phonetic features of French SD. A video of 6 French speakers with SD supplied by Klap and colleagues was analysed for frequency of phonatory breaks, pitch breaks, harshness, creak, breathiness and falsetto voice, rate of production, and quantity of speech output. In contrast to English SD, the French-speaking SD patients demonstrated no evidence of pitch breaks, but phonatory breaks, harshness and breathiness were prominent features. This verifies the French authors’ (1993) clinical description. These findings suggest that phonetic properties of a specific language may affect the manifestation of pathology in neurogenic voice disorders.

    Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease

    Vocal performance degradation is a common symptom for the vast majority of Parkinson's disease (PD) subjects, who typically follow personalized one-to-one periodic rehabilitation meetings with speech experts over a long-term period. Recently, a novel computer program called Lee Silverman voice treatment (LSVT) Companion was developed to allow PD subjects to independently progress through a rehabilitative treatment session. This study is part of the assessment of the LSVT Companion, aiming to investigate the potential of using sustained vowel phonations towards objectively and automatically replicating the speech experts' assessments of PD subjects' voices as “acceptable” (a clinician would allow persisting during in-person rehabilitation treatment) or “unacceptable” (a clinician would not allow persisting during in-person rehabilitation treatment). We characterize each of the 156 sustained vowel /a/ phonations with 309 dysphonia measures, select a parsimonious subset using a robust feature selection algorithm, and automatically distinguish the two cohorts (acceptable versus unacceptable) with about 90% overall accuracy. Moreover, we illustrate the potential of the proposed methodology as a probabilistic decision support tool for speech experts assessing a phonation as “acceptable” or “unacceptable.” We envisage the findings of this study being a first step towards improving the effectiveness of an automated rehabilitative speech assessment tool.

    Voice and speech functions (B310-B340)

    The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) domain ‘voice and speech functions’ (b3) includes production and quality of voice (b310), articulation functions (b320), fluency and rhythm of speech (b330) and alternative vocalizations (b340, such as making musical sounds and crying, which are not reviewed here).

    Characterization of the Pathological Voices (Dysphonia) in the frequency space

    This paper addresses dysphonic voice assessment. It aims to study the characteristics of dysphonia in the frequency domain. In this context, a GMM-based automatic classification system is coupled to a frequency subband architecture in order to investigate which frequency bands are relevant for dysphonia characterization. Across various experiments, the low frequencies ([0-3000] Hz) tend to be more useful for dysphonia discrimination than higher frequencies.
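
As a rough illustration of the band-splitting idea only (not the GMM classifier described in the abstract), the following sketch measures how much spectral energy of one frame falls into each frequency subband. The function name, the test tone, and the band edges other than the 3000 Hz boundary are assumptions made for this example.

```python
import numpy as np

def subband_energies(frame, fs, band_edges_hz):
    """Return the spectral energy of one frame in each [lo, hi) band."""
    # Hann window reduces leakage between neighbouring bands.
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    energies = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        energies.append(float(power[in_band].sum()))
    return energies

# Hypothetical test frame: a strong 500 Hz tone plus a weak 5000 Hz tone.
fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 5000 * t)

# Two bands: 0-3000 Hz (the range the abstract highlights) and 3000-8000 Hz.
low, high = subband_energies(frame, fs, [0, 3000, 8000])
print(f"low-band energy: {low:.1f}, high-band energy: {high:.1f}")
```

In a subband classification setup, per-band features like these would be computed for every frame and passed to a separate model per band, so that the bands can be compared by how well each one discriminates.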

    Effect of voice training and voice therapy : content and dosage


    Master of Science

    This study investigated the relationship between a cepstral/spectral index of dysphonia severity (i.e., the CSID) and listener severity ratings of disordered voices. To assess the value of the CSID as a potential objective treatment outcomes tool, pre- and posttreatment samples of continuous speech and sustained vowel /a/ productions were elicited from 112 patients (with varying degrees of dysphonia) from six diagnostic categories: (1) unilateral vocal fold paralysis (UVFP), (2) adductor spasmodic dysphonia (ADSD), (3) primary muscle tension dysphonia (PMTD), (4) benign vocal fold lesions (BVFL), (5) presbylaryngis, and (6) mutational falsetto. Perceptual ratings of dysphonia severity in continuous speech were compared to acoustically-derived severity estimates using a three-factor CSID model consisting of the cepstral peak prominence (CPP), the ratio of low-to-high spectral energy, and its standard deviation. A five-factor CSID model incorporating all acoustic variables as well as gender and the CPP standard deviation was used to estimate severity in sustained vowel samples. Results showed strong relationships between perceptual and acoustic estimates of dysphonia severity in connected speech (r = 0.72, p < 0.0001) and sustained vowels (r = 0.836, p < 0.0001). A strong relationship between the perceived and predicted change in dysphonia severity from pre- to posttreatment was also observed for connected speech (r = 0.77, p < 0.001) and sustained vowels (r = 0.81, p < 0.0001). Spectrum effects were also examined, and overall severity (mild, moderate, or severe) did not influence the relationship between perceived and estimated severity ratings in connected speech (F[1, 2] = 0.58, p = 0.56); however, dysphonia severity did influence the relationship in sustained vowels (F[1, 2] = 6.22, p = 0.002).
In general, the results confirm a robust relationship between listener-perceived and acoustically-derived estimates of severity within the contexts of connected speech and sustained vowels across diverse diagnostic categories and varying degrees of dysphonia severity. As such, the CSID shows considerable promise as an objective treatment outcomes measure.
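
A multi-factor index of this kind can be pictured as a linear model that maps acoustic predictors to a severity score, with the fit judged by the correlation between predicted and perceived severity. The sketch below is a toy illustration under loud assumptions: all data are synthetic, the generating weights are invented for the example, and none of the numbers correspond to the published CSID coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200

# Synthetic stand-ins for the three predictors (NOT real patient data).
cpp = rng.normal(6.0, 2.0, n_samples)        # cepstral peak prominence
lh_ratio = rng.normal(30.0, 5.0, n_samples)  # low/high spectral energy ratio
lh_sd = rng.normal(8.0, 2.0, n_samples)      # standard deviation of that ratio

# Hypothetical rule used only to create toy "listener ratings".
ratings = (80.0 - 6.0 * cpp - 0.5 * lh_ratio + 1.0 * lh_sd
           + rng.normal(0.0, 3.0, n_samples))

# Ordinary least squares: intercept plus one weight per predictor.
design = np.column_stack([np.ones(n_samples), cpp, lh_ratio, lh_sd])
weights, *_ = np.linalg.lstsq(design, ratings, rcond=None)
predicted = design @ weights

# Pearson correlation between "perceived" and acoustically predicted severity.
r = float(np.corrcoef(ratings, predicted)[0, 1])
print(f"fitted weights: {np.round(weights, 2)}, r = {r:.3f}")
```

The correlation printed here plays the same role as the r values reported in the abstract, but its size reflects only the noise level chosen for the toy data.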

    Voice pathologies : the most comum features and classification tools

    Speech pathologies are quite common in society; however, the existing exams are invasive, making them uncomfortable for patients and dependent on the experience of the clinician who performs the assessment. Hence the need to develop non-invasive methods that allow objective and efficient analysis. Taking this need into account, this work identifies the most promising features and classifiers. The features identified were jitter, shimmer, HNR, LPC, PLP, and MFCC, and the classifiers were CNN, RNN and LSTM. This study intends to develop a device to support medical decision-making; this article already presents the system interface.
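
Two of the listed features, jitter and shimmer, have simple "local" definitions: the mean absolute difference between consecutive glottal periods (jitter) or cycle peak amplitudes (shimmer), divided by the mean value. The sketch below assumes the periods and amplitudes have already been extracted; the helper name and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_abs_diff / mean_value

# Hypothetical per-cycle measurements from a sustained vowel.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]        # glottal cycle durations
amplitudes = [1.00, 0.95, 1.02, 0.97, 1.01]   # peak amplitude of each cycle

jitter_local = local_perturbation(periods_ms)   # period perturbation
shimmer_local = local_perturbation(amplitudes)  # amplitude perturbation
print(f"jitter: {100 * jitter_local:.2f} %, shimmer: {100 * shimmer_local:.2f} %")
```

A perfectly regular voice gives zero for both measures, so larger values indicate more cycle-to-cycle irregularity.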

    Cross-lingual dysphonic speech detection using pretrained speaker embeddings

    In this study, cross-lingual binary classification and severity estimation of dysphonic speech have been carried out. Hand-crafted acoustic feature extraction is replaced by the speaker embedding techniques used in speaker verification. Two state-of-the-art deep learning methods for speaker verification have been used: the X-vector and ECAPA-TDNN. Embeddings are extracted from speech samples in Hungarian and Dutch and used to train a Support Vector Machine (SVM) and a Support Vector Regressor (SVR) for binary classification and severity estimation, in a cross-language manner. When the models were trained on Hungarian samples and evaluated on Dutch samples, our results were competitive with manual feature engineering in the binary classification of dysphonic speech and outperformed it in estimating the severity level of dysphonic speech. Moreover, our model achieved Spearman and Pearson correlations of 0.769 and 0.771, respectively. Our results in both classification and regression were also superior to the manual feature extraction technique when models were trained on Dutch samples and evaluated on Hungarian samples, with only a limited number of samples available for training. An accuracy of 86.8% was reached with features extracted from embedding methods, while the maximum accuracy using hand-crafted acoustic features was 66.8%. Overall, the results show that Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) performs better than the earlier X-vector in both tasks.
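
The two correlation measures used to score severity estimation differ in what they reward: Pearson measures linear agreement, while Spearman is Pearson applied to ranks and so only measures monotonic agreement. A minimal NumPy sketch follows; the function names and the toy severity values are assumptions for illustration, not data from the study.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical clinician severity grades and model estimates.
true_severity = [0, 1, 1, 2, 3, 3]
estimated = [0.2, 0.9, 1.4, 1.8, 2.5, 3.1]
print(f"Pearson: {pearson(true_severity, estimated):.3f}")
print(f"Spearman: {spearman(true_severity, estimated):.3f}")

# A monotonic but non-linear relation: perfect Spearman, imperfect Pearson.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3
```

Reporting both, as the abstract does, shows whether a model merely orders the samples correctly or also tracks the severity scale linearly.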

    Cepstral peak prominence: a comprehensive analysis

    An analytical study of cepstral peak prominence (CPP) is presented, intended to provide insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have a Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, whether in amplitude, frequency or noise.
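
For readers unfamiliar with the measure, CPP is commonly computed as the height of the dominant cepstral peak above a regression line fitted to the cepstrum. The sketch below is one simplified single-frame variant, not the parametric model analysed in the paper: the function name, the 70-400 Hz search range, the choice to fit the line over that same range, and the synthetic test signals are all assumptions.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, f0_min=70.0, f0_max=400.0):
    """Return (CPP, peak quefrency in seconds) for one frame."""
    windowed = frame * np.hanning(len(frame))
    # Log-magnitude spectrum in dB; the small offset avoids log(0).
    log_spectrum_db = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.real(np.fft.ifft(log_spectrum_db))
    quefrency = np.arange(len(frame)) / fs
    # Only look at quefrencies that correspond to plausible pitch periods.
    search = (quefrency >= 1.0 / f0_max) & (quefrency <= 1.0 / f0_min)
    q, c = quefrency[search], cepstrum[search]
    peak = int(np.argmax(c))
    # Straight-line trend of the cepstrum over the search range.
    slope, intercept = np.polyfit(q, c, 1)
    cpp = c[peak] - (slope * q[peak] + intercept)
    return float(cpp), float(q[peak])

fs, n, f0 = 16000, 4096, 125.0
t = np.arange(n) / fs
rng = np.random.default_rng(0)

# Synthetic "voiced" frame: 40 harmonics of 125 Hz plus a little noise.
voiced = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 41))
voiced = voiced + 0.01 * rng.standard_normal(n)
# Synthetic aperiodic frame: white noise only.
noise = rng.standard_normal(n)

cpp_voiced, q_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise, _ = cepstral_peak_prominence(noise, fs)
print(f"voiced: CPP = {cpp_voiced:.2f}, peak at {1000 * q_voiced:.2f} ms")
print(f"noise:  CPP = {cpp_noise:.2f}")
```

The periodic frame produces a clear peak at the quefrency of its pitch period, while the noise frame does not, which is why CPP tends to fall as a voice becomes more perturbed or noisy.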