10,082 research outputs found

    Reliability of perceptions of voice quality: evidence from a problem asthma clinic population

    Introduction: Methods of perceptual voice evaluation have yet to achieve satisfactory consistency; complete acceptance of a recognised clinical protocol is still some way off.
    Materials and methods: Three speech and language therapists rated the voices of 43 patients attending the problem asthma clinic of a teaching hospital, according to the grade-roughness-breathiness-asthenicity-strain (GRBAS) scale and other perceptual categories.
    Results and analysis: Use of the GRBAS scale achieved only 64.7 per cent inter-rater reliability and 69.6 per cent intra-rater reliability for the grade component. One rater achieved a higher degree of consistency. Improved concordance on the GRBAS scale was observed for subjects with laryngeal abnormalities. Raters failed to reach any useful level of agreement in the other categories employed, except for perceived gender.
    Discussion: These results should sound a note of caution regarding routine adoption of the GRBAS scale for characterising voice quality for clinical purposes. The importance of training and the use of perceptual anchors for reliable perceptual rating need to be further investigated.
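
    As an illustration of the kind of inter-rater statistic reported above, the minimal Python sketch below computes mean pairwise exact-match agreement between raters on an ordinal GRBAS-style scale; the rating arrays, rater names and choice of agreement measure are hypothetical placeholders, not the study's data or analysis.

```python
# A minimal sketch of pairwise exact-match inter-rater agreement on an
# ordinal 0-3 scale; all ratings below are hypothetical placeholders.
import numpy as np
from itertools import combinations

def percent_agreement(ratings_a, ratings_b):
    """Exact-match agreement (%) between two raters on the same voices."""
    a, b = np.asarray(ratings_a), np.asarray(ratings_b)
    return 100.0 * np.mean(a == b)

# Hypothetical grade ('G') ratings from three raters for the same voices.
rater_ratings = {
    "rater1": [0, 1, 2, 2, 3, 1, 0, 2],
    "rater2": [0, 1, 2, 3, 3, 1, 1, 2],
    "rater3": [0, 2, 2, 2, 3, 0, 0, 2],
}

# Average agreement over all rater pairs.
pairs = list(combinations(rater_ratings, 2))
scores = [percent_agreement(rater_ratings[x], rater_ratings[y]) for x, y in pairs]
print(f"mean pairwise inter-rater agreement: {np.mean(scores):.1f}%")
```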

    Voice and speech functions (B310-B340)

    The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) domain ‘voice and speech functions’ (b3) includes production and quality of voice (b310), articulation functions (b320), fluency and rhythm of speech (b330) and alternative vocalizations (b340, such as making musical sounds and crying, which are not reviewed here).

    Aspects of voice irregularity measurement in connected speech

    Applications of connected speech material to the objective assessment of two primary physical aspects of voice quality are described and discussed. Simple auditory perceptual criteria are employed to guide the choice of analysis parameters for the physical correlate of pitch, and their utility is investigated by measuring the characteristics of particular examples of normal-speaking voices. This approach is extended to the measurement of vocal fold contact phase control in connected speech, and both techniques are applied to pathological voice data.
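
    As a rough illustration of one objective physical correlate of pitch irregularity (not the authors' connected-speech method), the sketch below computes local jitter from a sequence of fundamental periods; the period values are hypothetical and would in practice come from a pitch tracker.

```python
# A rough sketch of local jitter: the mean absolute difference between
# consecutive fundamental periods relative to the mean period. The period
# values below are hypothetical placeholders.
import numpy as np

def local_jitter_percent(periods_s):
    """Local jitter (%) from consecutive fundamental periods in seconds."""
    p = np.asarray(periods_s, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0103, 0.0100]  # ~100 Hz voice
print(f"local jitter: {local_jitter_percent(periods):.2f}%")
```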

    Emotion Recognition from Acted and Spontaneous Speech

    This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech; its main contributions are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part is devoted to emotion recognition using multilingual databases of spontaneous emotional speech based on telephone records obtained from real call centers. The knowledge gained from the experiments with acted speech was exploited to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker’s emotional state on gender recognition performance and proposes a system for automatic identification of successful phone calls in call centers by means of dialogue features.
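
    The following is a minimal, generic sketch of score-level fusion of several classifiers, the general idea behind a classification architecture "based on the fusion of different systems"; the feature matrix, labels, classifier choices and the seven-class setup are hypothetical stand-ins, not the systems used in the thesis.

```python
# A generic sketch of soft-voting (score-level) fusion of several classifiers
# for a hypothetical seven-class emotion task; all data are random placeholders.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))        # placeholder acoustic feature vectors
y = rng.integers(0, 7, size=200)      # seven hypothetical emotional states

fusion = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("mlp", MLPClassifier(max_iter=500)),
    ],
    voting="soft",                    # average the per-class probabilities
)
fusion.fit(X, y)
print(fusion.predict(X[:5]))
```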

    Acoustic measurement of overall voice quality in sustained vowels and continuous speech

    Measurement of dysphonia severity involves auditory-perceptual evaluations and acoustic analyses of sound waves. A meta-analysis of proportional associations between these two methods showed that many popular perturbation metrics and noise-to-harmonics and other ratios do not yield reasonable results. However, this meta-analysis demonstrated that the validity of specific autocorrelation- and cepstrum-based measures was much more convincing, and identified ‘smoothed cepstral peak prominence’ as the most promising metric of dysphonia severity. Original research confirmed this inferiority of perturbation measures and superiority of cepstral indices in dysphonia measurement of laryngeal-vocal and tracheoesophageal voice samples. However, to be truly representative of daily voice use patterns, measurement of overall voice quality is ideally founded on the analysis of both sustained vowels and continuous speech. A customized method for including both sample types and calculating the multivariate Acoustic Voice Quality Index (AVQI) was constructed for this purpose. An original study of the AVQI revealed acceptable results in terms of initial concurrent validity, diagnostic precision, internal and external cross-validity and responsiveness to change. It was thus concluded that the AVQI can track changes in dysphonia severity across the voice therapy process. There are many freely and commercially available computer programs and systems for acoustic metrics of dysphonia severity. We investigated agreements and differences between two commonly available programs (i.e., Praat and the Multi-Dimensional Voice Program) and systems. The results indicated that clinicians should not compare frequency perturbation data across systems and programs, or amplitude perturbation data across systems. Finally, acoustic information can also be utilized as a biofeedback modality during voice exercises. Based on a systematic literature review, it was cautiously concluded that acoustic biofeedback can be a valuable tool in the treatment of phonatory disorders. When applied with caution, acoustic algorithms (particularly cepstrum-based measures and the AVQI) merit a special role in the assessment and/or treatment of dysphonia severity.
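
    The sketch below is a simplified illustration of cepstral peak prominence (CPP), the family of measures highlighted above; it is not the validated AVQI or smoothed-CPP implementation, and the regression range, F0 search band and synthetic test frame are assumptions made for the example.

```python
# A simplified sketch of cepstral peak prominence for one voiced frame.
# Real analyses use frame averaging and smoothing (CPPS); this is illustrative.
import numpy as np

def cpp_db(frame, fs, f0_range=(60.0, 300.0)):
    """Rough cepstral peak prominence (dB-like units) of a single frame."""
    windowed = frame * np.hanning(len(frame))
    log_spec = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.real(np.fft.ifft(log_spec))
    half = len(cepstrum) // 2
    q = np.arange(half) / fs                      # quefrency axis in seconds
    c = cepstrum[:half]
    # Fit a linear trend over quefrencies above 1 ms and measure how far the
    # peak in the expected-F0 band rises above it (the essence of CPP).
    trend = q >= 0.001
    slope, intercept = np.polyfit(q[trend], c[trend], 1)
    band = (q >= 1.0 / f0_range[1]) & (q <= 1.0 / f0_range[0])
    peak_idx = np.argmax(c[band])
    peak_q, peak_val = q[band][peak_idx], c[band][peak_idx]
    return peak_val - (slope * peak_q + intercept)

# Synthetic 200 Hz "vowel" with a little noise: one 40 ms frame at 16 kHz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 200.0 * t) + 0.05 * rng.normal(size=t.size)
print(f"CPP ≈ {cpp_db(frame, fs):.1f} dB")
```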

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria, such as the signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively. Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end solution, is evaluated. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.
    Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in the IET Signal Processing journal. Original results unchanged; additional experiments presented, with refined discussion and conclusion.
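
    The sketch below illustrates only the block-online idea described above: the multi-channel signal is split into short blocks, and each block is beamformed independently from statistics estimated within that block. The MVDR formulation, the constant placeholder mask (standing in for the DNN-based VAD) and all sizes are illustrative assumptions, not the paper's system.

```python
# A schematic sketch of block-online MVDR beamforming with an RTF estimated
# per block; the mask and signals are placeholders, not a DNN VAD or real data.
import numpy as np
from scipy.signal import stft, istft

def mvdr_block(X, speech_mask, ref_mic=0):
    """Beamform one block. X: (mics, freqs, frames) STFT; speech_mask: (freqs, frames)."""
    mics, freqs, frames = X.shape
    Y = np.zeros((freqs, frames), dtype=complex)
    for f in range(freqs):
        Xf = X[:, f, :]                                    # (mics, frames)
        w_s = speech_mask[f]                               # speech weights
        w_n = 1.0 - w_s                                    # noise weights
        Phi_s = (Xf * w_s) @ Xf.conj().T / max(w_s.sum(), 1e-6)
        Phi_n = (Xf * w_n) @ Xf.conj().T / max(w_n.sum(), 1e-6)
        Phi_n = Phi_n + 1e-6 * np.eye(mics)                # diagonal loading
        # Relative transfer function from the principal eigenvector of Phi_s.
        _, eigvecs = np.linalg.eigh(Phi_s)
        rtf = eigvecs[:, -1] / eigvecs[ref_mic, -1]
        w = np.linalg.solve(Phi_n, rtf)                    # Phi_n^-1 d
        w = w / (rtf.conj() @ w)                           # MVDR normalization
        Y[f] = w.conj() @ Xf
    return Y

# Block-online usage: process 250 ms blocks of a placeholder 4-mic recording.
fs, block_len = 16000, 4000
x = np.random.default_rng(1).normal(size=(4, 4 * fs))      # placeholder signals
out = []
for start in range(0, x.shape[1] - block_len + 1, block_len):
    block = x[:, start:start + block_len]
    _, _, X = stft(block, fs=fs, nperseg=512)
    mask = np.full(X.shape[1:], 0.5)                        # placeholder VAD mask
    Y = mvdr_block(X, mask)
    _, y = istft(Y, fs=fs, nperseg=512)
    out.append(y[:block_len])
enhanced = np.concatenate(out)
```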

    Cepstral analysis of hypokinetic and ataxic voices : correlations with perceptual and other acoustic measures

    The aim was to investigate the validity of cepstral analyses against other conventional acoustic measures of voice quality in determining the perceptual impression in different motor speech disorders (hypokinetic and ataxic dysarthria) and speech tasks (prolonged vowels and connected speech). Prolonged vowel productions and connected speech samples (reading passages and monologues) from 43 participants with Parkinson disease and 10 speakers with ataxia were analyzed perceptually by a trained listener using the GRBAS scale. In addition, acoustic measures of cepstral peak prominence (CPP), smoothed CPP (CPPs), harmonics-to-noise ratio (HNR), shimmer %, shimmer dB, amplitude perturbation quotient (APQ), relative average perturbation (RAP), jitter and pitch perturbation quotient (PPQ) were obtained. Statistical analysis involved correlations between perceptual and acoustic measures, as well as determination of differences across speaker groups and elicitation tasks. CPP and CPPs showed greater levels of correlation with overall dysphonia, breathiness and asthenia ratings than the other acoustic measures, except in the case of roughness. Sustained vowel production yielded a higher number of significant correlations across all parameters than connected speech did, but task choice did not affect CPP and CPPs results. There were no significant differences in any parameters across the two speaker groups. The results of this study are consistent with those of other studies investigating the same measures in speakers with non-motor-related voice pathologies. In addition, there was an indication that CPP and CPPs performed better in relation to asthenia, which might be particularly relevant for the current speaker group. The results support the clinical and research use of CPP and CPPs as quantitative measures of voice quality in populations with motor speech disorders.
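
    As a minimal illustration of the kind of correlation analysis described above, the sketch below computes a Spearman rank correlation between a hypothetical acoustic measure (CPPs values) and hypothetical GRBAS breathiness ratings; the numbers are placeholders, not study data.

```python
# A minimal sketch of rank correlation between an acoustic measure and a
# perceptual rating across speakers; all values are hypothetical placeholders.
from scipy.stats import spearmanr

cpps_values = [12.1, 9.4, 7.8, 11.2, 6.5, 10.0, 8.3]   # dB, one per speaker
breathiness = [0,    1,   2,   0,    3,   1,    2]      # GRBAS 'B' ratings

rho, p_value = spearmanr(cpps_values, breathiness)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```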

    Development and validation of a comprehensive assessment questionnaire for Cantonese alaryngeal speakers' speech performance

    The study devised and validated a perceptual assessment questionnaire for evaluating the speech performance of Cantonese alaryngeal speakers. Forty-eight male speakers participated in the study: 10 electrolaryngeal, 10 esophageal, 9 tracheoesophageal, 9 pneumatic artificial larynx and 10 normal laryngeal speakers. Five speech therapists also participated in the perceptual rating procedures. Results indicated moderate to strong inter-rater reliability for all parameters that involve only auditory judgment, except the rating of electrolarynx noise. Assessment parameters that require both auditory and visual judgment might require further modification. For tone perception, moderate to strong inter-rater reliability was also noted. High intra-rater reliability of the assessment questionnaire was also found. In addition, the parameters adopted were reported to correlate significantly with their acoustic correlates, except for pitch rating. The proposed assessment questionnaire appeared to be valid for evaluating the auditory-dependent speech characteristics of the four types of alaryngeal speech.