10,082 research outputs found
Reliability of perceptions of voice quality: evidence from a problem asthma clinic population
<p>Introduction: Methods of perceptual voice evaluation have yet to achieve satisfactory consistency; complete acceptance of a recognised clinical protocol is still some way off.</p>
<p>Materials and methods: Three speech and language therapists rated the voices of 43 patients attending the problem asthma clinic of a teaching hospital, according to the grade-roughness-breathiness-asthenicity-strain (GRBAS) scale and other perceptual categories.</p>
<p>Results and analysis: Use of the GRBAS scale achieved only a 64.7 per cent inter-rater reliability and a 69.6 per cent intra-rater reliability for the grade component. One rater achieved a higher degree of consistency. Improved concordance on the GRBAS scale was observed for subjects with laryngeal abnormalities. Raters failed to reach any useful level of agreement in the other categories employed, except for perceived gender.</p>
<p>Discussion: These results should sound a note of caution regarding routine adoption of the GRBAS scale for characterising voice quality for clinical purposes. The importance of training and the use of perceptual anchors for reliable perceptual rating need to be further investigated.</p>
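The reliability figures above are simple percent-agreement statistics. As a rough illustration of how mean pairwise inter-rater agreement of this kind can be computed (the scores below are hypothetical, not the study's data):

```python
from itertools import combinations

def pairwise_percent_agreement(ratings):
    """Mean pairwise exact-agreement (%) across raters.

    `ratings` is a list of per-rater score lists (e.g. GRBAS grade
    scores 0-3), one score per subject.
    """
    pairs = list(combinations(range(len(ratings)), 2))
    total = 0.0
    for a, b in pairs:
        matches = sum(x == y for x, y in zip(ratings[a], ratings[b]))
        total += 100.0 * matches / len(ratings[a])
    return total / len(pairs)

# Three hypothetical raters scoring five voices on the G (grade) component
raters = [
    [1, 2, 0, 3, 2],
    [1, 2, 1, 3, 2],
    [2, 2, 0, 3, 1],
]
print(round(pairwise_percent_agreement(raters), 1))  # prints 60.0
```

Exact percent agreement is the crudest option; chance-corrected statistics such as Cohen's or Fleiss' kappa are usually preferred for ordinal scales like GRBAS.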
Voice and speech functions (B310-B340)
The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) domain "voice and speech functions" (b3) includes production and quality of voice (b310), articulation functions (b320), fluency and rhythm of speech (b330) and alternative vocalizations (b340, such as making musical sounds and crying, which are not reviewed here).
Aspects of voice irregularity measurement in connected speech
Applications of the use of connected speech material for the objective assessment of two primary physical aspects of voice quality are described and discussed. Simple auditory perceptual criteria are employed to guide the choice of analysis parameters for the physical correlate of pitch, and their utility is investigated by the measurement of the characteristics of particular examples of the normal-speaking voice. This approach is extended to the measurement of vocal fold contact phase control in connected speech and both techniques are applied to pathological voice data
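Irregularity measures of the kind discussed here start from a sequence of estimated glottal period durations. A minimal sketch of one common definition, local jitter (the period values are hypothetical):

```python
def local_jitter(periods_ms):
    """Mean absolute difference between consecutive glottal periods,
    divided by the mean period (a standard 'local jitter' definition).
    Input is a sequence of period durations in milliseconds."""
    diffs = [abs(a - b) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return mean_diff / mean_period

periods = [8.0, 8.1, 7.9, 8.2, 8.0]  # roughly a 125 Hz voice
print(f"{local_jitter(periods):.4f}")
```

Extracting reliable period estimates from connected speech, rather than sustained vowels, is the hard part that the work above addresses; the arithmetic itself is simple.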
Emotion Recognition from Acted and Spontaneous Speech
This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts; the first part describes proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as "emotion coupling", and a new method for mapping discrete emotions into a two-dimensional space.
The second part of the thesis is devoted to emotion recognition using a database of spontaneous emotional speech based on telephone records obtained from real call centers. The knowledge gained from the experiments on emotion recognition from acted speech was exploited to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for automatic identification of successful phone calls in call centers by means of features of the dialogue between the call participants.
Acoustic measurement of overall voice quality in sustained vowels and continuous speech
Measurement of dysphonia severity involves auditory-perceptual evaluations and acoustic analyses of sound waves. A meta-analysis of proportional associations between these two methods showed that many popular perturbation metrics and noise-to-harmonics and other ratios do not yield reasonable results. However, this meta-analysis demonstrated that the validity of specific autocorrelation- and cepstrum-based measures was much more convincing, and identified "smoothed cepstral peak prominence" as the most promising metric of dysphonia severity. Original research confirmed this inferiority of perturbation measures and superiority of cepstral indices in dysphonia measurement of laryngeal-vocal and tracheoesophageal voice samples. However, to be truly representative of daily voice use patterns, measurement of overall voice quality is ideally founded on the analysis of both sustained vowels and continuous speech. A customized method for including both sample types and calculating the multivariate Acoustic Voice Quality Index (AVQI) was constructed for this purpose. An original study of the AVQI revealed acceptable results in terms of initial concurrent validity, diagnostic precision, internal and external cross-validity and responsiveness to change. It was thus concluded that the AVQI can track changes in dysphonia severity across the voice therapy process.
There are many freely and commercially available computer programs and systems for acoustic metrics of dysphonia severity. We investigated agreements and differences between two commonly available programs (Praat and the Multi-Dimensional Voice Program) and systems. The results indicated that clinicians should not compare frequency perturbation data across systems and programs, or amplitude perturbation data across systems.
Finally, acoustic information can also be utilized as a biofeedback modality during voice exercises. Based on a systematic literature review, it was cautiously concluded that acoustic biofeedback can be a valuable tool in the treatment of phonatory disorders.
When applied with caution, acoustic algorithms (particularly cepstrum-based measures and the AVQI) have merited a special role in the assessment and/or treatment of dysphonia severity.
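As a rough illustration of the cepstral idea behind CPP, the sketch below computes a simplified cepstral peak prominence: the height of the cepstral peak in the expected pitch-period range above a regression line fitted over that range. It deliberately omits the windowing, smoothing and time-averaging of clinical implementations (including smoothed CPP and the AVQI), so it is an assumption-laden toy, not the published algorithm:

```python
import numpy as np

def cpp(x, fs, fmin=60.0, fmax=330.0):
    """Toy cepstral peak prominence: cepstral peak in the expected
    pitch-period range, measured above a linear regression fitted
    over that same quefrency range."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    ceps = np.fft.ifft(log_mag).real
    quef = np.arange(len(x)) / fs                # quefrency in seconds
    lo, hi = int(fs / fmax), int(fs / fmin)      # 60-330 Hz pitch range
    peak = lo + int(np.argmax(ceps[lo:hi]))
    slope, intercept = np.polyfit(quef[lo:hi], ceps[lo:hi], 1)
    return ceps[peak] - (slope * quef[peak] + intercept)

fs = 8000
t = np.arange(4000) / fs
# Strongly periodic "voiced" signal (200 Hz fundamental, 10 harmonics)
voiced = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 11))
noise = np.random.default_rng(0).standard_normal(4000)
# A periodic signal should show a far more prominent cepstral peak
# than white noise:
print(cpp(voiced, fs) > cpp(noise, fs))
```

The intuition is that a harmonic-rich voice produces regular ripple in the log-magnitude spectrum, which concentrates into a sharp cepstral peak at the pitch period; breathier, noisier voices flatten that peak.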
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large datasets of CHiME-4 and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for the block length of 250
ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in the IET Signal Processing journal; original results unchanged, additional experiments presented, refined discussion and conclusion.
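The block-online regime described above can be sketched as follows; `process_block_online` and its `enhance` callback are hypothetical names, with `enhance` standing in for the beamformer-plus-post-filter chain (here an identity pass-through). The point is only that each block is handled independently, so all statistics are estimated from that block alone:

```python
import numpy as np

def process_block_online(x, fs, block_ms=250.0, enhance=lambda b: b):
    """Split a multi-channel signal into fixed-length blocks and run an
    enhancement callback on each block independently, then concatenate
    the enhanced blocks. `x` has shape (channels, samples)."""
    block_len = int(fs * block_ms / 1000.0)
    out = []
    for start in range(0, x.shape[-1], block_len):
        block = x[..., start:start + block_len]
        out.append(enhance(block))  # statistics come from this block only
    return np.concatenate(out, axis=-1)

fs = 16000
x = np.random.default_rng(1).standard_normal((4, fs))  # 4 mics, 1 second
y = process_block_online(x, fs)  # identity enhancer: output equals input
```

With a real enhancer, shorter blocks mean noisier covariance and relative-transfer-function estimates, which is exactly the trade-off against batch processing that the abstract studies.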
Cepstral analysis of hypokinetic and ataxic voices: correlations with perceptual and other acoustic measures
To investigate the validity of cepstral analyses against other conventional acoustic measures of voice quality in determining the perceptual impression in different motor speech disorders (hypokinetic and ataxic dysarthria) and speech tasks (prolonged vowels and connected speech). Prolonged vowel productions and connected speech samples (reading passages and monologues) from 43 participants with Parkinson disease and 10 speakers with ataxia were analyzed perceptually by a trained listener using GRBAS. In addition, acoustic measures of cepstral peak prominence (CPP), smoothed CPP (CPPs), harmonics-to-noise ratio (HNR), shimmer %, shimmer dB, amplitude perturbation quotient (APQ), relative average perturbation (RAP), jitter, and pitch perturbation quotient (PPQ) were performed. Statistical analysis involved correlations between perceptual and acoustic measures, as well as determination of differences across speaker groups and elicitation tasks. CPP and CPPs results showed greater levels of correlation with overall dysphonia, breathiness, and asthenia ratings than the other acoustic measures did, except in the case of roughness. Sustained vowel production yielded a higher number of significant correlations across all parameters than connected speech did, but task choice did not affect CPP and CPPs results. There were no significant differences in any parameters across the two speaker groups. The results of this study are consistent with the results of other studies investigating the same measures in speakers with nonmotor-related voice pathologies. In addition, there was an indication that the cepstral measures performed better in relation to asthenia, which might be particularly relevant for the current speaker group. The results support the clinical and research use of CPP and CPPs as quantitative measures of voice quality in populations with motor speech disorders.
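Correlations between ordinal GRBAS ratings and continuous acoustic measures of this kind are typically rank-based. A minimal Spearman correlation sketch with hypothetical ratings and CPP values (no tie correction, unlike standard statistical implementations):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation between two equal-length series,
    using simple ordinal ranks (no averaging of tied ranks)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, 1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n**2 - 1))

breathiness = [0, 1, 1, 2, 3, 3]              # hypothetical GRBAS B scores
cpps_db = [14.2, 12.8, 12.1, 10.5, 9.0, 8.3]  # hypothetical smoothed CPP, dB
print(round(spearman_rho(breathiness, cpps_db), 2))  # prints -1.0
```

The negative sign matches the expected direction: more severe perceived breathiness goes with a lower cepstral peak.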
Development and validation of a comprehensive assessment questionnaire for Cantonese alaryngeal speakers' speech performance
The study devised and validated a perceptual assessment questionnaire for evaluating the speech performance of Cantonese alaryngeal speakers. Forty-eight male speakers participated in the study: 10 electrolaryngeal, 10 esophageal, 9 tracheoesophageal, 9 pneumatic artificial and 10 normal laryngeal speakers. Five speech therapists also participated in the perceptual rating procedures. Results indicated moderate to strong inter-rater reliability in all parameters that involve only auditory judgment, except that of rating electrolarynx noise. Assessment parameters that require both auditory and visual judgment might require further modification. For tone perception, moderate to strong inter-rater reliability was also noted. High intra-rater reliability of the assessment questionnaire was also found. In addition, the parameters adopted were reported to have significant correlation with the acoustic correlates, except that for pitch rating. The proposed assessment questionnaire appeared to be valid for evaluating auditory-dependent speech characteristics of the four types of alaryngeal speech.
- …