Automatic Rating of Hoarseness by Text-based Cepstral and Prosodic Evaluation

A. Batliner; A. Smola; B. Halberstam; C. Moers; I. Witten; J. Hillenbrand; J. Kreiman; M. Hirano; S. Awan; T. Haderlein; T. Nawka; V. Parsa; V. Wolfe; Y. Heman-Ackah; Y. Maryn

Publication date: 1 January 2012
Publisher: 'Springer Science and Business Media LLC'

Abstract

The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually performed on stable sections of sustained vowels. In this paper, text-based analysis and established vowel-based analysis are compared with respect to their ability to measure hoarseness and its subclasses. Seventy-three hoarse patients (aged 48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated roughness, breathiness, and hoarseness according to the German RBH evaluation scheme. The best human-machine correlations were obtained for measures based on the Cepstral Peak Prominence (CPP; up to |r| = 0.73). Support Vector Regression (SVR) on CPP-based measures and prosodic features improved the results further to r ≈ 0.8 and confirmed that automatic voice evaluation should be performed on a text recording.
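As a rough illustration of the kind of measure the abstract refers to, the following sketch computes a textbook-style Cepstral Peak Prominence for a single frame: the height of the cepstral peak in the expected pitch range above a regression line fitted to the cepstrum. It is not the authors' implementation. The function names, the 60 to 300 Hz search range, the 1 ms regression start, and the single-frame, unsmoothed setup are all assumptions made for this example, and it uses only the Python standard library.

```python
import cmath
import math
import random


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=300.0):
    """Approximate CPP in dB for one frame (length must be a power of two)."""
    n = len(frame)
    # Hann window to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Log-magnitude spectrum in dB (the small floor avoids log of zero).
    log_spec = [20 * math.log10(abs(c) + 1e-12) for c in fft(windowed)]
    # Cepstrum: magnitude of the spectrum of the log spectrum, again in dB.
    ceps_db = [20 * math.log10(abs(c) + 1e-12) for c in fft(log_spec)]

    # Search for the peak where the pitch period is expected (quefrency = 1/F0).
    lo = int(sample_rate / f0_max)
    hi = int(sample_rate / f0_min)
    start = int(0.001 * sample_rate)  # regression starts at 1 ms (assumption)
    end = n // 2
    if hi >= end:
        raise ValueError("frame too short for the requested F0 range")
    peak = max(range(lo, hi + 1), key=lambda i: ceps_db[i])

    # Least-squares line through the cepstrum over the bins [start, end).
    xs = range(start, end)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ceps_db[i] for i in xs) / len(xs)
    slope = (sum((i - mean_x) * (ceps_db[i] - mean_y) for i in xs)
             / sum((i - mean_x) ** 2 for i in xs))
    baseline = mean_y + slope * (peak - mean_x)
    # CPP is the peak height above the regression line at the peak quefrency.
    return ceps_db[peak] - baseline


sample_rate = 16000
n = 2048
# Strictly periodic pulse train at 125 Hz (period of 128 samples).
periodic = [1.0 if i % 128 == 0 else 0.0 for i in range(n)]
# White Gaussian noise as a stand-in for a completely aperiodic signal.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

cpp_periodic = cepstral_peak_prominence(periodic, sample_rate)
cpp_noise = cepstral_peak_prominence(noise, sample_rate)
print(cpp_periodic > cpp_noise)
```

The two synthetic signals are only a sanity check: a strictly periodic signal should give a far more prominent cepstral peak than noise, which is the intuition behind using CPP as an indicator of voice quality. Real recordings would need frame-wise processing and averaging, which this sketch leaves out.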

Available Versions

MPG.PuRe

oai:pure.mpg.de:item_1539243

Last time updated on 15/06/2019

Crossref

info:doi/10.1007%2F978-3-642-3...

Last time updated on 01/04/2019