Modern automatic dialogue systems can understand complex sentences rather than only a few commands such as "Stop" or "No". In a call center, such a system should be able to determine, in a critical phase of the dialogue, whether the call should be handed over to a human operator. Such a critical phase can be indicated by the customer's vocal expression. Other studies have shown that anger and neutral speech can be distinguished using prosodic features alone. The subjects in these studies, however, were mostly people acting or simulating emotions such as anger. In this paper we use data from a so-called Wizard of Oz (WoZ) scenario to obtain more realistic data instead of simulated anger. As shown below, the classification rate for the two classes "emotion" (class E) and "neutral" (class ¬E) is significantly worse for these more realistic data. Furthermore, the classification results are heavily speaker dependent. Prosody alone might thus not be sufficient and has to be supplemented by other knowledge sources such as the detection of repetitions, reformulations, swear words, and dialogue acts.