20 research outputs found
Using system and user performance features to improve emotion detection in spoken tutoring dialogs
In this study, we incorporate automatically obtained system/user performance features into machine learning experiments to detect student emotion in computer tutoring dialogs. Our results show a relative improvement of 2.7% on classification accuracy and 8.08% on Kappa over using standard lexical, prosodie, sequential, and identification features. This level of improvement is comparable to the performance improvement shown in previous studies by applying dialog acts or lexical/prosodic-/discourse- level contextual features
Detecting Emotion in Speech: Experiments in Three Domains
Abstract The goal of my proposed dissertation work is to help answer two fundamental questions: (1) How is emotion communicated in speech? and (2) Does emotion modeling improve spoken dialogue applications? In this paper I describe feature extraction and emotion classification experiments I have conducted and plan to conduct on three different domains: EPSaT, HMIHY, and ITSpoke. In addition, I plan to implement emotion modeling capabilities into ITSpoke and evaluate the effectiveness of doing so
Recognizing Uncertainty in Speech
We address the problem of inferring a speaker's level of certainty based on
prosodic information in the speech signal, which has application in
speech-based dialogue systems. We show that using phrase-level prosodic
features centered around the phrases causing uncertainty, in addition to
utterance-level prosodic features, improves our model's level of certainty
classification. In addition, our models can be used to predict which phrase a
person is uncertain about. These results rely on a novel method for eliciting
utterances of varying levels of certainty that allows us to compare the utility
of contextually-based feature sets. We elicit level of certainty ratings from
both the speakers themselves and a panel of listeners, finding that there is
often a mismatch between speakers' internal states and their perceived states,
and highlighting the importance of this distinction.Comment: 11 page
Recommended from our members
Detecting Emotion in Speech: Experiments in Three Domains
The goal of my proposed dissertation work is to help answer two fundamental questions: (1) How is emotion communicated in speech? and (2) Does emotion modeling improve spoken dialogue applications? In this paper I describe feature extraction and emotion classification experiments I have conducted and plan to conduct on three different domains: EPSaT, HMIHY, and ITSpoke. In addition, I plan to implement emotion modeling capabilities into ITSpoke and evaluate the effectiveness of doing so
Recommended from our members
Eliciting and annotating uncertainty in spoken language
A major challenge in the ļ¬eld of automatic recognition of emotion and affect in speech is the subjective nature of affect labels. The most common approach to acquiring affect labels is to ask a panel of listeners to rate a corpus of spoken utterances along one or more dimensions of interest. For applications ranging from educational technology to voice search to dictation, a speakerās level of certainty is a primary dimension of interest. In such applications, we would like to know the speakerās actual level of certainty, but past research has only revealed listenersā perception of the speakerās level of certainty. In this paper, we present a method for eliciting spoken utterances using stimuli that we design such that they have a quantitative, crowdsourced legibility score. While we cannot control a speakerās actual internal level of certainty, the use of these stimuli provides a better estimate of internal certainty compared to existing speech corpora. The Harvard Uncertainty Speech Corpus, containing speech data, certainty annotations, and prosodic features, is made available to the research community.Engineering and Applied Science
Recommended from our members
Computational Approaches to Modeling Speaker State in the Medical Domain
Recently, researchers in computer science and engineering have begun to explore the possibility of finding speech-based correlates of various medical conditions using automatic, computational methods. If such language cues can be identified and quantified automatically, this information can be used to support diagnosis and treatment of medical conditions in clinical settings and to further fundamental research in understanding cognition. This chapter reviews computational approaches that explore communicative patterns of patients who suffer from medical conditions such as depression, autism spectrum disorders, schizophrenia, and cancer. There are two main approaches discussed: research that explores features extracted from the acoustic signal and research that focuses on lexical and semantic features. We also present some applied research that uses computational methods to develop assistive technologies. In the final sections we discuss issues related to and the future of this emerging field of research
When to Say What and How: Adapting the Elaborateness and Indirectness of Spoken Dialogue Systems
With the aim of designing a spoken dialogue system which has the ability to adapt to the user's communication idiosyncrasies, we investigate whether it is possible to carry over insights from the usage of communication styles in human-human interaction to human-computer interaction. In an extensive literature review, it is demonstrated that communication styles play an important role in human communication. Using a multi-lingual data set, we show that there is a significant correlation between the communication style of the system and the preceding communication style of the user. This is why two components that extend the standard architecture of spoken dialogue systems are presented: 1) a communication style classifier that automatically identifies the user communication style and 2) a communication style selection module that selects an appropriate system communication style. We consider the communication styles elaborateness and indirectness as it has been shown that they influence the user's satisfaction and the user's perception of a dialogue. We present a neural classification approach based on supervised learning for each task. Neural networks are trained and evaluated with features that can be automatically derived during an ongoing interaction in every spoken dialogue system. It is shown that both components yield solid results and outperform the baseline in form of a majority-class classifier