7,300 research outputs found
Recognizing Uncertainty in Speech
We address the problem of inferring a speaker's level of certainty based on
prosodic information in the speech signal, which has application in
speech-based dialogue systems. We show that using phrase-level prosodic
features centered around the phrases causing uncertainty, in addition to
utterance-level prosodic features, improves our model's level of certainty
classification. In addition, our models can be used to predict which phrase a
person is uncertain about. These results rely on a novel method for eliciting
utterances of varying levels of certainty that allows us to compare the utility
of contextually-based feature sets. We elicit level of certainty ratings from
both the speakers themselves and a panel of listeners, finding that there is
often a mismatch between speakers' internal states and their perceived states,
and highlighting the importance of this distinction.Comment: 11 page
When Does Disengagement Correlate with Performance in Spoken Dialog Computer Tutoring?
In this paper we investigate how student disengagement relates to two performance metrics in a spoken dialog computer tutoring corpus, both when disengagement is measured through manual annotation by a trained human judge, and also when disengagement is measured through automatic annotation by the system based on a machine learning model. First, we investigate whether manually labeled overall disengagement and six different disengagement types are predictive of learning and user satisfaction in the corpus. Our results show that although students’ percentage of overall disengaged turns negatively correlates both with the amount they learn and their user satisfaction, the individual types of disengagement correlate differently: some negatively correlate with learning and user satisfaction, while others don’t correlate with eithermetric at all. Moreover, these relationships change somewhat depending on student prerequisite knowledge level. Furthermore, using multiple disengagement types to predict learning improves predictive power. Overall, these manual label-based results suggest that although adapting to disengagement should improve both student learning and user satisfaction in computer tutoring, maximizing performance requires the system to detect and respond differently based on disengagement type. Next, we present an approach to automatically detecting and responding to user disengagement types based on their differing correlations with correctness. Investigation of ourmachine learningmodel of user disengagement shows that its automatic labels negatively correlate with both performance metrics in the same way as the manual labels. The similarity of the correlations across the manual and automatic labels suggests that the automatic labels are a reasonable substitute for the manual labels. Moreover, the significant negative correlations themselves suggest that redesigning ITSPOKE to automatically detect and respond to disengagement has the potential to remediate disengagement and thereby improve performance, even in the presence of noise introduced by the automatic detection process
Low-level grounding in a multimodal mobile service robot conversational system using graphical models
The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue providing an access to the services it is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. In service robot deployment conditions speech recognition limitations with noisy speech input and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. In order to handle the speech input as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate probability that each grounding state has been reached. These probabilities serve as a base for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated comparing a conversational system for the mobile service robot RoboX employing only speech recognition for user goal identification, and a system equipped with multimodal grounding. The evaluation experiments use component and system level metrics for technical (objective) and user-based (subjective) evaluation with multimodal data collected during the conversations of the robot RoboX with user
Recommended from our members
The Importance of Sub-Utterance Prosody in Predicting Level of Certainty
We present an experiment aimed at understanding how to optimally use acoustic and prosodic information to predict a speaker's level of certainty. With a corpus of utterances where we can isolate a single word or phrase that is responsible for the speaker's level of certainty we use different sets of sub-utterance prosodic features to train models for predicting an utterance's perceived level of certainty. Our results suggest that using prosodic features of the word or phrase responsible for the level of certainty and of its surrounding context improves the prediction accuracy without increasing the total number of features when compared to using only features taken from the utterance as a whole.Engineering and Applied Science
Recommended from our members
Identifying Uncertain Words within an Utterance via Prosodic Features
We describe an experiment that investigates whether sub-utterance prosodic features can be used to detect uncertainty at the wordlevel. That is, given an utterance that is classified as uncertain, we want to determine which word or phrase the speaker is uncertain about. We have a corpus of utterances spoken under varying degrees of certainty. Using combinations of sub-utterance prosodic features we train models to predict the level of certainty of an utterance. On a set of utterances that were perceived to be uncertain, we compare the predictions of our models for two candidate target word segmentations: (a) one with the actual word causing uncertainty as the proposed target word, and (b) one with a control word as the proposed target word. Our best model correctly identifies the word causing the uncertainty rather than the control word 91% of the time.Engineering and Applied Science
Exploring affect-context dependencies for adaptive system development
We use χ2 to investigate the context dependency of student affect in our computer tutoring dialogues, targeting uncertainty in student answers in 3 automatically monitorable contexts. Our results show significant dependencies between uncertain answers and specific contexts. Identification and analysis of these dependencies is our first step in developing an adaptive version of our dialogue system.
Content, Social, and Metacognitive Statements: An Empirical Study Comparing Human-Human and Human-Computer Tutorial Dialogue
We present a study which compares human-human computer-mediated tutoring with two computer tutoring systems based on the same materials but differing in the type of feedback they provide. Our results show that there are significant differences in interaction style between human-human and human-computer tutoring, as well as between the two computer tutors, and that different dialogue characteristics predict learning gain in different conditions. We show that there are significant differences in the non-content statements that students make to human and computer tutors, but also to different types of computer tutors. These differences also affect which factors are correlated with learning gain and user satisfaction. We argue that ITS designers should pay particular attention to strategies for dealing with negative social and metacognitive statements, and also conduct further research on how interaction style affects human-computer tutoring. © 2010 Springer-Verlag Berlin Heidelberg
Towards the Use of Dialog Systems to Facilitate Inclusive Education
Continuous advances in the development of information technologies have currently led to the possibility
of accessing learning contents from anywhere, at anytime, and almost instantaneously. However,
accessibility is not always the main objective in the design of educative applications, specifically to
facilitate their adoption by disabled people. Different technologies have recently emerged to foster the
accessibility of computers and new mobile devices, favoring a more natural communication between
the student and the developed educative systems. This chapter describes innovative uses of multimodal
dialog systems in education, with special emphasis in the advantages that they provide for creating
inclusive applications and learning activities
- …