
    The Impact of Interpretation Problems on Tutorial Dialogue

    Supporting natural language input may improve learning in intelligent tutoring systems. However, interpretation errors are unavoidable and require an effective recovery policy. We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is an important direction for future ITS research.

    Recognizing Uncertainty in Speech

    We address the problem of inferring a speaker's level of certainty based on prosodic information in the speech signal, which has applications in speech-based dialogue systems. We show that using phrase-level prosodic features centered around the phrases causing uncertainty, in addition to utterance-level prosodic features, improves our model's certainty-level classification. In addition, our models can be used to predict which phrase a person is uncertain about. These results rely on a novel method for eliciting utterances of varying levels of certainty that allows us to compare the utility of contextually-based feature sets. We elicit level-of-certainty ratings from both the speakers themselves and a panel of listeners, finding that there is often a mismatch between speakers' internal states and their perceived states, and highlighting the importance of this distinction.
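
    A minimal sketch of the kind of classification setup described above, assuming utterance-level and phrase-level prosodic statistics have already been extracted (e.g., with a prosodic analysis tool); the feature layout, labels, and classifier choice here are illustrative assumptions, not the paper's actual implementation.

        # Illustrative sketch (not the paper's implementation): classify speaker
        # certainty from utterance-level plus phrase-level prosodic features.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        # Placeholder feature matrix; in practice each row would hold prosodic
        # statistics (f0 mean/range, energy, duration, speech rate) computed over
        # the whole utterance and over the phrase suspected to cause uncertainty.
        X = rng.normal(size=(300, 8))
        # Placeholder labels: 0 = uncertain, 1 = neutral, 2 = certain.
        y = rng.integers(0, 3, size=300)

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")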

    Exploring User Satisfaction in a Tutorial Dialogue System

    User satisfaction is a common evaluation metric in task-oriented dialogue systems, whereas tutorial dialogue systems are often evaluated in terms of student learning gain. However, user satisfaction is also important for such systems, since it may predict technology acceptance. We present a detailed satisfaction questionnaire used in evaluating the BEETLE II system (REVU-NL), and explore the underlying components of user satisfaction using factor analysis. We demonstrate interesting patterns of interaction between interpretation quality, satisfaction and the dialogue policy, highlighting the importance of more fine-grained evaluation of user satisfaction.
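
    A minimal sketch of exploratory factor analysis on questionnaire responses, assuming Likert-scale items stored as columns of a response matrix; the number of items, respondents, and factors is a placeholder assumption and is not taken from the REVU-NL questionnaire.

        # Illustrative sketch: exploratory factor analysis of Likert-scale
        # satisfaction ratings (respondents are rows, questionnaire items are columns).
        import numpy as np
        from sklearn.decomposition import FactorAnalysis
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        responses = rng.integers(1, 6, size=(120, 15)).astype(float)  # placeholder 1-5 ratings

        scaled = StandardScaler().fit_transform(responses)
        fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
        fa.fit(scaled)

        # Loadings show how strongly each questionnaire item maps onto each factor.
        for i, loadings in enumerate(fa.components_):
            top_items = np.argsort(np.abs(loadings))[::-1][:4]
            print(f"Factor {i + 1}: strongest items {top_items.tolist()}")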

    A Satisfaction-based Model for Affect Recognition from Conversational Features in Spoken Dialog Systems

    Detecting user affect automatically during real-time conversation is the main challenge towards our greater aim of infusing social intelligence into a natural-language mixed-initiative High-Fidelity (Hi-Fi) audio control spoken dialog agent. In recent years, studies on affect detection from voice have moved on to using realistic, non-acted data, which is subtler. However, subtler emotions are more challenging to perceive, and this is reflected in both labelling and machine prediction tasks. This paper attempts to address part of this challenge by considering the role of user satisfaction ratings and of conversational/dialog features in discriminating contentment and frustration, two types of emotions that are known to be prevalent within spoken human-computer interaction. However, given the laboratory constraints, users might be positively biased when rating the system, indirectly making the reliability of the satisfaction data questionable. Machine learning experiments were conducted on two datasets, users and annotators, which were then compared in order to assess the reliability of these datasets. Our results indicated that standard classifiers were significantly more successful in discriminating the abovementioned emotions and their intensities (reflected by user satisfaction ratings) from annotator data than from user data. These results corroborated that: first, satisfaction data could be used directly as an alternative target variable to model affect, and could be predicted exclusively by dialog features; second, this held only when predicting the abovementioned emotions using the annotators' data, suggesting that user bias does exist in a laboratory-led evaluation.
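
    A minimal sketch of the discrimination task described above, assuming per-exchange conversational features and binary contentment/frustration labels derived from either satisfaction ratings or annotator judgements; the feature names and classifier are illustrative assumptions, not the paper's setup.

        # Illustrative sketch: discriminate contentment vs. frustration from
        # dialog-level features; feature names and labels are hypothetical placeholders.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)

        # Placeholder conversational features per exchange, e.g.
        # [n_system_reprompts, n_user_barge_ins, turn_duration_s, asr_confidence]
        X = rng.normal(size=(400, 4))
        # Placeholder labels: 0 = contentment, 1 = frustration.
        y = rng.integers(0, 2, size=400)

        clf = SVC(kernel="rbf", C=1.0)
        print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())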

    Reflection and Learning Robustness in a Natural Language Conceptual Physics Tutoring System

    This thesis investigates whether reflection after tutoring with the Itspoke qualitative physics tutoring system can improve both near and far transfer learning and retention. This question is formalized in three major hypotheses. H1: reading a post-tutoring reflective text will improve learning compared to reading a non-reflective text. H2: a more cohesive reflective text will produce higher learning gains for most students. H3: students with high domain knowledge will learn more from a less cohesive text. In addition, this thesis addresses the question of which mechanisms affect learning from a reflective text. Secondary hypotheses H4 and H5 posit that textual cohesion and student motivation, respectively, each affect learning by influencing the amount of inference performed while reading. These hypotheses were tested by asking students to read a reflective/abstractive text after tutoring with the Itspoke tutor. This text compared dialog parts in which similar physics principles had been applied to different situations. Students were randomly assigned among two experimental conditions, which received "high" or "low" cohesion versions of this text, or a control condition, which read non-reflective physics material after tutoring. The secondary hypotheses were tested using two measures of cognitive load while reading: reading speeds and a self-report measure of reading difficulty. Near and far transfer learning was measured using sets of questions that were mostly isomorphic vs. non-isomorphic to the tutored problems, and retention was measured by administering both an immediate and a delayed post-test. Motivation was measured using a questionnaire. Reading a reflective text improved learning, but only for students with a middle amount of motivation, confirming H1 for that group. These students also learned more from a more cohesive reflective text, supporting H2. Cohesion also affected high- and low-knowledge students significantly differently, supporting H3, except that high-knowledge students learned best from the high-, not low-, cohesion text. Students with higher motivation did have higher cognitive load, confirming hypothesis H5 and suggesting that they engaged with the text more actively. However, secondary hypothesis H4 failed to show a role for cognitive load in explaining the learning interaction between knowledge and cohesion demonstrated in H3.

    User Simulation for Spoken Dialog System Development

    A user simulation is a computer program which simulates human user behaviors. Recently, user simulations have been widely used in two spoken dialog system development tasks. One is to generate large simulated corpora for applying machine learning to learn new dialog strategies, and the other is to replace human users when testing dialog system performance. Although previous studies have shown successful examples of applying user simulations in both tasks, it is not clear what type of user simulation is most appropriate for a specific task, because few studies compare different user simulations in the same experimental setting. In this research, we investigate how to construct user simulations for a specific task in spoken dialog system development. Since most current user simulations generate user actions based on probabilistic models, we identify two main factors in constructing such user simulations: the choice of user simulation model and the approach used to set up user action probabilities. We build different user simulation models which differ in how much effort they devote to simulating realistic user behaviors and to exploring more user actions. We also investigate different manual and trained approaches to setting up user action probabilities. We introduce both task-dependent and task-independent measures to compare these simulations. We show that a simulated user which mimics realistic user behaviors is not always necessary for the dialog strategy learning task. For the dialog system testing task, a user simulation which simulates user behaviors in a statistical way can generate both objective and subjective measures of dialog system performance similar to those of human users. Our research examines the strengths and weaknesses of user simulations in spoken dialog system development. Although our results are constrained to our task domain and the resources available, we provide a general framework for comparing user simulations in a task-dependent context. In addition, we summarize and validate a set of evaluation measures that can be used to compare different simulated users as well as simulated versus human users.
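
    A minimal sketch of a probabilistic user simulation of the kind described above, in which a user action is sampled from a distribution conditioned on the last system act; the action names and probability table are illustrative placeholders, not the thesis's actual models or probability-estimation approaches.

        # Illustrative sketch: a probabilistic user simulator that samples a user
        # action conditioned on the previous system act. Action names and the
        # probability table are hypothetical placeholders.
        import random

        USER_ACTION_PROBS = {
            # system act  -> {user action: probability}
            "ask_question": {"answer_correct": 0.5, "answer_wrong": 0.3, "say_dont_know": 0.2},
            "give_hint":    {"answer_correct": 0.6, "answer_wrong": 0.3, "say_dont_know": 0.1},
            "confirm":      {"acknowledge": 0.9, "reject": 0.1},
        }

        def simulate_user_action(system_act: str, rng: random.Random) -> str:
            """Sample one user action from the distribution for this system act."""
            dist = USER_ACTION_PROBS[system_act]
            actions, probs = zip(*dist.items())
            return rng.choices(actions, weights=probs, k=1)[0]

        if __name__ == "__main__":
            rng = random.Random(0)
            for act in ["ask_question", "give_hint", "confirm"]:
                print(act, "->", simulate_user_action(act, rng))

    The same table could be filled in either by hand (a manual approach) or by estimating the conditional frequencies from a dialog corpus (a trained approach), which mirrors the two probability-setup strategies contrasted in the abstract.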