652 research outputs found

    Producing Acoustic-Prosodic Entrainment in a Robotic Learning Companion to Build Learner Rapport

    Get PDF
    abstract: With advances in automatic speech recognition, spoken dialogue systems are assuming increasingly social roles. There is a growing need for these systems to be socially responsive, capable of building rapport with users. In human-human interactions, rapport is critical to patient-doctor communication, conflict resolution, educational interactions, and social engagement. Rapport between people promotes successful collaboration, motivation, and task success. Dialogue systems which can build rapport with their user may produce similar effects, personalizing interactions to create better outcomes. This dissertation focuses on how dialogue systems can build rapport utilizing acoustic-prosodic entrainment. Acoustic-prosodic entrainment occurs when individuals adapt their acoustic-prosodic features of speech, such as tone of voice or loudness, to one another over the course of a conversation. Correlated with liking and task success, a dialogue system which entrains may enhance rapport. Entrainment, however, is very challenging to model. People entrain on different features in many ways and how to design entrainment to build rapport is unclear. The first goal of this dissertation is to explore how acoustic-prosodic entrainment can be modeled to build rapport. Towards this goal, this work presents a series of studies comparing, evaluating, and iterating on the design of entrainment, motivated and informed by human-human dialogue. These models of entrainment are implemented in the dialogue system of a robotic learning companion. Learning companions are educational agents that engage students socially to increase motivation and facilitate learning. As a learning companion’s ability to be socially responsive increases, so do vital learning outcomes. A second goal of this dissertation is to explore the effects of entrainment on concrete outcomes such as learning in interactions with robotic learning companions. This dissertation results in contributions both technical and theoretical. Technical contributions include a robust and modular dialogue system capable of producing prosodic entrainment and other socially-responsive behavior. One of the first systems of its kind, the results demonstrate that an entraining, social learning companion can positively build rapport and increase learning. This dissertation provides support for exploring phenomena like entrainment to enhance factors such as rapport and learning and provides a platform with which to explore these phenomena in future work.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Applications of Discourse Structure for Spoken Dialogue Systems

    Get PDF
    Language exhibits structure beyond the sentence level (e.g. the syntactic structure of a sentence). In particular, dialogues, either human-human or human-computer, have an inherent structure called the discourse structure. Models of discourse structure attempt to explain why a sequence of random utterances combines to form a dialogue or no dialogue at all. Due to the relatively simple structure of the dialogues that occur in the information-access domains of typical spoken dialogue systems (e.g. travel planning), discourse structure has often seen limited application in such systems. In this research, we investigate the utility of discourse structure for spoken dialogue systems in more complex domains, e.g. tutoring. This work was driven by two intuitions.First, we believed that the "position in the dialogue" is a critical information source for two tasks: performance analysis and characterization of dialogue phenomena. We define this concept using transitions in the discourse structure. For performance analysis, these transitions are used to create a number of novel factors which we show to be predictive of system performance. One of these factors informs a promising modification of our system which is implemented and compared with the original version of the system through a user study. Results show that the modification leads to objective improvements. For characterization of dialogue phenomena, we find statistical dependencies between discourse structure transitions and two dialogue phenomena which allow us to speculate where and why these dialogue phenomena occur and to better understand system behavior.Second, we believed that users will benefit from direct access to discourse structure information. We enable this through a graphical representation of discourse structure called the Navigation Map. We demonstrate the subjective and objective utility of the Navigation Map through two user studies.Overall, our work demonstrates that discourse structure is an important information source for designers of spoken dialogue systems

    Psychophysiological analysis of a pedagogical agent and robotic peer for individuals with autism spectrum disorders.

    Get PDF
    Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by ongoing problems in social interaction and communication, and engagement in repetitive behaviors. According to Centers for Disease Control and Prevention, an estimated 1 in 68 children in the United States has ASD. Mounting evidence shows that many of these individuals display an interest in social interaction with computers and robots and, in general, feel comfortable spending time in such environments. It is known that the subtlety and unpredictability of people’s social behavior are intimidating and confusing for many individuals with ASD. Computerized learning environments and robots, however, prepare a predictable, dependable, and less complicated environment, where the interaction complexity can be adjusted so as to account for these individuals’ needs. The first phase of this dissertation presents an artificial-intelligence-based tutoring system which uses an interactive computer character as a pedagogical agent (PA) that simulates a human tutor teaching sight word reading to individuals with ASD. This phase examines the efficacy of an instructional package comprised of an autonomous pedagogical agent, automatic speech recognition, and an evidence-based instructional procedure referred to as constant time delay (CTD). A concurrent multiple-baseline across-participants design is used to evaluate the efficacy of intervention. Additionally, post-treatment probes are conducted to assess maintenance and generalization. The results suggest that all three participants acquired and maintained new sight words and demonstrated generalized responding. The second phase of this dissertation describes the augmentation of the tutoring system developed in the first phase with an autonomous humanoid robot which serves the instructional role of a peer for the student. In this tutoring paradigm, the robot adopts a peer metaphor, where its function is to act as a peer. With the introduction of the robotic peer (RP), the traditional dyadic interaction in tutoring systems is augmented to a novel triadic interaction in order to enhance the social richness of the tutoring system, and to facilitate learning through peer observation. This phase evaluates the feasibility and effects of using PA-delivered sight word instruction, based on a CTD procedure, within a small-group arrangement including a student with ASD and the robotic peer. A multiple-probe design across word sets, replicated across three participants, is used to evaluate the efficacy of intervention. The findings illustrate that all three participants acquired, maintained, and generalized all the words targeted for instruction. Furthermore, they learned a high percentage (94.44% on average) of the non-target words exclusively instructed to the RP. The data show that not only did the participants learn nontargeted words by observing the instruction to the RP but they also acquired their target words more efficiently and with less errors by the addition of an observational component to the direct instruction. The third and fourth phases of this dissertation focus on physiology-based modeling of the participants’ affective experiences during naturalistic interaction with the developed tutoring system. While computers and robots have begun to co-exist with humans and cooperatively share various tasks; they are still deficient in interpreting and responding to humans as emotional beings. Wearable biosensors that can be used for computerized emotion recognition offer great potential for addressing this issue. The third phase presents a Bluetooth-enabled eyewear – EmotiGO – for unobtrusive acquisition of a set of physiological signals, i.e., skin conductivity, photoplethysmography, and skin temperature, which can be used as autonomic readouts of emotions. EmotiGO is unobtrusive and sufficiently lightweight to be worn comfortably without interfering with the users’ usual activities. This phase presents the architecture of the device and results from testing that verify its effectiveness against an FDA-approved system for physiological measurement. The fourth and final phase attempts to model the students’ engagement levels using their physiological signals collected with EmotiGO during naturalistic interaction with the tutoring system developed in the second phase. Several physiological indices are extracted from each of the signals. The students’ engagement levels during the interaction with the tutoring system are rated by two trained coders using the video recordings of the instructional sessions. Supervised pattern recognition algorithms are subsequently used to map the physiological indices to the engagement scores. The results indicate that the trained models are successful at classifying participants’ engagement levels with the mean classification accuracy of 86.50%. These models are an important step toward an intelligent tutoring system that can dynamically adapt its pedagogical strategies to the affective needs of learners with ASD

    Designing Embodied Interactive Software Agents for E-Learning: Principles, Components, and Roles

    Get PDF
    Embodied interactive software agents are complex autonomous, adaptive, and social software systems with a digital embodiment that enables them to act on and react to other entities (users, objects, and other agents) in their environment through bodily actions, which include the use of verbal and non-verbal communicative behaviors in face-to-face interactions with the user. These agents have been developed for various roles in different application domains, in which they perform tasks that have been assigned to them by their developers or delegated to them by their users or by other agents. In computer-assisted learning, embodied interactive pedagogical software agents have the general task to promote human learning by working with students (and other agents) in computer-based learning environments, among them e-learning platforms based on Internet technologies, such as the Virtual Linguistics Campus (www.linguistics-online.com). In these environments, pedagogical agents provide contextualized, qualified, personalized, and timely assistance, cooperation, instruction, motivation, and services for both individual learners and groups of learners. This thesis develops a comprehensive, multidisciplinary, and user-oriented view of the design of embodied interactive pedagogical software agents, which integrates theoretical and practical insights from various academic and other fields. The research intends to contribute to the scientific understanding of issues, methods, theories, and technologies that are involved in the design, implementation, and evaluation of embodied interactive software agents for different roles in e-learning and other areas. For developers, the thesis provides sixteen basic principles (Added Value, Perceptible Qualities, Balanced Design, Coherence, Consistency, Completeness, Comprehensibility, Individuality, Variability, Communicative Ability, Modularity, Teamwork, Participatory Design, Role Awareness, Cultural Awareness, and Relationship Building) plus a large number of specific guidelines for the design of embodied interactive software agents and their components. Furthermore, it offers critical reviews of theories, concepts, approaches, and technologies from different areas and disciplines that are relevant to agent design. Finally, it discusses three pedagogical agent roles (virtual native speaker, coach, and peer) in the scenario of the linguistic fieldwork classes on the Virtual Linguistics Campus and presents detailed considerations for the design of an agent for one of these roles (the virtual native speaker)

    The State of Speech in HCI: Trends, Themes and Challenges

    Get PDF

    A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Get PDF
    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each to that of the other(s). Implementation of thisbehavior in spoken dialogue systems is desirable as an improvement on the naturalness of humanmachine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitativedescription of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or humancomputer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments.In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speakercontributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turntaking” behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude ofperceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems

    Interventions to Regulate Confusion during Learning

    Get PDF
    Confusion provides opportunities to learn at deeper levels. However, learners must put forth the necessary effort to resolve their confusion to convert this opportunity into actual learning gains. Learning occurs when learners engage in cognitive activities beneficial to learning (e.g., reflection, deliberation, problem solving) during the process of confusion resolution. Unfortunately, learners are not always able to resolve their confusion on their own. The inability to resolve confusion can be due to a lack of knowledge, motivation, or skills. The present dissertation explored methods to aid confusion resolution and ultimately promote learning through a multi-pronged approach. First, a survey revealed that learners prefer more information and feedback when confused and that they preferred different interventions for confusion compared to boredom and frustration. Second, expert human tutors were found to most frequently handle learner confusion by providing direct instruction and responded differently to learner confusion compared to anxiety, frustration, and happiness. Finally, two experiments were conducted to test the effectiveness of pedagogical and motivational confusion regulation interventions. Both types of interventions were investigated within a learning environment that experimentally induced confusion via the presentation of contradictory information by two animated agents (tutor and peer student agents). Results showed across both studies that learner effort during the confusion regulation task impacted confusion resolution and that learning occurred when the intervention provided the opportunity for learners to stop, think, and deliberate about the concept being discussed. Implications for building more effective affect-sensitive learning environments are discussed
    • …
    corecore