86 research outputs found

    Cognitive Principles in Robust Multimodal Interpretation

    Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of user multimodal inputs is important. Inspired by a previous investigation of cognitive status in multimodal human-machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and the Givenness Hierarchy and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. Our empirical results have shown the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the potential to improve the robustness of multimodal input interpretation.
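    A greedy resolution step of the kind the abstract describes, ranking candidate referents by Givenness Hierarchy status and filtering them with temporal and semantic constraints, might be sketched as follows. All field names, the candidate list, and the constraint window are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of one greedy multimodal reference resolution step.
# Candidates are filtered by semantic and temporal constraints, then the
# most cognitively accessible survivor is chosen greedily.

# Givenness Hierarchy statuses, from most to least accessible.
STATUS_RANK = {
    "in_focus": 0, "activated": 1, "familiar": 2,
    "uniquely_identifiable": 3, "referential": 4, "type_identifiable": 5,
}

def resolve_reference(expression, candidates):
    """Greedily pick the most accessible candidate meeting all constraints.

    `expression` carries the semantic type required by the referring
    expression, the utterance time, and a temporal window around any
    accompanying gesture; each candidate records its cognitive status,
    semantic type, and the time it was last mentioned or pointed at.
    """
    viable = [
        c for c in candidates
        # semantic constraint: candidate type must match the expression
        if c["type"] == expression["type"]
        # temporal constraint: candidate must be close in time to the utterance
        and abs(c["time"] - expression["time"]) <= expression["window"]
    ]
    # greedy choice: the most cognitively accessible viable candidate
    return min(viable, key=lambda c: STATUS_RANK[c["status"]], default=None)

candidates = [
    {"id": "house_2", "type": "house", "status": "familiar", "time": 2.0},
    {"id": "house_7", "type": "house", "status": "in_focus", "time": 9.5},
    {"id": "car_1", "type": "car", "status": "in_focus", "time": 9.8},
]
# "this house" uttered at t = 10 s with a 3-second temporal window
query = {"type": "house", "time": 10.0, "window": 3.0}
print(resolve_reference(query, candidates)["id"])  # house_7
```

    The greedy step never backtracks: once the highest-ranked viable candidate is found, it is returned immediately, which is what makes this family of algorithms efficient.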

    Keyboardless Visual Programming Using Voice, Handwriting, and Gesture

    Visual programming languages have facilitated the application development process, improving our ability to express programs, as well as our ability to view, edit and interact with them. Yet even in these programming environments, productivity is restricted by the primary input devices: the mouse and the keyboard. As an alternative, we investigate a program development interface which responds to the most natural human communication technologies: voice, handwriting and gesture. Speech- and pen-based systems have yet to find broad acceptance in everyday life because they are insufficiently advantageous to overcome problems with reliability. However, we believe that a visual programming environment with a multimodal user interface, properly constrained so as not to exceed the limits of the current technology, has the potential to increase programming productivity not only for people who are manually or visually impaired, but for the general population as well. In this paper we report on such a system.

    A good gesture: exploring nonverbal communication for robust SLDSs

    Actas de las IV Jornadas de Tecnología del Habla (JTH 2006). In this paper we propose a research framework to explore the possibilities that state-of-the-art embodied conversational agents (ECAs) technology can offer to overcome typical robustness problems in spoken language dialogue systems (SLDSs), such as error detection and recovery, changes of turn and clarification requests, that occur in many human-machine dialogue situations in real applications. Our goal is to study the effects of nonverbal communication throughout the dialogue, and find out to what extent ECAs can help overcome user frustration in critical situations. In particular, we have created a gestural repertoire that we will test and continue to refine and expand, to fit as closely as possible the users’ expectations and intuitions, and to favour a more efficient and pleasant dialogue flow for the users. We also describe the test environment we have designed, simulating a realistic mobile application, as well as the evaluation methodology for the assessment, in forthcoming tests, of the potential benefits of adding nonverbal communication in complex dialogue situations. This work has been possible thanks to the support grant received from project TIC2003-09068-C02-02 of the Spanish Plan Nacional de I+D.

    Interactive translation of conversational speech


    The Effect of Attention to Self-Regulation of Speech Sound Productions on Speech Fluency in Oral Reading

    Purpose: This study sought to test whether a condition of heightened attention to speech sound production during connected speech serves to trigger increased disfluencies. Disfluencies, or disruptions in the flow of speech, are highly variable in form and location, both within and across individuals and situations. Research to identify conditions that can predictably trigger disfluencies has the potential to provide insight into their elusive nature. A review of related literature covered the cognitive-linguistic theories related to speech fluency and stuttering, and served as the foundation for the proposal that disfluencies would be triggered by heightened self-monitoring attention to how speech sounds are made during connected speech. Methods: Participants included 10 male and 10 female normally fluent adult college students. Their tasks included a baseline oral reading of a 330-word passage, learning of two new speech sounds, followed by an experimental reading of the same passage. During the experimental reading, target sounds, indicated by highlighted locations within the passage, had to be replaced with the newly learned speech sounds. To validate the nature of the task, participants confirmed that much greater attention was given to how speech sounds were produced during the experimental oral reading than during the baseline oral reading. Results: Disfluencies and oral reading rates were examined using descriptive statistics and analyzed by means of the negative binomial distribution model. Secondary analyses of oral reading rates were conducted with Wilcoxon’s signed-rank test. The results revealed that the experimental reading task was associated with a significant increase in Stuttering-Like Disfluency (SLD) and Other Disfluency (OD), and a significant decrease in oral reading rate. Furthermore, SLDs increased significantly more than ODs from the first to the second reading. Discussion: Results supported the hypothesis that disfluency, especially SLD, can be triggered by a condition of increased attention to self-monitoring how speech sounds are produced during connected speech. These findings support theories explaining disfluencies as a symptom of a speaker’s cognitive-linguistic speech planning processes being over-burdened. Implications are raised for specific populations that may be at risk for more disfluencies: young children learning language, second-language learners, and children in speech therapy. Future research directions are recommended to better understand how to prevent disfluencies in at-risk populations and to clarify the enigmatic relationship among attentional processes, phonological production planning, and stuttering.
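    The secondary analysis described above, a paired comparison of baseline and experimental oral reading rates via Wilcoxon's signed-rank test, could be sketched as below. The statistic implementation covers only the simple no-ties case, and the words-per-minute values are made-up illustrations, not the study's data.

```python
# Pure-Python sketch of Wilcoxon's signed-rank statistic for paired data
# (no-ties case), as used in the secondary analysis of oral reading rates.

def signed_rank_W(x, y):
    """Return the Wilcoxon W statistic: the smaller of the positive and
    negative signed-rank sums over the paired differences.

    Assumes no tied difference magnitudes; zero differences are dropped,
    following the standard Wilcoxon procedure.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranked = sorted(diffs, key=abs)  # rank 1 = smallest absolute difference
    w_pos = sum(i + 1 for i, d in enumerate(ranked) if d > 0)
    w_neg = sum(i + 1 for i, d in enumerate(ranked) if d < 0)
    return min(w_pos, w_neg)

# Hypothetical words-per-minute rates for 10 participants.
baseline_wpm = [182, 175, 190, 168, 177, 185, 171, 180, 174, 188]
experimental_wpm = [161, 153, 167, 144, 152, 159, 144, 152, 145, 158]

# Every participant slowed down, so all signed ranks are positive and W = 0,
# the strongest possible evidence of a systematic rate decrease for n = 10.
print(signed_rank_W(baseline_wpm, experimental_wpm))  # 0
```

    A small W is then compared against the critical value for the sample size (or an exact p-value) to decide significance; the study's reported rate decrease corresponds to this kind of one-directional pattern.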

    ECA gesture strategies for robust SLDSs

    This paper explores the use of embodied conversational agents (ECAs) to improve interaction with spoken language dialogue systems (SLDSs). For this purpose we have identified typical interaction problems with SLDSs and associated with each of them a particular ECA gesture or behaviour. User tests were carried out dividing the test users into two groups, each facing a different interaction metaphor (one with an ECA in the interface, and the other implemented only with voice). Our results suggest user frustration is lower when an ECA is present in the interface, and the dialogue flows more smoothly, partly because users are better able to tell when they are expected to speak and whether the system has heard and understood. The users’ overall perceptions regarding the system were also affected, and interaction seems to be more enjoyable with an ECA than without it.

    Evaluation of ECA Gesture strategies for robust Human-Computer Interaction

    Embodied Conversational Agents (ECAs) offer us the possibility to design pleasant and efficient human-machine interaction. In this paper we present an evaluation scheme to compare dialogue-based speaker authentication and information retrieval systems with and without ECAs on the interface. We used gestures and other visual cues to improve fluency and robustness of interaction with these systems. Our test results suggest that when an ECA is present users perceive fewer system errors, their frustration levels are lower, turn-changing goes more smoothly, the interaction experience is more enjoyable, and system capabilities are generally perceived more positively than when no ECA is present. However, the ECA seems to intensify the users' privacy concerns.