
    Evaluation and Optimisation of Incremental Processors

    Incremental spoken dialogue systems, which process user input as it unfolds, pose additional engineering challenges compared to more standard non-incremental systems: their processing components must be able to accept partial, and possibly subsequently revised, input, and must produce output that is both as accurate as possible and delivered with as little delay as possible. In this article, we define metrics that measure how well a given processor meets these challenges, and we identify types of gold standards for evaluation. We exemplify these metrics in the evaluation of several incremental processors that we have developed. We also present generic means of optimising some of the measures, provided certain trade-offs are accepted. We believe that this work will help enable principled comparison of components for incremental dialogue systems and portability of results.
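    One widely used metric for incremental processors of the kind this abstract describes is "edit overhead": the fraction of incremental edits (tokens added or revoked across successive partial hypotheses) that were unnecessary, i.e. later revoked. The sketch below is illustrative only, not the article's own definition, and it simplifies by comparing consecutive hypotheses via their shared prefix.

```python
def edit_overhead(partial_hypotheses):
    """Compute edit overhead from a sequence of partial hypothesis lists.

    Each hypothesis is a list of tokens. An edit is one token added or
    revoked between consecutive hypotheses (computed here by prefix
    comparison, a simplifying assumption).
    """
    edits = 0
    prev = []
    for hyp in partial_hypotheses:
        # Length of the shared prefix between previous and current output.
        common = 0
        for a, b in zip(prev, hyp):
            if a != b:
                break
            common += 1
        # Tokens revoked from the previous output plus tokens newly added.
        edits += (len(prev) - common) + (len(hyp) - common)
        prev = hyp
    final = partial_hypotheses[-1] if partial_hypotheses else []
    necessary = len(final)  # each final token must be added at least once
    return (edits - necessary) / edits if edits else 0.0

# A processor that hypothesises "a", revokes it for "b", then extends to
# "b c" performed 4 edits where 2 were necessary: overhead 0.5.
overhead = edit_overhead([["a"], ["b"], ["b", "c"]])
```

    A perfectly stable processor, one that only ever extends its output, scores 0.0; the more often early hypotheses are revoked, the closer the score gets to 1.0.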

    Confidence in uncertainty: Error cost and commitment in early speech hypotheses

    © 2018 Loth et al. Interactions with artificial agents often lack immediacy because agents respond more slowly than their users expect. Automatic speech recognisers introduce this delay by analysing a user’s utterance only after it has been completed. Early, uncertain hypotheses from incremental speech recognisers can enable artificial agents to respond in a more timely manner. However, these hypotheses may change significantly with each update, so an already initiated action may turn out to be an error and incur its cost. We investigated whether humans would use uncertain hypotheses for planning ahead and/or initiating their response. We designed a Ghost-in-the-Machine study in a bar scenario: a human participant controlled a bartending robot and perceived the scene only through its recognisers. The results showed that participants used uncertain hypotheses to select the best matching action, comparable to computing the utility of dialogue moves. Participants evaluated the available evidence and the error cost of their actions prior to initiating them. If the error cost was low, participants initiated their response on only suggestive evidence. Otherwise, they waited for additional, more confident hypotheses if they still had time to do so. If there was time pressure but only little evidence, participants grounded their understanding with echo questions. These findings contribute to a psychologically plausible policy for human-robot interaction that enables artificial agents to respond under uncertainty in a more timely and socially appropriate manner.
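    The trade-off the participants made can be framed as an expected-utility decision: act early on an uncertain hypothesis only when the expected gain outweighs the risk-weighted error cost. The sketch below is a hypothetical illustration of that policy; the function name, parameters and numbers are assumptions for exposition, not taken from the paper.

```python
def should_act(confidence, success_reward, error_cost, wait_penalty=0.0):
    """Return True if acting on the current uncertain hypothesis has a
    higher expected utility than waiting for a more confident one.

    confidence     -- recogniser's probability that the hypothesis is right
    success_reward -- utility gained if the initiated action is correct
    error_cost     -- cost incurred if the action turns out to be wrong
    wait_penalty   -- utility threshold representing the cost of delaying
    """
    expected_act = confidence * success_reward - (1.0 - confidence) * error_cost
    return expected_act > wait_penalty

# Cheap-to-undo action: suggestive evidence (40% confidence) suffices.
act_cheap = should_act(confidence=0.4, success_reward=1.0, error_cost=0.2)

# Expensive-to-undo action: the same evidence is not enough, so wait.
act_costly = should_act(confidence=0.4, success_reward=1.0, error_cost=2.0)
```

    This mirrors the reported behaviour: low error cost lowers the confidence needed to commit, while high error cost makes waiting (or asking an echo question) the better move.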

    On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces

    Multimodal systems have attracted increased attention in recent years, which has made possible important improvements in the technologies for recognition, processing, and generation of multimodal information. However, many issues related to multimodality remain unclear, for example, the principles that would make it possible to resemble human-human multimodal communication. This chapter focuses on some of the most important challenges that researchers have recently envisioned for future multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable and affective multimodal interfaces.

    Do You Feel Me?: Learning Language from Humans with Robot Emotional Displays

    In working towards a human-level acquisition and understanding of language, a robot must meet two requirements: the ability to learn words from interactions with its physical environment, and the ability to learn language from people in settings for language use, such as spoken dialogue. The second requirement poses a problem: if a robot is capable of asking a human teacher well-formed questions, it will lead the teacher to provide responses that are too advanced for a robot that requires simple inputs and feedback to build word-level comprehension. In a live interactive study, we tested the hypothesis that emotional displays are a viable solution to this problem of how to communicate without relying on language the robot doesn't--indeed, cannot--actually know. Emotional displays can convey the robot's state of understanding to its human teacher, and are developmentally appropriate for the most common language acquisition setting: an adult interacting with a child. For our study, we programmed a robot to independently explore the world and elicit relevant word references and feedback from participants, who were confronted with two robot settings: one in which the robot displayed emotions, and a second in which the robot focused on the task without displaying emotions, which also tested whether emotional displays lead a participant to make incorrect assumptions about the robot's understanding. Analyzing the results from the surveys and the Grounded Semantics classifiers, we found that the use of emotional displays increases the number of inputs provided to the robot, an effect modulated by the ratio of positive to negative emotions displayed.

    Sistemas de diálogo: una revisión

    Spoken dialogue systems are computer programs developed to interact with users through speech in order to provide them with specific automated services. The interaction is carried out by means of dialogue turns, which in many studies in the literature researchers aim to make as similar as possible to those between humans in terms of naturalness, intelligence and affective content. In this paper we describe the fundamentals of these systems, including the main technologies employed for their development. We also trace the evolution of this technology and discuss some current applications, as well as development paradigms, including scripting languages and the development of conversational interfaces for mobile apps. The correct modelling of the user is a key aspect of this technology, so we also describe affective, personality and contextual models. Finally, we address some current research trends in verbal communication, multimodal interaction and dialogue management.

    Using contextual information to understand searching and browsing behavior

    There is a great imbalance between the richness of information on the web and the succinctness and poverty of users' search requests, making their queries only a partial description of the underlying, complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise of dramatically improving users' search experience. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real-world applications.

    Robust Dialog Management Through A Context-centric Architecture

    This dissertation presents and evaluates a method of managing spoken dialog interactions with robust attention to fulfilling the human user’s goals in the presence of speech recognition limitations. Assistive speech-based embodied conversational agents are computer-based entities that interact with humans via spoken input and output to help accomplish a task or communicate information. A challenging aspect of this task involves open dialog, where the user is free to converse in an unstructured manner. With this style of input, the machine’s ability to communicate may be hindered by poor reception of utterances, caused by a user’s inadequate command of a language and/or faults in the speech recognition facilities. Since speech-based input is emphasized, this endeavor involves the fundamental issues associated with natural language processing, automatic speech recognition and dialog system design. Driven by Context-Based Reasoning, the presented dialog manager features a discourse model that implements mixed-initiative conversation with a focus on the user’s assistive needs. The discourse behavior must maintain a sense of generality, where the assistive nature of the system remains constant regardless of its knowledge corpus. The dialog manager was encapsulated into a speech-based embodied conversational agent platform for prototyping and testing purposes. A battery of user trials was performed on this agent to evaluate its performance as a robust, domain-independent, speech-based interaction entity capable of satisfying the needs of its users.
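    The core idea of context-centric dialog management is that behaviour is organised into contexts, each bundling the interpretations it can handle, with only the active context interpreting input and transition rules deciding which context takes over next. The sketch below is a minimal illustration of that structure under assumed names; it is not the dissertation's implementation, and real systems would use recogniser confidence rather than keyword matching.

```python
class Context:
    """One conversational context: its activation cues and local rules."""

    def __init__(self, name, keywords, responses):
        self.name = name
        self.keywords = keywords    # cues that activate this context
        self.responses = responses  # keyword -> canned reply

    def matches(self, utterance):
        return any(k in utterance for k in self.keywords)


class DialogManager:
    """Routes each utterance through the currently active context."""

    def __init__(self, contexts, fallback="Could you rephrase that?"):
        self.contexts = contexts
        self.active = contexts[0]
        self.fallback = fallback

    def handle(self, utterance):
        utterance = utterance.lower()
        # Mixed initiative: another context may claim the turn if it
        # matches the user's utterance better than the active one.
        for ctx in self.contexts:
            if ctx is not self.active and ctx.matches(utterance):
                self.active = ctx
                break
        for keyword, reply in self.active.responses.items():
            if keyword in utterance:
                return reply
        # Graceful recovery when recognition fails or no rule applies.
        return self.fallback


greet = Context("greeting", ["hello", "hi"],
                {"hello": "Hello! How can I help?"})
order = Context("ordering", ["book", "ticket"],
                {"ticket": "Which ticket would you like?"})
dm = DialogManager([greet, order])
```

    The fallback reply stands in for the robustness the dissertation targets: when recognition degrades, the manager stays in a coherent context instead of failing, regardless of which knowledge corpus the contexts encode.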