8 research outputs found

    Just Ask: An Interactive Learning Framework for Vision and Language Navigation

    In the vision-and-language navigation task, the agent may encounter ambiguous situations that are hard to interpret from visual information and natural language instructions alone. We propose an interactive learning framework that endows the agent with the ability to ask for the user's help in such situations. As part of this framework, we investigate multiple learning approaches for the agent, with different levels of complexity. The simplest, a model-confusion-based method, lets the agent ask questions when it is confused, relying on a predefined confidence threshold of a next-action prediction model. Building on this confusion-based method, the agent is expected to demonstrate more sophisticated reasoning, discovering for itself when and where to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward-shaping term, which enables the agent to ask questions only when necessary. The success rate is boosted by at least 15% with only one question asked on average during navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, with which the agent improves further by utilizing its interaction history with a human. We demonstrate that the proposed strategy is substantially more realistic and data-efficient than previously proposed pre-exploration techniques.
    Comment: 8 pages, accepted to AAAI 202
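    The confusion-based baseline is simple enough to sketch. Below is a minimal, hypothetical illustration of the asking rule: the agent requests help whenever the next-action predictor's top confidence falls below a predefined threshold. The threshold value, function names, and oracle interface are assumptions for illustration, not the authors' code.

```python
from typing import Optional
import numpy as np

ASK_THRESHOLD = 0.5  # assumed value of the predefined confidence threshold

def should_ask(action_probs: np.ndarray) -> bool:
    """Ask for help when the model's top action confidence is low."""
    return float(action_probs.max()) < ASK_THRESHOLD

def next_action(action_probs: np.ndarray,
                oracle_action: Optional[int] = None) -> int:
    """Pick the next action, deferring to the user's answer when confused."""
    if should_ask(action_probs) and oracle_action is not None:
        return oracle_action            # follow the human's response
    return int(action_probs.argmax())   # act on the model's own prediction
```

    The RL variant described in the abstract replaces this fixed threshold with a learned asking policy, with the reward-shaping term discouraging unnecessary questions.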

    VISION-BASED URBAN NAVIGATION PROCEDURES FOR VERBALLY INSTRUCTED ROBOTS

    The work presented in this thesis is part of a project in instruction-based learning (IBL) for mobile robots, in which a robot is designed that can be instructed by its users through unconstrained natural language. The robot uses vision guidance to follow route instructions in a miniature town model. The aim of the work presented here was to determine the functional vocabulary of the robot in the form of "primitive procedures". In contrast to previous work in the field of instructable robots, this was done following a "user-centred" approach, where the main concern was to create primitive procedures that can be directly associated with natural language instructions. To achieve this, a corpus of human-to-human natural language instructions was collected and analysed. A set of primitive actions was found with which the collected corpus could be represented, and these primitive actions were then implemented as robot-executable procedures. Natural language instructions are under-specified when destined to be executed by a robot, because instructors omit information that they consider "commonsense" and rely on the listener's sensory-motor capabilities to determine the details of the task execution. In this thesis the under-specification problem is solved by determining the missing information, either during the learning of new routes or during their execution by the robot. During learning, the missing information is determined by imitating the commonsense approach human listeners take to achieve the same purpose. During execution, missing information, such as the location of road-layout features mentioned in route instructions, is determined from the robot's view by image template matching. The original contribution of this thesis, in both of these methods, lies in the fact that they are driven by the natural language examples found in the corpus collected for the IBL project. During the testing phase, a high success rate of primitive calls, when these were considered individually, showed that the under-specification problem has, overall, been solved. A novel method for testing the primitive procedures as part of complete route descriptions is also proposed: the performance of human subjects driving the robot while following route descriptions was compared with the performance of the robot when executing the same route descriptions. The results of this comparison clearly indicated where errors occur between the moment a human speaker gives a route description and the moment the task is executed by a human listener or by the robot. Finally, a software speed controller is proposed in this thesis to control the wheel speeds of the robot used in this project. The controller employs PI (proportional and integral) and PID (proportional, integral and derivative) control and provides a good alternative to expensive hardware.
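    Since the thesis closes with a software PI/PID wheel-speed controller, a minimal discrete-time sketch may help make the idea concrete. The gains and sampling period below are illustrative assumptions, not the tuning used in the project.

```python
class PID:
    """Discrete-time PID controller; set kd=0.0 to obtain a PI controller."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measured: float) -> float:
        """Return the control output for one sampling period of length dt."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# Hypothetical use: one controller per wheel, 50 Hz control loop.
left_wheel = PID(kp=1.2, ki=0.5, kd=0.05, dt=0.02)
command = left_wheel.update(setpoint=0.3, measured=0.25)  # target vs. measured speed (m/s)
```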

    HUMAN ROBOT INTERACTION THROUGH SEMANTIC INTEGRATION OF MULTIPLE MODALITIES, DIALOG MANAGEMENT, AND CONTEXTS

    The hypothesis for this research is that applying the Human-Computer Interaction (HCI) concepts of multiple modalities, dialog management, context, and semantics to Human-Robot Interaction (HRI) will improve the performance of Instruction Based Learning (IBL) compared to using speech alone. We tested the hypothesis by simulating a domestic robot that can be taught to clean a house using a multi-modal interface. We used a method of semantically integrating the inputs from multiple modalities and contexts that multiplies a confidence score for each input by a Fusion Weight, sums the products, and then uses the input with the highest product sum, and we developed an algorithm for determining the Fusion Weights. We concluded that different modalities, contexts, and modes of dialog management do impact human-robot interaction; which combination is better, however, depends on the relative importance of accurately learning what is taught versus the succinctness of the dialog between the user and the robot.
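    The fusion rule stated in the abstract translates almost directly into code. The sketch below is a plain restatement of it, with input shapes, names, and example weights assumed for illustration: each input's confidence is multiplied by its modality's Fusion Weight, products supporting the same interpretation are summed, and the interpretation with the highest sum wins.

```python
from collections import defaultdict

def fuse(inputs, fusion_weights):
    """inputs: (modality, interpretation, confidence) triples.
    fusion_weights: modality -> Fusion Weight."""
    scores = defaultdict(float)
    for modality, interpretation, confidence in inputs:
        scores[interpretation] += confidence * fusion_weights[modality]
    return max(scores, key=scores.get)  # interpretation with highest product sum

# Hypothetical example: speech and gesture partially disagree.
winner = fuse(
    [("speech", "clean_kitchen", 0.6),
     ("gesture", "clean_kitchen", 0.3),
     ("gesture", "clean_hall", 0.7)],
    {"speech": 0.8, "gesture": 0.5},
)
# winner == "clean_kitchen" (0.6*0.8 + 0.3*0.5 = 0.63 > 0.7*0.5 = 0.35)
```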

    A Cloud-Based Extensible Avatar For Human Robot Interaction

    Adding an interactive avatar to a human-robot interface requires tools that animate the avatar so as to simulate an intelligent conversation partner. Here we describe a toolkit that supports interactive avatar modeling for human-computer interaction. The toolkit utilizes cloud-based speech-to-text software that provides active listening, a cloud-based AI to generate appropriate textual responses to user queries, and a cloud-based text-to-speech engine to generate utterances for this text. This output is combined with a cloud-based 3D avatar animation synchronized to the spoken response. Generated text responses are embedded within an XML structure that allows the nature of the avatar animation to be tuned to simulate different emotional states, and an expression package controls the avatar's facial expressions. The rendering latency introduced is obscured through parallel processing and an idle-loop process that animates the avatar between utterances. The efficiency of the approach is validated through a formal user study.
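    As a rough sketch of the pipeline described above, the fragment below chains stand-ins for the cloud services and wraps the generated reply in an XML envelope whose attributes tune the avatar's emotional state. The service stubs, XML schema, and attribute names are all assumptions for illustration; the toolkit's actual interfaces are not reproduced here.

```python
import xml.etree.ElementTree as ET

def cloud_stt(audio: bytes) -> str:
    """Stand-in for the cloud speech-to-text service (active listening)."""
    return "what can you do"

def cloud_ai(query: str) -> tuple:
    """Stand-in for the cloud AI; returns reply text and an emotion label."""
    return "I can answer questions and chat with you.", "happy"

def wrap_response(text: str, emotion: str, intensity: float) -> str:
    """Embed the reply in an XML structure that tunes the avatar animation."""
    node = ET.Element("utterance", emotion=emotion, intensity=str(intensity))
    node.text = text
    return ET.tostring(node, encoding="unicode")

def respond(audio_chunk: bytes) -> str:
    query = cloud_stt(audio_chunk)             # speech-to-text
    reply, emotion = cloud_ai(query)           # textual response generation
    return wrap_response(reply, emotion, 0.7)  # feeds TTS and the synchronized animation
```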

    Comunicação humano-robô através de linguagem falada (Human-robot communication through spoken language)

    Doctoral thesis in Electrical Engineering. In recent years, robotics-related technologies have reached a remarkable level of maturity, and service robots can be found in various fields. The next step is the development of intelligent robots, capable of communicating in spoken language and doing useful work in interaction and cooperation with humans. It then becomes necessary to find a way to interact efficiently with these robots, and with intelligent agents in general, enabling the transmission of knowledge in both directions. We start from the hypothesis that a spoken natural language dialogue system can be developed to solve this problem. The main goal of this work is therefore the design, implementation and evaluation of a dialogue system that can be used for spoken language interaction between humans and intelligent agents. Throughout this document, we present and discuss the main aspects of spoken language communication, among humans as well as between humans and machines. We present the main categories of dialogue system, with examples of implemented systems, development tools and a few evaluation techniques. We then describe the developed dialogue system and its integration in a real robot, including the following aspects: the evolution of the computational architecture of Carl, the robot used in this work; the knowledge acquisition and management module, developed to support the interaction; and the new dialogue manager, based on the "Information State" approach, also designed and implemented within this thesis. Finally, an experimental evaluation in which several volunteers carried out a range of interaction tasks showed that it is possible to interact with the robot and perform the requested tasks. The evaluation included a partial evaluation of individual features, an overall evaluation of the dialogue system, and a usability evaluation.
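    Since the new dialogue manager is based on the "Information State" approach, a generic sketch of that idea may be useful: the dialogue state is a structured record, and each dialogue move is applied through update rules whose effects depend on the current state. The fields and rules below are textbook-style illustrations, not Carl's actual ones.

```python
from dataclasses import dataclass, field

@dataclass
class InformationState:
    agenda: list = field(default_factory=list)  # moves the system plans to make
    qud: list = field(default_factory=list)     # questions under discussion
    shared: dict = field(default_factory=dict)  # mutually established facts

def update(state: InformationState, speaker: str, act: str, content: str) -> InformationState:
    """Apply one dialogue move (speaker, act, content) to the information state."""
    if act == "ask":
        state.qud.append(content)
        if speaker == "user":
            state.agenda.append(("answer", content))
    elif act == "answer" and state.qud:
        state.shared[state.qud.pop()] = content  # resolve the latest question
    return state

state = update(InformationState(), "user", "ask", "robot_name")
state = update(state, "robot", "answer", "Carl")  # shared == {"robot_name": "Carl"}
```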

    Semantische Objektmodellierung mittels multimodaler Interaktion (Semantic object modelling through multimodal interaction)

    A concept for interactive semantic object modelling is proposed. The flexible and extensible object representation enables the modelling of functional and semantic object information through properties that map human concepts and categories and that link objects to actions and to attributes measurable by sensors. The interactive modelling system allows semantic object models to be created intuitively.
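    To make the representation concrete, here is a hedged sketch of what such a flexible, extensible semantic object model might look like: properties link an object to human-level categories, to the actions it affords, and to attributes a sensor can measure. All field names and values are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticObject:
    name: str
    categories: set = field(default_factory=set)    # human concepts ("cup", "container")
    affordances: set = field(default_factory=set)   # associated actions ("grasp", "pour_into")
    attributes: dict = field(default_factory=dict)  # sensor-measurable properties

cup = SemanticObject(
    name="cup_01",
    categories={"cup", "container"},
    affordances={"grasp", "pour_into"},
    attributes={"color": "red", "height_cm": 9.5},
)
```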

    Cognitive architecture of multimodal multidimensional dialogue management

    Numerous studies show that participants in real-life dialogues often become involved in rather dynamic, non-sequential interactions. This challenges dialogue system designs based on a reactive-interlocutor paradigm and calls for dialogue systems that can be characterised as proactive learners, accomplished multitasking planners and adaptive decision makers. Addressing this call, the thesis brings an innovative integration of cognitive models into human-computer dialogue systems. This work utilises recent advances in Instance-Based Learning of Theory of Mind skills together with the established Cognitive Task Analysis and ACT-R models. Cognitive Task Agents, producing detailed simulations of human learning, prediction, adaptation and decision making, are integrated into the multi-agent Dialogue Manager. The manager operates on a multidimensional information state enriched with representations based on domain- and modality-specific semantics, and performs context-driven interpretation and generation of dialogue acts. A flexible technical framework for modular, distributed dialogue system integration is designed and tested. The implemented multitasking Interactive Cognitive Tutor is evaluated as showing human-like proactive and adaptive behaviour in setting goals, choosing appropriate strategies and monitoring processes across contexts, and in encouraging the user to exhibit similar metacognitive competences.
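    A compact illustration of the Instance-Based Learning idea the Cognitive Task Agents build on: past (situation, decision, outcome) instances are stored, and a new decision blends the outcomes of similar stored instances. The similarity function and blending rule below are simplified stand-ins for the ACT-R activation-based mechanism, and all names are assumptions.

```python
import math

memory = []  # stored (situation_vector, decision, outcome) instances

def similarity(a, b):
    """Simple stand-in for ACT-R similarity: closer situations score higher."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))

def learn(situation, decision, outcome):
    memory.append((situation, decision, outcome))

def decide(situation, options):
    """Pick the option with the best similarity-weighted past outcome."""
    def blended_value(option):
        pairs = [(similarity(situation, s), o)
                 for s, d, o in memory if d == option]
        total = sum(w for w, _ in pairs)
        return sum(w * o for w, o in pairs) / total if total else 0.0
    return max(options, key=blended_value)

learn((0.2, 0.9), "hint", 1.0)     # giving a hint worked in a similar situation
learn((0.8, 0.1), "explain", 0.4)  # explaining worked less well elsewhere
best = decide((0.3, 0.8), ["hint", "explain"])  # -> "hint"
```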