    Mani-GPT: A Generative Model for Interactive Robotic Manipulation

    In real-world scenarios, human dialogues are multi-round and diverse. Furthermore, human instructions can be unclear and human responses are unrestricted. Interactive robots face difficulties in understanding human intents and generating suitable strategies for assisting individuals through manipulation. In this article, we propose Mani-GPT, a Generative Pre-trained Transformer (GPT) for interactive robotic manipulation. The proposed model has the ability to understand the environment through object information, understand human intent through dialogues, generate natural language responses to human input, and generate appropriate manipulation plans to assist the human. This makes the human-robot interaction more natural and humanized. In our experiment, Mani-GPT outperforms existing algorithms with an accuracy of 84.6% in intent recognition and decision-making for actions. Furthermore, it demonstrates satisfying performance in real-world dialogue tests with users, achieving an average response accuracy of 70%

    Decision-Oriented Dialogue for Human-AI Collaboration

    We describe a class of tasks called decision-oriented dialogues, in which AI assistants must collaborate with one or more humans via natural language to help them make complex decisions. We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends. In each of these settings, AI assistants and users have disparate abilities that they must combine to arrive at the best decision: assistants can access and process large amounts of information, while users have preferences and constraints external to the system. For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach. Using these environments, we collect human-human dialogues with humans playing the role of assistant. To compare how current AI assistants communicate in these settings, we present baselines using large language models in self-play. Finally, we highlight a number of challenges models face in decision-oriented dialogues, ranging from efficient communication to reasoning and optimization, and release our environments as a testbed for future modeling work

    A Plan-Based Model for Response Generation in Collaborative Task-Oriented Dialogues

    This paper presents a plan-based architecture for response generation in collaborative consultation dialogues, with emphasis on cases in which the system (consultant) and user (executing agent) disagree. Our work contributes to an overall system for collaborative problem-solving by providing a plan-based framework that captures the Propose-Evaluate-Modify cycle of collaboration, and by allowing the system to initiate subdialogues to negotiate proposed additions to the shared plan and to provide support for its claims. In addition, our system handles in a unified manner the negotiation of proposed domain actions, proposed problem-solving actions, and beliefs proposed by discourse actions. Furthermore, it captures cooperative responses within the collaborative framework and accounts for why questions are sometimes never answered. Comment: 8 pages, to appear in the Proceedings of AAAI-94. LaTeX source file, requires aaai.sty and epsf.tex. Figures included in separate file

    Specification Techniques for Multi-Modal Dialogues in the U-Wish Project

    In this paper we describe the development of a specification technique for specifying interactive web-based services. We wanted to design a language that can be a means of communication between designers and developers of interactive services, that makes it easier to develop web-based services fitted to the users and that shortens the pathway from design to implementation. The language, still under development, is based on process algebra and can be connected to the results of task analysis. We have been working on the automatic generation of executable prototypes out of the specifications. In this way the specification language can establish a connection between users, design and implementation. A first version of this language is available as well as prototype tools for executing the specifications. Ideas will be given as to how to make the connection between specifications and task analysis

    Supporting Constructive Learning with a Feedback Planner

    A promising approach to constructing more effective computer tutors is implementing tutorial strategies that extend over multiple turns. This means that computer tutors must deal with (1) failure, (2) interruptions, (3) the need to revise their tactics, and (4) basic dialogue phenomena such as acknowledgment. To deal with these issues, we need to combine ITS technology with advances from robotics and computational linguistics. We can use reactive planning techniques from robotics to allow us to modify tutorial plans, adapting them to student input. Computational linguistics will give us guidance in handling communication management as well as building a reusable architecture for tutorial dialogue systems. A modular and reusable architecture is critical given the difficulty in constructing tutorial dialogue systems and the many domains to which we would like to apply them. In this paper, we propose such an architecture and discuss how a reactive planner in the context of this architecture can implement multi-turn tutorial strategies

    Multimodal agent interfaces and system architectures for health and fitness companions

    Multimodal conversational spoken dialogues using physical and virtual agents provide a potential interface to motivate and support users in the domain of health and fitness. In this paper we present how such multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings. In particular, we focus on different forms of multimodality and system architectures for such interfaces