1,432 research outputs found

    Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions

    Comprehension of spoken natural language is an essential component for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) complex structures, including a wide variety of expressions used in spoken language, and (2) inherent ambiguity in the interpretation of human instructions. In this paper, we propose the first comprehensive system that can handle unconstrained spoken language and effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep-learning-based object detection with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve instruction ambiguity through dialogue. Through experiments in a simulated environment and on a physical industrial robot arm, we demonstrate that our system understands natural instructions from human operators effectively, and that an interactive clarification process yields higher success rates on the object picking task. Comment: 9 pages. International Conference on Robotics and Automation (ICRA) 2018. Accompanying videos are available at the following links: https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and http://youtu.be/DGJazkyw0Ws (with improvements after ICRA-2018 submission)

    Service humanoid robotics : a novel interactive system based on bionic-companionship framework

    At present, industrial robotics focuses more on motion control and vision, whereas humanoid service robots (HSRs) are increasingly being investigated in the field of speech interaction. The problems and quality of human-robot interaction (HRI) have become a widely debated topic in academia. Especially when HSRs are applied in the hospitality industry, some researchers believe that the current HRI model is not well adapted to the complex social environment. HSRs generally lack the ability to accurately recognize human intentions and understand social scenarios. This study proposes a novel interactive framework suitable for HSRs. The proposed framework is grounded in a novel integration of Trevarthen's (2001) companionship theory and the neural image captioning (NIC) generation algorithm. By integrating image-to-natural interactivity generation and communicating with the environment to better interact with the stakeholder, the framework shifts from interaction to bionic companionship. In contrast to previous research, a novel interactive system is developed based on the bionic-companionship framework. A humanoid service robot was integrated with the system to conduct preliminary tests. The results show that the interactive system based on the bionic-companionship framework can help the service humanoid robot respond effectively to changes in the interactive environment, for example by giving different responses to the same character in different scenes.

    Evolution of Natural Language Processing Technology: Not Just Language Processing Towards General Purpose AI

    Since the invention of computers, communication through natural language (actual human language) has been a dream technology. However, natural language is extremely difficult to mathematically formulate, making it difficult to realize as an algorithm without considering programming. While there have been numerous technological developments, one cannot say that any results allowing free utilization have been achieved thus far. In the case of language learning in humans, for instance when learning one's mother tongue or foreign language, one must admit that this process is similar to the adage "practice makes perfect" in principle, even though the learning method is significant up to a point. Deep learning has played a central role in contemporary AI technology in recent years. When applied to natural language processing (NLP), this produced unprecedented results. Achievements exceeding the initial predictions have been reported from the results of learning vast amounts of textual data using deep learning. For instance, four arithmetic operations could be performed without explicit learning, thereby enabling the explanation of complex images and the generation of images from corresponding explanatory texts. It is an accurate example of the learner embodying the concept of "practice makes perfect" by using vast amounts of textual data. This report provides a technological explanation of how cutting-edge NLP has made it possible to realize the "practice makes perfect" principle. Additionally, examples of how this can be applied to business are provided. We reported in June 2022 in Japanese on the NLP movement from late 2021 to early 2022. We would like to summarize this as a memorandum since this is just the initial movement leading to the current large language models (LLMs). Comment: 40 pages

    Agent AI: Surveying the Horizons of Multimodal Interaction

    Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment

    Teleoperated visual inspection and surveillance with unmanned ground and aerial vehicles

    Abstract: This paper introduces our robotic system named UGAV (Unmanned Ground-Air Vehicle), consisting of two semi-autonomous robot platforms: an Unmanned Ground Vehicle (UGV) and an Unmanned Aerial Vehicle (UAV). The paper focuses on three topics of inspection with the combined UGV and UAV: (A) teleoperated control by means of cell or smart phones, with a new concept of automatic configuration of the smart phone based on an RKI-XML description of the vehicles' control capabilities, (B) the camera and vision system, with a focus on real-time feature extraction, e.g. for tracking the UAV, and (C) the architecture and hardware of the UAV.

    On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective

    Our situated environment is full of uncertainty and highly dynamic, which hinders the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real-world scenarios. This means IDM should have the capability of continuously learning new skills and efficiently generalizing across wider applications. IDM benefits from any new approaches and theoretical breakthroughs that exhibit Artificial General Intelligence (AGI), breaking the barriers between tasks and applications. Recent research has thoroughly examined the Transformer neural architecture as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as a sequence decoding task using the Transformer architecture; this would be a promising solution to advance the applications of IDM in more complex real-world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of an FDM in multi-agent game AI, production scheduling, and robotics tasks. Finally, through a case study, we demonstrate our realization of the FDM, DigitalBrain (DB1), with 1.2 billion parameters, which achieves human-level performance over 453 tasks, including text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step towards more autonomous and efficient real-world IDM applications. Comment: 26 pages, 4 figures

    Service Humanoid Robotics: Review and Design of A Novel Bionic-Companionship Framework

    At present, industrial robotics focuses more on motion control and vision, whereas Humanoid Service Robots (HSRs) are increasingly being investigated by researchers and practitioners in the field of speech interaction. The problems and quality of human-robot interaction (HRI) have become a hotly debated topic in academia. This paper proposes a novel interactive framework suitable for HSRs. The proposed framework is grounded in a novel integration of Trevarthen's Companionship Theory and a neural image generation algorithm from computer vision. By integrating image-to-natural interactivity generation and communicating with the environment to better interact with the stakeholder, the framework shifts from interaction to bionic companionship. In addition, the article reviews research on neural image generation algorithms and summarizes, from a critical perspective, application cases of the algorithm structure in the field of robotics. We believe that the new interactive bionic-companionship framework can enable HSRs to further develop towards robot companions.

    Visual telemetry transmission in marine environment using Robot Operating System platform

    National Technical University of Athens -- Postgraduate Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Program (D.P.M.S.) "Automation Systems"