
    Teaching robots parametrized executable plans through spoken interaction

    While operating in domestic environments, robots will necessarily face difficulties not envisioned by their developers at programming time. Moreover, the tasks to be performed by a robot will often have to be specialized and/or adapted to the needs of specific users and specific environments. Hence, learning how to operate by interacting with the user seems a key enabling feature to support the introduction of robots into everyday environments. In this paper we contribute a novel approach for learning, through interaction with the user, task descriptions that are defined as combinations of primitive actions. The proposed approach makes a significant step forward by making task descriptions parametric with respect to domain-specific semantic categories. Moreover, by mapping the task representation into a task representation language, we are able to express complex execution paradigms and to revise the learned tasks in a high-level fashion. The approach is evaluated in multiple practical applications with a service robot.
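    As a rough, purely illustrative sketch of the idea above, the snippet below models a task description as a combination of primitive actions that is parametric over a domain-specific semantic category and can be instantiated at execution time. All class names, action names, and the "Beverage" category are hypothetical and are not taken from the paper's actual task representation language.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PrimitiveAction:
    """A primitive robot action with symbolic parameters."""
    name: str                                   # e.g. "goto", "grasp", "hand_over"
    params: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskDescription:
    """A task learned from spoken interaction, parametric over a semantic category."""
    name: str
    parameter: str                              # placeholder bound at execution time
    category: str                               # domain-specific semantic category
    steps: List[PrimitiveAction] = field(default_factory=list)

    def instantiate(self, value: str) -> List[PrimitiveAction]:
        """Bind the semantic parameter to a concrete object and return executable steps."""
        return [
            PrimitiveAction(
                step.name,
                {k: (value if v == self.parameter else v) for k, v in step.params.items()},
            )
            for step in self.steps
        ]


# Example: "bring me <Beverage>" learned once, reusable for any member of the category.
bring = TaskDescription(
    name="bring", parameter="?obj", category="Beverage",
    steps=[
        PrimitiveAction("goto", {"target": "kitchen"}),
        PrimitiveAction("grasp", {"object": "?obj"}),
        PrimitiveAction("goto", {"target": "user"}),
        PrimitiveAction("hand_over", {"object": "?obj"}),
    ],
)
print(bring.instantiate("coffee"))
```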

    Roger that! Learning How Laypersons Teach New Functions to Intelligent Systems

    Intelligent systems are rather smart today but still limited to built-in functionality. To break through this barrier, future systems must allow users to easily adapt the system by themselves. For humans, the most natural way to communicate is talking. But what if users want to extend the systems' functionality with nothing but natural language? Then intelligent systems must understand how laypersons teach new skills. To grasp the semantics of such teaching sequences, we have defined a hierarchical classification task. On the first level, we consider the existence of a teaching intent in an utterance; on the second, we classify the distinct semantic parts of teaching sequences: declaration of a new function, specification of intermediate steps, and superfluous information. We evaluate twelve machine learning techniques with multiple configurations tailored to this task, ranging from classical approaches such as naïve Bayes to modern techniques such as bidirectional LSTMs and task-oriented adaptations. On the first level, convolutional neural networks achieve the best accuracy (96.6%). For the second task, bidirectional LSTMs are the most accurate (98.8%). With the additional adaptations we are able to improve both classifications noticeably (by up to 1.8%).
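    A minimal sketch of the two-level classification described above, using one of the classical baselines the abstract mentions (naïve Bayes, here with TF-IDF features via scikit-learn). The toy utterances and labels are invented for illustration; the paper's dataset and its best-performing models (CNNs and bidirectional LSTMs) are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Level 1: does the utterance contain a teaching intent at all?
intent_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
intent_clf.fit(
    [
        "I'll show you how to make coffee",            # teaching intent
        "let me teach you to set the table",           # teaching intent
        "what's the weather like today",               # no teaching intent
        "play some music please",                      # no teaching intent
    ],
    ["teaching", "teaching", "other", "other"],
)

# Level 2: label the distinct semantic parts of a teaching sequence.
parts_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
parts_clf.fit(
    [
        "making coffee works like this",                     # declaration of a new function
        "first fill the tank then press the start button",   # intermediate steps
        "by the way I love fresh coffee",                    # superfluous information
    ],
    ["declaration", "steps", "superfluous"],
)

print(intent_clf.predict(["I'll teach you how to water the plants"]))
print(parts_clf.predict(["then put the cup under the spout"]))
```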

    An Exploratory Study to Determine the Effects Conversational Repetition Has on Perceived Workload and User Experience Quality in an Online Human-Robot Interaction

    Human-robot interaction studies in the Caribbean currently face two challenges. First, the robots used in these studies have difficulty understanding many of the regional accents spoken by study participants. Second, the global pandemic has made in-person HRI studies in the Caribbean more challenging due to physical and social distancing mandates. This paper reports on our exploratory study to determine what kind of impact these two challenges have on HRI by evaluating the effect conversational repetition has on a human-robot conversation conducted using video conferencing software. Using network analysis, the results obtained suggest that conversational repetition has several subtle relationships with perceived workload. One interesting finding is that frustration and effort are indirectly affected by conversational repetition. Results from the short User Experience Questionnaire indicate that the overall quality of the user experience is perceived as positive-neutral. This encouraging result indicates that video conferencing may be a suitable interaction modality for HRI studies in the Caribbean.
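    The kind of network analysis mentioned above can be sketched roughly as follows: nodes are the measured quantities (a repetition count plus workload dimensions such as effort and frustration), and edges connect measures whose pairwise correlation exceeds a threshold, so that indirect relationships appear as multi-edge paths rather than direct links. The data values, column names, and threshold below are invented for illustration and do not reflect the study's actual data or method details.

```python
import networkx as nx
import pandas as pd

# Toy per-participant measures (invented values).
df = pd.DataFrame({
    "repetitions": [1, 4, 2, 6, 3, 5],
    "effort":      [30, 55, 40, 70, 45, 60],
    "frustration": [20, 50, 35, 75, 40, 65],
    "performance": [80, 60, 75, 50, 70, 55],
})

corr = df.corr()                 # pairwise Pearson correlations
threshold = 0.6                  # keep only reasonably strong relationships

G = nx.Graph()
G.add_nodes_from(corr.columns)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        r = float(corr.loc[a, b])
        if abs(r) >= threshold:
            G.add_edge(a, b, weight=round(r, 2))

# Direct edges indicate strong pairwise relationships; measures linked only
# through intermediate nodes are related indirectly.
print(list(G.edges(data=True)))
```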

    Towards Programming in Natural Language: Learning New Functions from Spoken Utterances

    Systems with conversational interfaces are rather popular nowadays. However, their full potential is not yet exploited. For the time being, users are restricted to calling predefined functions. Soon, users will expect to customize systems to their needs and create their own functions using nothing but spoken instructions. Thus, future systems must understand how laypersons teach new functionality to intelligent systems. The understanding of natural language teaching sequences is a first step toward comprehensive end-user programming in natural language. We propose to analyze the semantics of spoken teaching sequences with a hierarchical classification approach. First, we classify whether an utterance constitutes an effort to teach a new function or not. Afterward, a second classifier locates the distinct semantic parts of teaching efforts: declaration of a new function, specification of intermediate steps, and superfluous information. For both tasks we implement a broad range of machine learning techniques: classical approaches, such as Naïve Bayes, and neural network configurations of various types and architectures, such as bidirectional LSTMs. Additionally, we introduce two heuristic-based adaptations that are tailored to the task of understanding teaching sequences. As a data basis we use 3168 descriptions gathered in a user study. For the first task, convolutional neural networks obtain the best results (accuracy: 96.6%); bidirectional LSTMs excel in the second (accuracy: 98.8%). The adaptations improve the first-level classification considerably (plus 2.2 percentage points).
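    As a complement to the classical-baseline sketch given earlier, the following is a minimal bidirectional-LSTM classifier for the second-level task (labelling parts of a teaching sequence as declaration, intermediate steps, or superfluous information), the model family this abstract reports as most accurate. The toy sentences, vocabulary size, and hyperparameters are illustrative only and do not reproduce the paper's architecture or its heuristic adaptations.

```python
import tensorflow as tf

texts = [
    "making coffee works like this",                 # declaration of a new function
    "I'll explain how to water the plants",          # declaration of a new function
    "first fill the tank then press the button",     # intermediate steps
    "then put the cup under the spout",              # intermediate steps
    "by the way I love fresh coffee",                # superfluous information
    "my plants are on the balcony",                  # superfluous information
]
labels = [0, 0, 1, 1, 2, 2]  # 0 = declaration, 1 = steps, 2 = superfluous

# Map raw text to fixed-length integer sequences.
vectorize = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=12)
vectorize.adapt(texts)
x = vectorize(tf.constant(texts))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, tf.constant(labels), epochs=30, verbose=0)

# Classify an unseen utterance (index of the predicted semantic part).
print(model.predict(vectorize(tf.constant(["now press the red button"]))).argmax(axis=-1))
```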

    Core Challenges in Embodied Vision-Language Planning

    Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous surveys have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment.

    Sensory Computing and Object Processing Entity: Assistive Robotics for Healthcare

    Team SCOPE has created an assistive robot for healthcare delivery. The robot is mobile, responds to spoken commands, and possesses Artificial Intelligence (AI). It extracts meanings about the patient’s health from conversations and visual interactions. It summarizes these observations into reports that could be merged with the patient’s Electronic Health Records (EHRs). This process aids healthcare professionals in delivering better care by augmenting attendance, increasing the accuracy of patient information collection, aiding in diagnosis, streamlining data collection, and automating the process of ingesting and incorporating this information into EHR systems. SCOPE’s solution uses cloud-based AI services along with local processing. Using VEX Robotics parts and an Arduino microcontroller, SCOPE created a mobile platform for the robot. The robotic platform implements basic motions and obstacle avoidance. These separate systems are integrated using a Java master program, Node-RED, and IBM Watson cloud services. The resulting AI can be expanded for different applications within healthcare delivery.

    A survey of technologies supporting design of a multimodal interactive robot for military communication

    Purpose – This paper presents a survey of research into interactive robotic systems for the purpose of identifying state-of-the-art capabilities as well as the extant gaps in this emerging field. Communication is multimodal. Multimodality is a representation of many modes chosen from rhetorical aspects for their communicative potential. The author seeks to define the available automation capabilities in communication using multimodalities that will support a proposed Interactive Robot System (IRS) as an AI-mounted robotic platform to advance the speed and quality of military operational and tactical decision making.
    Design/methodology/approach – The review begins by presenting key developments in the robotic interaction field with the objective of identifying essential technological developments that set conditions for robotic platforms to function autonomously. After surveying the key aspects of Human-Robot Interaction (HRI), Unmanned Autonomous Systems (UAS), visualization, Virtual Environments (VE) and prediction, the paper then proceeds to describe the gaps in the application areas that will require extension and integration to enable the prototyping of the IRS. A brief examination of other work in HRI-related fields concludes with a recapitulation of the IRS challenge that will set conditions for future success.
    Findings – Using insights from a balanced cross-section of government, academic, and commercial sources that contribute to HRI, a multimodal IRS for military communication is introduced. A multimodal IRS (MIRS) for military communication has yet to be deployed.
    Research limitations/implications – A multimodal robotic interface for the MIRS is an interdisciplinary endeavour. It is not realistic that one person can command all the expert and related knowledge and skills needed to design and develop such a multimodal interactive robotic interface. In this brief preliminary survey, the author discusses extant AI, robotics, NLP, CV, VDM, and VE applications that are directly related to multimodal interaction. Each mode of this multimodal communication is an active research area. Multimodal human/military-robot communication is the ultimate goal of this research.
    Practical implications – A multimodal autonomous robot for military communication using speech, images, gestures, VST and VE has yet to be deployed. Autonomous multimodal communication is expected to open wider possibilities for all armed forces. Given the density of the land domain, the army is in a position to exploit the opportunities for human–machine teaming (HMT) exposure. Naval and air forces will adopt platform-specific suites for specially selected operators to integrate with and leverage this emerging technology. A flexible communications capability that readily adapts to virtual training will enhance planning and mission rehearsals tremendously.
    Social implications – A multimodal communication system based on interaction, perception, cognition and visualization is still missing. Options to communicate, express and convey information in an HMT setting, with multiple options, suggestions and recommendations, will certainly enhance military communication, strength, engagement, security, cognition and perception, as well as the ability to act confidently for a successful mission.
    Originality/value – The objective is to develop a multimodal autonomous interactive robot for military communications. This survey reports the state of the art: what exists and what is missing, what can be done, and the possibilities of extension that would support the military in maintaining effective communication using multimodalities. There is separate ongoing progress in areas such as machine-enabled speech, image recognition, tracking, visualization for situational awareness, and virtual environments. At this time, there is no integrated approach to multimodal human-robot interaction that offers flexible and agile communication. The report briefly introduces the research proposal for a multimodal interactive robot for military communication.