
    Early Turn-taking Prediction with Spiking Neural Networks for Human Robot Collaboration

    Turn-taking is essential to the structure of human teamwork. Humans are typically aware of team members' intention to keep or relinquish their turn before a turn switch, where the responsibility of working on a shared task is shifted. Future co-robots are also expected to provide such competence. To that end, this paper proposes the Cognitive Turn-taking Model (CTTM), which leverages cognitive models (i.e., a Spiking Neural Network) to achieve early turn-taking prediction. The CTTM framework can process multimodal human communication cues (both implicit and explicit) and predict human turn-taking intentions at an early stage. The proposed framework is tested on a simulated surgical procedure, where a robotic scrub nurse predicts the surgeon's turn-taking intention. It was found that the proposed CTTM framework outperforms the state-of-the-art turn-taking prediction algorithms by a large margin. It also outperforms humans when presented with partial observations of communication cues (i.e., less than 40% of full actions). This early prediction capability enables robots to initiate turn-taking actions at an early stage, which facilitates collaboration and increases overall efficiency. Comment: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 201
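    The sketch below is a minimal illustration of the idea described in the abstract, not the authors' CTTM implementation: multimodal cue features are integrated frame by frame by a leaky integrate-and-fire readout, and a turn switch is declared as soon as enough spikes accumulate, which is what makes the prediction "early". The feature count, network size, thresholds, and random weights are all illustrative assumptions.

    ```python
    # Illustrative sketch of early turn-taking prediction with a spiking readout.
    # Feature streams, network sizes, and thresholds are assumptions, not the
    # authors' CTTM implementation.
    import numpy as np

    rng = np.random.default_rng(0)

    N_FEATURES = 8          # e.g. gaze, head pose, hand motion, speech cues per frame
    N_NEURONS = 32          # leaky integrate-and-fire readout population
    TAU = 0.9               # membrane leak per time step
    SPIKE_THRESHOLD = 1.0   # membrane potential needed to emit a spike
    DECISION_SPIKES = 20    # accumulated spikes needed to declare a turn switch

    # Random projection from multimodal cue features to the spiking population
    # (a trained weight matrix would replace this in a real system).
    weights = rng.normal(0.0, 0.3, size=(N_NEURONS, N_FEATURES))

    def predict_turn_taking(cue_frames):
        """Return the earliest frame index at which a turn switch is predicted,
        or None if no decision is reached before the action completes."""
        membrane = np.zeros(N_NEURONS)
        spike_count = 0
        for t, frame in enumerate(cue_frames):
            membrane = TAU * membrane + weights @ frame   # leaky integration
            fired = membrane >= SPIKE_THRESHOLD
            spike_count += int(fired.sum())
            membrane[fired] = 0.0                          # reset neurons that spiked
            if spike_count >= DECISION_SPIKES:
                return t                                   # early decision
        return None

    # Example: 100 frames of simulated multimodal cue observations.
    frames = rng.normal(0.2, 1.0, size=(100, N_FEATURES))
    decision_frame = predict_turn_taking(frames)
    print(f"Turn switch predicted at frame {decision_frame} of {len(frames)}")
    ```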

    Cooperative Interactive Distributed Guidance on Mobile Devices

    Mobile devices are quickly becoming an indispensable part of our society. Equipped with numerous communication capabilities, they are increasingly being examined as potential tools for civilian and military use to aid in distributed remote collaboration for dynamic decision making and physical task completion. With an ever-growing mobile workforce, the need for remote assistance in aiding field workers who are confronted with situations outside their expertise certainly increases. Enhanced capabilities in using mobile devices could significantly improve numerous components of a task's completion (i.e., accuracy, timing, etc.). This dissertation considers the design of mobile implementations of technology and communication capabilities to support interactive collaboration between distributed team members. Specifically, this body of research seeks to explore and understand how various forms of multimodal remote assistance affect both the human user's performance and the mobile device's effectiveness when used during cooperative tasks. Power effects are also studied to assess the energy demands on a mobile device supporting multimodal communication. In a series of applied experiments and demonstrations, the effectiveness of a mobile device facilitating multimodal collaboration is analyzed through both empirical data collection and subjective exploration. The utility of the mobile interactive system and its configurations is examined to assess the impact on distributed task performance and collaborative dialogue between pairs. The dissertation formulates and defends the argument that multimodal communication capabilities should be incorporated into mobile communication channels to provide collaborating partners with salient perspectives, with the goal of reaching a mutual understanding of task procedures. The body of research discusses the findings of this investigation and highlights how they may influence future mobile research seeking to enhance interactive distributed guidance.

    Augmenting Situated Spoken Language Interaction with Listener Gaze

    Collaborative task solving in a shared environment requires referential success. Human speakers follow the listener’s behavior in order to monitor language comprehension (Clark, 1996). Furthermore, a natural language generation (NLG) system can exploit listener gaze to realize an effective interaction strategy by responding to it with verbal feedback in virtual environments (Garoufi, Staudte, Koller, & Crocker, 2016). We augment situated spoken language interaction with listener gaze and investigate its role in human-human and human-machine interactions. Firstly, we evaluate its impact on prediction of reference resolution using a multimodal corpus collected from virtual environments. Secondly, we explore if and how a human speaker uses listener gaze in an indoor guidance task, while spontaneously referring to real-world objects in a real environment. Thirdly, we consider an object identification task for assembly under system instruction. We developed a multimodal interactive system and two NLG systems that integrate listener gaze in the generation mechanisms. The NLG system “Feedback” reacts to gaze with verbal feedback, either underspecified or contrastive. The NLG system “Installments” uses gaze to incrementally refer to an object in the form of installments. Our results showed that gaze features improved the accuracy of automatic prediction of reference resolution. Further, we found that human speakers are very good at producing referring expressions, and showing listener gaze did not improve performance, but elicited more negative feedback. In contrast, we showed that an NLG system that exploits listener gaze benefits the listener’s understanding. Specifically, combining a short, ambiguous instruction with contrastive feedback resulted in faster interactions compared to underspecified feedback, and even outperformed following long, unambiguous instructions. Moreover, alternating the underspecified and contrastive responses in an interleaved manner led to better engagement with the system and an efficient information uptake, and resulted in equally good performance. Somewhat surprisingly, when gaze was incorporated more indirectly in the generation procedure and used to trigger installments, the non-interactive approach that outputs an instruction all at once was more effective. However, if the spatial expression was mentioned first, referring in gaze-driven installments was as efficient as following an exhaustive instruction. In sum, we provide a proof of concept that listener gaze can effectively be used in situated human-machine interaction. An assistance system using gaze cues is more attentive and adapts to listener behavior to ensure communicative success.
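    As a rough illustration of the first result (gaze features improving automatic prediction of reference resolution), the sketch below compares a classifier trained on linguistic features alone against one trained on linguistic plus gaze features. The synthetic data, feature names, and scikit-learn setup are assumptions for illustration, not the corpus or models described in the abstract.

    ```python
    # Illustrative comparison of reference-resolution prediction with and without
    # listener-gaze features. Data and feature choices are synthetic assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    n = 500

    linguistic = rng.normal(size=(n, 4))   # e.g. expression length, ambiguity score, ...
    gaze = rng.normal(size=(n, 3))         # e.g. fixation count, dwell time on target, ...

    # Synthetic labels in which gaze carries part of the signal.
    resolved = ((0.8 * gaze[:, 1] + 0.5 * linguistic[:, 0]
                 + rng.normal(0, 0.5, n)) > 0).astype(int)

    for name, X in [("linguistic only", linguistic),
                    ("linguistic + gaze", np.hstack([linguistic, gaze]))]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, resolved, random_state=0)
        acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
        print(f"{name}: accuracy = {acc:.2f}")
    ```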

    Recent Advancements in Augmented Reality for Robotic Applications: A Survey

    Robots are expanding from industrial applications to daily life, in areas such as medical robotics, rehabilitative robotics, social robotics, and mobile/aerial robotics systems. In recent years, augmented reality (AR) has been integrated into many robotic applications, including medical, industrial, human–robot interaction, and collaboration scenarios. In this work, AR for both medical and industrial robot applications is reviewed and summarized. For medical robot applications, we investigated the integration of AR in (1) preoperative and surgical task planning; (2) image-guided robotic surgery; (3) surgical training and simulation; and (4) telesurgery. AR for industrial scenarios is reviewed in (1) human–robot interactions and collaborations; (2) path planning and task allocation; (3) training and simulation; and (4) teleoperation control/assistance. In addition, the limitations and challenges are discussed. Overall, this article serves as a valuable resource for those working in the field of AR and robotic research, offering insights into the recent state of the art and prospects for improvement.

    Spatial Programming for Industrial Robots through Task Demonstration

    We present an intuitive system for the programming of industrial robots using markerless gesture recognition and mobile augmented reality in terms of programming by demonstration. The approach covers gesture-based task definition and adaptation by human demonstration, as well as task evaluation through augmented reality. A 3D motion tracking system and a handheld device establish the basis for the presented spatial programming system. In this publication, we present a prototype toward the programming of an assembly sequence consisting of several pick-and-place tasks. A scene reconstruction provides pose estimation of known objects with the help of the 2D camera of the handheld. Therefore, the programmer is able to define the program through natural bare-hand manipulation of these objects with the help of direct visual feedback in the augmented reality application. The program can be adapted by gestures and transmitted subsequently to an arbitrary industrial robot controller using a unified interface. Finally, we discuss an application of the presented spatial programming approach toward robot-based welding tasks.
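    A minimal sketch of the programming-by-demonstration idea summarized above: demonstrated object poses are translated into a generic pick-and-place command sequence and handed to a controller interface. The data classes, command names, the send_program() stub, and the controller address are hypothetical placeholders; the publication's unified controller interface is not specified here.

    ```python
    # Illustrative sketch: demonstrated pick/place poses -> generic robot program.
    # All names below are assumptions for illustration, not the authors' interface.
    from dataclasses import dataclass

    @dataclass
    class Pose:
        x: float        # position in metres
        y: float
        z: float
        roll: float     # orientation in radians
        pitch: float
        yaw: float

    @dataclass
    class PickPlaceTask:
        object_id: str
        pick: Pose
        place: Pose

    def build_program(demonstrated_tasks):
        """Translate demonstrated pick/place poses into generic robot commands."""
        program = []
        for task in demonstrated_tasks:
            program += [("MOVE", task.pick), ("GRASP", task.object_id),
                        ("MOVE", task.place), ("RELEASE", task.object_id)]
        return program

    def send_program(program, controller_address):
        # Placeholder for a unified robot-controller interface.
        for command, arg in program:
            print(f"-> {controller_address}: {command} {arg}")

    tasks = [PickPlaceTask("part_A",
                           Pose(0.4, 0.1, 0.05, 0.0, 0.0, 0.0),
                           Pose(0.2, -0.3, 0.05, 0.0, 0.0, 1.57))]
    send_program(build_program(tasks), "robot-controller.local")
    ```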
