
    Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

    In this paper, we present a general framework for learning a social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning can automatically construct a hierarchical representation of an interaction, with long-term joint sub-tasks of both agents and short-term atomic actions of individual agents. Using a new RGB-D video dataset with rich instances of human interactions, our experiments with Baxter simulation, human evaluation, and a real Baxter test demonstrate that the model learned from limited training data successfully generates human-like behaviors in unseen scenarios and outperforms both baselines.
    Comment: The 2017 IEEE International Conference on Robotics and Automation (ICRA)
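    As a purely illustrative aside (not the authors' code), the sketch below runs a toy Gibbs sampler that assigns interaction frames to latent joint sub-tasks, the kind of weakly supervised inference the abstract describes; the feature dimensionality, the number of sub-tasks, and the Gaussian likelihood are all assumptions.

```python
# Toy sketch of Gibbs-sampling sub-task assignment; not the ST-AOG implementation.
import numpy as np

rng = np.random.default_rng(0)
K = 4                                       # assumed number of latent joint sub-tasks
frames = rng.normal(size=(200, 6))          # stand-in for per-frame interaction features
z = rng.integers(0, K, size=len(frames))    # initial sub-task labels
alpha = 1.0                                 # Dirichlet-style prior on sub-task usage

def log_likelihood(x, members):
    """Gaussian log-likelihood of frame x under the frames currently in a sub-task."""
    if len(members) == 0:
        return -0.5 * np.sum(x ** 2)        # fall back to a unit-Gaussian prior
    mu = members.mean(axis=0)
    return -0.5 * np.sum((x - mu) ** 2)

for sweep in range(20):                     # Gibbs sweeps over all frames
    for i, x in enumerate(frames):
        logp = np.empty(K)
        for k in range(K):
            others = frames[(z == k) & (np.arange(len(frames)) != i)]
            logp[k] = np.log(len(others) + alpha) + log_likelihood(x, others)
        p = np.exp(logp - logp.max())
        z[i] = rng.choice(K, p=p / p.sum()) # resample frame i's sub-task label

print("frames per sub-task:", np.bincount(z, minlength=K))
```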

    Team Plan Recognition: A Review of the State of the Art

    There is an increasing need to develop artificial intelligence systems that assist groups of humans working on coordinated tasks. These systems must recognize and understand the plans and the relationships between actions for a team of humans working toward a common objective. This article reviews the literature on team plan recognition and surveys the most recent logic-based approaches for implementing it. First, we provide some background, including a general definition of plan recognition in a team setting and a discussion of implementation challenges. Next, we explain our reasoning for focusing on logic-based methods. Finally, we survey recent approaches from the two primary classes of logic-based methods (plan library-based and domain theory-based). We aim to bring more attention to this sparse but vital topic and to inspire new directions for implementing team plan recognition.
    Comment: 10 pages, 1 figure, 1 table. Abstract accepted, paper submitted to the 14th International Conference on Applied Human Factors and Ergonomics (AHFE 2023)
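    To make the plan-library-based class of methods mentioned above concrete, here is a minimal, hypothetical sketch of the core matching step: observed team actions prune a library of candidate plans. The plan names, actions, and ordered-subsequence test are illustrative assumptions, not drawn from the surveyed systems.

```python
# Hypothetical plan-library matching; plan names and actions are illustrative only.
from dataclasses import dataclass

@dataclass
class TeamPlan:
    name: str
    steps: list[str]                 # ordered actions expected from the team

def consistent(plan: TeamPlan, observations: list[str]) -> bool:
    """A plan stays viable if the observations occur, in order, within its steps."""
    it = iter(plan.steps)
    return all(obs in it for obs in observations)   # ordered-subsequence test

library = [
    TeamPlan("assemble_shelf", ["fetch_parts", "hold_frame", "attach_shelf", "inspect"]),
    TeamPlan("pack_boxes", ["fetch_parts", "fold_box", "fill_box", "seal_box"]),
]
observed = ["fetch_parts", "attach_shelf"]
print([p.name for p in library if consistent(p, observed)])   # -> ['assemble_shelf']
```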

    MISER: Mise-En-Scène Region Support for Staging Narrative Actions in Interactive Storytelling

    The recent increase in interest in Interactive Storytelling systems, spurred on by the emergence of affordable virtual reality technology, has brought with it a need to address how narrative content is visualized through the complex staging of multiple narrative agents' behaviors within virtual story worlds. In this work we address the challenge of automating several aspects of staging the activities of a population of narrative agents and their interactions, where agents can have differing levels of narrative relevance within the situated narrative actions. Our solution integrates multiple dynamic regions within a virtual story world, specified via a semantic representation that supports the staging of narrative actions through the behaviors of the primary and background agents involved. This encompasses both the mechanics of handling the narrative discourse level and the interaction with the narrative generation layer to account for any dynamic modifications of the virtual story world. We refer to this approach as mise-en-scène region (miser) support. In this paper, we describe our approach and its integration as part of a fully implemented Interactive Storytelling system. We illustrate the work through detailed examples of short narrative instantiations, and we present the results of our evaluation, which clearly demonstrate the potential of the miser approach as well as its scalability.
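    As a loose illustration of the miser idea of semantically specified dynamic regions, the sketch below encodes a region with its supported narrative actions and separates the primary agents it stages from the background agents. The field names, region geometry, and assignment rule are assumptions made for illustration, not the paper's actual representation.

```python
# Illustrative mise-en-scène region structure; fields and logic are assumed, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class MiseEnSceneRegion:
    name: str
    center: tuple[float, float]          # position in the story world (x, z)
    radius: float                        # dynamic extent of the staging area
    supported_actions: set[str]          # narrative actions this region can stage
    primary_agents: list[str] = field(default_factory=list)
    background_agents: list[str] = field(default_factory=list)

    def stage(self, action: str, agents: dict[str, str]) -> bool:
        """Place agents in the region if it supports the requested narrative action."""
        if action not in self.supported_actions:
            return False
        for agent, relevance in agents.items():
            (self.primary_agents if relevance == "primary"
             else self.background_agents).append(agent)
        return True

tavern = MiseEnSceneRegion("tavern_bar", (3.0, 7.5), 2.0, {"argue", "eavesdrop"})
tavern.stage("argue", {"hero": "primary", "bartender": "primary", "patron_3": "background"})
print(tavern.primary_agents, tavern.background_agents)
```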

    End-to-end Learning of Driving Models from Large-scale Video Datasets

    Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have generally been limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm.
    Comment: camera-ready for CVPR 2017
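    The FCN-LSTM described above pairs a fully convolutional frame encoder with a recurrent model over time. The sketch below is a rough PyTorch approximation of that idea, predicting a categorical distribution over future egomotion from video frames plus previous vehicle state; the layer sizes, the two-dimensional previous-state input, and the discrete action set are assumptions, not the paper's configuration.

```python
# Rough sketch of an FCN-LSTM-style egomotion predictor; not the authors' architecture.
import torch
import torch.nn as nn

class FCNLSTMPolicy(nn.Module):
    def __init__(self, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(              # fully convolutional frame encoder
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # global pooling instead of FC layers
        )
        self.lstm = nn.LSTM(32 + 2, hidden, batch_first=True)  # +2 for previous state (assumed speed, yaw rate)
        self.head = nn.Linear(hidden, n_actions)   # logits over discrete egomotion classes

    def forward(self, frames, prev_state):
        # frames: (B, T, 3, H, W); prev_state: (B, T, 2)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1).view(B, T, -1)
        out, _ = self.lstm(torch.cat([feats, prev_state], dim=-1))
        return torch.log_softmax(self.head(out), dim=-1)   # per-step log-probabilities

model = FCNLSTMPolicy()
logp = model(torch.randn(2, 5, 3, 90, 160), torch.randn(2, 5, 2))
print(logp.shape)   # torch.Size([2, 5, 4])
```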