Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions
In this paper, we present a general framework for learning social affordance
grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human
interactions, and transfer the grammar to humanoids to enable real-time
motion inference for human-robot interaction (HRI). Based on Gibbs sampling,
our weakly supervised grammar learning can automatically construct a
hierarchical representation of an interaction with long-term joint sub-tasks of
both agents and short-term atomic actions of individual agents. Based on a new
RGB-D video dataset with rich instances of human interactions, our experiments
of Baxter simulation, human evaluation, and real Baxter test demonstrate that
the model learned from limited training data successfully generates human-like
behaviors in unseen scenarios and outperforms both baselines.
Comment: The 2017 IEEE International Conference on Robotics and Automation (ICRA)
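To make the grammar structure concrete, below is a minimal Python sketch of how a spatiotemporal AND-OR graph could be represented, with AND nodes for long-term joint sub-tasks and leaf nodes for short-term atomic actions of individual agents. All class names and the toy "shake hands" example are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an ST-AOG node hierarchy: an AND node composes a joint
# sub-task shared by both agents; an OR node selects among alternative
# decompositions; leaves are short-term atomic actions of a single agent.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AtomicAction:
    """Short-term action of an individual agent (leaf node)."""
    agent: str            # e.g. "human" or "robot"
    label: str            # e.g. "extend_arm", "shake"
    duration_frames: int

@dataclass
class AndNode:
    """Long-term joint sub-task composed of both agents' actions."""
    name: str
    children: List[object] = field(default_factory=list)  # OrNode or AtomicAction

@dataclass
class OrNode:
    """Selects one of several alternative sub-task decompositions."""
    alternatives: List[AndNode] = field(default_factory=list)

# A toy "shake hands" interaction grammar:
shake = AndNode(
    name="shake_hands",
    children=[
        AtomicAction(agent="human", label="extend_arm", duration_frames=30),
        AtomicAction(agent="robot", label="extend_arm", duration_frames=30),
        AtomicAction(agent="both", label="shake", duration_frames=45),
    ],
)
```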
Team Plan Recognition: A Review of the State of the Art
There is an increasing need to develop artificial intelligence systems that
assist groups of humans working on coordinated tasks. These systems must
recognize and understand the plans and relationships between actions for a team
of humans working toward a common objective. This article reviews the
literature on team plan recognition and surveys the most recent logic-based
approaches for implementing it. First, we provide some background knowledge,
including a general definition of plan recognition in a team setting and a
discussion of implementation challenges. Next, we explain our reasoning for
focusing on logic-based methods. Finally, we survey recent approaches from two
primary classes of logic-based methods (plan library-based and domain
theory-based). We aim to bring more attention to this sparse but vital topic
and inspire new directions for implementing team plan recognition.
Comment: 10 pages, 1 figure, 1 table. Abstract accepted, paper submitted to
14th International Conference on Applied Human Factors and Ergonomics (AHFE 2023)
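As a toy illustration of the plan library-based class of methods surveyed above, the sketch below matches a sequence of observed team actions against a small library of candidate team plans. The library contents, plan names, and prefix-matching rule are hypothetical and only illustrate the general idea.

```python
# Illustrative plan library-based recognition: return every team plan whose
# action sequence is consistent with the observations seen so far.
from typing import Dict, List

PLAN_LIBRARY: Dict[str, List[str]] = {
    "search_and_rescue": ["scan_area", "locate_victim", "extract_victim"],
    "escort_convoy": ["form_up", "scan_area", "advance", "report_clear"],
}

def recognize_team_plan(observed: List[str]) -> List[str]:
    """Return plans whose action sequence contains the observations as a prefix."""
    return [
        name for name, steps in PLAN_LIBRARY.items()
        if steps[: len(observed)] == observed
    ]

print(recognize_team_plan(["scan_area"]))  # ['search_and_rescue']
```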
MISER: Mise-En-Scène Region Support for Staging Narrative Actions in Interactive Storytelling
The recent increase in interest in Interactive Storytelling systems, spurred on by the emergence of affordable virtual reality technology, has brought with it a need to address the way in which narrative content is visualized through the complex staging of multiple narrative agents' behaviors within virtual story worlds. In this work we address the challenge of automating several aspects of staging the activities of a population of narrative agents and their interactions, where agents can have differing levels of narrative relevance within the situated narrative actions. Our solution defines an approach that integrates the use of multiple dynamic regions within a virtual story world, specified via a semantic representation that is able to support the staging of narrative actions through the behaviors of the primary and background agents that are involved. This encompasses both the mechanics of dealing with the narrative discourse level as well as the interaction with the narrative generation layer to account for any dynamic modifications of the virtual story world. We refer to this approach as mise-en-scène region (miser) support. In this paper, we describe our approach and its integration as part of a fully implemented Interactive Storytelling system. We illustrate the work through detailed examples of short narrative instantiations. We present the results of our evaluation, which clearly demonstrate the potential of the miser approach, as well as its scalability.
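The following is a hedged sketch of how a mise-en-scène region might be captured as a semantic data structure, with slots for primary and background agents and the narrative actions a region can stage. The field names and values are assumptions made for illustration, not the MISER system's actual schema.

```python
# Hypothetical semantic representation of a dynamic staging region.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MiseEnSceneRegion:
    name: str                                   # e.g. "tavern_bar"
    bounds: Tuple[float, float, float, float]   # x_min, z_min, x_max, z_max in world units
    supported_actions: List[str] = field(default_factory=list)  # e.g. ["argue", "toast"]
    primary_slots: int = 2                      # agents with narrative relevance
    background_slots: int = 6                   # ambient crowd capacity

    def can_stage(self, action: str, n_primary: int) -> bool:
        """True if this region can host the action with the requested primary agents."""
        return action in self.supported_actions and n_primary <= self.primary_slots

bar = MiseEnSceneRegion("tavern_bar", (0.0, 0.0, 4.0, 3.0), ["argue", "toast"])
print(bar.can_stage("toast", 2))  # True
```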
End-to-end Learning of Driving Models from Large-scale Video Datasets
Robust perception-action models should be learned from training data with
diverse visual appearances and realistic behaviors, yet current approaches to
deep visuomotor policy learning have been generally limited to in-situ models
learned from a single vehicle or a simulation environment. We advocate learning
a generic vehicle motion model from large-scale crowd-sourced video data, and
develop an end-to-end trainable architecture for learning to predict a
distribution over future vehicle egomotion from instantaneous monocular camera
observations and previous vehicle state. Our model incorporates a novel
FCN-LSTM architecture, which can be learned from large-scale crowd-sourced
vehicle action data, and leverages available scene segmentation side tasks to
improve performance under a privileged learning paradigm.
Comment: camera-ready for CVPR 2017
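For a concrete picture of the architecture, here is a minimal PyTorch sketch in the spirit of an FCN-LSTM: a fully convolutional encoder on each monocular frame, fused with the previous vehicle state and fed to an LSTM that outputs a distribution over discrete egomotion actions. Layer sizes, the state features, and the action set are assumptions, not the paper's exact design.

```python
# Minimal sketch: convolutional frame encoder + LSTM over time, predicting a
# per-step distribution over discrete egomotion actions (e.g. straight/left/right/stop).
import torch
import torch.nn as nn

class FCNLSTM(nn.Module):
    def __init__(self, n_actions: int = 4, state_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(               # fully convolutional image encoder
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # global pooling to a feature vector
        )
        self.lstm = nn.LSTM(32 + state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)     # logits over egomotion actions

    def forward(self, frames, prev_state):
        # frames: (B, T, 3, H, W); prev_state: (B, T, state_dim), e.g. speed and yaw rate
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1).view(B, T, -1)
        out, _ = self.lstm(torch.cat([feats, prev_state], dim=-1))
        return torch.log_softmax(self.head(out), dim=-1)  # per-step action log-probs

model = FCNLSTM()
logp = model(torch.randn(2, 8, 3, 96, 96), torch.randn(2, 8, 2))
print(logp.shape)  # torch.Size([2, 8, 4])
```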