CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
This work addresses the recognition of human activities in videos at
distinct semantic levels, including individual actions, interactions, and group
activities. The recognition is realized using a two-level hierarchy of Long
Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture,
which can be trained end-to-end. In comparison with existing architectures of
LSTMs, we make two key contributions that give our approach its name:
Confidence-Energy Recurrent Network (CERN). First, instead of using the common
softmax layer for prediction, we specify a novel energy layer (EL) for
estimating the energy of our predictions. Second, rather than finding the
common minimum-energy class assignment, which may be numerically unstable under
uncertainty, we specify that the EL additionally computes the p-values of the
solutions, and in this way estimates the most confident energy minimum. The
evaluation on the Collective Activity and Volleyball datasets demonstrates: (i)
advantages of our two contributions relative to the common softmax and
energy-minimization formulations and (ii) a superior performance relative to
the state-of-the-art approaches.
Comment: Accepted to IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
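The contrast the abstract draws between softmax prediction and minimum-energy class assignment can be illustrated with a minimal sketch. It assumes energies are simply negated class scores, which is an assumption made for illustration and not the paper's energy layer; the function names are hypothetical, and the p-value computation described in the abstract is not reproduced. The second half shows why a bare energy minimum can be fragile: when two energies are nearly tied, a small perturbation flips the winner.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of class scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_class(scores):
    # Index of the most probable class under softmax.
    probs = softmax(scores)
    return max(range(len(probs)), key=lambda k: probs[k])

def min_energy_class(energies):
    # Index of the class with the lowest energy.
    return min(range(len(energies)), key=lambda k: energies[k])

# Assumption for illustration: energy = -score, so both rules agree.
scores = [1.2, 0.3, 2.5]
energies = [-s for s in scores]
agree = softmax_class(scores) == min_energy_class(energies)

# Two nearly tied energies: a small perturbation changes the minimum.
tied = [0.500, 0.501]
perturbed = [tied[0] + 0.002, tied[1]]
flipped = min_energy_class(tied) != min_energy_class(perturbed)
```

Here `agree` and `flipped` both evaluate to true. The sketch only motivates attaching a confidence measure to each candidate minimum; how CERN computes that confidence is specified in the paper, not here.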
Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions
In this paper, we present a general framework for learning social affordance
grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human
interactions, and transfer the grammar to humanoids to enable real-time
motion inference for human-robot interaction (HRI). Based on Gibbs sampling,
our weakly supervised grammar learning can automatically construct a
hierarchical representation of an interaction with long-term joint sub-tasks of
both agents and short-term atomic actions of individual agents. Based on a new
RGB-D video dataset with rich instances of human interactions, our experiments
(Baxter simulation, human evaluation, and a real Baxter test) demonstrate that
the model learned from limited training data successfully generates human-like
behaviors in unseen scenarios and outperforms both baselines.
Comment: The 2017 IEEE International Conference on Robotics and Automation
(ICRA
Bayesian Inference of Recursive Sequences of Group Activities from Tracks
We present a probabilistic generative model for inferring a description of
coordinated, recursively structured group activities at multiple levels of
temporal granularity based on observations of individuals' trajectories. The
model accommodates: (1) hierarchically structured groups, (2) activities that
are temporally and compositionally recursive, (3) component roles assigning
different subactivity dynamics to subgroups of participants, and (4) a
nonparametric Gaussian Process model of trajectories. We present an MCMC
sampling framework for performing joint inference over recursive activity
descriptions and assignment of trajectories to groups, integrating out
continuous parameters. We demonstrate the model's expressive power in several
simulated and complex real-world scenarios from the VIRAT and UCLA Aerial Event
video data sets.
Comment: 10 pages, 6 figures, in Proceedings of the 30th AAAI Conference on
Artificial Intelligence (AAAI'16), Phoenix, AZ, 201
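As a rough illustration of what a Gaussian Process model of trajectories means, the sketch below draws one 2-D path from a zero-mean GP prior, with x and y coordinates sampled independently. The squared-exponential kernel, its hyperparameters, and all function names are assumptions chosen for illustration; the abstract does not specify the kernel, and none of the paper's hierarchical activity structure or MCMC inference is shown.

```python
import math
import random

def rbf_kernel(times, length_scale=1.5, variance=1.0, jitter=1e-6):
    # Squared-exponential covariance between all pairs of time points.
    # The jitter on the diagonal keeps the matrix positive definite.
    n = len(times)
    K = [[variance * math.exp(-0.5 * ((times[i] - times[j]) / length_scale) ** 2)
          for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += jitter
    return K

def cholesky(K):
    # Lower-triangular L with L * L^T = K (Cholesky-Banachiewicz).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def sample_gp_path(times, rng):
    # One draw f ~ N(0, K), computed as f = L z with z ~ N(0, I).
    L = cholesky(rbf_kernel(times))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]

times = [float(t) for t in range(8)]
rng = random.Random(0)  # fixed seed so the draw is repeatable
xs = sample_gp_path(times, rng)
ys = sample_gp_path(times, rng)
trajectory = list(zip(xs, ys))  # eight (x, y) positions over time
```

A longer `length_scale` yields smoother paths and a shorter one yields more erratic motion, which is the kind of dynamics a role-specific trajectory model could, under these assumptions, distinguish.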