Context-aware Human Motion Prediction
The problem of predicting human motion given a sequence of past observations
is at the core of many applications in robotics and computer vision. Current
state-of-the-art approaches formulate this problem as a sequence-to-sequence task, in
which a history of 3D skeletons feeds a Recurrent Neural Network (RNN) that
predicts future movements, typically on the order of 1 to 2 seconds. However,
one aspect that has been overlooked so far is the fact that human motion is
inherently driven by interactions with objects and/or other humans in the
environment. In this paper, we explore this scenario using a novel
context-aware motion prediction architecture. We use a semantic-graph model
where the nodes parameterize the human and objects in the scene and the edges
their mutual interactions. These interactions are iteratively learned through a
graph attention layer, fed with the past observations, which now include both
object and human body motions. Once this semantic graph is learned, we inject
it into a standard RNN to predict future movements of the human(s) and object(s).
We consider two variants of our architecture: either freezing the contextual
interactions in the future or updating them. A thorough evaluation on the
"Whole-Body Human Motion Database" shows that in both cases, our context-aware
networks clearly outperform baselines in which the context information is not
considered.
Comment: Accepted at CVPR2
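To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of a graph-attention layer over human/object nodes feeding a GRU that predicts future motion. All class names, dimensions, and the single-layer, single-step design are hypothetical illustrations of the general idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGraphAttention(nn.Module):
    """Single graph-attention layer over human/object nodes
    (hypothetical sketch, not the authors' code)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, nodes):
        # nodes: (N, in_dim) -- one feature vector per human/object in the scene
        h = self.proj(nodes)                                   # (N, out_dim)
        N = h.size(0)
        # Pairwise scores between every pair of nodes model their mutual interactions
        pairs = torch.cat(
            [h.unsqueeze(1).expand(N, N, -1), h.unsqueeze(0).expand(N, N, -1)],
            dim=-1)                                            # (N, N, 2*out_dim)
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)    # (N, N)
        alpha = torch.softmax(scores, dim=-1)                  # interaction weights
        return alpha @ h                                       # context-aware node features

class ContextAwarePredictor(nn.Module):
    """Graph-attention encoder injected into a standard RNN (GRU) predictor."""
    def __init__(self, feat_dim, hid_dim):
        super().__init__()
        self.gat = SemanticGraphAttention(feat_dim, hid_dim)
        self.rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, feat_dim)                # future pose/object offsets

    def forward(self, past):
        # past: (T, N, feat_dim) -- T observed frames, N humans/objects
        ctx = torch.stack([self.gat(frame) for frame in past])  # (T, N, hid_dim)
        seq = ctx.permute(1, 0, 2)                               # (N, T, hid_dim)
        _, h = self.rnn(seq)
        return self.out(h.squeeze(0))                            # one-step future motion per node
```

The "frozen" variant mentioned in the abstract would correspond to reusing the attention weights computed on the past observations when rolling the RNN forward, while the "updated" variant would recompute them at every predicted step.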
Encouraging LSTMs to Anticipate Actions Very Early
In contrast to the widely studied problem of recognizing an action given a
complete sequence, action anticipation aims to identify the action from only
partially available videos. It is therefore key to the success of
computer vision applications that need to react as early as possible, such as
autonomous navigation. In this paper, we propose a new action anticipation
method that achieves high prediction accuracy even when only a very
small percentage of a video sequence has been observed. To this end, we develop a multi-stage
LSTM architecture that leverages context-aware and action-aware features, and
introduce a novel loss function that encourages the model to predict the
correct class as early as possible. Our experiments on standard benchmark
datasets evidence the benefits of our approach: we outperform the
state-of-the-art action anticipation methods for early prediction by a relative
increase in accuracy of 22.0% on JHMDB-21, 14.0% on UT-Interaction and 49.9% on
UCF-101.
Comment: 13 pages, 7 figures, 11 tables. Accepted in ICCV 2017. arXiv admin note: text overlap with arXiv:1611.0552
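The loss idea described above can be illustrated with a hypothetical time-weighted objective: the false-positive term is scaled by t/T, so ambiguity is tolerated early in the video but penalized increasingly as more frames are seen, pushing the model to commit to the correct class as early as possible. This is a sketch in the spirit of the abstract, not the paper's exact formulation; tensor shapes and argument names are assumptions, and the per-frame probabilities would come from a softmax over the multi-stage LSTM's outputs.

```python
import torch

def anticipation_loss(probs_per_step, target_onehot, eps=1e-8):
    """Illustrative early-anticipation loss (not the paper's exact formulation).

    probs_per_step: (T, K) per-frame class probabilities from the LSTM
    target_onehot:  (K,) one-hot ground-truth action label for the sequence
    """
    T = probs_per_step.size(0)
    loss = 0.0
    for t in range(T):
        p = probs_per_step[t].clamp(eps, 1 - eps)
        # True-class term: always encourage high probability on the correct action
        tp_term = -(target_onehot * torch.log(p)).sum()
        # False-positive term: ramped by (t+1)/T, so confident wrong predictions
        # are cheap early on but costly once most of the video has been seen
        fp_term = -((1 - target_onehot) * torch.log(1 - p)).sum()
        loss = loss + tp_term + ((t + 1) / T) * fp_term
    return loss / T
```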