94,693 research outputs found
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for the learning of predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components, a 3-layer recurrent neural network to model
temporal aspects and a novel auto-encoder that is trained to implicitly recover
the spatial structure of the human skeleton via randomly removing information
about joints during training time. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even for which no ground truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state-of-the-art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural looking sequences over longer
time horizons than previous methods
Flow-based Autoregressive Structured Prediction of Human Motion
A new method is proposed for human motion predition by learning temporal and
spatial dependencies in an end-to-end deep neural network. The joint
connectivity is explicitly modeled using a novel autoregressive structured
prediction representation based on flow-based generative models. We learn a
latent space of complex body poses in consecutive frames which is conditioned
on the high-dimensional structure input sequence. To construct each latent
variable, the general and local smoothness of the joint positions are
considered in a generative process using conditional normalizing flows. As a
result, all frame-level and joint-level continuities in the sequence are
preserved in the model. This enables us to parameterize the inter-frame and
intra-frame relationships and joint connectivity for robust long-term
predictions as well as short-term prediction. Our experiments on two
challenging benchmark datasets of Human3.6M and AMASS demonstrate that our
proposed method is able to effectively model the sequence information for
motion prediction and outperform other techniques in 42 of the 48 total
experiment scenarios to set a new state-of-the-art
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 page
On human motion prediction using recurrent neural networks
Human motion modelling is a classical problem at the intersection of graphics
and computer vision, with applications spanning human-computer interaction,
motion synthesis, and motion prediction for virtual and augmented reality.
Following the success of deep learning methods in several computer vision
tasks, recent work has focused on using deep recurrent neural networks (RNNs)
to model human motion, with the goal of learning time-dependent representations
that perform tasks such as short-term motion prediction and long-term human
motion synthesis. We examine recent work, with a focus on the evaluation
methodologies commonly used in the literature, and show that, surprisingly,
state-of-the-art performance can be achieved by a simple baseline that does not
attempt to model motion at all. We investigate this result, and analyze recent
RNN methods by looking at the architectures, loss functions, and training
procedures used in state-of-the-art approaches. We propose three changes to the
standard RNN models typically used for human motion, which result in a simple
and scalable RNN architecture that obtains state-of-the-art performance on
human motion prediction.Comment: Accepted at CVPR 1
Deep representation learning for human motion prediction and classification
Generative models of 3D human motion are often restricted to a small number
of activities and can therefore not generalize well to novel movements or
applications. In this work we propose a deep learning framework for human
motion capture data that learns a generic representation from a large corpus of
motion capture data and generalizes well to new, unseen, motions. Using an
encoding-decoding network that learns to predict future 3D poses from the most
recent past, we extract a feature representation of human motion. Most work on
deep learning for sequence prediction focuses on video and speech. Since
skeletal data has a different structure, we present and evaluate different
network architectures that make different assumptions about time dependencies
and limb correlations. To quantify the learned features, we use the output of
different layers for action classification and visualize the receptive fields
of the network units. Our method outperforms the recent state of the art in
skeletal motion prediction even though these use action specific training data.
Our results show that deep feedforward networks, trained from a generic mocap
database, can successfully be used for feature extraction from human motion
data and that this representation can be used as a foundation for
classification and prediction.Comment: This paper is published at the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 201
- …