Recognising facial expressions in video sequences
We introduce a system that processes a sequence of images of a front-facing human face and recognises a set of facial expressions. We use an efficient appearance-based face tracker to locate the face in the image sequence and estimate the deformation of its non-rigid components. The tracker works in real time. It is robust to strong illumination changes and separates changes in appearance caused by illumination from changes due to face deformation. We adopt a model-based approach for facial expression recognition. In our model, an image of a face is represented by a point in a deformation space. The variability of the classes of images associated with facial expressions is represented by a set of samples which model a low-dimensional manifold in the space of deformations. We introduce a probabilistic procedure based on a nearest-neighbour approach to combine the information provided by the incoming image sequence with the prior information stored in the expression manifold in order to compute a posterior probability associated with a facial expression. Our experiments show that this system is able to work in an unconstrained environment with strong changes in illumination and face location. It achieves an 89% recognition rate on a set of 333 sequences from the Cohn-Kanade database.
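The abstract does not spell out the probabilistic procedure, so the following is only a minimal sketch of one way a nearest-neighbour posterior over expressions could be accumulated along a sequence. The Gaussian likelihood on the nearest-neighbour distance, the `sigma` parameter, the uniform initial prior, and all function names are assumptions made for illustration, not the authors' method.

```python
import math

def nn_distance(point, samples):
    """Distance from a point in deformation space to its nearest stored sample."""
    return min(math.dist(point, s) for s in samples)

def update_posterior(prior, point, manifold, sigma=1.0):
    """One Bayes update: prior times an assumed Gaussian likelihood, renormalised."""
    unnormalised = {}
    for label, samples in manifold.items():
        d = nn_distance(point, samples)
        # Assumption: likelihood decays with squared nearest-neighbour distance.
        likelihood = math.exp(-d * d / (2.0 * sigma * sigma))
        unnormalised[label] = prior[label] * likelihood
    total = sum(unnormalised.values())
    if total == 0.0:
        # All likelihoods underflowed; keep the previous belief unchanged.
        return dict(prior)
    return {label: v / total for label, v in unnormalised.items()}

def classify_sequence(points, manifold, sigma=1.0):
    """Start from a uniform prior and fold in one deformation point per frame."""
    posterior = {label: 1.0 / len(manifold) for label in manifold}
    for point in points:
        posterior = update_posterior(posterior, point, manifold, sigma)
    return posterior

# Toy 2-D "deformation space" with one stored sample per expression.
manifold = {"happy": [(1.0, 0.0)], "sad": [(-1.0, 0.0)]}
posterior = classify_sequence([(0.9, 0.1), (1.1, -0.1)], manifold)
```

In this toy setting the posterior concentrates on the class whose stored samples lie closest to the observed points, and each additional frame sharpens it.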
HP-GAN: Probabilistic 3D human motion prediction via GAN
Predicting and understanding human motion dynamics has many applications,
such as motion synthesis, augmented reality, security, and autonomous vehicles.
Due to the recent success of generative adversarial networks (GAN), there has
been much interest in probabilistic estimation and synthetic data generation
using deep neural network architectures and learning algorithms.
We propose a novel sequence-to-sequence model for probabilistic human motion
prediction, trained with a modified version of improved Wasserstein generative
adversarial networks (WGAN-GP), in which we use a custom loss function designed
for human motion prediction. Our model, which we call HP-GAN, learns a
probability density function of future human poses conditioned on previous
poses. It predicts multiple sequences of possible future human poses, each from
the same input sequence but a different vector z drawn from a random
distribution. Furthermore, to quantify the quality of the non-deterministic
predictions, we simultaneously train a motion-quality-assessment model that
learns the probability that a given skeleton sequence is a real human motion.
We test our algorithm on two of the largest skeleton datasets: NTURGB-D and
Human3.6M. We train our model on both single and multiple action types. Its
predictive power for long-term motion estimation is demonstrated by generating
multiple plausible futures of more than 30 frames from just 10 frames of input.
We show that most sequences generated from the same input have a probability of
more than 50% of being judged a real human sequence. We will release all the
code used in this paper on GitHub.
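The idea of drawing several futures from one input by varying only z can be sketched as below. The generator here is a hypothetical toy stand-in (constant-velocity extrapolation perturbed by z), not the trained HP-GAN network; the function names, the Gaussian choice for z, and the pose layout are all assumptions for illustration.

```python
import random

def toy_generator(past_poses, z, horizon):
    """Hypothetical stand-in for a trained generator.

    Extrapolates the last observed velocity and scales each coordinate's
    velocity by (1 + z_i), so different z vectors yield different futures.
    """
    last, prev = past_poses[-1], past_poses[-2]
    velocity = [(a - b) * (1.0 + zi) for a, b, zi in zip(last, prev, z)]
    current = list(last)
    future = []
    for _ in range(horizon):
        current = [c + v for c, v in zip(current, velocity)]
        future.append(current)
    return future

def sample_futures(past_poses, num_samples, horizon, seed=None):
    """Draw one z per sample (assumed standard normal) and decode each future."""
    rng = random.Random(seed)
    z_dim = len(past_poses[-1])
    futures = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
        futures.append(toy_generator(past_poses, z, horizon))
    return futures

# Two observed frames of a 2-coordinate "pose"; sample three 5-frame futures.
past = [[0.0, 0.0], [1.0, 2.0]]
futures = sample_futures(past, num_samples=3, horizon=5, seed=0)
```

In the setting the abstract describes, each sampled future would additionally be scored by the motion-quality-assessment model; that scoring step is omitted here because its form is not given in the abstract.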
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for the learning of predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components, a 3-layer recurrent neural network to model
temporal aspects and a novel auto-encoder that is trained to implicitly recover
the spatial structure of the human skeleton via randomly removing information
about joints during training time. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even when no ground truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state of the art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural looking sequences over longer
time horizons than previous methods.
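Two ideas from this abstract can be sketched in a few lines: corrupting a pose by randomly removing joints (the training input for the auto-encoder), and filtering every predicted pose before feeding it back in. This is a rough illustration only. Zeroing a dropped joint, the `drop_prob` parameter, and the `predict_next` and `denoise` callables (standing in for the LSTM and the trained D-AE) are assumptions, not details taken from the paper.

```python
import random

def drop_joints(pose, drop_prob, rng):
    """Return a copy of the pose with each joint zeroed with probability drop_prob.

    pose is a list of (x, y, z) joint positions; the (corrupted, clean) pair
    would serve as one auto-encoder training example.
    """
    zero = (0.0, 0.0, 0.0)
    return [zero if rng.random() < drop_prob else joint for joint in pose]

def rollout(seed_pose, predict_next, denoise, steps):
    """Autoregressive prediction where each predicted pose is filtered first.

    predict_next and denoise are hypothetical callables standing in for the
    recurrent predictor and the trained auto-encoder, respectively.
    """
    poses = []
    current = seed_pose
    for _ in range(steps):
        current = denoise(predict_next(current))
        poses.append(current)
    return poses

rng = random.Random(0)
clean = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.9)]
corrupted = drop_joints(clean, drop_prob=0.5, rng=rng)

# Toy rollout: the predictor overshoots by 0.4 each step, the filter snaps back.
trajectory = rollout(
    [0.0],
    predict_next=lambda p: [p[0] + 1.4],
    denoise=lambda p: [float(round(p[0]))],
    steps=3,
)
```

The toy rollout shows why filtering helps: without the `denoise` step the 0.4 overshoot would accumulate at every frame, whereas with it the error is removed before the next prediction.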
Unsupervised Discovery of Parts, Structure, and Dynamics
Humans easily recognize object parts and their hierarchical structure by
watching how they move; they can then predict how each part moves in the
future. In this paper, we propose a novel formulation that simultaneously
learns a hierarchical, disentangled object representation and a dynamics model
for object parts from unlabeled videos. Our Parts, Structure, and Dynamics
(PSD) model learns to, first, recognize the object parts via a layered image
representation; second, predict hierarchy via a structural descriptor that
composes low-level concepts into a hierarchical structure; and third, model the
system dynamics by predicting the future. Experiments on multiple real and
synthetic datasets demonstrate that our PSD model works well on all three
tasks: segmenting object parts, building their hierarchical structure, and
capturing their motion distributions.
Comment: ICLR 2019. The first two authors contributed equally to this work.