Differential Recurrent Neural Networks for Action Recognition
The long short-term memory (LSTM) neural network is capable of processing
complex sequential information since it utilizes special gating schemes for
learning representations from long input sequences. It has the potential to
model any sequential time-series data, where the current hidden state has to be
considered in the context of the past hidden states. This property makes LSTM
an ideal choice to learn the complex dynamics of various actions.
Unfortunately, the conventional LSTMs do not consider the impact of
spatio-temporal dynamics corresponding to the given salient motion patterns,
when they gate the information that ought to be memorized through time. To
address this problem, we propose a differential gating scheme for the LSTM
neural network, which emphasizes the change in information gain caused by
the salient motions between the successive frames. This change in information
gain is quantified by the Derivative of States (DoS), and thus the proposed LSTM
model is termed the differential Recurrent Neural Network (dRNN). We demonstrate
the effectiveness of the proposed model by automatically recognizing actions
from the real-world 2D and 3D human action datasets. Our study is one of the
first works towards demonstrating the potential of learning complex time-series
representations via high-order derivatives of states.
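As a rough illustration of the differential gating idea, the sketch below computes a discrete n-th order Derivative of States from a history of scalar cell states and feeds it into a sigmoid gate. It is a toy under stated assumptions, not the paper's formulation: the scalar state, the function names, and the weights `w_x`, `w_h`, `w_d`, and `bias` are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def derivative_of_states(states, order=1):
    """Discrete n-th order difference of the state history, taken at the last step.

    Returns 0.0 when the history is too short for the requested order.
    """
    diffs = list(states)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs[-1] if diffs else 0.0


def differential_gate(x_t, h_prev, states, w_x, w_h, w_d, bias, order=1):
    """Gate activation that also depends on the change in the cell state.

    Assumption: the DoS term enters the gate additively with weight w_d.
    """
    dos = derivative_of_states(states, order)
    return sigmoid(w_x * x_t + w_h * h_prev + w_d * dos + bias)
```

With a positive `w_d`, a state history that is changing quickly opens the gate more than a static one, which is the intuition behind gating on salient motion rather than on the state value alone.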
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body
keypoints in complex, multi-person video. We propose an extremely lightweight
yet highly effective approach that builds upon the latest advancements in human
detection and video understanding. Our method operates in two stages: keypoint
estimation in frames or short clips, followed by lightweight tracking to
generate keypoint predictions linked over the entire video. For frame-level
pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D
extension of this model, which leverages temporal information over small clips
to generate more robust frame predictions. We conduct extensive ablative
experiments on the newly released multi-person video pose estimation benchmark,
PoseTrack, to validate various design choices of our model. Our approach
achieves an accuracy of 55.2% on the validation and 51.8% on the test set using
the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state-of-the-art
performance on the ICCV 2017 PoseTrack keypoint tracking challenge.
Comment: In CVPR 2018. Ranked first in ICCV 2017 PoseTrack challenge (keypoint
tracking in videos). Code: https://github.com/facebookresearch/DetectAndTrack
and webpage: https://rohitgirdhar.github.io/DetectAndTrack
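The second stage, linking per-frame detections into tracks, can be sketched as follows. The abstract does not state the linking rule, so this example assumes a simple greedy match on bounding-box overlap (IoU) between consecutive frames; the function names, the box format `(x1, y1, x2, y2)`, and the `min_iou` threshold are hypothetical and are not taken from the DetectAndTrack code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames, min_iou=0.3):
    """Assign a track ID to every box in every frame.

    frames: list of frames, each a list of person boxes.
    Returns a list of lists of integer track IDs, aligned with the input.
    """
    next_id = 0
    prev = []  # (track_id, box) pairs from the previous frame
    all_ids = []
    for boxes in frames:
        # Score every (previous box, current box) pair, best overlap first.
        candidates = sorted(
            ((iou(pb, b), pi, bi)
             for pi, (_, pb) in enumerate(prev)
             for bi, b in enumerate(boxes)),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used_prev = set()
        for score, pi, bi in candidates:
            if score < min_iou:
                break  # remaining pairs overlap too little to be linked
            if pi in used_prev or ids[bi] is not None:
                continue
            ids[bi] = prev[pi][0]
            used_prev.add(pi)
        # Unmatched detections start new tracks.
        for bi in range(len(boxes)):
            if ids[bi] is None:
                ids[bi] = next_id
                next_id += 1
        all_ids.append(ids)
        prev = list(zip(ids, boxes))
    return all_ids
```

Each detected person would carry its keypoints along with its box, so the track ID assigned here also links the keypoint predictions over the video. An optimal bipartite matching could replace the greedy loop without changing the interface.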
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for the learning of predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components, a 3-layer recurrent neural network to model
temporal aspects and a novel auto-encoder that is trained to implicitly recover
the spatial structure of the human skeleton via randomly removing information
about joints during training time. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even when no ground truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state-of-the-art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural looking sequences over longer
time horizons than previous methods.
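The training-time corruption behind the Dropout Autoencoder can be illustrated with a minimal sketch: whole joints are randomly removed from the input pose, and the uncorrupted pose serves as the reconstruction target. The pose layout (a list of `(x, y, z)` tuples), the choice of zeroing a dropped joint, and the names `drop_joints` and `make_training_pair` are assumptions made for illustration only.

```python
import random


def drop_joints(pose, drop_prob, rng):
    """Return a copy of the pose with each joint zeroed with probability drop_prob.

    pose: list of (x, y, z) joint positions.
    """
    return [
        (0.0, 0.0, 0.0) if rng.random() < drop_prob else joint
        for joint in pose
    ]


def make_training_pair(pose, drop_prob=0.2, seed=None):
    """Build one (corrupted input, clean target) pair for a denoising autoencoder."""
    rng = random.Random(seed)
    return drop_joints(pose, drop_prob, rng), pose
```

An autoencoder trained on such pairs has to infer missing joints from the remaining ones, which is how it can come to encode the skeleton's spatial structure implicitly.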
Human activity recognition making use of long short-term memory techniques
The optimisation and validation of a classifier's performance when applied to real
world problems is not always effectively shown. In much of the literature describing
the application of artificial neural network architectures to Human Activity
Recognition (HAR) problems, postural transitions are grouped together and treated as
a singular class. This paper proposes, investigates and validates the development of
an optimised artificial neural network based on Long Short-Term Memory (LSTM)
techniques, with repeated cross validation used to validate the performance of the
classifier. The results of the optimised LSTM classifier are comparable to or better
than those of previous research making use of the same dataset, achieving 95% accuracy
under repeated 10-fold cross validation using grouped postural transitions. The work
in this paper also achieves 94% accuracy under repeated 10-fold cross validation
whilst treating each common postural transition as a separate class (and thus
providing more context to each activity).
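For readers unfamiliar with the protocol, repeated 10-fold cross validation can be sketched as below: the data are reshuffled on every repeat, split into ten folds, and each fold is held out once. This is a plain (unstratified) version written for illustration; the number of repeats, the seed, and the function name `repeated_kfold` are assumptions, since the abstract does not give them.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=3, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle gives different folds per repeat
        for fold in range(n_splits):
            test_idx = indices[fold::n_splits]  # every n_splits-th sample
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield train_idx, test_idx
```

The reported accuracy would then be the mean over all `n_splits * n_repeats` held-out folds, which gives a more stable estimate than a single split.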