1,525 research outputs found
An Improvements of Deep Learner Based Human Activity Recognition with the Aid of Graph Convolution Features
Many researchers are now focusing on Human Action Recognition (HAR), which is based on various deep-learning features related to body joints and their trajectories from videos. Among many schemes, Joints and Trajectory-pooled 3D-Deep Geometric Positional Attention-based Hierarchical Bidirectional Recurrent convolutional Descriptors (JTDGPAHBRD) can provide a video descriptor by learning geometric features and trajectories of the body joints. But the spatial-temporal dynamics of the different geometric features of the skeleton structure were not explored deeper. To solve this problem, this article develops the Graph Convolutional Network (GCN) in addition to the JTDGPAHBRD to create a video descriptor for HAR. The GCN can obtain complementary information, such as higher-level spatial-temporal features, between consecutive frames for enhancing end-to-end learning. In addition, to improve feature representation ability, a search space with several adaptive graph components is created. Then, a sampling and computation-effective evolution scheme are applied to explore this space. Moreover, the resultant GCN provides the temporal dynamics of the skeleton pattern, which are fused with the geometric features of the skeleton body joints and trajectory coordinates from the JTDGPAHBRD to create a more effective video descriptor for HAR. Finally, extensive experiments show that the JTDGPAHBRD-GCN model outperforms the existing HAR models on the Penn Action Dataset (PAD)
3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification
In this paper, we introduce a global video representation to video-based
person re-identification (re-ID) that aggregates local 3D features across the
entire video extent. Most of the existing methods rely on 2D convolutional
networks (ConvNets) to extract frame-wise deep features which are pooled
temporally to generate the video-level representations. However, 2D ConvNets
lose temporal input information immediately after the convolution, and a
separate temporal pooling is limited in capturing human motion in shorter
sequences. To this end, we present a \textit{global} video representation (3D
PersonVLAD), complementary to 3D ConvNets as a novel layer to capture the
appearance and motion dynamics in full-length videos. However, encoding each
video frame in its entirety and computing an aggregate global representation
across all frames is tremendously challenging due to occlusions and
misalignments. To resolve this, our proposed network is further augmented with
3D part alignment module to learn local features through soft-attention module.
These attended features are statistically aggregated to yield
identity-discriminative representations. Our global 3D features are
demonstrated to achieve state-of-the-art results on three benchmark datasets:
MARS \cite{MARS}, iLIDS-VID \cite{VideoRanking}, and PRID 2011Comment: Accepted to appear at IEEE Transactions on Neural Networks and
Learning System
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (eg. onset and offset phase for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to model
the ordinal aspect in the videos, approximately. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
- …