Search CORE

1,525 research outputs found

An Improvements of Deep Learner Based Human Activity Recognition with the Aid of Graph Convolution Features

Author: Radha N.
Srilakshmi N.
Publication venue: Auricle Global Society of Education and Research
Publication date: 07/10/2023
Field of study

Many researchers are now focusing on Human Action Recognition (HAR), which is based on various deep-learning features related to body joints and their trajectories from videos. Among many schemes, Joints and Trajectory-pooled 3D-Deep Geometric Positional Attention-based Hierarchical Bidirectional Recurrent convolutional Descriptors (JTDGPAHBRD) can provide a video descriptor by learning geometric features and trajectories of the body joints. But the spatial-temporal dynamics of the different geometric features of the skeleton structure were not explored deeper. To solve this problem, this article develops the Graph Convolutional Network (GCN) in addition to the JTDGPAHBRD to create a video descriptor for HAR. The GCN can obtain complementary information, such as higher-level spatial-temporal features, between consecutive frames for enhancing end-to-end learning. In addition, to improve feature representation ability, a search space with several adaptive graph components is created. Then, a sampling and computation-effective evolution scheme are applied to explore this space. Moreover, the resultant GCN provides the temporal dynamics of the skeleton pattern, which are fused with the geometric features of the skeleton body joints and trajectory coordinates from the JTDGPAHBRD to create a more effective video descriptor for HAR. Finally, extensive experiments show that the JTDGPAHBRD-GCN model outperforms the existing HAR models on the Penn Action Dataset (PAD)

International Journal on Recent and Innovation Trends in Computing and Communication

3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification

Author: Shao Ling
Wang Meng
Wang Yang
Wu Lin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/02/2019
Field of study

In this paper, we introduce a global video representation to video-based person re-identification (re-ID) that aggregates local 3D features across the entire video extent. Most of the existing methods rely on 2D convolutional networks (ConvNets) to extract frame-wise deep features which are pooled temporally to generate the video-level representations. However, 2D ConvNets lose temporal input information immediately after the convolution, and a separate temporal pooling is limited in capturing human motion in shorter sequences. To this end, we present a \textit{global} video representation (3D PersonVLAD), complementary to 3D ConvNets as a novel layer to capture the appearance and motion dynamics in full-length videos. However, encoding each video frame in its entirety and computing an aggregate global representation across all frames is tremendously challenging due to occlusions and misalignments. To resolve this, our proposed network is further augmented with 3D part alignment module to learn local features through soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3D features are demonstrated to achieve state-of-the-art results on three benchmark datasets: MARS \cite{MARS}, iLIDS-VID \cite{VideoRanking}, and PRID 2011Comment: Accepted to appear at IEEE Transactions on Neural Networks and Learning System

arXiv.org e-Print Archive

University of Queensland eSpace

Discriminatively Trained Latent Ordinal Model for Video Classification

Author: Sharma Gaurav
Sikka Karan
Publication venue
Publication date: 14/08/2017
Field of study

We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

arXiv.org e-Print Archive

MPG.PuRe