Part Affinity Field based Activity Recognition
This report presents work and results on activity recognition using Part Affinity Fields for real-time surveillance applications. Starting with a short introduction to the motivation, the report gives a detailed overview of the key idea of the pursued approach and explains the basic concepts. In addition, a variety of experiments on various subjects is presented, such as i) the impact of the number of input frames, ii) the impact of different simple dimensionality reduction approaches, and iii) a comparison of how multi-class and binary problem formulations influence performance.
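One experiment above concerns simple dimensionality reduction of the pose features extracted over multiple input frames. As a hedged illustration (the report does not specify its reduction method; the clip shapes, counts, and the use of PCA here are assumptions), per-frame keypoints can be flattened into one feature vector per clip and projected onto a few principal components before classification:

```python
import numpy as np

# Hypothetical setup: 100 clips, each with T frames of K 2D keypoints,
# flattened into one feature vector per clip.
rng = np.random.default_rng(0)
T, K = 8, 18
clips = rng.normal(size=(100, T * K * 2))

def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)           # center the features
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T   # keep the leading components

reduced = pca_reduce(clips, 32)
print(reduced.shape)  # (100, 32)
```

The reduced vectors would then feed whatever classifier the report evaluates; the component count (32 here) is a tunable choice, not a value from the report.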
Improved Actor Relation Graph based Group Activity Recognition
Video understanding aims to recognize and classify the different actions or activities appearing in a video. Much previous work, such as video captioning, has shown promising performance in producing general video understanding. However, it remains challenging to generate fine-grained descriptions of human actions and their interactions using state-of-the-art video captioning techniques. Detailed descriptions of human actions and group activities are essential information that can be used in real-time CCTV video surveillance, health care, sports video analysis, etc. This study proposes a video understanding method that focuses mainly on group activity recognition by learning pair-wise actor appearance similarity and actor positions. We propose to use normalized cross-correlation (NCC) and the sum of absolute differences (SAD) to calculate the pair-wise appearance similarity and build the actor relation graph, allowing a graph convolutional network to learn how to classify group activities. We also propose to use MobileNet as the backbone to extract features from each video frame. A visualization model is further introduced that renders each input video frame with predicted bounding boxes on each human object and predicts individual actions and the collective activity.
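The two similarity measures named above, NCC and SAD, are standard patch-comparison functions. A minimal sketch of both, applied to toy actor crops (the patch size and data here are illustrative, not from the paper):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation: 1.0 for identical patches,
    near 0 for unrelated ones."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def sad(a, b):
    """Sum of absolute differences: lower means more similar."""
    return float(np.abs(a - b).sum())

rng = np.random.default_rng(1)
patch = rng.random((16, 16))                       # toy actor crop
noisy = patch + rng.normal(scale=0.05, size=patch.shape)

print(ncc(patch, patch))  # 1.0
print(sad(patch, patch))  # 0.0
print(ncc(patch, noisy) > ncc(patch, rng.random((16, 16))))
```

In the paper's pipeline, such pair-wise scores between actor feature crops would populate the edges of the actor relation graph consumed by the graph convolutional network.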
Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos
Video annotation is expensive and time-consuming. Consequently, datasets for multi-person pose estimation and tracking are less diverse and have sparser annotations than large-scale image datasets for human pose estimation. This makes it challenging to learn deep-learning-based models for associating keypoints across frames that are robust to nuisance factors such as motion blur and occlusion in the task of multi-person pose tracking. To address this issue, we propose an approach that relies on keypoint correspondences for associating persons in videos. Instead of training the network for estimating keypoint correspondences on video data, it is trained on a large-scale image dataset for human pose estimation using self-supervision. Combined with a top-down framework for human pose estimation, we use keypoint correspondences to (i) recover missed pose detections and (ii) associate pose detections across video frames. Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack data sets.
Comment: Submitted to ECCV 202