Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition

Authors: Li Zechao, Qi Guo-Jun, Shu Xiangbo, Song Yan, Tang Jinhui, Zhang Liyan
Publication date: 03/06/2017

Recently, Long Short-Term Memory (LSTM) has become a popular choice for modeling individual dynamics in single-person action recognition because it can model temporal information across various ranges of dynamic contexts. However, existing RNN models capture the temporal dynamics of person-person interactions only by naively combining the activity dynamics of individuals or by modeling them as a whole. This neglects the inter-related dynamics of how person-person interactions change over time. To this end, we propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) model that captures the long-term inter-related dynamics between two interacting people, operating on the bounding boxes that cover them. Specifically, for each frame, two sub-memory units store individual motion information, while a concurrent LSTM unit selectively integrates and stores inter-related motion information between the interacting people from these two sub-memory units via a new co-memory cell. Experimental results on the BIT and UT datasets show the superiority of Co-LSTSM over state-of-the-art methods.

arXiv.org e-Print Archive
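
The Co-LSTSM abstract above describes two per-person sub-memories feeding a shared co-memory cell. The following is only a toy scalar sketch of that idea, not the authors' formulation: the gate equations, the shared hidden state, the weight names in `w`, and all function names are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sub_memory_step(x, h_prev, c_prev, w):
    """LSTM-style update of one person's sub-memory (scalar toy, assumed form)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev)    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev)    # forget gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev)  # candidate content
    return f * c_prev + i * g


def co_memory_step(x_a, x_b, h_prev, c_a, c_b, w):
    """Selectively merge both sub-memories into a co-memory cell (assumed form)."""
    gate_a = sigmoid(w["wa"] * x_a + w["ua"] * h_prev)  # how much of person A to keep
    gate_b = sigmoid(w["wb"] * x_b + w["ub"] * h_prev)  # how much of person B to keep
    c_co = gate_a * c_a + gate_b * c_b
    o = sigmoid(w["wo"] * (x_a + x_b) + w["uo"] * h_prev)  # output gate
    return o * math.tanh(c_co), c_co


def run_sequence(seq_a, seq_b, w):
    """Process two per-frame feature sequences and return the final hidden state."""
    h, c_a, c_b = 0.0, 0.0, 0.0
    for x_a, x_b in zip(seq_a, seq_b):
        c_a = sub_memory_step(x_a, h, c_a, w)
        c_b = sub_memory_step(x_b, h, c_b, w)
        h, _ = co_memory_step(x_a, x_b, h, c_a, c_b, w)
    return h


# Hypothetical weights; a real model would learn these and use vectors, not scalars.
KEYS = ["wi", "ui", "wf", "uf", "wg", "ug", "wa", "ua", "wb", "ub", "wo", "uo"]
weights = {k: 0.5 for k in KEYS}
final_h = run_sequence([0.2, 0.4, 0.1], [0.3, -0.1, 0.5], weights)
```

In this sketch the interaction state `final_h` depends on both people through the gated sum, which is the property the abstract emphasizes; how the actual model parameterizes its gates is not stated in the abstract.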

Actor-Transformers for Group Activity Recognition

Authors: Gavrilyuk Kirill, Javan Mehrsan, Sanford Ryan, Snoek Cees G. M.
Publication date: 01/01/2020

This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on the location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-specific static and dynamic representations expressed by features from a 2D pose network and a 3D CNN, respectively. We empirically study different ways to combine these representations and show their complementary benefits. Experiments show what is important to transform and how it should be transformed. Moreover, actor-transformers achieve state-of-the-art results on two publicly available benchmarks for group activity recognition, outperforming the previous best published results by a considerable margin. Comment: CVPR 202

arXiv.org e-Print Archive

UvA-DARE
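
The actor-transformer abstract above describes fusing static and dynamic per-actor features and letting a transformer relate the actors. The following is a minimal sketch under stated assumptions, not the paper's architecture: fusion by element-wise sum, a single attention head with identity projections, and mean pooling are all illustrative choices, and every function name is hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(static_feats, dynamic_feats):
    """Combine per-actor static and dynamic features by element-wise sum (assumed)."""
    return [[s + d for s, d in zip(sv, dv)]
            for sv, dv in zip(static_feats, dynamic_feats)]


def self_attention(actors):
    """Scaled dot-product self-attention across actors, identity Q/K/V projections."""
    d = len(actors[0])
    out = []
    for q in actors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in actors]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, actors)) for j in range(d)])
    return out


def group_representation(actors):
    """Mean-pool the refined actor features into one group-level vector (assumed)."""
    n = len(actors)
    return [sum(a[j] for a in actors) / n for j in range(len(actors[0]))]


# Toy inputs: two actors, two-dimensional features per stream.
static = [[1.0, 0.0], [1.0, 0.0]]    # stands in for pose-network features
dynamic = [[0.0, 1.0], [0.0, 1.0]]   # stands in for 3D CNN features
refined = self_attention(fuse(static, dynamic))
group = group_representation(refined)
```

A classifier over `group` would predict the group activity and one over each row of `refined` the individual actions; the abstract does not say how the paper's prediction heads are built.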