93 research outputs found
Memory-Augmented Temporal Dynamic Learning for Action Recognition
Human actions captured in video sequences contain two crucial factors for
action recognition, i.e., visual appearance and motion dynamics. To model these
two aspects, Convolutional and Recurrent Neural Networks (CNNs and RNNs) are
adopted in most existing successful methods for recognizing actions. However,
CNN based methods are limited in modeling long-term motion dynamics. RNNs are
able to learn temporal motion dynamics but lack effective ways to tackle
unsteady dynamics in long-duration motion. In this work, we propose a
memory-augmented temporal dynamic learning network, which learns to write the
most salient information into an external memory module and to ignore
irrelevant information. In particular, we present a differential memory
controller that makes a discrete decision on whether the external memory
module should be updated with the current feature. This discrete memory
controller takes the memory history, the context embedding, and the current
feature as inputs and controls the information flow into the external memory
module. Additionally, we train the discrete memory controller using the
straight-through estimator. We evaluate this end-to-end system
on benchmark datasets (UCF101 and HMDB51) of human action recognition. The
experimental results show consistent improvements on both datasets over prior
works and our baselines.
Comment: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
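The straight-through trick used to train the discrete controller can be illustrated in a few lines: the forward pass keeps the hard 0/1 write decision, while the backward pass pretends the threshold was the identity and lets the gradient flow through the underlying sigmoid. The sketch below is a minimal, hand-written illustration of that idea; the function names and the sigmoid relaxation are our assumptions, not the paper's exact formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Forward pass: a hard 0/1 write gate -- the discrete decision of
# whether the current feature is written into the external memory.
def gate_forward(logit):
    p = sigmoid(logit)
    return (1.0 if p > 0.5 else 0.0), p

# Backward pass: straight-through estimator -- the non-differentiable
# threshold is treated as identity, so the incoming gradient is routed
# to the sigmoid's gradient with respect to the logit.
def gate_backward(grad_out, p):
    return grad_out * p * (1.0 - p)

gate, p = gate_forward(0.3)   # p ~ 0.574 > 0.5, so the gate opens (1.0)
grad = gate_backward(1.0, p)  # gradient reaching the controller logit
```

In an autograd framework the same effect is usually obtained with a custom backward that copies the upstream gradient past the thresholding operation.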
Cross-Modal Message Passing for Two-stream Fusion
Processing and fusing information across multiple modalities is a useful
technique for achieving high performance in many computer vision problems. To
handle multi-modal information more effectively, we introduce a novel
framework for multi-modal fusion: Cross-modal Message Passing (CMMP).
Specifically, we propose a cross-modal message passing mechanism to fuse a
two-stream network for action recognition, which consists of an appearance
modal network (RGB images) and a motion modal network (optical flow images). The
objectives of individual networks in this framework are two-fold: a standard
classification objective and a competing objective. The classification objective
ensures that each modal network predicts the true action category while the
competing objective encourages each modal network to outperform the other one.
We quantitatively show that the proposed CMMP fuses the traditional two-stream
network more effectively and outperforms all existing two-stream fusion methods
on the UCF-101 and HMDB-51 datasets.
Comment: 2018 IEEE International Conference on Acoustics, Speech and Signal
Processing
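The two objectives per stream can be sketched as a standard cross-entropy term plus a competing term. In the sketch below the competing term penalizes a stream whenever the other stream assigns higher probability to the true class; this particular form is only one plausible reading of "encourages each modal network to outperform the other", not the paper's exact loss.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, y):
    return -math.log(probs[y])

# Hypothetical competing term: a stream pays a penalty when the other
# stream is more confident on the ground-truth class.
def competing(p_self, p_other, y):
    return max(0.0, p_other[y] - p_self[y])

y = 2                                # ground-truth action class
p_rgb = softmax([0.1, 0.2, 1.5])     # appearance (RGB) stream output
p_flow = softmax([0.4, 0.1, 0.9])    # motion (optical flow) stream output

loss_rgb = cross_entropy(p_rgb, y) + competing(p_rgb, p_flow, y)
loss_flow = cross_entropy(p_flow, y) + competing(p_flow, p_rgb, y)
```

Here the weaker flow stream incurs both a larger classification loss and a nonzero competing penalty, which is the intended pressure: each stream is pushed to at least match the other on the true class.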