Excitation Backprop for RNNs
Deep models are state-of-the-art for many vision tasks including video action
recognition and video captioning. Models are trained to caption or classify
activity in videos, but little is known about the evidence used to make such
decisions. Grounding decisions made by deep networks has been studied in
spatial visual content, giving more insight into model predictions for images.
However, such studies are relatively lacking for models of spatiotemporal
visual content - videos. In this work, we devise a formulation that
simultaneously grounds evidence in space and time, in a single pass, using
top-down saliency. We visualize the spatiotemporal cues that contribute to a
deep model's classification/captioning output using the model's internal
representation. Based on these spatiotemporal cues, we are able to localize
segments within a video that correspond with a specific action, or phrase from
a caption, without explicitly optimizing/training for these tasks.
Comment: CVPR 2018 Camera Ready Version
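The per-layer propagation rule behind Excitation Backprop, on which this work builds, is simple enough to sketch: top-down relevance is redistributed through positive (excitatory) weights in proportion to each input's forward activation. Below is a minimal NumPy sketch of that rule; the function name and shapes are illustrative assumptions, and extending the rule through an RNN's recurrent connections in a single pass, as this paper does, additionally requires unrolling over time and is not shown.

```python
import numpy as np

def excitation_backprop_step(p_out, activ_in, weight):
    """One Excitation Backprop step through a linear layer (illustrative sketch).

    p_out:    (n_out,) top-down relevance (winner probabilities) at the output
    activ_in: (n_in,)  non-negative input activations from the forward pass
    weight:   (n_out, n_in) layer weights

    Returns an (n_in,) relevance vector, redistributing each output neuron's
    relevance over its inputs using only positive (excitatory) weights.
    """
    w_pos = np.clip(weight, 0.0, None)         # keep excitatory connections only
    contrib = w_pos * activ_in[None, :]        # a_i * w_ji^+ for each pair (j, i)
    norm = contrib.sum(axis=1, keepdims=True)  # normalizer per output neuron
    norm[norm == 0.0] = 1.0                    # guard against dead output neurons
    cond = contrib / norm                      # conditional winner probability P(i | j)
    return cond.T @ p_out                      # marginalize relevance over outputs
```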
Saliency guided local and global descriptors for effective action recognition
This paper presents a novel framework for human action recognition based on salient object detection and a new combination of local and global descriptors. We first detect salient objects in video frames and extract features only for those objects. We then use a simple strategy to identify and process only the video frames that contain salient objects. Processing salient objects instead of all frames not only makes the algorithm more efficient, but, more importantly, also suppresses the interference of background pixels. We pair this approach with local and global descriptors, namely 3D-SIFT and histograms of oriented optical flow (HOOF), respectively. The resulting saliency guided 3D-SIFT–HOOF (SGSH) feature is used along with a multi-class support vector machine (SVM) classifier for human action recognition. Experiments conducted on the standard KTH and UCF-Sports action benchmarks show that our new method outperforms the competing state-of-the-art spatiotemporal feature-based human action recognition methods.
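At a high level, the SGSH pipeline is descriptor fusion followed by a multi-class SVM. A minimal sketch follows, assuming simple concatenation of L2-normalized descriptors and an RBF kernel; the abstract does not specify the fusion scheme, kernel, or hyperparameters, so the function names and parameters below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

def sgsh_descriptor(sift_3d, hoof):
    """Fuse a local 3D-SIFT descriptor and a global HOOF descriptor into one
    SGSH feature vector (assumed fusion: L2-normalize, then concatenate)."""
    s = sift_3d / (np.linalg.norm(sift_3d) + 1e-8)
    h = hoof / (np.linalg.norm(hoof) + 1e-8)
    return np.concatenate([s, h])

def train_action_classifier(features, labels):
    """Fit a multi-class SVM (scikit-learn's SVC is one-vs-one by default)
    over a list of SGSH feature vectors; kernel and C are illustrative."""
    clf = SVC(kernel="rbf", C=10.0)
    clf.fit(np.stack(features), labels)
    return clf
```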