Action Recognition in Still Images: Confluence of Multilinear Methods and Deep Learning
Motion information is absent from a single image, yet it is a valuable cue for action recognition; this absence makes action recognition in still images an inherently challenging problem in computer vision. In this dissertation, we show that both spatial and temporal patterns provide crucial information for recognizing human actions: recognition depends not only on spatially salient pixels, but also on the temporal patterns of those pixels. To address the challenge posed by the absence of temporal information in a single image, we introduce five effective action classification methodologies along with a new still image action recognition dataset. These include (1) proposing a new Spatial-Temporal Convolutional Neural Network, STCNN, trained by fine-tuning a CNN model, pre-trained on appearance-based classification only, over a novel latent space-time domain, named Ranked Saliency Map and Predicted Optical Flow, or RankSM-POF for short, (2) introducing a novel unsupervised zero-shot approach based on low-rank Tensor Decomposition, named ZTD, (3) proposing the concept of a temporal image, a compact representation of a hypothetical sequence of images, and then using it to design a new hierarchical deep learning network, TICNN, for still image action recognition, (4) introducing a dataset for STill image Action Recognition (STAR), containing over 1M images across 50 human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images, exposing the intrinsic difficulty of the task through its realistic scene and action complexity.
Moreover, TSSTN, a two-stream spatiotemporal network, is introduced to model the latent temporal information in a single image and to use it as prior knowledge in a two-stream deep network, and (5) proposing a parallel heterogeneous meta-learning method that combines STCNN and ZTD through a stacking approach into an ensemble classifier of the proposed heterogeneous base classifiers. Altogether, this work demonstrates the benefits of UCF-STAR as a large-scale still image dataset, and shows the role of latent motion information in recognizing human actions in still images by presenting approaches that rely on predicting temporal information, yielding higher accuracy on widely used datasets.
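The stacking step in (5) can be illustrated with a small sketch: the base classifiers' confidence scores become meta-features, and a second-level learner is fit on them. Everything below (the toy scores, the logistic meta-learner, its hyperparameters) is a hypothetical illustration of generic stacking, not the dissertation's actual STCNN/ZTD pipeline.

```python
import numpy as np

# Minimal stacking sketch: scores from two hypothetical base classifiers
# (stand-ins for STCNN and ZTD) are concatenated into meta-features, and a
# logistic-regression meta-learner is fit on them by gradient descent.

def fit_meta_learner(base_scores, labels, lr=0.5, epochs=500):
    X = np.column_stack(base_scores)            # one meta-feature per base classifier
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear score
        g = p - labels                          # gradient of the log-loss
        w -= lr * (X.T @ g) / len(labels)
        b -= lr * g.mean()
    return w, b

def meta_predict(base_scores, w, b):
    X = np.column_stack(base_scores)
    return (X @ w + b > 0).astype(int)          # threshold at probability 0.5

# Toy, perfectly separable scores for eight samples (made-up numbers).
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
s_a = labels * 0.8 + 0.1                        # "classifier A" confidence
s_b = labels * 0.6 + 0.2                        # "classifier B" confidence
w, b = fit_meta_learner([s_a, s_b], labels)
pred = meta_predict([s_a, s_b], w, b)
```

On real data the base scores fed to the meta-learner would come from held-out (out-of-fold) predictions, so the meta-learner does not simply memorize base-classifier training error.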
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are reached more rapidly, and methods that
were once state of the art quickly become obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Exploiting Image-trained CNN Architectures for Unconstrained Video Classification
We conduct an in-depth exploration of different strategies for doing event
detection in videos using convolutional neural networks (CNNs) trained for
image classification. We study different ways of performing spatial and
temporal pooling, feature normalization, choice of CNN layers as well as choice
of classifiers. Making judicious choices along these dimensions led to a very
significant increase in performance over the more naive approaches used to
date. We evaluate our approach on the challenging TRECVID MED'14
dataset with two popular CNN architectures pretrained on ImageNet. On this
MED'14 dataset, our methods, based entirely on image-trained CNN features, can
outperform several state-of-the-art non-CNN models. Our proposed late fusion of
CNN- and motion-based features can further increase the mean average precision
(mAP) on MED'14 from 34.95% to 38.74%. The fusion approach achieves the
state-of-the-art classification performance on the challenging UCF-101 dataset
- …
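The late fusion described above can be sketched as a weighted average of per-class scores from the two models. The mixing weight `alpha` and the toy scores are illustrative assumptions, not values from the paper.

```python
import numpy as np

def late_fusion(cnn_scores, motion_scores, alpha=0.5):
    """Late fusion: combine per-class confidence scores from an image-trained
    CNN model and a motion-feature model. alpha is a hypothetical mixing
    weight (0 = motion only, 1 = CNN only), typically tuned on validation data."""
    return alpha * np.asarray(cnn_scores) + (1.0 - alpha) * np.asarray(motion_scores)

# Toy per-class scores for one video over three event classes (made-up numbers).
cnn = np.array([0.80, 0.15, 0.05])
motion = np.array([0.40, 0.50, 0.10])
fused = late_fusion(cnn, motion, alpha=0.5)   # -> [0.60, 0.325, 0.075]
predicted_class = int(fused.argmax())          # class 0 wins after fusion
```

Fusing at the score level (rather than concatenating features) keeps the two pipelines independent, so either model can be retrained or swapped without touching the other.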