Search CORE

7,115 research outputs found

Action recognition based on efficient deep feature learning in the spatio-temporal domain

Author: Dellen Babette
Husain Syed Farzad
Torras Carme
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalize better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the largest and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmarking video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge on data capture (e.g., camera motion estimation), which makes it more general and flexible than other approaches. Our implementation is made available.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Improving a 3-D Convolutional Neural Network Model Reinvented from VGG16 with Batch Normalization

Author: Malekmohamadi Hossein
Pattanajak Nontawat
Publication venue
Publication date: 26/04/2019
Field of study

It is challenging to build and train a Convolutional Neural Network model that can achieve a high accuracy rate for the first time. There are many variables to consider such as initial parameters, learning rate, and batch size. Unsuccessfully training a model is one of the most inevitable problems. In some cases, the model struggles to find a lower Loss Function value which results in a poor performance. Batch Normalization is considered as a remedy to overcome this problem. In this paper, two models reinvented from VGG16 are created with and without using Batch Normalization to evaluate their model performance. It is clear that the model using Batch Normalization provides a better result in terms of Loss Function value and model accuracy, which also achieves a very high accuracy rate. It also reaches the saturation point of the highest model accuracy faster than the model without Batch Normalization. This paper also finds that the accuracy of 3D Convolutional Neural Network model reinvented from VGG16 with Batch Normalization is at 91.2% which can beat many benchmarking results on UCF101 such as IDT [5], Two-Stream [10], and Dynamic Image Networks IDT [4]. The technique introduced in this paper shows a fast, reliable and accurate estimation of human activity type and could be used in smart environments

Crossref

De Montfort University Open Research Archive

Activity Recognition based on a Magnitude-Orientation Stream Network

Author: Caetano Carlos
de Melo Victor H. C.
Santos Jefersson A. dos
Schwartz William Robson
Publication venue
Publication date: 22/08/2017
Field of study

The temporal component of videos provides an important clue for activity recognition, as a number of activities can be reliably recognized based on the motion information. In view of that, this work proposes a novel temporal stream for two-stream convolutional networks based on images computed from the optical flow magnitude and orientation, named Magnitude-Orientation Stream (MOS), to learn the motion in a better and richer manner. Our method applies simple nonlinear transformations on the vertical and horizontal components of the optical flow to generate input images for the temporal stream. Experimental results, carried on two well-known datasets (HMDB51 and UCF101), demonstrate that using our proposed temporal stream as input to existing neural network architectures can improve their performance for activity recognition. Results demonstrate that our temporal stream provides complementary information able to improve the classical two-stream methods, indicating the suitability of our approach to be used as a temporal video representation.Comment: 8 pages, SIBGRAPI 201

arXiv.org e-Print Archive

Crossref

Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks

Author: Adali Sancar
Jiang Zhuolin
Rozgic Viktor
Publication venue
Publication date: 18/05/2017
Field of study

Infrared (IR) imaging has the potential to enable more robust action recognition systems compared to visible spectrum cameras due to lower sensitivity to lighting conditions and appearance variability. While the action recognition task on videos collected from visible spectrum imaging has received much attention, action recognition in IR videos is significantly less explored. Our objective is to exploit imaging data in this modality for the action recognition task. In this work, we propose a novel two-stream 3D convolutional neural network (CNN) architecture by introducing the discriminative code layer and the corresponding discriminative code loss function. The proposed network processes IR image and the IR-based optical flow field sequences. We pretrain the 3D CNN model on the visible spectrum Sports-1M action dataset and finetune it on the Infrared Action Recognition (InfAR) dataset. To our best knowledge, this is the first application of the 3D CNN to action recognition in the IR domain. We conduct an elaborate analysis of different fusion schemes (weighted average, single and double-layer neural nets) applied to different 3D CNN outputs. Experimental results demonstrate that our approach can achieve state-of-the-art average precision (AP) performances on the InfAR dataset: (1) the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our 3D CNN model applied to the optical flow fields achieves the best reported single stream 75.42% AP

arXiv.org e-Print Archive

Crossref

Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition

Author: Al-Faris Mahmoud
Chiverton John P.
Ndzi David L.
Yang Yanyan
Publication venue
Publication date: 12/04/2019
Field of study

Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system. It uses a novel Multiple View Region Adaptive Multi-resolution in time Depth Motion Map (MV-RAMDMM) formulation combined with appearance information. Multiple stream 3D Convolutional Neural Networks (CNNs) are trained on the different views and time resolutions of the region adaptive Depth Motion Maps. Multiple views are synthesised to enhance the view invariance. The region adaptive weights, based on localised motion, accentuate and differentiate parts of actions possessing faster motion. Dedicated 3D CNN streams for multi-time resolution appearance information (RGB) are also included. These help to identify and differentiate between small object interactions. A pre-trained 3D-CNN is used here with fine-tuning for each stream along with multiple class Support Vector Machines (SVM)s. Average score fusion is used on the output. The developed approach is capable of recognising both human action and human-object interaction. Three public domain datasets including: MSR 3D Action,Northwestern UCLA multi-view actions and MSR 3D daily activity are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.Comment: 14 pages, 6 figures, 13 tables. Submitte

arXiv.org e-Print Archive

Research Repository and Portal - University of the West of Scotland