Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding
Human action recognition refers to automatically recognizing human actions from a
video clip. In reality, there often exist multiple human actions in a video
stream. Such a video stream is often weakly-annotated with a set of relevant
human action labels at a global level rather than assigning each label to a
specific video episode corresponding to a single action, which leads to a
multi-label learning problem. Furthermore, there are many meaningful human
actions in reality, but it would be extremely difficult to collect/annotate
video clips for all of the various human actions, which leads to a zero-shot
learning scenario. To the best of our knowledge, there is no work that has
addressed all the above issues together in human action recognition. In this
paper, we formulate a real-world human action recognition task as a multi-label
zero-shot learning problem and propose a framework to tackle this problem in a
holistic way. Our framework holistically tackles the issue of unknown temporal
boundaries between different actions for multi-label learning and exploits the
side information regarding the semantic relationship between different human
actions for knowledge transfer. Consequently, our framework leads to a joint
latent ranking embedding for multi-label zero-shot human action recognition. A
novel neural architecture of two component models and an alternate learning
algorithm are proposed to carry out the joint latent ranking embedding
learning. Thus, multi-label zero-shot recognition is done by measuring
relatedness scores of action labels to a test video clip in the joint latent
visual and semantic embedding spaces. We evaluate our framework with different
settings, including a novel data split scheme designed especially for
evaluating multi-label zero-shot learning, on two datasets: Breakfast and
Charades. The experimental results demonstrate the effectiveness of our
framework.
Comment: 27 pages, 10 figures and 7 tables. Technical report submitted to a journal. More experimental results/references were added and typos were corrected.
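To make the scoring step concrete, the sketch below illustrates how relatedness scores between a test clip and candidate action labels could be computed in a shared latent space; the projection matrices, dimensionalities and the use of cosine similarity are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch (not the authors' implementation): scoring action labels for a
# test clip in a joint latent embedding space. Shapes, projections and the use
# of cosine similarity are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned projections into a shared d-dimensional latent space.
d_visual, d_semantic, d_latent = 2048, 300, 128
W_visual = rng.standard_normal((d_latent, d_visual)) * 0.01      # visual branch
W_semantic = rng.standard_normal((d_latent, d_semantic)) * 0.01  # semantic branch

def relatedness_scores(clip_features, label_embeddings):
    """Rank action labels by similarity to a clip in the joint latent space."""
    z_clip = W_visual @ clip_features                      # (d_latent,)
    z_labels = label_embeddings @ W_semantic.T             # (n_labels, d_latent)
    z_clip = z_clip / np.linalg.norm(z_clip)
    z_labels = z_labels / np.linalg.norm(z_labels, axis=1, keepdims=True)
    return z_labels @ z_clip                               # cosine relatedness

# Toy example: one clip, five candidate labels (seen or unseen at training time).
clip = rng.standard_normal(d_visual)
labels = rng.standard_normal((5, d_semantic))              # e.g. label word vectors
scores = relatedness_scores(clip, labels)
print("ranked label indices:", np.argsort(scores)[::-1])
```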
Human Action Recognition in Videos Using Transfer Learning
A variety of systems focus on detecting the actions and activities performed by humans, such as video surveillance and health monitoring systems. However, published labelled human action datasets for training supervised machine learning models are limited in number and expensive to produce. Transfer learning can help to address this issue for action recognition by transferring or re-using the knowledge of existing trained models, in combination with minimal training data from the new target domain. Our focus in this paper is an investigation of video feature representations and machine learning algorithms for transfer learning applied to multi-class action recognition in videos. Using four labelled datasets from the human action domain, we apply two SVM-based transfer learning algorithms: the adaptive support vector machine (A-SVM) and the projective model transfer SVM (PMT-SVM). For feature representations, we compare the performance of two widely used video feature representations, space-time interest points (STIP) with Histograms of Oriented Gradients (HOG) and Histograms of Optical Flow (HOF), and improved dense trajectory (iDT), to explore which feature is more suitable for action recognition from videos using transfer learning. Our results show that A-SVM and PMT-SVM can help transfer action knowledge across multiple datasets with limited labelled training data; A-SVM outperforms PMT-SVM when the target dataset is derived from realistic non-lab environments; and iDT shows a greater ability to support transfer learning in action recognition.
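As a rough illustration of SVM-based transfer, the sketch below adapts a source-domain linear SVM to a small target set by feeding the source model's decision score to the target classifier; this is a simplified stand-in for the A-SVM formulation (which constrains a perturbation of the source decision function), and all data in the example is synthetic.

```python
# A simplified sketch of A-SVM-style adaptation with scikit-learn (illustrative
# only; the actual A-SVM objective adds a learned perturbation to the source
# decision function). All datasets here are synthetic stand-ins.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Source domain: plenty of labelled examples (e.g. lab-recorded actions).
X_src = rng.standard_normal((500, 64))
y_src = (X_src[:, 0] + 0.1 * rng.standard_normal(500) > 0).astype(int)

# Target domain: only a handful of labelled examples (e.g. realistic videos).
X_tgt = rng.standard_normal((20, 64)) + 0.5
y_tgt = (X_tgt[:, 0] > 0.5).astype(int)

source_svm = LinearSVC().fit(X_src, y_src)

# Adaptation: train a target classifier on features augmented with the source
# model's decision score, so the target model can reuse (or correct) the
# source knowledge instead of starting from scratch.
def augment(X):
    return np.hstack([X, source_svm.decision_function(X)[:, None]])

adapted_svm = LinearSVC().fit(augment(X_tgt), y_tgt)
# Accuracy on the (tiny) target training set, just to show the pipeline runs.
print("target accuracy:", adapted_svm.score(augment(X_tgt), y_tgt))
```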
TransNet: A Transfer Learning-Based Network for Human Action Recognition
Human action recognition (HAR) is a high-level and significant research area
in computer vision due to its ubiquitous applications. The main limitations of
the current HAR models are their complex structures and lengthy training time.
In this paper, we propose a simple yet versatile and effective end-to-end deep
learning architecture, coined as TransNet, for HAR. TransNet decomposes the
complex 3D-CNNs into 2D- and 1D-CNNs, where the 2D- and 1D-CNN components
extract spatial features and temporal patterns in videos, respectively.
Benefiting from its concise architecture, TransNet is readily compatible with
any pretrained state-of-the-art 2D-CNN model from other fields, which can be
transferred to serve the HAR task. In other words, it naturally leverages the
power and success of transfer learning for HAR, bringing substantial advantages in
terms of efficiency and effectiveness. Extensive experimental results and the
comparison with the state-of-the-art models demonstrate the superior
performance of the proposed TransNet in HAR in terms of flexibility, model
complexity, training speed and classification accuracy.
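A minimal sketch of the 2D + 1D decomposition described above is given below; the ResNet-18 backbone, feature dimensionalities and class count are assumptions for illustration, and this is not the authors' TransNet code.

```python
# Illustrative PyTorch sketch of decomposing a 3D-CNN into a 2D-CNN (spatial)
# followed by a 1D-CNN (temporal). Backbone, feature size and number of
# classes are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class TransNetSketch(nn.Module):
    def __init__(self, num_classes=10, feat_dim=512):
        super().__init__()
        # Pretrained 2D backbone, transferable from image classification.
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])
        # 1D convolution over time to capture temporal patterns.
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, clips):                        # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)                 # (B*T, 3, H, W)
        feats = self.spatial(frames).flatten(1)      # (B*T, 512) spatial features
        feats = feats.view(b, t, -1).transpose(1, 2) # (B, 512, T) per-frame features
        return self.classifier(self.temporal(feats).squeeze(-1))

x = torch.randn(2, 8, 3, 112, 112)                   # 2 clips of 8 frames
print(TransNetSketch()(x).shape)                      # torch.Size([2, 10])
```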
Hierarchical transfer learning for online recognition of compound actions
Recognising human actions in real time can provide users with a natural user interface (NUI), enabling a range of innovative and immersive applications. A NUI application should not restrict users’ movements; it should allow users to transition between actions in quick succession, which we term compound actions. However, the majority of action recognition researchers have focused on individual actions, so their approaches are limited to recognising single actions or multiple actions that are temporally separated.
This paper proposes a novel online action recognition method for fast detection of compound actions. A key contribution is our hierarchical body model that can be automatically configured to detect actions based on the low-level body parts that are the most discriminative for a particular action. Another key contribution is a transfer learning strategy to allow the tasks of action segmentation and whole body modelling to be performed on a related but simpler dataset, combined with automatic hierarchical body model adaptation on a more complex target dataset.
Experimental results on a challenging and realistic dataset show an improvement in action recognition performance of 16% due to the introduction of our hierarchical transfer learning. The proposed algorithm is fast, with an average latency of just 2 frames (66 ms), and outperforms state-of-the-art action recognition algorithms that are capable of fast online action recognition.
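The idea of configuring the model around the most discriminative low-level body parts could be illustrated roughly as below, where parts are ranked by the cross-validated accuracy of a simple per-part classifier; the part names, synthetic features and selection criterion are assumptions, not the paper's actual procedure.

```python
# Hedged sketch: rank low-level body parts by how discriminative their motion
# features are for a given action, using per-part classifier accuracy as an
# assumed selection criterion. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
parts = ["left_arm", "right_arm", "left_leg", "right_leg", "torso"]

# Synthetic per-part motion features for 200 frames with binary action labels.
X_parts = {p: rng.standard_normal((200, 6)) for p in parts}
y = rng.integers(0, 2, 200)
X_parts["right_arm"][y == 1] += 1.0   # make one part genuinely discriminative

# Rank parts by cross-validated accuracy of a simple per-part classifier.
scores = {p: cross_val_score(LogisticRegression(max_iter=200), X, y, cv=3).mean()
          for p, X in X_parts.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```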
SVMDnet: A Novel Framework for Elderly Activity Recognition based on Transfer Learning
Elderly Activity Recognition has become crucial nowadays because the majority of elderly people live alone and are vulnerable. Although several researchers employ machine learning (ML) and deep learning (DL) techniques to recognize elderly actions, relatively little research has focused specifically on transfer learning-based elderly activity recognition. Transfer learning alone is also not sufficient to handle the complexity of HAR-related problems because it is a more general approach. We propose a novel transfer learning-based framework, SVMDnet, in which a pre-trained deep neural network extracts essential action features and a Support Vector Machine (SVM) is used as the classifier. The proposed model is evaluated on the Stanford-40 dataset and a self-made dataset. For the main dataset, older volunteers over the age of 60 were recruited, and their responses were recorded in a uniform environment across 10 kinds of activities. Results from SVMDnet on the two datasets show that our model performs well on both human action recognition and human-object interactions.
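The "pre-trained deep network as feature extractor, SVM as classifier" pipeline can be sketched as follows; the ResNet-50 backbone, input size and toy data are assumptions, and this is not the authors' SVMDnet implementation.

```python
# Minimal sketch of a pretrained CNN feature extractor feeding an SVM
# classifier (illustrative stand-in; backbone and image size are assumptions).
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

backbone = models.resnet50(weights="IMAGENET1K_V2")
feature_extractor = nn.Sequential(*list(backbone.children())[:-1]).eval()

def extract(images):                    # images: (N, 3, 224, 224) tensor
    with torch.no_grad():
        return feature_extractor(images).flatten(1).numpy()   # (N, 2048)

# Toy stand-in data: in practice these would be frames of elderly activities.
X_train = extract(torch.randn(16, 3, 224, 224))
y_train = [0, 1] * 8
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(clf.predict(extract(torch.randn(2, 3, 224, 224))))
```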
Dual viewpoint passenger state classification using 3D CNNs
The rise of intelligent vehicle systems will lead to more human-machine interactions, so there is a need to create a bridge between the system and the actions and behaviours of the people inside the vehicle. In this paper, we propose a dual camera setup to monitor the actions and behaviour of vehicle passengers, and a deep learning architecture which can utilise video data to classify a range of actions. The method incorporates two different views as input to a 3D convolutional network and uses transfer learning from other action recognition data. The performance of this method is evaluated using an in-vehicle dataset, which contains video recordings of people performing a range of common in-vehicle actions. We show that the combination of transfer learning and dual viewpoints in a 3D action recognition network increases the classification accuracy of action classes with distinct poses, e.g. mobile phone use and sleeping, whilst it does not perform as well for actions with small movements, such as talking and eating.
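A simplified sketch of fusing two camera viewpoints in a 3D convolutional network is shown below; the tiny layer sizes, concatenation fusion and class count are illustrative assumptions rather than the paper's architecture.

```python
# Illustrative PyTorch sketch: two 3D-CNN streams, one per camera viewpoint,
# fused by concatenation before classification (sizes are assumptions).
import torch
import torch.nn as nn

class DualView3DCNN(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            )
        self.view_a, self.view_b = stream(), stream()
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip_a, clip_b):              # each: (B, 3, T, H, W)
        fused = torch.cat([self.view_a(clip_a), self.view_b(clip_b)], dim=1)
        return self.classifier(fused)

a = torch.randn(2, 3, 16, 112, 112)                 # e.g. front-facing camera
b = torch.randn(2, 3, 16, 112, 112)                 # e.g. side-facing camera
print(DualView3DCNN()(a, b).shape)                  # torch.Size([2, 8])
```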
Ensembles of Deep Neural Networks for Action Recognition in Still Images
Despite the fact that notable improvements have been made recently in the
field of feature extraction and classification, human action recognition is
still challenging, especially in images, in which, unlike videos, there is no
motion. Thus, the methods proposed for recognizing human actions in videos
cannot be applied to still images. A big challenge in action recognition in
still images is the lack of large enough datasets, which is problematic for
training deep Convolutional Neural Networks (CNNs) due to the overfitting
issue. In this paper, by taking advantage of pre-trained CNNs, we employ the
transfer learning technique to tackle the lack of massive labeled action
recognition datasets. Furthermore, since the last layer of the CNN has
class-specific information, we apply an attention mechanism on the output
feature maps of the CNN to extract more discriminative and powerful features
for classification of human actions. Moreover, we use eight different
pre-trained CNNs in our framework and investigate their performance on the Stanford
40 dataset. Finally, we propose using the Ensemble Learning technique to
enhance the overall accuracy of action classification by combining the
predictions of multiple models. The best setting of our method is able to
achieve 93.17% accuracy on the Stanford 40 dataset.
Comment: 5 pages, 2 figures, 3 tables, Accepted by ICCKE 201
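The two ingredients described above, spatial attention over the CNN's output feature maps and ensembling of multiple pre-trained backbones, could be sketched as below; only two of the eight backbones are shown, and the attention head and averaging scheme are assumptions, not the paper's exact models.

```python
# Hedged sketch: simple spatial attention over CNN feature maps plus averaging
# of softmax outputs from several pretrained backbones (illustration only).
import torch
import torch.nn as nn
from torchvision import models

class AttentionHead(nn.Module):
    """Weights each spatial location of the feature map before pooling."""
    def __init__(self, channels, num_classes=40):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, fmap):                                      # (B, C, H, W)
        weights = torch.softmax(self.attn(fmap).flatten(2), dim=-1)  # (B, 1, H*W)
        pooled = (fmap.flatten(2) * weights).sum(-1)                 # (B, C)
        return self.fc(pooled)

def backbone_features(name):
    m = getattr(models, name)(weights="DEFAULT")
    return nn.Sequential(*list(m.children())[:-2]).eval()  # keep conv feature maps

# Two of the (hypothetically) eight backbones; predictions are averaged.
nets = [(backbone_features("resnet18"), AttentionHead(512)),
        (backbone_features("resnet34"), AttentionHead(512))]

x = torch.randn(2, 3, 224, 224)
probs = torch.stack([torch.softmax(head(feat(x)), dim=1) for feat, head in nets])
print(probs.mean(0).shape)                          # torch.Size([2, 40]) ensemble output
```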
When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data
Human action recognition from skeletal data is a hot research topic and
important in many open domain applications of computer vision, thanks to
recently introduced 3D sensors. In the literature, naive methods simply
transfer off-the-shelf techniques from video to the skeletal representation.
However, the current state of the art is contested between two different
paradigms: kernel-based methods and feature learning with (recurrent) neural
networks. Both approaches show strong performance, yet they exhibit heavy, but
complementary, drawbacks. Motivated by this fact, our work aims at combining
the best of the two paradigms by proposing an approach in which a
shallow network is fed with a covariance representation. Our intuition is that,
as long as the dynamics are effectively modeled, there is no need for the
classification network to be deep or recurrent in order to score favorably. We
validate this hypothesis in a broad experimental analysis over 6 publicly
available datasets.
Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop
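The log-covariance representation fed to a shallow network could look roughly like the sketch below; the joint count, per-frame feature construction and the small MLP are assumptions rather than the paper's exact pipeline.

```python
# Minimal sketch: compute the matrix logarithm of the covariance of per-frame
# skeletal features and feed it to a shallow, non-recurrent classifier
# (illustrative; joint count and network sizes are assumptions).
import numpy as np
import torch
import torch.nn as nn
from scipy.linalg import logm

def log_covariance(skeleton_seq, eps=1e-3):
    """skeleton_seq: (T, D) array of per-frame joint coordinates."""
    cov = np.cov(skeleton_seq, rowvar=False) + eps * np.eye(skeleton_seq.shape[1])
    return logm(cov).real                 # matrix logarithm of the covariance

# Toy sequence: 100 frames of 25 joints in 3D, flattened to D = 75 per frame.
seq = np.random.randn(100, 75)
feat = torch.tensor(log_covariance(seq).flatten(), dtype=torch.float32)

# A shallow classifier on top of the flattened log-covariance features.
mlp = nn.Sequential(nn.Linear(75 * 75, 256), nn.ReLU(), nn.Linear(256, 60))
print(mlp(feat).shape)                    # torch.Size([60]) class scores
```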