
    Multiple Action Recognition for Video Games (MARViG)

    Action recognition research has historically focused on increasing accuracy on datasets captured in highly controlled environments, and perfect or near-perfect offline recognition accuracy on scripted datasets has been achieved. The aim of this thesis is to address the more complex problem of online action recognition with low latency in real-world scenarios. To fulfil this aim, two new multi-modal gaming datasets were captured and three novel algorithms for online action recognition were proposed.

    Two new gaming datasets, G3D and G3Di, for real-time action recognition with multiple actions and multi-modal data were captured and publicly released. Furthermore, G3Di was captured using a novel game-sourcing method, so the actions are realistic.

    Three novel algorithms for online action recognition with low latency were proposed. Firstly, Dynamic Feature Selection, which combines the discriminative power of Random Forests for feature selection with an ensemble of AdaBoost classifiers for dynamic classification. Secondly, Clustered Spatio-Temporal Manifolds, which model the dynamics of human actions with style-invariant action templates, combined with Dynamic Time Warping for execution-rate invariance. Finally, a Hierarchical Transfer Learning framework, comprising a novel transfer learning algorithm to detect compound actions together with hierarchical interaction detection to recognise the actions and interactions of multiple subjects.

    The proposed algorithms run in real time with low latency, making them suitable for a wide range of natural user interface applications, including gaming. State-of-the-art results were achieved for online action recognition. Experimental results indicate the higher complexity of the G3Di dataset in comparison to existing gaming datasets, highlighting the importance of this dataset for designing algorithms suitable for realistic interactive applications. This thesis has advanced the study of realistic action recognition and is expected to serve as a basis for further study within the research community.
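The execution-rate invariance mentioned above comes from Dynamic Time Warping, which aligns two sequences performed at different speeds. A minimal sketch of classic DTW follows; it is not the thesis's implementation, and it operates on 1-D toy sequences rather than manifold-based action templates:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences.

    Cost is accumulated along the optimal monotonic warping path, so the
    comparison is invariant to differences in execution rate.
    """
    n, m = len(a), len(b)
    # cost[i, j] = best accumulated cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match step
    return cost[n, m]

# The same motion performed at half speed still aligns closely:
fast = [0.0, 1.0, 2.0, 1.0, 0.0]
slow = [0.0, 0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0]
print(dtw_distance(fast, slow))  # → 2.0 (a naive frame-by-frame comparison is not even defined here)
```

Because the warping path may repeat or skip frames, a slow and a fast execution of the same gesture receive a small distance, which is exactly the property exploited when matching incoming frames against action templates.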

    Video-based similar gesture action recognition using deep learning and GAN-based approaches

    University of Technology Sydney, Faculty of Engineering and Information Technology.

    Human action is not merely a matter of patterns of motion of different parts of the body; it is also a description of the intention, emotion and thoughts of the person. Hence, it has become a crucial component in human behavior analysis and understanding. Human action recognition has a wide variety of applications, such as surveillance, robotics, health care, video search and human-computer interaction. Analysing human actions manually is tedious and prone to errors. Therefore, computer scientists have been trying to bring the abilities of cognitive video understanding to human action recognition systems by using computer vision techniques. However, human action recognition is a complex task in computer vision because of camera motion, occlusion, background clutter, viewpoint variation, execution rate and similar gestures. These challenges significantly degrade the performance of human action recognition systems.

    The purpose of this research is to propose solutions, based on traditional machine learning methods as well as state-of-the-art deep learning methods, to automatically perform video-based human action recognition. This thesis investigates three research areas of video-based human action recognition: traditional human action recognition, similar gesture action recognition, and data augmentation for human action recognition. To start with, feature-based methods using classic machine learning algorithms were studied. Recently, deep convolutional neural networks (CNNs) have taken their place in computer vision and human action recognition research and have achieved tremendous success in comparison to traditional machine learning techniques. Current state-of-the-art deep convolutional neural networks were used for the human action recognition task. Furthermore, recurrent neural networks (RNNs) and their variant, long short-term memory (LSTM), are used to process time-series features, which are either handcrafted or extracted from the CNN.

    However, these methods suffer from the similar gestures that appear in human action videos. Thus, a hierarchical classification framework is proposed for similar gesture action recognition, and the performance is improved by the multi-stage classification approach. Additionally, the framework has been modified into an end-to-end system, so that similar gestures can be processed by the system automatically.

    In this study, a novel data augmentation framework for action recognition is also proposed; the objective is to generate well-learnt video frames from action videos, which can enlarge the dataset size as well as the feature bias. It is very important for a human action recognition system to recognize actions with similar gestures as accurately as possible. For such a system, a generative adversarial network (GAN) is applied to learn the original video datasets and to generate video frames by playing an adversarial game. Furthermore, a framework is developed that first classifies the original dataset with a CNN to obtain the confusion matrix. Similar gesture actions are then paired based on the confusion-matrix results, and the final classification is applied to the fusion dataset, which contains both original and generated video frames. This study provides real-time and practical solutions for autonomous human action recognition systems, and the analysis of similar gesture actions improves the performance of existing CNN-based approaches.

    In addition, the GAN-based approaches from computer vision have been applied to the graph embedding area, because graph embedding is similar to image embedding but used for different purposes. Unlike the GAN in computer vision, which generates images, the GAN in graph embedding can be used to regularize the embedding, so the proposed methods are able to reconstruct both structural characteristics and node features, which naturally capture the interaction between these two sources of information while learning the embedding.
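The pairing step described above, grouping classes by how often a first-stage classifier confuses them, can be sketched in a few lines. The function name, the confusion threshold, and the toy matrix below are illustrative assumptions, not the thesis's code:

```python
import numpy as np

def pair_similar_actions(confusion, threshold=0.2):
    """Pair action classes that a first-stage classifier confuses.

    `confusion` is a row-normalised confusion matrix: confusion[i, j] is the
    fraction of class-i samples predicted as class j. Class pairs whose
    mutual confusion exceeds `threshold` are grouped so that a dedicated
    second-stage classifier can separate them.
    """
    n = confusion.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if confusion[i, j] + confusion[j, i] > threshold:
                pairs.append((i, j))
    # Most-confused pairs first
    pairs.sort(key=lambda p: -(confusion[p[0], p[1]] + confusion[p[1], p[0]]))
    return pairs

# Toy matrix: classes 0 and 1 (e.g. two similar gestures) are often confused.
cm = np.array([[0.70, 0.25, 0.05],
               [0.30, 0.65, 0.05],
               [0.02, 0.03, 0.95]])
print(pair_similar_actions(cm))  # [(0, 1)]
```

Each returned pair would then receive its own fine-grained classifier, trained on the fusion of original and GAN-generated frames for those two classes.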

    Hierarchical Attention Network for Action Segmentation

    The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they lack the capacity to effectively map the temporal relationships between the frames, as they capture only a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving the overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at the frame level and segment level, and to perform fine-grained action segmentation. This yields a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams, with multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and achieve state-of-the-art performance. The evaluated datasets encompass numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings.
    Comment: Published in Pattern Recognition Letters.
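The idea of forming embeddings at the frame level and then at the segment level can be illustrated with a toy two-level temporal pooling. This sketch uses simple dot-product attention over NumPy arrays rather than the paper's recurrent attention units, and every name, shape, and parameter here is a hypothetical stand-in:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(features, w):
    """Attention pooling: score each timestep, return the weighted mean."""
    scores = features @ w      # (T,) one relevance score per timestep
    alpha = softmax(scores)    # attention weights over time, sum to 1
    return alpha @ features    # (D,) pooled embedding

def hierarchical_embed(video, win, w_frame, w_seg):
    """Two temporal scales: frames -> segment embeddings -> video embedding.

    `video` is a (T, D) array of per-frame features. Non-overlapping windows
    of `win` frames are pooled into segment embeddings, which are pooled
    again into a single video-level embedding.
    """
    segments = [attention_pool(video[t:t + win], w_frame)
                for t in range(0, len(video), win)]
    return attention_pool(np.stack(segments), w_seg)

rng = np.random.default_rng(0)
video = rng.standard_normal((32, 8))   # 32 frames of 8-dim features
w = rng.standard_normal(8)             # shared toy scoring vector
emb = hierarchical_embed(video, win=8, w_frame=w, w_seg=w)
print(emb.shape)  # (8,)
```

In the actual framework the two scales are modelled with learned recurrent attention and the segment-level output drives per-frame action labels; the sketch only shows how multi-scale pooling narrows long videos into compact embeddings.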