    Hierarchical transfer learning for online recognition of compound actions

    Recognising human actions in real time can provide users with a natural user interface (NUI), enabling a range of innovative and immersive applications. A NUI application should not restrict users’ movements; it should allow users to transition between actions in quick succession, which we term compound actions. However, the majority of action recognition researchers have focused on individual actions, so their approaches are limited to recognising single actions or multiple actions that are temporally separated. This paper proposes a novel online action recognition method for fast detection of compound actions. A key contribution is our hierarchical body model, which can be automatically configured to detect actions based on the low-level body parts that are the most discriminative for a particular action. Another key contribution is a transfer learning strategy that allows action segmentation and whole-body modelling to be performed on a related but simpler dataset, combined with automatic adaptation of the hierarchical body model on a more complex target dataset. Experimental results on a challenging and realistic dataset show a 16% improvement in action recognition performance due to the introduction of our hierarchical transfer learning. The proposed algorithm is fast, with an average latency of just 2 frames (66 ms), and outperforms state-of-the-art action recognition algorithms that are capable of fast online action recognition.
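
    The hierarchical body model is only described at a high level above, so the following is a minimal, hypothetical sketch of the underlying idea: score each low-level joint group by how well it separates one action from the rest, then keep only the most discriminative groups for that action. The joint groups, the Fisher-style score, and the data shapes are illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Hypothetical grouping of skeleton joints into low-level body parts.
    JOINT_GROUPS = {
        "left_arm": [4, 5, 6],
        "right_arm": [8, 9, 10],
        "left_leg": [12, 13, 14],
        "right_leg": [16, 17, 18],
        "torso": [0, 1, 2, 3],
    }

    def group_score(features, labels, action, joints):
        """Fisher-style ratio: how well a joint group separates one action from the rest."""
        pos = features[labels == action][:, joints]
        neg = features[labels != action][:, joints]
        pos = pos.reshape(pos.shape[0], -1)
        neg = neg.reshape(neg.shape[0], -1)
        between = np.sum((pos.mean(axis=0) - neg.mean(axis=0)) ** 2)
        within = pos.var(axis=0).sum() + neg.var(axis=0).sum() + 1e-8
        return between / within

    def configure_hierarchy(features, labels, top_k=2):
        """For each action, keep the top_k most discriminative joint groups."""
        hierarchy = {}
        for action in np.unique(labels):
            scores = {name: group_score(features, labels, action, joints)
                      for name, joints in JOINT_GROUPS.items()}
            hierarchy[action] = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return hierarchy

    # Toy usage: 200 frames of a 20-joint skeleton (x, y, z per joint), 3 action labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20, 3))
    y = rng.integers(0, 3, size=200)
    print(configure_hierarchy(X, y))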

    Multiple Action Recognition for Video Games (MARViG)

    Action recognition research has historically focused on increasing accuracy on datasets captured in highly controlled environments, and perfect or near-perfect offline recognition accuracy on scripted datasets has been achieved. The aim of this thesis is to deal with the more complex problem of online action recognition with low latency in real-world scenarios. To fulfil this aim, two new multi-modal gaming datasets were captured and three novel algorithms for online action recognition were proposed. The two new gaming datasets, G3D and G3Di, for real-time action recognition with multiple actions and multi-modal data, were captured and publicly released. Furthermore, G3Di was captured using a novel game-sourcing method, so its actions are realistic. Three novel algorithms for online action recognition with low latency were proposed. Firstly, Dynamic Feature Selection, which combines the discriminative power of Random Forests for feature selection with an ensemble of AdaBoost classifiers for dynamic classification. Secondly, Clustered Spatio-Temporal Manifolds, which models the dynamics of human actions with style-invariant action templates combined with Dynamic Time Warping for execution-rate invariance. Finally, a Hierarchical Transfer Learning framework, comprising a novel transfer learning algorithm to detect compound actions, in addition to hierarchical interaction detection to recognise the actions and interactions of multiple subjects. The proposed algorithms run in real time with low latency, ensuring they are suitable for a wide range of natural user interface applications, including gaming, and state-of-the-art results were achieved for online action recognition. Experimental results indicate that the G3Di dataset is more complex than existing gaming datasets, highlighting its importance for designing algorithms suitable for realistic interactive applications. This thesis has advanced the study of realistic action recognition and is expected to serve as a basis for further study within the research community.
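
    The first algorithm is described above as combining Random Forest feature selection with an ensemble of AdaBoost classifiers. The snippet below is a minimal sketch of that combination using scikit-learn on synthetic data; the feature dimensionality, the number of selected features, and the simple two-step training are illustrative assumptions rather than the thesis's actual Dynamic Feature Selection procedure.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for per-frame skeleton features and four action classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 60))
    y = rng.integers(0, 4, size=500)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Step 1: a Random Forest ranks features by importance.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    selected = np.argsort(forest.feature_importances_)[::-1][:20]  # keep the top 20 features

    # Step 2: an AdaBoost ensemble classifies using only the selected features.
    boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train[:, selected], y_train)
    print("held-out accuracy:", boost.score(X_test[:, selected], y_test))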

    Symbol Emergence in Robotics: A Survey

    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and acquire semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term both require an understanding of the dynamics of symbol systems and are therefore crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system, one that is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, which enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.

    Learning recurrent representations for hierarchical behavior modeling

    We propose a framework for detecting action patterns from motion sequences and modeling the sensory-motor relationship of animals, using a generative recurrent neural network. The network has a discriminative part (classifying actions) and a generative part (predicting motion), whose recurrent cells are laterally connected, allowing higher levels of the network to represent high-level phenomena. We test our framework on two types of data, fruit fly behavior and online handwriting. Our results show that 1) taking advantage of unlabeled sequences, by predicting future motion, significantly improves action detection performance when training labels are scarce; 2) the network learns to represent high-level phenomena such as writer identity and fly gender, without supervision; and 3) simulated motion trajectories, generated by treating motion prediction as input to the network, look realistic and may be used to qualitatively evaluate whether the model has learnt generative control rules.
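
    As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of a recurrent encoder with a discriminative head (per-frame action classification) and a generative head (next-step motion prediction). The single GRU (rather than the paper's laterally connected multi-level cells), the layer sizes, and the loss are illustrative assumptions.

    import torch
    import torch.nn as nn

    class BehaviorRNN(nn.Module):
        """Shared recurrent encoder with a discriminative and a generative head."""
        def __init__(self, motion_dim=8, hidden_dim=64, n_actions=10):
            super().__init__()
            self.encoder = nn.GRU(motion_dim, hidden_dim, batch_first=True)
            self.action_head = nn.Linear(hidden_dim, n_actions)   # discriminative part
            self.motion_head = nn.Linear(hidden_dim, motion_dim)  # generative part

        def forward(self, motion):  # motion: (batch, time, motion_dim)
            h, _ = self.encoder(motion)
            return self.action_head(h), self.motion_head(h)

    # Toy batch of motion sequences: 4 sequences, 100 frames, 8 motion features.
    model = BehaviorRNN()
    motion = torch.randn(4, 100, 8)
    action_logits, predicted_motion = model(motion)

    # Unlabeled sequences still train the generative head: predict frame t+1 from frame t.
    prediction_loss = nn.functional.mse_loss(predicted_motion[:, :-1], motion[:, 1:])
    print(action_logits.shape, predicted_motion.shape, prediction_loss.item())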