Hierarchical transfer learning for online recognition of compound actions
Recognising human actions in real-time can provide users with a natural user interface (NUI), enabling a range of innovative and immersive applications. A NUI application should not restrict users' movements; it should allow users to transition between actions in quick succession, which we term compound actions. However, most action recognition research has focused on individual actions, so existing approaches are limited to recognising single actions or multiple actions that are temporally separated.
This paper proposes a novel online action recognition method for fast detection of compound actions. A key contribution is our hierarchical body model that can be automatically configured to detect actions based on the low level body parts that are the most discriminative for a particular action. Another key contribution is a transfer learning strategy to allow the tasks of action segmentation and whole body modelling to be performed on a related but simpler dataset, combined with automatic hierarchical body model adaption on a more complex target dataset.
Experimental results on a challenging and realistic dataset show a 16% improvement in action recognition performance due to the introduction of our hierarchical transfer learning. The proposed algorithm is fast, with an average latency of just 2 frames (66 ms), and outperforms state-of-the-art action recognition algorithms that are capable of fast online action recognition.
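To make the body-part selection idea concrete, one simple way to configure a detector around the most discriminative low-level body parts is to score each part by how well its features alone separate the action classes and keep the top-scoring parts. The sketch below uses a Fisher-style scatter ratio as the score; the scoring criterion, function names, and data here are hypothetical illustrations, not the paper's actual configuration procedure.

```python
import numpy as np

def part_discriminability(features, labels):
    """Score one body part's features by between-class vs within-class
    scatter (a simple Fisher-style criterion; a hypothetical stand-in
    for the paper's automatic configuration step)."""
    classes = np.unique(labels)
    overall = features.mean(axis=0)
    between = sum(np.sum((features[labels == c].mean(axis=0) - overall) ** 2)
                  for c in classes)
    within = sum(np.sum((features[labels == c] - features[labels == c].mean(axis=0)) ** 2)
                 for c in classes)
    return between / (within + 1e-9)  # small epsilon avoids division by zero

def select_parts(part_features, labels, k=2):
    """Keep the k body parts whose features best separate the action classes."""
    scores = {part: part_discriminability(f, labels) for part, f in part_features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For a punching action, for instance, the arm-joint features would separate the classes far better than the leg joints, so the arm parts would be retained for that action's detector.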
Multiple Action Recognition for Video Games (MARViG)
Action recognition research historically has focused on increasing accuracy on datasets in
highly controlled environments. Perfect or near perfect offline action recognition
accuracy on scripted datasets has been achieved. The aim of this thesis is to deal with the
more complex problem of online action recognition with low latency in real world
scenarios. To fulfil this aim, two new multi-modal gaming datasets were captured and
three novel algorithms for online action recognition were proposed.
Two new gaming datasets, G3D and G3Di, for real-time action recognition with multiple
actions and multi-modal data were captured and publicly released. Furthermore, G3Di
was captured using a novel game-sourcing method so the actions are realistic. Three novel
algorithms for online action recognition with low latency were proposed. Firstly,
Dynamic Feature Selection, which combines the discriminative power of Random Forests
for feature selection with an ensemble of AdaBoost classifiers for dynamic classification.
Secondly, Clustered Spatio-Temporal Manifolds, which model the dynamics of human
actions with style-invariant action templates combined with Dynamic Time Warping for
execution-rate invariance. Finally, a Hierarchical Transfer Learning framework,
comprising a novel transfer learning algorithm to detect compound actions together
with hierarchical interaction detection to recognise the actions and interactions
of multiple subjects.
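The execution-rate invariance that Dynamic Time Warping contributes to the second algorithm can be sketched in a few lines: DTW aligns an observed sequence to each stored action template regardless of how fast the action was performed, and the nearest template gives the label. The template data, labels, and function names below are hypothetical illustrations, not the thesis implementation.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping between two feature sequences
    (one row per frame), giving execution-rate-invariant matching."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch seq_a
                                 cost[i, j - 1],      # stretch seq_b
                                 cost[i - 1, j - 1])  # match frames
    return cost[n, m]

def classify(observation, templates):
    """Label the observation with its nearest action template under DTW."""
    return min(templates, key=lambda label: dtw_distance(observation, templates[label]))
```

A two-frame "fast" execution of an action still aligns cheaply to a four-frame template of the same action, which is exactly the invariance needed when users perform actions at different speeds.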
The proposed algorithms run in real-time with low latency, ensuring they are suitable for
a wide range of natural user interface applications, including gaming. State-of-the-art
results were achieved for online action recognition. Experimental results indicate the
higher complexity of the G3Di dataset in comparison to existing gaming datasets,
highlighting the importance of this dataset for designing algorithms suitable for realistic
interactive applications. This thesis has advanced the study of realistic action recognition
and is expected to serve as a basis for further study within the research community.
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users over the long term require an
understanding of the dynamics of symbol systems. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, which enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.
Comment: submitted to Advanced Robotics
Learning recurrent representations for hierarchical behavior modeling
We propose a framework for detecting action patterns from motion sequences
and modeling the sensory-motor relationship of animals, using a generative
recurrent neural network. The network has a discriminative part (classifying
actions) and a generative part (predicting motion), whose recurrent cells are
laterally connected, allowing higher levels of the network to represent high
level phenomena. We test our framework on two types of data, fruit fly behavior
and online handwriting. Our results show that 1) taking advantage of unlabeled
sequences, by predicting future motion, significantly improves action detection
performance when training labels are scarce, 2) the network learns to represent
high level phenomena such as writer identity and fly gender, without
supervision, and 3) simulated motion trajectories, generated by treating motion
prediction as input to the network, look realistic and may be used to
qualitatively evaluate whether the model has learnt generative control rules.
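The shared-recurrence idea behind this dual network can be sketched minimally: one recurrent layer feeds both a discriminative head (action scores) and a generative head (next-motion prediction), and feeding predictions back in as inputs yields the simulated trajectories described above. All dimensions, weights, and names below are hypothetical; the paper's actual model uses trained, laterally connected discriminative and generative recurrent cells rather than this single toy layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny untrained recurrent model with two heads (hypothetical dimensions).
D_IN, D_HID, N_ACTIONS = 4, 8, 3
Wx = rng.normal(scale=0.1, size=(D_HID, D_IN))       # input -> hidden
Wh = rng.normal(scale=0.1, size=(D_HID, D_HID))      # hidden -> hidden (recurrence)
Wc = rng.normal(scale=0.1, size=(N_ACTIONS, D_HID))  # discriminative head
Wg = rng.normal(scale=0.1, size=(D_IN, D_HID))       # generative head

def step(h, x):
    """One recurrent step: update the shared hidden state, then emit
    both action scores and a predicted next motion frame."""
    h = np.tanh(Wx @ x + Wh @ h)
    action_scores = Wc @ h   # classify the current action
    next_motion = Wg @ h     # predict the next motion frame
    return h, action_scores, next_motion

def rollout(x0, steps):
    """Closed-loop generation: feed each motion prediction back in as
    the next input to simulate a motion trajectory."""
    h, x, traj = np.zeros(D_HID), x0, []
    for _ in range(steps):
        h, _, x = step(h, x)
        traj.append(x)
    return np.stack(traj)
```

Because the generative head's loss (motion prediction) needs no labels, unlabeled sequences can train the shared hidden state, which is the mechanism behind result 1) above.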