105 research outputs found
Action classification using a discriminative non-parametric hidden Markov model
We classify human actions occurring in videos, using the skeletal joint positions extracted from a depth image sequence as features. Each action class is represented by a non-parametric Hidden Markov Model (NP-HMM) and the model parameters are learnt in a discriminative way. Specifically, we use a Bayesian framework based on Hierarchical Dirichlet Process (HDP) to automatically infer the cardinality of hidden states and formulate a discriminative function based on distance between Gaussian distributions to improve classification performance. We use elliptical slice sampling to efficiently sample parameters from the complex posterior distribution induced by our discriminative likelihood function. We illustrate our classification results for action class models trained using this technique
Activity recognition using a supervised non-parametric hierarchical HMM
The problem of classifying human activities occurring in depth image sequences is addressed. The 3D joint positions of a human skeleton and the local depth image pattern around these joint positions define the features. A two level hierarchical Hidden Markov Model (H-HMM), with independent Markov chains for the joint positions and depth image pattern, is used to model the features. The states corresponding to the H-HMM bottom level characterize the granular poses while the top level characterizes the coarser actions associated with the activities. Further, the H-HMM is based on a Hierarchical Dirichlet Process (HDP), and is fully non-parametric with the number of pose and action states inferred automatically from data. This is a significant advantage over classical HMM and its extensions. In order to perform classification, the relationships between the actions and the activity labels are captured using multinomial logistic regression. The proposed inference procedure ensures alignment of actions from activities with similar labels. Our construction enables information sharing, allows incorporation of unlabelled examples and provides a flexible factorized representation to include multiple data channels. Experiments with multiple real world datasets show the efficacy of our classification approach
An infinite adaptive online learning model for segmentation and classification of streaming data
© 2014 IEEE. In recent years, the desire and need to understand streaming data has been increasing. Along with the constant flow of data, it is critical to classify and segment the observations on-the-fly without being limited to a rigid number of classes. In other words, the system needs to be adaptive to the streaming data and capable of updating its parameters to comply with natural changes. This interesting problem, however, is poorly addressed in the literature, as many of the common studies focus on offline classification over a pre-defined class set. In this paper, we propose a novel adaptive online system based on Markov switching models with hierarchical Dirichlet process priors. This infinite adaptive online approach is capable of segmenting and classifying the streaming data over infinite classes, while meeting the memory and delay constraints of streaming contexts. The model is further enhanced by a 'predictive batching' mechanism, that is able to divide the flowing data into batches of variable size, imitating the ground-truth segments. Experiments on two video datasets show significant performance of the proposed approach in frame-level accuracy, segmentation recall and precision, while determining the accurate number of classes in acceptable computational time
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term, requires an
understanding of the dynamics of symbol systems and is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory--motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.Comment: submitted to Advanced Robotic
Learning Behavioural Context
The original publication is available at www.springerlink.co
Non-parametric hidden conditional random fields for action classification
Conditional Random Fields (CRF), a structured prediction method, combines probabilistic graphical models and discriminative classification techniques in order to predict class labels in sequence recognition problems. Its extension the Hidden Conditional Random Fields (HCRF) uses hidden state variables in order to capture intermediate structures. The number of hidden states in an HCRF must be specified a priori. This number is often not known in advance. A non-parametric extension to the HCRF, with the number of hidden states automatically inferred from data, is proposed here. This is a significant advantage over the classical HCRF since it avoids ad hoc model selection procedures. Further, the training and inference procedure is fully Bayesian eliminating the over fitting problem associated with frequentist methods. In particular, our construction is based on scale mixtures of Gaussians as priors over the HCRF parameters and makes use of Hierarchical Dirichlet Process (HDP) and Laplace distribution. The proposed inference procedure uses elliptical slice sampling, a Markov Chain Monte Carlo (MCMC) method, in order to sample optimal and sparse posterior HCRF parameters. The above technique is applied for classifying human actions that occur in depth image sequences – a challenging computer vision problem. Experiments with real world video datasets confirm the efficacy of our classification approach
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation
In this work, we address the problem of spatio-temporal action detection in
temporally untrimmed videos. It is an important and challenging task as finding
accurate human actions in both temporal and spatial space is important for
analyzing large-scale video data. To tackle this problem, we propose a cascade
proposal and location anticipation (CPLA) model for frame-level action
detection. There are several salient points of our model: (1) a cascade region
proposal network (casRPN) is adopted for action proposal generation and shows
better localization accuracy compared with single region proposal network
(RPN); (2) action spatio-temporal consistencies are exploited via a location
anticipation network (LAN) and thus frame-level action detection is not
conducted independently. Frame-level detections are then linked by solving an
linking score maximization problem, and temporally trimmed into spatio-temporal
action tubes. We demonstrate the effectiveness of our model on the challenging
UCF101 and LIRIS-HARL datasets, both achieving state-of-the-art performance.Comment: Accepted at BMVC 2017 (oral
- …