Histogram-based training initialisation of hidden Markov models for human action recognition
Human action recognition is often addressed by use of latent-state models such as the hidden Markov model and similar graphical models. As such models require Expectation-Maximisation training, arbitrary choices must be made for training initialisation, with major impact on the final recognition accuracy. In this paper, we propose a histogram-based deterministic initialisation and compare it with both random and time-based deterministic initialisations. Experiments on a human action dataset show that the proposed method achieves higher accuracy than the other tested methods. © 2010 IEEE
Towards practical automated human action recognition
University of Technology, Sydney. Faculty of Engineering and Information Technology.
Modern video surveillance requires addressing high-level concepts such as humans' actions and
activities. Automated human action recognition is an interesting research area, as well as one of the
main trends in the automated video surveillance industry. The typical goal of action recognition is that
of labelling an image sequence (video) using one out of a set of action labels. In general, it requires
the extraction of a feature set from the relevant video, followed by the classification of the extracted
features. Despite the many approaches for feature set extraction and classification proposed to date,
some barriers to practical action recognition still exist. We argue that recognition accuracy, speed,
robustness and the required hardware are the main factors in building a practical human action
recognition system to be run on a typical PC for a real-time video surveillance application. For
example, a computationally-heavy set of measurements may prevent practical implementation on
common platforms.
The main focus of this thesis is challenging the main difficulties and proposing solutions towards a
practical action recognition system. The main outstanding difficulties that we have challenged in this
thesis include 1) initialisation issues with model training; 2) feature sets of limited computational
weight suitable for real-time applications; 3) model robustness to outliers; and 4) pending issues with
the standardisation of software interfaces. In the following, we provide a description of our
contributions to the resolution of these issues.
Amongst the different approaches for classifying actions, graphical models such as
the hidden Markov model (HMM) have been widely exploited by many researchers. Such models
include observation probabilities which are generally modelled by mixtures of Gaussian components.
When learning an HMM by way of Expectation-Maximisation (EM) algorithms, arbitrary choices
must be made for their initial parameters. The initial choices have a major impact on the parameters at
convergence and, in turn, on the recognition accuracy. This dependence forces us to repeat training
with different initial parameters until satisfactory cross-validation accuracy is attained. Such a process
is overall empirical and time consuming.
We argue that one-off initialisation can offer a better trade-off between training time and accuracy,
and as one of the main contributions of this thesis, we propose two methods for deterministic
initialisation of the Gaussian components' centres. The first method is a time segmentation-based
approach which divides each training sequence into the requested number of clusters (product of the
number of HMM states and the number of Gaussian components in each state) in the time domain.
Then, clusters' centres are averaged among all the training sequences to compute the initial centre for
each Gaussian component. The second approach is a histogram-based approach which tries to
initialise the components' centres with the most frequent values among the training data in terms of
density (similar to mode-seeking approaches). The histogram-based approach is performed
incrementally, considering one feature at a time. Either centre initialisation approach is followed by
dispatching the resulting Gaussian components onto HMM states. The reference component
dispatching method exploits the arbitrary order for dispatching. In contrast, we again propose two
more intelligent methods that place components with closer centres in the same state,
which can improve the correct recognition rate.
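The time segmentation-based initialisation above can be sketched as follows. This is a minimal illustration only (function and variable names are our own, not from the thesis), assuming each training sequence is a T × D array of per-frame features:

```python
import numpy as np

def time_based_centres(sequences, n_states, n_mix):
    """Deterministic centre initialisation by time segmentation.

    Each training sequence (a T x D array) is split into
    n_states * n_mix contiguous time segments; the segment means
    are then averaged across all sequences to give one initial
    centre per Gaussian component.
    """
    k = n_states * n_mix
    centres = np.zeros((k, sequences[0].shape[1]))
    for seq in sequences:
        # Segment boundaries in the time domain
        bounds = np.linspace(0, len(seq), k + 1).astype(int)
        for i in range(k):
            centres[i] += seq[bounds[i]:bounds[i + 1]].mean(axis=0)
    return centres / len(sequences)

# Two toy sequences: 12 frames of 2-D features each
rng = np.random.default_rng(0)
seqs = [rng.normal(size=(12, 2)), rng.normal(size=(12, 2))]
init = time_based_centres(seqs, n_states=3, n_mix=2)  # 6 centres
```

The resulting centres would then be dispatched onto HMM states before EM training begins.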
Experiments over three human action video datasets (Weizmann [1], MuHAVi [2] and Hollywood
[3]) prove that our proposed deterministic initialisation methods achieve accuracy
above the average of repeated random initialisations (by about 1 to 3 per cent in an experiment
with 6 random runs) and comparable to the best. At the same time, one-off deterministic initialisation can save
the required training time substantially compared to repeated random initialisations, e.g. up to 83% in
the case of 6 runs of random initialisation. The proposed methods are general as they naturally extend
to other models where observation densities are conditioned on discrete latent variables, such as
dynamic Bayesian networks (DBNs) and switching models.
As another contribution, we propose a simple and computationally lightweight feature set, named
sectorial extreme points, which requires only 1.6 ms per frame for extraction on a reference PC. We
believe a lightweight feature set is more appropriate for the task of action recognition in real-time
surveillance applications with the usual requirement of processing 25 frames per second (PAL video
rate). The proposed feature set represents the coordinates of the extreme points in the contour of a
subject's foreground mask. The various experiments prove the strength of the proposed feature set in
terms of classification accuracy, compared to similar feature sets, such as the star skeleton [4] (by
more than 3%) and the well-known projection histograms (up to 7%).
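As a purely illustrative sketch of the idea behind contour-based extreme points, the snippet below partitions a foreground-mask contour into angular sectors around the centroid and keeps the farthest point per sector. The exact feature definition in the thesis may differ; all names here are assumptions:

```python
import numpy as np

def sectorial_extreme_points(contour, n_sectors=8):
    """Hypothetical sketch: partition contour points into angular
    sectors around the centroid and keep the farthest point in
    each sector as that sector's feature."""
    centroid = contour.mean(axis=0)
    rel = contour - centroid
    angles = np.arctan2(rel[:, 1], rel[:, 0])            # (-pi, pi]
    sector = ((angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    dist = np.hypot(rel[:, 0], rel[:, 1])
    points = np.zeros((n_sectors, 2))
    for s in range(n_sectors):
        mask = sector == s
        if mask.any():
            points[s] = contour[mask][np.argmax(dist[mask])]
        else:
            points[s] = centroid                         # empty-sector fallback
    return points

# Toy square contour: the extreme points land on its corners/edges
square = np.array([[x, 0] for x in range(10)] +
                  [[9, y] for y in range(10)] +
                  [[x, 9] for x in range(9, -1, -1)] +
                  [[0, y] for y in range(9, -1, -1)], float)
feats = sectorial_extreme_points(square, n_sectors=4)
```

Such a computation touches only the contour pixels, which is consistent with the very low per-frame extraction cost reported above.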
Another main issue in density modelling of the extracted features is the outlier problem. The
extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can
severely affect density modelling when the Gaussian distribution is used as the model since it is short-tailed
and highly sensitive to outliers. Hence, outliers can affect the classification accuracy of the
HMM-based action recognition approaches that exploit Gaussian distribution as the base component.
In contrast, the Student's t-distribution is more robust to outliers thanks to its longer tail and can be
exploited for density modelling to improve the recognition rate in the presence of abnormal data. As
another main contribution, we present an HMM which uses mixtures of t-distributions as observation
probabilities and apply it for the recognition task. The conducted experiments over the Weizmann and
MuHAVi datasets with various feature sets report a remarkable improvement of up to 9% in
classification accuracy by using HMM with mixtures of t-distributions instead of mixtures of
Gaussians. Using our own proposed sectorial extreme points feature set, we have achieved the
maximum possible classification accuracy (100%) over the Weizmann dataset. This achievement
should be considered jointly with the fact that we have used a lightweight feature set.
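The robustness argument can be made concrete with a small numerical comparison of log-densities. This uses the standard closed-form densities only, not any code from the thesis:

```python
import math

def gauss_logpdf(x):
    """Standard normal log-density."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def student_t_logpdf(x, df):
    """Standard Student's t log-density with df degrees of freedom."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(x * x / df))

# An observation 8 standard deviations out: the Gaussian penalises it
# drastically, while the longer-tailed t-distribution (df = 3) does not.
outlier = 8.0
penalty_gauss = gauss_logpdf(outlier)          # about -32.9
penalty_t = student_t_logpdf(outlier, df=3)    # far less negative
```

Because the t-distribution's log-density falls off logarithmically rather than quadratically, a single outlying observation pulls the EM parameter estimates far less.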
On a different ground, and from the implementation viewpoint, surveillance software for
automated human action recognition requires portability over a variety of platforms, from servers to
mobile devices. Current products mainly target low-level video analysis tasks, e.g. video
annotation, instead of higher-level ones, such as action recognition. Therefore, we explore the
potential of the MPEG-7 standard to provide a standard interface platform (through descriptors and
architectures) for human action recognition from surveillance cameras. As the last contribution of this
work, we present two novel MPEG-7 descriptors, one symbolic and the other feature-based, alongside
two different architectures: the server-intensive, which is more suitable for "thin" client devices, such
as PDAs, and the client-intensive, which is more appropriate for "thick" clients, such as desktops. We
evaluate the proposed descriptors and architectures by way of a scenario analysis.
We believe that through the four contributions of this thesis, human action recognition systems
have become more practical. While some contributions are specific to generative models such as the
HMM, other contributions are more general and can be exploited with other classification approaches.
We acknowledge that the entire area of human action recognition is progressing at an enormous pace,
and that other outstanding issues are being resolved by research groups world-wide. We hope that the
reader will enjoy the content of this work.
Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMMs to identify genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These are parasitic protozoa,
the causative agents of sleeping sickness and several diseases in domestic and wild
animals. These parasites have a peculiar strategy to evade the host's immune
system that consists of periodically changing their predominant cellular
surface protein (VSG). The motivation for using pattern recognition methods to
identify these genes, instead of traditional homology-based ones, is that the
level of sequence identity (amino acid and DNA sequence) amongst these genes
is often below what these methods consider reliable. Among pattern
recognition approaches, HMMs are particularly suitable to tackle this problem
because they handle the determination of gene edges more naturally. We
evaluate the performance of the model using different numbers of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes on T.
brucei can be safely identified (high sensitivity and low rate of false
positives) using HMMs.
Comment: Accepted in July 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
references.
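Scoring a genome sequence under a trained HMM is typically done with the forward algorithm. The sketch below uses an invented toy 2-state model over a 4-symbol alphabet; it illustrates the general technique only, not the article's actual model:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an
    HMM, via the scaled forward algorithm.
    pi: (S,) initial state probs; A: (S, S) transition matrix;
    B: (S, V) emission probs over a vocabulary of V symbols."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict then weight by emission
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()            # rescale to avoid underflow
    return loglik

# Toy 2-state model over a 4-symbol alphabet (e.g. A, C, G, T)
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.4, 0.4, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
score = forward_loglik([0, 1, 0, 2, 3, 3], pi, A, B)
```

Classification then amounts to comparing a sequence's score under a VSG-trained model against a background model.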
Human-aware Collaborative Manipulation with Reaching Motion Prediction
This dissertation presents a possible approach to improve human-robot interaction in an
industrial collaborative situation, where the human operator and a collaborative industrial
robot work within a shared work-space. The approach presented in this dissertation
focuses on a situation where part of the assembly process needs to be carried out by a
human operator, whose assembly station is located on a work-bench, and a robot is used
to pick and place products in specific locations on the operator's work station. Because
those locations can be accessed by either the robot or the human operator at any time,
collisions can occur. They should be avoided both to make the process more natural
for the human operator and to prevent the emergency stop of the collaborative robot,
which must then be restarted and thus decreases productivity.
In order to prevent those collisions, the proposed system defines key-areas in each of
the locations, as well as at other relevant positions for the collaborative task. The system uses
a Kinect Sensor and a neural network to track the user’s hand over time and Gaussian
Mixture Models to make predictions regarding the possible destination key-area given
the observed trajectory up to that moment. If a collision is predicted, the robot pauses the
task currently being executed in order to prevent it and, once the conflict has been
resolved, resumes operation.
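The destination prediction step can be illustrated with a deliberately simplified model: a single diagonal Gaussian per key-area (rather than the full Gaussian Mixture Model used in the dissertation), scored against the observed partial trajectory. All names and data here are invented:

```python
import numpy as np

def fit_area_models(trajectories_by_area):
    """Fit one diagonal Gaussian per key-area from example hand
    trajectories (a list of (T, 2) position arrays per area)."""
    models = {}
    for area, trajs in trajectories_by_area.items():
        pts = np.vstack(trajs)
        models[area] = (pts.mean(axis=0), pts.var(axis=0) + 1e-6)
    return models

def predict_area(models, partial_traj):
    """Return the key-area whose model best explains the observed
    partial trajectory."""
    def loglik(mean, var):
        d = partial_traj - mean
        return -0.5 * np.sum(d * d / var + np.log(2 * np.pi * var))
    return max(models, key=lambda a: loglik(*models[a]))

# Toy example: two key-areas on the work-bench
train = {
    "left_bin":  [np.random.default_rng(1).normal([0, 0], 0.1, (20, 2))],
    "right_bin": [np.random.default_rng(2).normal([5, 0], 0.1, (20, 2))],
}
models = fit_area_models(train)
dest = predict_area(models, np.array([[4.8, 0.1], [4.9, -0.05]]))
```

A real system would re-run the prediction as each new hand position arrives and pause the robot whenever the predicted key-area conflicts with its current task.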
Tracking interacting targets in multi-modal sensors
PhD
Object tracking is one of the fundamental tasks in various applications such as surveillance,
sports, video conferencing and activity recognition. Factors such as occlusions,
illumination changes and limited field of observance of the sensor make tracking a challenging
task. To overcome these challenges the focus of this thesis is on using multiple
modalities such as audio and video for multi-target, multi-modal tracking. Particularly,
this thesis presents contributions to four related research topics, namely, pre-processing of
input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking,
and interaction recognition.
To improve the performance of detection algorithms, especially in the presence
of noise, this thesis investigates filtering of the input data through spatio-temporal feature
analysis as well as through frequency band analysis. The pre-processed data from multiple
modalities is then fused within Particle filtering (PF). To further minimise the discrepancy
between the real and the estimated positions, we propose a strategy that associates the
hypotheses and the measurements with a real target, using a Weighted Probabilistic Data
Association (WPDA). Since the filtering involved in the detection process reduces the
available information and is inapplicable to low signal-to-noise-ratio data, we investigate
simultaneous detection and tracking approaches and propose a multi-target track-before-detect
Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses
the detection step and performs tracking in the raw signal. Finally, we apply the proposed
multi-modal tracking to recognise interactions between targets in regions within, as well
as outside the cameras’ fields of view.
The efficiency of the proposed approaches is demonstrated on large uni-modal,
multi-modal and multi-sensor scenarios from real-world detection, tracking and event
recognition datasets, and through participation in evaluation campaigns.
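A bootstrap particle filter of the general kind referred to above can be sketched in a few lines for a 1-D random-walk target. This is a generic textbook illustration, not the thesis's multi-modal MT-TBD-PF algorithm:

```python
import numpy as np

def particle_filter_step(particles, weights, measurement, rng,
                         process_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle of a bootstrap particle
    filter for a 1-D random-walk target."""
    # Predict: propagate particles through the motion model
    particles = particles + rng.normal(0, process_std, len(particles))
    # Update: reweight by the measurement likelihood
    weights = weights * np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    weights /= weights.sum()
    # Resample to avoid weight degeneracy
    idx = rng.choice(len(particles), len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

rng = np.random.default_rng(0)
particles = rng.uniform(-10, 10, 500)
weights = np.full(500, 1.0 / 500)
for z in [0.1, 0.3, 0.2, 0.4, 0.5]:    # noisy measurements near 0.3
    particles, weights = particle_filter_step(particles, weights, z, rng)
estimate = np.average(particles, weights=weights)
```

The multi-modal versions discussed above replace the single likelihood term with fused audio and video likelihoods, and the track-before-detect variant evaluates that likelihood directly on the raw signal.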
Descriptive temporal template features for visual motion recognition
In this paper, a human action recognition system is proposed. The system is based on new, descriptive `temporal template' features in order to achieve high-speed recognition in real-time, embedded applications. The limitations of the well-known `Motion History Image' (MHI) temporal template are addressed and a new `Motion History Histogram' (MHH) feature is proposed to capture more motion information in the video. MHH not only provides rich motion information, but also remains computationally inexpensive. To further improve classification performance, we combine both MHI and MHH into a low-dimensional feature vector which is processed by a support vector machine (SVM). Experimental results show that our new representation can achieve a significant improvement in the performance of human action recognition over existing comparable methods, which use 2D temporal template based representations.
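The classical MHI update that MHH builds on can be sketched as below. This is a generic illustration of the standard recurrence, not the paper's code; MHH additionally histograms the per-pixel binary motion patterns over time:

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=15, thresh=30):
    """One Motion History Image update: pixels that moved between
    consecutive frames are set to tau; all others decay by one
    towards zero, so recent motion appears brighter."""
    motion = np.abs(frame.astype(int) - prev_frame.astype(int)) > thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

# Toy 4x4 frames: one pixel changes between frames
f0 = np.zeros((4, 4), np.uint8)
f1 = f0.copy()
f1[1, 2] = 200
mhi = np.zeros((4, 4), int)
mhi = update_mhi(mhi, f0, f1)
```

Each update is a handful of element-wise array operations per frame, which is why temporal-template features suit real-time, embedded use.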
Omnidirectional Vision Based Topological Navigation
Goedemé T., Van Gool L., ''Omnidirectional vision based topological navigation'', Mobile Robots Navigation, pp. 172-196, Barrera Alejandra, ed., March 2010, InTech.