5 research outputs found
Component-level aggregation of probabilistic PCA mixtures using variational-Bayes
Technical Report. This report is an extended version of our ICPR'2010 paper. This paper proposes a technique for aggregating mixtures of probabilistic principal component analyzers, which are a powerful probabilistic generative model for coping with high-dimensional, nonlinear data sets. Aggregation is carried out through Bayesian estimation with a specific prior and an original variational scheme. We demonstrate how such models may be aggregated by accessing model parameters only, rather than original data, which can be advantageous for learning from distributed data sets. Experimental results illustrate the effectiveness of the proposal.
Robust gait recognition under variable covariate conditions
PhD. Gait is a weak biometric when compared to face, fingerprint or iris because it can be easily
affected by various conditions. These are known as the covariate conditions and include clothing,
carrying, speed, shoes and view among others. In the presence of variable covariate conditions
gait recognition is a hard problem yet to be solved with no working system reported.
In this thesis, a novel gait representation, the Gait Flow Image (GFI), is proposed to extract
more discriminative information from a gait sequence. GFI extracts the relative motion of body
parts in different directions in separate motion descriptors. Compared to the existing model-free
gait representations, GFI is more discriminative and robust to changes in covariate conditions.
In this thesis, gait recognition approaches are evaluated without the assumption of cooperative
subjects, i.e. both the gallery and the probe sets consist of gait sequences under different
and unknown covariate conditions. The results indicate that the performance of the existing approaches
drops drastically under this more realistic set-up. It is argued that selecting the gait
features which are invariant to changes in covariate conditions is the key to developing a gait
recognition system without subject cooperation. To this end, the Gait Entropy Image (GEnI) is
proposed to perform automatic feature selection on each pair of gallery and probe gait sequences.
Moreover, an Adaptive Component and Discriminant Analysis is formulated which seamlessly
integrates the feature selection method with subspace analysis for fast and robust recognition.
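To make the idea of an entropy-based gait representation concrete, the following is a minimal sketch. It assumes that a Gait Entropy Image can be computed as the per-pixel Shannon entropy of binarised silhouettes over a gait cycle; the abstract does not spell out the definition, so the function name `gait_entropy_image` and the toy frames are illustrative assumptions rather than the thesis's implementation.

```python
import math


def gait_entropy_image(silhouettes):
    """Per-pixel Shannon entropy (in bits) over a sequence of binary silhouettes.

    `silhouettes` is a list of frames; each frame is a list of rows of 0/1 values.
    Assumption: entropy is taken over the foreground/background distribution
    of each pixel across the frames.
    """
    n_frames = len(silhouettes)
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    entropy = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Fraction of frames in which this pixel is foreground.
            p_fg = sum(frame[y][x] for frame in silhouettes) / n_frames
            h = 0.0
            for p in (p_fg, 1.0 - p_fg):
                if p > 0.0:  # 0 * log(0) is taken as 0
                    h -= p * math.log2(p)
            entropy[y][x] = h
    return entropy


# Toy 2x2 sequence (hypothetical data): one pixel is always on,
# one toggles between frames, and two are always off.
frames = [
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
]
geni = gait_entropy_image(frames)
```

In this sketch, static pixels receive zero entropy and pixels that change over the cycle receive high entropy, so the image separates the dynamic regions of the body from the static ones.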
Among various factors that affect the performance of gait recognition, change in viewpoint
poses the biggest problem and is treated separately. A novel approach to address this problem is
proposed in this thesis by using Gait Flow Image in a cross view gait recognition framework with
the view angle of a probe gait sequence unknown. A Gaussian Process classification technique
is formulated to estimate the view angle of each probe gait sequence. To measure the similarity
of gait sequences across view angles, the correlation of gait sequences from different views is
modelled using Canonical Correlation Analysis and the correlation strength is used as a similarity
measure. This differs from existing approaches, which reconstruct gait features in different views
through 2D view transformation or 3D calibration. Without explicit reconstruction, the proposed
method can cope with feature mismatch across views and is more robust against feature noise.
Spatial and temporal background modelling of non-stationary visual scenes
PhD. The prevalence of electronic imaging systems in everyday life has become increasingly apparent
in recent years. Applications are to be found in medical scanning, automated manufacture, and
perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic
management all employ and benefit from an unprecedented quantity of video cameras for monitoring
purposes. But the high cost and limited effectiveness of employing humans as the final
link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques.
Whilst the field of machine vision has enjoyed consistent rapid development in the last
20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner.
Central to a great many vision applications is the concept of segmentation, and in particular,
most practical systems perform background subtraction as one of the first stages of video
processing. This involves separation of ‘interesting foreground’ from the less informative but
persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and
liable to be application specific. Furthermore, the background may be interpreted as including
the visual appearance of normal activity of any agents present in the scene, human or otherwise.
Thus a background model might be called upon to absorb lighting changes, moving trees and
foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in
‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails
of the computer vision field, and consequently the subject has received considerable attention.
This thesis sets out to address some of the limitations of contemporary methods of background
segmentation by investigating methods of inducing local mutual support amongst pixels
in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the short-term
time domain, and (3) locality in the domain of cyclic repetition frequency.
Conventional per pixel models, such as those based on Gaussian Mixture Models, offer no
spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose
a structure in which every image pixel bears the same relation to every other pixel. But Markov
Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and
are used here to facilitate a novel structure capable of exploiting probabilistic local co-occurrence
of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple
learned local pattern hypotheses, whilst relying solely on monochrome image data.
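For readers unfamiliar with Local Binary Patterns, here is a minimal sketch of the basic 8-neighbour operator on a monochrome image. It shows only the per-pixel descriptor, not the Markov Random Field structure described above. The bit ordering (clockwise from the top-left neighbour) and the function name `lbp_code` are assumptions made for illustration.

```python
def lbp_code(image, y, x):
    """Basic 8-neighbour Local Binary Pattern code for the pixel at (y, x).

    `image` is a list of rows of grey levels; (y, x) must not lie on the border.
    Assumption: bits are assigned clockwise starting from the top-left neighbour.
    """
    centre = image[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


# Hypothetical 3x3 patch with bright corners around a mid-grey centre.
patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]
# The same patch under a uniform brightness increase.
brighter = [[value + 10 for value in row] for row in patch]

code_original = lbp_code(patch, 1, 1)
code_brighter = lbp_code(brighter, 1, 1)
```

Because the code depends only on the ordering of grey levels, both patches yield the same pattern, which suggests why such descriptors suit background models that must absorb lighting changes.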
Many background models enforce temporal consistency constraints on a pixel in an attempt to
confirm background membership before being accepted as part of the model, and typically some
control over this process is exercised by a learning rate parameter. But in busy scenes, a true
background pixel may be visible for a relatively small fraction of the time and in a temporally
fragmented fashion, thus hindering such background acquisition. However, support in terms of
temporal locality may still be achieved by using Combinatorial Optimization to derive short-term
background estimates which induce a similar consistency, but are considerably more robust
to disturbance. A novel technique is presented here in which the short-term estimates act as
‘pre-filtered’ data from which a far more compact eigen-background may be constructed.
Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions
employing traffic signals are among these, yet little is to be found amongst the literature regarding
the explicit modelling of such periodic processes in a scene. Previous work focussing on gait
recognition has demonstrated approaches based on recurrence of self-similarity by which local
periodicity may be identified. The present work harnesses and extends this method in order
to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal
model. The model may then be used to highlight abnormality in scene activity. Furthermore, a
Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to
maintain correct synchronization with scene activity in spite of noise and drift of periodicity.
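As a rough illustration of identifying periodicity in scene activity, the sketch below estimates the dominant period of a one-dimensional activity signal from its autocorrelation. This is a simple stand-in, not the self-similarity recurrence method or the Phase Locked Loop described above; the function name `dominant_period` and the synthetic traffic-signal sequence are assumptions.

```python
def dominant_period(signal, min_lag=2):
    """Return the lag (in samples) with the highest mean autocorrelation.

    Assumption: the signal is roughly stationary and covers several periods.
    Ties are resolved in favour of the shortest lag.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        # Mean product of the signal with a copy of itself shifted by `lag`.
        score = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / (n - lag)
        if score > best_score:  # strict comparison keeps the shortest lag on ties
            best_lag, best_score = lag, score
    return best_lag


# Synthetic signal (hypothetical): a junction that is active for 4 samples
# and idle for 4 samples, repeated six times.
activity = [1, 1, 1, 1, 0, 0, 0, 0] * 6
period = dominant_period(activity)
```

Applied per pixel or per region, an estimate of this kind could seed a spatio-temporal model of the scene, although it would still need a synchronization mechanism to cope with noise and drift.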
This thesis contends that these three approaches are all manifestations of the same broad
underlying concept: local support in each of the space, time and frequency domains, and furthermore,
that the support can be harnessed practically, as will be demonstrated experimentally.
Individual and group dynamic behaviour patterns in bound spaces
The behaviour analysis of individual and group dynamics in closed spaces is a subject of extensive research in both academia and industry. However, despite recent technological advancements, implementing the existing methods for visual behaviour data analysis in production systems remains difficult, and applications are available only in special cases in which resourcing is not a problem. Most approaches concentrate on direct extraction and classification of visual features from the video footage in order to recognise the dynamic behaviour directly from the source. Such an approach allows the elementary actions of moving objects to be recognised directly, which is a difficult task on its own. The major factor that impacts the performance of video analytics methods is the need to combine the processing of an enormous volume of video data with complex analysis of this data using computationally resource-demanding analytical algorithms. This is not feasible for many applications, which must work in real time. In this research, an alternative simulation-based approach to behaviour analysis has been adopted. It can potentially reduce the requirements for extracting information from real video footage for the purpose of analysing the dynamic behaviour. This can be achieved by combining only limited data extracted from the original video footage with symbolic data about the events registered in the scene, which is generated by a 3D simulation synchronized with the original footage. Additionally, by incorporating some physical laws and the logic of dynamic behaviour directly in the 3D model of the visual scene, this framework allows behavioural patterns to be captured using simple syntactic pattern recognition methods.
The extensive experiments with the prototype implementation demonstrate convincingly that the 3D simulation generates sufficiently rich data to allow the dynamic behaviour to be analysed in real time with sufficient adequacy, without the need for precise physical data, using only limited data about the objects in the scene, their location and their dynamic characteristics. This research can have wide applicability in different areas where video analytics is necessary, ranging from public safety and video surveillance to marketing research, computer games and animation. Its limitations are linked to its dependence on some preliminary processing of the video footage, which is nevertheless less detailed and less computationally demanding than the methods that directly use the video frames of the original footage.
Model Selection for Unsupervised Learning of Visual Context
DOI: 10.1007/s11263-005-5024-8
Abstract. This study addresses the problem of choosing the most suitable probabilistic model selection criterion for unsupervised learning of visual context of a dynamic scene using mixture models. A rectified Bayesian Information Criterion (BICr) and a Completed Likelihood Akaike’s Information Criterion (CL-AIC) are formulated to estimate the optimal model order (complexity) for a given visual scene. Both criteria are designed to overcome poor model selection by existing popular criteria when the data sample size varies from small to large and the true mixture distribution kernel functions differ from the assumed ones. Extensive experiments on learning visual context for dynamic scene modelling are carried out to demonstrate the effectiveness of BICr and CL-AIC, compared to that of existing popular model selection criteria including BIC, AIC and Integrated Completed Likelihood (ICL). Our study suggests that for learning visual context using a mixture model, BICr is the most appropriate criterion given sparse data, while CL-AIC should be chosen given moderate or large data sample sizes.
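For context, the standard BIC and AIC baselines named in the abstract can be sketched as follows. The abstract does not define BICr or CL-AIC, so they are not reproduced here. The helper `select_order` and the log-likelihood values are hypothetical, chosen only to show how the two penalties can disagree on a small sample.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: -2 log L + k log n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_order(candidates, n_samples, criterion="bic"):
    """Pick the model order with the lowest criterion value.

    `candidates` maps a model order K to (maximised log-likelihood, parameter count).
    """
    def score(order):
        log_likelihood, n_params = candidates[order]
        if criterion == "bic":
            return bic(log_likelihood, n_params, n_samples)
        return aic(log_likelihood, n_params)

    return min(candidates, key=score)


# Made-up log-likelihoods for 1-D Gaussian mixtures with K components,
# each having 3K - 1 free parameters, fitted to 50 samples.
candidates = {1: (-520.0, 2), 2: (-480.0, 5), 3: (-476.0, 8)}
order_bic = select_order(candidates, n_samples=50, criterion="bic")
order_aic = select_order(candidates, n_samples=50, criterion="aic")
```

With these illustrative numbers, BIC's heavier penalty favours the two-component model while AIC prefers three components, which is the kind of sample-size-dependent disagreement that motivates comparing criteria.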