From visual saliency to video behaviour understanding
In a world of ever-increasing amounts of video data, we are forced to abandon traditional
methods of fully manual scene interpretation. Under such circumstances, some form
of automation is highly desirable, but this is a very open-ended problem of high complexity.
Dealing with such large amounts of data is a non-trivial task that requires efficient selective
extraction of parts of a scene which have the potential to develop a higher semantic meaning,
alone, or in combination with others. In particular, the types of video data that are in
need of automated analysis tend to be outdoor scenes with high levels of activity generated
from either foreground or background. Such dynamic scenes add considerable complexity
to the problem since we cannot rely on motion energy alone to detect regions of interest.
Furthermore, the behaviour of these regions of motion can differ greatly while remaining
highly dependent, both spatially and temporally, on the movement of other objects within
the scene. Modelling these dependencies, whilst eliminating as much redundancy as possible
from the feature extraction process, is the challenge addressed by this thesis.
In the first half, finding the right mechanism to extract and represent meaningful features
from dynamic scenes with no prior knowledge is investigated. Meaningful or salient information
is treated as the parts of a scene that stand out or seem unusual or interesting to
us. The novelty of the work is that it is able to select salient scales in both space and time
in which a particular spatio-temporal volume is considered interesting relative to the rest of
the scene. By quantifying the temporal saliency values of regions of motion, it is possible to
consider their importance in terms of both the long and short-term. Variations in entropy
over spatio-temporal scales are used to select a context dependent measure of the local scene
dynamics. A method of quantifying temporal saliency is devised based on the variation of
the entropy of the intensity distribution in a spatio-temporal volume over incraeasing scales.
Entropy is used over traditional filter methods since the stability or predictability of the intensity
distribution over scales of a local spatio-temporal region can be defined more robustly
relative to the context of its neighbourhood, even for regions exhibiting high intensity variation
due to heavy texturing. Results show that it is possible to extract both locally
salient features and globally salient temporal features from contrasting scenarios.
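The entropy-over-scales idea can be illustrated with a minimal sketch, assuming a greyscale video held as a NumPy array of shape (time, height, width) with intensities in [0, 1]; the cubic neighbourhoods, 32-bin histogram, and variance-based saliency score are illustrative choices, not the thesis's exact formulation:

```python
import numpy as np

def intensity_entropy(volume):
    """Shannon entropy of the intensity histogram of a spatio-temporal volume."""
    hist, _ = np.histogram(volume, bins=32, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_scale_profile(video, t, y, x, scales=(2, 4, 6, 8)):
    """Entropy of cubic neighbourhoods of increasing scale around (t, y, x)."""
    profile = []
    for s in scales:
        cube = video[max(t - s, 0):t + s + 1,
                     max(y - s, 0):y + s + 1,
                     max(x - s, 0):x + s + 1]
        profile.append(intensity_entropy(cube))
    return profile

def saliency(video, t, y, x):
    """Treat a point as salient when its entropy varies strongly over scale
    (here measured simply as the variance of the scale profile)."""
    return float(np.var(entropy_scale_profile(video, t, y, x)))
```

A flat region yields the same (low) entropy at every scale and so scores near zero, while a region whose statistics change with scale stands out relative to its context.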
In the second part of the thesis, focus shifts towards binding these spatio-temporally
salient features together so that some semantic meaning can be inferred from their interaction.
Interaction, in this sense, refers to any form of temporally correlated behaviour between
any salient regions of motion in a scene. Feature binding as a mechanism for interactive
behaviour understanding is particularly important if we consider that regions of interest may
not be treated as particularly significant individually, but represent much more semantically
when considered in combination. Temporally correlated behaviour is identified and classified
using accumulated co-occurrences of salient features at two levels. Firstly, co-occurrences are
accumulated for spatio-temporally proximate salient features to form a local representation.
Then, at the next level, the co-occurrences of these locally spatio-temporally bound features
are accumulated again in order to discover unusual behaviour in the scene. The novelty of
this work is that there are no assumptions made about whether interacting regions should be
spatially proximate. Furthermore, no prior knowledge of the scene topology is used. Results
show that it is possible to detect unusual interactions between regions of motion, from
which higher levels of semantics can be inferred.
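The accumulation step can be sketched in Python; the event representation (time-stamped labels of salient regions), the temporal window, and the rarity threshold are assumptions for illustration. Note that, as in the text, no spatial proximity between the regions is required:

```python
from collections import Counter

def accumulate_cooccurrences(events, window=5):
    """events: (time, label) detections of salient regions of motion.
    Counts label pairs whose detections fall within `window` time steps;
    no spatial proximity is assumed."""
    counts = Counter()
    events = sorted(events)
    for i, (t_i, a) in enumerate(events):
        for t_j, b in events[i + 1:]:
            if t_j - t_i > window:
                break  # events are time-sorted, so later ones are further away
            counts[tuple(sorted((a, b)))] += 1
    return counts

def unusual_pairs(counts, threshold=1):
    """Pairs accumulated rarely are candidates for unusual interaction."""
    return [pair for pair, c in counts.items() if c <= threshold]
```

Pairs that co-occur often form the model of normal joint behaviour; rarely accumulated pairs are flagged as unusual.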
In the final part of the thesis, a more specific investigation of human behaviour is addressed
through the classification and detection of interactions between two human subjects. Here, further
modifications are made to the feature extraction process in order to quantify the spatio-temporal
saliency of a region of motion. These features are then grouped to find the people
in the scene. Then, a loose pose distribution model is extracted for each person, and
canonical correlation analysis is used to find salient correlations between the poses of
two interacting people. The resulting canonical factors can be formed into trajectories and used for classification.
Levenshtein distance is then used to categorise the features. The novelty of the work is that
the interactions do not have to be spatially connected or proximate for them to be recognised.
Furthermore, the data used is outdoors and cluttered with non-stationary background. Results
show that co-occurrence techniques have the potential to provide a more generalised,
compact, and meaningful representation of dynamic interactive scene behaviour.
Funded by the EPSRC and part-funded by QinetiQ Ltd; a travel grant was also contributed by the RAEng.
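The final classification step of this last part can be sketched as follows. Computing the canonical factors themselves is omitted; the fixed-bin quantisation over [-1, 1] (assumed because canonical correlations lie in that range) and the nearest-neighbour rule are illustrative choices:

```python
import numpy as np

def levenshtein(a, b):
    """Edit distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def quantise(trajectory, n_bins=4, lo=-1.0, hi=1.0):
    """Map a canonical-factor trajectory to a string of bin symbols."""
    idx = np.clip(((np.asarray(trajectory) - lo) / (hi - lo) * n_bins).astype(int),
                  0, n_bins - 1)
    return ''.join(chr(ord('a') + k) for k in idx)

def classify(trajectory, labelled_examples):
    """Nearest labelled trajectory under Levenshtein distance."""
    q = quantise(trajectory)
    return min(labelled_examples,
               key=lambda ex: levenshtein(q, quantise(ex[0])))[1]
```

Quantising the trajectories into symbol strings makes the edit distance applicable, tolerating local stretching and small deviations between two executions of the same interaction.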
Learning object behaviour models
The human visual system is capable of interpreting a remarkable variety of often subtle, learnt, characteristic behaviours. For instance we can determine the gender of a distant walking figure from their gait, interpret a facial expression as that of surprise, or identify suspicious behaviour in the movements of an individual within a car-park. Machine vision systems wishing to exploit such behavioural knowledge have been limited by the inaccuracies inherent in hand-crafted models and the absence of a unified framework for the perception of powerful behaviour models.
The research described in this thesis attempts to address these limitations, using a statistical modelling approach to provide a framework in which detailed behavioural knowledge is acquired from the observation of long image sequences. The core of the behaviour modelling framework is an optimised sample-set representation of the probability density in a behaviour space defined by a novel temporal pattern formation strategy.
This representation of behaviour is both concise and accurate and facilitates the recognition of actions or events and the assessment of behaviour typicality. The inclusion of generative capabilities is achieved via the addition of a learnt stochastic process model, thus facilitating the generation of predictions and realistic sample behaviours. Experimental results demonstrate the acquisition of behaviour models and suggest a variety of possible applications, including automated visual surveillance, object tracking, gesture recognition, and the generation of realistic object behaviours within animations, virtual worlds, and computer generated film sequences.
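The sample-set representation of the behaviour-space density, and the typicality test built on it, can be sketched with a simple kernel density estimate; the Gaussian kernel, bandwidth, and threshold here are illustrative assumptions rather than the optimised representation described above:

```python
import numpy as np

def kernel_density(samples, x, bandwidth=0.5):
    """Density estimate at x from a sample-set representation of the
    behaviour-space pdf (Gaussian kernels, illustrative bandwidth)."""
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    d = samples.shape[1]
    diff = samples - x
    sq = np.sum(diff * diff, axis=1) / (2.0 * bandwidth ** 2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return float(np.mean(np.exp(-sq)) / norm)

def is_typical(samples, x, threshold=1e-3):
    """A behaviour is judged typical when its estimated density is high."""
    return kernel_density(samples, x) > threshold
```

Recognition then amounts to evaluating the density of an observed behaviour under each learnt model, and typicality assessment to thresholding that density.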
The utility of the behaviour modelling framework is further extended through the modelling of object interaction. Two separate approaches are presented, and a technique is developed which, using learnt models of joint behaviour together with a stochastic tracking algorithm, can be used to equip a virtual object with the ability to interact in a natural way. Experimental results demonstrate the simulation of a plausible virtual partner during interaction between a user and the machine.
Aprendizagem automática aplicada à deteção de pessoas baseada em radar (Machine learning applied to radar-based people detection)
The present dissertation describes the development and implementation of a
radar-based system with the purpose of being able to detect people amidst
other objects that are moving in an indoor scenario. The detection methods
implemented exploit radar data that is processed by a system that includes the
data acquisition, the pre-processing of the data, the feature extraction, and the
application of these data to machine learning models specifically designed to
attain the objective of target classification.
Beyond the basic theoretical research necessary for its successful development,
the work contemplates an important component of software development
and experimental tests. Among others, the following topics were covered
in this dissertation: the study of radar working principles and hardware; radar
signal processing; techniques of clutter removal, feature extraction, and data
clustering applied to radar signals; implementation and hyperparameter tuning
of machine learning classification systems; study of multi-target detection and
tracking methods.
The people detection application was tested in different indoor scenarios that
include a static radar and a radar dynamically deployed by a mobile robot. This
application can be executed in real time and perform multiple target detection
and classification using basic clustering and tracking algorithms. A study of
the effects of multiple-target detection on the performance of the application
is presented, together with an assessment of the efficiency of the different
classification methods.
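As a minimal sketch of the classification stage, one might summarise each cluster of radar detections with hand-picked features and classify it; the feature set (point count, spatial spread, mean Doppler velocity) and the nearest-centroid rule are illustrative stand-ins for the tuned machine-learning models described above:

```python
import numpy as np

def cluster_features(points):
    """Illustrative features for a radar point cluster:
    (point count, spatial spread, mean Doppler velocity).
    `points` is an (N, 3) array of (x, y, doppler) detections."""
    points = np.asarray(points, dtype=float)
    spread = float(np.mean(np.std(points[:, :2], axis=0)))
    return np.array([len(points), spread, float(np.mean(points[:, 2]))])

class NearestCentroid:
    """Minimal stand-in for the tuned classifiers described above."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = {c: np.mean([x for x, t in zip(X, y) if t == c], axis=0)
                          for c in self.labels}
        return self

    def predict(self, x):
        return min(self.labels,
                   key=lambda c: np.linalg.norm(x - self.centroids[c]))
```

A moving person typically produces many detections with a characteristic Doppler spread, which is what lets even such a simple feature vector separate people from static clutter in easy cases.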
The envisaged applications of the proposed detection system include intrusion
detection in indoor environments and acquisition of anonymized data for
people tracking and counting in public spaces such as hospitals and schools.
Mestrado em Engenharia de Computadores e Telemática (Master's in Computer and Telematics Engineering)
Vision-based techniques for gait recognition
Global security concerns have driven a proliferation of video surveillance
devices. Intelligent surveillance systems seek to discover possible threats
automatically and raise alerts. Being able to identify the surveyed object can
help determine its threat level. The current generation of devices provides
digital video data that can be analysed for time-varying features to assist in the
identification process. Commonly, people queue up to access a facility and
approach a video camera in full frontal view. In this environment, a variety of
biometrics are available - for example, gait, which includes temporal features
such as stride period. Gait can be measured unobtrusively at a distance. The video
data will also include face features, which are short-range biometrics. In this
way, one can combine biometrics naturally using one set of data. In this paper
we survey current techniques of gait recognition and modelling with the
environment in which the research was conducted. We also discuss in detail the
issues arising from deriving gait data, such as perspective and occlusion
effects, together with the associated computer vision challenges of reliable
tracking of human movement. Then, after highlighting these issues and
challenges related to gait processing, we proceed to discuss the frameworks
combining gait with other biometrics. We then provide motivations for a novel
paradigm in biometrics-based human recognition, i.e. the use of the
fronto-normal view of gait as a far-range biometric combined with biometrics
operating at a near distance.
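Stride period, the temporal gait feature mentioned above, is often estimated from the periodicity of a per-frame signal such as silhouette width; a minimal autocorrelation-based sketch, where the choice of signal and the minimum lag are illustrative assumptions:

```python
import numpy as np

def stride_period(signal, min_lag=5):
    """Estimate the dominant period of a gait signal (e.g. silhouette width
    per frame) as the lag of the strongest autocorrelation peak beyond min_lag."""
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()                     # remove DC component
    ac = np.correlate(s, s, mode='full')[len(s) - 1:]  # lags 0..N-1
    ac /= ac[0]                          # normalise by zero-lag energy
    return int(min_lag + np.argmax(ac[min_lag:]))
```

The minimum-lag guard skips the trivial peak at lag zero; in practice the signal would first need reliable tracking and segmentation, which is exactly where the occlusion and perspective issues discussed above arise.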
Achieving illumination invariance using image filters
In this chapter we describe a novel framework for automatic face recognition in the presence of varying illumination, primarily applicable to matching face sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. By performing all numerically intensive computation offline, our method (i) retains the matching efficiency of simple image filters and (ii) achieves greatly increased robustness, as all online processing is performed in closed form. Evaluated on a large, real-world data corpus, the proposed framework was shown to be successful in video-based recognition across a wide range of illumination, pose and face motion pattern change.
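The filter-competition idea can be sketched as follows: both the unprocessed greyscale and a filtered version contribute a matching score, and the better one is kept. The box-filter high-pass, the normalised cross-correlation score, and the max-fusion rule are illustrative assumptions, not the chapter's exact method:

```python
import numpy as np

def high_pass(img):
    """Simple high-pass filter: image minus a local (box) smoothing,
    suppressing slowly varying illumination."""
    k = 5
    pad = np.pad(img, k // 2, mode='edge')
    smooth = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            smooth += pad[dy:dy + h, dx:dx + w]
    return img - smooth / (k * k)

def ncc(a, b):
    """Normalised cross-correlation as a matching score."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b))

def fused_score(a, b):
    """Let the filtered and unprocessed representations compete:
    keep whichever channel matches better (illustrative fusion rule)."""
    return max(ncc(a, b), ncc(high_pass(a), high_pass(b)))
```

Under benign lighting the raw greyscale score dominates, while under strong illumination change the high-pass channel can win, which is the intuition behind letting the two representations compete.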