365 research outputs found
A review of vision-based gait recognition methods for human identification
Human identification by gait has created a great deal of interest in computer vision community due to its advantage of inconspicuous recognition at a relatively far distance. This paper provides a comprehensive survey of recent developments on gait recognition approaches. The survey emphasizes on three major issues involved in a general gait recognition system, namely gait image representation, feature dimensionality reduction and gait classification. Also, a review of the available public gait datasets is presented. The concluding discussions outline a number of research challenges and provide promising future directions for the field
Intelligent Recognition of Acoustic and Vibration Threats for Security Breach Detection, Close Proximity Danger Identification, and Perimeter Protection
This article appeared in Homeland Security Affairs (March 2011), Supplement no.3The protection of perimeters in national, agricultural, airport, prison, and military sites, and residential areas against dangerous approaching human and vehicles when using human agents to provide security is expensive or unsafe. Because of this, acoustic/vibration signature identification of approaching human and vehicles threats has attracted increased attention. This paper addresses the development and deployment of three types of acoustic and vibration based smart sensors to identify and report sequential approaching threats prior to the intrusion. More specifically, we have developed: a) acoustic based long range sensor with which vehicles' engine sound and type can be identified, b) vibration based seismic analyzer which discriminates between human footsteps and other seismic events such as those caused by animals, and c) fence breaching vibration sensor which can detect intentional disturbances on the fence and discriminate between climb, kick, rattle, and lean. All of these sensors were designed with several issues in mind, namely, optimized low power usage, a low number of false positives, small size, secure radio communication, and military specifications. The developed vibration based system was installed in an airport with unprotected shore lines in the vicinity of taxi-and run-ways. The system reported an average of less than two false positives per week and zero false negative for the duration of forty-five days. Six fence sensors were installed on the terminal area and end-of runway chain-link fences where there was possibility of intentional fence climbing. The fence sensors reported no false positives for the duration of forty-five days which included several days of seasonal storms.Approved for public release; distribution is unlimited
Inferring Facial and Body Language
Machine analysis of human facial and body language is a challenging topic in computer
vision, impacting on important applications such as human-computer interaction and visual
surveillance. In this thesis, we present research building towards computational frameworks
capable of automatically understanding facial expression and behavioural body language.
The thesis work commences with a thorough examination in issues surrounding facial
representation based on Local Binary Patterns (LBP). Extensive experiments with different
machine learning techniques demonstrate that LBP features are efficient and effective for
person-independent facial expression recognition, even in low-resolution settings. We then
present and evaluate a conditional mutual information based algorithm to efficiently learn the
most discriminative LBP features, and show the best recognition performance is obtained by
using SVM classifiers with the selected LBP features. However, the recognition is performed
on static images without exploiting temporal behaviors of facial expression.
Subsequently we present a method to capture and represent temporal dynamics of facial
expression by discovering the underlying low-dimensional manifold. Locality Preserving Projections
(LPP) is exploited to learn the expression manifold in the LBP based appearance
feature space. By deriving a universal discriminant expression subspace using a supervised
LPP, we can effectively align manifolds of different subjects on a generalised expression manifold.
Different linear subspace methods are comprehensively evaluated in expression subspace
learning. We formulate and evaluate a Bayesian framework for dynamic facial expression
recognition employing the derived manifold representation. However, the manifold representation
only addresses temporal correlations of the whole face image, does not consider
spatial-temporal correlations among different facial regions. We then employ Canonical Correlation Analysis (CCA) to capture correlations among face
parts. To overcome the inherent limitations of classical CCA for image data, we introduce
and formalise a novel Matrix-based CCA (MCCA), which can better measure correlations in
2D image data. We show this technique can provide superior performance in regression and
recognition tasks, whilst requiring significantly fewer canonical factors. All the above work
focuses on facial expressions. However, the face is usually perceived not as an isolated object
but as an integrated part of the whole body, and the visual channel combining facial and
bodily expressions is most informative.
Finally we investigate two understudied problems in body language analysis, gait-based
gender discrimination and affective body gesture recognition. To effectively combine face
and body cues, CCA is adopted to establish the relationship between the two modalities, and
derive a semantic joint feature space for the feature-level fusion. Experiments on large data
sets demonstrate that our multimodal systems achieve the superior performance in gender
discrimination and affective state analysis.Research studentship of Queen Mary, the International Travel Grant of the Royal Academy of Engineering,
and the Royal Society International Joint Project
Person re-Identification over distributed spaces and time
PhDReplicating the human visual system and cognitive abilities that the brain uses to process the
information it receives is an area of substantial scientific interest. With the prevalence of video
surveillance cameras a portion of this scientific drive has been into providing useful automated
counterparts to human operators. A prominent task in visual surveillance is that of matching
people between disjoint camera views, or re-identification. This allows operators to locate people
of interest, to track people across cameras and can be used as a precursory step to multi-camera
activity analysis. However, due to the contrasting conditions between camera views and their
effects on the appearance of people re-identification is a non-trivial task. This thesis proposes
solutions for reducing the visual ambiguity in observations of people between camera views
This thesis first looks at a method for mitigating the effects on the appearance of people under
differing lighting conditions between camera views. This thesis builds on work modelling
inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer
Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited
training samples. Unlike previous methods that use a mean-based representation for a set of
training samples, the cumulative nature of the CBTF retains colour information from underrepresented
samples in the training set. Additionally, the bi-directionality of the mapping function
is explored to try and maximise re-identification accuracy by ensuring samples are accurately
mapped between cameras.
Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing
lighting conditions within a single camera. As the CBTF requires manually labelled training
samples it is limited to static lighting conditions and is less effective if the lighting changes. This
Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting
change over time, or rely on camera transition time information to update. By utilising contextual
information drawn from the background in each camera view, an estimation of the lighting
change within a single camera can be made. This background lighting model allows the mapping
of colour information back to the original training conditions and thus remove the need for
3
retraining.
Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous
methods use a score based on a direct distance measure of set features to form a correct/incorrect
match result. Rather than offering an operator a single outcome, the ranking paradigm is to give
the operator a ranked list of possible matches and allow them to make the final decision. By utilising
a Support Vector Machine (SVM) ranking method, a weighting on the appearance features
can be learned that capitalises on the fact that not all image features are equally important to
re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues
by separating the training samples into smaller subsets and boosting the trained models.
Finally, the thesis looks at a practical application of the ranking paradigm in a real world application.
The system encompasses both the re-identification stage and the precursory extraction
and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined
to extract relevant information from the video, while several combinations of matching
techniques are combined with temporal priors to form a more comprehensive overall matching
criteria.
The effectiveness of the proposed approaches is tested on datasets obtained from a variety
of challenging environments including offices, apartment buildings, airports and outdoor public
spaces
Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection
AbstractβThis paper addresses a spatiotemporal pattern recognition problem. The main purpose of this study is to find a right representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool to inspect linear relations between two sets of vectors, to that of two multiway data arrays (or tensors). The proposed method analyzes video volumes as inputs avoiding the difficult problem of explicit motion estimation required in traditional methods and provides a way of spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification by a simple Nearest Neighbor classifier. We, moreover, propose an automatic action detection method, which performs 3D window search over an input video with action exemplars. The search is speeded up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand gesture data showed that the proposed method is significantly better than various state-ofthe-art methods with respect to accuracy. Our method has low time complexity and does not require any major tuning parameters. Index TermsβAction categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification. Γ
- β¦