Learning Discriminative Feature Representations for Visual Categorization
Learning discriminative feature representations has attracted a great deal of attention due to its potential value and wide usage in a variety of areas, such as image/video recognition and retrieval, human activity analysis, intelligent surveillance and human-computer interaction.
In this thesis, we first introduce a new boosted key-frame selection scheme for action recognition. Specifically, we propose to select a subset of key poses to represent each action via AdaBoost, and a new classifier, namely WLNBNN, is then developed for final classification. The proposed method outperforms previous work by 0.6% to 13.2%. Next, a domain-adaptive learning approach based on multiobjective genetic programming (MOGP) is developed for image classification. In this method, a set of primitive 2-D operators is randomly combined to construct feature descriptors through MOGP evolution, and the descriptors are evaluated by two objective fitness criteria, i.e., the classification error and the tree complexity. The (near-)optimal feature descriptor is then obtained. The proposed approach achieves 0.9% to 25.9% better performance than state-of-the-art methods. Moreover, effective dimensionality reduction algorithms have also been widely used to obtain better representations. In this thesis, we propose a novel linear unsupervised algorithm, termed Discriminative Partition Sparsity Analysis (DPSA), which explicitly considers the different probability distributions that exist over the data points while preserving the natural locality relationships among the data. All of the above methods have been systematically evaluated on several public datasets, showing accurate and robust performance (0.44% to 6.69% better than previous methods) for action and image categorization. Targeting efficient image classification, we also introduce a novel unsupervised framework termed evolutionary compact embedding (ECE), which automatically learns task-specific binary hash codes. It can be regarded as an optimization algorithm that combines genetic programming (GP) with a boosting trick. The experimental results show that ECE outperforms other methods by 1.58% to 2.19% on classification tasks. In addition, a supervised framework, bilinear local feature hashing (BLFH), is proposed to learn highly discriminative binary codes from local descriptors for large-scale image similarity search. We formulate this as a nonconvex optimization problem that seeks orthogonal projection matrices for hashing, which preserve the pairwise similarity between local features while also taking image-to-class (I2C) distances into account. BLFH produces outstanding results (0.017% to 0.149% better) compared with state-of-the-art hashing techniques.
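The MOGP approach scores each evolved descriptor on two objectives (classification error and tree complexity) and keeps the (near-)optimal one. A standard way to compare candidates under two objectives is Pareto dominance. The sketch below is a generic illustration under that assumption, not the thesis's implementation: `Candidate`, the example trees, their scores, and the final "lowest error on the front" selection rule are all hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical evolved descriptor scored on the two MOGP objectives."""
    name: str
    error: float  # classification error, lower is better
    size: int     # tree complexity (e.g. node count), lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on at least one."""
    no_worse = a.error <= b.error and a.size <= b.size
    strictly_better = a.error < b.error or a.size < b.size
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, preserving input order."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


population = [
    Candidate("tree_a", 0.20, 15),
    Candidate("tree_b", 0.18, 31),
    Candidate("tree_c", 0.25, 9),
    Candidate("tree_d", 0.22, 20),  # dominated by tree_a (worse error and larger tree)
]
front = pareto_front(population)
# Hypothetical selection rule: lowest error on the front, ties broken by size.
best = min(front, key=lambda c: (c.error, c.size))
print([c.name for c in front], best.name)  # ['tree_a', 'tree_b', 'tree_c'] tree_b
```

A candidate never dominates itself (it is not strictly better than itself), so the list comprehension needs no self-exclusion check.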
GESTURE RECOGNITION FOR PENCAK SILAT TAPAK SUCI REAL-TIME ANIMATION
The main goal of this research is to design a real-time virtual martial arts training system that also serves as a tool for learning martial arts independently, using genetic algorithm methods and dynamic time warping. This paper covers the initial stage, which focuses on collecting data sets of martial arts warriors using 3D animation and Kinect sensor cameras: 2 warriors × 8 moves × 596 cases per gesture = 9,536 cases. Gesture recognition studies usually distinguish between body gestures, hand and arm gestures, and head and face gestures; in pencak silat, all three can be studied simultaneously through martial arts stance detection with scoring methods. Silat movement data is recorded as ONI files using the OpenNI™ (OFW) framework and as BVH (Biovision Hierarchy) files, together with plug-in support software on mocap devices. Responsiveness, a measure of the time taken to respond to interruptions, is critical because the system must be able to meet the demand
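Dynamic time warping (DTW), named above as one of the methods, aligns two motion sequences that differ in speed before comparing them. The following is a minimal sketch of classic DTW with a nearest-template classifier, assuming each move is reduced to a single 1-D joint-angle trajectory; the `templates` dictionary and its values are hypothetical, and real Kinect skeleton data would be multi-dimensional per frame.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW between two 1-D sequences, using |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(query: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Nearest-template rule: the label whose template has the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical 1-D joint-angle trajectories, one template per move.
templates = {
    "punch": [0, 1, 2, 3, 2, 1, 0],
    "kick": [0, 0, 3, 3, 0, 0, 0],
}
query = [0, 0, 1, 2, 3, 2, 1, 0]  # a slightly slower "punch"
print(classify(query, templates))  # punch
```

Because DTW lets one sequence repeat samples, the slower query aligns to the punch template at zero cost, which a plain frame-by-frame Euclidean distance could not do for sequences of different lengths.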
A fuzzy probabilistic inference methodology for constrained 3D human motion classification
Enormous uncertainties in unconstrained human motions lead to a fundamental challenge that many recognition algorithms face in practice: efficient and correct motion recognition is a demanding task, especially when human kinematic motions are subject to variations of execution in the spatial and temporal domains, heavily overlap with each other, and are occluded. Due to the lack of a good solution to these problems, many existing methods tend to be either effective but computationally intensive, or efficient but vulnerable to misclassification. This thesis presents a novel inference engine for recognising occluded 3D human motion, assisted by the recognition context. First, uncertainties are wrapped into a fuzzy membership function via a novel method called Fuzzy Quantile Generation, which employs metrics derived from the probabilistic quantile function. Then, time-dependent and context-aware rules are produced via genetic programming to smooth the qualitative outputs represented by fuzzy membership functions. Finally, occlusion in motion recognition is handled by introducing new procedures for feature selection and feature reconstruction. Experimental results on motion capture data from real boxers demonstrate the effectiveness of the proposed framework in terms of fuzzy membership generation, context-aware rule generation, and motion occlusion. Future work might involve extending Fuzzy Quantile Generation to automate the choice of a probability distribution, enhancing temporal pattern recognition with probabilistic paradigms, optimising the occlusion module, and adapting the present framework to different application domains.
EThOS - Electronic Theses Online Service (GB, United Kingdom)
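To make the idea of deriving a fuzzy membership function from the quantile function concrete, here is a deliberately simple, hypothetical construction: a triangular membership whose feet sit at the 5th and 95th empirical percentiles and whose peak sits at the median. This is an illustration of quantile-based membership in general, not the thesis's Fuzzy Quantile Generation; the percentile choices and the sample data are assumptions.

```python
import statistics
from typing import Callable, Sequence


def triangular_from_quantiles(
    samples: Sequence[float], lower_pct: int = 5, upper_pct: int = 95
) -> Callable[[float], float]:
    """Build a triangular membership function from empirical quantiles.

    Feet at the lower/upper percentiles, peak at the median. The percentile
    choices are illustrative assumptions, not values from the thesis.
    """
    cuts = statistics.quantiles(samples, n=100)  # cuts[k - 1] is the k-th percentile
    low, peak, high = cuts[lower_pct - 1], statistics.median(samples), cuts[upper_pct - 1]
    if not low < peak < high:
        raise ValueError("samples too concentrated to form a triangle")

    def membership(x: float) -> float:
        if x <= low or x >= high:
            return 0.0
        if x <= peak:
            return (x - low) / (peak - low)
        return (high - x) / (high - peak)

    return membership


# Hypothetical feature values (e.g. a joint speed) observed for one motion class.
mu = triangular_from_quantiles(list(range(1, 101)))
print(mu(50.5), mu(0), round(mu(30), 3))  # 1.0 0.0 0.549
```

Using quantiles rather than the sample minimum and maximum keeps a few outlying frames from stretching the membership function.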
Uniscale and multiscale gait recognition in realistic scenario
The performance of a gait recognition method is affected by numerous challenging factors that degrade its reliability as a behavioural biometric for subject identification in realistic scenarios. Thus, for effective visual surveillance, this thesis presents five gait recognition methods that address various challenging factors to reliably identify a subject in realistic scenarios with low computational complexity. It presents a gait recognition method that analyses the spatio-temporal motion of a subject with statistical and physical parameters using Procrustes shape analysis and elliptic Fourier descriptors (EFD). It introduces a part-based EFD analysis to achieve invariance to carrying conditions, and the use of physical parameters enables it to achieve invariance to across-day gait variation. Although the spatio-temporal deformation of a subject's shape in gait sequences provides better discriminative power than its kinematics, the inclusion of dynamic motion characteristics improves the identification rate. Therefore, the thesis presents a gait recognition method that combines the spatio-temporal shape and dynamic motion characteristics of a subject to achieve robustness against the maximum number of challenging factors compared with related state-of-the-art methods. A region-based gait recognition method that analyses a subject's shape in image and feature spaces is presented to achieve invariance to clothing variation and carrying conditions. To account for the arbitrary moving directions of a subject in realistic scenarios, a gait recognition method must be robust against variation in view. Hence, the thesis presents a robust view-invariant multiscale gait recognition method. Finally, the thesis proposes a gait recognition method based on low spatial and low temporal resolution video sequences captured by CCTV. The computational complexity of each method is analysed. Experimental analyses on public datasets demonstrate the efficacy of the proposed methods.
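Procrustes shape analysis compares silhouettes after removing translation, scale and rotation. A common formulation represents a planar shape as a vector of complex landmarks x + iy; the full Procrustes distance then follows from the modulus of a complex inner product. The sketch below assumes landmarks have already been extracted from the silhouette boundary and are in point-to-point correspondence. The example shapes are hypothetical, and the thesis's exact pipeline (e.g. how mean shapes per gait cycle are computed) is not reproduced here.

```python
import cmath
import math
from typing import List, Sequence


def _centre_and_scale(shape: Sequence[complex]) -> List[complex]:
    """Remove translation (centroid) and scale (unit centroid size)."""
    centroid = sum(shape) / len(shape)
    centred = [p - centroid for p in shape]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    if size == 0:
        raise ValueError("degenerate shape: all landmarks coincide")
    return [p / size for p in centred]


def full_procrustes_distance(z1: Sequence[complex], z2: Sequence[complex]) -> float:
    """Full Procrustes distance between two planar landmark configurations.

    Landmarks are complex numbers x + iy in point-to-point correspondence.
    Rotation is removed by taking the modulus of the complex inner product.
    """
    if len(z1) != len(z2):
        raise ValueError("shapes need the same number of landmarks")
    u, v = _centre_and_scale(z1), _centre_and_scale(z2)
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


square = [0, 1, 1 + 1j, 1j]
# Same square, rotated by 0.7 rad, scaled by 2 and translated.
moved = [3 + 2j + 2 * cmath.exp(0.7j) * p for p in square]
rectangle = [0, 2, 2 + 1j, 1j]
print(full_procrustes_distance(square, moved))      # close to 0 (rounding only)
print(full_procrustes_distance(square, rectangle))  # sqrt(0.1), about 0.316
```

Working in complex numbers keeps the rotation step to a single modulus, which is one reason this representation is popular for 2-D silhouette shapes.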
Artificial intelligence system for continuous affect estimation from naturalistic human expressions
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. The analysis of human expression and automatic affect estimation from it has been acknowledged as an active research topic in the computer vision community. Most reported affect recognition systems, however, only consider subjects performing well-defined acted expressions under very controlled conditions, so they are not robust enough for real-life recognition tasks involving subject variation, acoustic surroundings and illumination changes. In this thesis, an artificial intelligence system is proposed to continuously estimate affective behaviour (represented along a continuum, e.g., from -1 to +1) in terms of latent dimensions (e.g., arousal and valence) from naturalistic human expressions. To tackle these issues, both feature representation and machine learning strategies are addressed. In feature representation, human expression is represented by modalities such as audio, video, physiological signals and text. Hand-crafted features are extracted from each modality per frame in order to match the consecutive affect labels. However, the extracted features may be missing information due to factors such as background noise or lighting conditions. The Haar Wavelet Transform is employed to determine whether a noise cancellation mechanism in feature space should be considered in the design of the affect estimation system. Besides hand-crafted features, deep learning features are also analysed layer-wise, for both convolutional and fully connected layers. Convolutional Neural Networks such as AlexNet, VGGFace and ResNet are selected as deep learning architectures to extract features from facial expression images. A multimodal fusion scheme then fuses the deep learning and hand-crafted features to improve performance. In terms of machine learning strategies, a two-stage regression approach is introduced. In the first stage, baseline regression methods such as Support Vector Regression are applied to estimate each affect dimension at each time step. In the second stage, a subsequent model such as a Time Delay Neural Network, Long Short-Term Memory or Kalman Filter is proposed to model the temporal relationships between consecutive estimates of each affect dimension. In doing so, the temporal information employed by the subsequent model is not biased by the high variability present in consecutive frames, and at the same time the network can exploit the slowly changing emotional dynamics more efficiently. Following the two-stage regression approach for unimodal affect analysis, the fusion of information from different modalities is elaborated. Continuous emotion recognition in the wild is addressed by investigating mathematical models for each emotion dimension. Linear Regression, Exponent Weighted Decision Fusion and Multi-Gene Genetic Programming are implemented to quantify the relationship between modalities. In summary, the research presented in this thesis offers a fundamental approach to automatically and continuously estimating affect values from naturalistic human expressions. The proposed system, which consists of feature smoothing, deep learning features, a two-stage regression framework and fusion using mathematical equations between modalities, is demonstrated. It offers a strong basis for the development of artificial intelligence systems for continuous affect estimation, and more broadly for building a real-time emotion recognition system for human-computer interaction.
Majlis Amanah Rakyat (MARA), Malaysia
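The Kalman Filter is one of the second-stage models listed for the two-stage regression. The sketch below shows what that stage could look like for a single affect dimension: a forward (causal) 1-D Kalman filter over the per-frame outputs of a first-stage regressor. The random-walk state model, both variance values and the example inputs are assumptions for illustration only, and the first stage (e.g. Support Vector Regression) is not shown.

```python
from typing import List, Sequence


def kalman_filter_1d(
    estimates: Sequence[float],
    process_var: float = 1e-3,
    measurement_var: float = 1e-1,
) -> List[float]:
    """Filter per-frame affect estimates (e.g. valence) from a first-stage regressor.

    Assumed model (illustrative, not the thesis's configuration):
        state        x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
        observation  y_t = x_t + v_t,      v_t ~ N(0, measurement_var)
    """
    if not estimates:
        return []
    x = float(estimates[0])  # initial state: the first first-stage estimate
    p = measurement_var      # initial state variance (an assumption)
    filtered = [x]
    for y in estimates[1:]:
        p += process_var                  # predict: variance grows under the random walk
        gain = p / (p + measurement_var)  # Kalman gain
        x += gain * (y - x)               # update towards the new measurement
        p *= 1.0 - gain
        filtered.append(x)
    return filtered


# Hypothetical first-stage outputs that jitter from frame to frame.
first_stage = [0.5, -0.5] * 10
print([round(v, 2) for v in kalman_filter_1d(first_stage)])
```

A small `process_var` relative to `measurement_var` encodes the assumption that affect changes slowly, so frame-to-frame jitter from the first stage is largely suppressed.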
Artificial Intelligence Tools for Facial Expression Analysis.
Inner emotions show visibly on the human face and are understood as a basic guide to an individual's inner world. It is, therefore, possible to determine a person's attitudes and the effects of others' behaviour on their deeper feelings by examining facial expressions. In real-world applications, machines that interact with people need robust facial expression recognition. Such recognition offers advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU corresponds to the activation of an individual local facial muscle group, and AUs that occur in unison constitute a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting static and dynamic features, using both hand-crafted and deep learning representations, from each frame of a video. The results confirmed that a pretrained model yields a leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases in dynamic sequences using supervised and unsupervised methods. These experiments revealed the importance of stacking dynamic features on top of static ones when encoding deep features for learning temporal information, combining the spatial and temporal schemes simultaneously. This study also found that fusing both temporal and temporal features yields more long-term temporal pattern information. Moreover, we hypothesised that an unsupervised method would enable the extraction of invariant information from dynamic textures.
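One simple way to picture "stacking dynamic on top of static" features is the classic delta-feature construction: each frame's static descriptor is concatenated with its first-order temporal difference. The sketch below illustrates only that generic idea; the thesis's actual encoding of deep features is not specified here, and `stack_static_dynamic` and the example clip are hypothetical.

```python
from typing import List, Sequence


def stack_static_dynamic(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append a first-order temporal difference (dynamic part) to each frame's static features."""
    stacked: List[List[float]] = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame  # the first frame gets a zero delta
        delta = [cur - old for cur, old in zip(frame, prev)]
        stacked.append(list(frame) + delta)
    return stacked


# Hypothetical 2-D per-frame descriptors for a three-frame clip.
clip = [[1.0, 2.0], [1.5, 2.0], [1.5, 3.0]]
print(stack_static_dynamic(clip))
# [[1.0, 2.0, 0.0, 0.0], [1.5, 2.0, 0.5, 0.0], [1.5, 3.0, 0.0, 1.0]]
```

The stacked vectors double the feature dimension; a temporal model downstream then sees both what each frame looks like and how it changed from the previous frame.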
Recent cutting-edge developments have come from approaches based on Generative Adversarial Networks (GANs). In the second part of this thesis, we propose a model based on an unsupervised DCGAN for facial feature extraction and classification, with two goals: generating facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and recognising emotion categories and AUs, in an attempt to solve the problem of recognising the seven static emotion classes in the wild. Thorough cross-database experiments demonstrate that this approach can improve generalization. Additionally, we showed that the features learnt by the DCGAN process are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos that were rich in variations and in the distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier