
    Learning Discriminative Feature Representations for Visual Categorization

    Learning discriminative feature representations has attracted a great deal of attention due to its potential value and wide use in a variety of areas, such as image/video recognition and retrieval, human activity analysis, intelligent surveillance, and human-computer interaction. In this thesis we first introduce a new boosted key-frame selection scheme for action recognition. Specifically, we propose to select a subset of key poses to represent each action via AdaBoost, and a new classifier, WLNBNN, is then developed for final classification. The proposed method outperforms previous work by 0.6% - 13.2%. After that, a domain-adaptive learning approach based on multiobjective genetic programming (MOGP) is developed for image classification. In this method, a set of primitive 2-D operators is randomly combined to construct feature descriptors through MOGP evolution and then evaluated by two objective fitness criteria, i.e., the classification error and the tree complexity; the (near-)optimal feature descriptor is then obtained. The proposed approach achieves 0.9% - 25.9% better performance compared with state-of-the-art methods. Moreover, effective dimensionality reduction algorithms have also been widely used for obtaining better representations. In this thesis, we propose a novel linear unsupervised algorithm, termed Discriminative Partition Sparsity Analysis (DPSA), which explicitly considers the different probabilistic distributions that exist over the data points while simultaneously preserving the natural locality relationship among the data. All of the above methods have been systematically evaluated on several public datasets, showing accurate and robust performance (0.44% - 6.69% better than previous methods) for action and image categorization. Targeting efficient image classification, we also introduce a novel unsupervised framework termed evolutionary compact embedding (ECE) which can automatically learn task-specific binary hash codes. ECE is formulated as an optimization algorithm combining genetic programming (GP) with a boosting trick. Experimental results show that ECE significantly outperforms other methods by 1.58% - 2.19% on classification tasks. In addition, a supervised framework, bilinear local feature hashing (BLFH), is proposed to learn highly discriminative binary codes on local descriptors for large-scale image similarity search. We formulate it as a nonconvex optimization problem seeking orthogonal projection matrices for hashing that preserve the pairwise similarity between different local features while simultaneously taking image-to-class (I2C) distances into consideration. BLFH produces outstanding results (0.017% - 0.149% better) compared to state-of-the-art hashing techniques.
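    As an illustration of the two-objective MOGP evaluation described above, each evolved descriptor is scored by classification error and tree complexity, and candidates can be compared by Pareto dominance. The following is a minimal runnable sketch of that comparison in Python, with toy scores standing in for real evaluations; it is not the thesis implementation.

```python
import numpy as np

def pareto_front(objectives):
    """Indices of non-dominated candidates.

    `objectives` is an (n, 2) array of (classification_error, tree_complexity),
    both minimised, mirroring the two MOGP fitness criteria named above.
    """
    n = len(objectives)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(objectives[j] <= objectives[i]) \
                    and np.any(objectives[j] < objectives[i]):
                keep[i] = False
                break
    return np.flatnonzero(keep)

# Toy population: five candidate descriptors scored on the two criteria.
scores = np.array([[0.21, 35], [0.18, 60], [0.25, 20], [0.18, 58], [0.30, 90]])
print(pareto_front(scores))  # the (near-)optimal descriptors lie on this front
```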

    GESTURE RECOGNITION FOR PENCAK SILAT TAPAK SUCI REAL-TIME ANIMATION

    The main goal of this research is the design of a real-time virtual martial arts training system that can serve as a tool for learning martial arts independently, using genetic algorithm methods and dynamic time warping. This paper covers the initial stage, which focuses on capturing data sets of martial arts practitioners using 3D animation and Kinect sensor cameras: 2 practitioners x 8 moves x 596 cases/gesture = 9,536 cases. Gesture recognition studies usually distinguish body gestures, hand and arm gestures, and head and face gestures; all three can be studied simultaneously in pencak silat, using martial arts stance detection with scoring methods. Silat movement data are recorded as .oni files using the OpenNI™ (OFW) framework and as BVH (Biovision Hierarchy) files, with plug-in support software on the mocap devices. Responsiveness, a measure of the time taken to respond to interruptions, is critical because the system must be able to meet the demand.
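    Dynamic time warping, named above as part of the recognition pipeline, aligns two gesture sequences of different lengths before scoring their similarity. Below is a generic textbook DTW sketch in Python over per-frame joint features; the feature layout is an assumption, and this is not the paper's code.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two gesture sequences.

    a, b: arrays of shape (frames, features), e.g. flattened Kinect
    joint coordinates per frame (a hypothetical feature layout).
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Compare a recorded attempt against a template of the same silat move.
template = np.random.rand(60, 45)  # 60 frames, 15 joints x 3 coordinates
attempt = np.random.rand(72, 45)
print(dtw_distance(template, attempt))
```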

    A fuzzy probabilistic inference methodology for constrained 3D human motion classification

    Enormous uncertainties in unconstrained human motions lead to a fundamental challenge that many recognition algorithms face in practice: efficient and correct motion recognition is a demanding task, especially when human kinematic motions are subject to variations of execution in the spatial and temporal domains, heavily overlap with each other, and are occluded. Due to the lack of a good solution to these problems, many existing methods tend to be either effective but computationally intensive or efficient but vulnerable to misclassification. This thesis presents a novel inference engine for recognising occluded 3D human motion assisted by the recognition context. First, uncertainties are wrapped into a fuzzy membership function via a novel method called Fuzzy Quantile Generation, which employs metrics derived from the probabilistic quantile function. Then, time-dependent and context-aware rules are produced via genetic programming to smooth the qualitative outputs represented by fuzzy membership functions. Finally, occlusion in motion recognition is handled by introducing new procedures for feature selection and feature reconstruction. Experimental results demonstrate the effectiveness of the proposed framework on motion capture data from real boxers in terms of fuzzy membership generation, context-aware rule generation, and motion occlusion. Future work might involve the extension of Fuzzy Quantile Generation to automate the choice of a probability distribution, the enhancement of temporal pattern recognition with probabilistic paradigms, the optimisation of the occlusion module, and the adaptation of the present framework to different application domains.
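    The Fuzzy Quantile Generation step wraps a probability distribution into a fuzzy membership function. Its exact metrics are defined in the thesis; as a hedged sketch of the general idea only, the snippet below uses a simple probability-to-possibility transform, mu(x) = 2 * min(F(x), 1 - F(x)), which peaks at the median and decays in the tails, with a normal distribution fitted to a synthetic joint-angle feature.

```python
import numpy as np
from scipy.stats import norm

def membership_from_distribution(x, loc, scale):
    """Fuzzy membership derived from a fitted normal distribution.

    Uses mu(x) = 2 * min(F(x), 1 - F(x)): a stand-in transform, not the
    thesis's Fuzzy Quantile Generation metrics.
    """
    F = norm.cdf(x, loc=loc, scale=scale)
    return 2.0 * np.minimum(F, 1.0 - F)

# Fit a joint-angle feature from training motions, then fuzzify new readings.
samples = np.random.normal(42.0, 5.0, size=200)  # synthetic training data
loc, scale = samples.mean(), samples.std(ddof=1)
print(membership_from_distribution(np.array([35.0, 42.0, 55.0]), loc, scale))
```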

    Uniscale and multiscale gait recognition in realistic scenario

    The performance of a gait recognition method is affected by numerous challenging factors that degrade its reliability as a behavioural biometric for subject identification in realistic scenarios. Thus, for effective visual surveillance, this thesis presents five gait recognition methods that address various challenging factors to reliably identify a subject in realistic scenarios with low computational complexity. It presents a gait recognition method that analyses the spatio-temporal motion of a subject with statistical and physical parameters using Procrustes shape analysis and elliptic Fourier descriptors (EFD). It introduces a part-based EFD analysis to achieve invariance to carrying conditions, and the use of physical parameters enables it to achieve invariance to across-day gait variation. Although the spatio-temporal deformation of a subject’s shape in gait sequences provides better discriminative power than its kinematics, the inclusion of dynamical motion characteristics improves the identification rate. Therefore, the thesis presents a gait recognition method which combines spatio-temporal shape and dynamic motion characteristics of a subject to achieve robustness against the maximum number of challenging factors compared to related state-of-the-art methods. A region-based gait recognition method that analyses a subject’s shape in image and feature spaces is presented to achieve invariance to clothing variation and carrying conditions. To take into account the arbitrary moving directions of a subject in realistic scenarios, a gait recognition method must be robust against variation in view. Hence, the thesis presents a robust view-invariant multiscale gait recognition method. Finally, the thesis proposes a gait recognition method based on low spatial and low temporal resolution video sequences captured by CCTV. The computational complexity of each method is analysed. Experimental analyses on public datasets demonstrate the efficacy of the proposed methods.
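    Procrustes shape analysis, one of the tools named above, compares contours after removing translation, scale, and rotation. A minimal sketch using SciPy's generic implementation follows; the synthetic landmark sets stand in for gait silhouette contours, and the thesis's own formulation may differ.

```python
import numpy as np
from scipy.spatial import procrustes

# Two contours sampled at the same number of boundary points
# (synthetic 2-D landmarks standing in for gait silhouettes).
rng = np.random.default_rng(0)
contour_a = rng.random((64, 2))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
contour_b = 1.7 * contour_a @ R.T + np.array([5.0, -2.0])  # rotated, scaled, shifted

_, _, disparity = procrustes(contour_a, contour_b)
print(disparity)  # ~0: shapes match once similarity transforms are removed
```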

    Artificial Intelligence Tools for Facial Expression Analysis.

    Inner emotions show visibly upon the human face and are understood as a basic guide to an individual’s inner world. It is, therefore, possible to determine a person’s attitudes and the effects of others’ behaviour on their deeper feelings through examining facial expressions. In real-world applications, machines that interact with people need strong facial expression recognition. This recognition is seen to hold advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU activation is a set of local, individual facial muscle movements that occur in unison, constituting a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting static and dynamic features of both hand-crafted and deep-learning representations from each static image of a video. This confirmed the superior ability of pretrained models, which give a clear leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases using supervised and unsupervised methods on dynamic sequences. During these processes, the importance of stacking dynamic features on top of static ones was discovered when encoding deep features to learn temporal information, combining the spatial and temporal schemes simultaneously. This study also found that fusing both spatial and temporal features gives more long-term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the learning of invariant information from dynamic textures. Recently, cutting-edge developments have been achieved by approaches based on Generative Adversarial Networks (GANs). In the second section of this thesis, we propose a model based on the adoption of an unsupervised DCGAN for facial feature extraction and classification to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the static seven classes of emotion in the wild. Thorough experimentation with the proposed cross-database setting demonstrates that this approach can improve generalization results. Additionally, we showed that the features learnt by the DCGAN process are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
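    The final step described above trains a ResNet to regress 3D Morphable Model expression parameters jointly with a back-end classifier. A minimal PyTorch sketch of such a two-head architecture follows; the 29-dimensional expression code, the 7 emotion classes, and the ResNet-18 backbone are assumptions for illustration, not the thesis's configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ExpressionNet(nn.Module):
    """ResNet backbone with two heads: 3DMM expression-parameter
    regression and emotion classification (sizes are assumptions)."""
    def __init__(self, n_expr_params=29, n_emotions=7):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        dim = backbone.fc.in_features
        self.regressor = nn.Linear(dim, n_expr_params)  # 3DMM expression code
        self.classifier = nn.Linear(dim, n_emotions)    # back-end classifier

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.regressor(h), self.classifier(h)

model = ExpressionNet()
images = torch.randn(4, 3, 224, 224)  # a toy batch of face crops
expr, logits = model(images)
loss = nn.functional.mse_loss(expr, torch.zeros_like(expr)) \
     + nn.functional.cross_entropy(logits, torch.randint(0, 7, (4,)))
print(expr.shape, logits.shape, loss.item())
```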