236 research outputs found

    Contribution to supervised representation learning: algorithms and applications.

    278 p. In this thesis, we focus on supervised learning methods for pattern categorization. In this context, it remains a major challenge to establish efficient relationships between the discriminant properties of the extracted features and the inter-class sparsity structure. Our first attempt to address this problem was to develop a method called "Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity" (RDA_FSIS). This method performs feature selection and extraction simultaneously. The targeted projection transformation focuses on the most discriminative original features while guaranteeing that the extracted (or transformed) features belonging to the same class share a common sparse structure, which contributes to small intra-class distances. In a further study on this approach, some improvements have been introduced in terms of the optimization criterion and the applied optimization process. In fact, we proposed an improved version of the original RDA_FSIS called "Enhanced Discriminant Analysis with Class Sparsity using Gradient Method" (EDA_CS). The basic improvement is twofold: on the one hand, in the alternating optimization, we update the linear transformation and tune it with the gradient descent method, resulting in a more efficient and less complex solution than the closed form adopted in RDA_FSIS. On the other hand, the method could be used as a fine-tuning technique for many feature extraction methods. The main feature of this approach lies in the fact that it is a gradient descent based refinement applied to a closed form solution. This makes it suitable for combining several extraction methods and can thus improve the performance of the classification process. In accordance with the above methods, we proposed a hybrid linear feature extraction scheme called "feature extraction using gradient descent with hybrid initialization" (FE_GD_HI). This method, based on a unified criterion, was able to take advantage of several powerful linear discriminant methods. The linear transformation is computed using a gradient descent method. The strength of this approach is that it is generic in the sense that it allows fine tuning of the hybrid solution provided by different methods. Finally, we proposed a new efficient ensemble learning approach that aims to estimate an improved data representation. The proposed method is called "ICS Based Ensemble Learning for Image Classification" (EM_ICS). Instead of using multiple classifiers on the transformed features, we aim to estimate multiple extracted feature subsets. These were obtained by multiple learned linear embeddings. Multiple feature subsets were used to estimate the transformations, which were ranked using multiple feature selection techniques. The derived extracted feature subsets were concatenated into a single data representation vector with strong discriminative properties. Experiments conducted on various benchmark datasets ranging from face images, handwritten digit images, object images to text datasets showed promising results that outperformed the existing state-of-the-art and competing methods.
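The idea of refining a closed-form projection with gradient steps, as described in the abstract above, can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the plain Fisher ratio as a stand-in criterion (the class-sparsity terms of RDA_FSIS and EDA_CS are not reproduced), uses the normalized class-mean difference as a stand-in for the closed-form initialization, and runs on made-up 2-D toy data. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def class_mean(X, y, label):
    Xc = [x for x, t in zip(X, y) if t == label]
    return [sum(x[k] for x in Xc) / len(Xc) for k in range(len(X[0]))]

def scatter_matrices(X, y):
    """Return between-class (Sb) and within-class (Sw) scatter matrices."""
    d = len(X[0])
    mean_all = [sum(x[k] for x in X) / len(X) for k in range(d)]
    Sb = [[0.0] * d for _ in range(d)]
    Sw = [[0.0] * d for _ in range(d)]
    for label in set(y):
        Xc = [x for x, t in zip(X, y) if t == label]
        mc = class_mean(X, y, label)
        for i in range(d):
            for j in range(d):
                Sb[i][j] += len(Xc) * (mc[i] - mean_all[i]) * (mc[j] - mean_all[j])
                Sw[i][j] += sum((x[i] - mc[i]) * (x[j] - mc[j]) for x in Xc)
    return Sb, Sw

def fisher_ratio(w, Sb, Sw):
    """Stand-in criterion: between-class over within-class scatter along w."""
    return dot(w, mat_vec(Sb, w)) / dot(w, mat_vec(Sw, w))

def refine(w, Sb, Sw, step=0.1, iters=200):
    """Gradient ascent on the Fisher ratio; a step is kept only if it improves."""
    for _ in range(iters):
        Sbw, Sww = mat_vec(Sb, w), mat_vec(Sw, w)
        b, s = dot(w, Sbw), dot(w, Sww)
        grad = [2.0 * (gb * s - gs * b) / (s * s) for gb, gs in zip(Sbw, Sww)]
        cand = [wi + step * gi for wi, gi in zip(w, grad)]
        norm = dot(cand, cand) ** 0.5
        cand = [c / norm for c in cand]          # keep the direction unit-length
        if fisher_ratio(cand, Sb, Sw) > fisher_ratio(w, Sb, Sw):
            w = cand
        else:
            step *= 0.5                          # backtrack when no improvement
    return w

# Toy data (illustrative only): two classes with correlated within-class spread.
X = [[0.0, 0.0], [2.0, 1.0], [4.0, 1.0], [6.0, 2.0],
     [1.0, 2.0], [3.0, 3.0], [5.0, 3.0], [7.0, 4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

Sb, Sw = scatter_matrices(X, y)
m0, m1 = class_mean(X, y, 0), class_mean(X, y, 1)
diff = [b - a for a, b in zip(m0, m1)]
norm0 = dot(diff, diff) ** 0.5
w0 = [v / norm0 for v in diff]                   # stand-in "closed-form" start
w = refine(w0, Sb, Sw)

print("initial criterion:", fisher_ratio(w0, Sb, Sw))
print("refined criterion:", fisher_ratio(w, Sb, Sw))
```

Because a step is accepted only when the criterion increases, the refined direction is never worse than the initialization, which is the sense in which such a refinement can act as a fine-tuning stage on top of another extractor's output.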

    Multi-view Data Analysis

    Multi-view data analysis is a key technology for making effective decisions by leveraging information from multiple data sources. The process of data acquisition across various sensory modalities gives rise to the heterogeneous property of data. In my thesis, multi-view data representations are studied towards exploiting the enriched information encoded in different domains or feature types, and novel algorithms are formulated to enhance feature discriminability. Extracting informative data representation is a critical step in visual recognition and data mining tasks. Multi-view embeddings provide a new way of representation learning to bridge the semantic gap between the low-level observations and high-level human comprehensible knowledge, benefitting from enriched information in multiple modalities. Recent advances in multi-view learning have introduced a new paradigm in jointly modeling cross-modal data. The subspace learning method, which extracts compact features by exploiting a common latent space and fuses multi-view information, has emerged as prominent among the different categories of multi-view learning techniques. This thesis provides novel solutions in learning compact and discriminative multi-view data representations by exploiting the data structures in low dimensional subspace. We also demonstrate the performance of the learned representation scheme on a number of challenging tasks in recognition, retrieval and ranking problems. The major contribution of the thesis is a unified solution for subspace learning methods, which is extensible for multiple views, supervised learning, and non-linear transformations. Traditional statistical learning techniques including Canonical Correlation Analysis, Partial Least Square regression and Linear Discriminant Analysis are studied by constructing graphs of specific forms under the same framework. Methods using non-linear transforms based on kernels and (deep) neural networks are derived, which lead to superior performance compared to the linear ones. A novel multi-view discriminant embedding method is proposed by taking the view difference into consideration. Secondly, a multi-view nonparametric discriminant analysis method is introduced by exploiting the class boundary structure and discrepancy information of the available views. This allows for multiple projection directions, by relaxing the Gaussian distribution assumption of related methods. Thirdly, we propose a composite ranking method by keeping a close correlation with the individual rankings for optimal rank fusion. We propose a multi-objective solution to ranking problems by capturing inter-view and intra-view information using autoencoder-like networks. Finally, a novel end-to-end solution is introduced to enhance joint ranking with minimum view-specific ranking loss, so that we can achieve the maximum global view agreements within a single optimization process. In summary, this thesis aims to address the challenges in representing multi-view data across different tasks. The proposed solutions have shown superior performance in numerous tasks, including object recognition, cross-modal image retrieval, face recognition and object ranking.

    Novel Deep Learning Techniques For Computer Vision and Structure Health Monitoring

    This thesis proposes novel techniques for building a generic framework for both regression and classification tasks in vastly different application domains such as computer vision and civil engineering. Many frameworks have been proposed and combined into a complex deep network design to provide a complete solution to a wide variety of problems. The experimental results demonstrate significant improvements in accuracy and efficiency for all the proposed techniques.

    Biometric information analyses using computer vision techniques.

    Biometric information analysis is derived from the analysis of a series of physical and biological characteristics of a person. It is widely regarded as the most fundamental task in the realms of computer vision and machine learning. With the overwhelming power of computer vision techniques, biometric information analysis has received increasing attention in the past decades. Biometric information can be analyzed from many sources including iris, retina, voice, fingerprint, facial image or even the way one walks. Facial image and gait, because of their easy availability, are two preferable sources for biometric information analysis. In this thesis, we investigated the development of the most recent computer vision techniques and proposed various state-of-the-art models to solve four principal problems in biometric information analysis: age estimation, age progression, face retrieval and gait recognition. For age estimation, the modeling has always been a challenge. Existing works model the age estimation problem as either a classification or a regression problem. However, these two types of models are not able to reveal the intrinsic nature of human age. To this end, we proposed a novel hierarchical framework and an ordinal metric learning based method. In the hierarchical framework, a random forest based clustering method is introduced to find an optimal age grouping protocol. In the ordinal metric learning approach, age estimation is solved by learning a subspace where the ordinal structure of the data is preserved. Both of them have achieved state-of-the-art performance. For face retrieval, specifically under a cross-age setting, we first proposed a novel task: given two images, find the target image that has the same identity as the first input and the same age as the second input. To tackle this task, we proposed a joint manifold learning method that can disentangle the identity from the age information. Combined with two independent similarity measurements, the retrieval can be easily performed. For age progression, we also proposed a novel task that has never been considered: we aimed to fuse the identity of one image with the age of another image. By proposing a novel framework based on generative adversarial networks, our model is able to generate close-to-realistic images. Lastly, although gait recognition is an ideal long-distance biometric information task that compensates for the shortcomings of facial images, existing works are not able to handle large scale data with various view angles. We proposed a generative model to solve this problem and achieved promising results. Moreover, our model is able to generate evidence for forensic usage.

    Enabling Auditing and Intrusion Detection of Proprietary Controller Area Networks

    The goal of this dissertation is to provide automated methods for security researchers to overcome 'security through obscurity' used by manufacturers of proprietary Industrial Control Systems (ICS). 'White hat' security analysts waste significant time reverse engineering these systems' opaque network configurations instead of performing meaningful security auditing tasks. Automating the process of documenting proprietary protocol configurations is intended to improve independent security auditing of ICS networks. The major contributions of this dissertation are a novel approach for unsupervised lexical analysis of binary network data flows and analysis of the time series data extracted as a result. We demonstrate the utility of these methods using Controller Area Network (CAN) data sampled from passenger vehicles.

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show that experiments on two well-known datasets (Weizmann, MuHAVi) yielded a remarkable improvement in classification accuracy. © 2011 IEEE
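The robustness claim in this abstract can be made concrete with a small sketch comparing univariate log-densities. It only illustrates the heavy-tail property and is not the paper's HMM; the function names and the choice of `nu = 3` degrees of freedom are assumptions made for the example.

```python
import math

def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)

# Compare how strongly each model penalizes points far from the mean.
for x in (0.0, 2.0, 10.0):
    print(x, gaussian_logpdf(x), student_t_logpdf(x))
```

The Gaussian log-density falls off quadratically in the distance from the mean, whereas the t log-density falls off only logarithmically, so a single outlier contributes a far smaller penalty to the likelihood and pulls the fitted parameters less.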

    Artificial Intelligence Tools for Facial Expression Analysis.

    Inner emotions show visibly upon the human face and are understood as a basic guide to an individual's inner world. It is, therefore, possible to determine a person's attitudes and the effects of others' behaviour on their deeper feelings through examining facial expressions. In real world applications, machines that interact with people need strong facial expression recognition. This recognition is seen to hold advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. AU activation is a set of local individual facial muscle parts that occur in unison constituting a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence activation detection was conducted by extracting features (static and dynamic) of both nominal hand-crafted and deep learning representation from each static image of a video. This confirmed the superior ability of a pretrained model, which yields a leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases using supervised and unsupervised methods from dynamic sequences. During these processes, the importance of stacking dynamic on top of static was discovered in encoding deep features for learning temporal information when combining the spatial and temporal schemes simultaneously. Also, this study found that fusing both temporal and temporal features will give more long term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the leaching of invariant information from dynamic textures. Recently, cutting-edge developments have come from approaches based on Generative Adversarial Networks (GANs). In the second section of this thesis, we propose a model based on the adoption of an unsupervised DCGAN for facial feature extraction and classification to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the static seven classes of emotion in the wild. Thorough experimentation with the proposed cross-database performance demonstrates that this approach can improve the generalization results. Additionally, we showed that the features learnt by the DCGAN process are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos which were rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters from a 3D Morphable Model jointly with a back-end classifier.

    A novel approach for multimodal graph dimensionality reduction

    This thesis deals with the problem of multimodal dimensionality reduction (DR), which arises when the input objects, to be mapped on a low-dimensional space, consist of multiple vectorial representations, instead of a single one. Herein, the problem is addressed in two alternative manners. One is based on the traditional notion of modality fusion, but using a novel approach to determine the fusion weights. In order to optimally fuse the modalities, the known graph embedding DR framework is extended to multiple modalities by considering a weighted sum of the involved affinity matrices. The weights of the sum are automatically calculated by minimizing an introduced notion of inconsistency of the resulting multimodal affinity matrix. The other manner for dealing with the problem is an approach to consider all modalities simultaneously, without fusing them, which has the advantage of minimal information loss due to fusion. In order to avoid fusion, the problem is viewed as a multi-objective optimization problem. The multiple objective functions are defined based on graph representations of the data, so that their individual minimization leads to dimensionality reduction for each modality separately. The aim is to combine the multiple modalities without the need to assign importance weights to them, or at least postpone such an assignment as a last step. The proposed approaches were experimentally tested in mapping multimedia data on low-dimensional spaces for purposes of visualization, classification and clustering. The no-fusion approach, namely Multi-objective DR, was able to discover mappings revealing the structure of all modalities simultaneously, which cannot be discovered by weight-based fusion methods. However, it results in a set of optimal trade-offs, from which one needs to be selected, which is not trivial. 
    The optimal-fusion approach, namely Multimodal Graph Embedding DR, is able to easily extend unimodal DR methods to multiple modalities, but depends on the limitations of the unimodal DR method used. Both the no-fusion and the optimal-fusion approaches were compared to state-of-the-art multimodal dimensionality reduction methods and the comparison showed performance improvement in visualization, classification and clustering tasks. The proposed approaches were also evaluated for different types of problems and data, in two diverse application fields, a visual-accessibility-enhanced search engine and a visualization tool for mobile network security data. The results verified their applicability in different domains and suggested promising directions for future advancements.
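The fusion step described in this abstract, a weighted sum of per-modality affinity matrices fed into a graph-embedding method, can be sketched as follows. This is only an illustration under stated assumptions: the weights are fixed by hand rather than obtained by minimizing the thesis's inconsistency measure (which the abstract does not define), the two toy affinity matrices are made up, and the function names are hypothetical.

```python
def fuse_affinities(affinities, weights):
    """Weighted sum of same-sized square affinity matrices (lists of lists)."""
    if len(affinities) != len(weights):
        raise ValueError("one weight per modality is required")
    n = len(affinities[0])
    fused = [[0.0] * n for _ in range(n)]
    for W, alpha in zip(affinities, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += alpha * W[i][j]
    return fused

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Toy example: three objects described by two modalities.
# In the first modality objects 0 and 1 are similar; in the second, 1 and 2.
W_first = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W_second = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

weights = [0.5, 0.5]  # hand-picked; the thesis computes these automatically
fused = fuse_affinities([W_first, W_second], weights)
L = graph_laplacian(fused)

for row in L:
    print(row)
```

In a graph-embedding setting, a low-dimensional mapping would then typically be read off from the eigenvectors of such a Laplacian with the smallest non-zero eigenvalues; that step is omitted here to keep the sketch dependency-free.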