7 research outputs found

    Learning Density Models via Structured Latent Variables

    Get PDF
    As a principal approach in machine learning and cognitive science, the probabilistic framework has been continuously developed both theoretically and practically. Learning a probabilistic model can be thought of as inferring plausible models to explain observed data. The learning process uses random variables as building blocks, held together by probabilistic relationships. The key idea behind latent variable models is to introduce latent variables as powerful devices for revealing data structure and uncovering underlying features that describe real-world data. Classical approaches employ shallow architectures, including latent feature models and finite mixtures of latent variable models. Within these classical frameworks, certain assumptions must be made about the form, structure, and distribution of the data. Since shallow forms may not describe data structures sufficiently, new types of latent structure have been developed within probabilistic frameworks. Along this line, three main research directions have emerged: infinite latent feature models, mixtures of mixture models, and deep models. This dissertation summarises our work advancing the state of the art in both classical and emerging areas. In the first block, a finite latent variable model with parametric priors is presented for clustering and is further extended into a two-layer mixture model for discrimination. These models embed dimensionality reduction in their learning tasks through a latent structure called the common loading. Referred to as joint learning models, they attain a low-dimensional space that better matches the learning task, and their parameters are optimised simultaneously for both the low-dimensional space and the model itself. However, these joint learning models must assume a fixed number of features and mixture components, which are normally tuned by trial and error. In general, fixing more parameters makes inference simpler, but it also limits model flexibility, and false assumptions can lead to incorrect inferences from the data. A richer model that reduces the number of assumptions is therefore desirable. Accordingly, in the second block an infinite tri-factorisation structure with non-parametric priors is proposed. This model can automatically determine an optimal number of features and exploit the interrelation between data and features. In the final block, we show how to extend shallow latent structures to deep structures that handle richer structured data. This part includes two models: a layer-wise model and a deep autoencoder-based model. In a deep density model, the knowledge of cognitive agents can be modelled with more complex probability distributions, while inference and parameter estimation remain straightforward thanks to a greedy layer-wise algorithm. The deep autoencoder-based joint learning model is trained end-to-end without pre-training the autoencoder network, and it can be optimised by standard backpropagation without maximum a posteriori inference. Deep generative models are much more efficient than their shallow counterparts for unsupervised and supervised density learning tasks, and they can also be developed for various practical applications.
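    To make the common-loading idea concrete, the sketch below illustrates the structure it describes: all mixture components share a single loading matrix, so clustering and dimensionality reduction live in the same low-dimensional space. This is only a generative sketch in NumPy with made-up dimensions; the variable names and the simple least-squares projection are ours and merely stand in for the dissertation's actual joint-learning estimator.

        import numpy as np

        rng = np.random.default_rng(0)

        # Illustrative sizes (assumed, not taken from the dissertation).
        D, d, K, N = 20, 3, 4, 500      # observed dim, latent dim, components, samples

        W = rng.normal(size=(D, d))              # common loading shared by all components
        mu = rng.normal(scale=3.0, size=(K, d))  # component means in the latent space
        pi = np.full(K, 1.0 / K)                 # mixing weights
        sigma = 0.3                              # isotropic observation noise

        # Generative sketch: pick a component, draw a latent code around its mean,
        # then map it to the observed space through the single shared loading W.
        comp = rng.choice(K, size=N, p=pi)
        latent = mu[comp] + rng.normal(scale=0.5, size=(N, d))
        X = latent @ W.T + rng.normal(scale=sigma, size=(N, D))

        # Because W is shared, one least-squares projection recovers a single common
        # low-dimensional space in which all K clusters can be compared.
        X_low = X @ W @ np.linalg.inv(W.T @ W)
        print(X_low.shape)   # (500, 3)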

    A Novel Two-Stage Deep Learning Model for Efficient Network Intrusion Detection

    Get PDF
    The network intrusion detection system is an important tool for protecting computer networks against threats and malicious attacks. Many techniques have recently been proposed; however, they face significant challenges due to the continuous emergence of new threats that are not recognized by existing detection systems. In this paper, we propose a novel two-stage deep learning model based on a stacked auto-encoder with a soft-max classifier for efficient network intrusion detection. The model comprises two decision stages: an initial stage classifies network traffic as normal or abnormal and outputs a probability score, which is then used in the final decision stage as an additional feature for detecting the normal state and the other classes of attacks. The proposed model is able to learn useful feature representations from large amounts of unlabeled data and classify them automatically and efficiently. To evaluate the effectiveness of the proposed model, several experiments are conducted on two public datasets: an older benchmark dataset, the KDD99, and a newer one, the UNSW-NB15. The comparative experimental results demonstrate that our proposed model significantly outperforms existing models and methods and achieves high recognition rates, up to 99.996% and 89.134% for the KDD99 and UNSW-NB15 datasets, respectively. We conclude that our model has the potential to serve as a future benchmark for the deep learning and network security research communities.
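    The two-stage decision structure can be sketched as follows. The abstract describes a stacked auto-encoder with a soft-max classifier; in this sketch, scikit-learn MLP classifiers and synthetic data stand in for that architecture and for the KDD99/UNSW-NB15 features, so the point is only how the stage-one probability score is appended as an extra input for the stage-two multi-class decision.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)

        # Synthetic stand-in data: 5 traffic classes, class 0 = "normal".
        X = rng.normal(size=(2000, 30))
        y = rng.integers(0, 5, size=2000)

        # Stage 1: binary decision (normal vs. abnormal) producing a probability score.
        stage1 = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
        stage1.fit(X, (y != 0).astype(int))
        p_abnormal = stage1.predict_proba(X)[:, 1]

        # Stage 2: multi-class decision over normal + attack classes,
        # with the stage-1 score appended as one additional input feature.
        X_stage2 = np.hstack([X, p_abnormal[:, None]])
        stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
        stage2.fit(X_stage2, y)

        print(stage2.predict(X_stage2[:5]))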

    Discriminative and Generative Learning with Style Information

    Get PDF
    Conventional machine learning approaches usually assume that patterns are independent and identically distributed (i.i.d.). However, in many empirical cases this condition is violated when data carry diverse and inconsistent style information, and the effectiveness of traditional predictors may be limited by that violation of the i.i.d. assumption. In this thesis, we investigate how style information can be appropriately utilized to further improve the performance of machine learning models. This is achieved not only by introducing style information into state-of-the-art models, but also by designing and implementing new architectures and frameworks that make proper use of it. The main work is summarized as follows. First, the idea of style averaging is introduced through an image-processing-based sunglasses recovery algorithm, named the Style Elimination Transformation (SET), for robust one-shot facial expression recognition. By recovering the pixels corrupted by the dark colors of the sunglasses, the proposed algorithm improves classification performance for several state-of-the-art classifiers, even in a one-shot training setting. Then, style normalization and style neutralization are investigated with discriminative and generative machine learning approaches, respectively. In discriminative learning with style information, the style normalization transformation (SNT) is integrated into support vector machines (SVM) for both classification and regression, yielding field support vector classification (F-SVC) and field support vector regression (F-SVR), respectively. The SNT captures nonlinearity by mapping sufficiently complicated style information into a high-dimensional reproducing kernel Hilbert space. The learned SNT normalizes the inconsistent style information, producing i.i.d. examples on which the SVM is applied. Furthermore, a self-training-based transductive framework is introduced to handle styles unseen during training: the transductive SNT (T-SNT) is learned by transferring the trained styles to the unknown ones. In generative learning with style information, the style neutralization generative adversarial classifier (SN-GAC) is investigated to incorporate style information into classification. As a neural-network-based framework, the SN-GAC provides a nonlinear mapping through its generative network transformation. As a generalized and novel classification framework, it is capable of synthesizing high-quality, human-understandable, style-neutralized patterns from any style-inconsistent ones. After adversarial training in the first step, the final classification performance is further improved by fine-tuning the classifier once those style-neutralized examples can be generated well. Finally, the reverse task of the above style neutralization, namely the generation of arbitrary-style patterns, is also investigated in this thesis. By introducing the W-Net, a deep architecture upgraded from the well-known U-Net model for image-to-image translation, the few-shot (even one-shot) arbitrary-style Chinese character generation task is fulfilled. Like the SN-GAC model, the W-Net is trained with the adversarial training strategy of the generative adversarial network. The W-Net architecture is capable of generating any Chinese character in a style similar to that of a few, or even a single, given stylized examples. For all the proposed algorithms, frameworks, and models mentioned above, covering both prediction and generation tasks, the inconsistent style information is taken into appropriate consideration. Inconsistent sunglasses information is eliminated by the image-processing-based sunglasses recovery algorithm in the SET, producing style-consistent patterns, and facial expression recognition is then performed on the transformed i.i.d. examples. The SNT is integrated into the SVM model, normalizing the inconsistent style information nonlinearly through the kernelized mapping, and the T-SNT further enables field prediction on styles unseen during training. In the SN-GAC model, style neutralization is performed by an upgraded, neural-network-based U-Net architecture; trained in separate steps with an adversarial optimization strategy, it produces high-quality, style-neutralized i.i.d. patterns, and the subsequent classifier is trained to deliver superior performance with no additional computation involved. The W-Net architecture enables free manipulation of style in data generation with only a few, or even a single, style reference available, realizing few-shot, or even one-shot, arbitrary-style Chinese character generation, a property rarely seen in the literature.
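    The field/style normalization idea behind F-SVC can be illustrated with a simplified sketch: normalize each style group (field) separately, then train a standard SVM on the style-normalized examples. This shows only the intuition; in F-SVC the SNT is a kernelized transformation learned jointly with the SVM rather than the fixed per-field standardization used here, and the data and field labels below are synthetic and assumed.

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)

        # Synthetic stand-in data: each "field" (style group) distorts features differently.
        n_fields, n_per_field, dim = 5, 100, 10
        X, y, field_id = [], [], []
        for f in range(n_fields):
            shift = rng.normal(scale=3.0, size=dim)
            scale = rng.uniform(0.5, 2.0, size=dim)
            labels = rng.integers(0, 2, size=n_per_field)
            base = rng.normal(size=(n_per_field, dim)) + labels[:, None]  # class signal
            X.append(base * scale + shift)                                # style distortion
            y.append(labels)
            field_id.append(np.full(n_per_field, f))
        X, y, field_id = np.vstack(X), np.concatenate(y), np.concatenate(field_id)

        # Per-field standardization: a crude linear stand-in for the learned, kernelized SNT.
        X_norm = X.copy()
        for f in range(n_fields):
            m = field_id == f
            X_norm[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-8)

        # A standard SVM is then trained on the (approximately) i.i.d. normalized examples.
        clf = SVC(kernel="rbf").fit(X_norm, y)
        print(clf.score(X_norm, y))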

    Learning Latent Features With Infinite Nonnegative Binary Matrix Trifactorization

    No full text
    Nonnegative matrix factorization (NMF) has been widely exploited in many computational intelligence and pattern recognition problems. In particular, it can be used to extract latent features from data. However, previous NMF models often assume a fixed number of features, which is normally tuned and searched using a trial-and-error approach. Learning binary features is also difficult, since the binary matrix poses a more challenging optimization problem. In this paper, we propose a new Bayesian model, termed the infinite nonnegative binary matrix trifactorization (iNBMT) model, which can automatically learn both latent binary features and the number of features, based on the Indian buffet process (IBP). It exploits a trifactorization process that decomposes the nonnegative matrix into a product of three components: two binary matrices and a nonnegative real matrix. In contrast to traditional bifactorization, trifactorization can better reveal latent structures among samples and features. Specifically, an IBP prior is imposed on the two infinite binary matrices, while a truncated Gaussian distribution is assumed on the weight matrix. To optimize the model, we develop a modified variational-Bayesian algorithm, with iteration complexity one order lower than the recently proposed maximization-expectation-IBP model [1] and the correlated IBP-IBP model [2]. A series of simulation experiments is carried out, both qualitatively and quantitatively, using benchmark feature extraction, reconstruction, and clustering tasks. Comparative results show that our proposed iNBMT model significantly outperforms state-of-the-art algorithms on a range of synthetic and real-world data. The new Bayesian model can thus serve as a benchmark technique for the computational intelligence research community.
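    The trifactorization described above can be written compactly as follows. The symbols and dimension names are our notation rather than necessarily the paper's, and the exact parameterization of the truncated Gaussian is left abstract.

        % Two binary factors with IBP priors and a nonnegative weight matrix:
        \[
          X \approx Z\, W\, F^{\top},
          \qquad Z \in \{0,1\}^{N \times K},\quad
                 F \in \{0,1\}^{D \times L},\quad
                 W \in \mathbb{R}_{\ge 0}^{K \times L},
        \]
        \[
          Z \sim \mathrm{IBP}(\alpha_Z), \qquad
          F \sim \mathrm{IBP}(\alpha_F), \qquad
          W_{k\ell} \sim \mathcal{N}_{\ge 0}(\mu,\,\sigma^{2}) \ \text{(truncated Gaussian)}.
        \]

    Here the IBP priors allow the numbers of latent row and column features, K and L, to be inferred from the data rather than fixed in advance.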