    Bayesian Learning of Asymmetric Gaussian-Based Statistical Models using Markov Chain Monte Carlo Techniques

    A novel unsupervised Bayesian learning framework based on the asymmetric Gaussian mixture (AGM) statistical model is proposed, since the AGM has been shown to be more effective than the classic Gaussian mixture. The Bayesian learning framework is developed by adopting a sampling-based Markov chain Monte Carlo (MCMC) methodology. More precisely, the fundamental learning algorithm is a hybrid Metropolis-Hastings within Gibbs sampling solution, integrated within a reversible jump MCMC (RJMCMC) learning framework: a self-adapting sampling-based MCMC implementation that enables model transitions throughout the mixture-parameter learning process and therefore converges automatically to the optimal number of data groups. Furthermore, a feature selection technique is included to handle irrelevant and unneeded information in the datasets. The performance of the AGM is compared with other popular solutions on both synthetic and real data sets drawn from challenging applications such as intrusion detection, spam filtering and image categorization, to show the merits of the proposed approach.
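    The core sampling idea can be illustrated with a minimal sketch: an asymmetric Gaussian density (different left and right standard deviations around the mode) and a random-walk Metropolis-Hastings sampler over its mode parameter. This is a simplified illustration under a flat prior for a single component, not the paper's full RJMCMC framework; the function names and step size are hypothetical choices for the example.

    ```python
    import math
    import random

    def asym_gauss_pdf(x, mu, sl, sr):
        """Asymmetric Gaussian density: std sl below the mode mu, std sr above it."""
        norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sl + sr))
        s = sl if x < mu else sr
        return norm * math.exp(-((x - mu) ** 2) / (2.0 * s * s))

    def mh_sample_mu(data, sl, sr, n_iter=4000, step=0.5, seed=1):
        """Random-walk Metropolis-Hastings over mu, assuming a flat prior."""
        rng = random.Random(seed)

        def log_lik(mu):
            return sum(math.log(asym_gauss_pdf(x, mu, sl, sr)) for x in data)

        mu = sum(data) / len(data)  # start at the sample mean
        ll = log_lik(mu)
        samples = []
        for _ in range(n_iter):
            prop = mu + rng.gauss(0.0, step)      # symmetric proposal
            ll_prop = log_lik(prop)
            if math.log(rng.random()) < ll_prop - ll:  # accept/reject in log space
                mu, ll = prop, ll_prop
            samples.append(mu)
        return samples
    ```

    In the actual hybrid scheme described above, a step like this would be one Metropolis-Hastings move inside a Gibbs sweep over all mixture parameters, with additional reversible-jump moves that add or remove components.
    
    
    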

    Bounded Support Finite Mixtures for Multidimensional Data Modeling and Clustering

    Data is ever increasing, in both quantity and dimensionality, with today's many technological advances. Such inflation has posed various challenges for statistical and data-analysis methods and hence requires the development of new, powerful models for transforming data into useful information. It is therefore necessary to explore and develop new ideas and techniques to keep pace with challenging learning applications in data analysis, modeling and pattern recognition. Finite mixture models have received considerable attention due to their ability to model high-dimensional data effectively and efficiently. In mixtures, the choice of distribution is a critical issue: it has been observed that in many real-life applications the data lie in a bounded support region, whereas the distributions adopted to model them have unbounded support. It is therefore proposed to define bounded support distributions in mixtures and to introduce a modified parameter-estimation procedure that accounts for the bounded support of the underlying distributions. The main goal of this thesis is to introduce bounded support mixtures, their parameter estimation, automatic determination of the number of mixture components, and the application of mixtures in feature extraction techniques, to improve the overall learning pipeline. Five unbounded support distributions are selected for applying the idea of bounded support mixtures, with modified parameter estimation via maximum likelihood using Expectation-Maximization (EM). The probability density functions selected for this thesis are the Gaussian, Laplace, generalized Gaussian, asymmetric Gaussian and asymmetric generalized Gaussian distributions, chosen for their flexibility and broad applications in speech and image processing. The proposed bounded support mixtures are applied to various speech and image datasets to build learning applications that demonstrate the effectiveness of the proposed approach.
    Mixtures of bounded Gaussian and bounded Laplace distributions are also applied in feature extraction and data representation techniques, which further improves the learning and modeling capability of the underlying models. The proposed feature representation via bounded support mixtures is applied to both speech and image datasets to examine its performance. Automatic selection of the number of mixture components is very important in clustering, since parameter learning depends heavily on model selection; such selection is proposed for the mixtures of bounded Gaussian and bounded asymmetric generalized Gaussian distributions using the minimum message length criterion. The proposed model selection criterion and parameter learning are applied simultaneously to speech and image datasets for both models to examine the model selection performance in clustering.
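    The bounded-support idea can be sketched minimally as a Gaussian renormalized to an interval [a, b] (a truncated normal), together with the E-step of an EM iteration for a mixture of such components. This is an illustrative sketch only: the thesis modifies the M-step for bounded support, which is not reproduced here, and the function names are hypothetical.

    ```python
    import math

    def phi(z):
        """Standard normal CDF via the error function."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def bounded_gauss_pdf(x, mu, sigma, a, b):
        """Gaussian density renormalized to the bounded support [a, b]; zero outside."""
        if not a <= x <= b:
            return 0.0
        mass = phi((b - mu) / sigma) - phi((a - mu) / sigma)  # mass inside [a, b]
        dens = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        return dens / mass

    def e_step(data, weights, mus, sigmas, a, b):
        """Responsibilities gamma[n][k] for a K-component bounded-Gaussian mixture."""
        gammas = []
        for x in data:
            comp = [w * bounded_gauss_pdf(x, m, s, a, b)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(comp)
            gammas.append([c / total for c in comp])
        return gammas
    ```

    The renormalization is what distinguishes the bounded model from the ordinary Gaussian: all probability mass lies inside [a, b], matching data such as pixel intensities or normalized speech features that live in a bounded range.
    
    
    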