
    Finite Bivariate and Multivariate Beta Mixture Models Learning and Applications

    Finite mixture models provide great flexibility for data clustering and have demonstrated a strong ability to capture hidden structure in data. Modern technological progress, growing volumes and varieties of generated data, increasingly powerful computers and other related factors all contribute to the production of large-scale data. This makes it all the more important to find reliable and adaptable models that can analyze larger, more complex data to identify latent patterns, deliver faster and more accurate results, and support decision making with minimal human interaction. Adopting the most accurate distribution to represent the mixture components is critical. The most widely adopted generative model has been the Gaussian mixture; in numerous real-world applications, however, this modelling fails when the nature and structure of the data are non-Gaussian. Another crucial issue when using mixtures is the determination of model complexity, i.e., the number of mixture components. Minimum message length (MML) is one of the main techniques in frequentist frameworks for tackling this challenging issue. In this work, we have designed and implemented a finite mixture model based on the bivariate and multivariate Beta distributions for cluster analysis and demonstrated its flexibility in describing the intrinsic characteristics of the observed data. In addition, we have applied our estimation and model selection algorithms to synthetic and real datasets. Most importantly, we considered interesting applications such as image segmentation, software module defect prediction, spam detection and occupancy estimation in smart buildings.
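
    As a minimal, hedged sketch of this recipe (not the thesis's exact algorithm, which uses the bivariate/multivariate Beta), the following fits one-dimensional Beta mixtures with an EM loop whose M-step uses responsibility-weighted method-of-moments updates, then picks the component count with a crude MML-style penalty; a full MML criterion would also involve the Fisher information determinant.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def fit_beta_mixture(x, K, n_iter=200, eps=1e-9):
    """EM for a 1-D Beta mixture; M-step uses weighted method of moments."""
    n = len(x)
    w = np.full(K, 1.0 / K)
    a = rng.uniform(0.5, 5.0, K)
    b = rng.uniform(0.5, 5.0, K)
    for _ in range(n_iter):
        dens = np.stack([w[k] * beta.pdf(x, a[k], b[k]) for k in range(K)], axis=1)
        r = dens / (dens.sum(axis=1, keepdims=True) + eps)   # responsibilities
        nk = r.sum(axis=0) + eps
        w = nk / n
        for k in range(K):
            m = (r[:, k] @ x) / nk[k]                        # weighted mean
            v = (r[:, k] @ (x - m) ** 2) / nk[k] + eps       # weighted variance
            c = m * (1.0 - m) / v - 1.0
            a[k], b[k] = max(m * c, eps), max((1.0 - m) * c, eps)
    return np.log(dens.sum(axis=1) + eps).sum()              # final log-likelihood

def mml_score(ll, K, n):
    # Crude MML-flavoured penalty over the K-1 weights and 2K shape parameters;
    # a full MML criterion also involves the Fisher information determinant.
    return -ll + 0.5 * (3 * K - 1) * np.log(n)

x = np.concatenate([rng.beta(2, 8, 300), rng.beta(9, 2, 300)])
scores = {K: mml_score(fit_beta_mixture(x, K), K, len(x)) for K in range(1, 6)}
print("selected K:", min(scores, key=scores.get))
```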

    Variational Approaches For Learning Finite Scaled Dirichlet Mixture Models

    With the massive amount of data created on a daily basis, the demand for data analysis is ubiquitous and undisputed. Recent developments in technology have made machine learning techniques applicable to various problems. In particular, we focus on cluster analysis, an important aspect of data analysis. Recent works achieving excellent results on this task with finite mixture models have motivated us to further explore their potential in different applications. The main idea of mixture modelling is that the observations are generated from a mixture of components, each following a probability distribution flexible enough to fit numerous types of data. Indeed, the Dirichlet family of distributions has been known to achieve better clustering performance than the Gaussian when the data are clearly non-Gaussian, especially proportional data. Thus, we introduce several variational approaches for finite Scaled Dirichlet mixture models. The proposed algorithms guarantee convergence while avoiding the computational complexity of conventional Bayesian inference. In summary, our contributions are threefold. First, we propose a variational Bayesian learning framework for finite Scaled Dirichlet mixture models, in which the parameters and complexity of the models are naturally estimated by minimizing the Kullback-Leibler (KL) divergence between the approximate posterior distribution and the true one. Second, we integrate component splitting, a local model selection scheme, into the first model; it gradually splits the components based on their mixing weights to obtain the optimal number of components. Finally, an online variational inference framework for finite Scaled Dirichlet mixture models is developed by employing a stochastic approximation method in order to improve the scalability of finite mixture models for handling large-scale data in real time. The effectiveness of our models is validated on challenging real-life problems including object, texture and scene categorization, and text-based and image-based spam email detection.
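
    The Scaled Dirichlet variational learner itself is not available in standard libraries, but the mechanism the abstract describes, over-specifying the number of components and letting variational inference drive redundant mixing weights toward zero, can be illustrated with scikit-learn's variational Gaussian mixture as a stand-in:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters, but the model is given ten components.
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])

vb = BayesianGaussianMixture(
    n_components=10,                   # deliberately over-specified
    weight_concentration_prior=1e-2,   # sparse prior encourages pruning
    max_iter=500,
    random_state=0,
).fit(X)

# Variational inference leaves the weights of superfluous components near zero.
print("effective components:", int(np.sum(vb.weights_ > 1e-2)))
```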

    A Study on Variational Component Splitting approach for Mixture Models

    The increased use of mobile devices and the introduction of cloud-based services have resulted in the generation of enormous amounts of data every day. This calls for the need to group these data appropriately into proper categories. Various clustering techniques have been introduced over the years to learn the patterns in data that might better facilitate the classification process. Finite mixture models are one of the crucial methods used for this task. The basic idea of mixture models is to fit the data at hand to an appropriate distribution; the design of mixture models hence involves finding the appropriate parameters of the distribution and estimating the number of clusters in the data. We use a variational component splitting framework for this, which can simultaneously learn the parameters of the model and estimate the number of its components. The variational algorithm helps to overcome the computational complexity of purely Bayesian approaches and the overfitting problems experienced with Maximum Likelihood approaches, while guaranteeing convergence. The choice of distribution remains a core concern of mixture models in recent research. The efficiency of the Dirichlet family of distributions for this purpose has been demonstrated in recent studies, especially for non-Gaussian data. This led us to study the impact of the variational component splitting approach on mixture models based on several distributions. Hence, our contribution is the application of variational component splitting to design finite mixture models based on the inverted Dirichlet, generalized inverted Dirichlet and inverted Beta-Liouville distributions. In addition, as a further experimental contribution, we incorporate a simultaneous feature selection approach into the generalized inverted Dirichlet mixture model along with component splitting. We evaluate the performance of our models on various real-life applications such as object, scene, texture, speech and video categorization.
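
    As a hedged illustration of the component-splitting loop (the thesis applies it to inverted Dirichlet-family mixtures under a variational bound; a Gaussian mixture stands in here), one can repeatedly split the heaviest component into two perturbed copies and keep the split only while the model's lower bound keeps improving:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (150, 2)) for m in (0.0, 4.0, 8.0)])

def split_heaviest(gm, X):
    """Split the heaviest component into two copies perturbed along the axes."""
    k = int(np.argmax(gm.weights_))
    jitter = 0.1 * np.sqrt(np.diag(gm.covariances_[k]))
    means = np.vstack([gm.means_, gm.means_[k] + jitter])
    means[k] -= jitter
    return GaussianMixture(n_components=len(means), means_init=means,
                           random_state=0).fit(X)

gm = GaussianMixture(n_components=1, random_state=0).fit(X)
while True:
    cand = split_heaviest(gm, X)
    if cand.lower_bound_ <= gm.lower_bound_ + 1e-3:   # no real gain: stop
        break
    gm = cand
print("selected number of components:", gm.n_components)
```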

    Bayesian Learning Frameworks for Multivariate Beta Mixture Models

    Mixture models have been widely used as a statistical learning paradigm in various unsupervised machine learning applications, where labeling a vast amount of data is impractical and costly. They have shown significant success and encouraging performance in many real-world problems from different fields such as computer vision, information retrieval and pattern recognition. One of the most widely used distributions in mixture models is the Gaussian distribution, due to characteristics such as its simplicity and fitting capabilities. However, data obtained from some applications may have different properties, such as a non-Gaussian and asymmetric nature. In this thesis, we propose multivariate Beta mixture models, which offer flexibility and a variety of shapes with promising attributes; these models can be considered decent alternatives to the Gaussian distribution. We explore multiple Bayesian inference approaches for multivariate Beta mixture models and propose a suitable solution to the parameter estimation problem using Markov chain Monte Carlo (MCMC) techniques. We exploit Gibbs sampling within Metropolis-Hastings for learning the parameters of our finite mixture model. Moreover, a fully Bayesian approach based on the birth-death MCMC technique is proposed, which simultaneously performs cluster assignment, parameter estimation and selection of the optimal number of clusters. Finally, we develop a nonparametric Bayesian framework by extending our finite mixture model to infinity using a Dirichlet process to tackle the model selection problem. Experimental results obtained on challenging applications (e.g., intrusion detection, medical applications) confirm that our proposed frameworks provide effective solutions compared to existing alternatives.
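
    A minimal sketch of the Gibbs-within-Metropolis-Hastings structure, shown for a one-dimensional two-component Beta mixture rather than the thesis's multivariate Beta, with a flat prior on the shape parameters for brevity:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
x = np.concatenate([rng.beta(2.0, 8.0, 200), rng.beta(8.0, 2.0, 200)])
K = 2
a, b, pi = np.full(K, 2.0), np.full(K, 2.0), np.full(K, 1.0 / K)

for sweep in range(500):
    # Gibbs step: sample allocations z given the current parameters.
    p = np.stack([pi[k] * beta.pdf(x, a[k], b[k]) for k in range(K)], axis=1)
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(len(x))[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
    pi = rng.dirichlet(np.bincount(z, minlength=K) + 1.0)  # conjugate update
    # Metropolis-Hastings step: multiplicative random walk on each shape pair.
    for k in range(K):
        xk = x[z == k]
        if xk.size == 0:
            continue
        pa = a[k] * np.exp(0.1 * rng.normal())
        pb = b[k] * np.exp(0.1 * rng.normal())
        log_r = (beta.logpdf(xk, pa, pb).sum() - beta.logpdf(xk, a[k], b[k]).sum()
                 + np.log(pa / a[k]) + np.log(pb / b[k]))  # Hastings correction
        if np.log(rng.random()) < log_r:
            a[k], b[k] = pa, pb

print("pi:", pi, "a:", a, "b:", b)
```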

    Variational Learning for Finite Shifted-Scaled Dirichlet Mixture Model and Its Applications

    With the huge amount of data produced every day, interest in data mining and machine learning techniques has been growing. Ongoing advances in technology have also exposed AI systems to new challenges. Data clustering, the process of grouping similar observations into the same subset, is an important aspect of data analysis. Among known clustering techniques, finite mixture models have led to outstanding results, which has inspired further exploration of various mixture models and applications. The main idea of this clustering technique is to fit a mixture of components generated from a predetermined probability distribution to the data through parameter approximation of the components. Choosing a proper distribution based on the type of the data is therefore another crucial step in data analysis. Although the Gaussian distribution has been widely used with mixture models, the Dirichlet family of distributions has been known to achieve better results, particularly when dealing with proportional and non-Gaussian data. Another crucial part of statistical modelling is the learning process. Among conventional estimation approaches, Maximum Likelihood (ML) is widely used due to its simplicity of implementation, but it has drawbacks. The Bayesian approach overcomes some of the disadvantages of the ML approach by taking prior knowledge into account; however, it creates new issues, such as the need for additional estimation methods due to the intractability of the parameters' marginal probabilities. In this thesis, these limitations are discussed and addressed by defining a variational learning framework for the finite shifted-scaled Dirichlet mixture model. The motivation behind applying variational inference is that, compared to the conventional Bayesian approach, it is much less computationally costly. Furthermore, in this method the optimal number of components is estimated automatically and simultaneously with the parameter approximation, while convergence is guaranteed. The performance of our model, in terms of clustering accuracy, is validated on challenging real-world medical applications, including image-based ones, namely malaria detection, breast cancer diagnosis and cardiovascular disease detection, as well as text-based spam email detection. Finally, to evaluate the merits of our model, its effectiveness is compared with that of four other widely used methods.
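
    The component density underlying this model is the shifted-scaled Dirichlet (in the Monti et al. parameterization this line of work builds on), with a shape vector, a location vector on the simplex and a scalar scale; with a uniform location and unit scale it reduces to the standard Dirichlet. A small sketch of its log-density, as one would evaluate mixture components:

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def ssd_logpdf(x, alpha, a, tau):
    """x, alpha, a: length-D arrays; x and a lie on the simplex; tau > 0."""
    D = len(x)
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum() - (D - 1) * np.log(tau)
    log_y = (np.log(x) - np.log(a)) / tau        # y_i = (x_i / a_i)^(1/tau)
    log_kernel = (alpha * log_y).sum() - np.log(x).sum() \
                 - alpha.sum() * logsumexp(log_y)
    return log_norm + log_kernel

x = np.array([0.2, 0.3, 0.5])
alpha = np.array([2.0, 3.0, 4.0])
# With a uniform location and tau = 1 this matches the standard Dirichlet.
print(ssd_logpdf(x, alpha, a=np.full(3, 1 / 3), tau=1.0))
```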

    Multidimensional Proportional Data Clustering Using Shifted-Scaled Dirichlet Model

    We have designed and implemented an unsupervised learning algorithm for a finite mixture model of shifted-scaled Dirichlet distributions for the cluster analysis of multivariate proportional data. The cluster analysis task involves model selection, using Minimum Message Length (MML) to discover the number of natural groupings a dataset is composed of, as well as an estimation step for the model parameters using the expectation-maximization (EM) framework. This thesis aims to improve the flexibility of the widely used Dirichlet model by adding another set of parameters for the location (besides the scale parameters). We have applied our estimation and model selection algorithm to synthetically generated data, real data and software module defect prediction. The experimental results show the merits of the shifted-scaled Dirichlet mixture model's performance in comparison to previously used generative models.
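
    The MML criterion used for model selection in this line of work is the Wallace-Freeman message-length approximation; the chosen number of components is the one minimizing the message length. A sketch of the standard form (with N_p the number of free parameters, F(Θ) the Fisher information matrix and κ_{N_p} the optimal quantization lattice constant):

```latex
% Wallace-Freeman message-length approximation for a candidate mixture with
% parameter set \Theta; the selected K minimizes this quantity.
\mathrm{MessLen}(\Theta) \;\approx\;
  -\log p(\Theta) \;-\; \log p(\mathcal{X} \mid \Theta)
  \;+\; \tfrac{1}{2} \log \bigl| F(\Theta) \bigr|
  \;+\; \tfrac{N_p}{2} \bigl( 1 + \log \kappa_{N_p} \bigr)
```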

    High-dimensional Sparse Count Data Clustering Using Finite Mixture Models

    Due to the massive amount of available digital data, automating its analysis and modeling for different purposes and applications has become an urgent need. One of the most challenging tasks in machine learning is clustering, defined as the process of assigning observations sharing similar characteristics to subgroups. Such a task is significant, especially when implementing complex algorithms to deal with high-dimensional data. Thus, the advancement of computational power in statistical-based approaches is increasingly becoming an interesting and attractive research domain. Among the successful methods, mixture models have been widely acknowledged and successfully applied in numerous fields, as they provide a convenient yet flexible formal setting for unsupervised and semi-supervised learning. An essential problem with these approaches is to develop a probabilistic model that represents the data well by taking its nature into account. Count data are widely used in machine learning and computer vision applications, where an object, e.g., a text document or an image, can be represented by a vector of the appearance frequencies of words or visual words, respectively. They therefore usually suffer from the well-known curse of dimensionality, as objects are represented with high-dimensional and sparse vectors, i.e., a few thousand dimensions with a sparsity of 95 to 99%, which dramatically degrades the performance of clustering algorithms. Moreover, count data systematically exhibit the burstiness and overdispersion phenomena, neither of which can be handled by the generic multinomial distribution typically used to model count data, due to its independence assumption. This thesis is constructed around six related manuscripts, in which we propose several approaches for high-dimensional sparse count data clustering via various mixture models based on hierarchical Bayesian modeling frameworks that are able to model the dependency of repeated word occurrences. In such frameworks, a suitable distribution is used to introduce prior information into the construction of the statistical model, based on a distribution conjugate to the multinomial, e.g., the Dirichlet, generalized Dirichlet and Beta-Liouville, which has numerous computational advantages. We thus propose a novel model that we call the Multinomial Scaled Dirichlet (MSD), which uses the scaled Dirichlet as a prior to the multinomial to allow more modeling flexibility. Although these frameworks can model burstiness and overdispersion well, they share similar disadvantages that make their estimation procedures very inefficient when the collection size is large. To handle high dimensionality, we considered two approaches. First, we derived close approximations to the distributions in a hierarchical structure to bring them to exponential-family form, aiming to combine the flexibility and efficiency of these models with the desirable statistical and computational properties of the exponential family of distributions, including sufficiency, which reduces the complexity and computational effort, especially for sparse and high-dimensional data. Second, we proposed a model-based unsupervised feature selection approach for count data to overcome several issues that may be caused by the high dimensionality of the feature space, such as overfitting, low efficiency and poor performance.
Furthermore, we handled two significant aspects of mixture-based clustering methods, namely parameter estimation and model selection. We considered the Expectation-Maximization (EM) algorithm, a broadly applicable iterative algorithm for estimating mixture model parameters, incorporating several techniques to avoid its initialization dependency and poor local maxima. For model selection, we investigated different approaches to find the optimal number of components based on the Minimum Message Length (MML) philosophy. The effectiveness of our approaches is evaluated on challenging real-life applications, such as sentiment analysis, hate speech detection on Twitter, topic novelty detection, human interaction recognition in films and TV shows, facial expression recognition, face identification and age estimation.
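
    The Multinomial Scaled Dirichlet proposed here generalizes the classical Dirichlet-compound-multinomial (DCM) by placing a scaled Dirichlet, rather than a Dirichlet, prior on the multinomial. As a hedged baseline, the closed-form DCM log-likelihood below shows the hierarchical construction that captures burstiness, which a plain multinomial cannot:

```python
import numpy as np
from scipy.special import gammaln

def dcm_logpmf(x, alpha):
    """x: count vector; alpha: Dirichlet prior over the multinomial."""
    n = x.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()
            + gammaln(alpha.sum()) - gammaln(n + alpha.sum())
            + (gammaln(x + alpha) - gammaln(alpha)).sum())

# Burstiness: with small alpha, repeating one word is likelier than spreading
# the same total count over distinct words, unlike under a plain multinomial.
alpha = np.full(3, 0.1)
print(dcm_logpmf(np.array([4, 0, 0]), alpha))  # bursty document
print(dcm_logpmf(np.array([2, 1, 1]), alpha))  # spread-out document
```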

    Distribution-based Regression for Count and Semi-Bounded Data

    Data mining techniques have been successfully utilized in different applications across significant fields, including pattern recognition, computer vision and medical research. Yet despite the wealth of data generated every day, there is a lack of practical analysis tools for discovering hidden relationships and trends. Among statistical frameworks, regression has proven to be one of the strongest tools for prediction. The complexity of real data, which is unfavorable for most models, is a considerable challenge in prediction, and the ability of a model to perform accurately and efficiently is extremely important. Thus, a model must be selected that fits the data well, so that learning from previous data is efficient and highly accurate. This work is motivated by the limited number of regression analysis tools for multivariate count data in the literature. We propose two regression models for count data based on flexible distributions, namely the multinomial Beta-Liouville and the multinomial scaled Dirichlet, and evaluate them on the problem of disease diagnosis. Performance is measured by the accuracy of the prediction, which depends on the nature and complexity of the dataset. Our results show the efficiency of the two proposed regression models: the prediction performance of both is competitive with other regression approaches previously used for count data and with the best results in the literature. We then propose three regression models for positive vectors based on flexible distributions for semi-bounded data, namely the inverted Dirichlet, inverted generalized Dirichlet and inverted Beta-Liouville. The efficiency of these models is tested on real-world applications, including software defect prediction, spam filtering and disease diagnosis. Our results show that the three proposed regression models outperform other commonly used regression models.
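
    The regression recipe described here links each observation's distribution parameters to covariates and fits the weights by maximum likelihood. A generic sketch under simplified assumptions, with an exponential likelihood for semi-bounded targets standing in for the multinomial- and inverted-Dirichlet-family likelihoods used in the thesis:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
true_w = np.array([0.8, -0.5])
y = rng.exponential(np.exp(X @ true_w))      # semi-bounded (positive) targets

def neg_loglik(w):
    rate = np.exp(-(X @ w))                  # log link: mean = exp(X w)
    return -(np.log(rate) - rate * y).sum()  # exponential log-density

w_hat = minimize(neg_loglik, np.zeros(2)).x
print("estimated weights:", w_hat)           # should be close to true_w
```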

    Distributional Feature Mapping in Data Classification

    The performance of a machine learning algorithm depends on the representation of the input data. In computer vision problems, histogram-based feature representation has significantly improved classification tasks. L1-normalized histograms can be modelled by the Dirichlet and related distributions to transform the input space into a feature space. We propose a mapping technique that encodes prior knowledge about the distribution of the data and increases the discriminative power of classifiers, such as the Support Vector Machine (SVM), in supervised learning. The mapping technique for proportional data, based on the Dirichlet, generalized Dirichlet, Beta-Liouville, scaled Dirichlet and shifted-scaled Dirichlet distributions, can be combined with traditional kernels to improve the base kernels' accuracy. Experimental results show that the proposed technique for proportional data increases accuracy on machine vision tasks such as natural scene recognition, satellite image classification, gender classification, facial expression recognition and human action recognition in videos. In addition, in object tracking, learning parametric features of the target object using the Dirichlet and related distributions may help capture representations invariant to noise, which further motivated our study of such distributions in this setting. We propose a framework for feature representation on the probability simplex for proportional data, utilizing the histogram representation of the target object in the initial frame; a set of parameter vectors then determines the appearance features of the target object in subsequent frames. Motivated by the success of distribution-based feature mapping for proportional data, we extend this technique to semi-bounded data using the inverted Dirichlet, generalized inverted Dirichlet and inverted Beta-Liouville distributions. A similar approach is taken for count data, where the Dirichlet multinomial and generalized Dirichlet multinomial distributions are used to map input features to density-based features.
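
    One simple instance of such a mapping (illustrative, not the thesis's exact construction) sends each L1-normalized histogram to its Fisher score under a Dirichlet model, i.e., the gradient of the log-density with respect to the shape parameters, and feeds the mapped features to a standard SVM kernel. The alpha below is fixed for illustration, whereas the thesis learns the distribution parameters from data:

```python
import numpy as np
from scipy.special import digamma
from sklearn.svm import SVC

def dirichlet_fisher_score(H, alpha):
    """H: (n, D) rows on the simplex. Returns d/d alpha_j log Dir(h | alpha),
    which equals digamma(sum(alpha)) - digamma(alpha_j) + log h_j."""
    return digamma(alpha.sum()) - digamma(alpha) + np.log(H)

rng = np.random.default_rng(0)
H = np.vstack([rng.dirichlet([2, 5, 3], 100), rng.dirichlet([6, 1, 2], 100)])
y = np.repeat([0, 1], 100)

Phi = dirichlet_fisher_score(H, alpha=np.array([2.0, 2.0, 2.0]))
clf = SVC(kernel="rbf").fit(Phi, y)        # RBF kernel on the mapped features
print("train accuracy:", clf.score(Phi, y))
```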

    Novel Mixture Allocation Models for Topic Learning

    Unsupervised learning has been an interesting area of research in recent years, and novel algorithms are being built on unsupervised learning methodologies to solve many real-world problems. Topic modelling is one such fascinating methodology, identifying patterns in data as topics. The introduction of latent Dirichlet allocation (LDA) has bolstered research on topic modelling approaches, with modifications specific to the application. However, LDA's basic assumption of a Dirichlet prior for topic proportions might not be applicable in certain real-world scenarios. Hence, in this thesis we explore the use of the generalized Dirichlet (GD) and Beta-Liouville (BL) distributions as alternative priors for topic proportions. In addition, we assume a mixture of distributions over topic proportions, which provides a better fit to the data. To accommodate the application of the resulting models to real-time streaming data, we also provide an online learning solution for the models. A supervised version of the learning framework is also provided and is shown to be advantageous when labelled data are available. Since the topics thus derived may not always be accurate, we integrate an interactive approach that uses inputs from the user to improve the quality of the identified topics. We have also adapted our models to interesting applications such as parallel topic extraction from multilingual texts and content-based recommendation systems, demonstrating the adaptability of our proposed models. In the case of multilingual topic extraction, we use global topic proportions sampled from a Dirichlet process (DP) to tackle the problem, and in the case of recommendation systems, we exploit the co-occurrences of words. For inference, we use a variational approach, which keeps the computation of the solutions tractable. The applications with which we validated our models show the efficiency of the proposed models.
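
    The models proposed here generalize LDA by replacing its Dirichlet prior on topic proportions; as a runnable point of reference, scikit-learn's variational LDA shows the baseline being generalized (the toy corpus is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["mixture model clustering data", "topic model text corpus",
        "image pixels clustering segmentation", "words topics documents corpus"]
X = CountVectorizer().fit_transform(docs)   # document-term count matrix

# Variational inference with a Dirichlet prior on per-document topic mixtures.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))                     # per-document topic proportions
```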