
    Multidimensional Proportional Data Clustering Using Shifted-Scaled Dirichlet Model

    We have designed and implemented an unsupervised learning algorithm for a finite mixture model of shifted-scaled Dirichlet distributions for the cluster analysis of multivariate proportional data. The cluster analysis task involves model selection using Minimum Message Length to discover the number of natural groupings in a dataset, and an estimation step for the model parameters using the expectation-maximization framework. This thesis aims to improve the flexibility of the widely used Dirichlet model by adding another set of parameters for the location (in addition to the scale parameter). We have applied our estimation and model selection algorithm to synthetically generated data, real data, and software module defect prediction. The experimental results show the merits of the shifted-scaled Dirichlet mixture model in comparison to previously used generative models.
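    To make the expectation-maximization step mentioned in the abstract more concrete, here is a minimal sketch of one EM pass for a mixture over proportional data. It is not the thesis's algorithm: it uses the plain Dirichlet density as a stand-in for the shifted-scaled Dirichlet, updates only the mixing weights, and omits Minimum Message Length model selection. The function names (`dirichlet_logpdf`, `e_step`, `m_step_weights`) are hypothetical and chosen for illustration.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log-density of a plain Dirichlet at a point x on the simplex.

    Assumption: this stands in for the shifted-scaled Dirichlet density,
    which has extra location and scale parameters not modelled here.
    """
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def e_step(data, weights, alphas):
    """Return responsibilities resp[n][k] = P(component k | data[n])."""
    resp = []
    for x in data:
        log_joint = [
            math.log(w) + dirichlet_logpdf(x, a)
            for w, a in zip(weights, alphas)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        m = max(log_joint)
        unnorm = [math.exp(v - m) for v in log_joint]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp


def m_step_weights(resp):
    """Update mixing weights as the average responsibility per component."""
    n = len(resp)
    k = len(resp[0])
    return [sum(row[j] for row in resp) / n for j in range(k)]


# Illustrative data: two 3-part compositions, each row summing to 1.
data = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
weights = [0.5, 0.5]
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]

resp = e_step(data, weights, alphas)
new_weights = m_step_weights(resp)
```

    A full estimation procedure would also update the density parameters in the M-step, typically with an iterative numerical method, and would repeat both steps until the log-likelihood stops improving.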