    Optimum Average Silhouette Width Clustering Methods

    Cluster analysis is the search for groups of similar instances in data. Its two major problems are how many clusters are present in the data and how the actual clustering solution can be found. We have developed a unified approach that estimates the number of clusters and the clustering solution jointly. This work covers the theory, methodology and algorithms of the newly proposed approach.

    The average silhouette width (ASW) is a well-known index for measuring clustering quality and for estimating the number of clusters. It is in wide use across disciplines as standard practice for these tasks. In this work, clustering methodologies are proposed that estimate the number of clusters on the fly and produce the clustering for this estimated number by optimizing the ASW index. The performance of the ASW index on these two tasks is thoroughly investigated.

    ASW-based clustering functions are proposed for the two most popular clustering domains, hierarchical and non-hierarchical. To evaluate quality, the clustering solutions obtained from the proposed methods are compared with those of a range of other clustering methods.

    For the estimation of the number of clusters, the proposed methods are compared against a wide spectrum of cluster estimation indices and methods. For this purpose, large-scale studies have been conducted with well-established clustering methods to determine each method's estimation performance with different indices and methods for various kinds of clustering structures.

    Developing the mathematical and theoretical aspects of clustering is a relatively new and challenging avenue. Recently this research domain has received considerable attention because of the present need for, and importance of, a theory of clustering. The purpose of developing such a theory is to make the general nature of clustering more understandable without assuming particular data-generating structures and independently of any clustering algorithm or function. Lastly, the latter part of the thesis devotes considerable attention to developing the theory of the ASW index.
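
    To make the ASW criterion concrete, the following is a minimal sketch, not the algorithm proposed in the thesis. It assumes the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster, with s(i) = 0 for singleton clusters. The function name `average_silhouette_width`, the toy data and the candidate partitions are all hypothetical; the sketch only scores given partitions and keeps the one with the largest ASW, whereas the thesis optimizes the index directly.

```python
from math import dist
from statistics import mean


def average_silhouette_width(points, labels):
    """Return the mean silhouette width s(i) over all points.

    Assumes Euclidean distance and the convention s(i) = 0 for a
    point that is alone in its cluster.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("ASW needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # singleton cluster
            continue
        # a: mean distance to the other members of the same cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of another cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate partitions, keyed by their number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

# Keep the number of clusters whose partition has the largest ASW.
best_k = max(
    candidates,
    key=lambda k: average_silhouette_width(points, candidates[k]),
)
print(best_k)
```

    In this sketch the same index both ranks the partitions and selects the number of clusters, which is the sense in which the two tasks are handled jointly.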