
    Ensemble based distributed k-harmonic means clustering

    Abstract—Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed knowledge discovery and data mining. A distributed clustering algorithm clusters distributed datasets without necessarily downloading all the data to a single site. K-Means is a popular clustering method because of its simplicity and its speed on large datasets. However, its performance depends heavily on how the centroids are initialized, and distributed clustering algorithms based on K-Means inherit this sensitivity. K-Harmonic Means, by contrast, has been shown to be essentially insensitive to centroid initialization. In this paper, a novel ensemble-based distributed clustering algorithm using K-Harmonic Means is proposed. The simulated experiments described in this paper confirm the robust performance of the proposed algorithm.
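The insensitivity claim rests on how K-Harmonic Means (KHM) replaces the hard nearest-centroid assignment of K-Means with a harmonic average of distances. KHM minimizes KHM(X, C) = sum over points x_i of k / sum_j ||x_i - c_j||^(-p), and each center is re-estimated as a weighted mean in which every point contributes with weight d_ij^(-p-2) / (sum_l d_il^(-p))^2. Points that are far from all centers therefore pull harder, which helps centers escape poor starting positions. The following is a minimal, centralized sketch of that standard update, not the paper's ensemble or distributed algorithm. The function names `k_harmonic_means` and `khm_objective`, the default exponent `p=3.5`, and the `eps` clamp are illustrative assumptions.

```python
import math
import random


def k_harmonic_means(points, k, p=3.5, iters=50, init=None, seed=0, eps=1e-9):
    """Hypothetical minimal KHM sketch (centralized, not the paper's method).

    points: list of equal-length numeric tuples or lists.
    p: exponent of the harmonic-mean distance (an assumed default).
    init: optional list of k starting centers; otherwise k points are sampled.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centers = [list(c) for c in start]
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x in points:
            # Clamp distances away from zero so the negative powers stay finite.
            d = [max(math.dist(x, c), eps) for c in centers]
            s_p = sum(dj ** (-p) for dj in d)
            for j in range(k):
                # Membership times point weight: d_j^(-p-2) / (sum_l d_l^(-p))^2
                coef = d[j] ** (-p - 2) / s_p ** 2
                den[j] += coef
                for t in range(dim):
                    num[j][t] += coef * x[t]
        # Each new center is the coefficient-weighted mean of all points.
        centers = [[num[j][t] / den[j] for t in range(dim)] for j in range(k)]
    return centers


def khm_objective(points, centers, p=3.5, eps=1e-9):
    """KHM performance: sum over points of k / sum_j d(x, c_j)^(-p)."""
    k = len(centers)
    return sum(
        k / sum(max(math.dist(x, c), eps) ** (-p) for c in centers)
        for x in points
    )


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
print(sorted(k_harmonic_means(pts, 2, init=[(0, 0), (10, 10)])))
```

Unlike K-Means, every point contributes to every center here, with a soft weight. That soft weighting is what the literature credits for KHM's reduced dependence on initialization.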

    Ensemble based Distributed K-Modes Clustering

    Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emerging need for effective approaches to distributed clustering. A distributed clustering algorithm clusters distributed datasets without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and its speed on large datasets. However, it cannot directly handle datasets with categorical attributes, which commonly occur in real-life data. Huang proposed the K-Modes clustering algorithm, which introduces a new dissimilarity measure for clustering categorical data. This algorithm replaces the means of clusters with modes and uses a frequency-based method to update the modes during clustering so as to minimize the cost function. Most of the distributed clustering algorithms found in the literature are designed for numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed. It is well suited to handling categorical datasets and performs the distributed clustering process asynchronously. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and with a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI Machine Learning Repository.
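To make the two K-Modes ingredients concrete, here is a minimal, centralized Python sketch. It uses the simple matching dissimilarity, which counts the attributes on which two records differ, and it recomputes each mode as the most frequent category per attribute among a cluster's members. It is not the paper's ensemble or asynchronous distributed algorithm. The names `matching_dissimilarity` and `k_modes`, the explicit `init_modes` argument, and the rule of keeping an empty cluster's previous mode are illustrative assumptions.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, init_modes, max_iter=20):
    """Hypothetical minimal K-Modes sketch (centralized, not the paper's method)."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the mode with the fewest mismatched attributes.
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        new_modes = []
        for j, mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if not members:
                new_modes.append(mode)  # assumption: keep an empty cluster's old mode
                continue
            # Frequency-based update: most common category in each attribute.
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        modes = new_modes
    return modes, labels


records = [
    ("red", "small", "round"), ("red", "small", "oval"), ("red", "large", "round"),
    ("blue", "large", "square"), ("blue", "large", "square"), ("green", "large", "square"),
]
modes, labels = k_modes(records, init_modes=[records[0], records[3]])
print(modes, labels)
```

Because modes are built from category counts, they are always valid categorical records. A mean of category labels, by contrast, would not be meaningful.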