295 research outputs found

    Uncertain centroid based partitional clustering of uncertain data

    A Short Survey on Data Clustering Algorithms

    With data volumes growing rapidly, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize the intra-subset similarity and inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed, followed by advanced clustering algorithms. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.

    A STUDY ON ROUGH CLUSTERING

    BigFCM: Fast, Precise and Scalable FCM on Hadoop

    Clustering plays an important role in mining big data, both as a modeling technique and as a preprocessing step in many data mining process implementations. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data record to belong to more than one cluster to some degree. However, a serious challenge in fuzzy clustering is the lack of scalability. Massive datasets in emerging fields such as geosciences, biology and networking require high-performance parallel and distributed computation to solve real-world problems. Although some clustering methods have already been adapted to execute on big data platforms, their execution time still increases sharply for large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering algorithm named BigFCM is proposed and designed for the Hadoop distributed data platform. Based on the map-reduce programming model, it exploits several mechanisms, including an efficient caching design, to achieve several orders of magnitude reduction in execution time. Extensive evaluation over multi-gigabyte datasets shows that BigFCM is scalable while preserving the quality of clustering.
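The idea that a record can "belong to more than one cluster to some degree" can be made concrete with the standard Fuzzy C-Means membership update. The following is a minimal single-machine sketch of that textbook formula only, not BigFCM or its map-reduce design; the function name `fcm_memberships` and the fuzzifier default `m=2.0` are illustrative assumptions.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which points[i] belongs to centers[j].

    Textbook FCM update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the Euclidean distance from point i to center j.
    The fuzzifier m must be greater than 1 (assumed default: 2.0).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        # A point that coincides with one or more centers belongs fully
        # to them; this avoids a division by zero in the formula.
        zero = [j for j, d in enumerate(dists) if d == 0.0]
        if zero:
            row = [1.0 / len(zero) if j in zero else 0.0
                   for j in range(len(centers))]
        else:
            row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                   for d_j in dists]
        memberships.append(row)
    return memberships


# Hypothetical one-dimensional data: three records, two cluster centers.
points = [(0.5,), (1.0,), (0.0,)]
centers = [(0.0,), (2.0,)]
u = fcm_memberships(points, centers)
```

Each row of `u` sums to 1, so a record halfway between two centers gets equal membership in both, while a record close to one center is weighted mostly toward it.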

    Segmenting Images Using Hybridization of K-Means and Fuzzy C-Means Algorithms

    Image segmentation is an essential image processing technique that analyzes an image by partitioning it into non-overlapping regions, each region referring to a set of pixels. Image segmentation approaches can be divided into four categories: thresholding, edge detection, region extraction, and clustering. Clustering techniques can be used for partitioning datasets into groups according to the homogeneity of data points. The present research work proposes two algorithms involving hybridization of K-Means (KM) and Fuzzy C-Means (FCM) techniques as an attempt to achieve better clustering results. Along with the proposed hybrid algorithms, the present work also experiments with the standard K-Means and FCM algorithms. All the algorithms are evaluated on four images. CPU time, clustering fitness, and sum of squared errors (SSE) are computed to measure the clustering performance of the algorithms. In all the experiments, the proposed hybrid algorithm KMandFCM consistently produces better clustering results.
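Of the metrics named above, the sum of squared errors (SSE) is the simplest to illustrate. The sketch below uses the usual definition for a hard clustering (squared Euclidean distance from each point to its assigned center, summed over all points); it is not taken from the paper, and the helper names `assign_labels` and `sum_squared_errors` are hypothetical.

```python
import math


def assign_labels(points, centers):
    """Assign each point the index of its nearest center (K-Means style)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def sum_squared_errors(points, centers, labels):
    """Sum of squared Euclidean distances from points to their centers."""
    return sum(math.dist(p, centers[k]) ** 2 for p, k in zip(points, labels))


# Hypothetical two-dimensional data: three points, two cluster centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
labels = assign_labels(points, centers)
sse = sum_squared_errors(points, centers, labels)
```

A lower SSE means points sit closer to their assigned centers, which is why it is commonly used to compare clusterings produced with the same number of clusters.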

    Improving a Particle Swarm Optimization-based Clustering Method

    This thesis discusses clustering-related work with emphasis on Particle Swarm Optimization (PSO) principles. Specifically, we review in detail the PSO clustering algorithm proposed by Van Der Merwe & Engelbrecht, the particle swarm clustering (PSC) algorithm proposed by Cohen & de Castro, Szabo’s modified PSC (mPSC), and Georgieva & Engelbrecht’s Cooperative-Multi-Population PSO (CMPSO). An improvement over Van Der Merwe & Engelbrecht’s PSO clustering is proposed and tested on standard datasets. The improvements observed in those experiments vary from slight to moderate, both in minimizing the cost function and in run time.