295 research outputs found
A Short Survey on Data Clustering Algorithms
With rapidly increasing data, clustering algorithms are important tools for
data analytics in modern research. They have been successfully applied to a
wide range of domains; for instance, bioinformatics, speech recognition, and
financial analysis. Formally speaking, given a set of data instances, a
clustering algorithm is expected to divide the set of data instances into the
subsets which maximize the intra-subset similarity and inter-subset
dissimilarity, where a similarity measure is defined beforehand. In this work,
the state-of-the-arts clustering algorithms are reviewed from design concept to
methodology; Different clustering paradigms are discussed. Advanced clustering
algorithms are also discussed. After that, the existing clustering evaluation
metrics are reviewed. A summary with future insights is provided at the end
BigFCM: Fast, Precise and Scalable FCM on Hadoop
Clustering plays an important role in mining big data both as a modeling
technique and a preprocessing step in many data mining process implementations.
Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing
each data record to belong to more than one cluster to some degree. However, a
serious challenge in fuzzy clustering is the lack of scalability. Massive
datasets in emerging fields such as geosciences, biology and networking do
require parallel and distributed computations with high performance to solve
real-world problems. Although some clustering methods are already improved to
execute on big data platforms, but their execution time is highly increased for
large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering named
BigFCM is proposed and designed for the Hadoop distributed data platform. Based
on the map-reduce programming model, it exploits several mechanisms including
an efficient caching design to achieve several orders of magnitude reduction in
execution time. Extensive evaluation over multi-gigabyte datasets shows that
BigFCM is scalable while it preserves the quality of clustering
Segmenting Images Using Hybridization of K-Means and Fuzzy C-Means Algorithms
Image segmentation is an essential technique of image processing for analyzing an image by partitioning it into non-overlapped regions each region referring to a set of pixels. Image segmentation approaches can be divided into four categories. They are thresholding, edge detection, region extraction and clustering. Clustering techniques can be used for partitioning datasets into groups according to the homogeneity of data points. The present research work proposes two algorithms involving hybridization of K-Means (KM) and Fuzzy C-Means (FCM) techniques as an attempt to achieve better clustering results. Along with the proposed hybrid algorithms, the present work also experiments with the standard K-Means and FCM algorithms. All the algorithms are experimented on four images. CPU Time, clustering fitness and sum of squared errors (SSE) are computed for measuring clustering performance of the algorithms. In all the experiments it is observed that the proposed hybrid algorithm KMandFCM is consistently producing better clustering results
Improving a Particle Swarm Optimization-based Clustering Method
This thesis discusses clustering related works with emphasis on Particle Swarm Optimization (PSO) principles. Specifically, we review in detail the PSO clustering algorithm proposed by Van Der Merwe & Engelbrecht, the particle swarm clustering (PSC) algorithm proposed by Cohen & de Castro, Szabo’s modified PSC (mPSC), and Georgieva & Engelbrecht’s Cooperative-Multi-Population PSO (CMPSO). In this thesis, an improvement over Van Der Merwe & Engelbrecht’s PSO clustering has been proposed and tested for standard datasets. The improvements observed in those experiments vary from slight to moderate, both in terms of minimizing the cost function, and in terms of run time
- …