An Intelligent System A Comparative Study Of Fuzzy C-Means And K-Means Clustering Techniques

Author: Afirah Taufik
Publication venue: 'M. Utemissov West Kazakhstan State University'
Publication date: 01/01/2013

Clustering analysis is considered a useful means of identifying patterns in a dataset. The aim of this analysis is to decide which algorithm is most suitable when dealing with new scattered data. Two important clustering algorithms, fuzzy c-means and k-means, are compared. Both are applied to a synthetic two-dimensional dataset. The number of data points and the number of clusters are determined, and the behavior of both algorithms is then analyzed. Clustering quality is based on the lowest distance and highest membership similarity between the points and the cluster centre within one cluster, referred to here as inter-class cluster similarity. Fuzzy c-means and k-means are compared on this similarity by obtaining the minimum value of the summed distance. Additionally, most researchers fix the weighting exponent (m) of fuzzy c-means to a conventional value of 2, which might not be appropriate for all applications. The optimal m, also called the fuzziness coefficient, for fuzzy c-means on a particular dataset is determined from the minimal reconstruction error.
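The comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's code: the two-blob synthetic data, the seed, the initial centres, and the helper names (`memberships`, `fuzzy_cmeans`, `kmeans`, `summed_distance`) are all assumptions made for illustration, and "summed distance" is taken here to mean the sum of each point's distance to its nearest centre.

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def memberships(points, centres, m=2.0, eps=1e-12):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [max(dist(p, c), eps) for c in centres]  # eps avoids division by zero
        rows.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return rows

def fuzzy_cmeans(points, centres, m=2.0, iters=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    for _ in range(iters):
        u = memberships(points, centres, m)
        new_centres = []
        for i in range(len(centres)):
            w = [row[i] ** m for row in u]  # membership weights raised to m
            total = sum(w)
            new_centres.append((
                sum(wi * p[0] for wi, p in zip(w, points)) / total,
                sum(wi * p[1] for wi, p in zip(w, points)) / total,
            ))
        centres = new_centres
    return centres, memberships(points, centres, m)

def kmeans(points, centres, iters=50):
    """Hard assignment to the nearest centre, then recompute each centre as a mean."""
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda k: dist(p, centres[k]))
            groups[nearest].append(p)
        centres = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres

def summed_distance(points, centres):
    """Assumed quality measure: sum of distances from each point to its nearest centre."""
    return sum(min(dist(p, c) for c in centres) for p in points)

# Hypothetical synthetic data: two Gaussian blobs around (0, 0) and (5, 5).
rng = random.Random(0)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
init = [points[0], points[-1]]  # one starting centre taken from each blob

km_centres = kmeans(points, init)
fcm_centres, u = fuzzy_cmeans(points, init, m=2.0)
print("k-means summed distance:", summed_distance(points, km_centres))
print("fuzzy c-means summed distance:", summed_distance(points, fcm_centres))
```

Rerunning `fuzzy_cmeans` with other values of `m` is one way to explore the abstract's point that m = 2 need not suit every dataset. The reconstruction-error criterion it mentions is not implemented here.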

Relational visual cluster validity

Authors: Ding Y.; Harrison R.F.
Publication venue: 'Elsevier BV'
Publication date: 01/11/2007

The assessment of cluster validity plays a very important role in cluster analysis. The most commonly used cluster validity methods are based on statistical hypothesis testing or on finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display the validity of clusters directly by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied to object data, i.e. feature measurements. Very few validity methods can be used to analyze the validity of relational data, i.e. data where only a similarity or dissimilarity relation exists. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presented.

A Short Survey on Data Clustering Algorithms

Author: Wong Ka-Chun
Publication date: 25/11/2015

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize the intra-subset similarity and inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed, as are advanced clustering algorithms. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.

arXiv.org e-Print Archive

Observer-biased bearing condition monitoring: from fault detection to multi-fault classification

Authors: Cabrera Diego; Cerrada Mariela; Li Chuan; Oliveira José Valente de; Pacheco Fannia; Sanchez Vinicio; Zurita Grover
Publication venue: Elsevier
Publication date: 01/04/2016

Bearings are simultaneously a fundamental component and one of the principal causes of failure in rotary machinery. The work focuses on the use of fuzzy clustering for bearing condition monitoring, i.e., fault detection and classification. The output of a clustering algorithm is a data partition (a set of clusters), which is merely a hypothesis on the structure of the data. This hypothesis requires validation by domain experts. In general, clustering algorithms allow only limited use of domain knowledge in the cluster formation process. In this study, a novel method allowing for interactive clustering in bearing fault diagnosis is proposed. The method resorts to shrinkage to generalize an otherwise unbiased clustering algorithm into a biased one. In this way, the method provides a natural and intuitive way to control the cluster formation process, allowing domain knowledge to guide it. The domain expert can select a desirable level of granularity, ranging from fault detection to classification of a variable number of faults, and can select a specific region of the feature space for detailed analysis. Moreover, experimental results under realistic conditions show that the adopted algorithm outperforms the corresponding unbiased algorithm (fuzzy c-means), which is widely used for this type of problem. © 2016 Elsevier Ltd. All rights reserved. Grant number: 145602

Taming Wild High Dimensional Text Data with a Fuzzy Lash

Author: Karami Amir
Publication date: 01/11/2017

The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular method to address sparsity and high-dimensionality issues. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy that maps all words onto a new basis to represent BOW. The recent increase in text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces performance and features superior to those of Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
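One possible reading of "fuzzy clustering as a DR method" is to cluster the document rows of the BOW matrix and use each document's membership vector as its low-dimensional representation. The sketch below follows that reading only. It is an assumption, not the paper's implementation, and the toy vocabulary, the counts, the initial centres, and the function names are hypothetical.

```python
import math

def fuzzy_memberships(rows, centres, m=2.0, eps=1e-12):
    """Membership of each row in each cluster; every output row sums to 1."""
    power = 2.0 / (m - 1.0)
    out = []
    for r in rows:
        d = [max(math.dist(r, c), eps) for c in centres]  # eps avoids division by zero
        out.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return out

def reduce_with_fcm(rows, centres, m=2.0, iters=30):
    """Run fuzzy c-means and return the documents-by-clusters membership matrix."""
    for _ in range(iters):
        u = fuzzy_memberships(rows, centres, m)
        centres = [
            [
                sum(u[n][i] ** m * rows[n][j] for n in range(len(rows)))
                / sum(u[n][i] ** m for n in range(len(rows)))
                for j in range(len(rows[0]))
            ]
            for i in range(len(centres))
        ]
    return fuzzy_memberships(rows, centres, m)

# Hypothetical toy corpus: 4 documents over a 6-word vocabulary.
vocab = ["cluster", "fuzzy", "kmeans", "bearing", "fault", "rotor"]
bow = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 1, 2, 3, 2],
]

# Start from the first and last documents as centres (an arbitrary choice).
reduced = reduce_with_fcm(bow, [bow[0], bow[-1]])
for doc_id, row in enumerate(reduced):
    print(doc_id, [round(x, 3) for x in row])
```

Each document is reduced from six word counts to two membership values. In practice the number of clusters plays the role of the target dimensionality.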

arXiv.org e-Print Archive

Scholar Commons - Institutional Repository of the University of South Carolina