257 research outputs found

    Fuzzy clustering with volume prototypes and adaptive cluster merging

    Get PDF
    Two extensions to the objective function-based fuzzy clustering are proposed. First, the (point) prototypes are extended to hypervolumes, whose size can be fixed or can be determined automatically from the data being clustered. It is shown that clustering with hypervolume prototypes can be formulated as the minimization of an objective function. Second, a heuristic cluster merging step is introduced where the similarity among the clusters is assessed during optimization. Starting with an overestimation of the number of clusters in the data, similar clusters are merged in order to obtain a suitable partitioning. An adaptive threshold for merging is proposed. The extensions proposed are applied to Gustafson–Kessel and fuzzy c-means algorithms, and the resulting extended algorithm is given. The properties of the new algorithm are illustrated by various examples

    Extended Fuzzy Clustering Algorithms

    Get PDF
    Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. Ithas been applied successfully in various fields including finance and marketing. Despitethe successful applications, there are a number of issues that must be dealt with in practicalapplications of fuzzy clustering algorithms. This technical report proposes two extensionsto the objective function based fuzzy clustering for dealing with these issues. First, the(point) prototypes are extended to hypervolumes whose size is determined automaticallyfrom the data being clustered. These prototypes are shown to be less sensitive to a biasin the distribution of the data. Second, cluster merging by assessing the similarity amongthe clusters during optimization is introduced. Starting with an over-estimated number ofclusters in the data, similar clusters are merged during clustering in order to obtain a suitablepartitioning of the data. An adaptive threshold for merging is introduced. The proposedextensions are applied to Gustafson-Kessel and fuzzy c-means algorithms, and the resultingextended algorithms are given. The properties of the new algorithms are illustrated invarious examples.fuzzy clustering;cluster merging;similarity;volume prototypes

    Speaker specific feature based clustering and its applications in language independent forensic speaker recognition

    Get PDF
    Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). The role of the forensic expert is to testify by using, if possible, a quantitative measure of this value to the value of the voice evidence. Using this information as an aid in their judgments and decisions are up to the judge and/or the jury. Most existing methods measure inter-utterance similarities directly based on spectrum-based characteristics, the resulting clusters may not be well related to speaker’s, but rather to different acoustic classes. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. The resulting projection vectors naturally represent the language-independent voice-like relationships among all the utterances and are therefore more robust against non-speaker interference. Then a clustering approach is proposed based on the peak approximation in order to maximize the similarities between language-independent utterances within all clusters. This method uses a K-medoid, Fuzzy C-means, Gustafson and Kessel and Gath-Geva algorithm to evaluate the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering that the ultimate outcome can only hit the optimum recognition efficiency. The recognition efficiency of K-medoid, Fuzzy C-means, Gustafson and Kessel and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5% and 99.7% and EER are 3.62%, 2.91 %, 2.82%, and 2.61% respectively. The EER improvement of the Gath-Geva technique based FSRsystem compared with Gustafson and Kessel and Fuzzy C-means is 8.04% and 11.49% respectivel

    Extended Fuzzy Clustering Algorithms

    Get PDF
    Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. It has been applied successfully in various fields including finance and marketing. Despite the successful applications, there are a number of issues that must be dealt with in practical applications of fuzzy clustering algorithms. This technical report proposes two extensions to the objective function based fuzzy clustering for dealing with these issues. First, the (point) prototypes are extended to hypervolumes whose size is determined automatically from the data being clustered. These prototypes are shown to be less sensitive to a bias in the distribution of the data. Second, cluster merging by assessing the similarity among the clusters during optimization is introduced. Starting with an over-estimated number of clusters in the data, similar clusters are merged during clustering in order to obtain a suitable partitioning of the data. An adaptive threshold for merging is introduced. The proposed extensions are applied to Gustafson-Kessel and fuzzy c-means algorithms, and the resulting extended algorithms are given. The properties of the new algorithms are illustrated in various examples

    Evidential Evolving Gustafson-Kessel Algorithm For Online Data Streams Partitioning Using Belief Function Theory.

    Get PDF
    International audienceA new online clustering method called E2GK (Evidential Evolving Gustafson-Kessel) is introduced. This partitional clustering algorithm is based on the concept of credal partition defined in the theoretical framework of belief functions. A credal partition is derived online by applying an algorithm resulting from the adaptation of the Evolving Gustafson-Kessel (EGK) algorithm. Online partitioning of data streams is then possible with a meaningful interpretation of the data structure. A comparative study with the original online procedure shows that E2GK outperforms EGK on different entry data sets. To show the performance of E2GK, several experiments have been conducted on synthetic data sets as well as on data collected from a real application problem. A study of parameters' sensitivity is also carried out and solutions are proposed to limit complexity issues

    Evolving Clustering Algorithms And Their Application For Condition Monitoring, Diagnostics, & Prognostics

    Get PDF
    Applications of Condition-Based Maintenance (CBM) technology requires effective yet generic data driven methods capable of carrying out diagnostics and prognostics tasks without detailed domain knowledge and human intervention. Improved system availability, operational safety, and enhanced logistics and supply chain performance could be achieved, with the widespread deployment of CBM, at a lower cost level. This dissertation focuses on the development of a Mutual Information based Recursive Gustafson-Kessel-Like (MIRGKL) clustering algorithm which operates recursively to identify underlying model structure and parameters from stream type data. Inspired by the Evolving Gustafson-Kessel-like Clustering (eGKL) algorithm, we applied the notion of mutual information to the well-known Mahalanobis distance as the governing similarity measure throughout. This is also a special case of the Kullback-Leibler (KL) Divergence where between-cluster shape information (governed by the determinant and trace of the covariance matrix) is omitted and is only applicable in the case of normally distributed data. In the cluster assignment and consolidation process, we proposed the use of the Chi-square statistic with the provision of having different probability thresholds. Due to the symmetry and boundedness property brought in by the mutual information formulation, we have shown with real-world data that the algorithm’s performance becomes less sensitive to the same range of probability thresholds which makes system tuning a simpler task in practice. As a result, improvement demonstrated by the proposed algorithm has implications in improving generic data driven methods for diagnostics, prognostics, generic function approximations and knowledge extractions for stream type of data. The work in this dissertation demonstrates MIRGKL’s effectiveness in clustering and knowledge representation and shows promising results in diagnostics and prognostics applications

    Weighted-covariance factor fuzzy C-means clustering

    Get PDF
    In this paper, we propose a factor weighted fuzzy c-means clustering algorithm. Based on the inverse of a covariance factor, which assesses the collinearity between the centers and samples, this factor takes also into account the compactness of the samples within clusters. The proposed clustering algorithm allows to classify spherical and non-spherical structural clusters, contrary to classical fuzzy c-means algorithm that is only adapted for spherical structural clusters. Compared with other algorithms designed for non-spherical structural clusters, such as Gustafson-Kessel, Gath-Geva or adaptive Mahalanobis distance-based fuzzy c-means clustering algorithms, the proposed algorithm gives better numerical results on artificial and real well known data sets. Moreover, this algorithm can be used for high dimensional data, contrary to other algorithms that require the computation of determinants of large matrices. Application on Mid-Infrared spectra acquired on maize root and aerial parts of Miscanthus for the classification of vegetal biomass shows that this algorithm can successfully be applied on high dimensional data

    Evaluation of Clustering Algorithms on HPC Platforms

    Full text link
    [EN] Clustering algorithms are one of the most widely used kernels to generate knowledge from large datasets. These algorithms group a set of data elements (i.e., images, points, patterns, etc.) into clusters to identify patterns or common features of a sample. However, these algorithms are very computationally expensive as they often involve the computation of expensive fitness functions that must be evaluated for all points in the dataset. This computational cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for fuzzy clustering algorithms typically used in the state-of-the-art such as the Fuzzy C-means (FCM), the Gustafson-Kessel FCM (GK-FCM) and the Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that depending on the computational pattern of each algorithm, their mathematical foundation and the amount of data to be processed, each algorithm performs better on a different platform.This work has been partially supported by the Spanish Ministry of Science and Innovation, under the Ramon y Cajal Program (Grant No. RYC2018-025580-I) and by the Spanish "Agencia Estatal de Investigacion" under grant PID2020-112827GB-I00 /AEI/ 10.13039/501100011033, and under grants RTI2018-096384-B-I00, RTC-2017-6389-5 and RTC2019-007159-5, by the Fundacion Seneca del Centro de Coordinacion de la Investigacion de la Region de Murcia under Project 20813/PI/18, and by the "Conselleria de Educacion, Investigacion, Cultura y Deporte, Direccio General de Ciencia i Investigacio, Proyectos AICO/2020", Spain, under Grant AICO/2020/302.Cebrian, JM.; ImbernĂłn, B.; Soto, J.; Cecilia-Canales, JM. (2021). Evaluation of Clustering Algorithms on HPC Platforms. Mathematics. 9(17):1-20. https://doi.org/10.3390/math917215612091

    A Comparison of Fuzzy Clustering Algorithms Applied to Feature Extraction on Vineyard

    Get PDF
    Image segmentation is a process by which an image is partitioned into regions with similar features. Many approaches have been proposed for color image segmentation, but Fuzzy C-Means has been widely used, because it has a good performance in a large class of images. However, it is not adequate for noisy images and it also takes more time for execution as compared to other method as K-means. For this reason, several methods have been proposed to improve these weaknesses. Method like Possibilistic C-Means, Fuzzy Possibilistic C-Means, Robust Fuzzy Possibilistic C-Means and Fuzzy C-Means with Gustafson-Kessel algorithm. In this paper we perform a comparison of these clustering algorithms applied to feature extraction on vineyard images. Segmented images are evaluated using several quality parameters such as the rate of correctly classied area and runtim
    • 

    corecore