Fuzzy clustering of univariate and multivariate time series by genetic multiobjective optimization
Given a set of time series, it is of interest to discover subsets that share similar properties. For instance, this may be useful for identifying and estimating a single model that fits several time series at once, instead of performing the usual identification and estimation steps for each one. Moreover, time series in the same cluster are related with respect to the measures assumed for cluster analysis and are suitable for building multivariate time series models. Though many approaches to clustering time series exist, the most effective methods in this view rely on choosing features relevant to the problem at hand and seeking clusters according to their measurements, for instance the autoregressive coefficients, spectral measures, or the eigenvectors of the covariance matrix. Some new indexes based on goodness-of-fit criteria are proposed in this paper for fuzzy clustering of multivariate time series. A general-purpose fuzzy clustering algorithm may be used to estimate the proper cluster structure according to internal criteria of cluster validity. Such indexes are known to measure definite, often conflicting cluster properties: compactness or connectedness, for instance, or distribution, orientation, size, and shape. It is argued that multiobjective optimization supported by genetic algorithms is a most effective choice in such a difficult context. In this paper we use the Xie-Beni index and the c-means functional as objective functions to evaluate cluster validity in a multiobjective optimization framework. The concept of Pareto optimality in multiobjective genetic algorithms is used to evolve a set of potential solutions towards a set of optimal non-dominated solutions. Genetic algorithms are well suited for difficult optimization problems where the objective functions lack good mathematical properties such as continuity, differentiability, or convexity.
In addition, genetic algorithms, as population-based methods, may yield a complete Pareto front at each step of the iterative evolutionary procedure. The method is illustrated by means of a set of real data and an artificial multivariate time series data set.
Keywords: Fuzzy clustering, Internal criteria of cluster validity, Genetic algorithms, Multiobjective optimization, Time series, Pareto optimality
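The Xie-Beni index used as one of the objective functions above is the ratio of the fuzzy compactness of the partition to the minimal separation between cluster centres. A minimal NumPy sketch of that computation (the fuzzifier `m=2` and the matrix shapes are conventional choices, not taken from the paper):

```python
import numpy as np

def xie_beni(X, V, U, m=2.0):
    """Xie-Beni validity index: fuzzy compactness / (n * minimal centre separation).

    X : (n, d) data, V : (c, d) cluster centres, U : (c, n) fuzzy memberships.
    Lower values indicate compact, well-separated clusters.
    """
    n = X.shape[0]
    # compactness: membership-weighted squared distances of points to centres
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)      # (c, n)
    compactness = ((U ** m) * d2).sum()
    # separation: smallest squared distance between two distinct centres
    cd2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(cd2, np.inf)
    return compactness / (n * cd2.min())
```

In the multiobjective setting this value would be minimized jointly with the c-means functional, with Pareto dominance deciding which candidate partitions survive each generation.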
Comparative cluster labelling involving external text sources
Giving clear, straightforward names to the individual groups produced by clustering is essential to making the results usable. This is especially so when clustering is the actual outcome of the analysis and not just a tool for data preparation. In this case, the underlying concept of each cluster is what makes the result meaningful and useful. However, a cluster comes alive in the investigator's mind only once it can be defined or described in words. The method introduced in this paper aims to facilitate and partly automate this verbal characterisation process. An external text database is joined to the clustered objects, adding new, previously unused features to the data set. Clusters are then described by labels produced by text mining analytics. The validity of the clustering can be characterised by the shape of the resulting word cloud.
On-line evolving fuzzy clustering
In this paper, a novel on-line evolving fuzzy clustering method, called EFCM, is presented that extends the evolving clustering method (ECM) of Kasabov and Song (2002). Since it is an on-line algorithm, the fuzzy membership matrix of the data is updated whenever an existing cluster expands or a new cluster is formed. EFCM does not need the number of clusters to be pre-defined. The algorithm is tested on several benchmark data sets, such as Iris, Wine, Glass, E. coli, Yeast, and Italian olive oils. EFCM attains a lower objective function value than both ECM and fuzzy c-means, and it is significantly faster (by several orders of magnitude) than the off-line batch-mode clustering algorithms. A methodology is also proposed for using the Xie-Beni cluster validity measure to optimize the number of clusters. © 2007 IEEE
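The core on-line behaviour described above (clusters expand to absorb nearby points or a new cluster is created) can be illustrated with a minimal one-pass sketch. This is a simplified assumption in the spirit of ECM/EFCM, not the published update rule: the threshold `dthr` and the running-mean centre update stand in for the paper's radius and membership logic.

```python
import numpy as np

def online_cluster(stream, dthr=1.0):
    """One-pass clustering sketch: each point either joins (and shifts)
    the nearest existing centre, if within dthr, or seeds a new cluster.
    A simplified illustration, not the EFCM algorithm itself."""
    centres, counts = [], []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if centres:
            d = [np.linalg.norm(x - c) for c in centres]
            i = int(np.argmin(d))
            if d[i] <= dthr:
                counts[i] += 1
                # incremental running-mean update of the winning centre
                centres[i] = centres[i] + (x - centres[i]) / counts[i]
                continue
        centres.append(x)   # no centre close enough: create a new cluster
        counts.append(1)
    return np.array(centres)
```

In the full method, a fuzzy membership matrix over these evolving centres would be refreshed after every expansion or creation event.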
Clustering in relational data and ontologies
Ph.D. dissertation, University of Missouri--Columbia, 2010. Dissertation advisor: Dr. James M. Keller. The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. This dissertation studies the problem of clustering objects represented by relational data. This is a pertinent problem, as many real-world data sets can only be represented by relational data, for which object-based clustering algorithms are not designed. Relational data are encountered in many fields, including biology, management, industrial engineering, and the social sciences. Unlike numerical object data, which represent an object by a set of feature values (e.g. height, weight, shoe size), relational object data are the numerical values of (dis)similarity between objects. For this reason, conventional cluster analysis methods such as k-means and fuzzy c-means cannot be used directly with relational data. I focus on three main problems of cluster analysis of relational data: (i) tendency prior to clustering -- how many clusters are there?; (ii) partitioning of objects -- which objects belong to which cluster?; and (iii) validity of the resultant clusters -- are the partitions "good"? Analyses included in this dissertation prove that the Visual Assessment of cluster Tendency (VAT) algorithm has a direct relation to single-linkage hierarchical clustering and Dunn's cluster validity index. These analyses are important to the development of two novel clustering algorithms, CLODD (CLustering in Ordered Dissimilarity Data) and ReSL (Rectangular Single-Linkage clustering). Last, this dissertation addresses clustering in ontologies; examples include the Gene Ontology, the MeSH ontology, patient medical records, and web documents.
I apply an extension to the Self-Organizing Map (SOM) to produce a new algorithm, the OSOM (Ontological Self-Organizing Map). OSOM provides visualization and linguistic summarization of ontology-based data. Includes bibliographical references.
Segmentation of Colour Images by Modified Mountain Clustering
Segmentation of colour images is an important issue in various machine vision and image processing applications. Though clustering techniques have been in vogue for many years, they have not been very effective because of problems such as selecting the number of clusters. This problem is tackled here by coupling a validity measure with a new clustering technique. The method treats each point in the data set, which is the map of all possible colour combinations in the given image, as a potential cluster centre and estimates its potential with respect to the other data elements. First, the point with the maximum potential is taken as a cluster centre, and its effect is then removed from the other points of the data set. This procedure is repeated to determine further cluster centres. At the same time, the compactness and the minimum separation amongst all the cluster centres are computed, along with the validity function as the ratio of these quantities. The validity function can be used to choose the number of clusters. The technique is compared with the fuzzy c-means technique, and results are shown for a sample colour image.
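The potential-based centre selection described above follows the mountain/subtractive-clustering scheme: every point accumulates potential from its neighbours, the highest-potential point becomes a centre, and its influence is subtracted before the next selection. A minimal sketch, where the neighbourhood radii `ra` and `rb` are conventional assumed parameters:

```python
import numpy as np

def subtractive_centres(X, n_centres, ra=1.0, rb=1.5):
    """Sketch of subtractive (mountain) clustering centre selection.

    Each point's potential is the sum of Gaussian kernels over all points;
    after a centre is picked, nearby potential is suppressed so the next
    pick lands in a different dense region. ra/rb are assumed radii."""
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # pairwise sq. dist.
    P = np.exp(-alpha * d2).sum(axis=1)                       # potential per point
    centres = []
    for _ in range(n_centres):
        k = int(np.argmax(P))
        centres.append(X[k])
        P = P - P[k] * np.exp(-beta * d2[:, k])               # suppress neighbours
    return np.array(centres)
```

Running the loop for increasing `n_centres` and tracking the compactness-to-separation ratio described in the abstract would then identify the number of clusters.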
Clustering performance analysis using a new correlation-based cluster validity index
There are various cluster validity indices used for evaluating clustering
results. One of the main objectives of using these indices is to seek the
optimal unknown number of clusters. Some indices work well for clusters with
different densities, sizes, and shapes. Yet, one shared weakness of those
validity indices is that they often provide only one optimal number of
clusters. That number is unknown in real-world problems, and there might be
more than one possible option. We develop a new cluster validity index based on
a correlation between an actual distance between a pair of data points and a
centroid distance of clusters that the two points occupy. Our proposed index
consistently yields several local peaks, overcoming the previously stated
weakness. Several experiments in different scenarios, including UCI real-world
data sets, have been conducted to compare the proposed validity index with
several well-known ones. An R package related to this new index called NCvalid
is available at https://github.com/nwiroonsri/NCvalid.
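The core idea of the correlation-based index can be sketched directly: for every pair of points, compare their actual distance with the distance between the centroids of the clusters they occupy, and measure the Pearson correlation between the two. This is an illustration of the idea only, not the NCvalid implementation:

```python
import numpy as np

def correlation_index(X, labels):
    """Pearson correlation between pairwise point distances and the
    distances between the centroids of the clusters each pair occupies.
    High correlation suggests the partition reflects the geometry.
    A sketch of the concept, not the NCvalid package's index."""
    k = int(labels.max()) + 1
    centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    i, j = np.triu_indices(len(X), 1)                 # all unordered pairs
    point_d = np.linalg.norm(X[i] - X[j], axis=1)     # actual pair distance
    centre_d = np.linalg.norm(centroids[labels[i]] - centroids[labels[j]], axis=1)
    return np.corrcoef(point_d, centre_d)[0, 1]
```

Evaluating this score across a range of candidate cluster counts is what produces the several local peaks the abstract refers to, each peak marking a plausible number of clusters.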
Selecting the Number of Clusters with a Stability Trade-off: an Internal Validation Criterion
Model selection is a major challenge in non-parametric clustering. There is
no universally admitted way to evaluate clustering results for the obvious
reason that there is no ground truth against which results could be tested, as
in supervised learning. The difficulty of finding a universal evaluation criterion
is a direct consequence of the fundamentally ill-defined objective of
clustering. In this perspective, clustering stability has emerged as a natural
and model-agnostic principle: an algorithm should find stable structures in the
data. If data sets are repeatedly sampled from the same underlying
distribution, an algorithm should find similar partitions. However, it turns
out that stability alone is not a well-suited tool to determine the number of
clusters. For instance, it is unable to detect if the number of clusters is too
small. We propose a new principle for clustering validation: a good clustering
should be stable, and within each cluster, there should exist no stable
partition. This principle leads to a novel internal clustering validity
criterion based on between-cluster and within-cluster stability, overcoming
limitations of previous stability-based methods. We empirically show the
superior ability of additive noise to discover structures, compared with
sampling-based perturbation. We demonstrate the effectiveness of our method for
selecting the number of clusters through a large number of experiments and
compare it with existing evaluation methods.
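The additive-noise stability idea can be sketched as follows: cluster the data once, recluster several noisy copies, and score how often pairs of points stay together. Everything here is an assumption for illustration (a toy k-means with farthest-point initialisation, pair-counting Rand agreement, Gaussian noise level `sigma`), not the paper's criterion, which additionally tests for stable structure *within* each cluster:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Toy Lloyd's k-means with deterministic farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]
    while len(C) < k:                      # next centre = point farthest from all
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[int(np.argmax(d))])
    C = np.array(C)
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None, :, :]) ** 2).sum(2).argmin(1)
        C = np.array([X[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return lab

def rand_agreement(a, b):
    """Pair-counting (Rand) agreement between two labelings."""
    i, j = np.triu_indices(len(a), 1)
    return ((a[i] == a[j]) == (b[i] == b[j])).mean()

def noise_stability(X, k, reps=10, sigma=0.1, seed=0):
    """Mean agreement between the base partition and partitions of
    additively perturbed copies of the data. Values near 1 mean the
    k-cluster structure survives the noise."""
    rng = np.random.default_rng(seed)
    base = kmeans(X, k)
    scores = [rand_agreement(base, kmeans(X + rng.normal(0, sigma, X.shape),
                                          k, seed=r + 1))
              for r in range(reps)]
    return float(np.mean(scores))
```

Under the paper's principle, a good `k` would combine a high between-cluster stability like this with the absence of any stable sub-partition inside each cluster.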