Cluster validation by measurement of clustering characteristics relevant to the user
There are many cluster analysis methods that can produce quite different
clusterings on the same dataset. Cluster validation is about the evaluation of
the quality of a clustering; "relative cluster validation" is about using such
criteria to compare clusterings. This can be used to select one of a set of
clusterings from different methods, or from the same method run with different
parameters such as different numbers of clusters.
There are many cluster validation indexes in the literature. Most of them
attempt to measure the overall quality of a clustering by a single number, but
this can be inappropriate. There are various different characteristics of a
clustering that can be relevant in practice, depending on the aim of
clustering, such as low within-cluster distances and high between-cluster
separation.
In this paper, a number of validation criteria will be introduced that refer
to different desirable characteristics of a clustering, and that characterise a
clustering in a multidimensional way. In specific applications the user may be
interested in some of these criteria rather than others. A focus of the paper
is on methodology to standardise the different characteristics so that users
can aggregate them in a suitable way specifying weights for the various
criteria that are relevant in the clustering application at hand.
Comment: 20 pages, 2 figures
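The standardise-then-aggregate idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's actual methodology: the criterion names, the z-score standardisation across candidate clusterings, and the example weights are all assumptions.

```python
# Sketch: combine several cluster-validation criteria into one score per
# candidate clustering by standardising each criterion across the
# candidates and applying user-chosen weights. Illustrative only.
from statistics import mean, pstdev

def aggregate_scores(criteria, weights):
    """criteria: dict name -> list of values, one per candidate clustering
    (larger = better). Returns one weighted score per candidate."""
    n = len(next(iter(criteria.values())))
    scores = [0.0] * n
    for name, values in criteria.items():
        mu, sd = mean(values), pstdev(values)
        for i, v in enumerate(values):
            z = (v - mu) / sd if sd > 0 else 0.0  # standardise this criterion
            scores[i] += weights.get(name, 0.0) * z
    return scores

# Three candidate clusterings scored on two (hypothetical) criteria;
# this user weights separation twice as heavily as homogeneity.
criteria = {"homogeneity": [0.8, 0.6, 0.7], "separation": [0.3, 0.9, 0.5]}
weights = {"homogeneity": 1.0, "separation": 2.0}
scores = aggregate_scores(criteria, weights)
best = scores.index(max(scores))  # candidate 1 wins on separation
```

The standardisation step is what makes the weights meaningful: without it, criteria measured on different scales would dominate the sum arbitrarily.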
Properties from relativistic coupled-cluster without truncation: hyperfine constants of , , and
We demonstrate an iterative scheme for coupled-cluster properties
calculations without truncating the dressed properties operator. For
validation, magnetic dipole hyperfine constants of alkaline-earth ions are
calculated with relativistic coupled-cluster theory, and the role of electron
correlation is examined. Then, a detailed analysis of the higher-order terms
is carried out. Based on the results, we arrive at an optimal form of the
dressed operator, which we recommend for properties calculations with
relativistic coupled-cluster theory.
Comment: 13 pages, 4 figures, 5 tables
clValid: An R Package for Cluster Validation
The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available: "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods that allow the user to display the optimal validation scores and extract clustering results.
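clValid is an R package, so the following is only a Python analogue of what one of its "internal" measures computes: the average silhouette width of a partition, implemented from scratch on a toy one-dimensional dataset. This is not the package's code, and the silhouette is just one representative internal measure.

```python
# Sketch: average silhouette width, a standard internal validation
# measure. a = mean distance to own cluster, b = mean distance to the
# nearest other cluster; silhouette = (b - a) / max(a, b), in [-1, 1].
def avg_silhouette(points, labels):
    sils = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        a = sum(own) / len(own) if own else 0.0
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        sils.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(sils) / len(sils)

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, 0, 0, 1, 1, 1]
s = avg_silhouette(points, labels)  # near 1: compact, well-separated clusters
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below suggest the partition is doubtful, which is the kind of signal the package's internal measures report.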
Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters
There are two notoriously hard problems in cluster analysis, estimating the
number of clusters, and checking whether the population to be clustered is not
actually homogeneous. Given a dataset, a clustering method and a cluster
validation index, this paper proposes to set up null models that capture
structural features of the data that cannot be interpreted as indicating
clustering. Artificial datasets are sampled from the null model with parameters
estimated from the original dataset. This can be used for testing the null
hypothesis of a homogeneous population against a clustering alternative. It can
also be used to calibrate the validation index for estimating the number of
clusters, by taking into account the expected distribution of the index under
the null model for any given number of clusters. The approach is illustrated by
three examples, involving various different clustering techniques (partitioning
around medoids, hierarchical methods, a Gaussian mixture model), validation
indexes (average silhouette width, prediction strength and BIC), and issues
such as mixed-type data, and temporal and spatial autocorrelation.
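The general recipe in this abstract can be sketched as below. This is a simplified stand-in, not the paper's implementation: the validation index is the within-cluster sum of squares from a toy 1-D k-means, and the null model is simply uniform on the data range, with the observed index compared to its mean under the null (in the spirit of a gap-style calibration).

```python
# Sketch: calibrate a validation index against a null model fitted to
# the data. Index and null model here are deliberately simple stand-ins.
import random

def kmeans_wss(data, k, iters=20, seed=0):
    """Toy 1-D Lloyd's k-means; returns within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            groups[min(range(k), key=lambda c: (x - centers[c]) ** 2)].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sum((x - min(centers, key=lambda c: (x - c) ** 2)) ** 2 for x in data)

def calibrated_gap(data, k, n_boot=20, seed=1):
    """Mean null WSS minus observed WSS: how much better than 'no
    clustering' the real data does at this k."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    null = [kmeans_wss([rng.uniform(lo, hi) for _ in data], k)
            for _ in range(n_boot)]
    return sum(null) / len(null) - kmeans_wss(data, k)

data = [0.0, 0.2, 0.4, 9.0, 9.2, 9.4]  # two obvious clusters
gaps = {k: calibrated_gap(data, k) for k in (2, 3)}
```

A clearly positive gap at some k says the observed index beats what the homogeneous null model typically produces at that same k, which is the calibration idea the paper develops with more realistic null models.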
An empirical study on the visual cluster validation method with Fastmap
This paper presents an empirical study of the visual method for cluster validation based on the Fastmap projection. The visual cluster validation method attempts to tackle two clustering problems in data mining: verifying partitions of data created by a clustering algorithm, and identifying genuine clusters from data partitions. Both are achieved by projecting objects and clusters to the 2D space with Fastmap and visually examining the results. A Monte Carlo evaluation of the visual method was conducted, and its validation results were compared with those of two internal statistical cluster validation indices, showing that the visual method is consistent with the statistical validation methods. This indicates that the visual cluster validation method is effective and applicable to data mining applications.
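The Fastmap projection step this study relies on can be sketched as follows. Each object receives a coordinate from its distances to two pivot objects via the cosine law, and repeating the step on the residual distances yields the second axis. The pivot selection below is a simplified one-sweep version of the heuristic, purely illustrative.

```python
# Sketch of Fastmap's core step: project objects onto the line through
# two pivots using only pairwise distances, then recurse on residuals.
import math

def fastmap_axis(dist, n):
    # roughly farthest pivot pair (one sweep, not the full heuristic)
    a = 0
    b = max(range(n), key=lambda j: dist(a, j))
    a = max(range(n), key=lambda j: dist(b, j))
    dab = dist(a, b)
    if dab == 0:
        return [0.0] * n, a, b
    # cosine-law coordinate of each object along the pivot line
    return [(dist(a, i) ** 2 + dab ** 2 - dist(b, i) ** 2) / (2 * dab)
            for i in range(n)], a, b

points = [(0, 0), (1, 0), (0, 1), (5, 5)]

def d(i, j):
    return math.dist(points[i], points[j])

axis1, a, b = fastmap_axis(d, len(points))

def d2(i, j):
    # residual distance after removing the first axis
    return math.sqrt(max(d(i, j) ** 2 - (axis1[i] - axis1[j]) ** 2, 0.0))

axis2, _, _ = fastmap_axis(d2, len(points))
coords = list(zip(axis1, axis2))  # 2-D embedding for visual inspection
```

The resulting 2-D coordinates are what a human would inspect to judge whether the clusters produced by an algorithm look genuine, which is exactly the visual validation workflow the paper evaluates.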
Combining Cluster Validation Indices for Detecting Label Noise
In this paper, we show that cluster validation indices can be used for filtering mislabeled instances or class outliers prior to training in supervised learning problems. We propose a technique, entitled Cluster Validation Index (CVI)-based Outlier Filtering, in which mislabeled instances are identified and eliminated from the training set, and a classification hypothesis is then built from the set of remaining instances. The proposed approach assigns each instance several cluster validation scores representing its potential of being an outlier with respect to the clustering properties the chosen validation measures assess. We examine CVI-based Outlier Filtering and compare it against the Local Outlier Factor (LOF) detection method on ten data sets from the UCI data repository, using five well-known learning algorithms and three different cluster validation indices. In addition, we study and compare three different approaches for combining the selected cluster validation measures. Our results show that for most learning algorithms and data sets, the proposed CVI-based outlier filtering algorithm outperforms the baseline method (LOF). The greatest increase in classification accuracy has been achieved by using union or rank-based median strategies to assemble the chosen cluster validation indices and global filtering of mislabeled instances.
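The core idea can be sketched as below: score each training instance by a silhouette-style validation value computed against its class label, and drop instances whose score falls below a threshold before training. This is an illustration of the principle with a single index, not the authors' implementation, which combines several indices.

```python
# Sketch: treat class labels as a clustering and use a silhouette-style
# score per instance; a mislabeled point sits far from its own class and
# near another, giving a negative score, and is filtered out.
def label_silhouette(x, y, points, labels):
    same = [abs(x - p) for p, l in zip(points, labels) if l == y and p != x]
    a = sum(same) / len(same) if same else 0.0
    b = min(sum(abs(x - p) for p, l in zip(points, labels) if l == o)
            / labels.count(o)
            for o in set(labels) if o != y)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0

def filter_mislabeled(points, labels, threshold=0.0):
    """Keep only instances whose label-silhouette meets the threshold."""
    return [(p, l) for p, l in zip(points, labels)
            if label_silhouette(p, l, points, labels) >= threshold]

# One point near class 0 is wrongly labeled 1; its score is strongly
# negative, so it is removed before a classifier would be trained.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 0.05]
labels = [0,   0,   0,   1,   1,   1]
clean = filter_mislabeled(points, labels)
```

A classifier trained on `clean` no longer sees the mislabeled instance, which is the mechanism by which this kind of filtering can raise accuracy.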