6,940 research outputs found

    Cluster validity in clustering methods

    Get PDF

    Modified Kohonen network algorithm for selection of the initial centres of Gustafson-Kessel algorithm in credit scoring

    Get PDF
    Credit risk assessment has become an important topic in financial risk administration. Fuzzy clustering analysis has been applied in credit scoring. Gustafson-Kessel (GK) algorithm has been utilised to cluster creditworthy customers as against non-creditworthy ones. A good clustering analysis implemented by good Initial Centres of clusters should be selected. To overcome this problem of Gustafson-Kessel (GK) algorithm, we proposed a modified version of Kohonen Network (KN) algorithm to select the initial centres. Utilising similar degree between points to get similarity density, and then by means of maximum density points selecting; the modified Kohonen Network method generate clustering initial centres to get more reasonable clustering results. The comparative was conducted using three credit scoring datasets: Australian, German and Taiwan. Internal and external indexes of validity clustering are computed and the proposed method was found to have the best performance in these three data sets

    An overview of clustering methods with guidelines for application in mental health research

    Get PDF
    Cluster analyzes have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues as well as methods for pre-clustering data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and librarie

    Comparing hard and overlapping clusterings

    Get PDF
    Similarity measures for comparing clusterings is an important component, e.g., of evaluating clustering algorithms, for consensus clustering, and for clustering stability assessment. These measures have been studied for over 40 years in the domain of exclusive hard clusterings (exhaustive and mutually exclusive object sets). In the past years, the literature has proposed measures to handle more general clusterings (e.g., fuzzy/probabilistic clusterings). This paper provides an overview of these new measures and discusses their drawbacks. We ultimately develop a corrected-for-chance measure (13AGRI) capable of comparing exclusive hard, fuzzy/probabilistic, non-exclusive hard, and possibilistic clusterings. We prove that 13AGRI and the adjusted Rand index (ARI, by Hubert and Arabie) are equivalent in the exclusive hard domain. The reported experiments show that only 13AGRI could provide both a fine-grained evaluation across clusterings with different numbers of clusters and a constant evaluation between random clusterings, showing all the four desirable properties considered here. We identified a high correlation between 13AGRI applied to fuzzy clusterings and ARI applied to hard exclusive clusterings over 14 real data sets from the UCI repository, which corroborates the validity of 13AGRI fuzzy clustering evaluation. 13AGRI also showed good results as a clustering stability statistic for solutions produced by the expectation maximization algorithm for Gaussian mixture

    Variability of behaviour in electricity load profile clustering: who does things at the same time each day?

    Get PDF
    UK electricity market changes provide opportunities to alter households' electricity usage patterns for the benet of the overall electricity network. Work on clustering similar households has concentrated on daily load proles and the variability in regular household behaviours has not been considered. Those households with most variability in reg- ular activities may be the most receptive to incentives to change timing. Whether using the variability of regular behaviour allows the creation of more consistent groupings of households is investigated and compared with daily load prole clustering. 204 UK households are analysed to nd repeating patterns (motifs). Variability in the time of the motif is used as the basis for clustering households. Dierent clustering algorithms are assessed by the consistency of the results. Findings show that variability of behaviour, using motifs, provides more consistent groupings of households across dierent clustering algorithms and allows for more ecient targeting of behaviour change interventions

    Bibliometric Mapping of the Computational Intelligence Field

    Get PDF
    In this paper, a bibliometric study of the computational intelligence field is presented. Bibliometric maps showing the associations between the main concepts in the field are provided for the periods 1996–2000 and 2001–2005. Both the current structure of the field and the evolution of the field over the last decade are analyzed. In addition, a number of emerging areas in the field are identified. It turns out that computational intelligence can best be seen as a field that is structured around four important types of problems, namely control problems, classification problems, regression problems, and optimization problems. Within the computational intelligence field, the neural networks and fuzzy systems subfields are fairly intertwined, whereas the evolutionary computation subfield has a relatively independent position.neural networks;bibliometric mapping;fuzzy systems;bibliometrics;computational intelligence;evolutionary computation

    Applying subclustering and Lp distance in Weighted K-Means with distributed centroids

    Get PDF
    We consider the Weighted K-Means algorithm with distributed centroids aimed at clustering data sets with numerical, categorical and mixed types of data. Our approach allows given features (i.e., variables) to have different weights at different clusters. Thus, it supports the intuitive idea that features may have different degrees of relevance at different clusters. We use the Minkowski metric in a way that feature weights become feature re-scaling factors for any considered exponent. Moreover, the traditional Silhouette clustering validity index was adapted to deal with both numerical and categorical types of features. Finally, we show that our new method usually outperforms traditional K-Means as well as the recently proposed WK-DC clustering algorithm.Peer reviewe
    corecore