9,217 research outputs found

    Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information

    Full text link
    In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic annealing style of clustering with many benefits. For instance, investigators avoid needing to a priori specify the number of clusters, as the partitions naturally undergo phase changes, during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified.Comment: Submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP

    LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles

    Get PDF
    Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.

    Comparison and validation of community structures in complex networks

    Full text link
    The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with the one of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is NP-hard, most effort has gone into the construction of search algorithms, and less to the question of other measures of community structures, similarities between various partitionings and the validation with respect to external information. Here we concentrate on a class of computer generated networks and on three well-studied real networks which constitute a bench-mark for network studies; the karate club, the US college football teams and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show by examples features and drawbacks. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is here possible since we know everything of the computer generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process.Comment: To appear in Physica A; 25 page

    Median evidential c-means algorithm and its application to community detection

    Get PDF
    Median clustering is of great value for partitioning relational data. In this paper, a new prototype-based clustering method, called Median Evidential C-Means (MECM), which is an extension of median c-means and median fuzzy c-means on the theoretical framework of belief functions is proposed. The median variant relaxes the restriction of a metric space embedding for the objects but constrains the prototypes to be in the original data set. Due to these properties, MECM could be applied to graph clustering problems. A community detection scheme for social networks based on MECM is investigated and the obtained credal partitions of graphs, which are more refined than crisp and fuzzy ones, enable us to have a better understanding of the graph structures. An initial prototype-selection scheme based on evidential semi-centrality is presented to avoid local premature convergence and an evidential modularity function is defined to choose the optimal number of communities. Finally, experiments in synthetic and real data sets illustrate the performance of MECM and show its difference to other methods

    Relational visual cluster validity

    Get PDF
    The assessment of cluster validity plays a very important role in cluster analysis. Most commonly used cluster validity methods are based on statistical hypothesis testing or finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display directly the validity of clusters by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied for object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists – relational data. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presente

    Evidential relational clustering using medoids

    Get PDF
    In real clustering applications, proximity data, in which only pairwise similarities or dissimilarities are known, is more general than object data, in which each pattern is described explicitly by a list of attributes. Medoid-based clustering algorithms, which assume the prototypes of classes are objects, are of great value for partitioning relational data sets. In this paper a new prototype-based clustering method, named Evidential C-Medoids (ECMdd), which is an extension of Fuzzy C-Medoids (FCMdd) on the theoretical framework of belief functions is proposed. In ECMdd, medoids are utilized as the prototypes to represent the detected classes, including specific classes and imprecise classes. Specific classes are for the data which are distinctly far from the prototypes of other classes, while imprecise classes accept the objects that may be close to the prototypes of more than one class. This soft decision mechanism could make the clustering results more cautious and reduce the misclassification rates. Experiments in synthetic and real data sets are used to illustrate the performance of ECMdd. The results show that ECMdd could capture well the uncertainty in the internal data structure. Moreover, it is more robust to the initializations compared with FCMdd.Comment: in The 18th International Conference on Information Fusion, July 2015, Washington, DC, USA , Jul 2015, Washington, United State

    ALTERNATYWNY KRYTERIUM ZATRZYMANIA DLA K-OKREŚLONYCH TWARDYCH ALGORYTMÓW KLASTERYZACJI DANYCH

    Get PDF
    In this paper the analysis of k-specified (namely k-means) crisp data partitioning pre-clustering algorithm’s termination criterion performance is described. The results have been analyzed using the clustering validity indices. Termination criterion allows analyzing data with any number of clusters. Moreover, introduced criterion in contrast to the known validity indices enables to analyze data that make up one cluster.W przedstawionym artykule została pokazana analiza wstępnej klasteryzacji danych w oparciu o partycjonowanie (algorytm k-średnich) w połączeniu z logiką dwuwartościową. Dodatkowo, zostało przedstawione kryterium zatrzymania klasteryzacji, które umożliwia analizowanie danych z dowolną liczbą klastrów. Otrzymane wyniki badań zostały przeanalizowane przy użyciu wewnętrznych indeksów walidacji. Wprowadzone kryterium w przeciwieństwie do znanych indeksów walidacji umożliwia analizę danych, które tworzą jeden klaster

    Cluster validity in clustering methods

    Get PDF
    corecore