Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information
In this paper, we provide an approach to clustering relational matrices whose
entries correspond to either similarities or dissimilarities between objects.
Our approach is based on the value of information, a parameterized,
information-theoretic criterion that measures the change in costs associated
with changes in information. Optimizing the value of information yields a
deterministic annealing style of clustering with many benefits. For instance,
investigators need not specify the number of clusters a priori, because the
partitions naturally undergo phase changes during the annealing process,
whereby the number of clusters changes in a data-driven fashion. The
global-best partition can also often be identified.
Comment: Submitted to the IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
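As a rough illustration of the conventional co-association approach that the abstract contrasts with link-based methods, the sketch below builds a co-association matrix from several base clusterings. It is written in Python rather than MATLAB and is not part of LinkCluE; the function name `co_association` and the toy labelings are assumptions made for this example only.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the fraction of base clusterings in which each pair of
    objects is placed in the same cluster (hypothetical helper)."""
    if not labelings:
        raise ValueError("need at least one base clustering")
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all base clusterings must label the same objects")

    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    m = len(labelings)
    return [[c / m for c in row] for row in counts]


# Three toy base clusterings of four objects (made-up data).
base_clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
ca = co_association(base_clusterings)
```

Each entry of `ca` can then be treated as a pairwise similarity and passed to any relational clustering method to obtain a consensus partition. Label values only need to be consistent within one base clustering, not across clusterings.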
Comparison and validation of community structures in complex networks
The issue of partitioning a network into communities has attracted a great
deal of attention recently. Most authors seem to equate this issue with the one
of finding the maximum value of the modularity, as defined by Newman. Since the
problem formulated this way is NP-hard, most effort has gone into the
construction of search algorithms, and less to the question of other measures
of community structures, similarities between various partitionings and the
validation with respect to external information. Here we concentrate on a class
of computer-generated networks and on three well-studied real networks which
constitute a benchmark for network studies: the karate club, the US college
football teams and a gene network of yeast. We utilize some standard ways of
clustering data (originally not designed for finding community structures in
networks) and show that these classical methods sometimes outperform the newer
ones. We discuss various measures of the strength of the modular structure, and
show their features and drawbacks by example. Further, we compare different
partitions by applying some graph-theoretic concepts of distance, which
indicate that one of the quality measures of the degree of modularity
corresponds quite well with the distance from the true partition. Finally, we
introduce a way to validate the partitionings with respect to external data
when the nodes are classified but the network structure is unknown. This is
possible here since we know everything about the computer-generated networks, as
well as the historical answer to how the karate club and the football teams are
partitioned in reality. The partitioning of the gene network is validated by
use of the Gene Ontology database, where we show that a community in general
corresponds to a biological process.
Comment: To appear in Physica A; 25 pages
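The modularity mentioned in the abstract above can be computed for a given partition as follows. This is a minimal sketch assuming an undirected, unweighted graph with each edge listed once; the function name `modularity` and the six-node toy graph are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman modularity Q = sum_c [ e_c / m - (d_c / (2 m))^2 ], where
    e_c is the number of edges inside community c, d_c is the total degree
    of its nodes, and m is the number of edges in the graph."""
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # summed node degrees per community
    for u, v in edge_list:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1

    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (made-up data): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_two = modularity(toy_edges, two_groups)  # equals 5/14
q_one = modularity(toy_edges, one_group)   # equals 0
```

Placing every node in one community always gives Q = 0, which is why only partitions with positive Q are read as evidence of modular structure.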
Median evidential c-means algorithm and its application to community detection
Median clustering is of great value for partitioning relational data. In this
paper, a new prototype-based clustering method, called Median Evidential
C-Means (MECM), which is an extension of median c-means and median fuzzy
c-means on the theoretical framework of belief functions is proposed. The
median variant relaxes the restriction of a metric space embedding for the
objects but constrains the prototypes to be in the original data set. Due to
these properties, MECM could be applied to graph clustering problems. A
community detection scheme for social networks based on MECM is investigated
and the obtained credal partitions of graphs, which are more refined than crisp
and fuzzy ones, enable us to have a better understanding of the graph
structures. An initial prototype-selection scheme based on evidential
semi-centrality is presented to avoid local premature convergence and an
evidential modularity function is defined to choose the optimal number of
communities. Finally, experiments in synthetic and real data sets illustrate
the performance of MECM and show how it differs from other methods.
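To make the median constraint concrete, here is a minimal sketch of a crisp median (medoid) clustering loop that works directly on a dissimilarity matrix and keeps every prototype inside the original data set. It is not MECM: the belief-function (credal) memberships, the prototype-selection scheme, and the evidential modularity are all omitted. The function name `k_medoids` and the toy data are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def k_medoids(d: Sequence[Sequence[float]],
              initial_medoids: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate between assigning objects to the nearest medoid and
    re-choosing each medoid as the cluster member with the smallest total
    dissimilarity to the other members. Only d is used, never coordinates."""
    n = len(d)
    medoids = list(initial_medoids)

    def assign(current: List[int]) -> List[int]:
        # Index of the closest medoid for every object.
        return [min(range(len(current)), key=lambda c: d[i][current[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            updated.append(min(members,
                               key=lambda cand: sum(d[cand][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated

    return medoids, assign(medoids)


# Toy relational data (made up): absolute differences between six values.
values = [0, 1, 2, 10, 11, 12]
dissimilarity = [[abs(a - b) for b in values] for a in values]
medoids, labels = k_medoids(dissimilarity, initial_medoids=[0, 1])
```

Because the update step only compares sums of entries of the dissimilarity matrix, the objects never need an embedding in a metric space, which is what allows this family of methods to be applied to graph data.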
Relational visual cluster validity
The assessment of cluster validity plays a very important role in cluster analysis. Most commonly used cluster validity methods are based on statistical hypothesis testing or on finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display the validity of clusters directly by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied to object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists, that is, relational data. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presented.
Evidential relational clustering using medoids
In real clustering applications, proximity data, in which only pairwise
similarities or dissimilarities are known, is more general than object data, in
which each pattern is described explicitly by a list of attributes.
Medoid-based clustering algorithms, which assume the prototypes of classes are
objects, are of great value for partitioning relational data sets. In this
paper a new prototype-based clustering method, named Evidential C-Medoids
(ECMdd), which is an extension of Fuzzy C-Medoids (FCMdd) on the theoretical
framework of belief functions is proposed. In ECMdd, medoids are utilized as
the prototypes to represent the detected classes, including specific classes
and imprecise classes. Specific classes are for the data which are distinctly
far from the prototypes of other classes, while imprecise classes accept the
objects that may be close to the prototypes of more than one class. This soft
decision mechanism could make the clustering results more cautious and reduce
the misclassification rates. Experiments in synthetic and real data sets are
used to illustrate the performance of ECMdd. The results show that ECMdd could
capture well the uncertainty in the internal data structure. Moreover, it is
more robust to initialization than FCMdd.
Comment: In the 18th International Conference on Information Fusion, July 2015,
Washington, DC, USA
An Alternative Termination Criterion for k-Specified Crisp Data Clustering Algorithms (ALTERNATYWNY KRYTERIUM ZATRZYMANIA DLA K-OKREŚLONYCH TWARDYCH ALGORYTMÓW KLASTERYZACJI DANYCH)
This paper analyzes the performance of a termination criterion for a k-specified (namely k-means) crisp data-partitioning pre-clustering algorithm, used in combination with two-valued logic. The results were analyzed using internal clustering validity indices. The termination criterion allows data with any number of clusters to be analyzed. Moreover, in contrast to the known validity indices, the introduced criterion makes it possible to analyze data that form a single cluster.