Median evidential c-means algorithm and its application to community detection
Median clustering is of great value for partitioning relational data. In this
paper, a new prototype-based clustering method, called Median Evidential
C-Means (MECM), which is an extension of median c-means and median fuzzy
c-means on the theoretical framework of belief functions is proposed. The
median variant relaxes the restriction of a metric space embedding for the
objects but constrains the prototypes to be in the original data set. Due to
these properties, MECM could be applied to graph clustering problems. A
community detection scheme for social networks based on MECM is investigated
and the obtained credal partitions of graphs, which are more refined than crisp
and fuzzy ones, enable us to have a better understanding of the graph
structures. An initial prototype-selection scheme based on evidential
semi-centrality is presented to avoid local premature convergence and an
evidential modularity function is defined to choose the optimal number of
communities. Finally, experiments in synthetic and real data sets illustrate
the performance of MECM and show how it differs from other methods.
Evidential relational clustering using medoids
In real clustering applications, proximity data, in which only pairwise
similarities or dissimilarities are known, is more general than object data, in
which each pattern is described explicitly by a list of attributes.
Medoid-based clustering algorithms, which assume the prototypes of classes are
objects, are of great value for partitioning relational data sets. In this
paper a new prototype-based clustering method, named Evidential C-Medoids
(ECMdd), which is an extension of Fuzzy C-Medoids (FCMdd) on the theoretical
framework of belief functions is proposed. In ECMdd, medoids are utilized as
the prototypes to represent the detected classes, including specific classes
and imprecise classes. Specific classes are for the data which are distinctly
far from the prototypes of other classes, while imprecise classes accept the
objects that may be close to the prototypes of more than one class. This soft
decision mechanism could make the clustering results more cautious and reduce
the misclassification rates. Experiments in synthetic and real data sets are
used to illustrate the performance of ECMdd. The results show that ECMdd could
capture well the uncertainty in the internal data structure. Moreover, it is
more robust to initialization than FCMdd.
Comment: in The 18th International Conference on Information Fusion, July 2015, Washington, DC, USA.
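For orientation, the baseline that ECMdd extends can be sketched as follows. This is a minimal Fuzzy C-Medoids loop on a dissimilarity matrix, not the evidential method itself: it has no imprecise classes and no belief functions. The function names, the fuzzifier default `m=2.0`, and the handling of zero dissimilarities are assumptions made for illustration.

```python
def fuzzy_memberships(D, medoids, m=2.0):
    """U[i][j]: membership of object j in the cluster represented by medoids[i]."""
    n = len(D)
    U = [[0.0] * n for _ in medoids]
    for j in range(n):
        d = [D[v][j] for v in medoids]
        if min(d) == 0.0:
            # Object j coincides with a medoid: assign it fully (assumed convention).
            U[d.index(0.0)][j] = 1.0
            continue
        w = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d]
        s = sum(w)
        for i in range(len(medoids)):
            U[i][j] = w[i] / s
    return U


def update_medoids(D, U, m=2.0):
    """For each cluster, pick the object minimising the weighted dissimilarity sum."""
    n = len(D)
    return [
        min(range(n), key=lambda k: sum(U[i][j] ** m * D[k][j] for j in range(n)))
        for i in range(len(U))
    ]


def fcmdd(D, medoids, m=2.0, max_iter=50):
    """Alternate membership and medoid updates until the medoids stop changing."""
    for _ in range(max_iter):
        U = fuzzy_memberships(D, medoids, m)
        new = update_medoids(D, U, m)
        if new == medoids:
            break
        medoids = new
    return medoids, fuzzy_memberships(D, medoids, m)


# Usage: six points on a line, described only by pairwise dissimilarities.
pts = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in pts] for a in pts]
medoids, U = fcmdd(D, [0, 3])
```

Only the matrix `D` is used, so no metric-space embedding of the objects is needed, and the prototypes are always indices of actual objects. Nothing in this sketch prevents two clusters from selecting the same medoid.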
Fuzzy clustering with volume prototypes and adaptive cluster merging
Two extensions to the objective function-based fuzzy clustering are proposed. First, the (point) prototypes are extended to hypervolumes, whose size can be fixed or can be determined automatically from the data being clustered. It is shown that clustering with hypervolume prototypes can be formulated as the minimization of an objective function. Second, a heuristic cluster merging step is introduced where the similarity among the clusters is assessed during optimization. Starting with an overestimation of the number of clusters in the data, similar clusters are merged in order to obtain a suitable partitioning. An adaptive threshold for merging is proposed. The extensions proposed are applied to Gustafson-Kessel and fuzzy c-means algorithms, and the resulting extended algorithm is given. The properties of the new algorithm are illustrated by various examples.
Global Optimization strategies for two-mode clustering
Two-mode clustering is a relatively new form of clustering that clusters both rows and columns of a data matrix. To do so, a criterion similar to k-means is optimized. However, it is still unclear which optimization method should be used to perform two-mode clustering, as various methods may lead to non-global optima. This paper reviews and compares several optimization methods for two-mode clustering. Several known algorithms are discussed and a new, fuzzy algorithm is introduced. The meta-heuristics Multistart, Simulated Annealing, and Tabu Search are used in combination with these algorithms. The new, fuzzy algorithm is based on the fuzzy c-means algorithm of Bezdek (1981) and the Fuzzy Steps approach to avoid local minima of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performance of all methods is compared in a large simulation study. It is found that using a Multistart meta-heuristic in combination with a two-mode k-means algorithm or the fuzzy algorithm often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode clustering.
Keywords: algorithms; fuzzy clustering; multistart; simulated annealing; simulation; tabu search; two-mode clustering
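The combination of Multistart with a two-mode k-means can be sketched roughly as below. The sketch assumes a squared-error criterion on block means and alternating row/column reassignment; the function names, the grand-mean fallback for empty blocks, and the seed handling are illustrative assumptions, not the authors' implementation.

```python
import random


def block_means(X, rows, cols, P, Q):
    """Mean of X over each (row cluster, column cluster) block."""
    total = [[0.0] * Q for _ in range(P)]
    count = [[0] * Q for _ in range(P)]
    for i, xi in enumerate(X):
        for j, x in enumerate(xi):
            total[rows[i]][cols[j]] += x
            count[rows[i]][cols[j]] += 1
    grand = sum(map(sum, X)) / (len(X) * len(X[0]))
    # Empty blocks fall back to the grand mean (an assumption of this sketch).
    return [[total[p][q] / count[p][q] if count[p][q] else grand
             for q in range(Q)] for p in range(P)]


def loss(X, rows, cols, V):
    """Sum of squared deviations of each entry from its block mean."""
    return sum((x - V[rows[i]][cols[j]]) ** 2
               for i, xi in enumerate(X) for j, x in enumerate(xi))


def two_mode_kmeans(X, P, Q, rng, max_iter=100):
    """One run from a random start: alternate row and column reassignment."""
    n, m = len(X), len(X[0])
    rows = [rng.randrange(P) for _ in range(n)]
    cols = [rng.randrange(Q) for _ in range(m)]
    for _ in range(max_iter):
        V = block_means(X, rows, cols, P, Q)
        new_rows = [min(range(P), key=lambda p: sum(
            (X[i][j] - V[p][cols[j]]) ** 2 for j in range(m))) for i in range(n)]
        V = block_means(X, new_rows, cols, P, Q)
        new_cols = [min(range(Q), key=lambda q: sum(
            (X[i][j] - V[new_rows[i]][q]) ** 2 for i in range(n))) for j in range(m)]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    V = block_means(X, rows, cols, P, Q)
    return loss(X, rows, cols, V), rows, cols


def multistart(X, P, Q, starts=20, seed=0):
    """Repeat the local search from several random starts and keep the best run."""
    rng = random.Random(seed)
    return min((two_mode_kmeans(X, P, Q, rng) for _ in range(starts)),
               key=lambda run: run[0])


# Usage: a small matrix with a 2 x 2 block structure.
X = [[1, 1, 5, 5], [1, 1, 5, 5], [9, 9, 3, 3], [9, 9, 3, 3]]
best_loss, row_labels, col_labels = multistart(X, P=2, Q=2)
```

A single run can stall in a non-global optimum, for example when a poor start pushes all columns into one cluster, which is why the best of several starts is kept.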
Extended Fuzzy Clustering Algorithms
Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. It has been applied successfully in various fields including finance and marketing. Despite the successful applications, there are a number of issues that must be dealt with in practical applications of fuzzy clustering algorithms. This technical report proposes two extensions to the objective function based fuzzy clustering for dealing with these issues. First, the (point) prototypes are extended to hypervolumes whose size is determined automatically from the data being clustered. These prototypes are shown to be less sensitive to a bias in the distribution of the data. Second, cluster merging by assessing the similarity among the clusters during optimization is introduced. Starting with an over-estimated number of clusters in the data, similar clusters are merged during clustering in order to obtain a suitable partitioning of the data. An adaptive threshold for merging is introduced. The proposed extensions are applied to Gustafson-Kessel and fuzzy c-means algorithms, and the resulting extended algorithms are given. The properties of the new algorithms are illustrated in various examples.
Keywords: fuzzy clustering; cluster merging; similarity; volume prototypes
A similarity-based community detection method with multiple prototype representation
Communities are of great importance for understanding graph structures in
social networks. Some existing community detection algorithms use a single
prototype to represent each group. In real applications, this may not
adequately model the different types of communities and hence limits the
clustering performance on social networks. To address this problem, a
Similarity-based Multi-Prototype (SMP) community detection approach is proposed
in this paper. In SMP, vertices in each community carry various weights to
describe their degree of representativeness. This mechanism enables each
community to be represented by more than one node. The centrality of nodes is
used to calculate prototype weights, while similarity is utilized to guide us
to partitioning the graph. Experimental results on computer generated and
real-world networks clearly show that SMP performs well for detecting
communities. Moreover, the method could provide richer information for the
inner structure of the detected communities with the help of prototype weights
compared with the existing community detection models.
On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems
We present a new distributed fuzzy partitioning method to reduce the
complexity of multi-way fuzzy decision trees in Big Data classification
problems. The proposed algorithm builds a fixed number of fuzzy sets for all
variables and adjusts their shape and position to the real distribution of
training data. A two-step process is applied: 1) transformation of the
original distribution into a standard uniform distribution by means of the
probability integral transform. Since the original distribution is generally
unknown, the cumulative distribution function is approximated by computing the
q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy
partition in the transformed attribute space using a fixed number of equally
distributed triangular membership functions. Despite the aforementioned
transformation, the definition of every fuzzy set in the original space can be
recovered by applying the inverse cumulative distribution function (also known
as quantile function). The experimental results reveal that the proposed
methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT)
induction algorithm to maintain classification accuracy with up to 6 million
fewer leaves.
Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
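The two-step process described in this abstract can be illustrated on a single attribute. The sketch below is not distributed and is not the authors' code: the CDF is approximated by linear interpolation between q-quantile cut points, and the function names and the defaults `q=4` and `k=3` are assumptions chosen for illustration.

```python
import statistics
from bisect import bisect_right


def fit_cdf(sample, q=4):
    """Approximate the CDF by linear interpolation between q-quantile cut points."""
    cuts = ([min(sample)]
            + statistics.quantiles(sample, n=q, method="inclusive")
            + [max(sample)])

    def cdf(x):
        if x <= cuts[0]:
            return 0.0
        if x >= cuts[-1]:
            return 1.0
        i = bisect_right(cuts, x) - 1  # index of the quantile interval containing x
        return (i + (x - cuts[i]) / (cuts[i + 1] - cuts[i])) / q

    return cdf


def triangular_memberships(u, k=3):
    """Degrees of u in [0, 1] for k equally spaced triangular fuzzy sets.

    Neighbouring triangles cross at 0.5, so the degrees sum to 1
    (a Ruspini strong partition).
    """
    return [max(0.0, 1.0 - abs(u - j / (k - 1)) * (k - 1)) for j in range(k)]


# Usage: map a raw value to the uniform scale, then read off its memberships.
cdf = fit_cdf(list(range(101)), q=4)
degrees = triangular_memberships(cdf(10), k=3)
```

Because the fuzzy sets are defined on the transformed scale, their shape in the original attribute space follows the data distribution, and it can be recovered by inverting the same piecewise-linear map.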