7,498 research outputs found
A link density clustering algorithm based on automatically selecting density peaks for overlapping community detection
Peer reviewedPostprin
BPEC: Belief-Peaks Evidential Clustering
International audienceThis paper introduces a new evidential clustering method based on the notion of "belief peaks" in the framework of belief functions. The basic idea is that all data objects in the neighborhood of each sample provide pieces of evidence that induce belief on the possibility of such sample to become a cluster center. A sample having higher belief than its neighbors and located far away from other local maxima is then characterized as cluster center. Finally, a credal partition is created by minimizing an objective function with the fixed cluster centers. An adaptive distance metric is used to fit for unknown shapes of data structures. We show that the proposed evidential clustering procedure has very good performance with an ability to reveal the data structure in the form of a credal partition, from which hard, fuzzy, possibilistic and rough partitions can be derived. Simulations on synthetic and real-world datasets validate our conclusions
Adaptive scaling of cluster boundaries for large-scale social media data clustering
The large scale and complex nature of social media data raises the need to scale clustering techniques to big data and make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on the fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, i.e., the vigilance parameter to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the clustering mechanism of Fuzzy ART, and discover the vigilance region (VR) that essentially determines how a cluster in the Fuzzy ART system recognizes similar patterns in the feature space. The VR gives an intrinsic interpretation of the clustering mechanism and limitations of Fuzzy ART. Second, we introduce the idea of allowing different clusters in the Fuzzy ART system to have different vigilance levels in order to meet the diverse nature of the pattern distribution of social media data. To this end, we propose three vigilance adaptation methods, namely, the activation maximization (AM) rule, the confliction minimization (CM) rule, and the hybrid integration (HI) rule. With an initial vigilance value, the resulting clustering algorithms, namely, the AM-ART, CM-ART, and HI-ART, can automatically adapt the vigilance values of all clusters during the learning epochs in order to produce better cluster boundaries. Experiments on four social media data sets show that AM-ART, CM-ART, and HI-ART are more robust than Fuzzy ART to the initial vigilance value, and they usually achieve better or comparable performance and much faster speed than the state-of-the-art clustering algorithms that also do not require a predefined number of clusters
Data clustering using a model granular magnet
We present a new approach to clustering, based on the physical properties of
an inhomogeneous ferromagnet. No assumption is made regarding the underlying
distribution of the data. We assign a Potts spin to each data point and
introduce an interaction between neighboring points, whose strength is a
decreasing function of the distance between the neighbors. This magnetic system
exhibits three phases. At very low temperatures it is completely ordered; all
spins are aligned. At very high temperatures the system does not exhibit any
ordering and in an intermediate regime clusters of relatively strongly coupled
spins become ordered, whereas different clusters remain uncorrelated. This
intermediate phase is identified by a jump in the order parameters. The
spin-spin correlation function is used to partition the spins and the
corresponding data points into clusters. We demonstrate on three synthetic and
three real data sets how the method works. Detailed comparison to the
performance of other techniques clearly indicates the relative success of our
method.Comment: 46 pages, postscript, 15 ps figures include
- …