    Evidential relational clustering using medoids

    In real clustering applications, proximity data, in which only pairwise similarities or dissimilarities are known, is more general than object data, in which each pattern is described explicitly by a list of attributes. Medoid-based clustering algorithms, which assume that the class prototypes are objects, are of great value for partitioning relational data sets. In this paper, a new prototype-based clustering method named Evidential C-Medoids (ECMdd) is proposed; it extends Fuzzy C-Medoids (FCMdd) to the theoretical framework of belief functions. In ECMdd, medoids are used as the prototypes of the detected classes, which include both specific classes and imprecise classes. Specific classes hold objects that are distinctly far from the prototypes of the other classes, while imprecise classes accept objects that may be close to the prototypes of more than one class. This soft decision mechanism can make the clustering results more cautious and reduce misclassification rates. Experiments on synthetic and real data sets illustrate the performance of ECMdd. The results show that ECMdd captures the uncertainty in the internal data structure well and that it is more robust to initialization than FCMdd. Comment: In the 18th International Conference on Information Fusion, July 2015, Washington, DC, USA.
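To make the distinction between specific and imprecise classes concrete, here is a minimal Python sketch of an evidential (credal) assignment in the spirit of ECM, the object-data method that evidential medoid methods build on. It is not the ECMdd algorithm from the paper. The function names, the rule that the prototype of a focal set is the object minimizing its largest dissimilarity to the medoids of the member classes, the default exponents `alpha` and `beta`, and the omission of ECM's empty-set (outlier) class are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def focal_sets(c):
    """Non-empty subsets of {0, ..., c-1}: singletons are specific classes,
    larger subsets are imprecise classes."""
    return [frozenset(s) for r in range(1, c + 1)
            for s in combinations(range(c), r)]


def credal_masses(D, medoids, alpha=1.0, beta=2.0, eps=1e-12):
    """ECM-style masses for each object over all non-empty focal sets.

    D is an n x n dissimilarity matrix and `medoids` holds one object index
    per specific class. Assumption (illustrative only): the prototype of a
    focal set A is the object minimizing its largest dissimilarity to the
    medoids of the classes in A; for a singleton this is the class medoid.
    """
    sets = focal_sets(len(medoids))
    weights = np.empty((D.shape[0], len(sets)))
    for j, A in enumerate(sets):
        cols = [medoids[k] for k in A]
        proto = int(np.argmin(D[:, cols].max(axis=1)))
        d = np.maximum(D[:, proto], eps)  # avoid 0 ** negative power
        # Larger focal sets are penalized through |A| ** alpha, as in ECM.
        weights[:, j] = len(A) ** (-alpha / (beta - 1)) * d ** (-2.0 / (beta - 1))
    # Normalize so that each object's masses sum to 1.
    return sets, weights / weights.sum(axis=1, keepdims=True)


# Five points on a line; objects 0 and 4 are the medoids of two classes.
x = np.array([0.0, 1.0, 5.0, 9.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
sets, m = credal_masses(D, medoids=[0, 4])
for i in range(len(x)):
    best = sets[int(m[i].argmax())]
    print(x[i], sorted(best), m[i].round(3))
```

In this toy example the point at 5.0, equally far from both medoids, puts almost all of its mass on the imprecise class {0, 1}, while the points near either end are committed to a specific class. This is the cautious behavior the abstract describes.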

    The Advantage of Evidential Attributes in Social Networks

    Nowadays, there are many approaches designed for detecting communities in social networks. Some of them consider only the topological graph structure, while others make use of both the graph structure and the node attributes. Real-world networks contain many uncertain and noisy attributes. In this paper, we first present how we detect communities in graphs with uncertain attributes: numerical, probabilistic, and evidential attributes are generated according to the graph structure. In the second step, noise is added to the attributes. We perform experiments on graphs with different types of attributes and compare the detection results in terms of Normalized Mutual Information (NMI). The experimental results show that clustering with evidential attributes gives better results than clustering with probabilistic or numerical attributes, which illustrates the advantage of evidential attributes. Comment: 20th International Conference on Information Fusion, Jul 2017, Xi'an, China
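The comparison above is scored with NMI. As a reminder of what that score measures, here is a small pure-Python sketch that computes NMI between a ground-truth labelling and a detected community labelling. It uses the arithmetic-mean normalization $I(A;B) / \tfrac{1}{2}(H(A) + H(B))$; other normalizations (geometric mean, maximum) are also common, and the abstract does not say which one the paper used. The function names are our own.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labellings of the same nodes."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nab / n) * math.log(n * nab / (ca[x] * cb[y]))
               for (x, y), nab in cab.items())


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both labellings put every node in a single community
    return mutual_information(a, b) / ((ha + hb) / 2)


truth = [0, 0, 0, 1, 1, 1]
print(nmi(truth, [1, 1, 1, 0, 0, 0]))  # same partition, different label names
print(nmi(truth, [0, 0, 1, 1, 2, 2]))  # partial agreement
```

NMI is invariant to renaming the communities, which is why it suits comparing a detected partition against ground truth: the first call scores a perfect match even though every label is swapped.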

    Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

    Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular Partitioning Around Medoids (PAM) algorithm, also simply referred to as k-medoids. In Euclidean geometry the mean (as used in k-means) is a good estimator for the cluster center, but this does not hold for arbitrary dissimilarities. PAM instead uses the medoid, the object with the smallest total dissimilarity to all other objects in the cluster. This notion of centrality can be used with any (dis-)similarity and is therefore highly relevant to many domains, such as biology, that require Jaccard, Gower, or more complex distances. A key issue with PAM is its high run-time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in its second phase, SWAP, while still finding the same results as the original PAM algorithm. If we slightly relax the choice of swaps performed (at comparable quality), we can accelerate the algorithm further by performing up to k swaps in each iteration. With the substantially faster SWAP, we can now also explore alternative strategies for choosing the initial medoids. We also show how the CLARA and CLARANS algorithms benefit from these modifications. The modifications can easily be combined with earlier approaches for using PAM and CLARA on big data (some of which use PAM as a subroutine and hence benefit immediately from these improvements), where performance at high k becomes increasingly important. In experiments on real data with k=100, we observed a 200-fold speedup compared to the original PAM SWAP algorithm. This makes PAM applicable to larger data sets, as long as we can afford to compute a distance matrix, and in particular to higher k (at k=2, the new SWAP was only 1.5 times faster, as the speedup is expected to increase with k).
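For reference, the following deliberately naive Python sketch shows the medoid definition and the two PAM phases, BUILD (greedy initialization) and SWAP (repeatedly apply the single best medoid/non-medoid exchange). It recomputes the total deviation from scratch for every candidate swap, so it is the slow baseline that the paper accelerates, not the proposed faster SWAP. The function names `medoid`, `total_deviation`, and `pam` are hypothetical and do not refer to any particular library.

```python
import numpy as np


def total_deviation(D, medoids):
    """Sum over all objects of the dissimilarity to their nearest medoid."""
    return D[:, medoids].min(axis=1).sum()


def medoid(D, members):
    """Object in `members` with the smallest total dissimilarity to the others."""
    sub = D[np.ix_(members, members)]
    return members[int(sub.sum(axis=1).argmin())]


def pam(D, k, max_iter=100):
    """Naive PAM on a precomputed n x n dissimilarity matrix D."""
    n = D.shape[0]
    # BUILD: start from the most central object, then greedily add the
    # object that most reduces the total deviation.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        candidates = [j for j in range(n) if j not in medoids]
        medoids.append(min(candidates,
                           key=lambda j: total_deviation(D, medoids + [j])))
    # SWAP: find the best (medoid, non-medoid) exchange; stop when none helps.
    best = total_deviation(D, medoids)
    for _ in range(max_iter):
        best_swap = None
        for i in range(k):
            for j in range(n):
                if j in medoids:
                    continue
                trial = medoids[:i] + [j] + medoids[i + 1:]
                td = total_deviation(D, trial)
                if td < best:
                    best, best_swap = td, trial
        if best_swap is None:
            break
        medoids = best_swap
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels, best


x = np.array([0.0, 0.5, 1.0, 10.0, 10.5, 11.0])
D = np.abs(x[:, None] - x[None, :])  # any dissimilarity matrix works here
meds, labels, td = pam(D, k=2)
print(meds, labels, td, medoid(D, [0, 1, 2]))
```

Because only the matrix `D` is used, the same code works unchanged with Jaccard, Gower, or any other dissimilarity. Each SWAP iteration here evaluates k(n-k) candidate swaps at O(nk) cost each, which is exactly the expense the paper's modifications target.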

    Assessing luminosity correlations via cluster analysis: Evidence for dual tracks in the radio/X-ray domain of black hole X-ray binaries

    [abridged] The radio:X-ray correlation for hard and quiescent state black hole X-ray binaries is critically investigated in this paper. New observations of known sources, along with newly discovered ones, have resulted in an increasingly large number of outliers lying well outside the scatter about the quoted best-fit relation. Here, we employ and compare state-of-the-art data clustering techniques to identify and characterize different data groupings within the radio:X-ray luminosity plane for 18 hard and quiescent state black hole X-ray binaries with nearly simultaneous multi-wavelength coverage. Linear regression is then carried out on the clustered data to infer, through a Bayesian approach, the parameters of a relationship of the form $\ell_r = \alpha + \beta\,\ell_x$, where $\ell$ denotes the logarithm of the luminosity. We conclude that the two-cluster model, with independent linear fits, is a significant improvement over fitting all points as a single cluster. While the upper-track slope ($0.63 \pm 0.03$) is consistent, within the errors, with the fitted slope of the 2003 relation ($0.7 \pm 0.1$), the lower-track slope ($0.98 \pm 0.08$) is consistent neither with the upper track nor with the widely adopted value of $\sim 1.4$ for neutron stars. The two luminosity tracks do not reflect systematic differences in black hole spins as estimated from either the reflection or the continuum-fitting method. These results are insensitive to the selection of sub-samples, the accuracy of the distances, and the treatment of upper limits. Besides introducing a further level of complexity in understanding the interplay between synchrotron and Comptonised emission from black hole X-ray binaries, the existence of two tracks in the radio:X-ray domain underscores that great caution must be exercised when employing black hole luminosity relations to estimate a third parameter, such as distance or mass. Comment: MNRAS, in press (10 pages, 7 figures)
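The "independent linear fits per cluster" step can be sketched as follows. Note the substitution: the paper infers $\alpha$ and $\beta$ with a Bayesian approach, whereas this Python sketch uses ordinary least squares (`numpy.polyfit`) purely to show the mechanics of fitting $\ell_r = \alpha + \beta\,\ell_x$ separately within each cluster. The function names are hypothetical, and the data are synthetic, noise-free lines with made-up parameters, not the paper's measurements.

```python
import numpy as np


def fit_tracks(log_lx, log_lr, labels):
    """Least-squares fit of log_lr = alpha + beta * log_lx within each cluster."""
    fits = {}
    for c in np.unique(labels):
        mask = labels == c
        beta, alpha = np.polyfit(log_lx[mask], log_lr[mask], deg=1)
        fits[int(c)] = (float(alpha), float(beta))
    return fits


def residual_ss(log_lx, log_lr, labels):
    """Residual sum of squares of the per-cluster linear fits."""
    total = 0.0
    for c, (alpha, beta) in fit_tracks(log_lx, log_lr, labels).items():
        mask = labels == c
        total += float(np.sum((log_lr[mask] - (alpha + beta * log_lx[mask])) ** 2))
    return total


# Synthetic, noise-free tracks (illustrative values, not the paper's data).
lx = np.concatenate([np.linspace(30.0, 34.0, 8), np.linspace(31.0, 36.0, 8)])
lr = np.concatenate([10.0 + 0.6 * lx[:8], -6.0 + 1.0 * lx[8:]])
two = np.repeat([0, 1], 8)   # two-cluster labelling
one = np.zeros_like(two)     # all points in a single cluster

print(fit_tracks(lx, lr, two))
print(residual_ss(lx, lr, two), residual_ss(lx, lr, one))
```

A lower residual sum of squares for the two-track model does not by itself justify its extra parameters; that is why a principled model comparison, such as the Bayesian one used in the paper, is needed before concluding that two clusters are a significant improvement.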

    A similarity-based community detection method with multiple prototype representation

    Communities are of great importance for understanding graph structures in social networks. Some existing community detection algorithms use a single prototype to represent each group. In real applications, this may not adequately model the different types of communities and hence limits clustering performance on social networks. To address this problem, this paper proposes a Similarity-based Multi-Prototype (SMP) community detection approach. In SMP, the vertices in each community carry weights that describe their degree of representativeness, which allows each community to be represented by more than one node. Node centrality is used to calculate the prototype weights, while similarity guides the partitioning of the graph. Experimental results on computer-generated and real-world networks show that SMP performs well at detecting communities. Moreover, compared with existing community detection models, the prototype weights allow the method to provide richer information about the internal structure of the detected communities.
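The abstract does not specify SMP's centrality measure, similarity function, or update rule, so the following Python sketch only illustrates the general idea of a multi-prototype representation, using choices of our own. Every name in it is hypothetical. Prototype weights are each member's degree inside its community, normalized to sum to 1. Similarity is the Jaccard index of closed neighborhoods, and a node's similarity to a community is the prototype-weighted sum of its similarities to the members. Only a single assignment pass is shown; no iteration or convergence criterion is modelled.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the closed neighborhoods N[u] and N[v]."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def prototype_weights(adj, community):
    """Weight each member by its degree inside the community (normalized)."""
    deg = {v: len(adj[v] & community) for v in community}
    total = sum(deg.values()) or 1
    return {v: d / total for v, d in deg.items()}


def community_similarity(adj, node, community):
    """Prototype-weighted similarity of `node` to all members of `community`."""
    w = prototype_weights(adj, community)
    return sum(wv * jaccard(adj, node, v) for v, wv in w.items())


def assign(adj, communities):
    """Assign every node to the community it is most similar to."""
    return {node: max(range(len(communities)),
                      key=lambda c: community_similarity(adj, node, communities[c]))
            for node in adj}


# Two triangles joined by the edge 2-3, plus a pendant node 6 attached to 0.
adj = {0: {1, 2, 6}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5},
       4: {3, 5}, 5: {3, 4}, 6: {0}}
communities = [{0, 1, 2, 6}, {3, 4, 5}]
print(prototype_weights(adj, communities[0]))
print(assign(adj, communities))
```

In the first community, node 0 receives the largest weight (3/8) because it is the best-connected member, while the pendant node 6 receives the smallest. This kind of per-node representativeness is what a single-prototype model cannot express.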