815 research outputs found

Evidential relational clustering using medoids

Author: Liu Zhun-Ga
Martin Arnaud
Pan Quan
Zhou Kuang
Publication date: 06/07/2015

In real clustering applications, proximity data, in which only pairwise similarities or dissimilarities are known, are more general than object data, in which each pattern is described explicitly by a list of attributes. Medoid-based clustering algorithms, which assume that the prototypes of classes are objects, are of great value for partitioning relational data sets. This paper proposes a new prototype-based clustering method, Evidential C-Medoids (ECMdd), which extends Fuzzy C-Medoids (FCMdd) within the theoretical framework of belief functions. In ECMdd, medoids are used as prototypes to represent the detected classes, including specific classes and imprecise classes. Specific classes are for data that are distinctly far from the prototypes of other classes, while imprecise classes accept objects that may be close to the prototypes of more than one class. This soft decision mechanism can make the clustering results more cautious and reduce the misclassification rate. Experiments on synthetic and real data sets illustrate the performance of ECMdd. The results show that ECMdd captures the uncertainty in the internal data structure well. Moreover, it is more robust to initialization than FCMdd. Comment: The 18th International Conference on Information Fusion, July 2015, Washington, DC, USA

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

The Advantage of Evidential Attributes in Social Networks

Author: adar
khan
leskovec
newman
scott
shafer
Publication date: 10/07/2017

Many approaches have been designed for detecting communities in social networks. Some of them consider only the topological graph structure, while others make use of both the graph structure and the node attributes. Real-world networks contain many uncertain and noisy attributes. In this paper, we present how we detect communities in graphs with uncertain attributes. In the first step, numerical, probabilistic, and evidential attributes are generated according to the graph structure. In the second step, noise is added to the attributes. We perform experiments on graphs with the different types of attributes and compare the detection results in terms of Normalized Mutual Information (NMI). The experimental results show that clustering with evidential attributes gives better results than clustering with probabilistic or numerical attributes, which illustrates the advantage of evidential attributes. Comment: 20th International Conference on Information Fusion, Jul 2017, Xi'an, China
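The abstract above compares detection results by NMI. As a rough illustration, NMI between two labelings can be computed as below. This is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which NMI variant was used.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same objects")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        # Both labelings put everything in one group, so they agree trivially.
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Toy labelings, purely illustrative.
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed labels
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])        # statistically independent
```

NMI depends only on how objects are grouped, not on the label names, which is why a detected partition can be compared directly against ground-truth communities.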

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

Author: AP Reynolds
C Lucasius
E Schubert
H Bock
H Kriegel
H Park
Leonard Kaufman
ML Overton
RT Ng
V Estivill-Castro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/10/2019

Clustering non-Euclidean data is difficult, and besides hierarchical clustering, one of the most widely used algorithms is Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the mean (as used in k-means) is a good estimator for the cluster center, but this does not hold for arbitrary dissimilarities. PAM uses the medoid instead: the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and is thus highly relevant to many domains, such as biology, that require the use of Jaccard, Gower, or more complex distances. A key issue with PAM is its high run-time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second (SWAP) phase of the algorithm while still finding the same results as the original PAM algorithm. If we slightly relax the choice of swaps performed (at comparable quality), we can further accelerate the algorithm by performing up to k swaps in each iteration. With the substantially faster SWAP, we can now also explore alternative strategies for choosing the initial medoids. We also show how the CLARA and CLARANS algorithms benefit from these modifications. The modifications can easily be combined with earlier approaches to using PAM and CLARA on big data (some of which use PAM as a subroutine and hence benefit immediately from these improvements), where performance at high k becomes increasingly important. In experiments on real data with k=100, we observed a 200-fold speedup compared to the original PAM SWAP algorithm, making PAM applicable to larger data sets as long as we can afford to compute a distance matrix, and in particular to higher k (at k=2, the new SWAP was only 1.5 times faster, as the speedup is expected to increase with k).
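To make the medoid definition and the SWAP phase concrete, here is a minimal sketch over a precomputed dissimilarity matrix. It is deliberately naive: every candidate swap recomputes the objective from scratch, so it is not the accelerated variant the abstract proposes. The function names and the toy data are hypothetical.

```python
from itertools import product


def medoid_of(dissim, members):
    """Member with the smallest total dissimilarity to all members of the cluster."""
    return min(members, key=lambda i: sum(dissim[i][j] for j in members))


def total_deviation(dissim, medoids):
    """Objective: sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dissim)


def naive_swap(dissim, medoids):
    """Repeat the best (medoid, non-medoid) swap until no swap lowers the objective.

    Naive illustration only: each candidate costs a full pass over the matrix.
    """
    medoids = list(medoids)
    best_cost = total_deviation(dissim, medoids)
    while True:
        best_candidate = None
        for pos, x in product(range(len(medoids)), range(len(dissim))):
            if x in medoids:
                continue
            candidate = medoids[:pos] + [x] + medoids[pos + 1:]
            cost = total_deviation(dissim, candidate)
            if cost < best_cost:  # strict improvement, so the loop terminates
                best_cost, best_candidate = cost, candidate
        if best_candidate is None:
            return sorted(medoids), best_cost
        medoids = best_candidate


# Toy data (assumption): six points on a line, dissimilarity = absolute difference.
points = [0, 1, 2, 10, 11, 12]
dissim = [[abs(p - q) for q in points] for p in points]
final_medoids, final_cost = naive_swap(dissim, [0, 1])
```

Because only the dissimilarity matrix is consulted, the same code works for any (dis-)similarity for which such a matrix can be computed, which is the property the abstract highlights.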

arXiv.org e-Print Archive

Crossref

Assessing luminosity correlations via cluster analysis: Evidence for dual tracks in the radio/X-ray domain of black hole X-ray binaries

Author: Blandford
Bodenhofer
Brendan P. Miller
Brocksopp
Cabanac
Cadolle
Calvelo
Casella
Chatterjee
Chipman
Corbel
Coriat
Dunn
Elena Gallo
Falcke
Fender
Fraley
Frey
Fuentes
Gallo
Gou
Greene
Gültekin
Hannikainen
Heinz
Homan
Jolliffe
Jonker
Kalemci
Kaufman
Kelly
King
Körding
Li
Maccarone
Maitra
Malzac
Markoff
McClintock
Merloni
Migliari
Miller
Miller-Jones
Narayan
Orosz
Paizis
Pe’er
Plotkin
Reines
Rob Fender
Rodriguez
Rushton
Russell
Shafee
Soleri
Tomsick
Xue
Yuan
Publication venue: 'Wiley'
Publication date: 19/03/2012

[abridged] The radio:X-ray correlation for hard and quiescent state black hole X-ray binaries is critically investigated in this paper. New observations of known sources, along with newly discovered ones, have resulted in an increasingly large number of outliers lying well outside the scatter about the quoted best-fit relation. Here, we employ and compare state-of-the-art data clustering techniques in order to identify and characterize different data groupings within the radio:X-ray luminosity plane for 18 hard and quiescent state black hole X-ray binaries with nearly simultaneous multi-wavelength coverage. Linear regression is then carried out on the clustered data to infer the parameters of a relationship of the form $\ell_r = \alpha + \beta \ell_x$ through a Bayesian approach (where $\ell$ denotes log luminosity). We conclude that the two-cluster model, with independent linear fits, is a significant improvement over fitting all points as a single cluster. While the upper track slope ($0.63 \pm 0.03$) is consistent, within the errors, with the fitted slope for the 2003 relation ($0.7 \pm 0.1$), the lower track slope ($0.98 \pm 0.08$) is not consistent with the upper track, nor is it consistent with the widely adopted value of ~1.4 for the neutron stars. The two luminosity tracks do not reflect systematic differences in black hole spins as estimated from either the reflection or the continuum-fitting method. These results are insensitive to the selection of sub-samples, the accuracy of the distances, and the treatment of upper limits. Besides introducing a further level of complexity in understanding the interplay between synchrotron and Comptonised emission from black hole X-ray binaries, the existence of two tracks in the radio:X-ray domain underscores that a high level of caution must be exercised when employing black hole luminosity relations for the purpose of estimating a third parameter, such as distance or mass. Comment: MNRAS, in press (10 pages, 7 figures)
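The idea of fitting an independent line to each cluster can be sketched as follows. This is a sketch under stated assumptions: it uses ordinary least squares as a stand-in for the Bayesian regression described in the abstract, it ignores measurement errors and upper limits, and the function names and data points are synthetic placeholders, not values from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def fit_per_cluster(log_lx, log_lr, labels):
    """Fit one independent line per cluster label; returns {label: (alpha, beta)}."""
    fits = {}
    for label in sorted(set(labels)):
        xs = [x for x, l in zip(log_lx, labels) if l == label]
        ys = [y for y, l in zip(log_lr, labels) if l == label]
        fits[label] = fit_line(xs, ys)
    return fits


# Synthetic, exactly linear toy data (assumption): two tracks with different slopes.
log_lx = [0.0, 2.0, 4.0, 1.0, 3.0, 5.0]
log_lr = [1.0, 2.0, 3.0, -1.0, 1.0, 3.0]
labels = [0, 0, 0, 1, 1, 1]
fits = fit_per_cluster(log_lx, log_lr, labels)
```

Comparing the per-cluster slopes with a single fit over all points is the kind of contrast the abstract draws between the two-track model and the single-cluster model.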

arXiv.org e-Print Archive

Crossref

Deep Blue Documents at the University of Michigan

A similarity-based community detection method with multiple prototype representation

Author: Martin Arnaud
Pan Quan
Zhou Kuang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Communities are of great importance for understanding graph structures in social networks. Some existing community detection algorithms use a single prototype to represent each group. In real applications, this may not adequately model the different types of communities and hence limits the clustering performance on social networks. To address this problem, this paper proposes a Similarity-based Multi-Prototype (SMP) community detection approach. In SMP, the vertices in each community carry weights that describe their degree of representativeness. This mechanism enables each community to be represented by more than one node. The centrality of nodes is used to calculate prototype weights, while similarity is used to guide the partitioning of the graph. Experimental results on computer-generated and real-world networks clearly show that SMP performs well at detecting communities. Moreover, with the help of prototype weights, the method provides richer information about the inner structure of the detected communities than existing community detection models.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1