815 research outputs found
Evidential relational clustering using medoids
In real clustering applications, proximity data, in which only pairwise
similarities or dissimilarities are known, is more general than object data, in
which each pattern is described explicitly by a list of attributes.
Medoid-based clustering algorithms, which assume the prototypes of classes are
objects, are of great value for partitioning relational data sets. In this
paper, a new prototype-based clustering method named Evidential C-Medoids
(ECMdd), an extension of Fuzzy C-Medoids (FCMdd) within the theoretical
framework of belief functions, is proposed. In ECMdd, medoids are utilized as
the prototypes to represent the detected classes, including specific classes
and imprecise classes. Specific classes are for the data which are distinctly
far from the prototypes of other classes, while imprecise classes accept the
objects that may be close to the prototypes of more than one class. This soft
decision mechanism could make the clustering results more cautious and reduce
the misclassification rates. Experiments on synthetic and real data sets are
used to illustrate the performance of ECMdd. The results show that ECMdd could
capture well the uncertainty in the internal data structure. Moreover, it is
more robust to initialization than FCMdd.
Comment: in The 18th International Conference on Information Fusion, July
2015, Washington, DC, USA
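As a point of reference for the abstract above, the following is a minimal sketch of the Fuzzy C-Medoids baseline that ECMdd extends, not of ECMdd itself (the belief-function part, with its imprecise classes, is not reproduced here). It assumes a precomputed dissimilarity matrix `D` given as a list of lists and a fuzzifier `m > 1`; the function names are hypothetical and chosen only for illustration.

```python
def fuzzy_memberships(D, medoids, m=2.0):
    """Return U[i][j], the membership of object i in the cluster of medoids[j]."""
    U = []
    for i in range(len(D)):
        dists = [D[i][v] for v in medoids]
        if 0.0 in dists:
            # The object coincides with at least one medoid:
            # share full membership among those medoids.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            U.append([h / total for h in hits])
            continue
        # Standard fuzzy update: inverse dissimilarity raised to 1 / (m - 1).
        inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        U.append([w / total for w in inv])
    return U


def update_medoids(D, U, m=2.0):
    """For each cluster, pick the object minimizing the weighted dissimilarity."""
    n, c = len(D), len(U[0])
    return [
        min(range(n), key=lambda q: sum(U[i][j] ** m * D[i][q] for i in range(n)))
        for j in range(c)
    ]


def fcmdd(D, initial_medoids, m=2.0, max_iter=50):
    """Alternate membership and medoid updates until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        U = fuzzy_memberships(D, medoids, m)
        new_medoids = update_medoids(D, U, m)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, fuzzy_memberships(D, medoids, m)
```

Only pairwise dissimilarities are used, which is what makes the approach applicable to relational data; an object that lies between two medoids simply receives split memberships rather than being assigned to an imprecise class.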
The Advantage of Evidential Attributes in Social Networks
Nowadays, there are many approaches designed for the task of detecting
communities in social networks. Among them, some methods only consider the
topological graph structure, while others make use of both the graph structure
and the node attributes. In real-world networks, there are many uncertain and
noisy attributes in the graph. In this paper, we present how we detect
communities in graphs with uncertain attributes. In the first step, numerical,
probabilistic, and evidential attributes are generated according to the graph
structure. In the second step, some noise is added to the attributes. We
perform experiments on graphs with different types of
attributes and compare the detection results in terms of the Normalized Mutual
Information (NMI) values. The experimental results show that the clustering
with evidential attributes gives better results than clustering with
probabilistic or numerical attributes. This illustrates the advantages of
evidential attributes.
Comment: 20th International Conference on Information Fusion, Jul 2017, Xi'an,
China
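The abstract above compares detection results by Normalized Mutual Information. A minimal sketch of one common NMI variant is given below, assuming the normalization 2·I(A;B) / (H(A) + H(B)); the abstract does not say which normalization the authors used, and the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mutual = sum(
        (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if entropy_a + entropy_b == 0.0:
        # Both labelings put every node in a single community.
        return 1.0
    return 2.0 * mutual / (entropy_a + entropy_b)
```

The score is invariant to renaming the communities, so a detected partition can be compared with the ground truth without first matching label names.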
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
Clustering non-Euclidean data is difficult, and one of the most used
algorithms besides hierarchical clustering is the popular algorithm
Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In
Euclidean geometry the mean (as used in k-means) is a good estimator for the
cluster center, but this does not hold for arbitrary dissimilarities. PAM uses
the medoid instead, the object with the smallest dissimilarity to all others in
the cluster. This notion of centrality can be used with any (dis-)similarity,
and thus is of high relevance to many domains such as biology that require the
use of Jaccard, Gower, or more complex distances.
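To make the notion of a medoid concrete, here is a minimal sketch that picks the medoid of a small collection of sets under the Jaccard distance. The helper names and the toy data are assumptions made for illustration; any other dissimilarity function could be passed in instead.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def medoid(items, dist):
    """Index of the item with the smallest total dissimilarity to all others."""
    return min(
        range(len(items)),
        key=lambda i: sum(dist(items[i], other) for other in items),
    )


# Toy data: three overlapping sets and one unrelated set.
items = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"x"}]
print(medoid(items, jaccard_distance))
```

Unlike a mean, the result is always one of the input objects, so no averaging of sets (or of any other non-numeric objects) is ever required.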
A key issue with PAM is its high run time cost. We propose modifications to
the PAM algorithm that achieve an O(k)-fold speedup in the second (SWAP) phase
of the algorithm while still finding the same results as the original PAM
algorithm. If we slightly relax the choice of swaps performed (at comparable
quality), we can further accelerate the algorithm by performing up to k swaps
in each iteration. With the substantially faster SWAP, we can now also explore
alternative strategies for choosing the initial medoids. We also show how the
CLARA and CLARANS algorithms benefit from these modifications. They can easily be
combined with earlier approaches to use PAM and CLARA on big data (some of
which use PAM as a subroutine, hence can immediately benefit from these
improvements), where the performance with high k becomes increasingly
important.
In experiments on real data with k=100, we observed a 200-fold speedup
compared to the original PAM SWAP algorithm, making PAM applicable to larger
data sets as long as we can afford to compute a distance matrix, and in
particular to higher k (at k=2, the new SWAP was only 1.5 times faster, as the
speedup is expected to increase with k).
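For orientation, the following is a deliberately naive sketch of a PAM-style SWAP phase on a precomputed distance matrix. It recomputes the full cost for every candidate swap and is meant only to show what the SWAP phase searches over, not the modifications the abstract proposes. The function names are hypothetical, and the initial medoids are assumed to be supplied by the caller rather than by a BUILD phase.

```python
def total_deviation(D, medoids):
    """Sum over all objects of the distance to the nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam_swap(D, initial_medoids, max_iter=100):
    """Repeatedly apply the single best (medoid, non-medoid) swap."""
    medoids = list(initial_medoids)
    best_cost = total_deviation(D, medoids)
    for _ in range(max_iter):
        best_trial = None
        for pos in range(len(medoids)):
            for candidate in range(len(D)):
                if candidate in medoids:
                    continue
                # Replace the medoid at position pos with the candidate.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_deviation(D, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is None:
            break  # no swap improves the cost: local optimum reached
        medoids = best_trial
    return medoids, best_cost
```

Each iteration here evaluates every (medoid, non-medoid) pair from scratch, which is why the cost of this phase grows quickly with both the number of objects and k.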
Assessing luminosity correlations via cluster analysis: Evidence for dual tracks in the radio/X-ray domain of black hole X-ray binaries
[abridged] The radio:X-ray correlation for hard and quiescent state black
hole X-ray binaries is critically investigated in this paper. New observations
of known sources, along with newly discovered ones, have resulted in an
increasingly large number of outliers lying well outside the scatter about the
quoted best-fit relation. Here, we employ and compare state-of-the-art data
clustering techniques in order to identify and characterize different data
groupings within the radio:X-ray luminosity plane for 18 hard and quiescent
state black hole X-ray binaries with nearly simultaneous multi-wavelength
coverage. Linear regression is then carried out on the clustered data to infer
the parameters of a relationship of the form $\ell_r = \alpha + \beta \ell_x$
through a Bayesian approach (where $\ell$ denotes log luminosity). We conclude
that the two-cluster model, with independent linear fits, is a significant
improvement over fitting all points as a single cluster. While the upper track
slope ($0.63 \pm 0.03$) is consistent, within the errors, with the fitted slope
for the 2003 relation ($0.7 \pm 0.1$), the lower track slope ($0.98 \pm 0.08$)
is not consistent with the upper track, nor is it with the widely adopted value
of ~1.4 for the neutron stars. The two luminosity tracks do not reflect
systematic differences in black hole spins as estimated from either the
reflection or the continuum fitting method. These results are insensitive to
the selection of
sub-samples, accuracy in the distances, and to the treatment of upper limits.
Besides introducing a further level of complexity in understanding the
interplay between synchrotron and Comptonised emission from black hole X-ray
binaries, the existence of two tracks in the radio:X-ray domain underscores
that a high level of caution must be exercised when employing black hole
luminosity relations for the purpose of estimating a third parameter, such as
distance or mass.
Comment: MNRAS, in press (10 pages, 7 figures)
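The idea of fitting an independent linear relation to each cluster can be sketched as follows. This is an illustration under stated assumptions only: it uses ordinary least squares in place of the Bayesian approach mentioned in the abstract, it takes the cluster labels as given, and the function names are hypothetical. No data from the paper is reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def fit_per_cluster(log_lx, log_lr, labels):
    """Fit one line per cluster label; returns {label: (alpha, beta)}."""
    fits = {}
    for label in sorted(set(labels)):
        xs = [x for x, lab in zip(log_lx, labels) if lab == label]
        ys = [y for y, lab in zip(log_lr, labels) if lab == label]
        fits[label] = fit_line(xs, ys)
    return fits
```

Comparing the per-cluster slopes with a single fit over all points is the kind of check the abstract describes, although a faithful reproduction would also need measurement errors, upper limits, and posterior uncertainties, none of which this sketch handles.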
A similarity-based community detection method with multiple prototype representation
Communities are of great importance for understanding graph structures in
social networks. Some existing community detection algorithms use a single
prototype to represent each group. In real applications, this may not
adequately model the different types of communities and hence limits the
clustering performance on social networks. To address this problem, a
Similarity-based Multi-Prototype (SMP) community detection approach is proposed
in this paper. In SMP, vertices in each community carry various weights to
describe their degree of representativeness. This mechanism enables each
community to be represented by more than one node. The centrality of nodes is
used to calculate prototype weights, while similarity is used to guide the
partitioning of the graph. Experimental results on computer-generated and
real-world networks clearly show that SMP performs well for detecting
communities. Moreover, compared with existing community detection models, the
method could provide richer information on the inner structure of the detected
communities with the help of prototype weights.
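The abstract does not state which centrality measure SMP uses or how the weights are normalized. Purely as an illustration of the idea that every vertex in a community carries a prototype weight derived from its centrality, the sketch below assumes degree centrality restricted to each community, normalized so that the weights within a community sum to one. The function name and the graph representation (a dict mapping each vertex to a set of neighbours) are hypothetical.

```python
def prototype_weights(adjacency, communities):
    """Assign each vertex a weight proportional to its within-community degree."""
    weights = {}
    for members in communities:
        member_set = set(members)
        # Count only edges that stay inside the community.
        internal_degree = {
            v: sum(1 for u in adjacency[v] if u in member_set) for v in members
        }
        total = sum(internal_degree.values())
        for v in members:
            if total:
                weights[v] = internal_degree[v] / total
            else:
                # No internal edges: fall back to uniform weights.
                weights[v] = 1.0 / len(members)
    return weights
```

Under this assumption a hub vertex receives the largest weight, but peripheral vertices keep nonzero weights, so the community is represented by more than one node.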