324 research outputs found
Defining and Evaluating Network Communities based on Ground-truth
Nodes in real-world networks organize into densely linked communities where
edges appear with high concentration among the members of the community.
Identifying such communities of nodes has proven to be a challenging task
mainly due to a plethora of definitions of a community, intractability of
algorithms, issues with evaluation and the lack of a reliable gold-standard
ground-truth.
In this paper we study a set of 230 large real-world social, collaboration
and information networks where nodes explicitly state their group memberships.
For example, in social networks nodes explicitly join various interest-based
social groups. We use such groups to define a reliable and robust notion of
ground-truth communities. We then propose a methodology which allows us to
compare and quantitatively evaluate how different structural definitions of
network communities correspond to ground-truth communities. We choose 13
commonly used structural definitions of network communities and examine their
sensitivity, robustness and performance in identifying the ground-truth. We
show that the 13 structural definitions are heavily correlated and naturally
group into four classes. We find that two of these definitions, Conductance and
Triad-participation-ratio, consistently give the best performance in
identifying ground-truth communities. We also investigate a task of detecting
communities given a single seed node. We extend the local spectral clustering
algorithm into a heuristic parameter-free community detection method that
easily scales to networks with more than a hundred million nodes. The proposed
method achieves 30% relative improvement over current local clustering methods.
Comment: Proceedings of 2012 IEEE International Conference on Data Mining
(ICDM), 201
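The two best-performing definitions named in the abstract above, Conductance and Triad-participation-ratio, can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the common forms: conductance as boundary edges divided by the total degree of the community, c_S / (2 m_S + c_S), and triad participation ratio as the fraction of community members that sit in at least one triangle lying entirely inside the community. The function names and the toy graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, community):
    """Assumed form: c_S / (2*m_S + c_S); lower values mean a better-separated set."""
    members = set(community)
    internal_endpoints = 0  # each internal edge is counted twice, giving 2*m_S
    boundary_edges = 0      # edges with exactly one endpoint in the community, c_S
    for u in members:
        for v in adj.get(u, ()):
            if v in members:
                internal_endpoints += 1
            else:
                boundary_edges += 1
    volume = internal_endpoints + boundary_edges
    return boundary_edges / volume if volume else 0.0


def triad_participation_ratio(adj, community):
    """Fraction of members that belong to a triangle fully inside the community."""
    members = set(community)
    if not members:
        return 0.0
    in_triad = 0
    for u in members:
        inside_neighbours = adj.get(u, set()) & members
        # u closes a triangle if any two of its inside neighbours are adjacent.
        if any(w in adj[v] for v, w in combinations(inside_neighbours, 2)):
            in_triad += 1
    return in_triad / len(members)


# Toy graph (illustrative only): a triangle a-b-c with a tail c-d-e.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
print(conductance(toy_adj, {"a", "b", "c"}))
print(triad_participation_ratio(toy_adj, {"a", "b", "c"}))
```

On the toy graph, the triangle {a, b, c} has one boundary edge against three internal edges, so it scores a low conductance and a triad participation ratio of 1, while the tail {c, d, e} contains no triangle and scores 0 on the latter.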
A new hierarchical clustering algorithm to identify non-overlapping like-minded communities
A network has a non-overlapping community structure if the nodes of the
network can be partitioned into disjoint sets such that each node in a set is
densely connected to other nodes inside the set and sparsely connected to the
nodes outside it. There are many metrics to validate the efficacy of such a
structure, such as clustering coefficient, betweenness, centrality, modularity
and like-mindedness. Many methods have been proposed to optimize some of these
metrics, but none of these works well on the recently introduced metric
like-mindedness. To solve this problem, we propose a behavioral-property-based
algorithm to identify communities that optimize the like-mindedness
metric and compare its performance on this metric with other behavioral data
based methodologies as well as community detection methods that rely only on
structural data. We execute these algorithms on real-life datasets of
Filmtipset and Twitter and show that our algorithm performs better than the
existing algorithms with respect to the like-mindedness metric.
On methods to assess the significance of community structure in networks of financial time series
We consider the problem of determining whether the community
structure found by a clustering algorithm applied to financial
time series is statistically significant, or is due to pure chance, when
no other information than the observed values and a similarity measure
among time series are available. As a subsidiary problem we also analyse
the influence of the choice of similarity measure on the accuracy of the
clustering method.
We propose two raw-data based methods for assessing robustness of clustering
algorithms on time-dependent data linked by a relation of similarity:
One based on community scoring functions that quantify some topological
property that characterises ground-truth communities, and another
based on random perturbations and quantification of the variation
in the community structure. These methodologies are well-established in
the realm of unweighted networks; our contributions are versions of these
methodologies properly adapted to complete weighted networks.
Peer Reviewed
Postprint (published version)