32,157 research outputs found
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the realtionship between different cluster quality metrics is unknown. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters
Defining and Evaluating Network Communities based on Ground-truth
Nodes in real-world networks organize into densely linked communities where
edges appear with high concentration among the members of the community.
Identifying such communities of nodes has proven to be a challenging task
mainly due to a plethora of definitions of a community, intractability of
algorithms, issues with evaluation and the lack of a reliable gold-standard
ground-truth.
In this paper we study a set of 230 large real-world social, collaboration
and information networks where nodes explicitly state their group memberships.
For example, in social networks nodes explicitly join various interest based
social groups. We use such groups to define a reliable and robust notion of
ground-truth communities. We then propose a methodology which allows us to
compare and quantitatively evaluate how different structural definitions of
network communities correspond to ground-truth communities. We choose 13
commonly used structural definitions of network communities and examine their
sensitivity, robustness and performance in identifying the ground-truth. We
show that the 13 structural definitions are heavily correlated and naturally
group into four classes. We find that two of these definitions, Conductance and
Triad-participation-ratio, consistently give the best performance in
identifying ground-truth communities. We also investigate a task of detecting
communities given a single seed node. We extend the local spectral clustering
algorithm into a heuristic parameter-free community detection method that
easily scales to networks with more than hundred million nodes. The proposed
method achieves 30% relative improvement over current local clustering methods.Comment: Proceedings of 2012 IEEE International Conference on Data Mining
(ICDM), 201
Node-Centric Detection of Overlapping Communities in Social Networks
We present NECTAR, a community detection algorithm that generalizes Louvain
method's local search heuristic for overlapping community structures. NECTAR
chooses dynamically which objective function to optimize based on the network
on which it is invoked. Our experimental evaluation on both synthetic benchmark
graphs and real-world networks, based on ground-truth communities, shows that
NECTAR provides excellent results as compared with state of the art community
detection algorithms
IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis
We conduct the most comprehensive study of WLAN traces to date. Measurements
collected from four major university campuses are analyzed with the aim of
developing fundamental understanding of realistic user behavior in wireless
networks. Both individual user and inter-node (group) behaviors are
investigated and two classes of metrics are devised to capture the underlying
structure of such behaviors.
For individual user behavior we observe distinct patterns in which most users
are 'on' for a small fraction of the time, the number of access points visited
is very small and the overall on-line user mobility is quite low. We clearly
identify categories of heavy and light users. In general, users exhibit high
degree of similarity over days and weeks.
For group behavior, we define metrics for encounter patterns and friendship.
Surprisingly, we find that a user, on average, encounters less than 6% of the
network user population within a month, and that encounter and friendship
relations are highly asymmetric. We establish that number of encounters follows
a biPareto distribution, while friendship indexes follow an exponential
distribution. We capture the encounter graph using a small world model, the
characteristics of which reach steady state after only one day.
We hope for our study to have a great impact on realistic modeling of network
usage and mobility patterns in wireless networks.Comment: 16 pages, 31 figure
Put three and three together: Triangle-driven community detection
Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its applications in many fields such as biology, social networks, or network traffic analysis. Although the existing metrics used to quantify the quality of a community work well in general, under some circumstances, they fail at correctly capturing such notion. The main reason is that these metrics consider the internal community edges as a set, but ignore how these actually connect the vertices of the community. We propose the Weighted Community Clustering (WCC), which is a new community metric that takes the triangle instead of the edge as the minimal structural motif indicating the presence of a strong relation in a graph. We theoretically analyse WCC in depth and formally prove, by means of a set of properties, that the maximization of WCC guarantees communities with cohesion and structure. In addition, we propose Scalable Community Detection (SCD), a community detection algorithm based on WCC, which is designed to be fast and scalable on SMP machines, showing experimentally that WCC correctly captures the concept of community in social networks using real datasets. Finally, using ground-truth data, we show that SCD provides better quality than the best disjoint community detection algorithms of the state of the art while performing faster.Peer ReviewedPostprint (author's final draft
Considerations about multistep community detection
The problem and implications of community detection in networks have raised a
huge attention, for its important applications in both natural and social
sciences. A number of algorithms has been developed to solve this problem,
addressing either speed optimization or the quality of the partitions
calculated. In this paper we propose a multi-step procedure bridging the
fastest, but less accurate algorithms (coarse clustering), with the slowest,
most effective ones (refinement). By adopting heuristic ranking of the nodes,
and classifying a fraction of them as `critical', a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition.Comment: 12 page
- …