Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will overfit or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
overfitting and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate overfitting and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
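Finding (i) above says that algorithms disagree both in how many communities they find and in how those communities are composed. A minimal sketch of one way to quantify that disagreement for two partitions of the same nodes follows. It uses normalized mutual information, which is an assumption made here for illustration: the abstract does not say which similarity measure the authors used, and the two label lists are made-up toy data.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two community assignments given as equal-length label lists.

    Returns a value near 1.0 when the partitions agree up to relabeling and
    0.0 when they are statistically independent.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mutual = sum(
        (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if entropy_a == 0 and entropy_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual / (entropy_a + entropy_b)


# Hypothetical outputs of two algorithms on the same six nodes.
partition_x = [0, 0, 0, 1, 1, 1]  # finds 2 communities
partition_y = [0, 0, 1, 1, 2, 2]  # finds 3 communities

print(len(set(partition_x)), len(set(partition_y)))
print(normalized_mutual_information(partition_x, partition_y))
```

A score well below 1 on the same input signals that the two methods disagree on composition, not only on the number of groups.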
Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm
We introduce a community detection algorithm (Fluid Communities) based on the
idea of fluids interacting in an environment, expanding and contracting as a
result of that interaction. Fluid Communities is based on the propagation
methodology, which represents the state of the art in terms of computational
cost and scalability. While being highly efficient, Fluid Communities is able
to find communities in synthetic graphs with an accuracy close to the current
best alternatives. Additionally, Fluid Communities is the first
propagation-based algorithm capable of identifying a variable number of
communities in a network. To illustrate the relevance of the algorithm, we
evaluate the diversity of the communities found by Fluid Communities, and find
them to be significantly different from the ones found by alternative methods.
Comment: Accepted at the 6th International Conference on Complex Networks and
Their Applications
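The abstract describes the fluid metaphor only in words. The sketch below shows one way such a propagation rule can be written, following the update rule as it is commonly described for Fluid Communities: each of k communities has density 1/size, and a vertex joins the community with the largest summed density over itself and its neighbours. It is a simplified illustration, not the authors' reference implementation. The adjacency-dict input, the function name, and the `max_iter` cap are assumptions made here, and the graph is assumed to be connected, undirected, and free of self-loops.

```python
import random
from fractions import Fraction


def fluid_communities(adjacency, k, max_iter=100, seed=None):
    """Assign each vertex of a connected graph to one of k communities.

    adjacency: dict mapping each vertex to an iterable of its neighbours.
    Returns a dict mapping vertex -> community index in range(k).
    """
    rng = random.Random(seed)
    nodes = list(adjacency)
    if not 0 < k <= len(nodes):
        raise ValueError("k must be between 1 and the number of vertices")

    # Each community starts as a single, randomly chosen vertex.
    community = {}
    size = {}
    for c, v in enumerate(rng.sample(nodes, k)):
        community[v] = c
        size[c] = 1

    for _ in range(max_iter):
        changed = False
        rng.shuffle(nodes)
        for v in nodes:
            # Sum community densities (1 / size) over v and its neighbours.
            # Fractions keep tie comparisons exact.
            score = {}
            for w in [v, *adjacency[v]]:
                c = community.get(w)
                if c is not None:
                    score[c] = score.get(c, 0) + Fraction(1, size[c])
            if not score:
                continue  # no fluid has reached this vertex yet
            best = max(score.values())
            candidates = [c for c, s in score.items() if s == best]
            current = community.get(v)
            if current in candidates:
                continue  # ties are resolved in favour of staying put
            new = rng.choice(candidates)
            if current is not None:
                size[current] -= 1
            community[v] = new
            size[new] += 1
            changed = True
        if not changed:
            break
    return community


# Toy graph: two triangles joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = fluid_communities(graph, k=2, seed=0)
print(result)
```

Because a single-vertex community has density 1, no other community can strictly outscore it at that vertex, so all k communities survive. That is why the caller, rather than the algorithm, fixes the number of communities in this sketch.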
Node-Centric Detection of Overlapping Communities in Social Networks
We present NECTAR, a community detection algorithm that generalizes the Louvain
method's local search heuristic for overlapping community structures. NECTAR
dynamically chooses which objective function to optimize based on the network
on which it is invoked. Our experimental evaluation on both synthetic benchmark
graphs and real-world networks, based on ground-truth communities, shows that
NECTAR provides excellent results as compared with state-of-the-art community
detection algorithms.
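For context on what an "objective function" means here: the Louvain method's local search is usually described as moving nodes between communities to increase Newman-Girvan modularity. The sketch below computes that score for a non-overlapping partition of an undirected, unweighted graph. It is offered only as a reference point. The abstract does not say which objectives NECTAR chooses between, and this code does not handle overlapping communities. The function name and the edge-list input format are assumptions made here.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a non-overlapping partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")

    internal = {}  # number of edges with both endpoints in the community
    degree = {}    # total degree of the community's vertices
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1

    # Q = sum over communities of (internal edge fraction - expected fraction).
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


# Toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}

print(modularity(edges, by_triangle))
print(modularity(edges, one_group))
```

A local search heuristic of the Louvain kind would repeatedly evaluate the change in such a score when a single node is moved to a neighbouring community and keep the move if the score improves.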