Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over- or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
over- and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate over- and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
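The link-prediction comparison can be illustrated with a minimal sketch, assuming the simplest possible scorer: a candidate pair scores 1 if a partition puts both endpoints in the same community and 0 otherwise. The paper's own scoring functions are not reproduced here, so same_community_score and link_prediction_auc are hypothetical names, and the held-out edges and non-edges are made-up toy data. The point it shows is that both extremes, one giant community (underfitting) and all singletons (overfitting), collapse to chance-level AUC.

```python
def same_community_score(labels, u, v):
    """Hypothetical scorer: 1.0 if u and v share a community label, else 0.0."""
    return 1.0 if labels[u] == labels[v] else 0.0


def link_prediction_auc(labels, held_out_edges, non_edges):
    """Exact AUC: fraction of (held-out edge, non-edge) pairs in which the
    held-out edge scores higher; ties count as one half."""
    wins = 0.0
    for u, v in held_out_edges:
        s_edge = same_community_score(labels, u, v)
        for x, y in non_edges:
            s_non = same_community_score(labels, x, y)
            if s_edge > s_non:
                wins += 1.0
            elif s_edge == s_non:
                wins += 0.5
    return wins / (len(held_out_edges) * len(non_edges))


# Toy data (assumed): held-out edges lie inside the groups {0,1,2} and {3,4,5}.
held_out = [(0, 1), (3, 4)]
non_edges = [(0, 3), (1, 5), (2, 4)]
fitted = [0, 0, 0, 1, 1, 1]   # two groups consistent with the held-out edges
underfit = [0] * 6            # one giant community
overfit = list(range(6))      # every node in its own community

for name, labels in [("fitted", fitted), ("underfit", underfit), ("overfit", overfit)]:
    print(name, link_prediction_auc(labels, held_out, non_edges))
# prints: fitted 1.0, underfit 0.5, overfit 0.5 (one per line)
```

Counting ties as one half is the standard convention, which is exactly why a partition that cannot distinguish any pair lands at 0.5.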
The Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity
Community detection is a classic problem in network science with extensive
applications in various fields. Among numerous approaches, the most common
method is modularity maximization. Despite their design philosophy and wide
adoption, heuristic modularity maximization algorithms rarely return an optimal
partition or anything similar. We propose a specialized algorithm, Bayan, which
returns partitions with a guarantee of either optimality or proximity to an
optimal partition. At the core of the Bayan algorithm is a branch-and-cut
scheme that solves an integer programming formulation of the problem to
optimality or approximates it within a factor. We demonstrate Bayan's
distinctive accuracy and stability over 21 other algorithms in retrieving
ground-truth communities in synthetic benchmarks and node labels in real
networks. Bayan is several times faster than open-source and commercial solvers
for modularity maximization, making it capable of finding optimal partitions for
instances that cannot be optimized by any other existing method. Overall, our
assessments point to Bayan as a suitable choice for exact maximization of
modularity in networks with up to 3000 edges (in their largest connected
component) and approximating maximum modularity in larger networks on ordinary
computers.
Comment: 6 pages, 2 figures, 1 table
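To make the objective concrete, the sketch computes Newman's modularity Q for a partition of an undirected, unweighted simple graph and finds the exact maximum by exhaustive search over every set partition. The brute force is a swapped-in technique for illustration only: it visits Bell(n) partitions and is feasible for a handful of nodes, whereas Bayan uses a branch-and-cut integer programming scheme that is not reproduced here. The function names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity of an undirected, unweighted simple graph:
    Q = sum over communities c of L_c / m - (d_c / (2 m))**2,
    where L_c counts edges inside c and d_c is the total degree of c."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def set_partitions(n):
    """Yield every partition of nodes 0..n-1 (n >= 1) as a canonical label list
    (a restricted growth string), so each partition appears exactly once."""
    def extend(prefix, max_label):
        if len(prefix) == n:
            yield list(prefix)
            return
        for label in range(max_label + 2):
            prefix.append(label)
            yield from extend(prefix, max(max_label, label))
            prefix.pop()
    yield from extend([0], 0)


def max_modularity_brute_force(n, edges):
    """Exact maximum modularity by exhaustive search over all Bell(n) partitions."""
    best_q, best_labels = float("-inf"), None
    for labels in set_partitions(n):
        q = modularity(edges, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q, labels = max_modularity_brute_force(6, edges)
print(round(q, 4), labels)  # 0.3571 [0, 0, 0, 1, 1, 1]
```

Here the optimum is Q = 5/14, obtained by splitting the graph at the bridge. Exhaustive search is only a correctness reference; the number of partitions grows faster than exponentially, which is why exact methods need pruning such as branch-and-cut.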
Resolution of ranking hierarchies in directed networks
Identifying hierarchies and rankings of nodes in directed graphs is
fundamental in many applications such as social network analysis, biology,
economics, and finance. A recently proposed method identifies the hierarchy by
finding the ordered partition of nodes which minimises a score function, termed
agony. This function penalises the links violating the hierarchy in a way
depending on the strength of the violation. To investigate the resolution of
ranking hierarchies we introduce an ensemble of random graphs, the Ranked
Stochastic Block Model. We find that agony may fail to identify hierarchies
when the structure is not strong enough and the size of the classes is small
with respect to the whole network. We analytically characterise the resolution
threshold and we show that an iterated version of agony can partly overcome
this resolution limit.
Comment: 27 pages, 9 figures
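One common form of the agony score can be sketched as follows. It assumes edges are expected to point from a lower rank value to a higher one, and that each violating edge u -> v (rank[u] >= rank[v]) costs rank[u] - rank[v] + 1, so stronger violations cost more. The exact convention used in the paper, the Ranked Stochastic Block Model and the iterated variant of agony are not reproduced. The brute-force minimiser is a hypothetical helper that only works for tiny graphs.

```python
from itertools import product


def agony(edges, rank):
    """Agony of a ranking: edge u -> v is free if rank[u] < rank[v];
    otherwise it costs rank[u] - rank[v] + 1."""
    return sum(max(0, rank[u] - rank[v] + 1) for u, v in edges)


def min_agony_brute_force(n, edges):
    """Minimum agony over all assignments of ranks 0..n-1 to n nodes (n**n cases)."""
    best_value, best_rank = None, None
    for rank in product(range(n), repeat=n):
        value = agony(edges, rank)
        if best_value is None or value < best_value:
            best_value, best_rank = value, rank
    return best_value, best_rank


dag = [(0, 1), (1, 2), (0, 2)]     # consistent with the ranking 0 < 1 < 2
cycle = [(0, 1), (1, 2), (2, 0)]   # no ranking can satisfy every edge
print(agony(dag, (0, 1, 2)), min_agony_brute_force(3, cycle)[0])  # 0 3
```

A directed cycle of length L always has agony at least L, because the terms rank[u] - rank[v] + 1 telescope to L around the cycle; putting every node of the 3-cycle in the same rank attains that bound.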
A generalised significance test for individual communities in networks
Many empirical networks have community structure, in which nodes are densely
interconnected within each community (i.e., a group of nodes) and sparsely
across different communities. Like other local and meso-scale structures of
networks, communities are generally heterogeneous in various aspects such as
the size, density of edges, connectivity to other communities and significance.
In the present study, we propose a method to statistically test the
significance of individual communities in a given network. Compared to the
previous methods, the present algorithm is unique in that it accepts different
community-detection algorithms and the corresponding quality function for
single communities. The present method requires that the quality of each
community can be quantified and that community detection is performed as
optimisation of such a quality function summed over the communities. Various
community detection algorithms including modularity maximisation and graph
partitioning meet this criterion. Our method estimates a distribution of the
quality function for randomised networks to calculate a likelihood of each
community in the given network. We illustrate our algorithm with synthetic and
empirical networks.
Comment: 20 pages, 4 figures and 4 tables
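The randomisation idea can be sketched in a simplified form, assuming the quality of a community is its number of internal edges and that randomised networks come from degree-preserving double edge swaps. This is a stand-in, not the authors' procedure: their method accepts other quality functions and re-runs community detection on each randomised network, whereas this sketch keeps the node set fixed. All function names are hypothetical.

```python
import random


def internal_edges(edges, nodes):
    """Quality assumed in this sketch: number of edges with both ends in nodes."""
    inside = set(nodes)
    return sum(1 for u, v in edges if u in inside and v in inside)


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomise an undirected simple graph with double edge swaps
    (a-b, c-d) -> (a-d, c-b), rejecting swaps that would create self-loops
    or multi-edges, so every node keeps its degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def community_p_value(edges, community, n_null=200, seed=0):
    """Empirical p-value: how often a randomised network gives the same node
    set at least as many internal edges as the observed network."""
    rng = random.Random(seed)
    observed = internal_edges(edges, community)
    hits = 0
    for _ in range(n_null):
        null_edges = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        if internal_edges(null_edges, community) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_null)


# Two 5-cliques joined by the bridge (4, 5).
clique_a = [(u, v) for u in range(5) for v in range(u + 1, 5)]
clique_b = [(u, v) for u in range(5, 10) for v in range(u + 1, 10)]
edges = clique_a + clique_b + [(4, 5)]
print(community_p_value(edges, range(5)))
```

The (1 + hits) / (1 + n_null) form keeps the estimate away from exactly zero when only a finite number of randomised networks is drawn.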
A Comprehensive Review of Community Detection in Graphs
The study of complex networks has significantly advanced our understanding of
community structure, which is a crucial feature of real-world graphs.
Detecting communities in graphs is a challenging problem with applications in
sociology, biology, and computer science. Despite the efforts of an
interdisciplinary community of scientists, a satisfactory solution to this
problem has not yet been achieved. This review article delves into the topic of
community detection in graphs, which plays a crucial role in understanding
the organization and functioning of complex systems. We begin by introducing
the concept of community structure, which refers to the arrangement of vertices
into clusters, with strong internal connections and weaker connections between
clusters. Then, we provide a thorough exposition of various community detection
methods, including a new method designed by us. Additionally, we explore
real-world applications of community detection in diverse networks. In
conclusion, this comprehensive review provides a deep understanding of
community detection in graphs. It serves as a valuable resource for researchers
and practitioners in multiple disciplines, offering insights into the
challenges, methodologies, and applications of community detection in complex
networks.
Transformers for Capturing Multi-level Graph Structure using Hierarchical Distances
Graph transformers need strong inductive biases to derive meaningful
attention scores. Yet, current proposals rarely address methods capturing
longer ranges, hierarchical structures, or community structures, as they appear
in various graphs such as molecules, social networks, and citation networks. In
this paper, we propose a hierarchy-distance structural encoding (HDSE), which
models a hierarchical distance between the nodes in a graph focusing on its
multi-level, hierarchical nature. In particular, this yields a framework which
can be flexibly integrated with existing graph transformers, allowing for
simultaneous application with other positional representations. Through
extensive experiments on 12 real-world datasets, we demonstrate that our HDSE
method successfully enhances various types of baseline transformers, achieving
state-of-the-art empirical performance on 10 benchmark datasets.
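The notion of a hierarchy distance can be sketched as a vector of shortest-path distances, one per level, between the (super)nodes that contain two given nodes in successively coarsened graphs. This assumes the coarsening is supplied as hand-written cluster maps; in HDSE the hierarchy comes from a graph coarsening algorithm and the distances are turned into attention biases inside the transformer, neither of which is shown, and the paper's exact definition may differ. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (adjacency: dict of sets)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def coarsen(adj, assign):
    """Merge nodes that share a cluster label; two clusters become adjacent if
    any of their members were adjacent (edges inside a cluster are dropped)."""
    new_adj = {c: set() for c in set(assign.values())}
    for u, nbrs in adj.items():
        for w in nbrs:
            if assign[u] != assign[w]:
                new_adj[assign[u]].add(assign[w])
    return new_adj


def hierarchy_distance(adj, assignments, i, j):
    """Distances [d_0, d_1, ..., d_K] between the (super)nodes holding i and j
    at each level of the given coarsening hierarchy."""
    dists = [bfs_distances(adj, i).get(j, float("inf"))]
    for assign in assignments:
        adj = coarsen(adj, assign)
        i, j = assign[i], assign[j]
        dists.append(bfs_distances(adj, i).get(j, float("inf")))
    return dists


# A path 0-1-2-3-4-5, coarsened by hand into pairs and then into two groups.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
level1 = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"}
level2 = {"a": "X", "b": "X", "c": "Y"}
print(hierarchy_distance(path, [level1, level2], 0, 5))  # [5, 2, 1]
```

Nodes that are far apart in the original graph can be close at coarser levels, which is the multi-level signal a plain shortest-path encoding does not expose.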