Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach
Betweenness centrality (BC) is one of the most used centrality measures for
network analysis, which seeks to describe the importance of nodes in a network
in terms of the fraction of shortest paths that pass through them. It is key to
many valuable applications, including community detection and network
dismantling. Computing BC scores on large networks is computationally
challenging due to high time complexity. Many approximation algorithms, mainly
sampling-based, have been proposed to speed up the estimation of BC. However,
these methods still incur considerable execution time on large-scale networks,
and their results often deteriorate when small changes happen to the network
structure. In this paper, we focus on
identifying nodes with high BC in a graph, since many application scenarios are
built upon retrieving nodes with top-k BC. Unlike previous heuristic
methods, we turn this task into a learning problem and design an
encoder-decoder based framework to solve the problem. More specifically, the
encoder leverages the network structure to encode each node into an embedding
vector, which captures the important structural information of the node. The
decoder transforms the embedding vector for each node into a scalar, which
captures the relative rank of this node in terms of BC. We use the pairwise
ranking loss to train the model to identify the orders of nodes regarding their
BC. By training on small-scale networks, the learned model is capable of
assigning relative BC scores to nodes in any unseen network, and thus
identifying the highly-ranked nodes. Comprehensive experiments on both
synthetic and real-world networks demonstrate that, compared to representative
baselines, our model drastically speeds up the prediction without noticeable
sacrifice in accuracy, and achieves higher accuracy than the state of the art on
several large real-world networks.
Comment: 10 pages, 4 figures, 8 tables
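The training objective described in this abstract can be illustrated with a short Python sketch. It assumes a RankNet-style form of the pairwise ranking loss: for a sampled node pair (i, j), the sigmoid of the ground-truth BC difference serves as a soft label, and the sigmoid of the predicted score difference is scored against it with binary cross-entropy. The abstract does not spell out the exact loss, so this form, the uniform pair sampling, and the hypothetical helpers `pairwise_ranking_loss` and `top_k_nodes` are assumptions; the encoder-decoder network that would produce the predicted scores is omitted.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_ranking_loss(pred, true_bc, num_pairs=100, seed=0):
    """Assumed RankNet-style loss averaged over randomly sampled node pairs.

    pred:    predicted relative scores, one per node (hypothetical model output)
    true_bc: ground-truth BC values, one per node
    """
    rng = random.Random(seed)
    n = len(pred)
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.randrange(n), rng.randrange(n)
        target = sigmoid(true_bc[i] - true_bc[j])  # soft label: P(i ranks above j)
        p = sigmoid(pred[i] - pred[j])             # model's probability for the same event
        total += -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))
    return total / num_pairs


def top_k_nodes(scores, k):
    """Return the indices of the k highest-scoring nodes, best first."""
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)[:k]
```

Because only score differences enter the loss, adding a constant to every predicted score leaves it unchanged, and `top_k_nodes` likewise depends only on the order of the scores.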
Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling
Betweenness centrality is one of the most popular vertex centrality measures
in network analysis. Hence, many (sequential and parallel) algorithms to
compute or approximate betweenness have been devised. Recent algorithmic
advances have made it possible to approximate betweenness very efficiently on
shared-memory architectures. Yet, the best shared-memory algorithms can still
take hours of running time for large graphs, especially for graphs with a high
diameter or when a small relative error is required.
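To make the cost concrete, here is a minimal sketch of Brandes' algorithm, the standard exact method for unweighted graphs (the abstract does not name it; it is included only as background). It performs one breadth-first search per source vertex plus a backward dependency pass, i.e. O(nm) time for n vertices and m edges, which is what becomes prohibitive on graphs with billions of edges. The adjacency-dictionary input format is an assumption.

```python
from collections import deque


def brandes_betweenness(adj):
    """Exact betweenness centrality for an unweighted graph.

    adj: dict mapping each node to an iterable of neighbours. For an
    undirected graph, list every edge in both directions; the scores then
    count each unordered pair (s, t) twice, so halve them if needed.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths by BFS, counting paths in sigma.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of non-increasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

Dividing the returned ordered-pair scores by n(n-1) yields normalized values in [0, 1].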
In this work, we present an MPI-based generalization of the state-of-the-art
shared-memory algorithm for betweenness approximation. This algorithm is based
on adaptive sampling; our parallelization strategy can be applied in the same
manner to adaptive sampling algorithms for other problems. In experiments on a
16-node cluster, our MPI-based implementation is 16.1x faster than the
state-of-the-art shared-memory implementation when considering only our
parallelization focus, the adaptive sampling phase. For the complete
algorithm, we obtain an average (geom. mean) speedup factor of 7.4x over the
state of the art. For some previously very challenging inputs, this speedup is
much higher. As a result, our algorithm is the first to approximate betweenness
centrality on graphs with several billion edges in less than ten minutes with
high accuracy.
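The sampling idea behind such approximations can be sketched as follows. This is a simplified, non-adaptive variant in the style of shortest-path sampling: each sample draws a random ordered pair (s, t), picks one shortest s-t path uniformly at random, and credits its interior vertices, so that the averaged counts estimate betweenness normalized by n(n-1). It is not the paper's algorithm: a fixed sample count replaces the adaptive stopping condition, and the MPI communication is only simulated by summing per-worker counts in a single process. The helper names are hypothetical.

```python
import random
from collections import deque


def sample_path_interior(adj, rng):
    """Draw a random ordered pair (s, t) and one uniformly random shortest s-t path.

    Returns the interior vertices of that path (empty if t is unreachable
    from s or adjacent to it).
    """
    s, t = rng.sample(list(adj), 2)  # uniform ordered pair with s != t
    sigma, dist, preds = {s: 1}, {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            break  # every shortest path to t has already been counted
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    if t not in dist:
        return []
    # Walk back from t, choosing each predecessor p with probability
    # sigma[p] / sigma[current]; this yields a uniformly random shortest path.
    interior, w = [], t
    while True:
        ps = preds[w]
        v = rng.choices(ps, weights=[sigma[p] for p in ps])[0]
        if v == s:
            break
        interior.append(v)
        w = v
    return interior


def approx_betweenness(adj, num_samples, num_workers=4, seed=0):
    """Estimate normalized BC by path sampling, split across simulated workers.

    Each 'worker' draws its share of samples with its own seeded RNG; the
    per-worker counts are then summed, standing in for an MPI reduction.
    """
    totals = dict.fromkeys(adj, 0.0)
    per_worker = [num_samples // num_workers] * num_workers
    for i in range(num_samples % num_workers):
        per_worker[i] += 1
    for rank, count in enumerate(per_worker):
        rng = random.Random(seed + rank)  # separately seeded stream per worker
        local = dict.fromkeys(adj, 0.0)
        for _ in range(count):
            for v in sample_path_interior(adj, rng):
                local[v] += 1.0
        for v, c in local.items():  # "reduce": sum local counts into totals
            totals[v] += c
    return {v: c / num_samples for v, c in totals.items()}
```

In a real MPI setting, each rank would draw its samples independently and the per-rank counts would be combined with a reduction; an adaptive variant would additionally check a stopping condition on the aggregated counts between sampling rounds.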
Guidelines for Experimental Algorithmics in Network Analysis
The field of network science is a highly interdisciplinary area; for the
empirical analysis of network data, it draws algorithmic methodologies from
several research fields. Hence, research procedures and descriptions of the
technical results often differ, sometimes widely. In this paper, we focus on
methodologies for the experimental part of algorithm engineering for network
analysis -- an important ingredient for a research area with empirical focus.
More precisely, we unify and adapt existing recommendations from different
fields and propose universal guidelines -- including statistical analyses --
for the systematic evaluation of network analysis algorithms. This way, the
behavior of newly proposed algorithms can be properly assessed and comparisons
to existing solutions become meaningful. Moreover, as the main technical
contribution, we provide SimexPal, a highly automated tool to perform and
analyze experiments following our guidelines. To illustrate the merits of
SimexPal and our guidelines, we apply them in a case study: we design, perform,
visualize and evaluate experiments of a recent algorithm for approximating
betweenness centrality, an important problem in network analysis. In summary,
both our guidelines and SimexPal shall modernize and complement previous
efforts in experimental algorithmics; they are useful not only for network
analysis but also in related contexts.
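As a small example of the kind of statistical analysis these guidelines call for, the sketch below summarizes per-instance speedups by their geometric mean (the aggregate reported as "geom. mean" in the MPI-based sampling abstract) and attaches a percentile bootstrap confidence interval by resampling instances. It does not use SimexPal, whose interface is not described here; the function names and the choice of a bootstrap are assumptions.

```python
import random
import statistics


def speedups(baseline_times, new_times):
    """Per-instance speedup ratios baseline / new (same instance order)."""
    if len(baseline_times) != len(new_times) or not baseline_times:
        raise ValueError("need two equally long, non-empty lists of times")
    return [b / n for b, n in zip(baseline_times, new_times)]


def geometric_mean_speedup(baseline_times, new_times):
    """Geometric mean of the per-instance speedups."""
    return statistics.geometric_mean(speedups(baseline_times, new_times))


def bootstrap_ci(baseline_times, new_times, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the geometric-mean speedup.

    Instances are resampled with replacement; the bounds are the alpha/2 and
    1 - alpha/2 quantiles of the resampled geometric means.
    """
    rng = random.Random(seed)
    ratios = speedups(baseline_times, new_times)
    stats = sorted(
        statistics.geometric_mean(rng.choices(ratios, k=len(ratios)))
        for _ in range(reps)
    )
    lo = stats[int((alpha / 2) * reps)]
    hi = stats[int((1 - alpha / 2) * reps) - 1]
    return lo, hi
```

The geometric mean suits speedups because they are ratios: a 2x gain on one instance and a 2x loss on another should average to 1x, which the arithmetic mean (1.25x) does not give.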