Compressing networks with super nodes
Community detection is a commonly used technique for identifying groups in a
network based on similarities in connectivity patterns. To facilitate community
detection in large networks, we recast the network to be partitioned into a
smaller network of 'super nodes', each super node comprising one or more nodes
in the original network. To define the seeds of our super nodes, we apply the
'CoreHD' ranking from dismantling and decycling. We test our approach through
the analysis of two common methods for community detection: modularity
maximization with the Louvain algorithm and maximum likelihood optimization for
fitting a stochastic block model. Our results highlight that applying community
detection to the compressed network of super nodes is significantly faster
while successfully producing partitions that are more aligned with the local
network connectivity, more stable across multiple (stochastic) runs within and
between community detection algorithms, and that overlap well with the results
obtained using the full network.
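The pipeline described above can be sketched compactly. In this illustrative version, plain degree ranking stands in for the CoreHD seed selection, nodes are attached to seeds by a multi-source breadth-first search, and NetworkX's Louvain implementation plays the role of the community detection step; all helper names are ours, not the paper's:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

def compress_to_super_nodes(G, n_seeds):
    """Contract G into a smaller network of super nodes.

    Degree ranking stands in for the paper's CoreHD seed selection;
    every remaining node joins the seed that reaches it first in a
    multi-source breadth-first search."""
    seeds = sorted(G.nodes, key=G.degree, reverse=True)[:n_seeds]
    label = {s: s for s in seeds}
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in G[u]:
                if v not in label:
                    label[v] = label[u]
                    nxt.append(v)
        frontier = nxt
    blocks = {}
    for node, s in label.items():
        blocks.setdefault(s, set()).add(node)
    # contract each block into one super node (nodes of H are frozensets)
    return nx.quotient_graph(G, list(blocks.values()), relabel=False)

G = nx.karate_club_graph()
H = compress_to_super_nodes(G, n_seeds=6)
parts = louvain_communities(H, seed=0)       # partition the small graph
expanded = [set().union(*p) for p in parts]  # map back to original nodes
print(H.number_of_nodes(), round(modularity(G, expanded), 3))
```

The key point is that Louvain runs on the 6-node quotient graph, and its partition is only expanded back to the 34 original nodes for scoring.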
Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of the real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge including: (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in streaming fashion. Data sets,
source code for the algorithm, evaluation metrics, and detailed documentation
are available at GraphChallenge.org.
Comment: To be published in 2017 IEEE High Performance Extreme Computing Conference (HPEC).
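As a toy illustration of the kind of statistical model the baseline builds on (not the challenge's actual Bayesian inference), candidate partitions can be scored with the profile log-likelihood of a simple Poisson stochastic block model; the graph and helper names below are our own choices:

```python
import math
import random
from collections import Counter, defaultdict
import networkx as nx

def sbm_log_likelihood(G, labels):
    """Profile log-likelihood of a Poisson stochastic block model,
    sum_rs m_rs * log(m_rs / (n_r * n_s)), where m_rs counts edge
    endpoints between blocks r and s (both directions) and n_r is the
    size of block r. Higher is better; this is a simplification of the
    full Bayesian posterior used by the challenge baseline."""
    sizes = Counter(labels.values())
    m = defaultdict(int)
    for u, v in G.edges():
        r, s = labels[u], labels[v]
        m[r, s] += 1
        m[s, r] += 1
    return sum(cnt * math.log(cnt / (sizes[r] * sizes[s]))
               for (r, s), cnt in m.items())

G = nx.karate_club_graph()
# the two factions recorded in the data set vs. a random 2-way split
truth = {v: G.nodes[v]["club"] for v in G}
rng = random.Random(0)
rand = {v: rng.choice("ab") for v in G}
print(sbm_log_likelihood(G, truth), sbm_log_likelihood(G, rand))
```

The assortative ground-truth split scores markedly higher than the random split, which is exactly the signal a block-model-based partitioner climbs toward.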
The performance of modularity maximization in practical contexts
Although widely used in practice, the behavior and accuracy of the popular
module identification technique of modularity maximization are not well
understood in practical contexts. Here, we present a broad characterization of
its performance in such situations. First, we revisit and clarify the
resolution limit phenomenon for modularity maximization. Second, we show that
the modularity function Q exhibits extreme degeneracies: it typically admits an
exponential number of distinct high-scoring solutions and typically lacks a
clear global maximum. Third, we derive the limiting behavior of the maximum
modularity Q_max for one model of infinitely modular networks, showing that it
depends strongly both on the size of the network and on the number of modules
it contains. Finally, using three real-world metabolic networks as examples, we
show that the degenerate solutions can fundamentally disagree on many, but not
all, partition properties such as the composition of the largest modules and
the distribution of module sizes. These results imply that the output of any
modularity maximization procedure should be interpreted cautiously in
scientific contexts. They also explain why many heuristics are often successful
at finding high-scoring partitions in practice and why different heuristics can
disagree on the modular structure of the same network. We conclude by
discussing avenues for mitigating some of these behaviors, such as combining
information from many degenerate solutions or using generative models.
Comment: 20 pages, 14 figures, 6 appendices; code available at http://www.santafe.edu/~aaronc/modularity
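The degeneracy is easy to observe empirically. A minimal sketch using NetworkX's stochastic Louvain heuristic on the karate club graph (an arbitrary small example, not one of the paper's metabolic networks): distinct seeds tend to yield distinct partitions whose modularity scores are nearly indistinguishable:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Run the stochastic Louvain heuristic many times and record each
# distinct partition together with its modularity score.
G = nx.karate_club_graph()
runs = {}
for seed in range(20):
    parts = louvain_communities(G, seed=seed)
    key = frozenset(frozenset(c) for c in parts)
    runs[key] = modularity(G, parts)

scores = sorted(runs.values(), reverse=True)
print(f"{len(runs)} distinct partitions, "
      f"Q from {scores[-1]:.3f} to {scores[0]:.3f}")
```

On larger networks the number of distinct near-optimal partitions grows rapidly, which is the exponential degeneracy the abstract describes.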
Network Community Detection On Small Quantum Computers
In recent years, a number of quantum computing devices with small numbers of
qubits have become available. We present a hybrid quantum local search (QLS)
approach that combines a classical machine and a small quantum device to solve
problems of practical size. The proposed approach is applied to the network
community detection problem. QLS is hardware-agnostic and easily extendable to
new quantum computing devices as they become available. We demonstrate it by
solving the 2-community detection problem on graphs with up to 410 vertices
using the 16-qubit IBM quantum computer and the D-Wave 2000Q, and compare the
results with optimal solutions. QLS performs similarly on both types of
quantum computers in terms of solution quality and number of iterations to
convergence, and it achieves results comparable to state-of-the-art solvers in
terms of solution quality, including reaching the optimal solutions.
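The structure of such a hybrid scheme can be sketched classically: a brute-force subproblem solver stands in for the small quantum device, while the outer loop fixes all other variables. Everything below (objective, subset size, sweep count, function names) is an illustrative assumption, not the authors' implementation:

```python
import random
from itertools import product
import networkx as nx
from networkx.algorithms.community import modularity

def two_community_Q(G, spins):
    """Modularity of the 2-partition encoded by +/-1 spins."""
    up = {v for v, s in spins.items() if s > 0}
    comms = [c for c in (up, set(G) - up) if c]
    return modularity(G, comms)

def local_search(G, subset_size=8, sweeps=10, seed=1):
    """Classical sketch of the hybrid scheme: repeatedly brute-force a
    small subproblem (the role the 16-qubit device or annealer plays in
    the paper) while the rest of the assignment stays fixed."""
    rng = random.Random(seed)
    spins = {v: rng.choice((-1, 1)) for v in G}
    best_q = two_community_Q(G, spins)
    for _ in range(sweeps):
        subset = rng.sample(sorted(G.nodes), subset_size)
        # exhaustive search over the chosen subproblem only
        for assign in product((-1, 1), repeat=subset_size):
            trial = {**spins, **dict(zip(subset, assign))}
            q = two_community_Q(G, trial)
            if q > best_q:
                best_q, spins = q, trial
    return spins, best_q

G = nx.karate_club_graph()
spins, q = local_search(G)
print(round(q, 3))
```

A real QLS run would replace the inner exhaustive loop with a call to the quantum subsolver, which is why the approach is hardware-agnostic.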
Column generation algorithms for exact modularity maximization in networks
Finding modules, or clusters, in networks currently attracts much attention in several domains. The most studied criterion for doing so, due to Newman and Girvan [Phys. Rev. E 69, 026113 (2004)], is modularity maximization. Many heuristics have been proposed for maximizing modularity; they rapidly yield near-optimal solutions, or sometimes optimal ones, but without a guarantee of optimality. There are few exact algorithms, prominent among which is that of Xu et al. [Eur. Phys. J. B 60, 231 (2007)]. Modularity maximization can also be expressed as a clique partitioning problem, to which the row generation algorithm of Grötschel and Wakabayashi [Math. Program. 45, 59 (1989)] can be applied. We propose to extend both of these algorithms using powerful column generation methods for linear and nonlinear integer programming. The performance of the four resulting algorithms is compared on problems from the literature. Instances with up to 512 entities are solved exactly. Moreover, the computing times for previously solved problems are reduced substantially.
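The clique-partitioning formulation mentioned above can be written down directly for small instances: binary variables x_uv indicate whether u and v share a cluster, triangle inequalities enforce transitivity, and the objective collects the modularity weights. This sketch uses SciPy's `milp` (HiGHS) rather than row or column generation, so it is a toy stand-in for the paper's machinery and only practical for small graphs:

```python
import numpy as np
import networkx as nx
from itertools import combinations
from scipy.optimize import milp, LinearConstraint, Bounds

def exact_modularity_ilp(G):
    """Exact modularity maximization as a clique-partitioning ILP:
    maximize sum_{u<v} (A_uv - d_u d_v / 2m) x_uv subject to the
    triangle inequalities x_ab + x_bc - x_ac <= 1 (all rotations)."""
    nodes = sorted(G.nodes)
    m = G.number_of_edges()
    pairs = list(combinations(nodes, 2))
    idx = {p: i for i, p in enumerate(pairs)}
    deg = dict(G.degree)
    obj = np.array([(1.0 if G.has_edge(u, v) else 0.0)
                    - deg[u] * deg[v] / (2.0 * m) for u, v in pairs])

    def pi(a, b):
        return idx[(a, b)] if (a, b) in idx else idx[(b, a)]

    rows = []
    for u, v, w in combinations(nodes, 3):
        for a, b, d in ((u, v, w), (v, w, u), (w, u, v)):
            row = np.zeros(len(pairs))
            row[pi(a, b)] += 1
            row[pi(b, d)] += 1
            row[pi(a, d)] -= 1
            rows.append(row)
    cons = LinearConstraint(np.array(rows), -np.inf, 1.0)
    res = milp(-obj, constraints=cons,
               integrality=np.ones(len(pairs)), bounds=Bounds(0, 1))
    # transitivity holds, so connected components of the "same cluster"
    # relation are exactly the clusters
    same = nx.Graph()
    same.add_nodes_from(nodes)
    same.add_edges_from(p for p in pairs if res.x[idx[p]] > 0.5)
    return [set(comp) for comp in nx.connected_components(same)]

# two triangles joined by a bridge: the optimum splits them apart
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
clusters = exact_modularity_ilp(G)
print(sorted(sorted(c) for c in clusters))
```

The number of triangle inequalities grows as O(n^3), which is precisely why the paper resorts to row and column generation instead of enumerating them all.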
Efficient modularity density heuristics in graph clustering and their applications
Modularity Density Maximization is a graph clustering problem which avoids the resolution limit degeneracy of the Modularity Maximization problem. This thesis aims at solving larger instances than current Modularity Density heuristics do, and at showing how close the obtained solutions are to the expected clustering. Three main contributions arise from this objective. The first concerns theoretical properties of Modularity Density-based prioritizers. The second is the development of eight Modularity Density Maximization heuristics. Our heuristics are compared with optimal results from the literature, and with the GAOD, iMeme-Net, HAIN, and BMD- heuristics. Our results are also compared with CNM and Louvain, which are Modularity Maximization heuristics that solve instances with thousands of nodes. The tests were carried out using graphs from the “Stanford Large Network Dataset Collection”. The experiments show that our eight heuristics found solutions for graphs with hundreds of thousands of nodes, and that five of them surpassed the current state-of-the-art Modularity Density Maximization heuristic solvers for large graphs. The third contribution is the proposal of six column generation methods. These methods use exact and heuristic auxiliary solvers and an initial variable generator. Comparisons between our proposed column generation methods and state-of-the-art algorithms were also carried out. The results show that (i) two of our methods surpass the state-of-the-art algorithms in terms of time, and (ii) our methods prove the optimal value for larger instances than current approaches can tackle. Our results suggest clear improvements to the state of the art for the Modularity Density Maximization problem.
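For reference, the objective these heuristics optimize is simple to state and compute; a minimal sketch of modularity density as defined by Li et al. (2008), with a toy graph of our choosing:

```python
import networkx as nx

def modularity_density(G, communities):
    """Modularity density D (Li et al. 2008): for each cluster c,
    (2 * in(c) - out(c)) / |c|, summed over clusters, where in(c)
    counts edges inside c and out(c) counts edges leaving c.
    Normalizing by cluster size rather than total edge count is what
    avoids the resolution limit of ordinary modularity."""
    total = 0.0
    for c in communities:
        c = set(c)
        internal = G.subgraph(c).number_of_edges()
        external = sum(1 for u in c for v in G[u] if v not in c)
        total += (2 * internal - external) / len(c)
    return total

# two triangles joined by a bridge: each triangle is one cluster
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
D = modularity_density(G, [{0, 1, 2}, {3, 4, 5}])
print(D)  # (6 - 1)/3 + (6 - 1)/3 = 10/3
```

The per-cluster normalization means splitting a large sparse cluster or merging two small dense ones both lower D, which is the degeneracy-avoiding behavior the abstract refers to.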