1,709 research outputs found

    Compressing networks with super nodes

    Full text link
    Community detection is a commonly used technique for identifying groups in a network based on similarities in connectivity patterns. To facilitate community detection in large networks, we recast the network to be partitioned into a smaller network of 'super nodes', each super node comprising one or more nodes in the original network. To define the seeds of our super nodes, we apply the 'CoreHD' ranking from dismantling and decycling. We test our approach through the analysis of two common methods for community detection: modularity maximization with the Louvain algorithm and maximum likelihood optimization for fitting a stochastic block model. Our results highlight that applying community detection to the compressed network of super nodes is significantly faster while successfully producing partitions that are more aligned with the local network connectivity, more stable across multiple (stochastic) runs within and between community detection algorithms, and overlap well with the results obtained using the full network

    Streaming Graph Challenge: Stochastic Block Partition

    Full text link
    An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.Comment: To be published in 2017 IEEE High Performance Extreme Computing Conference (HPEC

    The performance of modularity maximization in practical contexts

    Full text link
    Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization is not well understood in practical contexts. Here, we present a broad characterization of its performance in such situations. First, we revisit and clarify the resolution limit phenomenon for modularity maximization. Second, we show that the modularity function Q exhibits extreme degeneracies: it typically admits an exponential number of distinct high-scoring solutions and typically lacks a clear global maximum. Third, we derive the limiting behavior of the maximum modularity Q_max for one model of infinitely modular networks, showing that it depends strongly both on the size of the network and on the number of modules it contains. Finally, using three real-world metabolic networks as examples, we show that the degenerate solutions can fundamentally disagree on many, but not all, partition properties such as the composition of the largest modules and the distribution of module sizes. These results imply that the output of any modularity maximization procedure should be interpreted cautiously in scientific contexts. They also explain why many heuristics are often successful at finding high-scoring partitions in practice and why different heuristics can disagree on the modular structure of the same network. We conclude by discussing avenues for mitigating some of these behaviors, such as combining information from many degenerate solutions or using generative models.Comment: 20 pages, 14 figures, 6 appendices; code available at http://www.santafe.edu/~aaronc/modularity

    Network Community Detection On Small Quantum Computers

    Full text link
    In recent years a number of quantum computing devices with small numbers of qubits became available. We present a hybrid quantum local search (QLS) approach that combines a classical machine and a small quantum device to solve problems of practical size. The proposed approach is applied to the network community detection problem. QLS is hardware-agnostic and easily extendable to new quantum computing devices as they become available. We demonstrate it to solve the 2-community detection problem on graphs of size up to 410 vertices using the 16-qubit IBM quantum computer and D-Wave 2000Q, and compare their performance with the optimal solutions. Our results demonstrate that QLS perform similarly in terms of quality of the solution and the number of iterations to convergence on both types of quantum computers and it is capable of achieving results comparable to state-of-the-art solvers in terms of quality of the solution including reaching the optimal solutions

    Column generation algorithms for exact modularity maximization in networks

    No full text
    International audienceFinding modules, or clusters, in networks currently attracts much attention in several domains. The most studied criterion for doing so, due to Newman and Girvan [Phys. Rev. E 69, 026113 (2004)], is modularity maximization. Many heuristics have been proposed for maximizing modularity and yield rapidly near optimal solution or sometimes optimal ones but without a guarantee of optimality. There are few exact algorithms, prominent among which is a paper by Xu et al. [Eur. Phys. J. B 60, 231 (2007)]. Modularity maximization can also be expressed as a clique partitioning problem and the row generation algorithm of Grötschel and Wakabayashi [Math. Program. 45, 59 (1989)] applied. We propose to extend both of these algorithms using the powerful column generation methods for linear and non linear integer programming. Performance of the four resulting algorithms is compared on problems from the literature. Instances with up to 512 entities are solved exactly. Moreover, the computing time of previously solved problems are reduced substantially

    Efficient modularity density heuristics in graph clustering and their applications

    Get PDF
    Modularity Density Maximization is a graph clustering problem which avoids the resolution limit degeneracy of the Modularity Maximization problem. This thesis aims at solving larger instances than current Modularity Density heuristics do, and show how close the obtained solutions are to the expected clustering. Three main contributions arise from this objective. The first one is about the theoretical contributions about properties of Modularity Density based prioritizers. The second one is the development of eight Modularity Density Maximization heuristics. Our heuristics are compared with optimal results from the literature, and with GAOD, iMeme-Net, HAIN, BMD- heuristics. Our results are also compared with CNM and Louvain which are heuristics for Modularity Maximization that solve instances with thousands of nodes. The tests were carried out by using graphs from the “Stanford Large Network Dataset Collection”. The experiments have shown that our eight heuristics found solutions for graphs with hundreds of thousands of nodes. Our results have also shown that five of our heuristics surpassed the current state-of-the-art Modularity Density Maximization heuristic solvers for large graphs. A third contribution is the proposal of six column generation methods. These methods use exact and heuristic auxiliary solvers and an initial variable generator. Comparisons among our proposed column generations and state-of-the-art algorithms were also carried out. The results showed that: (i) two of our methods surpassed the state-of-the-art algorithms in terms of time, and (ii) our methods proved the optimal value for larger instances than current approaches can tackle. Our results suggest clear improvements to the state-of-the-art results for the Modularity Density Maximization problem
    • …
    corecore