
    Streaming Graph Challenge: Stochastic Block Partition

    An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge, including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open-source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm, as well as metrics and detailed documentation, are available at GraphChallenge.org. Comment: To be published in the 2017 IEEE High Performance Extreme Computing Conference (HPEC).
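    The challenge scores an inferred partition against ground truth (item 5 above). A minimal sketch of such a comparison is shown below, using scikit-learn's clustering metrics as a stand-in for the GraphChallenge evaluation scripts; the function name and toy labels are illustrative, not part of the challenge code.

```python
# Sketch only: scikit-learn metrics as a stand-in for the GraphChallenge
# scoring scripts (adjusted Rand index and normalized mutual information).
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def score_partition(true_blocks, inferred_blocks):
    """Compare two block-label lists indexed by node id."""
    return (adjusted_rand_score(true_blocks, inferred_blocks),
            normalized_mutual_info_score(true_blocks, inferred_blocks))

# Toy example: six nodes in two blocks, with one node misassigned.
truth    = [0, 0, 0, 1, 1, 1]
inferred = [0, 0, 1, 1, 1, 1]
print(score_partition(truth, inferred))
```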

    Parallel Community Detection in Incremental Graphs

    The problem of community detection in large, expanding real-world networks presents significant challenges due to the scale and complexity of these networks. Traditional algorithms struggle to provide optimal solutions or require prohibitive computational resources. In this thesis, we address these challenges by exploring, designing, and evaluating parallel computing strategies for community detection in incremental graphs. We provide a novel parallel implementation of the NCLiC algorithm by dividing its phases into parallel tasks using a shared-memory approach. The algorithm has been extensively tested on various graphs. The results demonstrate promising performance improvements and scalability while retaining the quality of the partitions. The parallel implementation of the Leiden algorithm used for pre-clustering shows virtually no loss in modularity and obtained speedups of up to a factor of 10.3. The refinement and merging phases of the parallel NCLiC algorithm obtained speedups of up to 18.42 and 10.36, respectively, resulting in a total speedup of up to a factor of 6.73. (Master's thesis in informatics, INF399, MAMN-INF, MAMN-PRO.)
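    The thesis' parallel NCLiC code is not reproduced here; as a rough illustration of the pre-clustering phase it builds on, the sketch below runs a (sequential) Leiden pass with python-igraph and reports the resulting modularity. The graph and parameters are placeholders, not the thesis' benchmark setup.

```python
# Sketch only: Leiden pre-clustering with python-igraph, standing in for the
# thesis' parallel implementation. The refinement and merging phases of NCLiC
# would then operate on this initial partition.
import igraph as ig

g = ig.Graph.Famous("Zachary")                      # small placeholder graph
clustering = g.community_leiden(objective_function="modularity",
                                n_iterations=5)
print("communities:", len(clustering))
print("modularity: ", g.modularity(clustering.membership))
```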

    Accelerating the Information-Theoretic Approach of Community Detection Using Distributed and Hybrid Memory Parallel Schemes

    There are several approaches for discovering communities in a network (graph). Despite being approximate in nature, discovering communities based on the laws of information theory has a proven standard of accuracy. Infomap, the information-theoretic community detection algorithm developed a decade ago, did not foresee the tremendous growth of social networking, multimedia, and the accompanying information boom. To discover communities in massive networks, we have designed a distributed-memory-parallel Infomap in the MPI framework. Our design scales to over 500 processes and can process networks with millions of edges while maintaining quality comparable to the sequential Infomap. We have further developed a novel parallel hybrid approach for Infomap, consisting of both distributed- and shared-memory parallelism using the MPI and OpenMP frameworks. This achieves a speedup of more than 11x in processing a network of over 100 million edges, which is significantly better than state-of-the-art techniques.
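    The paper's MPI code is not shown here, but the decomposition it describes can be sketched with mpi4py: each rank owns a slice of the nodes, evaluates candidate module moves locally, and the ranks combine their results with an allreduce. Everything below (the cyclic node distribution and the local_move_gain placeholder) is an assumption for illustration, not the authors' implementation.

```python
# Sketch only: the distributed pattern described above, not the paper's code.
# Run with e.g.: mpiexec -n 4 python infomap_mpi_sketch.py
from mpi4py import MPI

def local_move_gain(nodes):
    # Hypothetical placeholder for the codelength improvement that the real
    # algorithm would compute over this rank's share of the nodes.
    return 0.01 * len(nodes)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_nodes = 1_000_000
my_nodes = range(rank, n_nodes, size)       # cyclic distribution of node ids

local_gain = local_move_gain(my_nodes)
total_gain = comm.allreduce(local_gain, op=MPI.SUM)  # agree on global change

if rank == 0:
    print(f"global codelength improvement this pass: {total_gain:.4f}")
```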

    Scalable Online Betweenness Centrality in Evolving Graphs

    Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute, and the best known algorithm runs in O(nm) time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge, and the goal is to keep the betweenness centrality up to date. In this paper we propose the first truly scalable algorithm for online computation of betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream-processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than those in previous studies. Our method is able to keep the betweenness centrality measures up to date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates. Comment: 15 pages, 9 figures, accepted for publication in IEEE Transactions on Knowledge and Data Engineering.
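    To make the online setting concrete, the sketch below feeds edges one at a time and keeps betweenness scores current by recomputing them with NetworkX after every insertion. This full O(nm) recomputation is exactly the naive baseline the paper improves on; the incremental, out-of-core algorithm itself is not reproduced here.

```python
# Sketch only: naive online betweenness via full recomputation (Brandes),
# illustrating the problem setting rather than the paper's algorithm.
import networkx as nx

G = nx.Graph()
edge_stream = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]   # toy edge arrivals

for u, v in edge_stream:
    G.add_edge(u, v)
    bc = nx.betweenness_centrality(G)    # recompute from scratch each time
    print(f"after edge ({u}, {v}): {bc}")
```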