23,906 research outputs found

    Multi-GPU Graph Analytics

    Full text link
    We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the single-GPU implementations, our design only requires programmers to specify a few algorithm-dependent concerns, hiding most multi-GPU related implementation details. We analyze the theoretical and practical limits to scalability in the context of varying graph primitives and datasets. We describe several optimizations, such as direction optimizing traversal, and a just-enough memory allocation scheme, for better performance and smaller memory consumption. Compared to previous work, we achieve best-of-class performance across operations and datasets, including excellent strong and weak scalability on most primitives as we increase the number of GPUs in the system.Comment: 12 pages. Final version submitted to IPDPS 201

    Gunrock: A High-Performance Graph Processing Library on the GPU

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5

    Data fragmentation for parallel transitive closure strategies

    Get PDF
    Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategie

    Gunrock: GPU Graph Analytics

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

    Persistent Homology Guided Force-Directed Graph Layouts

    Full text link
    Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to analyze. Node-link diagrams are popular for drawing graphs, and force-directed layouts provide a flexible method for node arrangements that use local relationships in an attempt to reveal the global shape of the graph. However, clutter and overlap of unrelated structures can lead to confusing graph visualizations. This paper leverages the persistent homology features of an undirected graph as derived information for interactive manipulation of force-directed layouts. We first discuss how to efficiently extract 0-dimensional persistent homology features from both weighted and unweighted undirected graphs. We then introduce the interactive persistence barcode used to manipulate the force-directed graph layout. In particular, the user adds and removes contracting and repulsing forces generated by the persistent homology features, eventually selecting the set of persistent homology features that most improve the layout. Finally, we demonstrate the utility of our approach across a variety of synthetic and real datasets

    Distributed Edge Connectivity in Sublinear Time

    Full text link
    We present the first sublinear-time algorithm for a distributed message-passing network sto compute its edge connectivity λ\lambda exactly in the CONGEST model, as long as there are no parallel edges. Our algorithm takes O~(n11/353D1/353+n11/706)\tilde O(n^{1-1/353}D^{1/353}+n^{1-1/706}) time to compute λ\lambda and a cut of cardinality λ\lambda with high probability, where nn and DD are the number of nodes and the diameter of the network, respectively, and O~\tilde O hides polylogarithmic factors. This running time is sublinear in nn (i.e. O~(n1ϵ)\tilde O(n^{1-\epsilon})) whenever DD is. Previous sublinear-time distributed algorithms can solve this problem either (i) exactly only when λ=O(n1/8ϵ)\lambda=O(n^{1/8-\epsilon}) [Thurimella PODC'95; Pritchard, Thurimella, ACM Trans. Algorithms'11; Nanongkai, Su, DISC'14] or (ii) approximately [Ghaffari, Kuhn, DISC'13; Nanongkai, Su, DISC'14]. To achieve this we develop and combine several new techniques. First, we design the first distributed algorithm that can compute a kk-edge connectivity certificate for any k=O(n1ϵ)k=O(n^{1-\epsilon}) in time O~(nk+D)\tilde O(\sqrt{nk}+D). Second, we show that by combining the recent distributed expander decomposition technique of [Chang, Pettie, Zhang, SODA'19] with techniques from the sequential deterministic edge connectivity algorithm of [Kawarabayashi, Thorup, STOC'15], we can decompose the network into a sublinear number of clusters with small average diameter and without any mincut separating a cluster (except the `trivial' ones). Finally, by extending the tree packing technique from [Karger STOC'96], we can find the minimum cut in time proportional to the number of components. As a byproduct of this technique, we obtain an O~(n)\tilde O(n)-time algorithm for computing exact minimum cut for weighted graphs.Comment: Accepted at 51st ACM Symposium on Theory of Computing (STOC 2019

    On the Complexity of Local Distributed Graph Problems

    Full text link
    This paper is centered on the complexity of graph problems in the well-studied LOCAL model of distributed computing, introduced by Linial [FOCS '87]. It is widely known that for many of the classic distributed graph problems (including maximal independent set (MIS) and (Δ+1)(\Delta+1)-vertex coloring), the randomized complexity is at most polylogarithmic in the size nn of the network, while the best deterministic complexity is typically 2O(logn)2^{O(\sqrt{\log n})}. Understanding and narrowing down this exponential gap is considered to be one of the central long-standing open questions in the area of distributed graph algorithms. We investigate the problem by introducing a complexity-theoretic framework that allows us to shed some light on the role of randomness in the LOCAL model. We define the SLOCAL model as a sequential version of the LOCAL model. Our framework allows us to prove completeness results with respect to the class of problems which can be solved efficiently in the SLOCAL model, implying that if any of the complete problems can be solved deterministically in logO(1)n\log^{O(1)} n rounds in the LOCAL model, we can deterministically solve all efficient SLOCAL-problems (including MIS and (Δ+1)(\Delta+1)-coloring) in logO(1)n\log^{O(1)} n rounds in the LOCAL model. We show that a rather rudimentary looking graph coloring problem is complete in the above sense: Color the nodes of a graph with colors red and blue such that each node of sufficiently large polylogarithmic degree has at least one neighbor of each color. The problem admits a trivial zero-round randomized solution. The result can be viewed as showing that the only obstacle to getting efficient determinstic algorithms in the LOCAL model is an efficient algorithm to approximately round fractional values into integer values
    corecore