2,306 research outputs found

    Feluca : A two-stage graph coloring algorithm with color-centric paradigm on GPU

    Get PDF
    In this paper, we propose a two-stage high-performance graph coloring algorithm, called Feluca, aiming to address the above challenges. Feluca combines the recursion-based method with the sequential spread-based method. In the first stage, Feluca uses a recursive routine to color a majority of vertices in the graph. Then, it switches to the sequential spread method to color the remaining vertices in order to avoid the conflicts of the recursive algorithm. Moreover, the following techniques are proposed to further improve the graph coloring performance. i) A new method is proposed to eliminate the cycles in the graph; ii) a top-down scheme is developed to avoid the atomic operation originally required for color selection; and iii) a novel color-centric coloring paradigm is designed to improve the degree of parallelism for the sequential spread part. All these newly developed techniques, together with further GPU-specific optimizations such as coalesced memory access, comprise an efficient parallel graph coloring solution in Feluca. We have conducted extensive experiments on NVIDIA GPUs. The results show that Feluca can achieve 1.76 - 12.98x speedup over the state-of-the-art algorithms

    Scalable partitioning for parallel position based dynamics

    Get PDF
    We introduce a practical partitioning technique designed for parallelizing Position Based Dynamics, and exploiting the ubiquitous multi-core processors present in current commodity GPUs. The input is a set of particles whose dynamics is influenced by spatial constraints. In the initialization phase, we build a graph in which each node corresponds to a constraint and two constraints are connected by an edge if they influence at least one common particle. We introduce a novel greedy algorithm for inserting additional constraints (phantoms) in the graph such that the resulting topology is q-colourable, where ˆ qˆ ≥ 2 is an arbitrary number. We color the graph, and the constraints with the same color are assigned to the same partition. Then, the set of constraints belonging to each partition is solved in parallel during the animation phase. We demonstrate this by using our partitioning technique; the performance hit caused by the GPU kernel calls is significantly decreased, leaving unaffected the visual quality, robustness and speed of serial position based dynamics

    Gunrock: GPU Graph Analytics

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU
    • …
    corecore