
    Gunrock: A High-Performance Graph Processing Library on the GPU

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. Gunrock, our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.
    Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5).
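To make the frontier-centric, bulk-synchronous idea in the abstract above concrete, here is a minimal sequential sketch of breadth-first search written as repeated operations on a vertex frontier. It is not Gunrock's API and runs on the CPU; the function names `advance` and `filter_unvisited`, the dictionary-based graph, and the sample graph are assumptions chosen for illustration only.

```python
def advance(graph, frontier):
    """Expand every frontier vertex to its neighbors in one bulk step."""
    return [nbr for v in frontier for nbr in graph.get(v, [])]


def filter_unvisited(candidates, depth, level):
    """Keep each not-yet-visited vertex once and record its depth."""
    next_frontier = []
    for v in candidates:
        if v not in depth:
            depth[v] = level
            next_frontier.append(v)
    return next_frontier


def bfs(graph, source):
    """Breadth-first search as a loop of frontier operations."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Each iteration is one bulk-synchronous step: the whole
        # frontier is expanded, then the result is filtered.
        candidates = advance(graph, frontier)
        frontier = filter_unvisited(candidates, depth, level)
    return depth


# Hypothetical sample graph as an adjacency list.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
depths = bfs(graph, 0)
```

On a GPU, the per-vertex work inside each step would run in parallel; the loop structure over frontiers is the part this sketch is meant to show.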

    GraphLab: A New Framework for Parallel Machine Learning

    Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

    A Decentralized Parallelization-in-Time Approach with Parareal

    With steadily increasing parallelism in high-performance architectures, simulations that require good strong scalability tend to stop scaling beyond a certain number of parallel processors when they rely on standard spatial-decomposition strategies. This can be a show-stopper if the simulation results have to be computed under wallclock time restrictions (e.g. for weather forecasts) or as fast as possible (e.g. for urgent computing). Here, the time dimension is the only one left for parallelization, and we focus on Parareal as one particular parallelization-in-time method. We discuss a software approach that makes Parareal parallelization transparent to application developers, allowing fast prototyping with Parareal. Further, we introduce a decentralized Parareal that results in autonomous simulation instances which only need to communicate with the previous and next simulation instances, giving strong locality of communication. This concept is evaluated with a prototypical solver for the rotational shallow-water equations, which we use as a representative black-box solver.
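For readers unfamiliar with Parareal, the following is a minimal sketch of the classical Parareal iteration, not the decentralized variant or the software framework the abstract describes. The explicit Euler propagators, the test equation dy/dt = -y, and all function and parameter names are assumptions made for illustration. The update rule is U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]), where G is a cheap coarse propagator and F an accurate fine one.

```python
def euler(f, y, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with explicit Euler."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t_end, n_slices, n_iters, coarse_steps=1, fine_steps=100):
    """Classical Parareal over n_slices equal time slices of [0, t_end]."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, a, b):
        return euler(f, y, a, b, coarse_steps)

    def fine(y, a, b):
        return euler(f, y, a, b, fine_steps)

    # Initial guess: one serial sweep with the coarse propagator.
    U = [y0]
    for n in range(n_slices):
        U.append(coarse(U[n], ts[n], ts[n + 1]))

    for _ in range(n_iters):
        # The fine solves are independent of each other; a real
        # implementation would run them in parallel, one per slice.
        F_old = [fine(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        G_old = [coarse(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        # Serial correction sweep using only the cheap coarse propagator.
        U_new = [y0]
        for n in range(n_slices):
            G_new = coarse(U_new[n], ts[n], ts[n + 1])
            U_new.append(G_new + F_old[n] - G_old[n])
        U = U_new
    return U


def f(t, y):
    """Right-hand side of the test equation dy/dt = -y."""
    return -y


solution = parareal(f, y0=1.0, t_end=1.0, n_slices=4, n_iters=4)
```

In this serial sketch each slice only reads the value at the start of its own slice and passes its result to the next one, which is the neighbor-to-neighbor data flow that a decentralized scheme could exploit. After as many iterations as there are slices, the result matches the serial fine solution up to rounding, so in practice far fewer iterations are used.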