1,373 research outputs found

    Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access

    Full text link
    Pregel is a popular distributed computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel's vertex-centric model, especially when the algorithm has multiple computation stages, complicated data dependencies, or even communication over dynamic internal data structures. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but due to the lack of support for remote access --- reading or writing attributes of other vertices through references --- they cannot handle the above mentioned dynamic communication, causing a class of Pregel algorithms with fast convergence impossible to implement. To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called chain access to naturally specify dynamic communication as if directly reading data on arbitrary remote vertices. By analyzing the logic patterns of chain access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement several practical Pregel algorithms, and the evaluation result shows that the efficiency of Palgol is comparable with that of hand-written code.Comment: 12 pages, 10 figures, extended version of APLAS 2017 pape

    Compile time prediction of intrinsic parallelism in high level programs

    Get PDF

    Safe data structure visualisation

    Get PDF

    Graph Locality Prefetcher for Graph Database

    Get PDF
    This work presents a hardware prefetcher to improve the performance of accessing graph data representing large and complex networks. We represent complex networks as graphs, and queries amount to traversals on the graph. Unlike conventional memory hierarchies that exploit spatial and temporal locality, we observe that graph traversals do not necessarily exhibit these same notions of locality. This results in degraded performance of the memory hierarchy. Consequently, our hardware prefetcher exploits locality that is intrinsic to graph traversals, which we call graph-locality to improve the performance of the memory hierarchy. We design and evaluate our prototype using a micro-architectural simulator, and deploy benchmarks from GDBench that is oriented to evaluate the performance of graph database systems.1 yea

    RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs

    Full text link
    Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse (SpGEMM) algorithms, evaluating their performance running in a distributed memory setting on GPUs. Our RDMA-based implementations use the NVSHMEM communication library for direct, asynchronous one-sided communication between GPUs. We compare our asynchronous implementations to state-of-the-art bulk synchronous GPU libraries as well as a CUDA-aware MPI implementation of the SUMMA algorithm. We find that asynchronous RDMA-based implementations are able to offer favorable performance compared to bulk synchronous implementations, while also allowing for the straightforward implementation of novel work stealing algorithms
    corecore