5,069 research outputs found

    Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach

    Get PDF
    While it is well-known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune different graph operations on a specific GPU. However, the performance impact of the input graph has only been taken into account indirectly as a result of the graphs used to benchmark the system. In this work, we present a case study investigating how to use the properties of the input graph to improve the performance of the breadth-first search (BFS) graph traversal. To do so, we first study the performance variation of 15 different BFS implementations across 248 graphs. Using this performance data, we show that significant speed-up can be achieved by combining the best implementation for each level of the traversal. To make use of this data-dependent optimization, we must correctly predict the relative performance of algorithms per graph level, and enable dynamic switching to the optimal algorithm for each level at runtime. We use the collected performance data to train a binary decision tree, to enable high-accuracy predictions and fast switching. We demonstrate empirically that our decision tree is both fast enough to allow dynamic switching between implementations, without noticeable overhead, and accurate enough in its prediction to enable significant BFS speedup. We conclude that our model-driven approach (1) enables BFS to outperform state of the art GPU algorithms, and (2) can be adapted for other BFS variants, other algorithms, or more specific datasets

    Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access

    Full text link
    Pregel is a popular distributed computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel's vertex-centric model, especially when the algorithm has multiple computation stages, complicated data dependencies, or even communication over dynamic internal data structures. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but due to the lack of support for remote access --- reading or writing attributes of other vertices through references --- they cannot handle the above mentioned dynamic communication, causing a class of Pregel algorithms with fast convergence impossible to implement. To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called chain access to naturally specify dynamic communication as if directly reading data on arbitrary remote vertices. By analyzing the logic patterns of chain access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement several practical Pregel algorithms, and the evaluation result shows that the efficiency of Palgol is comparable with that of hand-written code.Comment: 12 pages, 10 figures, extended version of APLAS 2017 pape

    A Simple Deterministic Distributed MST Algorithm, with Near-Optimal Time and Message Complexities

    Full text link
    Distributed minimum spanning tree (MST) problem is one of the most central and fundamental problems in distributed graph algorithms. Garay et al. \cite{GKP98,KP98} devised an algorithm with running time O(D+nβ‹…logβ‘βˆ—n)O(D + \sqrt{n} \cdot \log^* n), where DD is the hop-diameter of the input nn-vertex mm-edge graph, and with message complexity O(m+n3/2)O(m + n^{3/2}). Peleg and Rubinovich \cite{PR99} showed that the running time of the algorithm of \cite{KP98} is essentially tight, and asked if one can achieve near-optimal running time **together with near-optimal message complexity**. In a recent breakthrough, Pandurangan et al. \cite{PRS16} answered this question in the affirmative, and devised a **randomized** algorithm with time O~(D+n)\tilde{O}(D+ \sqrt{n}) and message complexity O~(m)\tilde{O}(m). They asked if such a simultaneous time- and message-optimality can be achieved by a **deterministic** algorithm. In this paper, building upon the work of \cite{PRS16}, we answer this question in the affirmative, and devise a **deterministic** algorithm that computes MST in time O((D+n)β‹…log⁑n)O((D + \sqrt{n}) \cdot \log n), using O(mβ‹…log⁑n+nlog⁑nβ‹…logβ‘βˆ—n)O(m \cdot \log n + n \log n \cdot \log^* n) messages. The polylogarithmic factors in the time and message complexities of our algorithm are significantly smaller than the respective factors in the result of \cite{PRS16}. Also, our algorithm and its analysis are very **simple** and self-contained, as opposed to rather complicated previous sublinear-time algorithms \cite{GKP98,KP98,E04b,PRS16}

    A Faster Parameterized Algorithm for Treedepth

    Full text link
    The width measure \emph{treedepth}, also known as vertex ranking, centered coloring and elimination tree height, is a well-established notion which has recently seen a resurgence of interest. We present an algorithm which---given as input an nn-vertex graph, a tree decomposition of the graph of width ww, and an integer tt---decides Treedepth, i.e. whether the treedepth of the graph is at most tt, in time 2O(wt)β‹…n2^{O(wt)} \cdot n. If necessary, a witness structure for the treedepth can be constructed in the same running time. In conjunction with previous results we provide a simple algorithm and a fast algorithm which decide treedepth in time 22O(t)β‹…n2^{2^{O(t)}} \cdot n and 2O(t2)β‹…n2^{O(t^2)} \cdot n, respectively, which do not require a tree decomposition as part of their input. The former answers an open question posed by Ossona de Mendez and Nesetril as to whether deciding Treedepth admits an algorithm with a linear running time (for every fixed tt) that does not rely on Courcelle's Theorem or other heavy machinery. For chordal graphs we can prove a running time of 2O(tlog⁑t)β‹…n2^{O(t \log t)}\cdot n for the same algorithm.Comment: An extended abstract was published in ICALP 2014, Track
    • …
    corecore