4,118 research outputs found

    Comprehensive evaluation of a New GPU-based approach to the shortest-path problem

    Get PDF
    Producción CientíficaThe Single-Source Shortest Path (SSSP) problem arises in many different fields. In this paper, we present a GPU SSSP algorithm implementation. Our work significantly speeds up the computation of the SSSP, not only with respect to a CPU-based version, but also to other state-of-the-art GPU implementations based on Dijkstra. Both GPU implementations have been evaluated using the latest NVIDIA architectures. The graphs chosen as input sets vary in nature, size, and fan-out degree, in order to evaluate the behavior of the algorithms for different data classes. Additionally, we have enhanced our GPU algorithm implementation using two optimization techniques: The use of a proper choice of threadblock size; and the modification of the GPU L1 cache memory state of NVIDIA devices. These optimizations lead to performance improvements of up to 23% with respect to the non-optimized versions. In addition, we have made a platform comparison of several NVIDIA boards in order to distinguish which one is better for each class of graphs, depending on their features. Finally, we compare our results with an optimized sequential implementation of Dijkstra's algorithm included in the reference Boost library, obtaining an improvement ratio of up to 19x for some graph families, using less memory space.Ministerio de Economía y Competitividad (Spain) and ERDF program of the European Union: CAPAP-H5 network (TIN2014-53522- REDT), MOGECOPP project (TIN2011-25639); Junta de Castilla y León (Spain): ATLAS project (VA172A12-2); and the COST Program Action IC1305: NESUS

    GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

    Full text link
    High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel hardware, and (3) graph problems having low arithmetic intensity. To address some of these challenges, GraphBLAS is an innovative, on-going effort by the graph analytics community to propose building blocks based on sparse linear algebra, which will allow graph algorithms to be expressed in a performant, succinct, composable and portable manner. In this paper, we examine the performance challenges of a linear-algebra-based approach to building graph frameworks and describe new design principles for overcoming these bottlenecks. Among the new design principles is exploiting input sparsity, which allows users to write graph algorithms without specifying push and pull direction. Exploiting output sparsity allows users to tell the backend which values of the output in a single vectorized computation they do not want computed. Load-balancing is an important feature for balancing work amongst parallel workers. We describe the important load-balancing features for handling graphs with different characteristics. The design principles described in this paper have been implemented in "GraphBLAST", the first high-performance linear algebra-based graph framework on NVIDIA GPUs that is open-source. The results show that on a single GPU, GraphBLAST has on average at least an order of magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL, comparable performance to the fastest GPU hardwired primitives and shared-memory graph frameworks Ligra and Gunrock, and better performance than any other GPU graph framework, while offering a simpler and more concise programming model.Comment: 50 pages, 14 figures, 14 table

    Connected component identification and cluster update on GPU

    Full text link
    Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as compared to serial codes on CPUs for the case of local and thus naturally parallelized problems such as single-spin flip update simulations of spin models, the situation is considerably more complicated for the non-local problem of cluster or connected component identification. I discuss the suitability of different approaches of parallelization of cluster labeling and cluster update algorithms for calculations on GPU and compare to the performance of serial implementations.Comment: 15 pages, 14 figures, one table, submitted to PR

    Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

    Full text link
    In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.Comment: 37 pages, 16 figure

    Viewpoint Discovery and Understanding in Social Networks

    Full text link
    The Web has evolved to a dominant platform where everyone has the opportunity to express their opinions, to interact with other users, and to debate on emerging events happening around the world. On the one hand, this has enabled the presence of different viewpoints and opinions about a - usually controversial - topic (like Brexit), but at the same time, it has led to phenomena like media bias, echo chambers and filter bubbles, where users are exposed to only one point of view on the same topic. Therefore, there is the need for methods that are able to detect and explain the different viewpoints. In this paper, we propose a graph partitioning method that exploits social interactions to enable the discovery of different communities (representing different viewpoints) discussing about a controversial topic in a social network like Twitter. To explain the discovered viewpoints, we describe a method, called Iterative Rank Difference (IRD), which allows detecting descriptive terms that characterize the different viewpoints as well as understanding how a specific term is related to a viewpoint (by detecting other related descriptive terms). The results of an experimental evaluation showed that our approach outperforms state-of-the-art methods on viewpoint discovery, while a qualitative analysis of the proposed IRD method on three different controversial topics showed that IRD provides comprehensive and deep representations of the different viewpoints
    corecore