10 research outputs found

    An efficient parallel algorithm for mesh smoothing


    Ordering heuristics for parallel graph coloring

    This paper introduces the largest-log-degree-first (LLF) and smallest-log-degree-last (SLL) ordering heuristics for parallel greedy graph-coloring algorithms, which are inspired by the largest-degree-first (LF) and smallest-degree-last (SL) serial heuristics, respectively. We show that although LF and SL, in practice, generate colorings with relatively small numbers of colors, they are vulnerable to adversarial inputs for which any parallelization yields a poor parallel speedup. In contrast, LLF and SLL allow for provably good speedups on arbitrary inputs while, in practice, producing colorings of competitive quality to their serial analogs. We applied LLF and SLL to the parallel greedy coloring algorithm introduced by Jones and Plassmann, referred to here as JP. Jones and Plassmann analyze the variant of JP that processes the vertices of a graph in a random order, and show that on an O(1)-degree graph G = (V, E), this JP-R variant has an expected parallel running time of O(lg V / lg lg V) in a PRAM model. We improve this bound to show, using work-span analysis, that JP-R, augmented to handle arbitrary-degree graphs, colors a graph G = (V, E) with degree ∆ using Θ(V + E) work and O(lg V + lg ∆ · min{√E, ∆ + lg ∆ lg V / lg lg V}) expected span. We prove that JP-LLF and JP-SLL (JP using the LLF and SLL heuristics, respectively) execute with the same asymptotic work as JP-R and only logarithmically more span while producing higher-quality colorings than JP-R in practice. We engineered an efficient implementation of JP for modern shared-memory multicore computers and evaluated its performance on a machine with 12 Intel Core-i7 (Nehalem) processor cores.
Our implementation of JP-LLF achieves a geometric-mean speedup of 7.83 on eight real-world graphs and a geometric-mean speedup of 8.08 on ten synthetic graphs, while our implementation using SLL achieves a geometric-mean speedup of 5.36 on these real-world graphs and a geometric-mean speedup of 7.02 on these synthetic graphs. Furthermore, on one processor, JP-LLF is slightly faster than a well-engineered serial greedy algorithm using LF, and likewise, JP-SLL is slightly faster than the greedy algorithm using SL.
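To make the ordering idea concrete, here is a minimal serial sketch of greedy coloring with an LLF-style priority: vertices are ranked by the logarithm of their degree, largest class first, with random tie-breaking within a class. This is an illustrative reading of the heuristic, not the authors' parallel JP implementation; the graph and function names are hypothetical.

```python
import random

def greedy_color(adj, order):
    """Greedy coloring: give each vertex the smallest color not already
    used by a previously colored neighbor, in the given vertex order."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def llf_order(adj):
    """LLF-style priority: key vertices by floor(log2(degree)), largest
    log-degree class first, breaking ties within a class at random."""
    def key(v):
        d = len(adj[v])
        return (-d.bit_length(), random.random())
    return sorted(adj, key=key)

# toy undirected graph: a 5-cycle with one chord (adjacency lists)
adj = {0: [1, 2, 4], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [0, 3]}
coloring = greedy_color(adj, llf_order(adj))
# greedy always yields a proper coloring with at most max-degree + 1 colors
assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])
```

The parallel JP algorithm uses the same priorities but colors a vertex as soon as all higher-priority neighbors are done, rather than sweeping a global order.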

    A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

    The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations and graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
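The conflicting updates mentioned above arise because exploiting symmetry means storing only one triangle of the matrix: each stored entry then contributes to two positions of the result vector, and the transpose contribution is a scatter that different rows may target concurrently. A minimal serial sketch (illustrative COO data, not the RACE implementation) shows the pattern:

```python
def symm_spmv(n, rows, cols, vals, x):
    """y = A @ x for symmetric A, using only the upper triangle plus
    diagonal, stored as COO triplets with rows[k] <= cols[k]."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:
            # transpose contribution: the write that conflicts when rows
            # are processed in parallel, which coloring must resolve
            y[j] += a * x[i]
    return y

# upper triangle of a 3x3 symmetric matrix (illustrative values)
n = 3
rows = [0, 0, 1, 2]
cols = [0, 2, 1, 2]
vals = [2.0, 1.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = symm_spmv(n, rows, cols, vals, x)
assert y == [5.0, 6.0, 13.0]  # matches the dense A @ x
```

Distance-2 coloring of the rows guarantees that two rows in the same color class never write to the same entry of y, so each class can be processed in parallel without atomics or data duplication.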

    Exploiting multithreaded parallelism in the preconditioning and iterative solution of sparse linear systems

    The efficient solution of large, sparse systems of linear equations is one of the problems in modern linear algebra that arises most frequently in scientific and engineering applications. The relentless demand for greater precision and realism in simulations requires increasingly elaborate three-dimensional computational models, which translates into larger, more complex systems and longer simulation times. Solving these systems in a reasonable time requires algorithms with a high degree of efficiency and algorithmic scalability, that is, solvers whose computational and memory demands grow only moderately with the system size; parallel algorithms and software capable of extracting the concurrency inherent in these methods; and parallel computer architectures with sufficient computational resources. Along these lines, this thesis has undertaken the analysis, development, and implementation of parallel algorithms capable of identifying, extracting, and efficiently exploiting the task parallelism available in the multilevel algebraic solvers of the ILUPACK numerical library. The thesis demonstrates experimentally, for the large, sparse linear systems that arise from several two- and three-dimensional PDEs, that the degree of task parallelism present in ILUPACK's numerical methods is sufficient for the efficient execution of parallel implementations of these methods on shared-memory multiprocessors with a moderate number of processors
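A generic way to expose the kind of task parallelism described above is level scheduling: given the dependency DAG of the solver's tasks, group tasks into levels so that everything in one level can run concurrently once the previous level finishes. The sketch below is a general illustration of that idea, not ILUPACK's actual scheduler; the task names are hypothetical.

```python
from collections import deque

def level_schedule(deps):
    """Group tasks into levels: a task's level is 1 + the maximum level of
    its prerequisites, so all tasks in one level are mutually independent.
    `deps` maps task -> list of prerequisite tasks (must form a DAG)."""
    indeg = {t: len(d) for t, d in deps.items()}
    children = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            children[p].append(t)
    level = {t: 0 for t, k in indeg.items() if k == 0}
    q = deque(level)
    while q:
        t = q.popleft()
        for c in children[t]:
            level[c] = max(level.get(c, 0), level[t] + 1)
            indeg[c] -= 1
            if indeg[c] == 0:
                q.append(c)
    buckets = {}
    for t, l in level.items():
        buckets.setdefault(l, []).append(t)
    return [sorted(buckets[l]) for l in sorted(buckets)]

# toy task DAG shaped like a two-level solver: independent leaf tasks feed
# a coarse-level task, which feeds the final solve
deps = {"A1": [], "A2": [], "B": ["A1", "A2"], "C": ["B"]}
assert level_schedule(deps) == [["A1", "A2"], ["B"], ["C"]]
```

With the levels in hand, a runtime can dispatch each level's tasks to threads and synchronize only at level boundaries, which is the basic structure behind task-parallel multilevel preconditioners.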