10 research outputs found

    An efficient parallel algorithm for mesh smoothing


    Ordering heuristics for parallel graph coloring

    This paper introduces the largest-log-degree-first (LLF) and smallest-log-degree-last (SLL) ordering heuristics for parallel greedy graph-coloring algorithms, which are inspired by the largest-degree-first (LF) and smallest-degree-last (SL) serial heuristics, respectively. We show that although LF and SL, in practice, generate colorings with relatively small numbers of colors, they are vulnerable to adversarial inputs for which any parallelization yields a poor parallel speedup. In contrast, LLF and SLL allow for provably good speedups on arbitrary inputs while, in practice, producing colorings of competitive quality to their serial analogs. We applied LLF and SLL to the parallel greedy coloring algorithm introduced by Jones and Plassmann, referred to here as JP. Jones and Plassmann analyze the variant of JP that processes the vertices of a graph in a random order, and show that on an O(1)-degree graph G = (V, E), this JP-R variant has an expected parallel running time of O(lg V / lg lg V) in a PRAM model. We improve this bound to show, using work-span analysis, that JP-R, augmented to handle arbitrary-degree graphs, colors a graph G = (V, E) with degree ∆ using Θ(V + E) work and O(lg V + lg ∆ · min{√E, ∆ + lg ∆ lg V / lg lg V}) expected span. We prove that JP-LLF and JP-SLL (JP using the LLF and SLL heuristics, respectively) execute with the same asymptotic work as JP-R and only logarithmically more span while producing higher-quality colorings than JP-R in practice. We engineered an efficient implementation of JP for modern shared-memory multicore computers and evaluated its performance on a machine with 12 Intel Core-i7 (Nehalem) processor cores.
Our implementation of JP-LLF achieves a geometric-mean speedup of 7.83 on eight real-world graphs and a geometric-mean speedup of 8.08 on ten synthetic graphs, while our implementation using SLL achieves a geometric-mean speedup of 5.36 on these real-world graphs and a geometric-mean speedup of 7.02 on these synthetic graphs. Furthermore, on one processor, JP-LLF is slightly faster than a well-engineered serial greedy algorithm using LF, and likewise, JP-SLL is slightly faster than the greedy algorithm using SL.
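To make the ordering idea concrete, here is a minimal serial sketch of greedy coloring with an LLF-style priority: vertices are ranked by the logarithm of their degree, largest class first, with random tie-breaking within a class. This is an illustrative reading of the heuristic, not the authors' parallel JP implementation; the graph and function names are hypothetical.

```python
import random

def greedy_color(adj, order):
    """Greedy coloring: give each vertex the smallest color not already
    used by a previously colored neighbor, in the given vertex order."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def llf_order(adj):
    """LLF-style priority: key vertices by floor(log2(degree)), largest
    log-degree class first, breaking ties within a class at random."""
    def key(v):
        d = len(adj[v])
        return (-d.bit_length(), random.random())
    return sorted(adj, key=key)

# toy undirected graph: a 5-cycle with one chord (adjacency lists)
adj = {0: [1, 2, 4], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [0, 3]}
coloring = greedy_color(adj, llf_order(adj))
# greedy always yields a proper coloring with at most max-degree + 1 colors
assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])
```

The parallel JP algorithm uses the same priorities but colors a vertex as soon as all higher-priority neighbors are done, rather than sweeping a global order.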

    A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

    The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations and graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
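The conflicting updates mentioned above arise because exploiting symmetry means storing only one triangle of the matrix: each stored entry then contributes to two positions of the result vector, and the transpose contribution is a scatter that different rows may target concurrently. A minimal serial sketch (illustrative COO data, not the RACE implementation) shows the pattern:

```python
def symm_spmv(n, rows, cols, vals, x):
    """y = A @ x for symmetric A, using only the upper triangle plus
    diagonal, stored as COO triplets with rows[k] <= cols[k]."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:
            # transpose contribution: the write that conflicts when rows
            # are processed in parallel, which coloring must resolve
            y[j] += a * x[i]
    return y

# upper triangle of a 3x3 symmetric matrix (illustrative values)
n = 3
rows = [0, 0, 1, 2]
cols = [0, 2, 1, 2]
vals = [2.0, 1.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = symm_spmv(n, rows, cols, vals, x)
assert y == [5.0, 6.0, 13.0]  # matches the dense A @ x
```

Distance-2 coloring of the rows guarantees that two rows in the same color class never write to the same entry of y, so each class can be processed in parallel without atomics or data duplication.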

    Exploiting multithreaded parallelism in the preconditioning and iterative solution of sparse linear systems

    The efficient solution of large, sparse systems of linear equations is one of the problems in modern linear algebra that arises most frequently in scientific and engineering applications. The relentless demand for greater precision and realism in simulations requires increasingly elaborate three-dimensional computational models, which translates into larger, more complex systems and longer simulation times. Solving these systems in a reasonable time requires algorithms with a high degree of efficiency and algorithmic scalability, that is, solvers whose computational and memory demands grow only moderately with the system size; parallel algorithms and software capable of extracting the concurrency inherent in these methods; and parallel computer architectures with sufficient computational resources. Along these lines, this thesis has undertaken the analysis, development, and implementation of parallel algorithms capable of identifying, extracting, and efficiently exploiting the task parallelism available in the multilevel algebraic solvers of the ILUPACK numerical library. The thesis demonstrates experimentally, for the large, sparse linear systems that arise from several two- and three-dimensional PDEs, that the degree of task parallelism present in ILUPACK's numerical methods is sufficient for the efficient execution of parallel implementations of these methods on shared-memory multiprocessors with a moderate number of processors
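A generic way to expose the kind of task parallelism described above is level scheduling: given the dependency DAG of the solver's tasks, group tasks into levels so that everything in one level can run concurrently once the previous level finishes. The sketch below is a general illustration of that idea, not ILUPACK's actual scheduler; the task names are hypothetical.

```python
from collections import deque

def level_schedule(deps):
    """Group tasks into levels: a task's level is 1 + the maximum level of
    its prerequisites, so all tasks in one level are mutually independent.
    `deps` maps task -> list of prerequisite tasks (must form a DAG)."""
    indeg = {t: len(d) for t, d in deps.items()}
    children = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            children[p].append(t)
    level = {t: 0 for t, k in indeg.items() if k == 0}
    q = deque(level)
    while q:
        t = q.popleft()
        for c in children[t]:
            level[c] = max(level.get(c, 0), level[t] + 1)
            indeg[c] -= 1
            if indeg[c] == 0:
                q.append(c)
    buckets = {}
    for t, l in level.items():
        buckets.setdefault(l, []).append(t)
    return [sorted(buckets[l]) for l in sorted(buckets)]

# toy task DAG shaped like a two-level solver: independent leaf tasks feed
# a coarse-level task, which feeds the final solve
deps = {"A1": [], "A2": [], "B": ["A1", "A2"], "C": ["B"]}
assert level_schedule(deps) == [["A1", "A2"], ["B"], ["C"]]
```

With the levels in hand, a runtime can dispatch each level's tasks to threads and synchronize only at level boundaries, which is the basic structure behind task-parallel multilevel preconditioners.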