12 research outputs found

Effects of partitioning and scheduling sparse matrix factorization on communication and load balance

Author: Naik Vijay K.
Venugopal Sesh

A block-based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed-memory systems. Using experimental results, this technique is analyzed for communication and load-imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap-mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show that there is a tradeoff between communication and load balance: the block-based method results in lower communication cost, whereas the wrap-mapped scheme gives better load balance.

NASA Technical Reports Server
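
The load-balance side of this tradeoff can be illustrated with a small, hypothetical Python sketch. It assumes per-column work that grows toward the last columns and compares a wrap (cyclic) mapping, where column j goes to processor j mod p, against a naive contiguous block mapping. The contiguous mapping is only a stand-in and is not the paper's block-based partitioning and scheduling method; communication cost is not modeled, and all function names are illustrative.

```python
def wrap_map(n_cols, p):
    """Wrap (cyclic) mapping: column j is owned by processor j mod p."""
    return [j % p for j in range(n_cols)]


def block_map(n_cols, p):
    """Naive contiguous mapping: about n_cols / p consecutive columns per processor."""
    size = -(-n_cols // p)  # ceiling division
    return [j // size for j in range(n_cols)]


def load_per_proc(owner, col_work, p):
    """Sum the (assumed) work of each column on the processor that owns it."""
    loads = [0] * p
    for j, proc in enumerate(owner):
        loads[proc] += col_work[j]
    return loads


def imbalance(loads):
    """Maximum load divided by mean load; 1.0 means perfect balance."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Assumed work profile: later columns are more expensive.
    work = [j + 1 for j in range(16)]
    p = 4
    wrap = load_per_proc(wrap_map(16, p), work, p)
    blk = load_per_proc(block_map(16, p), work, p)
    print("wrap :", wrap, round(imbalance(wrap), 3))
    print("block:", blk, round(imbalance(blk), 3))
```

With this assumed work profile, the wrap mapping gives loads [28, 32, 36, 40] (imbalance about 1.18), while the contiguous blocks give [10, 26, 42, 58] (about 1.71), which mirrors the qualitative load-balance result stated above.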

Optimization Issues in Finite Element Codes for Solving Open Domain 3D Electromagnetic Scattering and Conformal Antenna Problems

Author: Bartsch
Beggs
Dohlus
Jurgens
Kunz
Luebbers
Maloney
Marcuvitz
Railton
Taflove
Weiland
Yee
Zivanovic
Publication venue: Wiley
Publication date: 01/09/1996

The first part of the paper presents the implementation and performance of a new absorbing boundary condition (ABC) for truncating finite element meshes. This ABC can be applied conformally to the surface of the structure for scattering and antenna radiation calculations. Consequently, the computational domain is reduced dramatically, allowing much larger structures to be simulated; results are presented for three-dimensional bodies. The latter part of the paper discusses optimization issues relating to the solver's CPU speed on parallel and vector processors. It is shown that a jagged diagonal storage scheme leads to a fourfold increase in the FLOP rate of the code, and that a standard matrix profile reduction algorithm substantially reduces inter-processor communication.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/50414/1/241_ftp.pd

Crossref

Surrey Research Insight

Deep Blue Documents at the University of Michigan
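
Jagged diagonal storage (JDS) can be sketched in plain Python as an illustration only; this is not the solver's code and does not reproduce the reported fourfold speedup. Rows are permuted by decreasing nonzero count, and the d-th stored nonzero of every row forms the d-th "jagged diagonal". The matrix-vector product then sweeps each diagonal with one long, dependence-free inner loop, which is what makes the format attractive on vector processors. The array names (vals, cols, ptr, perm) are assumptions.

```python
def to_jds(dense):
    """Convert a dense matrix (list of row lists) to jagged diagonal storage.

    Returns (vals, cols, ptr, perm): perm lists the original row indices in
    order of non-increasing nonzero count, and jagged diagonal d occupies
    vals[ptr[d]:ptr[d + 1]] (with matching column indices in cols).
    """
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    perm = sorted(range(len(rows)), key=lambda i: -len(rows[i]))
    max_len = len(rows[perm[0]]) if rows else 0
    vals, cols, ptr = [], [], [0]
    for d in range(max_len):
        for i in perm:
            if d >= len(rows[i]):
                break  # rows are sorted, so every later row is shorter too
            j, v = rows[i][d]
            cols.append(j)
            vals.append(v)
        ptr.append(len(vals))
    return vals, cols, ptr, perm


def jds_matvec(vals, cols, ptr, perm, x):
    """Compute y = A x from the JDS arrays produced by to_jds."""
    y_perm = [0.0] * len(perm)
    for d in range(len(ptr) - 1):
        start, length = ptr[d], ptr[d + 1] - ptr[d]
        # One long, dependence-free loop per jagged diagonal: the part that
        # vectorizes well.
        for i in range(length):
            y_perm[i] += vals[start + i] * x[cols[start + i]]
    # Undo the row permutation.
    y = [0.0] * len(perm)
    for i, row in enumerate(perm):
        y[row] = y_perm[i]
    return y


if __name__ == "__main__":
    A = [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
    print(jds_matvec(*to_jds(A), [1, 2, 3]))
```

For this A and x = [1, 2, 3], the JDS product is [7.0, 6.0, 32.0], matching the dense product.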

Run-time parallelization and scheduling of loops

Author: Crowley Kay
Mirchandaney Ravi
Saltz Joel H.

Run-time methods are studied to automatically parallelize and schedule iterations of a DO loop in certain cases where compile-time information is inadequate. The methods presented involve execution-time preprocessing of the loop. At compile time, these methods set up the framework for performing a loop dependency analysis. At run time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce inspector procedures, which perform the execution-time preprocessing, and executors, which are transformed versions of the source-code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.

NASA Technical Reports Server
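
The inspector/executor idea can be sketched for a loop of the form x[i] = b[i] + sum of c * x[j] over the iterations j that iteration i reads, where those dependences are known only at run time. This minimal Python sketch assumes every dependence points to an earlier iteration (j < i) and uses simple level scheduling to form wavefronts; it does not reproduce the paper's symbolic transformation rules, and the function names are hypothetical.

```python
from collections import defaultdict


def inspector(deps):
    """Group loop iterations into wavefronts using run-time dependence data.

    deps[i] lists the earlier iterations j (< i) whose results iteration i reads.
    """
    level = []
    for i, preds in enumerate(deps):
        assert all(j < i for j in preds), "sketch assumes only backward dependences"
        # An iteration runs one wavefront after its latest predecessor.
        level.append(1 + max((level[j] for j in preds), default=-1))
    waves = defaultdict(list)
    for i, lv in enumerate(level):
        waves[lv].append(i)
    return [waves[lv] for lv in sorted(waves)]


def executor(waves, deps, coeffs, b):
    """Run the loop body x[i] = b[i] + sum(c * x[j]) wavefront by wavefront."""
    x = [0.0] * len(b)
    for wave in waves:
        # Iterations inside one wavefront are independent and could run
        # concurrently; this sketch simply executes them in order.
        for i in wave:
            x[i] = b[i] + sum(c * x[j] for j, c in zip(deps[i], coeffs[i]))
    return x


if __name__ == "__main__":
    deps = [[], [0], [], [1, 2], [2]]
    coeffs = [[], [2.0], [], [1.0, 1.0], [3.0]]
    b = [1.0] * 5
    waves = inspector(deps)
    print(waves)
    print(executor(waves, deps, coeffs, b))
```

Because the inspector's output depends only on deps, it can be computed once and reused for every execution of the loop with the same dependency structure, which is how its cost is amortized.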

Software Support for Irregular and Loosely Synchronous Problems

Author: Choudhary Alok
Fox Geoffrey C.
Hiranandani Seema
Ranka Sanjay
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1992

A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of Fortran D.

Syracuse University Research Facility and Collaborative Environment

Compiler Support for Sparse Tensor Computations in MLIR

Author: Bik Aart J. C.
Kjolstad Fredrik
Koanantakool Penporn
Shpeisman Tatiana
Vasilache Nicolas
Zheng Bixia
Publication venue: Association for Computing Machinery (ACM)
Publication date: 09/02/2022

Sparse tensors arise in problems in science, engineering, machine learning, and data analytics. Programs that operate on such tensors can exploit sparsity to reduce storage requirements and computational time. Developing and maintaining sparse software by hand, however, is a complex and error-prone task. Therefore, we propose treating sparsity as a property of tensors rather than a tedious implementation task, and letting a sparse compiler generate sparse code automatically from a sparsity-agnostic definition of the computation. This paper discusses integrating this idea into MLIR.

arXiv.org e-Print Archive
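
The idea of deriving sparse code from a sparsity-agnostic definition can be illustrated, outside MLIR, with a hand-written Python sketch: matvec_dense states the computation with no reference to storage, while matvec_csr is the kind of loop nest a sparse compiler might produce once A is declared to be stored in compressed sparse row (CSR) form. Nothing here uses MLIR or its sparse tensor dialect; the CSR layout and all names are assumptions chosen for illustration.

```python
def matvec_dense(A, x):
    """Sparsity-agnostic definition: y[i] = sum over j of A[i][j] * x[j]."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def to_csr(A):
    """Pack the nonzeros of a dense matrix into CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(data))
    return indptr, indices, data


def matvec_csr(indptr, indices, data, x):
    """Sparse loop nest for the same definition: only stored nonzeros are visited."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


if __name__ == "__main__":
    A = [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
    x = [1, 2, 3]
    # Both versions compute the same result; only the loop structure differs.
    assert matvec_csr(*to_csr(A), x) == matvec_dense(A, x)
    print(matvec_csr(*to_csr(A), x))
```

The point of the contrast is that only the first function would be written by the programmer; the storage-specific second function is what the compiler is meant to derive.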

Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs

Author: Alappat Christie
Fehske Holger
Hager Georg
Thies Jonas
Wellein Gerhard
Publication date: 05/09/2023

Sparse linear iterative solvers are essential for many large-scale simulations. Much of the runtime of these solvers is often spent in the implicit evaluation of matrix polynomials via a sequence of sparse matrix-vector products. A variety of approaches has been proposed to make these polynomial evaluations explicit (i.e., to fix the coefficients), e.g., polynomial preconditioners or s-step Krylov methods. Furthermore, it is now common practice to approximate triangular solves by a matrix polynomial to increase parallelism. Such algorithms allow the polynomial to be evaluated using a so-called matrix power kernel (MPK), which computes the product between a power of a sparse matrix A and a dense vector x, or a related operation. We have recently shown that, using the level-based formulation of sparse matrix-vector multiplication in the Recursive Algebraic Coloring Engine (RACE) framework, we can perform temporal cache blocking of MPK to increase its performance. In this work, we demonstrate the application of this cache-blocking optimization in sparse iterative solvers. By integrating the RACE library into the Trilinos framework, we demonstrate the speedups achieved in (preconditioned) s-step GMRES, polynomial preconditioners, and algebraic multigrid (AMG). For MPK-dominated algorithms, we achieve speedups of up to 3x on modern multi-core compute nodes. For algorithms with moderate contributions from subspace orthogonalization, the gain is significantly smaller, which is often caused by the insufficient quality of the orthogonalization routines. Finally, we showcase the application of RACE-accelerated solvers in a real-world wind turbine simulation (Nalu-Wind) and highlight the new opportunities and perspectives opened up by RACE as a cache-blocking technique for MPK-enabled sparse solvers.

Comment: 25 pages, 11 figures, 3 tables

arXiv.org e-Print Archive
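
A plain matrix power kernel can be sketched in Python as k successive CSR sparse matrix-vector products, followed by an explicit polynomial evaluation p(A)x = c0 x + c1 Ax + ... + ck A^k x. This is only the unblocked baseline: RACE's level-based temporal cache blocking, which reuses parts of A from cache across several powers, is not shown, and all names here are hypothetical.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def matrix_power_kernel(indptr, indices, data, x, k):
    """Return [x, A x, A^2 x, ..., A^k x] using k successive SpMVs.

    Each SpMV streams the whole matrix from memory again; temporal cache
    blocking (not shown) aims to avoid exactly this repeated traffic.
    """
    powers = [list(x)]
    for _ in range(k):
        powers.append(csr_matvec(indptr, indices, data, powers[-1]))
    return powers


def poly_apply(indptr, indices, data, x, coeffs):
    """Evaluate p(A) x = sum_i coeffs[i] * A^i x with fixed coefficients."""
    powers = matrix_power_kernel(indptr, indices, data, x, len(coeffs) - 1)
    return [sum(c * p[r] for c, p in zip(coeffs, powers)) for r in range(len(x))]


if __name__ == "__main__":
    # A = [[2, 1], [0, 3]] in CSR form.
    indptr, indices, data = [0, 2, 3], [0, 1, 1], [2.0, 1.0, 3.0]
    print(matrix_power_kernel(indptr, indices, data, [1.0, 1.0], 2))
    print(poly_apply(indptr, indices, data, [1.0, 1.0], [1.0, 0.0, 1.0]))
```

Keeping all powers in a list makes the polynomial step trivial but costs memory proportional to k; a production solver would typically accumulate the sum as each power is produced.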