Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards
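We discuss an approach for solving sparse or dense banded linear systems ${\bf A}{\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A}$ is possibly nonsymmetric and moderately large. The ${\it split\ and\ parallelize}$ (${\tt SaP}$) approach partitions ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are factored independently in parallel. The approach is implemented in a solver, ${\tt SaP::GPU}$, which is compared with the sparse direct solvers ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$ and, on dense banded systems, with Intel's ${\tt MKL}$. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.
Comment: 38 pages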
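A minimal sketch of the splitting idea, assuming the matrices that couple neighbouring blocks are simply ignored (a block-Jacobi preconditioner) and assuming textbook right-preconditioned BiCGStab as the outer iterative solver. The helpers `block_jacobi_blocks`, `apply_block_preconditioner`, and `bicgstab` are hypothetical names; this is CPU NumPy code, not the ${\tt SaP::GPU}$ implementation or its API. The point it illustrates is that the $P$ block solves do not depend on each other.

```python
import numpy as np


def block_jacobi_blocks(A, P):
    """Cut A into P diagonal blocks A_i; the coupling blocks are ignored."""
    n = A.shape[0]
    bounds = np.linspace(0, n, P + 1).astype(int)
    # A real solver would factor each block once here and keep the factors.
    return [(lo, hi, A[lo:hi, lo:hi].copy()) for lo, hi in zip(bounds[:-1], bounds[1:])]


def apply_block_preconditioner(blocks, r):
    """Return z = M^{-1} r for M = diag(A_1, ..., A_P)."""
    z = np.empty_like(r)
    for lo, hi, Ai in blocks:
        # The P block solves are independent, so they could run concurrently.
        z[lo:hi] = np.linalg.solve(Ai, r[lo:hi])
    return z


def bicgstab(A, b, precond, tol=1e-10, maxiter=200):
    """Right-preconditioned BiCGStab; returns (x, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual for the zero initial guess
    r_hat = r.copy()
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    stop = tol * np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        p_hat = precond(p)
        v = A @ p_hat
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= stop:
            return x + alpha * p_hat, it
        s_hat = precond(s)
        t = A @ s_hat
        omega = (t @ s) / (t @ t)
        x = x + alpha * p_hat + omega * s_hat
        r = s - omega * t
        if np.linalg.norm(r) <= stop:
            return x, it
        rho = rho_new
    return x, maxiter
```

For a diagonally dominant banded matrix, calling `bicgstab(A, b, lambda r: apply_block_preconditioner(blocks, r))` with `blocks = block_jacobi_blocks(A, 4)` should converge in few iterations, since the preconditioner omits only the handful of entries that couple neighbouring blocks.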
Load-balanced parallel banded-system solvers
Solving banded systems is important in scientific and engineering applications. This paper presents a load-balancing strategy for solving banded systems in parallel when the number of processors is small. An optimization-based load-balancing analysis determines how much work to assign to each processor so as to minimize the execution time. Experiments on the nCUBE 2E multiprocessor demonstrate the speedup advantage of the proposed strategy: the speedup improvement ranges from 47% to 66% with 4 processors and from 12% to 24% with 8 processors.
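A minimal sketch of this kind of optimization under a hypothetical linear cost model (not the paper's analysis): processor $j$ needs $c_j n_j + d_j$ time units for $n_j$ rows. Minimizing the slowest processor's time subject to $\sum_j n_j = N$ makes all finishing times equal, $T = \left(N + \sum_j d_j/c_j\right) / \sum_j (1/c_j)$ and $n_j = (T - d_j)/c_j$, after which the shares are rounded to integers. The names `balance_rows` and `makespan` and the example costs are assumptions for illustration.

```python
import math


def balance_rows(N, c, d):
    """Split N rows among len(c) processors to minimize the slowest finishing time.

    Hypothetical cost model: processor j needs c[j] * n_j + d[j] time units
    (c[j] > 0 is a per-row cost, d[j] >= 0 a fixed overhead).
    """
    inv = [1.0 / cj for cj in c]
    # Continuous optimum: every processor finishes at the same time T.
    T = (N + sum(dj * ij for dj, ij in zip(d, inv))) / sum(inv)
    share = [(T - dj) * ij for dj, ij in zip(d, inv)]
    if min(share) < 0:
        raise ValueError("overhead too large: some processor should get no rows")
    n = [math.floor(s) for s in share]
    # Hand out the rows lost to rounding one at a time, each to the
    # processor whose finishing time grows the least.
    for _ in range(N - sum(n)):
        j = min(range(len(n)), key=lambda k: c[k] * (n[k] + 1) + d[k])
        n[j] += 1
    return n


def makespan(n, c, d):
    """Finishing time of the slowest processor under the same cost model."""
    return max(cj * nj + dj for nj, cj, dj in zip(n, c, d))
```

For example, if the two interior partitions are assumed to cost twice as much per row as the two end partitions (`c = [1.0, 2.0, 2.0, 1.0]`, `d = [0.0] * 4`), then `balance_rows(1200, c, d)` returns `[400, 200, 200, 400]` with a makespan of 400, against 600 for an equal split of 300 rows each.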
Using parallel banded linear system solvers in generalized eigenvalue problems
Subspace iteration is a reliable and cost-effective method for solving positive definite banded symmetric generalized eigenproblems, especially large-scale ones. This paper discusses an algorithm that uses two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently onto the memories of a multiprocessor, and a high speedup is obtained for parallel implementations. An optimal shift is one that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically from the decay rate of the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
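A minimal NumPy sketch of shifted subspace iteration for $K x = \lambda M x$, assuming $K$ symmetric, $M$ symmetric positive definite, and a shift $\sigma$ below the smallest eigenvalue, so that the iteration converges to the smallest eigenpairs. A dense `np.linalg.solve` stands in for the parallel banded solvers; the function names, the number of guard vectors, and the Rayleigh-Ritz step via a Cholesky factor are assumptions, not details from the paper.

```python
import numpy as np


def ritz(Kr, Mr):
    """Solve the small projected problem Kr y = lam Mr y (Mr symmetric positive definite)."""
    L = np.linalg.cholesky(Mr)                    # Mr = L L^T
    Linv = np.linalg.inv(L)
    lam, Z = np.linalg.eigh(Linv @ Kr @ Linv.T)   # equivalent standard symmetric problem
    return lam, Linv.T @ Z                        # back-transform: y = L^{-T} z


def shifted_subspace_iteration(K, M, p, sigma, q=None, iters=100, seed=0):
    """Approximate the p smallest eigenpairs of K x = lam M x (assumes sigma < lam_1)."""
    n = K.shape[0]
    q = q or min(2 * p, p + 8)          # extra guard vectors speed up convergence
    X = np.random.default_rng(seed).standard_normal((n, q))
    A = K - sigma * M                   # shifted matrix; banded if K and M are banded
    for _ in range(iters):
        # This solve is where a parallel banded solver would be used.
        Xbar = np.linalg.solve(A, M @ X)
        lam, Y = ritz(Xbar.T @ K @ Xbar, Xbar.T @ M @ Xbar)
        X = Xbar @ Y                    # M-orthonormal Ritz vectors, ascending lam
    return lam[:p], X[:, :p]
```

In this sketch the $i$-th eigenvector converges roughly like $\left((\lambda_i - \sigma)/(\lambda_{q+1} - \sigma)\right)^k$ after $k$ iterations, while a more negative $\sigma$ makes $K - \sigma M$ more diagonally dominant; the shift therefore trades convergence speed against how strongly the banded subsystems are coupled.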
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to the characteristics that influence algorithm selection. Direct and iterative methods are given for elliptic equations, as are explicit and implicit methods for initial-boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized, because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
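The review pays particular attention to hardware characteristics that influence algorithm selection. As one standard illustration (chosen here, not taken from the review), a Jacobi sweep for the 5-point discretization of the Poisson equation $-\Delta u = f$ updates every interior point from the previous iterate only, so a whole sweep is a handful of long vector operations. A minimal NumPy sketch, assuming a square grid with zero Dirichlet boundary values; `jacobi_poisson` is a hypothetical name.

```python
import numpy as np


def jacobi_poisson(f, h, iters):
    """Jacobi iteration for -(u_xx + u_yy) = f on a square, u = 0 on the boundary.

    f is sampled on the full (n + 2) x (n + 2) grid including boundary points;
    the 5-point stencil with spacing h is applied at the interior points.
    """
    u = np.zeros_like(f)
    for _ in range(iters):
        # NumPy evaluates the whole right-hand side before assigning, so each
        # interior point is updated from the previous iterate only (Jacobi, not
        # Gauss-Seidel). With no dependence between points, the sweep maps onto
        # long vector operations.
        u[1:-1, 1:-1] = 0.25 * (
            u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
            + h * h * f[1:-1, 1:-1]
        )
    return u
```

With $n = 16$ interior points per direction ($h = 1/17$), $f = 2\pi^2 \sin(\pi x)\sin(\pi y)$ and a few thousand sweeps, the iterate should match $\sin(\pi x)\sin(\pi y)$ to within the $O(h^2)$ discretization error. The price of this vector-friendly structure is slow convergence: the Jacobi iteration matrix for this problem has spectral radius $\cos(\pi h)$.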
A high-accuracy optical linear algebra processor for finite element applications
Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables), which can exploit the speed of optical processors, require the 32-bit accuracy obtainable from digital machines. To obtain this 32-bit accuracy with an optical processor, the data can be digitally encoded, which reduces the dynamic range required of the optical system (i.e., decreases the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described that quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is proposed.
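A minimal sketch of the general idea behind multiplication by digital convolution, assuming plain non-negative integer operands: convolving the digit vectors of two numbers yields their product in a "mixed" representation whose digits may exceed base - 1 but remain small, and a digital carry pass converts it back to ordinary digits. The convolution step stands in for what the optical processor would compute; the function names and the choice of base are illustrative assumptions, not the report's encoding scheme.

```python
def to_digits(x, base):
    """Little-endian digits of a non-negative integer x in the given base."""
    digits = []
    while True:
        x, r = divmod(x, base)
        digits.append(r)
        if x == 0:
            return digits


def convolve(a, b):
    """Discrete convolution of two digit vectors (the step done optically)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def resolve_carries(mixed, base):
    """Digital post-processing: turn mixed-radix digits into ordinary digits."""
    digits, carry = [], 0
    for d in mixed:
        carry, r = divmod(d + carry, base)
        digits.append(r)
    while carry:
        carry, r = divmod(carry, base)
        digits.append(r)
    return digits


def multiply_by_convolution(x, y, base=2):
    """Return (x * y, mixed), where mixed is the raw convolution of the digits.

    Each entry of mixed is at most min(len_x, len_y) * (base - 1) ** 2, so the
    analog stage only has to resolve small integers; carries are handled digitally.
    """
    mixed = convolve(to_digits(x, base), to_digits(y, base))
    digits = resolve_carries(mixed, base)
    return sum(d * base**k for k, d in enumerate(digits)), mixed
```

For example, `multiply_by_convolution(13, 11)` returns `(143, [1, 1, 1, 3, 1, 1, 1])`: the digit 3 is not a valid binary digit, but the mixed binary vector still encodes 143, and the carry pass restores ordinary binary form.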
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies published in book form are also listed.