Fast finite difference Poisson solvers on heterogeneous architectures
In this paper we propose and evaluate a set of new strategies for the solution of three-dimensional separable elliptic problems on CPU–GPU platforms. The numerical solution of the system of linear equations arising when discretizing the associated operators often represents the most time-consuming part of larger simulation codes tackling a variety of physical situations. Incompressible fluid flows, electromagnetic problems, heat transfer and solid mechanics simulations are just a few examples of application areas that require efficient solution strategies for this class of problems. GPU computing has emerged as an attractive alternative to conventional CPUs for many scientific applications. High speedups over CPU implementations have been reported, and this trend is expected to continue in the future with improved programming support and tighter CPU–GPU integration. These speedups by no means imply that CPU performance is no longer critical. The conventional CPU-control–GPU-compute pattern used in many applications wastes much of the CPU's computational power. Our proposed parallel implementation of a classical cyclic reduction algorithm, which tackles the large linear systems arising from the discretized form of the elliptic problem at hand, schedules computation on both the GPU and the CPUs in a cooperative way. Experimental results demonstrate the effectiveness of this approach.
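As a reminder of what cyclic reduction does, the following is a minimal serial sketch for a single tridiagonal system. It is not the paper's three-dimensional CPU–GPU implementation; the function name `cyclic_reduction` and the list-based coefficient layout are assumptions chosen for illustration, and no pivoting is attempted.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic (odd-even) reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with the convention a[0] == 0 and c[-1] == 0.
    Illustrative sketch: assumes no zero pivots (e.g. diagonal dominance).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: eliminate the even-indexed unknowns from every odd row.
    # Each loop iteration is independent of the others, which is what
    # makes the method attractive on parallel hardware.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))

    # The reduced system (about half the size) is again tridiagonal.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover the even-indexed unknowns, again independently.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Usage: 1D Poisson matrix tridiag(-1, 2, -1) with a right-hand side
# chosen so that the exact solution is x = [1, 2, ..., 7].
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
d = [0.0] * (n - 1) + [8.0]
solution = cyclic_reduction(a, b, c, d)
```

Every level halves the number of unknowns, so the recursion depth is logarithmic in the system size, while the work within a level can be distributed across processing units.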
Decoupling and stability of algorithms for boundary value problems
The ordinary differential equations occurring in linear boundary value problems characteristically have both stable and unstable solution modes. Therefore a stable numerical algorithm should avoid both forward and backward integration of solutions on large intervals. It is shown that most methods (such as multiple shooting, collocation, invariant imbedding and difference methods) derive their stability from the fact that they all decouple the continuous or the discrete problem sooner or later (for instance when solving a linear system). This decoupling is related to the dichotomy of the ordinary differential equations. In fact it turns out that the inherent initial value instability is an important prerequisite for a stable utilization of the decoupled representations from which the solutions are computed. How this stability is related to the use of the boundary conditions is also investigated.
On the parallel solution of parabolic equations
Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on a polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Padé and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time-dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
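To illustrate how a partial fraction decomposition exposes parallelism, consider a small worked example. The choice of the (0,2) Padé approximant, the step size $\tau$, and the symbols $w_j$, $p_j$ are assumptions made here for illustration; the abstract does not state which approximants or degrees the paper uses.

```latex
% (0,2) Pade approximant to the exponential and its poles
r(z) = \frac{1}{1 - z + \tfrac{1}{2} z^{2}} \approx e^{z},
\qquad p_{1,2} = 1 \pm i .

% Partial fraction form, with weights w_1 = -i and w_2 = i
r(z) = \frac{w_1}{z - p_1} + \frac{w_2}{z - p_2}
     = \frac{-i}{z - (1 + i)} + \frac{i}{z - (1 - i)} .

% Applied to one step of u' = -A u with step size \tau
e^{-\tau A} v \approx r(-\tau A)\, v
  = \sum_{j=1}^{2} w_j \, (-\tau A - p_j I)^{-1} v
  = 2 \,\mathrm{Re}\!\left[ w_1 \, (-\tau A - p_1 I)^{-1} v \right]
  \quad \text{for real } A,\ v .
```

Each term of the sum is a separate shifted linear system, so the solves can proceed concurrently instead of being chained as successive factors of the denominator polynomial.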
The numerical solution of sparse matrix equations by fast methods and associated computational techniques
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.