810 research outputs found

    Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

    We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 \leq N \leq 500000$. The ${\it split\ and\ parallelize}$ (${\tt SaP}$) approach seeks to partition the matrix ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$. This approach, along with the Krylov subspace-based iterative method that it preconditions, are implemented in a solver called ${\tt SaP::GPU}$, which is compared in terms of efficiency with three commonly used sparse direct solvers: ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$. ${\tt SaP::GPU}$, which runs entirely on the GPU except several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's ${\tt MKL}$, ${\tt SaP::GPU}$ also fares well when used to solve dense banded systems that are close to being diagonally dominant. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license. Comment: 38 pages
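
The decoupled variant of the idea in this abstract (ignore the coupling between diagonal sub-blocks and use the block solves as a preconditioner) can be illustrated with a minimal pure-Python sketch. It is not the SaP::GPU implementation: it assumes a tridiagonal matrix, solves each block with the Thomas algorithm, processes the blocks sequentially rather than in parallel, and replaces the Krylov method with a plain preconditioned Richardson iteration. All function names are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve one tridiagonal block; lower[0] and upper[-1] are never used,
    which is exactly how the coupling to neighbouring blocks gets ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiag_matvec(lower, diag, upper, x):
    """Multiply the full tridiagonal matrix (coupling included) by x."""
    n = len(diag)
    y = []
    for i in range(n):
        s = diag[i] * x[i]
        if i > 0:
            s += lower[i] * x[i - 1]
        if i < n - 1:
            s += upper[i] * x[i + 1]
        y.append(s)
    return y


def block_precondition(lower, diag, upper, r, num_blocks):
    """Apply the block-diagonal preconditioner: each block solve is independent,
    so in a parallel setting the loop iterations could run concurrently."""
    n = len(diag)
    size = (n + num_blocks - 1) // num_blocks
    z = []
    for start in range(0, n, size):
        end = min(start + size, n)
        z.extend(thomas_solve(lower[start:end], diag[start:end],
                              upper[start:end], r[start:end]))
    return z


def sap_like_solve(lower, diag, upper, b, num_blocks, tol=1e-12, max_iter=200):
    """Preconditioned Richardson iteration (a simple stand-in for a Krylov method)."""
    x = [0.0] * len(diag)
    for _ in range(max_iter):
        ax = tridiag_matvec(lower, diag, upper, x)
        r = [bi - ai for bi, ai in zip(b, ax)]
        if max(abs(ri) for ri in r) < tol:
            break
        z = block_precondition(lower, diag, upper, r, num_blocks)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x
```

For a diagonally dominant matrix the ignored coupling entries are small relative to the blocks, which is why such an iteration can converge quickly; for matrices far from diagonal dominance, this simple sketch may not converge at all.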

    Efficient approximation of functions of some large matrices by partial fraction expansions

    Some important applied problems require the evaluation of functions $\Psi$ of large and sparse and/or \emph{localized} matrices $A$. Popular and interesting techniques for computing $\Psi(A)$ and $\Psi(A)\mathbf{v}$, where $\mathbf{v}$ is a vector, are based on partial fraction expansions. However, some of these techniques require solving several linear systems whose matrices differ from $A$ by a complex multiple of the identity matrix $I$ for computing $\Psi(A)\mathbf{v}$, or require inverting sequences of matrices with the same characteristics for computing $\Psi(A)$. Here we study the use and the convergence of a recent technique for generating sequences of incomplete factorizations of matrices in order to address both of these issues. The solution of the sequences of linear systems and the approximate matrix inversions above can be computed efficiently provided that $A^{-1}$ shows certain decay properties. These strategies have good potential for parallelization. Our claims are confirmed by numerical tests.
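
As a reading aid for the abstract above, the generic form of a partial fraction approximation can be written as follows. The coefficients $c_j$, the poles $\xi_j$ and the number of terms $m$ are illustrative symbols assumed here; the abstract does not give the specific expansion used in the paper.

```latex
\Psi(A)\,\mathbf{v} \;\approx\; \sum_{j=1}^{m} c_j \,(A - \xi_j I)^{-1}\mathbf{v},
\qquad
\Psi(A) \;\approx\; \sum_{j=1}^{m} c_j \,(A - \xi_j I)^{-1}.
```

Each term in the first sum requires one linear solve with a shifted matrix $A - \xi_j I$, and each term in the second requires one (approximate) inverse, which matches the two cases the abstract distinguishes.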

    Parallel scalable PDE-constrained optimization: antenna identification in hyperthermia cancer treatment planning

    We present a PDE-constrained optimization algorithm which is designed for parallel scalability on distributed-memory architectures with thousands of cores. The method is based on a line-search interior-point algorithm for large-scale continuous optimization. It is matrix-free in that it does not require the factorization of derivative matrices. Instead, it uses a new parallel and robust iterative linear solver on distributed-memory architectures. We will show almost linear parallel scalability results for the complete optimization problem, which is a new, emerging and important biomedical application related to antenna identification in hyperthermia cancer treatment planning.

    Load-balanced parallel banded-system solvers

    Solving banded systems is important in science and engineering applications. This paper presents a load-balancing strategy for solving banded systems in parallel when the number of processors used is small. An optimization-based load-balancing analysis is given to determine how much work should be assigned to each processor in order to minimize the running time. Experiments are carried out on the nCUBE 2E multiprocessor to demonstrate the speedup advantage of the proposed load-balancing strategy. The speedup improvement ratio ranges from 47% to 66% when using 4 processors and from 12% to 24% when using 8 processors.

    Data Structures and Algorithms for Efficient Solution of Simultaneous Linear Equations from 3-D Ice Sheet Models

    Two current software packages for solving large systems of sparse simultaneous linear equations are evaluated in terms of their applicability to solving systems of equations generated by the University of Maine Ice Sheet Model. SuperLU, the first package, has been developed by researchers at the University of California at Berkeley and the Lawrence Berkeley National Laboratory. UMFPACK, the second package, has been developed by T. A. Davis of the University of Florida, who has ties with the U. C. Berkeley researchers as well as European researchers. Both packages are direct solvers that use LU factorization with forward and backward substitution. The University of Maine Ice Sheet Model uses the finite element method to solve partial differential equations that describe ice thickness, velocity, and temperature throughout glaciers as functions of position and time. The finite element method generates systems of linear equations having tens of thousands of variables and one hundred or so non-zero coefficients per equation. Matrices representing these systems of equations may be strictly banded or banded with right and lower borders. In order to efficiently interface the software packages with the ice sheet model, a modified compressed column data structure and supporting routines were designed and written. The data structure interfaces directly with both software packages and allows the ice sheet model to access matrix coefficients by row and column number in roughly 100 nanoseconds while only storing non-zero entries of the matrix. No a priori knowledge of the matrix's sparsity pattern is required. Both software packages were tested with matrices produced by the model, and performance characteristics were measured and compared with banded Gaussian elimination. When combined with high performance basic linear algebra subprograms (BLAS), the packages are as much as 5 to 7 times faster than banded Gaussian elimination. The BLAS produced by K. Goto of the University of Texas was used. Memory usage by the packages varied from slightly more than banded Gaussian elimination with UMFPACK, to as much as a 40% savings with SuperLU. In addition, the packages provide componentwise backward error measures and estimates of the matrix's condition number. SuperLU is available for parallel computers as well as single processor computers. UMFPACK is only for single processor computers. Both packages are also capable of efficiently solving the bordered matrix problem.
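
To make the data-structure idea in this abstract concrete, here is a minimal sketch of a plain compressed column (CSC) layout with lookup by row and column number. It is the textbook format, not the modified structure the authors describe (whose details the abstract does not give), and the class and method names are hypothetical.

```python
from bisect import bisect_left


class CSCMatrix:
    """Compressed sparse column storage: only non-zero entries are kept."""

    def __init__(self, n_rows, n_cols, entries):
        # entries: dict mapping (row, col) -> value, with 0-based indices
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.col_ptr = [0] * (n_cols + 1)  # column c occupies col_ptr[c]:col_ptr[c+1]
        self.row_idx = []                  # row index of each stored value
        self.values = []                   # stored non-zero values
        ordered = sorted((c, r, v) for (r, c), v in entries.items() if v != 0)
        for c, r, v in ordered:
            self.col_ptr[c + 1] += 1
            self.row_idx.append(r)
            self.values.append(v)
        # Turn per-column counts into cumulative offsets.
        for c in range(n_cols):
            self.col_ptr[c + 1] += self.col_ptr[c]

    def get(self, row, col):
        """Return the coefficient at (row, col); absent entries read as zero."""
        start, end = self.col_ptr[col], self.col_ptr[col + 1]
        k = bisect_left(self.row_idx, row, start, end)  # rows are sorted per column
        if k < end and self.row_idx[k] == row:
            return self.values[k]
        return 0.0
```

Because row indices are sorted within each column, a lookup costs a binary search over that column's entries only; with roughly one hundred non-zeros per equation, as in the abstract, that is about seven comparisons.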

    An Algorithmic Approach for Stability of an Autonomous System

    Many phenomena in biology can be modeled as a system of first-order differential equations x' = ax + by, y' = cx + dy. An example of such a system is the prey-predator model. To interpret the results, we need full information on the system of equations, such as the stability of its equilibrium points. This requires in-depth knowledge of differential equations. The literature often emphasizes analytical methods for obtaining results on the stability of the equilibrium points. This is feasible for small systems such as a 2 x 2 system. Researchers who are not mathematicians often lack the analytical tools to understand the model fully. Very often they are interested only in the critical points and their stability, without going through the tedious mathematical analysis. This calls for user-friendly tools that non-mathematicians can use to answer the problem at hand. The objective of this research is to establish an algorithm to determine the stability of a more general system. By doing so we will be able to help those who are not familiar with analytical methods to establish the stability of the systems at hand. The following algorithm is employed in developing the software: L1. A search for critical points is conducted. L2. Eigenvalues of the linear system are computed. These values are obtained from the characteristic equation |A - λI| = 0, where λ is an eigenvalue. For a nonlinear system, linearization around the critical points is carried out. L3. The stability of the system is determined. L4. The trajectory of the system is plotted in the phase plane. The software is developed in the C programming language. It is hoped that the software will help researchers in mathematical biology understand the concept of stability in their models.
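
Steps L2 and L3 of the algorithm in this abstract can be sketched for the 2 x 2 linear case as follows. The abstract says the authors' software is written in C; this Python sketch is only an illustration, the function names are hypothetical, and the three-way classification is a simplification chosen here.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Roots of |A - lambda*I| = 0 for A = [[a, b], [c, d]],
    i.e. of lambda^2 - trace*lambda + det = 0."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def classify_equilibrium(a, b, c, d, tol=1e-12):
    """Classify the origin of x' = ax + by, y' = cx + dy
    from the real parts of the eigenvalues."""
    l1, l2 = eigenvalues_2x2(a, b, c, d)
    if l1.real < -tol and l2.real < -tol:
        return "asymptotically stable"
    if l1.real > tol or l2.real > tol:
        return "unstable"
    # At least one eigenvalue has zero real part; for a linearized
    # nonlinear system this case is inconclusive.
    return "borderline"
```

For a nonlinear system, the coefficients a, b, c, d would be the entries of the Jacobian evaluated at a critical point, which is the linearization step the abstract mentions.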