
    Efficient ICCG on a shared memory multiprocessor

    Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
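
    The static dependence analysis mentioned above is commonly realized as level scheduling: rows of the triangular factor are grouped into levels so that every row depends only on rows in earlier levels, and all rows within one level can be solved concurrently. The following is a minimal serial sketch of that idea, not the paper's Sequent Balance implementation; it assumes L is a sparse lower triangular matrix in SciPy CSR format, and the function names are ours.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def level_schedule(L):
    """Group the rows of a sparse lower triangular matrix into levels.

    Rows in one level depend only on rows in earlier levels, so they
    can be solved concurrently (the static analysis of data
    dependences referred to in the abstract)."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    indptr, indices = L.indptr, L.indices
    for i in range(n):
        deps = [level[j] + 1 for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level[i] = max(deps, default=0)
    levels = [[] for _ in range(level.max() + 1)]
    for i in range(n):
        levels[level[i]].append(i)
    return levels

def triangular_solve_by_levels(L, b):
    """Forward solve L x = b one level at a time; each inner loop over
    a level is embarrassingly parallel (here it runs serially)."""
    x = np.zeros_like(b, dtype=float)
    indptr, indices, data = L.indptr, L.indices, L.data
    for lev in level_schedule(L):
        for i in lev:                        # parallelizable loop
            s, diag = b[i], 1.0              # missing diagonal treated as 1
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    s -= data[k] * x[j]
                elif j == i:
                    diag = data[k]
            x[i] = s / diag
    return x

# Quick check on a small random lower triangular system.
A = sp.random(50, 50, density=0.1, random_state=0, format="csr")
L = sp.tril(A, format="csr") + sp.eye(50, format="csr")
b = np.ones(50)
assert np.allclose(triangular_solve_by_levels(L, b), spla.spsolve(L.tocsc(), b))
```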

    Some fast elliptic solvers on parallel architectures and their complexities

    The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR onto distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
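
    To make the recursive elimination behind cyclic reduction concrete, here is a toy scalar version for a single tridiagonal system of size n = 2^k - 1; BCR replaces these scalar operations with block operations. This is an illustrative sketch under the usual convention a[0] = c[n-1] = 0, not the paper's distributed-memory algorithm.

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve the tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d (all length n = 2**k - 1).
    Every equation touched in one reduction or substitution sweep is
    independent of the others in that sweep, which is where the
    parallelism of (block) cyclic reduction comes from."""
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    a[0] = c[-1] = 0.0
    n = len(b)
    x = np.zeros(n)
    stride = 1
    # Forward reduction: repeatedly eliminate the odd-indexed unknowns.
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            lo, hi = i - stride, i + stride
            al = a[i] / b[lo]
            be = c[i] / b[hi] if hi < n else 0.0
            b[i] -= al * c[lo] + (be * a[hi] if hi < n else 0.0)
            d[i] -= al * d[lo] + (be * d[hi] if hi < n else 0.0)
            a[i] = -al * a[lo]
            c[i] = -be * c[hi] if hi < n else 0.0
        stride *= 2
    # Back substitution: fill in the unknowns level by level.
    while stride >= 1:
        for i in range(stride - 1, n, 2 * stride):
            s = d[i]
            if i - stride >= 0:
                s -= a[i] * x[i - stride]
            if i + stride < n:
                s -= c[i] * x[i + stride]
            x[i] = s / b[i]
        stride //= 2
    return x

# Tiny check against the dense solver on a 7-point system.
n = 7
rng = np.random.default_rng(0)
b = 4 + rng.random(n); a = rng.random(n); c = rng.random(n)
a[0] = c[-1] = 0.0
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
d = rng.random(n)
assert np.allclose(cyclic_reduction(a, b, c, d), np.linalg.solve(A, d))
```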

    Optimal parallel solution of sparse triangular systems

    A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-hand sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or back substitution. Applications are to iterative solvers with triangular preconditioners, to structural analysis, and to power systems applications, where there may be many right-hand sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allows for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
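
    The factored representation can be made concrete as follows. Writing L = P_1 P_2 ... P_m, where P_k agrees with L on the columns of partition k and equals the identity elsewhere, gives L^{-1} = P_m^{-1} ... P_1^{-1}, so every new right-hand side costs only one sparse matrix-vector product per partition. The sketch below takes the partition as given (finding the fewest no-fill partitions is the problem the paper solves) and uses SciPy; the function names are ours.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def inverse_factors(L, blocks):
    """blocks: consecutive (start, stop) column ranges covering 0..n-1.
    Factor k agrees with L on its block's columns and is the identity
    elsewhere; its inverse stays as sparse as L exactly when the
    partition meets the no-fill requirement in the abstract."""
    n = L.shape[0]
    I = sp.eye(n, format="csc")
    Lc = sp.csc_matrix(L)
    factors = []
    for start, stop in blocks:
        P = sp.hstack([I[:, :start], Lc[:, start:stop], I[:, stop:]],
                      format="csc")
        factors.append(sp.csr_matrix(spla.spsolve(P, I)))
    return factors

def apply_inverse(factors, b):
    """x = L^{-1} b as one sparse mat-vec per partition (one parallel step each)."""
    x = b
    for F in factors:          # rightmost factor of L^{-1} applies first
        x = F @ x
    return x

# Example: a 4x4 unit lower bidiagonal L split into two column blocks.
L = sp.csr_matrix(np.array([[1., 0., 0., 0.],
                            [2., 1., 0., 0.],
                            [0., 3., 1., 0.],
                            [0., 0., 4., 1.]]))
F = inverse_factors(L, [(0, 2), (2, 4)])
b = np.ones(4)
assert np.allclose(apply_inverse(F, b), spla.spsolve(sp.csc_matrix(L), b))
```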

    Using the Gauss-Jordan elimination method with CUDA for linear circuit equation systems

    Many scientific and engineering problems can be expressed as a system of linear equations. This study describes the solution of a Linear Circuit Equation System (LCES) with an n×n matrix using the Compute Unified Device Architecture (CUDA), a parallel computing architecture developed by NVIDIA, so that the system is solved on a Graphics Processing Unit (GPU) instead of a Central Processing Unit (CPU). Linear circuits may contain resistances, impedances, capacitances, dependent and independent current sources, and DC and AC voltage sources; in this study, circuits containing resistances, independent current sources, and DC voltage sources are analyzed. Circuit analysis frequently reduces to solving simultaneous linear equations, which here are solved with the Gauss-Jordan elimination method. Gauss-Jordan elimination is a variant of Gaussian elimination for solving a system of linear equations (Ax = B); it is an algorithm for bringing matrices into reduced row echelon form using elementary row operations. Gaussian elimination has two parts: the first (forward elimination) reduces a given system to triangular form, and the second uses back substitution to find the solution of the triangular system. Because the elements of the unknown vector computed in back substitution depend on one another, this second step is poorly suited to parallel programming. Neither part of Gauss-Jordan elimination has this dependency, which is why it is preferred here. The GPU implementation is faster than solving the linear equation systems on the CPU.
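
    As a reference point for the method itself, here is a minimal NumPy sketch of Gauss-Jordan elimination with partial pivoting (a sequential stand-in, not the study's CUDA kernel). The property that makes the method attractive for GPUs is visible in the inner loop: in each elimination step, every row update is independent of the others, so the rows can be assigned to parallel threads.

```python
import numpy as np

def gauss_jordan_solve(A, b):
    """Solve A x = b by reducing the augmented matrix [A | b] to
    reduced row echelon form with elementary row operations."""
    n = len(b)
    M = np.hstack([np.asarray(A, float), np.asarray(b, float).reshape(n, 1)])
    for k in range(n):
        # Partial pivoting for numerical stability.
        p = k + np.argmax(np.abs(M[k:, k]))
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                  # normalize the pivot row
        # Eliminate column k from every other row; these n-1 row
        # updates are mutually independent and can run in parallel.
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]
    return M[:, -1]                      # the solution column

A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
assert np.allclose(gauss_jordan_solve(A, b), np.linalg.solve(A, b))
```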

    A technological concept for generating adaptive hierarchical meshes for FEM schemes

    Adaptive finite element methods for the solution of partial differential equations require effective methods of mesh refinement and coarsening, and fast multilevel solvers for the systems of FE equations need a hierarchical structure of the grid. In the paper a technology is presented for the application of irregular hierarchical triangular meshes arising from refinement that only divides elements into four congruent triangles. The paper describes the necessary data structures and data structure management, the principles and algorithms for refining and coarsening the mesh, and a specific assembly technique for the system of FE equations. Aspects of the parallel implementation on MIMD computers with message passing communication are included.
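
    The basic refinement operation described here, dividing a triangle into four congruent children by connecting its edge midpoints (so-called red refinement), can be sketched in a few lines. The code below is a minimal geometric illustration under our own naming, not the paper's data-structure technology.

```python
import numpy as np

def refine_red(tri):
    """tri: (3, 2) array of vertex coordinates; returns the four
    congruent child triangles obtained by joining edge midpoints."""
    a, b, c = tri
    ab, bc, ca = (a + b) / 2, (b + c) / 2, (c + a) / 2   # edge midpoints
    return [np.array([a, ab, ca]),
            np.array([ab, b, bc]),
            np.array([ca, bc, c]),
            np.array([ab, bc, ca])]      # the inner (inverted) triangle

# One level of refinement of a reference triangle yields 4 children;
# applying it recursively builds the hierarchy used by multilevel solvers.
children = refine_red(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
assert len(children) == 4
```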

    Lecture 10: Preconditioned Iterative Methods for Linear Systems

    Iterative methods for the solution of linear systems of equations – such as stationary, semi-iterative, and Krylov subspace methods – are classical methods taught in numerical analysis courses, but adapting these methods to run efficiently at large scale on high-performance computers is challenging and a constantly evolving topic. Preconditioners – necessary to aid the convergence of iterative methods – come in many forms, from algebraic to physics-based; they are regularly being developed for linear systems from different classes of problems and likewise evolve with high-performance computers. This lecture will cover the background and some recent developments on iterative methods and preconditioning in the context of high-performance parallel computers. Topics include asynchronous iterative methods that avoid the potentially high synchronization cost when there are very large numbers of computational threads, parallel sparse approximate inverse preconditioners, parallel incomplete factorization preconditioners and sparse triangular solvers, and preconditioning with hierarchical rank-structured matrices for kernel matrix equations.
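
    To fix ideas, the sketch below shows the preconditioned conjugate gradient method with a simple Jacobi (diagonal) preconditioner; the parallel preconditioners surveyed in the lecture (sparse approximate inverses, incomplete factorizations, rank-structured matrices) all plug into the same z = M^{-1} r step. A minimal illustration, assuming a symmetric positive definite NumPy matrix A.

```python
import numpy as np

def pcg(A, b, tol=1e-10, maxiter=1000):
    """Preconditioned conjugate gradients with M = diag(A) (Jacobi)."""
    Minv = 1.0 / np.diag(A)              # apply M^{-1} elementwise
    x = np.zeros_like(b)
    r = b - A @ x
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r                     # the preconditioning step
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Quick check on a small SPD system.
rng = np.random.default_rng(1)
B = rng.random((20, 20)); A = B @ B.T + 20 * np.eye(20)
b = rng.random(20)
assert np.allclose(pcg(A, b), np.linalg.solve(A, b), atol=1e-8)
```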

    A new class of decomposition for inverting asymmetric and indefinite matrices

    An innovative decomposition for inverting a nonsingular, asymmetric, and indefinite matrix [A] of order (n × n) is derived in this paper. The inverse of [A] is written as [A]⁻¹ = [L][D][U], where [L] is a lower triangular matrix, [D] is a diagonal matrix, and [U] is an upper triangular matrix. With this method, the solution of [A]{X} = {B} may be easily and efficiently computed by matrix-vector multiplications as {X} = [L][D][U]{B}. This technique requires a minimal amount of computer memory and can be easily transformed into parallel procedures with high efficiency. Performance in inverting asymmetric and indefinite matrices and in solving systems of linear equations on an Alliant/FX8 computer is reported.
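
    A small dense sketch of how such a decomposition can be used. Purely for illustration, the factors below are obtained from an unpivoted UDL factorization of [A]: if A = U0 D0 L0, then A^{-1} = L0^{-1} D0^{-1} U0^{-1}, which has the stated [L][D][U] form. The paper derives its factors differently; this code additionally assumes no pivoting is required (e.g., a diagonally dominant A), and the function names are ours.

```python
import numpy as np
from scipy.linalg import solve_triangular

def ldu_of_inverse(A):
    """Return L, D, U with A^{-1} = L @ np.diag(D) @ U."""
    n = A.shape[0]
    J = np.eye(n)[::-1]                  # reversal permutation, J = J^{-1}
    # Unpivoted Doolittle LU of J A J yields a UDL factorization of A.
    B = J @ A @ J
    L1, U1 = np.eye(n), B.astype(float)
    for k in range(n - 1):
        L1[k+1:, k] = U1[k+1:, k] / U1[k, k]
        U1[k+1:] -= np.outer(L1[k+1:, k], U1[k])
    D1 = np.diag(U1).copy()
    U1 = U1 / D1[:, None]                # unit upper triangular
    U0 = J @ L1 @ J                      # A = U0 D0 L0
    L0 = J @ U1 @ J
    D0 = D1[::-1]
    # Invert the triangular factors once; shape is preserved.
    L = solve_triangular(L0, np.eye(n), lower=True)
    U = solve_triangular(U0, np.eye(n), lower=False)
    return L, 1.0 / D0, U

def solve(L, D, U, b):
    """{X} = [L][D][U]{B}: matrix-vector products only, no substitution."""
    return L @ (D * (U @ b))

rng = np.random.default_rng(2)
A = rng.random((5, 5)) + 5 * np.eye(5)   # diagonally dominant: no pivoting
L, D, U = ldu_of_inverse(A)
b = rng.random(5)
assert np.allclose(solve(L, D, U, b), np.linalg.solve(A, b))
```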

    The Design and Implementation of a High-Performance Polynomial System Solver

    This thesis examines the algorithmic and practical challenges of solving systems of polynomial equations. We discuss the design and implementation of triangular decomposition to solve polynomial systems exactly by means of symbolic computation. Incremental triangular decomposition solves one equation from the input list of polynomials at a time. Each step may produce several different components (points, curves, surfaces, etc.) of the solution set. Independent components imply that the solving process may proceed on each component concurrently. This so-called component-level parallelism is a theoretical and practical challenge characterized by irregular parallelism: parallelism is not an algorithmic property but rather a geometric property of the particular input system's solution set. Despite these challenges, we have effectively applied parallel computing to triangular decomposition through the layering and cooperation of many parallel code regions. This parallel computing is supported by our generic object-oriented framework based on the dynamic multithreading paradigm, while the required polynomial algebra is supported by an object-oriented framework for algebraic types which allows type safety and mathematical correctness to be determined at compile time. Our software is implemented in C/C++, and we have extensively tested the implementation for correctness and performance on over 3000 polynomial systems that have arisen in practice. The parallel framework has been reused in the implementation of Hensel factorization as a parallel pipeline to compute the roots of a polynomial with multivariate power series coefficients. Hensel factorization is one step toward computing the non-trivial limit points of quasi-components.
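
    The thesis's solver is not reproduced here, but the triangular shape that such decompositions produce can be illustrated with a small SymPy computation: a lexicographic Groebner basis of the system below (a related triangularization technique, not the thesis's method) is triangular, with one polynomial in z alone, one introducing y, and one introducing x, so the variables can be solved for one at a time.

```python
from sympy import groebner, symbols

x, y, z = symbols('x y z')
# Three equations in three unknowns; the lex basis is triangular.
G = groebner([x + y + z - 1, x + 2*y + 3*z - 2, x*y*z - 1],
             x, y, z, order='lex')
for g in G:
    print(g)   # x - z, y + 2*z - 1, and a univariate cubic in z
```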