
    Parallel Direct Solver for the Finite Integration Technique in Electrokinetic Problems

    The finite integration technique allows the simulation of real-world electromagnetic field problems with complex geometries. It provides a discrete reformulation of Maxwell's equations in their integral form that is suitable for numerical computing. The resulting matrix equations for the discretized fields can be used for efficient numerical simulation on modern computers and can be exploited for parallel computing. In fact, by reordering the unknowns with the nested dissection method, it is possible to construct the lower triangular matrix of the Cholesky factorization directly, on many processors, without assembling the matrix system. In this paper, a parallel algorithm is proposed for the direct solution of the large sparse linear systems arising from the finite integration technique. This direct solver has the advantage of handling singularities in the system matrix. The computational effort for these linear systems, often encountered in the numerical simulation of electromagnetic phenomena by the finite integration technique, is very significant in terms of run time and memory requirements. Many numerical tests have been carried out to evaluate the performance of the parallel direct solver.
    Index Terms: finite element methods, finite integration technique, linear systems, numerical analysis, parallel algorithms.
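
    As a rough illustration of why nested dissection enables this (a minimal sketch, not the paper's algorithm: the 1-D Laplacian model problem and the helper nested_dissection_order are assumptions for the demo), ordering each separator last decouples the two halves of the problem, so their parts of the Cholesky factor can be computed by different processors without assembling the global system.

```python
import numpy as np

def nested_dissection_order(lo, hi):
    """Order the chain of unknowns lo..hi-1 by recursive bisection:
    left half, then right half, separator last, so the two halves are
    decoupled and could be factored by different processors."""
    n = hi - lo
    if n <= 2:
        return list(range(lo, hi))
    mid = lo + n // 2                       # one-vertex separator of a chain
    return (nested_dissection_order(lo, mid)
            + nested_dissection_order(mid + 1, hi)
            + [mid])

n = 15
# SPD model problem: a 1-D Laplacian, standing in for the symmetric systems
# a finite integration discretization produces (no singularities here).
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

p = nested_dissection_order(0, n)
L = np.linalg.cholesky(A[np.ix_(p, p)])     # factor of the permuted system

# The top-level separator is ordered last, so the two subdomain blocks of L
# do not couple: their factorizations are independent parallel tasks.
half = (n - 1) // 2
print(np.allclose(L[half:n - 1, :half], 0.0))   # -> True
```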

    From hybrid architectures to hybrid solvers

    Solving large sparse systems of linear equations is a crucial and time-consuming step arising in many scientific and engineering applications. Consequently, many parallel techniques for sparse matrix solution have been studied, designed and implemented, based either on factorization or on hybrid iterative-direct approaches. In this context, graph partitioning and nested dissection ideas have played a crucial role. The main goal of this presentation will be to give an overview of the continuum between these various algorithmic approaches and to present improvements to the algorithms and to the associated parallel implementations in a manycore context. Numerical experiments on large irregular real-life problems will illustrate this work.
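
    As a rough sketch of that continuum (a toy 1-D partition with assumed names, not any particular solver's API): factor the two subdomain interiors with a direct method, then solve the interface Schur complement iteratively with CG, applying it only through matrix-vector products that reuse the interior factorizations.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy 1-D Laplacian split into two interiors (I1, I2) and one interface
# unknown g, mimicking a graph-partitioned system. In 2-D/3-D the interface
# is large, which is where the hybrid direct/iterative trade-off lives.
n = 41
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)
g = n // 2
I1, I2 = np.arange(0, g), np.arange(g + 1, n)

lu1 = spla.splu(A[np.ix_(I1, I1)].tocsc())   # direct factorization of the
lu2 = spla.splu(A[np.ix_(I2, I2)].tocsc())   # interiors (one per process)

A1g = A[np.ix_(I1, [g])].toarray()
A2g = A[np.ix_(I2, [g])].toarray()
Agg = A[g, g]

def schur_matvec(xg):
    # S = Agg - A1g^T A11^{-1} A1g - A2g^T A22^{-1} A2g, applied implicitly
    return (Agg * xg
            - A1g.T @ lu1.solve(A1g @ xg)
            - A2g.T @ lu2.solve(A2g @ xg))

S = spla.LinearOperator((1, 1), matvec=schur_matvec, dtype=float)
rhs = b[[g]] - A1g.T @ lu1.solve(b[I1]) - A2g.T @ lu2.solve(b[I2])
xg, _ = spla.cg(S, rhs)                      # iterative interface solve
x = np.empty(n)
x[g] = xg[0]
x[I1] = lu1.solve(b[I1] - (A1g @ xg).ravel())   # back-substitute interiors
x[I2] = lu2.solve(b[I2] - (A2g @ xg).ravel())
print(np.linalg.norm(A @ x - b))             # ~1e-15: matches a full solve
```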

    Partitioning, Ordering, and Load Balancing in a Hierarchically Parallel Hybrid Linear Solver

    PDSLin is a general-purpose algebraic parallel hybrid (direct/iterative) linear solver based on the Schur complement method. The most challenging step of the solver is the computation of a preconditioner based on an approximate global Schur complement. We investigate two combinatorial problems to enhance PDSLin's performance at this step. The first is a multi-constraint partitioning problem to balance the workload while computing the preconditioner in parallel. For this, we describe and evaluate a number of graph and hypergraph partitioning algorithms that satisfy our particular objective and constraints. The second problem is reordering the sparse right-hand side vectors to improve data access locality during the parallel solution of a sparse triangular system with multiple right-hand sides, which speeds up the elimination of the unknowns associated with the interface. We study two reordering techniques: one based on a postordering of the elimination tree and the other based on a hypergraph partitioning. To demonstrate the effect of these techniques on the performance of PDSLin, we present numerical results for large-scale linear systems arising from two applications of interest: numerical simulations of accelerator cavities and of fusion devices.
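
    A small sketch of the postorder-based reordering idea, under assumed data (the parent array and right-hand side rows below are invented; PDSLin's implementation differs): right-hand sides whose single nonzero falls in the same subtree of the elimination tree touch the same parts of the triangular factors, so processing them in postorder improves data access locality.

```python
def postorder(parent):
    """Postorder of an elimination tree given as a parent array
    (parent[i] = -1 for a root); children are visited before parents."""
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for v, p in enumerate(parent):
        (roots if p < 0 else children[p]).append(v)
    order, stack = [], [(r, False) for r in reversed(roots)]
    while stack:
        v, done = stack.pop()
        if done:
            order.append(v)
        else:
            stack.append((v, True))
            stack.extend((c, False) for c in reversed(children[v]))
    return order

# Hypothetical 5-node elimination tree: 1 under 0, 3 under 2, root 4.
parent = [4, 0, 4, 2, -1]
rank = {v: k for k, v in enumerate(postorder(parent))}  # post = [1,0,3,2,4]

rhs_rows = [3, 0, 4, 1]               # row of each RHS's single nonzero
batch = sorted(rhs_rows, key=rank.get)
print(batch)                          # -> [1, 0, 3, 4]: subtree-local order
```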

    Parallel triangular solution in the out-of-core multifrontal approach for solving large sparse linear systems

    We consider the solution of very large systems of linear equations with direct multifrontal methods. In this context the size of the factors is an important limitation on the use of sparse direct solvers. We therefore assume that the factors have been written to the local disks of our target multiprocessor machine during a parallel factorization. Our main focus is the study and design of efficient approaches for the forward and backward substitution phases after a sparse multifrontal factorization. These phases involve sparse triangular solution and have often been neglected in previous work on sparse direct factorization; in many applications, however, the time for the solution phase can be the main performance bottleneck. This thesis consists of two parts. The focus of the first part is optimizing the out-of-core performance of the solution phase; the focus of the second part is further improving performance by exploiting the sparsity of the right-hand side vectors. In the first part, we describe and compare two approaches to accessing data on the hard disk. We then show that in a parallel environment the task scheduling can strongly influence performance. We prove that a constrained ordering of the tasks is possible: it introduces no deadlock and it improves performance. Experiments on large real test problems (more than 8 million unknowns) using an out-of-core version of the sparse multifrontal code MUMPS (MUltifrontal Massively Parallel Solver) are used to analyse the behaviour of our algorithms. In the second part, we are interested in applications with multiple sparse right-hand sides, particularly those with single nonzero entries. The motivating applications arise in electromagnetism and data assimilation. In such applications, we need either to compute the null space of a highly rank-deficient matrix or to compute entries in the inverse of a matrix associated with the normal equations of a linear least-squares problem. We cast both of these problems as linear systems with multiple right-hand side vectors, each containing a single nonzero entry. We describe, implement and comment on efficient algorithms to reduce the input-output cost during an out-of-core execution, showing how the sparsity of the right-hand side can be exploited to limit both the number of operations and the amount of data accessed. The work presented in this thesis has been partially supported by the SOLSTICE ANR project (ANR-06-CIS6-010).
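
    A minimal sketch of the second part's formulation, with SciPy's SuperLU standing in for MUMPS and none of the out-of-core or tree-pruning machinery: each requested entry of the inverse is a solve whose right-hand side is a canonical basis vector, so a single factorization is reused across many single-nonzero right-hand sides.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# A^{-1}[i, j] = e_i^T (A^{-1} e_j): every requested inverse entry is a
# linear solve whose right-hand side has exactly one nonzero. The factors
# are computed once; exploiting the RHS sparsity to prune the triangular
# solves (as in the thesis) is not attempted in this sketch.
n = 200
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
lu = spla.splu(A)                        # factor once, reuse for every RHS

wanted = [(0, 0), (10, 10), (5, 120)]    # entries of A^{-1} to compute
for i, j in wanted:
    e_j = np.zeros(n)
    e_j[j] = 1.0                         # sparse RHS: a single nonzero
    print((i, j), lu.solve(e_j)[i])      # entry i of column j of A^{-1}
```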

    Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers

    In this paper we describe a unified scheme for implementing an interior point method (IPM) over a range of computer architectures. In the inner iteration of the IPM, a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse-grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments for why integration of the IPM within a sparse simplex solver is important and outline how the system is designed to achieve this integration.
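
    A dense toy sketch of that inner solve (the notation Theta = diag(x/s) and the sizes here are assumptions, not the paper's data structures): form the SSPD normal-equations matrix A Theta A^T and compute the search direction with a Cholesky factorization, i.e. the direct option; an iterative method would replace the last two lines.

```python
import numpy as np

# One IPM inner iteration reduces to the SSPD solve (A Theta A^T) dy = r,
# with Theta = diag(x / s) changing at every iteration.
rng = np.random.default_rng(0)
m, n = 5, 12
A = rng.standard_normal((m, n))       # LP constraint matrix (dense toy)
x = rng.uniform(0.5, 2.0, size=n)     # current primal iterate, x > 0
s = rng.uniform(0.5, 2.0, size=n)     # current dual slacks,   s > 0
r = rng.standard_normal(m)            # right-hand side for this iteration

M = (A * (x / s)) @ A.T               # A Theta A^T; SSPD if A has full rank
L = np.linalg.cholesky(M)             # direct method
dy = np.linalg.solve(L.T, np.linalg.solve(L, r))   # L L^T dy = r
print(np.linalg.norm(M @ dy - r))     # ~1e-15
```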

    Domain Decomposition Based High Performance Parallel Computing

    The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers, but parallel sparse direct solvers are not found to exhibit good scalability on their own. Hence, the parallelization is carried out using domain decomposition techniques. A highly efficient sparse direct solver, PARDISO, is used in this study. The scalability of both Newton and modified Newton algorithms is tested.
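
    A minimal sketch of the Newton versus modified Newton trade-off being tested, with SciPy's SuperLU standing in for PARDISO and a 1-D toy nonlinear problem standing in for the Navier-Stokes equations: modified Newton amortizes one direct factorization over many steps at the price of slower convergence.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy nonlinear problem -u'' + u^3 = 1 on a 1-D grid with zero boundary
# values; only the solver pattern matters here, not the physics.
n, h = 100, 1.0 / 101
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

def residual(u):
    return K @ u + u**3 - 1.0

def jacobian(u):
    return (K + sp.diags(3.0 * u**2)).tocsc()

u = np.zeros(n)
lu = spla.splu(jacobian(u))           # modified Newton: factor once, reuse;
for _ in range(30):                   # full Newton would refactor per step
    du = lu.solve(-residual(u))
    u += du
    if np.linalg.norm(du) < 1e-10:    # converged
        break
```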