
    High-order div- and quasi curl-conforming basis functions for Calderon multiplicative preconditioning of the EFIE

    A new high-order Calderon multiplicative preconditioner (HO-CMP) for the electric field integral equation (EFIE) is presented. In contrast to previous CMPs, the proposed preconditioner allows for high-order surface representations and current expansions by using a novel set of high-order quasi curl-conforming basis functions. Like its predecessors, the HO-CMP can be seamlessly integrated into existing EFIE codes. Numerical results demonstrate that the linear systems of equations obtained using the proposed HO-CMP converge rapidly, regardless of the mesh density and the order of the current expansion.

    Nonstationary two-stage multisplitting methods for symmetric positive definite matrices

    Nonstationary synchronous two-stage multisplitting methods for the solution of symmetric positive definite linear systems of equations are considered. The convergence properties of these methods are studied. Relaxed variants are also discussed. The main tool for the construction of the two-stage multisplittings and the related theoretical investigation is the diagonally compensated reduction (cf. [1]).
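The flavor of method studied in this paper can be sketched as follows: an outer iteration based on a splitting A = M − N, with an inner stage that only approximately solves with M using a varying (nonstationary) number of Jacobi sweeps. This is a minimal illustration under assumed choices (a Gauss-Seidel-type outer splitting and a toy test matrix), not the paper's actual construction, which uses diagonally compensated reduction.

```python
import numpy as np

def two_stage_solve(A, b, outer_iters=300):
    """Illustrative two-stage iteration for an SPD system A x = b.

    Outer splitting: A = M - N with M = D + L, the lower-triangular part
    of A (a Gauss-Seidel-type choice, assumed for illustration). The inner
    stage runs a nonstationary number of Jacobi sweeps to approximately
    solve M y = r at each outer step.
    """
    Dinv = 1.0 / np.diag(A)          # inverse diagonal of A (and of M)
    L = np.tril(A, -1)               # strictly lower-triangular part of A
    x = np.zeros_like(b)
    for k in range(outer_iters):
        r = b - A @ x
        # Nonstationary inner stage: the sweep count grows with k; once it
        # reaches n, the triangular inner solve is exact (nilpotency).
        s = min(3 + k, len(b))
        y = np.zeros_like(b)
        for _ in range(s):
            y = Dinv * (r - L @ y)   # one Jacobi sweep on M y = r
        x = x + y
    return x

# Tiny SPD test problem: the 1-D Laplacian tridiag(-1, 2, -1).
n = 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = two_stage_solve(A, b)
```

When the inner solves are exact, this reduces to the classical Gauss-Seidel iteration; the two-stage idea trades inner accuracy for cheaper, more parallel sweeps.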

    Optimization of a parallel Monte Carlo method for linear algebra problems

    Many problems in science and engineering can be represented by Systems of Linear Algebraic Equations (SLAEs). Numerical methods, either direct or iterative, are used to solve such systems. Depending on their size and other characteristics, these systems can be very difficult to solve, even for iterative methods, requiring long times and large amounts of computational resources. In such cases a preconditioning approach should be applied. Preconditioning is a technique used to transform an SLAE into an equivalent but simpler system that requires less time and effort to solve. The matrix that performs this transformation is called the preconditioner [7]. There are preconditioners for both direct and iterative methods, but they are more commonly used with the latter. In general, a preconditioned system requires less effort to solve than the original one. For example, when an iterative method is used, fewer iterations are required, or each iteration takes less time, depending on the quality and efficiency of the preconditioner. There are different classes of preconditioners, but we focus only on those based on the SParse Approximate Inverse (SPAI) approach. These algorithms rely on the fact that an approximate inverse of a given SLAE matrix can be used to approximate its solution or to reduce its complexity. Monte Carlo methods are probabilistic methods that use random numbers either to simulate stochastic behaviour or to estimate the solution of a problem. They are good candidates for parallelization because many independent samples are used to estimate the solution; these samples can be computed in parallel, thereby speeding up the solution process [27]. In the past there has been considerable research on the use of Monte Carlo methods to calculate SPAI preconditioners [1, 27, 10].
In this work we present the implementation of a SPAI preconditioner based on a Monte Carlo method. This algorithm calculates the matrix inverse by sampling a random variable that approximates the Neumann series expansion. Using the Neumann series, the inverse of a matrix A can be computed by successively adding the powers that appear in the series expansion of (I − A)^−1. Given the stochastic nature of the Monte Carlo algorithm, the computational effort required to find an element of the inverse matrix is independent of the size of the matrix. This makes it possible to target systems that, due to their size, are prohibitive for common deterministic approaches [27]. A great part of this work focuses on the enhancement of this algorithm. First, existing errors in the implementation were fixed, enabling the algorithm to target larger systems. Then multiple optimizations were applied at different stages of the implementation, making better use of resources and improving the performance of the algorithm. Four optimizations, each yielding consistent improvements, were performed:
1. An inefficient implementation of the realloc function within the MPI library caused the application to rapidly run out of memory. This function was replaced by malloc, together with slight modifications to estimate the size of matrix A.
2. A coordinate format (COO) was introduced in the algorithm's core to make more efficient use of memory, avoiding several unnecessary memory accesses.
3. A method to produce an intermediate matrix P was shown to give results similar to the default one while reducing P to a single vector, thus requiring less data. Since this data was broadcast, shrinking it translated into a reduction of the broadcast time.
4. Four individual procedures, each of which accessed the whole initial matrix in memory, were merged into two, thereby reducing the number of memory accesses.
For each optimization applied, a comparison was performed to show the particular improvements achieved. A set of different matrices, representing different SLAEs, was used to show the consistency of these improvements. To provide insight into the scalability issues of the algorithm, further approaches are presented:
1. Given that the original version of this algorithm was designed for a cluster of single-core machines, a hybrid MPI + OpenMP approach was proposed to target today's multi-core architectures. Surprisingly, this approach did not yield any improvement, but it was useful in exposing a scalability problem related to the random memory access pattern.
2. Common MPI implementations of the broadcast operation do not take into account the different latencies of inter-node and intra-node communication [25]. We therefore implemented the broadcast in two steps: first reaching a single process on each compute node, and then using those processes to perform a local broadcast within their nodes. Results showed that this method can lead to improvements for very large systems.
Finally, a comparison is carried out between the optimized version of the Monte Carlo algorithm and the state-of-the-art Modified SPAI (MSPAI). Four metrics are used to compare these approaches:
1. The time needed for the preconditioner construction.
2. The time needed by the solver to calculate the solution of the preconditioned system.
3. The sum of the previous two metrics, which gives an overview of the quality and efficiency of the preconditioner.
4. The number of cores used in the preconditioner construction, which gives an idea of the energy efficiency of the algorithm.
Results from this comparison showed that the Monte Carlo algorithm can deal with both symmetric and nonsymmetric matrices, while MSPAI only performs well with nonsymmetric ones. Furthermore, the Monte Carlo algorithm is always faster for the preconditioner construction and, most of the time, also for the solver calculation. This means that Monte Carlo produces preconditioners of the same or better quality than MSPAI. Finally, the number of cores used in the Monte Carlo approach is always equal to or smaller than that used by MSPAI.
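The Neumann-series construction at the heart of this algorithm can be illustrated deterministically: A⁻¹ = Σ_{k≥0} (I − A)^k whenever the spectral radius of I − A is below 1, and the Monte Carlo method estimates entries of this sum by sampling random walks rather than forming it. The following sketch (function name, test matrix, and scaling are illustrative assumptions, not the thesis code) truncates the series directly:

```python
import numpy as np

def neumann_approx_inverse(A, terms=60):
    """Truncated Neumann-series approximate inverse (deterministic sketch).

    Uses A^{-1} = sum_{k>=0} (I - A)^k, valid when the spectral radius of
    C = I - A is below 1. The Monte Carlo preconditioner samples entries
    of this same sum instead of accumulating it explicitly.
    """
    n = A.shape[0]
    C = np.eye(n) - A
    term = np.eye(n)                  # current power C^k, starting at C^0
    P = np.eye(n)                     # running partial sum of the series
    for _ in range(1, terms):
        term = term @ C
        P = P + term
    return P

# Example: a diagonally dominant SPD matrix, scaled so that rho(I - A) < 1.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
A_scaled = A / 6.0                    # illustrative scaling, not from the thesis
P = neumann_approx_inverse(A_scaled)
```

A truncated (or sampled) P of this kind is then used as a SPAI-style preconditioner: applying P to the system clusters the spectrum near the identity, so the iterative solver needs fewer iterations.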

    A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems


    Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning

    When using incomplete factorization preconditioners with an iterative method to solve large sparse linear systems, each application of the preconditioner involves solving two sparse triangular systems. These triangular systems are challenging to solve efficiently on computers with high levels of concurrency. On such computers, it has recently been proposed to use Jacobi iterations, which are highly parallel, to approximately solve the triangular systems from incomplete factorizations. The effectiveness of this approach, however, is problem-dependent: the Jacobi iterations may not always converge quickly enough for all problems. Thus, as a necessary and important step to evaluate this approach, we experimentally test the approach on a large number of realistic symmetric positive definite problems. We also show that by using block Jacobi iterations, we can extend the range of problems for which such an approach can be effective. For block Jacobi iterations, it is essential that the blocking be cognizant of the matrix structure.
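The idea can be sketched in a few lines: instead of a sequential forward substitution, the triangular system L x = b is solved by Jacobi sweeps x_{k+1} = D⁻¹(b − (L − D) x_k), where each sweep is a single matrix-vector product and hence fully parallel. This is a minimal dense-matrix illustration (names and the small test factor are assumptions, not the paper's code); the block variant replaces the scalar diagonal with small diagonal blocks solved exactly, which is why the blocking must respect the matrix structure.

```python
import numpy as np

def jacobi_triangular_solve(L, b, sweeps=5):
    """Approximately solve the lower-triangular system L x = b with Jacobi
    sweeps: x_{k+1} = D^{-1} (b - (L - D) x_k). Because the iteration
    matrix is strictly triangular (nilpotent), the sweeps converge in at
    most n steps; in practice a few often suffice.
    """
    d = np.diag(L)
    E = L - np.diag(d)                # strictly lower-triangular part
    x = np.zeros_like(b)
    for _ in range(sweeps):
        x = (b - E @ x) / d           # one parallel sweep
    return x

# Stand-in for a factor from an incomplete factorization: here an exact
# Cholesky factor of a small SPD matrix, purely for illustration.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
L = np.linalg.cholesky(A)
b = np.ones(3)
x = jacobi_triangular_solve(L, b)
```

The problem-dependence noted in the abstract enters through how quickly the sweeps converge for a given factor; the sweep count is the tuning knob.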

    Preconditioning for Sparse Linear Systems at the Dawn of the 21st Century: History, Current Developments, and Future Perspectives

    Iterative methods are currently the solvers of choice for large sparse linear systems of equations. However, it is well known that the key factor for accelerating, or even enabling, convergence is the preconditioner. Research on preconditioning techniques has characterized the last two decades. Nowadays, there are a number of different options to consider when choosing the most appropriate preconditioner for the specific problem at hand. The present work provides an overview of the most popular algorithms available today, emphasizing their respective merits and limitations. The overview is restricted to algebraic preconditioners, that is, general-purpose algorithms requiring knowledge of the system matrix only, independently of the specific problem it arises from. Along with the traditional distinction between incomplete factorizations and approximate inverses, the most recent developments are considered, including the scalable multigrid and parallel approaches that represent the current frontier of research. A separate section devoted to saddle-point problems, which arise in many different applications, closes the paper.
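To make the role of the preconditioner concrete, here is a minimal sketch of the conjugate gradient method with the simplest algebraic preconditioner, Jacobi (M = diag(A)), which needs only the system matrix. The function name and test system are illustrative choices, not taken from the survey:

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=100):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner: the
    preconditioner solve z = M^{-1} r is an elementwise multiply by
    1/diag(A), the cheapest purely algebraic choice.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r                # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # new search direction
        rz = rz_new
    return x

# SPD test system with a strongly varying diagonal, where diagonal
# scaling pays off.
n = 20
A = np.diag(np.linspace(1.0, 100.0, n)) + 0.1 * np.ones((n, n))
b = np.ones(n)
x = pcg(A, b, 1.0 / np.diag(A))
```

The algorithms surveyed in the paper (incomplete factorizations, approximate inverses, algebraic multigrid) all slot into the same place: they replace the `M_inv_diag * r` step with a more powerful approximation of A⁻¹r.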

    Computational and numerical aspects of full waveform seismic inversion

    Full-waveform inversion (FWI) is a nonlinear optimisation procedure, seeking to match synthetically-generated seismograms with those observed in field data by iteratively updating a model of the subsurface seismic parameters, typically compressional wave (P-wave) velocity. Advances in high-performance computing have made FWI of 3-dimensional models feasible, but the low sensitivity of the objective function to deeper, low-wavenumber components of velocity makes these difficult to recover using FWI relative to more traditional, less automated, techniques. While the use of inadequate physics during the synthetic modelling stage is a contributing factor, I propose that this weakness is substantially one of ill-conditioning, and that efforts to remedy it should focus on the development of both more efficient seismic modelling techniques, and more sophisticated preconditioners for the optimisation iterations. I demonstrate that the problem of poor low-wavenumber velocity recovery can be reproduced in an analogous one-dimensional inversion problem, and that in this case it can be remedied by making full use of the available curvature information, in the form of the Hessian matrix. In two or three dimensions, this curvature information is prohibitively expensive to obtain and store as part of an inversion procedure. I obtain the complete Hessian matrices for a realistically-sized, two-dimensional, towed-streamer inversion problem at several stages during the inversion and link properties of these matrices to the behaviour of the inversion. Based on these observations, I propose a method for approximating the action of the Hessian and suggest it as a path forward for more sophisticated preconditioning of the inversion process.
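The ill-conditioning argument can be seen in a toy two-parameter analogue (not the author's actual 1-D inversion): on a quadratic misfit with one stiff and one weakly constrained direction, plain gradient descent barely updates the weak component, while a Hessian-preconditioned (Newton) step recovers it immediately. All numbers below are assumed for illustration.

```python
import numpy as np

# Toy ill-conditioned quadratic misfit f(x) = 0.5 x^T H x - b^T x, standing
# in for an FWI objective with one well-constrained direction and one
# weakly constrained (low-wavenumber-like) direction.
H = np.diag([100.0, 0.01])
b = np.array([1.0, 1.0])

def grad(x):
    return H @ x - b

# Plain gradient descent: the stable step size is set by the largest
# curvature, so the weakly constrained component barely moves.
x_gd = np.zeros(2)
step = 1.0 / 100.0
for _ in range(100):
    x_gd = x_gd - step * grad(x_gd)

# Hessian-preconditioned (Newton) step: curvature rescales both directions,
# recovering the weak component in a single update.
x_newton = -np.linalg.solve(H, grad(np.zeros(2)))
```

In realistic 2-D and 3-D FWI the full Hessian is too expensive to form, which is what motivates the thesis's approximation of its action for use as a preconditioner.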

    Parallel Runge-Kutta-Nyström methods
