10 research outputs found

    Towards a Parallel Tile LDL Factorization for Multicore Architectures

    No full text
    The increasing number of cores in modern architectures requires the development of new algorithms as a means of achieving concurrency and hence scalability. This paper presents an algorithm to compute the LDL^T factorization of symmetric indefinite matrices without taking pivoting into consideration. The algorithm, based on the factorizations presented by Buttari et al. [11], represents operations as a sequence of small tasks that operate on square blocks of data. These tasks can be scheduled for execution based on the dependencies among them and on the computational resources available. This allows an out-of-order execution of tasks that removes the intrinsically sequential nature of the factorization. Numerical and performance results are presented. Numerical results were limited to matrices for which pivoting is not numerically necessary. A performance comparison between LDL^T, Cholesky and LU factorizations, and the performance of the kernels required by LDL^T, which are an extension of level-3 BLAS kernels, are presented.
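To make the factorization in the abstract above concrete, here is a minimal element-wise sketch of LDL^T without pivoting in plain Python. It is not the paper's tiled, task-scheduled algorithm: the function name `ldlt_no_pivot` and the 3x3 test matrix are illustrative assumptions, and the loop runs sequentially on scalars rather than on square blocks.

```python
def ldlt_no_pivot(a):
    """Factor a symmetric matrix as A = L D L^T without pivoting.

    Returns (L, d): L is unit lower triangular (list of rows) and d holds
    the diagonal of D. A zero pivot raises ZeroDivisionError, which is why
    this variant is only safe when pivoting is not numerically necessary.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Pivot: diagonal entry minus contributions of earlier columns.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]
    return L, d


def reconstruct(L, d):
    """Multiply L * diag(d) * L^T back together to check the factors."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical symmetric indefinite matrix that needs no pivoting.
A = [[4.0, 2.0, -2.0],
     [2.0, -3.0, 1.0],
     [-2.0, 1.0, 5.0]]
L, d = ldlt_no_pivot(A)
A_check = reconstruct(L, d)
```

The mixed signs in `d` reflect that the matrix is indefinite. In a tiled version, each scalar update in the inner loop would become a block kernel operating on a square tile.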

    Communication-Avoiding Symmetric-Indefinite Factorization

    No full text
    We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen’s triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = P L T L^T P^T, where P is a permutation matrix, L is lower triangular, and T is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. The current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.

    An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems

    No full text
    Randomized algorithms are gaining ground in high-performance computing applications as they have the potential to outperform deterministic methods, while still providing accurate results. We propose a randomized solver for distributed multicore architectures to efficiently solve large dense symmetric indefinite linear systems that are encountered, for instance, in parameter estimation problems or electromagnetism simulations. The contribution of this paper is to propose efficient kernels for applying random butterfly transformations and a new distributed implementation combined with a runtime (PaRSEC) that automatically adjusts data structures, data mappings, and the scheduling as systems scale up. Both the parallel distributed solver and the supporting runtime environment are innovative. To our knowledge, the randomization approach associated with this solver has never been used in public domain software for symmetric indefinite systems. The underlying runtime framework allows seamless data mapping and task scheduling, mapping its capabilities to the underlying hardware features of heterogeneous distributed architectures. The performance of our software is similar to that obtained for symmetric positive definite systems, but requires only half the execution time and half the amount of data storage of a general dense solver.
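The random butterfly transformation mentioned in the abstract above can be illustrated with a small dense sketch in plain Python. This is an assumption-laden toy, not the paper's distributed kernels: it builds a single depth-1 butterfly U = (1/sqrt(2)) [[R0, R1], [R0, -R1]] with random diagonal blocks R0 and R1, and the entry distribution exp(r/10) with r uniform in [-0.5, 0.5] is an assumed choice. The helper names `butterfly`, `matmul` and `transpose` are hypothetical.

```python
import math
import random


def butterfly(n, rng):
    """Depth-1 random butterfly matrix of even size n, as a list of rows."""
    h = n // 2
    # Assumed entry distribution: exp(r / 10) with r uniform in [-0.5, 0.5].
    r0 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = 1.0 / math.sqrt(2.0)
    B = [[0.0] * n for _ in range(n)]
    for i in range(h):
        B[i][i] = s * r0[i]           # top-left block:      R0
        B[i][h + i] = s * r1[i]       # top-right block:     R1
        B[h + i][i] = s * r0[i]       # bottom-left block:   R0
        B[h + i][h + i] = -s * r1[i]  # bottom-right block: -R1
    return B


def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


# Symmetric indefinite matrix whose first pivot is exactly zero,
# so an unpivoted LDL^T factorization would break down immediately.
A = [[0.0, 1.0],
     [1.0, 0.0]]

U = butterfly(2, random.Random(0))
# Two-sided transform preserves symmetry: A_r = U^T A U.
A_r = matmul(transpose(U), matmul(A, U))
```

For this 2x2 case the transformed matrix is diagonal with a positive and a negative entry, so the leading pivot is no longer zero. A solver built this way would factor `A_r` without pivoting, solve `A_r y = U^T b`, and recover `x = U y`.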

    a

    No full text
    a blocked Aasen’s algorithm wit