2 research outputs found

    Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts

    Over the past few years, multicore systems have become more and more powerful and, thereby, very useful in high-performance computing. However, many applications, such as some linear algebra algorithms, still cannot take full advantage of these systems. This is mainly due to the shortage of optimization techniques dealing with irregular control structures. In particular, the well-known polyhedral model fails to optimize loop nests whose bounds and/or array references are not affine functions. This is more likely to occur when handling sparse matrices in their packed formats. In this paper, we propose to use 2d-packed layouts and simple affine transformations to enable optimization of triangular and banded matrix operations. The benefit of our proposal is shown through an experimental study over a set of linear algebra benchmarks.
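To illustrate why packed formats produce non-affine array references, the sketch below uses the conventional one-dimensional row-major packed storage of a lower-triangular matrix. This is an assumption made for illustration: it is not the 2d-packed layout the paper proposes, and the names `packed_index` and `pack_lower` are hypothetical. The index `i * (i + 1) // 2 + j` is quadratic in `i`, which is the kind of reference the polyhedral model cannot handle directly.

```python
def packed_index(i, j):
    """Position of element (i, j), with j <= i, in row-major packed storage.

    Assumption: conventional 1-D packed lower-triangular layout,
    not the paper's 2d-packed layout.
    """
    if j > i:
        raise ValueError("(i, j) lies above the diagonal and is not stored")
    # Rows 0..i-1 hold 1 + 2 + ... + i = i * (i + 1) / 2 elements in total,
    # so the offset is quadratic in i, hence not an affine function of (i, j).
    return i * (i + 1) // 2 + j


def pack_lower(a):
    """Copy the lower triangle of a square matrix (list of rows) into a flat list."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(i + 1)]


a = [
    [1, 0, 0],
    [2, 3, 0],
    [4, 5, 6],
]
packed = pack_lower(a)

# Every stored element can be recovered through the quadratic index.
for i in range(len(a)):
    for j in range(i + 1):
        assert packed[packed_index(i, j)] == a[i][j]
```

A dense two-dimensional array would instead be addressed as `i * n + j`, which is affine in `i` and `j` for a fixed `n`.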

    A Scalable Parallel Block Algorithm for Band Cholesky Factorization

    In this paper, we present an algorithm for computing the Cholesky factorization of large banded matrices on IBM distributed-memory parallel machines. The algorithm aims at optimizing single-node performance and minimizing communication overheads. An important result of our paper is that the proposed algorithm is strongly scalable: the number of processors that can be efficiently utilized grows quadratically with the bandwidth of the matrix.
    1 Introduction
    Many of the matrices arising from large scientific applications have a banded structure. Banded solvers, as opposed to dense solvers, are difficult to parallelize because of their lower computation-to-communication ratios. Therefore, algorithms for such problems need to employ special techniques to reduce communication overheads. Several researchers have investigated parallel algorithms for solving band systems [2, 3, 4, 5, 6, 7, 8]. Most of these algorithms are either developed for special purpose arch..