25 research outputs found

    Automatic Generation of Efficient Linear Algebra Programs

    Full text link
    The level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently captured by high-level languages and libraries such as Matlab and Eigen, while the latter is expressed by the kernels included in the BLAS and LAPACK libraries. Unfortunately, the translation from a high-level computation to an efficient sequence of kernels is a far-from-trivial task that requires extensive knowledge of both linear algebra and high-performance computing. Internally, almost all high-level languages and libraries use efficient kernels; however, the translation algorithms are too simplistic and thus lead to a suboptimal use of those kernels, with significant performance losses. In order to both achieve the productivity that comes with high-level languages and exploit the efficiency of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels. In 25 application problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10x.
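
    The abstract does not show what such a translation looks like. The following is a minimal sketch of the kind of mapping described, written with SciPy's low-level BLAS/LAPACK wrappers standing in for direct kernel calls; the example problem (least squares via the normal equations) and the particular kernel sequence (SYRK, GEMV, POSV) are illustrative assumptions, not output actually produced by Linnea.

```python
# Minimal sketch (assumed, not actual Linnea output) of mapping the
# high-level expression  x = inv(A^T A) A^T b  onto BLAS/LAPACK kernels.
import numpy as np
from scipy.linalg import blas, lapack

rng = np.random.default_rng(0)
A = np.asfortranarray(rng.standard_normal((1000, 200)))
b = rng.standard_normal(1000)

# Naive, Matlab/Eigen-style evaluation: explicit inverse, generic matmuls.
x_naive = np.linalg.inv(A.T @ A) @ (A.T @ b)

# Kernel-level evaluation of the same expression:
G = blas.dsyrk(1.0, A, trans=1)        # SYRK:  G = A^T A (upper triangle)
y = blas.dgemv(1.0, A, b, trans=1)     # GEMV:  y = A^T b
_, x_kern, info = lapack.dposv(G, y)   # POSV:  Cholesky-based solve of G x = y
assert info == 0 and np.allclose(x_naive, np.ravel(x_kern))
```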

    A Case Study in Mechanically Deriving Dense Linear Algebra Code

    Get PDF
    Design by Transformation (DxT) is a top-down approach to mechanically derive high-performance algorithms for dense linear algebra. We use DxT to derive the implementation of a representative matrix operation, the two-sided Trmm. We start with a knowledge base of transformations that were encoded for a simpler set of operations, the level-3 BLAS, and add only a few transformations to accommodate the more complex two-sided Trmm. These additions explode the search space of our prototype system, DxTer, requiring the novel techniques defined in this paper to eliminate large segments of the search space that contain suboptimal algorithms. Performance results for the mechanically optimized implementations on 8,192 cores of a BlueGene/P architecture are given.
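
    The abstract refers to eliminating segments of a search space that contain suboptimal algorithms. As a rough illustration of that general idea only (a toy branch-and-bound over rewrite rules with a cost model, not DxTer's actual machinery), consider the sketch below.

```python
# Toy sketch (not DxTer) of transformation-based derivation with pruning:
# candidate implementations are enumerated by applying rewrite rules from a
# knowledge base, and branches whose cost bound already exceeds the best
# complete implementation found so far are discarded.
import heapq
import itertools

def derive(start, rules, cost, is_final):
    """Branch-and-bound search over sequences of rewrites.

    start    -- initial (abstract) algorithm description
    rules    -- functions mapping a description to refined candidates
    cost     -- lower-bound cost estimate for a description
    is_final -- True when a description is fully refined (executable)
    """
    tie = itertools.count()                  # tie-breaker for the heap
    best, best_cost = None, float("inf")
    frontier = [(cost(start), next(tie), start)]
    while frontier:
        c, _, node = heapq.heappop(frontier)
        if c >= best_cost:                   # prune: cannot beat the incumbent
            continue
        if is_final(node):
            best, best_cost = node, c
            continue
        for rule in rules:
            for child in rule(node):
                cc = cost(child)
                if cc < best_cost:           # keep only branches that may win
                    heapq.heappush(frontier, (cc, next(tie), child))
    return best, best_cost

# Tiny demo: refine an abstract "solve" step into one of two kernel sequences,
# scoring candidates by a made-up cost equal to their length.
rules = [lambda d: [d.replace("solve", "getrf+getrs"),
                    d.replace("solve", "potrf+trsm")] if "solve" in d else []]
print(derive("solve", rules, cost=len, is_final=lambda d: "solve" not in d))
```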

    Elemental: A new framework for distributed memory dense matrix computations

    Get PDF
    Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid-1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited, since it will likely not be practical to use MPI-based implementations. Thus, this is a good time to review what lessons we have learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show the new solution achieves considerably better performance than the previously developed libraries.
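
    The abstract does not describe the proposed framework's data layout, but Elemental is generally associated with an element-wise cyclic ("elemental") 2D distribution, in contrast to ScaLAPACK's block-cyclic layout. The following is a small, assumed sketch of which process owns which matrix entry under such a distribution; the grid shape and helper names are illustrative, not Elemental's API.

```python
# Sketch of element-wise 2D cyclic ownership, the kind of distribution the
# Elemental framework is associated with (vs. ScaLAPACK's block-cyclic
# layout). Grid shape and helper names are assumptions for illustration.
import numpy as np

def owner(i, j, grid_rows, grid_cols):
    """Process (row, col) that owns global entry (i, j)."""
    return (i % grid_rows, j % grid_cols)

def local_part(m, n, p_row, p_col, grid_rows, grid_cols):
    """Global row/column indices stored by process (p_row, p_col)."""
    return (np.arange(p_row, m, grid_rows),
            np.arange(p_col, n, grid_cols))

# Example: an 8x8 matrix on a 2x3 process grid.
m = n = 8
grid = (2, 3)
rows, cols = local_part(m, n, 0, 0, *grid)
print("process (0, 0) stores rows", rows.tolist(), "and columns", cols.tolist())
print("entry (5, 4) lives on process", owner(5, 4, *grid))
```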

    Making Scientific Computing Libraries Forward Compatible

    No full text
    NSF's Software Infrastructure for Sustained Innovation funds the development of community software in support of scientific computing innovation. A requirement is that the developed software be sustainable. Design by Transformation (DxT) is an approach to software development that views libraries not as instantiated in code, but as expert knowledge that is combined with knowledge about a target architecture by a tool (DxTer) that synthesizes the library implementation. We argue that this approach makes libraries forward compatible to some degree, in that a (disruptive) new architectural advance can be accommodated by encoding knowledge about that architecture. This is particularly important when bugs are not correctness bugs, but performance bugs that affect how fast an answer is obtained and/or how much energy is consumed to compute it. DxT allows a human expert to focus on developing the primitives from which libraries are constructed and on new insights, as opposed to the rote application of known ideas to entire libraries. We summarize our success in the domain of dense linear algebra as evidence of DxT's potential.