25 research outputs found
Recommended from our members
Design by transformation : from domain knowledge to optimized program generation
textExpert design knowledge is essential to develop a library of high-performance software. This includes how to implement and parallelize domain operations, how to optimize implementations, and estimates of which implementation choices are best. An expert repeatedly applies his knowledge, often in a rote and tedious way, to develop all of the related functionality expected from a domain-specific library. Expert knowledge is hard to gain and is easily lost over time when an expert forgets or when a new engineer starts developing code. The domain of dense linear algebra (DLA) is a prime example with software that is so well designed that much of experts' important work has become tediously rote in many ways. In this dissertation, we demonstrate how one can encode design knowledge for DLA so it can be automatically applied to generate code as an expert would or to generate better code. Further, the knowledge is encoded for perpetuity, so it can be reused to make implementing functionality on new hardware easier or it can be used to teach how software is designed to a non-expert. We call this approach to software engineering (encoding expert knowledge and automatically applying it) Design by Transformation (DxT). We present our vision, the methodology, a prototype code generation system, and possibilities when applying DxT to the domain of dense linear algebra.Computer Science
Automatic Generation of Efficient Linear Algebra Programs
The level of abstraction at which application experts reason about linear
algebra computations and the level of abstraction used by developers of
high-performance numerical linear algebra libraries do not match. The former is
conveniently captured by high-level languages and libraries such as Matlab and
Eigen, while the latter expresses the kernels included in the BLAS and LAPACK
libraries. Unfortunately, the translation from a high-level computation to an
efficient sequence of kernels is a task, far from trivial, that requires
extensive knowledge of both linear algebra and high-performance computing.
Internally, almost all high-level languages and libraries use efficient
kernels; however, the translation algorithms are too simplistic and thus lead
to a suboptimal use of said kernels, with significant performance losses. In
order to both achieve the productivity that comes with high-level languages,
and make use of the efficiency of low level kernels, we are developing Linnea,
a code generator for linear algebra problems. As input, Linnea takes a
high-level description of a linear algebra problem and produces as output an
efficient sequence of calls to high-performance kernels. In 25 application
problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen
and Armadillo, with speedups up to and exceeding 10x
A Case Study in Mechanically Deriving Dense Linear Algebra Code
Abstract Design by Transformation (DxT) is a top-down approach to mechanically derive high-performance algorithms for dense linear algebra. We use DxT to derive the implementation of a representative matrix operation, two-sided Trmm. We start with a knowledge base of transformations that were encoded for a simpler set of operations, the level-3 BLAS, and add only a few transformations to accommodate the more complex two-sided Trmm. These additions explode the search space of our prototype system, DxTer, requiring the novel techniques defined in this paper to eliminate large segments of the search space that contain suboptimal algorithms. Performance results for the mechanically optimized implementations on 8,192 cores of a BlueGene/P architecture are given
Elemental: A new framework for distributed memory dense matrix computations
Abstract Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid 1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited since it will likely not be practical to use MPI-based implementations. Thus, this is a good time to review what lessons we have learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show the new solution achieves considerably better performance than the previously developed libraries
Elemental: A new framework for distributed memory dense matrix computations
Abstract Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid 1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited since it will likely not be practical to use MPI-based implementations. Thus, this is a good time to review lessons learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show the new solution achieves competitive, if not superior, performance on large clusters (i.e., on two racks o
Recommended from our members
LDRD final report : autotuning for scalable linear algebra.
This report summarizes the progress made as part of a one year lab-directed research and development (LDRD) project to fund the research efforts of Bryan Marker at the University of Texas at Austin. The goal of the project was to develop new techniques for automatically tuning the performance of dense linear algebra kernels. These kernels often represent the majority of computational time in an application. The primary outcome from this work is a demonstration of the value of model driven engineering as an approach to accurately predict and study performance trade-offs for dense linear algebra computations
Making Scientific Computing Libraries Forward Compatible
<p>NSF's Software Infrastructure for Sustained Innovation funds the development of community software in support of scientific computing innovation. A requirement is that the developed software be sustainable. Design-by-Transformation (DxT) is an approach to software development that views libraries not as instantiated in code, but as expert knowledge that is combined with knowledge about a target architecture by a tool (DxTer) that synthesizes the library implementation. We argue that this approach makes libraries to some degree forward compatible in that a (disruptive) new architectural advance can be accommodated by encoding knowledge about that architecture. This is particularly important when bugs are not correctness bugs, but instead performance bugs that affect how fast an answer is obtained and/or how much energy is consumed to compute the answer. DxT allows a human expert to focus on developing the primitives from which libraries are constructed and new insights as opposed to the rote application of known ideas to entire libraries. We summarize our success in the domain of dense linear algebra as evidence of DxT's potential.</p