414 research outputs found

    On the Performance Prediction of BLAS-based Tensor Contractions

    Full text link
    Tensor operations are surging as the computational building blocks for a variety of scientific simulations and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one- and two-dimensional tensors there exist standardized interfaces and highly-optimized libraries (BLAS), for higher dimensional tensors neither standards nor highly-tuned implementations exist yet. In this paper, we consider contractions between two tensors of arbitrary dimensionality and take on the challenge of generating high-performance implementations by resorting to sequences of BLAS kernels. The approach consists in breaking the contraction down into operations that only involve matrices or vectors. Since in general there are many alternative ways of decomposing a contraction, we are able to methodically derive a large family of algorithms. The main contribution of this paper is a systematic methodology to accurately identify the fastest algorithms in the bunch, without executing them. The goal is instead accomplished with the help of a set of cache-aware micro-benchmarks for the underlying BLAS kernels. The predictions we construct from such benchmarks allow us to reliably single out the best-performing algorithms in a tiny fraction of the time taken by the direct execution of the algorithms.Comment: Submitted to PMBS1

    Developing numerical libraries in Java

    Full text link
    The rapid and widespread adoption of Java has created a demand for reliable and reusable mathematical software components to support the growing number of compute-intensive applications now under development, particularly in science and engineering. In this paper we address practical issues of the Java language and environment which have an effect on numerical library design and development. Benchmarks which illustrate the current levels of performance of key numerical kernels on a variety of Java platforms are presented. Finally, a strategy for the development of a fundamental numerical toolkit for Java is proposed and its current status is described.Comment: 11 pages. Revised version of paper presented to the 1998 ACM Conference on Java for High Performance Network Computing. To appear in Concurrency: Practice and Experienc

    Computing the Rank Profile Matrix

    Get PDF
    The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix as well as those of all of its leading sub-matrices, in the same time as state of the art Gaussian elimination algorithms. Here we first study the conditions making a Gaus-sian elimination algorithm reveal this information. Therefore, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base-case to our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row (resp. column) echelon form of a matrix are usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix and of any of its leading sub-matrices thanks to an elementary post-processing algorithm

    Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion

    Get PDF
    We describe a new data format for storing triangular, symmetric, and Hermitian matrices called RFPF (Rectangular Full Packed Format). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of Level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance as there is no Level 3 packed BLAS. We combine the good features of packed and full storage using RFPF to obtain high performance via using Level 3 BLAS as RFPF is a standard full format representation. Also, RFPF requires exactly the same minimal storage as packed format. Each LAPACK full and/or packed triangular, symmetric, and Hermitian routine becomes a single new RFPF routine based on eight possible data layouts of RFPF. This new RFPF routine usually consists of two calls to the corresponding LAPACK full format routine and two calls to Level 3 BLAS routines. This means {\it no} new software is required. As examples, we present LAPACK routines for Cholesky factorization, Cholesky solution and Cholesky inverse computation in RFPF to illustrate this new work and to describe its performance on several commonly used computer platforms. Performance of LAPACK full routines using RFPF versus LAPACK full routines using standard format for both serial and SMP parallel processing is about the same while using half the storage. Performance gains are roughly one to a factor of 43 for serial and one to a factor of 97 for SMP parallel times faster using vendor LAPACK full routines with RFPF than with using vendor and/or reference packed routines

    A survey of parallel algorithms for fractal image compression

    Get PDF
    This paper presents a short survey of the key research work that has been undertaken in the application of parallel algorithms for Fractal image compression. The interest in fractal image compression techniques stems from their ability to achieve high compression ratios whilst maintaining a very high quality in the reconstructed image. The main drawback of this compression method is the very high computational cost that is associated with the encoding phase. Consequently, there has been significant interest in exploiting parallel computing architectures in order to speed up this phase, whilst still maintaining the advantageous features of the approach. This paper presents a brief introduction to fractal image compression, including the iterated function system theory upon which it is based, and then reviews the different techniques that have been, and can be, applied in order to parallelize the compression algorithm

    Rectangular Full Packed Format for Cholesky's Algorithm:Factorization, Solution and Inversion

    Get PDF

    Variable-Size Batched Condition Number Calculation on GPUs

    Get PDF

    A review of High Performance Computing foundations for scientists

    Full text link
    The increase of existing computational capabilities has made simulation emerge as a third discipline of Science, lying midway between experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which otherwise would not be accessible, helps to improve experiments and provides new insights on systems which are analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them to improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues and not on technological respects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way which is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, as well as discuss distributed computing and discuss essential aspects to take into account when running scientific calculations in computers.Comment: 33 page
    • 

    corecore