34 research outputs found

    Symmetric Orderings for Unsymmetric Sparse Matrices

    The efficient solution of large sparse systems of linear equations is one of the key tasks in many computationally intensive scientific applications. The parallelization and efficiency of codes for solving these systems depend heavily on the sparsity structure of the matrices. Because the variety of sparsity structures is large, no single method is well suited for the whole range of sparse matrices.

    On efficiently characterizing solutions of linear Diophantine equations and its application to data dependence analysis

    In this paper we present several sets of mathematical tools for characterizing the solutions of linear Diophantine equations. First, a number of methods are given for reducing the complexity of the computations. Thereafter, we introduce different techniques for determining the exact number of solutions of linear Diophantine equations. Finally, we present a method for efficiently extracting the solutions of such equations. For all these methods, the main focus has been on their applicability and efficiency for data dependence analysis.

    Keywords: linear Diophantine equation, data dependence, data locality, dependence test, number theory

    1 Introduction
    The extensive use of parallelism, fast processors and hierarchical memory systems greatly enhances the performance potential of modern architectures. However, compiler designers and programmers face the difficult task of making optimal use of these architectural improvements. One of the most crucial bottlenecks for the performance of..
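    As background for the dependence-analysis application, the classic GCD test is the simplest instance of the kind of question this paper studies: the linear Diophantine equation a_1·x_1 + ... + a_n·x_n = c has an integer solution iff gcd(a_1, ..., a_n) divides c. A minimal sketch of that test (illustrative code, not taken from the paper):

        #include <cstdio>
        #include <numeric>   // std::gcd (C++17)
        #include <vector>

        // GCD dependence test: a1*x1 + ... + an*xn = c is solvable over the
        // integers iff gcd(a1, ..., an) divides c.  A solvable equation only
        // signals a *possible* dependence; loop bounds may still rule it out.
        bool may_depend(const std::vector<long>& a, long c) {
            long g = 0;                       // gcd(0, x) == |x|
            for (long ai : a) g = std::gcd(g, ai);
            return g == 0 ? c == 0 : c % g == 0;
        }

        int main() {
            // Example: accesses A[2*i] and A[4*j + 1] overlap iff 2*i - 4*j = 1,
            // which is unsolvable since gcd(2, 4) = 2 does not divide 1.
            std::printf("%s\n", may_depend({2, -4}, 1) ? "maybe" : "independent");
        }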

    A Large-Grain Parallel Sparse System Solver

    The efficiency of solving sparse linear systems on parallel processors and more complex multicluster architectures such as Cedar is greatly enhanced if relatively large-grain computational tasks can be assigned to each cluster or processor. The ordering of a system into a bordered block upper triangular form facilitates a reasonable large-grain partitioning. A new algorithm which produces this form for unsymmetric sparse linear systems is considered and the associated factorization algorithm is presented. Computational results are presented for the Cedar multiprocessor.

    Several techniques have been proposed to solve large sparse systems of linear equations on parallel processors. A key task which determines the effectiveness of these techniques is the identification and exploitation of the computational granularity appropriate for the target multiprocessor architecture. Many algorithms assume special properties such as symmetric positive definiteness or exploit knowledge of the appl..
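    For orientation, one common variant of a bordered block upper triangular form can be pictured as follows (my notation, not necessarily the paper's). The diagonal blocks A_ii can be factored independently, one per cluster or processor, with fill during elimination confined to the border blocks and the final Schur complement:

        \[
        PAQ =
        \begin{pmatrix}
        A_{11} & A_{12} & \cdots & A_{1k} & B_1 \\
               & A_{22} & \cdots & A_{2k} & B_2 \\
               &        & \ddots & \vdots & \vdots \\
               &        &        & A_{kk} & B_k \\
        C_1    & C_2    & \cdots & C_k    & D
        \end{pmatrix}
        \]

    The independence of the diagonal-block factorizations is what supplies the large-grain tasks the abstract refers to.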

    The E-BSP Model: Incorporating General Locality and Unbalanced Communication into the BSP Model

    The BSP model was proposed as a step towards general-purpose parallel computing. This paper introduces the E-BSP model, which extends the BSP model in two ways. First, it provides a way to deal with unbalanced communication patterns, i.e., communication patterns in which the amount of data sent or received by each processor differs. Second, it adds a notion of general locality to the BSP model, where the delay of a remote memory access depends on the relative location of the processors in the interconnection network. We use our model to develop several algorithms that improve upon algorithms derived under the BSP model.

    1 Introduction
    It has been stressed by many authors that the emergence of one or a few computational models is essential to the progress of parallel computing [9, 14], because it enables the programmer to write architecture-independent software. Such a model should strike a balance between simplicity of use and how faithfully it reflects existing parallel architectures. D..
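    For reference, flat BSP charges every superstep the same way, regardless of which processors communicate and how unevenly (this is the standard BSP cost; the E-BSP refinement itself is not reproduced here):

        \[
        T_{\text{superstep}} = w + g \cdot h + L,
        \]

    where w is the maximum local computation per processor, h the maximum number of words any single processor sends or receives (an h-relation), g the per-word communication cost, and L the barrier synchronization cost. Taking the maximum h over all processors is precisely what makes the model blind to unbalanced communication patterns, and the uniform g is what ignores network locality; these are the two terms E-BSP refines.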

    Compilation Techniques for Sparse Matrix Computations

    The problem of compiler optimization of sparse codes is well known and no satisfactory solutions have been found yet. One of the major obstacles is the fact that sparse programs deal explicitly with the particular data structures selected for storing sparse matrices. This explicit data structure handling obscures the functionality of a code to such a degree that optimization is inhibited, e.g. by the introduction of indirect addressing. The method presented in this paper postpones data structure selection until the compile phase, thereby allowing the compiler to combine code optimization with explicit data structure selection. Not only does this method enable the compiler to generate efficient code for sparse computations, it also greatly reduces the complexity of the programmer's task.

    Index Terms: Compilation Techniques, Optimization, Program Transformations, Restructuring Compilers, Sparse Computations, Sparse Matrices.

    1 Introduction
    A significant part of ..
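    To see the obstacle concretely, compare a dense row-times-vector kernel with its compressed sparse row (CSR) counterpart; the indirect subscript x[col[k]] in the sparse loop is what hides the access pattern from a conventional optimizer. The code is illustrative, not taken from the paper:

        #include <vector>

        // Dense: the subscripts are affine functions of the loop index, so a
        // compiler can analyze and transform the loop freely.
        double dense_row_dot(const std::vector<std::vector<double>>& a,
                             const std::vector<double>& x, int i) {
            double s = 0.0;
            for (std::size_t j = 0; j < x.size(); ++j) s += a[i][j] * x[j];
            return s;
        }

        // Sparse (CSR): the values of row i live in val[ptr[i] .. ptr[i+1])
        // with their column numbers in col.  The subscript x[col[k]] is
        // indirect, so the compiler can no longer see which elements of x
        // are touched.
        double csr_row_dot(const std::vector<double>& val,
                           const std::vector<int>& col,
                           const std::vector<int>& ptr,
                           const std::vector<double>& x, int i) {
            double s = 0.0;
            for (int k = ptr[i]; k < ptr[i + 1]; ++k) s += val[k] * x[col[k]];
            return s;
        }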

    Address Reference Generation in a Memory Hierarchy Simulator Environment

    Application-driven address reference generation is a popular and frequently used technique for the simulation of architectures. These references can be produced by inserting instrumentation statements into the source code. However, this requires major rewriting of the application source code under study. To alleviate this disadvantage, this report describes the use of C++ classes and operator overloading, minimizing the amount of application code that is affected. The following frequently used sparse matrix application codes are currently used in conjunction with this simulator: sparse matrix (SpM) LU decomposition, (SpM x SpM) multiplication, with (SpM x V) sparse matrix-vector multiplication as a special case, and triangular solve. This report demonstrates the use of this simulator for one of these applications.

    1 Introduction
    A hierarchical memory system consists of several storage levels [7, 8, 13, 16, 17]. Each of these levels is faster and smaller than the level below...
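    The overloading idea can be sketched as follows: wrap the element type in a class whose reads and writes emit trace records, so that existing array code produces an address trace with hardly any rewriting. This is a minimal sketch under names of my own choosing; the report's actual classes are more elaborate:

        #include <cstdio>
        #include <vector>

        // Hypothetical tracing wrapper: every read or write of a Traced value
        // logs the operation and the address, producing an address trace as a
        // side effect of running the otherwise unmodified numerical code.
        struct Traced {
            double v = 0.0;
            operator double() const { record('R', this); return v; }          // read
            Traced& operator=(double x) { record('W', this); v = x; return *this; }
            static void record(char op, const void* addr) {
                std::fprintf(stderr, "%c %p\n", op, addr);
            }
        };

        int main() {
            std::vector<Traced> a(8);
            for (std::size_t i = 1; i < a.size(); ++i)
                a[i] = a[i - 1] + 1.0;   // logs one read and one write per step
            return 0;
        }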

    Implementation of Fourier-Motzkin Elimination

    Every transformation of a perfectly nested loop consisting of a combination of loop interchanging, loop skewing and loop reversal can be modeled by a linear transformation represented by a unimodular matrix. This modeling offers more flexibility than the traditional step-wise application of loop transformations because we can directly construct a unimodular matrix for a particular goal. In this paper, we present implementation issues arising when this framework is incorporated in a compiler.

    1 Introduction
    Inherent to the application of program transformations in an optimizing or restructuring compiler is the so-called 'phase ordering problem', i.e. the problem of finding an effective order in which particular transformations must be applied. This problem is still an important research topic [WS90]. An important step forward in solving the phase ordering problem has been accomplished by the observation that any combination of the iteration-level loop transformations loop interchangin..
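    Fourier-Motzkin elimination is the machinery that recovers loop bounds after such a unimodular transformation: a variable is eliminated from a system of linear inequalities by pairing each of its lower bounds with each of its upper bounds. A compact sketch of one elimination step, with integer coefficients and without the redundancy removal a production implementation needs to stay tractable (illustrative, not the paper's code):

        #include <cstdio>
        #include <vector>

        // One linear inequality: sum_j c[j] * x[j] <= b (integer coefficients).
        struct Ineq { std::vector<long> c; long b; };

        // One Fourier-Motzkin step: eliminate variable k by combining every
        // inequality with a positive coefficient on x_k with every inequality
        // with a negative one, so that x_k cancels.  Inequalities that do not
        // mention x_k are kept as they are.
        std::vector<Ineq> eliminate(const std::vector<Ineq>& sys, std::size_t k) {
            std::vector<Ineq> pos, neg, out;
            for (const Ineq& q : sys)
                (q.c[k] > 0 ? pos : q.c[k] < 0 ? neg : out).push_back(q);
            for (const Ineq& p : pos)
                for (const Ineq& n : neg) {
                    long a = p.c[k], b = -n.c[k];          // both positive
                    Ineq r{std::vector<long>(p.c.size()), b * p.b + a * n.b};
                    for (std::size_t j = 0; j < p.c.size(); ++j)
                        r.c[j] = b * p.c[j] + a * n.c[j];  // x_k coefficient is 0
                    out.push_back(r);
                }
            return out;
        }

        int main() {
            // {x <= y, y <= 5, 1 <= x}: eliminating y yields {x <= 5, 1 <= x}.
            std::vector<Ineq> sys = {{{1, -1}, 0}, {{0, 1}, 5}, {{-1, 0}, -1}};
            for (const Ineq& q : eliminate(sys, 1))
                std::printf("%ldx + %ldy <= %ld\n", q.c[0], q.c[1], q.b);
        }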

    A Quantitative Comparison of Parallel Computation Models

    This paper experimentally validates performance-related issues for parallel computation models on several parallel platforms (a MasPar MP-1 with 1024 processors, a 64-node GCel and a CM-5 with 64 processors). Our work consists of three parts. First, there is an evaluation part in which we investigate whether the models correctly predict the execution time of an algorithm implementation. Unlike previous work, which mostly demonstrated a close match between the measured and predicted running times, this paper shows that there are situations in which the models do not precisely predict the actual execution time of an algorithm implementation. Second, there is a comparison part in which the models are contrasted with each other in order to determine which model induces the fastest algorithms. Finally, there is an efficiency validation part in which the performance of the model-derived algorithms is compared with the performance of highly optimized library routines to show the effectiveness of deriving fast algorithms through the formalisms of the models.