6 research outputs found

    An efficient algorithm for pointer-to-array access conversion for compiling and optimizing DSP applications


    Automatic Parallelization of Affine Loops using Dependence and Cache analysis in a Binary Rewriter

    Today, nearly all general-purpose computers are parallel, but nearly all software running on them is serial. Bridging this disconnect by manually rewriting source code for parallel execution is prohibitively expensive. Automatic parallelization technology is therefore an attractive alternative. We present a method to perform automatic parallelization in a binary rewriter. The input to the binary rewriter is the serial binary executable program and the output is a parallel binary executable. The advantages of parallelization in a binary rewriter versus a compiler include (i) compatibility with all compilers and languages; (ii) high economic feasibility from avoiding repeated compiler implementation; (iii) applicability to legacy binaries; and (iv) applicability to assembly-language programs. Adapting existing parallelizing compiler methods that work on source code to work on binary programs instead is a significant challenge, primarily because the symbolic and array index information used in existing compiler parallelizers is not available in a binary. We show how to adapt existing parallelization methods to achieve equivalent parallelization from a binary without such information. We have also designed an affine cache reuse model that works inside a binary rewriter, building on the parallelization techniques. It quantifies cache reuse in terms of the number of cache lines that will be required when a loop dimension is considered for the innermost position in a loop nest. This cache metric can be used to reason about the code that results when affine code is transformed using affine transformations. Hence, it can be used to evaluate candidate transformation sequences to improve run-time directly from a binary.
Results using our x86 binary rewriter, called SecondWrite, on a suite of dense-matrix regular programs from the Polybench suite of benchmarks show a geomean speedup of 6.81X from binary and 8.9X from source with 8 threads compared to the input serial binary on an x86 Xeon E5530 machine, and 8.31X from binary and 9.86X from source with 24 threads compared to the input serial binary on an x86 E7450 machine. Such regular loops are an important component of scientific and multimedia workloads, and are even present to a limited extent in otherwise non-regular programs. Further in this thesis we present a novel algorithm that significantly enhances past techniques for loops with unknown loop bounds by guessing the loop bounds using only the memory expressions present in a loop. It then inserts run-time checks to verify these guesses; if they are correct, the parallel version of the loop executes, otherwise the serial version executes. These techniques are applied to the large affine benchmarks in SPEC2006 and OMP2001, and unlike previous methods the speedups from binary are as good as from source. We also present results on the number of loops parallelized directly from a binary with and without this algorithm. Across the 8 affine benchmarks in these suites, the best existing binary parallelization method achieves a geomean speedup of 1.33X, whereas our method achieves a speedup of 2.96X. This is close to the speedup from source code of 2.8X.
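
The cache metric described in the abstract above (cache lines required when a given loop dimension is placed innermost) can be illustrated with a small sketch. This is not the model from the thesis: it is a simplified, textbook-style estimate that assumes a 64-byte cache line, ignores alignment and reuse across references, and takes per-dimension byte strides as given inputs. The names `lines_touched`, `innermost_cost` and `best_innermost` are hypothetical.

```python
from math import ceil

CACHE_LINE_BYTES = 64  # assumed line size, not taken from the abstract


def lines_touched(stride_bytes, trip_count, line_bytes=CACHE_LINE_BYTES):
    """Estimate distinct cache lines one affine reference touches while a
    loop with the given per-iteration byte stride runs trip_count times."""
    stride = abs(stride_bytes)
    if trip_count <= 0:
        return 0
    if stride == 0:
        return 1  # address does not change with this loop
    if stride >= line_bytes:
        return trip_count  # every iteration lands on a new line
    return ceil(trip_count * stride / line_bytes)


def innermost_cost(refs, dim, trip_counts):
    """Sum the estimate over all references if loop `dim` were innermost.
    Each reference is a tuple of byte strides, one per loop dimension."""
    return sum(lines_touched(r[dim], trip_counts[dim]) for r in refs)


def best_innermost(refs, trip_counts):
    """Pick the loop dimension with the fewest estimated cache lines."""
    return min(range(len(trip_counts)),
               key=lambda d: innermost_cost(refs, d, trip_counts))


# Illustrative input: C[i][j] += A[i][k] * B[k][j] on row-major
# 512x512 arrays of 8-byte elements; loop dimensions ordered (i, j, k).
N, ELEM = 512, 8
refs = [
    (N * ELEM, ELEM, 0),   # C[i][j]
    (N * ELEM, 0, ELEM),   # A[i][k]
    (0, ELEM, N * ELEM),   # B[k][j]
]
trips = (N, N, N)
costs = [innermost_cost(refs, d, trips) for d in range(3)]
print(costs, best_innermost(refs, trips))  # [1025, 129, 577] 1
```

Under these assumptions the `j` loop is the cheapest candidate for the innermost position, which matches the usual intuition that unit-stride accesses should be innermost. In a binary rewriter the strides would have to be recovered from memory address expressions rather than array subscripts; that recovery step is not shown here.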

    Aplicaciones del cómputo científico: mantenimiento del software heredado

    Scientific computing applications can be considered the longest-lived type of software ever created. Today, major examples of this kind of software can be found across many scientific disciplines, such as physics, chemistry, mathematics, biology, and economics. One of the most current examples is the family of Global Climate Models used for climate studies. Scientists have been developing software since the appearance of the first programming languages, more than 76 years ago. Fortran was the first high-level language created, the first language to have its own standard, and, together with C, the most widely used in HPC. This thesis introduces a new software development methodology called Change Driven Development (CDD), initially created for the maintenance process and based on three aspects: the essential aspects of software (change), highly integrated development tools, and source code transformations (restructuring and refactoring). The thesis describes the methodology in detail and validates it through 4 case studies of various kinds. Facultad de Informátic

    Restructuring Fortran Programs for Cedar

    This paper reports on the status of the Fortran translator for the Cedar computer at the end of March, 1991. A brief description of the Cedar Fortran language is followed by a discussion of the Fortran 77 to Cedar Fortran parallelizer that describes the techniques currently being implemented. A collection of experiments illustrates the effectiveness of the current implementation and points toward new approaches to be incorporated into the system in the near future.
    1 Introduction. The University of Illinois has been a pioneer in the development of program translation techniques for vector and parallel computers since the late 1960s, when Illiac IV was developed. It is therefore natural that automatic parallelization has become one of the major concerns of the Cedar project, the latest machine building effort of the University of Illinois. The Cedar machine is a hierarchical multi-processor. It supports several levels of parallelism and provides data storage at the processor, cluster, an..
