2 research outputs found

    Loop parallelization: revisiting framework of unimodular transformations

    The paper extends the framework of linear loop transformations by adding a new nonlinear step to the transformation process. The current framework of linear loop transformations cannot identify a significant fraction of the available parallelism. For this reason, we present a method that complements it with some basic transformations in order to extract the maximum loop parallelism in perfectly nested loops with tight recurrences in the dependence graph. The parallelizing algorithm solves the important problem of deciding which set of transformations to apply in order to maximize the degree of parallelism (the number of parallel loops within a loop nest), and presents a way of generating efficient transformed code that exploits coarse-grain parallelism on a MIMD system.
    Peer Reviewed. Postprint (published version).
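For readers unfamiliar with the linear framework this abstract builds on, the sketch below shows a classic unimodular transformation (skewing followed by interchange, often called a wavefront) applied to a two-deep loop nest. It is only an illustration of the standard linear step, not the paper's nonlinear extension or its algorithm. The dependence vectors, the matrix `T`, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch (assumed example, not taken from the paper):
# a 2-deep loop nest with dependence distance vectors (1, 0) and (0, 1),
# e.g. a[i][j] = a[i-1][j] + a[i][j-1].

def det2(m):
    """Determinant of a 2x2 integer matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_unimodular(m):
    """An integer matrix is unimodular when its determinant is +1 or -1."""
    return abs(det2(m)) == 1

def apply(m, v):
    """Multiply the 2x2 matrix m by the distance vector v."""
    return tuple(sum(m[i][j] * v[j] for j in range(2)) for i in range(2))

def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False

def is_legal(m, deps):
    """A transformation is legal if every transformed dependence
    remains lexicographically positive."""
    return all(lex_positive(apply(m, d)) for d in deps)

def inner_loop_parallel(deps):
    """The inner loop can run in parallel when every dependence
    is carried by the outer loop (first component > 0)."""
    return all(d[0] > 0 for d in deps)

deps = [(1, 0), (0, 1)]   # assumed dependences of the original nest
T = [[1, 1], [1, 0]]      # skew, then interchange (wavefront)

new_deps = [apply(T, d) for d in deps]
print(is_unimodular(T), is_legal(T, deps))
print(inner_loop_parallel(deps), inner_loop_parallel(new_deps))
```

In the original nest neither loop is parallel, because each loop carries one of the two dependences. After applying `T`, both dependences are carried by the outer loop, so the inner loop becomes parallel.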

    Exploiting parallelism within multidimensional multirate digital signal processing systems

    The intense requirements for high processing rates in practical applications of multidimensional Digital Signal Processing (DSP) systems justify Application-Specific Integrated Circuit (ASIC) designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies and architectures for designing high-performance VLSI implementations of general multidimensional multirate DSP systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. Applying these transformations to the algorithms leads to systematic methodologies for high-performance architectural design. With the novel design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results have shown that those architectures are much more efficient in terms of execution time and/or hardware cost compared with existing hardware implementations.