Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine

Abstract

Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocessors are illsuited to exploit such advances. Achieving a high level of parallelism at a reasonable clock speed requires distributing the processor resources -- a trend already visible in the dual-register-file architecture of the Alpha 21264. A Raw microprocessor takes an extreme position in this space by distributing all its resources such as instruction streams, register files, memory ports, and ALUs over a pipelined two-dimensional interconnect, and exposing them fully to the compiler. Compilation for instruction-level parallelism (ILP) on such distributed-resource machines requires both spatial instruction scheduling and traditional temporal instruction scheduling. This paper describes the techniques used by the Raw compiler to handle these issues. Preliminary results from a SUIF-based compiler for sequential programs written in C and Fortran indicate that the Raw approach to exploiting ILP can achieve speedups scalable with the number of processors for applications with such parallelism. The Raw architecture attempts to provide performance that is at least comparable to that provided by scaling an existing architecture, but that can achieve orders of magnitude improvement in performance for applications with a large amount of parallelism. This paper offers some positive results in this direction

    Similar works

    Full text

    thumbnail-image