73 research outputs found

    Compilation Support

    This thesis describes work done in two areas of compilation support for superscalar processors: register allocation and instruction scheduling. Chapter 1 describes an approach to register allocation for superscalar processors that supports dynamic and speculative out-of-order execution of instructions and guarantees precise interrupts without expensive hardware for managing register usage and maintaining an in-order processor state. The approach is called extended register allocation, and is based on a graph-coloring paradigm for storage allocation first introduced by Chaitin [2]. Chapter 2 presents a novel approach to performing aggressive instruction scheduling in the context of the superscalar IBM RS/6000 processor architecture [4, 5]. The approach seeks to enhance the instruction-level parallelism visible to the processor by speculatively moving instructions across conditional branches at compile time and taking appropriate measures to preserve correct program semantics. Results are presented which indicate that speedups of up to 6% are achievable on the existing RS/6000 implementation, while performance gains of up to 54% are possible with simple extensions to the current implementation in conjunction with the aggressive instruction scheduler that has been implemented. Chapter 3 explores the interaction of register allocation and instruction scheduling [35, 42] and attempts to develop a better understanding of the underlying interdependencies between the two techniques. A novel framework for integrating the two techniques, based on the ideas presented and the concept of coagulation [41, 40], is also presented.

    Superspeculative Microarchitecture for Beyond AD 2000

    C, these machines typically achieve only about 0.5 to 1.5 sustained IPC for real-world programs. Worse yet, most studies indicate that machine efficiency drops even lower as we extrapolate to wider machines. One recent study indicated that although a hypothetical 2-instruction-wide machine achieves IPC in the range of 0.65 to 1.40, a similar, hypothetical, 6-instruction-wide machine will achieve only 1.2 to 2.3 IPC [1]. Such data imply that the current superscalar paradigm is running into rapidly diminishing returns on performance. POTENTIAL NEW PARADIGMS Future billion-transistor chips will inevitably implement machines that are much wider (issue more than four instructions at once) and deeper (have longer pipelines). The question is, how do we harvest additional parallelism proportional to increased machine resources? Several approaches have vocal advocates, each with valid reasons; they are: reconfigurable parallel computing engines; specializ

    Half-Price Architecture

    Current-generation microprocessors are designed to process instructions with one and two source operands at equal cost. Handling two source operands requires multiple ports for each instruction in structures such as the register file and wakeup logic, which are often in the processor's critical timing paths. We argue that these structures are overdesigned, since only a small fraction of instructions require two source operands to be processed simultaneously. In this paper, we propose the half-price architecture, which judiciously removes this overdesign by restricting the processor's capability to handle two source operands in certain timing-critical cases. Two techniques are proposed and evaluated. The first, for the wakeup logic, is sequential wakeup, which decouples half of the tag-matching logic from the wakeup bus to reduce the load capacitance of the bus. The second, for the register file, is sequential register access, which halves the number of register read ports by sequentially accessing two values through a single port when needed. We show that a pipeline that optimizes scheduling and register access for a single operand achieves nearly the same performance as an ideal base machine that fully handles two operands, with 2.2% (worst case 4.8%) IPC degradation.

    Implementing optimizations at decode time


    Understanding Scheduling Replay Schemes

    Modern microprocessors adopt speculative scheduling techniques where instructions are scheduled several clock cycles before they actually execute. Due to this scheduling delay, scheduling misses should be recovered across the multiple levels of dependence chains in order to prevent further unnecessary execution. We explore the design space of various scheduling replay schemes that prevent the propagation of scheduling misses, and find that current and proposed replay schemes do not scale well and require instructions to execute in correct data dependence order, since they track dependences among instructions within the instruction window as a part of the scheduling or execution process. In this paper, we propose token-based selective repla