15 research outputs found

    Difficult-path branch prediction using subordinate microthreads

    Tango: a Hardware-based Data Prefetching Technique for Superscalar Processors

    We present a new hardware-based data prefetching mechanism for enhancing instruction level parallelism and improving the performance of superscalar processors. The emphasis in our scheme is on the effective utilization of slack time and hardware resources not used for the main computation. The scheme suggests a new hardware construct, the Program Progress Graph (PPG), as a simple extension to the Branch Target Buffer (BTB). We use the PPG for implementing a fast pre-program counter, pre-PC, that travels only through memory reference instructions (rather than scanning all the instructions sequentially). In a single clock cycle the pre-PC extracts all the predicted memory references in some future block of instructions, to obtain early data prefetching. In addition, the PPG can be used for implementing a pre-processor and for instruction prefetching. The prefetch requests are scheduled to "tango" with the core requests from the data cache, by using only free time slots on the existing da..

    Tango: a Hardware-based Data Prefetching Technique for Superscalar Processors

    We present a new hardware-based data prefetching mechanism for enhancing instruction level parallelism and improving the performance of superscalar processors. The emphasis in our scheme is on the effective utilization of slack (dead) time and hardware resources not used for the main computation. The scheme suggests a new hardware construct, the Program Progress Graph (PPG), as a simple extension to the Branch Target Buffer (BTB). We use the PPG for implementing a fast pre-program counter, pre-PC, that travels only through memory reference instructions (rather than scanning all the instructions sequentially). In a single clock cycle the pre-PC extracts all the predicted memory references in some future block of instructions, to obtain early data prefetching. In addition, the PPG can be used to implement a pre-processor and for instruction prefetching. The prefetch requests are scheduled to "tango" with the core requests from the data cache, by using only free time slots on the existing..