3 research outputs found

    Facilitating superscalar processing via a combined static/dynamic register renaming scheme

    No full text
    A superscalar implementation of a conventional instruction set architecture (ISA) requires N(N-1) comparators to determine dependencies between the N instructions issuing concurrently [2] and 2N register file read ports to handle the 2 operands that each instruction can potentially source. On the other hand, if the compiler is allowed to specify part of the renaming tag, we show that we can eliminate the comparators needed to detect data dependencies between instructions issuing concurrently, and we can reduce the number of read ports from 16 to about 7 without losing performance. Finally, we show that this approach implements predicated execution more efficiently than can be done with a conventional ISA on a machine that renames registers.
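    To make the comparator cost concrete, here is a minimal sketch in Python of the intra-group dependence check a conventional renamer performs when N instructions dispatch in the same cycle; the (dest, src1, src2) instruction format is an assumption for illustration, not the paper's encoding:

```python
# Minimal sketch of the intra-group dependence check a conventional renamer
# performs when N instructions dispatch in the same cycle. The (dest, src1,
# src2) instruction format is an assumption for illustration.

def intra_group_dependences(group):
    """Return (i, j, src) triples where instruction j sources the register
    written by the earlier instruction i of the same dispatch group,
    along with the number of comparisons performed."""
    deps = []
    comparisons = 0
    for j, (_dst_j, s1, s2) in enumerate(group):
        for i in range(j):                 # every earlier instruction in the group
            dst_i = group[i][0]
            for src in (s1, s2):           # up to 2 source operands each
                comparisons += 1           # one comparator per check
                if src == dst_i:
                    deps.append((i, j, src))
    # For an N-wide group: 2 sources x N*(N-1)/2 pairs = N*(N-1) comparisons.
    return deps, comparisons

# Example: a 4-wide dispatch group needs 4*3 = 12 comparisons.
group = [("r1", "r2", "r3"),
         ("r4", "r1", "r5"),   # reads r1, written by instruction 0
         ("r6", "r4", "r1"),   # reads r4 (instruction 1) and r1 (instruction 0)
         ("r7", "r8", "r9")]
print(intra_group_dependences(group))
# -> ([(0, 1, 'r1'), (0, 2, 'r1'), (1, 2, 'r4')], 12)
```

    For an N-wide group this amounts to the N(N-1) comparisons quoted above; letting the compiler statically encode part of the renaming tag is what allows these per-group comparators to be removed.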

    An energy efficient dependence driven scalable dispatch scheme

    Get PDF
    This thesis proposes two schemes that target superscalar microarchitectures. The first scheme alleviates the complexity of the dispatch logic by reducing the number of ports to the renamer. The second scheme reduces L1 data cache energy through a cache architecture that incorporates IPC-aware dynamic associativity management.

    The dispatch stage constitutes a key critical path in the scalability of current-generation superscalar processors. The rename map table (RMT) access and dependence check logic (DCL) delays scale unfavorably with the dispatch width (DW) of a superscalar processor. It is a well-known program property that the results of most instructions are consumed within the following 4-6 instruction window. This behavior can be exploited to reduce the rename delay by cutting the number of read/write ports in the RMT to significantly below the current 3xDW. We propose an algorithm to dynamically allocate a reduced number of RMT ports to instructions in the current dispatch window, matching dispatch resources to average needs rather than peak needs. This results in shorter RMT access delays as well as lower energy in the dispatch stage. The IPC reduction due to RMT read/write port contention in the proposed scheme stays within 2-4%. The cycle time saved can also be leveraged to support wider dispatch in the same cycle time, offsetting this degradation.

    Data caches are designed with higher associativities to support data sets corresponding to peak load/store bandwidths. The second part of this thesis explores a more general design space for dynamic associativity (for a 4-way associative cache, consider 1-way, 2-way, and 4-way associative accesses). We use the actual instruction-level parallelism exhibited by the instructions surrounding a given load to classify it as belonging to a particular IPC packet. The lookup schedule is fixed in advance for each IPC class. The energy savings over the SPEC2000 CPU benchmarks average 28.6% for a 32KB, 4-way L1 data cache. The resulting performance (IPC) degradation from the dynamic way schedule is restricted to less than 2.25%, mainly because IPC-based placement ends up being an excellent classifier.
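    As a rough illustration of the first scheme, here is a greedy Python sketch of dispatching a group through a reduced number of RMT ports; the port counts, the in-order stall policy, and the (dest, src1, src2) format are assumptions for illustration, not the thesis's actual allocation algorithm:

```python
# Greedy sketch of dispatching through a reduced number of RMT ports. The
# port counts, stall policy, and (dest, src1, src2) format are illustrative
# assumptions; the thesis's actual port-allocation algorithm is not
# reproduced here.

def dispatch_with_limited_ports(window, read_ports, write_ports):
    """Admit instructions from the dispatch window, oldest first, until the
    RMT read or write ports available this cycle are exhausted; the rest
    stall and retry next cycle (port contention)."""
    reads_left, writes_left = read_ports, write_ports
    for k, (dest, s1, s2) in enumerate(window):
        reads_needed = sum(src is not None for src in (s1, s2))
        writes_needed = 0 if dest is None else 1
        if reads_needed > reads_left or writes_needed > writes_left:
            return window[:k], window[k:]      # dispatched, stalled
        reads_left -= reads_needed
        writes_left -= writes_needed
    return window, []                          # whole group dispatched

# Example: a 4-wide group with 5 read / 3 write ports instead of the
# conventional 2*DW = 8 read and DW = 4 write ports (3xDW total).
window = [("r1", "r2", "r3"),
          ("r4", "r1", None),
          ("r6", "r4", "r1"),
          ("r7", "r8", "r9")]
dispatched, stalled = dispatch_with_limited_ports(window, read_ports=5, write_ports=3)
print(len(dispatched), len(stalled))   # -> 3 1
```

    The point of the sketch is that ports are sized for a group's average demand rather than the 3xDW peak; per the abstract, the resulting port contention costs only 2-4% IPC. The second scheme can be sketched in the same spirit; the IPC thresholds and the way schedule below are illustrative assumptions, not the thesis's actual classifier:

```python
# Sketch of IPC-aware dynamic associativity management for a 4-way L1 data
# cache. The IPC thresholds and way schedule are illustrative assumptions;
# the thesis fixes its own lookup schedule per IPC class.

WAY_SCHEDULE = {"low_ipc": 1, "mid_ipc": 2, "high_ipc": 4}

def classify_load(surrounding_ipc):
    """Classify a load by the ILP of the instructions around it
    (thresholds are assumptions for illustration)."""
    if surrounding_ipc < 1.0:
        return "low_ipc"
    if surrounding_ipc < 2.0:
        return "mid_ipc"
    return "high_ipc"

def lookup(set_tags, tag, surrounding_ipc):
    """Probe only the scheduled subset of ways first (saving energy), and
    widen to a full probe only if the narrow probe misses."""
    first = WAY_SCHEDULE[classify_load(surrounding_ipc)]
    probes = 0
    for way in range(first):
        probes += 1
        if set_tags[way] == tag:
            return True, probes            # hit within the narrow probe
    for way in range(first, len(set_tags)):
        probes += 1
        if set_tags[way] == tag:
            return True, probes            # hit after widening the probe
    return False, probes                   # miss

# Example: a low-IPC load probes a single way instead of all four.
print(lookup(["t3", "t7", "t1", "t9"], "t3", surrounding_ipc=0.6))   # -> (True, 1)
```

    Probing fewer ways on most accesses is where the reported energy savings come from; the occasional widened probe is what the abstract's sub-2.25% IPC degradation accounts for.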