
    An energy efficient dependence driven scalable dispatch scheme

    This thesis proposes two schemes that target superscalar microarchitectures. The first scheme alleviates the complexity of the dispatch logic by reducing the number of ports to the renamer. The second scheme reduces L1 data cache energy through a cache architecture that incorporates IPC-aware dynamic associativity management.

    The dispatch stage constitutes a key critical path limiting the scalability of current-generation superscalar processors. The rename map table (RMT) access and dependence check logic (DCL) delays scale unfavorably with the dispatch width (DW) of a superscalar processor. It is a well-known program property that the results of most instructions are consumed within the following 4-6 instruction window. This behavior can be exploited to shorten the rename delay by reducing the number of RMT read/write ports to significantly below the conventional 3xDW. We propose an algorithm that dynamically allocates a reduced number of RMT ports to the instructions in the current dispatch window, matching dispatch resources to average needs rather than peak needs. This yields shorter RMT access delays as well as lower energy in the dispatch stage. The IPC loss due to RMT read/write port contention under the proposed scheme stays within 2-4%. The cycle time saved can also be leveraged to support wider dispatch in the same cycle time, offsetting this degradation.

    Data caches are designed with high associativities to support data sets corresponding to peak load/store bandwidths. The second part of this thesis explores a more general design space for dynamic associativity (for a 4-way associative cache, consider 1-way, 2-way, and 4-way associative accesses). We use the actual instruction-level parallelism exhibited by the instructions surrounding a given load to classify it as belonging to a particular IPC packet. The lookup schedule is fixed in advance for each IPC class.
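The port-allocation idea for the dispatch stage can be illustrated with a small sketch. This is not the thesis's algorithm, only a plausible greedy variant under two assumptions stated in the abstract: intra-group dependences are caught by the dependence check logic (so those sources need no RMT read port), and instructions that cannot get a port stall to the next dispatch cycle. The function name and instruction encoding are hypothetical.

```python
def allocate_rmt_ports(dispatch_group, num_read_ports):
    """Greedily rename a dispatch group with a reduced RMT read-port budget.

    dispatch_group: in-order list of instructions, each a dict with
      'dest' (destination register or None) and 'srcs' (source registers).
    Returns the indices of instructions that can rename this cycle.
    """
    produced = {}      # register -> index of its producer within this group
    ports_used = 0
    renamed = []
    for i, instr in enumerate(dispatch_group):
        needed = 0
        for src in instr['srcs']:
            # An intra-group dependence is resolved by the dependence
            # check logic, so this source needs no RMT read port.
            if src not in produced:
                needed += 1
        if ports_used + needed > num_read_ports:
            break      # out of ports: this and later instructions stall
        ports_used += needed
        renamed.append(i)
        if instr['dest'] is not None:
            produced[instr['dest']] = i
    return renamed
```

Because most results are consumed within a few instructions, many sources hit the `produced` map and the average port demand stays well below the 3xDW worst case, which is what lets the port count shrink with only a 2-4% IPC loss.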
The energy savings over the SPEC2000 CPU benchmarks average 28.6% for a 32KB, 4-way, L1 data cache. The resulting performance (IPC) degradation from the dynamic way schedule is restricted to less than 2.25%, mainly because IPC-based placement turns out to be an excellent classifier.
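A minimal sketch of the IPC-aware lookup idea, under assumptions of our own: the class thresholds, schedule shapes, and function names below are hypothetical (the thesis fixes one lookup schedule per IPC class; the specific schedules are not given in this abstract). Energy is approximated by the number of ways probed before a hit.

```python
# Hypothetical fixed lookup schedules for a 4-way cache: a low-IPC load
# can tolerate serial probing (fewer ways per probe, less energy), while
# a high-IPC load probes all ways in parallel for full speed.
LOOKUP_SCHEDULE = {
    'low':  [1, 1, 2],   # one way, then one more, then the remaining two
    'mid':  [2, 2],
    'high': [4],
}

def classify_ipc(recent_ipc):
    # Hypothetical thresholds; the thesis classifies each load by the
    # instruction-level parallelism of the instructions around it.
    if recent_ipc < 1.0:
        return 'low'
    if recent_ipc < 2.0:
        return 'mid'
    return 'high'

def cache_lookup(set_ways, tag, recent_ipc):
    """Probe a 4-way set following the fixed schedule for the IPC class.

    set_ways: the 4 tags resident in the set, in probe order.
    Returns (hit, ways_probed); ways_probed is a proxy for access energy.
    """
    schedule = LOOKUP_SCHEDULE[classify_ipc(recent_ipc)]
    probed, idx = 0, 0
    for ways_this_probe in schedule:
        group = set_ways[idx:idx + ways_this_probe]
        probed += len(group)
        if tag in group:
            return True, probed    # early hit saves the remaining probes
        idx += ways_this_probe
    return False, probed           # miss after the full schedule
```

The energy win comes from hits landing in the first probe for low-IPC loads; the performance cost is the extra latency when a serial schedule needs several probes, which the abstract reports as under 2.25% IPC.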