CMOS technology scaling improves the speed and functionality of microprocessors by reducing
the size of transistors. However, scaling also increases static power dissipation, which has
been identified as a limiting factor in technology scaling. As current technology approaches
that limit, techniques are required at both the technology level and the architecture level
to reduce sub-threshold leakage, which accounts for the majority of static power dissipation.
This thesis presents an approach that predicts the idle periods of execution units at runtime and
power-gates them during these periods to eliminate their static leakage. We exploit the similarity
of execution characteristics across loop iterations to predict, from the units used during the
first few iterations, which units are required to execute the entire loop. The utilisation of
each execution unit is monitored during each iteration, and thresholds are used to determine which
units should be power-gated for the remainder of the loop. Three techniques are presented:
Loop-Directed Mothballing (LDM), Extended Loop-Directed Mothballing (ELDM) and schedule
balancing. LDM power-gates execution units only during innermost loops, which are simple
to detect at runtime. ELDM extends this method to all loops using loop entry and exit information
gathered offline. The balancing scheduler balances the types of instruction issued each cycle,
encouraging reuse of execution units and making unnecessary units easier to detect.
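The per-iteration monitoring and threshold step described above can be illustrated with a minimal Python sketch. It assumes a simplified model: each loop iteration reports which execution units it used, usage is accumulated over a fixed warm-up window, and any unit whose mean uses per iteration falls below a threshold is marked as power-gated until the loop exits. The class name LoopGatingPredictor, the parameters warmup_iterations and threshold, their default values, and the unit names are all hypothetical and are not taken from the thesis; wake-up latency and the case of a gated unit being needed later in the loop are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical execution-unit names; not the configuration used in the thesis.
UNITS = ["int_alu0", "int_alu1", "int_mul", "fp_add", "fp_mul", "mem0", "mem1"]


@dataclass
class LoopGatingPredictor:
    warmup_iterations: int = 3   # hypothetical: iterations monitored before predicting
    threshold: float = 0.1       # hypothetical: minimum mean uses per iteration to stay powered
    usage: Dict[str, int] = field(default_factory=lambda: {u: 0 for u in UNITS})
    iterations_seen: int = 0
    gated: Set[str] = field(default_factory=set)

    def record_iteration(self, issued_units: List[str]) -> None:
        """Accumulate per-unit usage for one iteration of the current loop."""
        if self.iterations_seen >= self.warmup_iterations:
            return  # prediction already made; later iterations are not monitored
        for unit in issued_units:
            self.usage[unit] += 1  # each occurrence counts as one use
        self.iterations_seen += 1
        if self.iterations_seen == self.warmup_iterations:
            self._predict()

    def _predict(self) -> None:
        """Gate every unit whose mean uses per iteration is below the threshold."""
        for unit, count in self.usage.items():
            if count / self.iterations_seen < self.threshold:
                self.gated.add(unit)

    def exit_loop(self) -> None:
        """On loop exit, power all units back on and reset the monitoring state."""
        self.gated.clear()
        self.usage = {u: 0 for u in UNITS}
        self.iterations_seen = 0


if __name__ == "__main__":
    p = LoopGatingPredictor()
    p.record_iteration(["int_alu0", "int_alu1", "mem0"])
    p.record_iteration(["int_alu0", "mem0", "fp_add"])
    p.record_iteration(["int_alu0", "int_alu1", "mem0"])
    print(sorted(p.gated))  # ['fp_mul', 'int_mul', 'mem1']
```

In this toy run the units never used during the three warm-up iterations are gated, while fp_add, used once, stays powered because 1/3 exceeds the 0.1 threshold. Raising the threshold trades more leakage savings for a higher risk of gating a unit the loop still needs.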
Extensive simulation using traces of 16 benchmarks from the SPEC CPU2006 suite demonstrates that
LDM reduces the energy-delay product of our simulated superscalar processor by 10.3%. For traces
in which only a small proportion of executed instructions fall inside innermost loops, ELDM
improves the energy-delay product by up to 13%, because it allows power-gating to be applied to
the other loops in the trace. Combining schedule balancing with ELDM achieves similar savings
while simplifying the hardware required to make predictions.
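For reference, the energy-delay product uses the standard definition below (general background, not a formula quoted from this thesis), where E is the total energy consumed and D is the execution time. If power-gating reduces energy by a fraction e while increasing delay by a fraction d, the resulting EDP ratio shows that a net saving requires the energy reduction to outweigh the performance penalty.

```latex
\mathrm{EDP} = E \cdot D, \qquad
\frac{\mathrm{EDP}_{\text{gated}}}{\mathrm{EDP}_{\text{base}}}
  = \frac{(1 - e)\,E \cdot (1 + d)\,D}{E \cdot D}
  = (1 - e)(1 + d),
\qquad
\text{reduction} = 1 - (1 - e)(1 + d).
```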