This paper investigates the delay, power, and area of critical components in the design of energy-efficient control logic. To improve performance and energy efficiency, a split-slave dual-path (SSDP) register is proposed, which improves the energy efficiency of the prior art by 30%. For multiplexers (MUX), three MUXes are proposed and compared to existing solutions. The proposed MUXes improve performance by up to 50% or reduce power by up to 22%. The impact of scaling the supply voltage alone, and of scaling the threshold voltage together with the supply voltage, on delay and power is also examined.
Introduction
Registers and multiplexers (MUX) are two of the major functions in finite state machines (FSM), which form a critical part of control logic. It has been reported that the control logic of a microprocessor can consume 20% of the total power [2]. With more advanced architecture concepts, such as register renaming and out-of-order execution in a superscalar microprocessor [3], the control logic will become more complex and its power dissipation will increase.
In addition, to boost processor clock frequency, modern processors typically adopt superpipelined execution [3] [4], which also makes heavy use of registers. Enhancing a register's speed can either lead to a higher clock rate or allow more logic depth between registers. MUXes are critical as well because they are frequently used in the surrounding logic to select one of many possible sources for the data. MUXes are also frequently used in control logic to perform a select or a decode function.
In this paper, we compare the area, speed, and power dissipation of four registers: a conventional register [5], a low-power register [6], a push-pull isolation register [1], and the proposed split-slave dual-path register. The effect of supply voltage scaling on delay, power, and energy efficiency is also considered. Three design techniques are proposed to improve the area, performance, and power of multiplexers.
Split-Slave Dual-Path Register
To improve performance, not only must the combinational logic block delays be optimized, but the register delays must also be addressed. The energy-efficient registers of choice have historically been a conventional register [5], shown in Fig. 1(a), and a low-power register [6], depicted in Fig. 1(b). Recently, a high-efficiency register known as the push-pull isolation (PPI) register, shown in Fig. 1(c), was proposed [1]. This PPI register reduces the delay from CLK to Q by inserting a bypass inverter and transmission gate from the master latch.
While the PPI-register is approximately 64% faster than the conventional and low-power registers, its performance can be further improved. When the output Q begins to switch, a momentary voltage contention exists between the two paths connected to Q from the master latch. By eliminating the lower path and slightly restructuring the slave latch, this voltage contention is eliminated. The resulting split-slave dual-path register, or SSDP-register, is shown in Fig. 1(d).
Experimental results in Table I indicate that the SSDP-register yields a delay improvement of 28% over the PPI-register and a delay improvement of 55% over the conventional and low-power registers at the same power consumption. Effectively, the SSDP-register is 30% more energy efficient than the PPI-register.
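The relation between a percentage delay improvement, the corresponding speed-up factor, and energy per operation can be sketched as follows. This is a minimal illustration, assuming that energy per operation is approximated by the power-delay product; the function names and the normalized power and delay values are hypothetical and are not taken from Table I.

```python
def speedup_from_delay_reduction(reduction: float) -> float:
    """Speed-up factor for a fractional delay reduction (0.28 means 28% less delay)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def power_delay_product(power: float, delay: float) -> float:
    """Energy-per-operation proxy: average power multiplied by delay."""
    return power * delay


# Normalized, illustrative values only (assumption: equal power for both registers).
reference_power, reference_delay = 1.0, 1.0
improved_delay = reference_delay * (1.0 - 0.28)  # 28% delay improvement

print(f"speed-up: {speedup_from_delay_reduction(0.28):.2f}x")
print(f"reference PDP: {power_delay_product(reference_power, reference_delay):.2f}")
print(f"improved PDP:  {power_delay_product(reference_power, improved_delay):.2f}")
```

The measured figures in Table I also reflect any power difference between the registers, which this normalized sketch does not capture.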
Effect of Vdd Scaling on Register Energy
SPICE simulation results for various supply voltages (Vdd) using the same device models with constant threshold voltages (Vt) are summarized in Figures 2 through 4. In Fig. 2, the speed of the four registers degrades by a factor of 3.75-3.98 when Vdd is scaled down from 3.5 V to 1.5 V. In

Select signals, S0 and S1, determine which of the four possible inputs, I1 through I4, is to be sent to the inverting output signal, OUT. The conventional MUX implements a one-stage decode of the select lines to enable one of the four transmission gates and requires 38 transistors. One attribute of this implementation is that it minimizes the delay from the data inputs to the output at the expense of longer delay paths from the select inputs to the data output. For the worst-case path delay, inputs S0, S1, and Ix have to traverse six, six, and two (6/6/2) transistors to the output (Table II), respectively.
For a modular design approach, a 4-to-1 MUX can be decomposed into three 2-to-1 MUXes, as shown in Fig. 5(b). To reduce the transistor count, the 2-to-1 MUX is implemented with two transmission gates and one inverter.
To maintain the inverting attribute of the MUX, another inverter has to be added before the output. This leads to a total of 24 transistors for the 4-to-1 MUX implementation, which represents a 58% reduction in transistor count relative to the conventional MUX.
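The functional equivalence of the one-stage-decode MUX and the decomposition into three 2-to-1 MUXes followed by an output inverter can be checked with a small behavioural model. This sketch models logic function only, not transistors or delay; the mapping of (S1, S0) to the inputs I1 through I4 as a binary index is an assumption, and the function names are hypothetical.

```python
from itertools import product


def mux2(sel: int, a: int, b: int) -> int:
    """Non-inverting 2-to-1 MUX: returns a when sel is 0, b when sel is 1."""
    return b if sel else a


def mux4_inverting(s1: int, s0: int, inputs: tuple) -> int:
    """Inverting 4-to-1 MUX with a one-stage decode of the select lines."""
    index = (s1 << 1) | s0  # assumed mapping: 00 -> I1, 01 -> I2, 10 -> I3, 11 -> I4
    return 1 - inputs[index]


def mux4_from_mux2(s1: int, s0: int, inputs: tuple) -> int:
    """Inverting 4-to-1 MUX built from three 2-to-1 MUXes plus an output inverter."""
    low = mux2(s0, inputs[0], inputs[1])
    high = mux2(s0, inputs[2], inputs[3])
    return 1 - mux2(s1, low, high)  # final inverter keeps the output inverting


def equivalent() -> bool:
    """Exhaustively compare both models over all select and data combinations."""
    return all(
        mux4_inverting(s1, s0, bits) == mux4_from_mux2(s1, s0, bits)
        for s1, s0 in product((0, 1), repeat=2)
        for bits in product((0, 1), repeat=4)
    )


print("models agree:", equivalent())
```

Because both models agree on every input combination, the designs differ only in transistor count, delay paths, and power, which is what the comparison in Table II addresses.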
To improve both performance and power, the three redundant inverters (IV1, IV2, and IV3) in Fig. 5(e). Compared to the TG-INV-TG MUX, the TG-tristate MUX reduces total power dissipation by about 14% (Table II). Although the TG-tristate MUX reduces the worst-case delay paths from 4/2/3 for the TG-INV-TG MUX to 3/3/2, the stacked MOSFETs in the tristate gates degrade the performance.
Based on the above discussion, it would be desirable either to reduce the power dissipation of the TG-INV-TG MUX or to improve the speed of the TG-TG-INV MUX. The proposed technique is to remove the pMOSFETs from the transmission gates of the TG-TG-INV MUX, which reduces the capacitive loading from junction capacitance and hence shortens the propagation delay. One known issue in complementary pass-transistor logic (CPL) is the threshold voltage drop, which causes the output inverter to consume static power [7]. As described in [7], adding a pMOSFET feedback pulls the input of the output inverter to the supply rail and eliminates the static power dissipation. This enhanced-CPL MUX is shown in Fig. 5(f).
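The threshold voltage drop can be illustrated with a first-order model. This is a minimal sketch, assuming an nMOS-only pass gate passes a logic high of at most Vdd minus Vt and ignoring the body effect; the function name is hypothetical, and the example values of 3.5 V and 0.7 V are used only for illustration.

```python
def pass_gate_high_level(vdd: float, vt: float, restored: bool = False) -> float:
    """Highest voltage reached at the output of an nMOS-only pass gate passing a logic 1.

    First-order model (assumption): the nMOS stops conducting once its
    gate-to-source voltage falls to vt, so the node settles at vdd - vt.
    With a pMOS feedback (restored=True) the node is pulled to the full rail.
    """
    if restored:
        return vdd
    return max(vdd - vt, 0.0)


vdd, vt = 3.5, 0.7  # illustrative supply and threshold voltages
degraded = pass_gate_high_level(vdd, vt)
full = pass_gate_high_level(vdd, vt, restored=True)
print(f"without feedback: {degraded:.2f} V, with feedback: {full:.2f} V")
```

The degraded high level leaves the pMOSFET of the output inverter only weakly turned off, which is the source of the static power described above; the feedback device removes that condition.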
Effect of Vt and Vdd Scaling on MUX Energy
The dependency of the 4-to-1 MUX delay and power on Vt and Vdd scaling is depicted in Fig. 6 and Fig. 7, respectively. Over the voltage range shown in Fig. 6, the proposed enhanced-CPL MUX is 20-66% faster than the three existing MUXes (the conventional MUX, the three 2-to-1 MUX design, and the TG-TG-INV MUX). The other two proposed MUXes, the TG-INV-TG MUX and the TG-tristate MUX, are also 19-58% faster than the three existing MUX implementations. From a supply voltage of 3.5 V to 1.0 V, the enhanced-CPL MUX experiences a performance degradation factor of 2.09, while the three existing MUXes degrade by an average factor of 2.27. From Fig. 7, the proposed enhanced-CPL MUX dissipates 7-52% less power than the other five MUX implementations over the voltage range investigated. Although the proposed TG-INV-TG MUX and TG-tristate MUX are 11-48% faster than the three existing MUXes (Fig. 6), this is achieved at the expense of 4-38% more power (Fig. 7), because these implementations expose internal active gates to data input transitions. On average, a power reduction factor of 10.1 is accomplished when Vdd is scaled from 3.5 V down to 1.0 V and Vt is scaled from 0.7 V to 0.2 V.
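For orientation, the reported power reduction can be compared against the standard first-order dynamic power relation P = a·C·Vdd²·f. This is a rough sketch under stated assumptions (identical switching activity, capacitance, and, by default, clock frequency at both supply voltages, with short-circuit and leakage power ignored); it is not the method used to obtain Fig. 7, and the function name is hypothetical.

```python
def dynamic_power_ratio(vdd_high: float, vdd_low: float, freq_ratio: float = 1.0) -> float:
    """Ratio P(vdd_high) / P(vdd_low) under P = a * C * Vdd^2 * f.

    freq_ratio is f_high / f_low; activity a and capacitance C are assumed
    to be unchanged between the two operating points.
    """
    if vdd_high <= 0 or vdd_low <= 0 or freq_ratio <= 0:
        raise ValueError("voltages and frequency ratio must be positive")
    return (vdd_high / vdd_low) ** 2 * freq_ratio


estimate = dynamic_power_ratio(3.5, 1.0)
print(f"first-order power reduction factor: {estimate:.2f}")
```

The quadratic estimate is of the same order as the average factor of 10.1 reported above; the difference may stem from effects this model omits, such as the higher leakage at the reduced threshold voltage.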
Conclusions
Two of the most critical control logic elements, registers and multiplexers, are investigated in this paper with respect to performance, power, and area. For registers, the split-slave dual-path (SSDP) register has been proposed to improve performance and reduce power dissipation. The proposed SSDP-register yields a 30% energy efficiency improvement with 28% faster speed over the PPI-register. Compared to the conventional register, the proposed SSDP-register improves performance and energy efficiency by factors of 2.3 and 2.4, respectively.
On average, the energy utilization of this register improves by a factor of 1.9 when the supply voltage is scaled from 3.5 V to 1.5 V.
For MUXes, the proposed TG-INV-TG MUX is the fastest design. Relative to the three existing MUXes, the TG-INV-TG MUX improves performance by 19-50%. The proposed enhanced-CPL MUX dissipates the least power. When compared to the three existing MUXes, it reduces power dissipation by 5-22% and delivers 16-47% better performance. With respect to power-delay product, the proposed enhanced-CPL MUX is the most energy-efficient design. It improves the energy efficiency by 22-72% relative to existing designs.
