This paper explores deterministic transistor reordering in low-voltage dynamic BiCMOS logic gates for reducing the dynamic power dissipation. The constraints of load driving (discharging) capability and NPN turn-on delay for MOSFET reordered structures have been carefully considered. Simulations show a significant reduction in the dynamic power dissipation for the transistor reordered BiCMOS structures. The power-delay product figure-of-merit is found to be significantly enhanced without any associated silicon-area penalty. In order to experimentally verify the reduction in power dissipation, original and reordered structures were fabricated using the MOSIS 2 µm N-well analog CMOS process, which has a P-base layer for a bipolar NPN option. Measured results show a 20% reduction in the power dissipation for the transistor reordered structure, which is in close agreement with the simulation.
INTRODUCTION
Reordering (or permutation) of elements/primitives in a set/structure is a well-known characteristic in the inner workings of nature. Investigative techniques based on this feature are thus a primary research methodology in a multitude of disciplines (for example, biotechnology and genetics). In digital IC design, topological transformation based on the reordering of transistors in logic gates in order to reduce the power dissipation and/or propagation delay-time has been of contemporary research interest [1, 2]. Recently, several authors [3, 4] have investigated input signal probability based transistor reordering for reducing the dynamic power dissipation in static CMOS logic gates. The authors in Refs. [3, 4] defined signal probability as the likelihood of an input signal being in logic state "1". A reordering based on this likelihood is used to reduce the power dissipated at internal logic nodes in a static CMOS logic gate. While Cirit [3] used signal probabilities to measure the dynamic power dissipation in a CMOS NAND gate, the authors in Ref. [4] provided a closed form design optimization through transistor reordering for minimum expected dynamic power dissipation in the internal nodes of the MOSFET chain of a CMOS NAND gate. In this paper, we focus on dynamic BiCMOS logic gates (reported by several authors [5, 7, 9]) and propose improved transistor reordered structures based on a deterministic (rather than probabilistic) power dissipation measure. A small reduction in the gate delay-time is also achieved by this transistor reordering. This novel technique provides additional power-delay optimization over that achieved by applying input signal probability based transistor reordering (which changes logical inputs) beforehand. In typical zero DC power dynamic BiCMOS logic gates there is usually a delay in switching off the power dissipation at the end of logic evaluation. 
In this investigation, we consider the reduction of this dynamic power dissipation (at both the internal and the output nodes) by reducing the delay (a feedback delay) in turning off the pull-down (load discharging) device at the end of a high-to-low logic evaluation (henceforth this feedback delay-time is referred to as the "Evaluation Termination Delay-time"). This is achieved by reordering the relative placement of the feedback MOS device (henceforth referred to as the "Evaluation TERminator device") driven (controlled) by the output logic transition. Figure 1(a) shows a typical dynamic BiCMOS logic gate (a 4-input OR gate suitable for mux/demux, carry-bypass etc.) which has a precharge cycle during the clock low period and an evaluation cycle during the clock high period, as discussed in Ref. [5] (henceforth this circuit is referred to as the Kuo structure). The NPN Q1 is turned off at the end of logic evaluation (switching off dynamic power dissipation) via output transient feedback to the gate of the PMOS Evaluation TERminator device M_ETER. Figure 1(b) shows the Evaluation Termination Delay-time (ETD) at the end of a high-to-low logic evaluation. We define ETD as the time interval from the instant the output discharges to 50% VDD (≈ V_INV of the loading gate) to the instant the M_ETER turns off. The relative placement of the PMOS M_ETER is explored so that the ETD is minimized and the NPN Q1 is switched off at an optimal instant (ensuring full output swing to 0 V). The dynamic power dissipation during the evaluation cycle is due to two current components: the hole current that flows through the PMOS input chain (the base current, I_b, of Q1) and the load discharging electron current (the collector current, I_c, through Q1). For the same load at the gate output, the emitter current (I_c + I_b) is almost the same for every reordering. 
However, for the reordering with a smaller ETD, the duration of this current flow is reduced, and as a result the average dynamic power dissipation per cycle is reduced. The minimization of the ETD is also associated with a reduction in the high-to-low gate delay-time.
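The ETD definition given above can be applied directly to sampled simulation or oscilloscope waveforms. The following Python sketch is illustrative only: the function names, the linear interpolation between samples and the turn-off criterion (the gate-to-source voltage of the PMOS evaluation terminator rising above its negative threshold voltage) are assumptions made for this example and are not taken from the measurement or simulation setup of this work.

```python
def crossing_time(times, values, level, rising):
    """Return the first time `values` crosses `level`, or None.

    Linear interpolation is used between adjacent samples (an assumption).
    """
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def evaluation_termination_delay(times, v_out, v_gs_eter, vdd, vth_p):
    """ETD = (instant the terminator turns off) - (instant V_out = VDD/2).

    v_gs_eter: sampled gate-to-source voltage of the PMOS terminator
               (negative while the device conducts).
    vth_p:     PMOS threshold voltage; only its magnitude is used.
    """
    t_half = crossing_time(times, v_out, vdd / 2.0, rising=False)
    # A PMOS stops conducting once V_GS rises above -|V_THP|.
    t_off = crossing_time(times, v_gs_eter, -abs(vth_p), rising=True)
    if t_half is None or t_off is None:
        return None
    return t_off - t_half
```

The same routine can be run on the original and the reordered waveforms, and the two results subtracted, to obtain the ETD difference.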
PARAMETERS IN TRANSISTOR REORDERING
Some of the important parameters and constraints that constitute the leverages and limitations of transistor reordered dynamic BiCMOS structures are now discussed.
Gradation in the Backgate Bias
In a typical dynamic BiCMOS logic gate (the Kuo structure in Fig. 1(a)), reordering the position of the M_ETER in the PMOS logic input chain (varying the separation from the VDD rail) may result in multiple threshold voltage (V_TH) ranges for the M_ETER. This is due to the gradation in the backgate reverse bias, as shown in Fig. 2(b), assuming that the body (the substrate) is tied to the VDD rail (e.g. in a P-well process technology). Hence, it is evident from Fig. 2 that for the reordered structures with the M_ETER located lower in the PMOS chain, the turn-off point is likely to be affected more by the threshold voltage modulation than for the reordered structures with the M_ETER placed closer to the VDD rail. It may thus be possible to determine an optimal reordered placement of the M_ETER for the smallest ETD. However, if an N-well (or twin-well) process is used, all the PMOS sources may be tied to the body (the well); the gradation in the backgate reverse bias would then collapse, and the threshold voltage of the M_ETER would be both time-invariant and position-invariant in the PMOS chain.
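The gradation in threshold voltage can be estimated with the standard first-order body-effect expression, |V_TH| = |V_TH0| + gamma (sqrt(2 phi_F + V_BS) - sqrt(2 phi_F)), where V_BS is the body-to-source reverse bias. The Python sketch below is a hedged illustration: the body-effect coefficient, the surface potential term and the source potentials are placeholder values chosen for this example and are not parameters of the process used in this work.

```python
import math


def pmos_vth_magnitude(vth0_mag, gamma, two_phi_f, v_bs):
    """First-order body effect: |V_TH| for a reverse bias v_bs >= 0 (volts)."""
    if v_bs < 0:
        raise ValueError("v_bs must be a non-negative reverse bias")
    return vth0_mag + gamma * (math.sqrt(two_phi_f + v_bs) - math.sqrt(two_phi_f))


VDD = 3.3          # body tied to the VDD rail
VTH0_MAG = 0.8     # zero-bias threshold magnitude
GAMMA = 0.5        # placeholder body-effect coefficient, V^0.5
TWO_PHI_F = 0.6    # placeholder surface potential term, V

# Placeholder source potentials for five chain positions, top to bottom.
source_potentials = [3.3, 2.8, 2.3, 1.8, 1.3]

# The reverse bias grows as the device sits further from the VDD rail.
thresholds = [
    pmos_vth_magnitude(VTH0_MAG, GAMMA, TWO_PHI_F, VDD - vs)
    for vs in source_potentials
]
```

With the sources tied to the well instead, V_BS is zero for every position and the function returns the zero-bias value throughout, which is the position-invariant case described above.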
Source Potential Profile Across the MOSFET Chain
The effect of the source potential profile (across the MOSFET chain) on the operating range of the PMOS M_ETER (in reordered structures) is another important consideration in determining its turn-off point and hence the ETD in the Kuo structure. For the dynamic BiCMOS logic gate in Fig. 1(a), we consider in general that all the PMOS input devices (including the M_ETER) have standard aspect ratios (no unusually large devices) and hence comparable PMOS ON-resistances. Also, assuming that all the inputs are at logic low (at the beginning of an evaluation cycle), all the PMOS devices in the input voltage divider chain would normally be on. The resulting spatial potential profile across the PMOS chain, with the constraint VDD = VS1 > VS2 > VS3 > VS4 > VS5 (as shown in Fig. 2(a)), will enforce the condition |VGS1| > |VGS2| > |VGS3| > |VGS4| > |VGS5| on the turn-on operating range (gate overdrive voltages) of the PMOS devices (since all the gates are at 0 V). Hence, for PMOS M_ETER positions (in reordered structures) lower in the PMOS voltage divider chain, progressively smaller positive signal transitions at the gate terminal will cause the M_ETER to turn off at progressively earlier time instants. The potential profile across the MOSFET chain can thus affect the optimal placement of the M_ETER for reduced ETD and hence for minimal dynamic power dissipation.
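The ordering of the source potentials and of the gate drive can be illustrated with a simple resistive-divider model. The Python sketch below assumes equal ON-resistances for all chain devices and a fixed potential at the bottom of the chain; the bottom potential, the threshold magnitude used in the example call and the function names are assumptions for illustration only.

```python
def source_potentials(vdd, v_bottom, n_devices):
    """Source potential of each PMOS in a chain of equal ON-resistances.

    Device 1 has its source on the VDD rail; the drain of the last device
    is assumed to sit at v_bottom.
    """
    step = (vdd - v_bottom) / n_devices
    return [vdd - k * step for k in range(n_devices)]


def turn_off_gate_voltage(v_source, vth_mag):
    """Gate voltage at which a PMOS with this source potential stops conducting."""
    return v_source - vth_mag


# Example: five devices, bottom of the chain assumed near one V_BE above ground.
vs = source_potentials(vdd=3.3, v_bottom=0.8, n_devices=5)

# With all gates at 0 V, |V_GS| equals the source potential of each device.
vgs_magnitudes = [v - 0.0 for v in vs]

# Positive gate swing needed to turn each position off (threshold magnitude 0.8 V).
gate_swings = [turn_off_gate_voltage(v, 0.8) for v in vs]
```

In this model the required gate swing shrinks monotonically towards the bottom of the chain, which is the behaviour exploited when the evaluation terminator is moved away from the VDD rail.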
Load Driving (Discharging) Capability
As MOSFETs are reordered to minimize the ETD in the Kuo structure, full pull-down to 0 V of the output load (precharged to VDD) has to be ensured during a high-to-low evaluation cycle. For higher output load capacitances, the minimization of the ETD (compression of the evaluation period) may leave an insufficient discharge period through the NPN device. As a result, there may be residual charge at the gate output node and full pull-down to 0 V may not take place (assuming no high-level injection and Kirk effect related gate switching delay degradation at higher load discharging collector currents). Thus, while a high ETD will increase dynamic power dissipation, a low ETD may, on the other hand, result in loss of swing. The minimization of ETD therefore has to be traded off against the load driving (discharging) capability of transistor reordered dynamic BiCMOS gates.
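This trade-off can be phrased as a constrained selection: among the candidate placements, choose the one with the smallest ETD that still pulls the output fully down. The Python sketch below is only an illustration of that rule. The dictionary keys, the swing tolerance and all numerical values are hypothetical placeholders and do not reproduce any simulated or measured data.

```python
def pick_placement(candidates, swing_tolerance_v=0.05):
    """Return the candidate with the smallest ETD that still achieves full swing.

    candidates: list of dicts with keys 'name', 'etd_ns' and 'v_out_low'
                (residual output voltage at the end of evaluation, in volts).
    Returns None when no candidate pulls the output within the tolerance of 0 V.
    """
    full_swing = [c for c in candidates if c["v_out_low"] <= swing_tolerance_v]
    if not full_swing:
        return None
    return min(full_swing, key=lambda c: c["etd_ns"])


# Hypothetical placeholder values, for illustration only.
example = [
    {"name": "placement_A", "etd_ns": 1.00, "v_out_low": 0.00},
    {"name": "placement_B", "etd_ns": 0.80, "v_out_low": 0.00},
    {"name": "placement_C", "etd_ns": 0.60, "v_out_low": 0.20},
]
best = pick_placement(example)
```

Here the placement with the lowest ETD is rejected because it leaves residual voltage at the output, so the next lowest ETD is selected.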
Charge Redistribution
Next we discuss the effect of charge redistribution on dynamic power dissipation. Figure 3(a) shows the N-cell of a full-swing dynamic BiCMOS circuit technique with precharge (clock low) and evaluation (clock high) cycles, as discussed in Ref. [7] (the H&R structure). Here, at the end of a high-to-low logic evaluation (clock still high), the PMOS Py turns on and the NMOS M6 turns off in a bid to pull up the gate of the M_ETER to VDD (and turn it off). But since the NMOS M5 is still on (until the end of the clock high period), a fraction of the charge from the node n1 flows to the node n2. If we instead interchange the positions of M5 and M6 (as shown in Fig. 3(b)), M6 turns off at the end of logic evaluation and the redistribution of charge from the node n1 is prevented. The pull-up transient Vn1(t) at the node n1 now rises faster to VDD. As a result, the gate-to-source voltage of the M_ETER, V_GS,METER(t) = (Vn1(t) - VDD), will reach V_THP (the turn-off point) of the M_ETER earlier for the reordered circuit than for the original circuit. The ETD will thus be reduced for the reordered circuit in Fig. 3(b), resulting in lower dynamic power dissipation.
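A first-order RC estimate gives a feel for why removing the charge redistribution path shortens the turn-off time. The Python sketch below assumes that the pull-up device behaves as a fixed resistance, that node n1 starts from 0 V, and that in the original circuit the capacitance of n2 simply adds to that of n1 while M5 conducts. The resistance and capacitance values are hypothetical and are not extracted from the circuits discussed here.

```python
import math


def turn_off_time(r_pullup, c_node, vdd, vthp_mag):
    """Time for an RC pull-up from 0 V to reach VDD - |V_THP|.

    With Vn1(t) = VDD * (1 - exp(-t / RC)), the terminator turns off when
    Vn1(t) - VDD = -|V_THP|, i.e. t = RC * ln(VDD / |V_THP|).
    """
    return r_pullup * c_node * math.log(vdd / vthp_mag)


R_PULLUP = 10e3    # hypothetical pull-up ON-resistance, ohm
C_N1 = 50e-15      # hypothetical capacitance at node n1, F
C_N2 = 50e-15      # hypothetical capacitance at node n2, F
VDD = 3.3
VTHP_MAG = 0.8

# Original circuit: n1 and n2 are charged together while M5 is on.
t_original = turn_off_time(R_PULLUP, C_N1 + C_N2, VDD, VTHP_MAG)
# Reordered circuit: only n1 has to be charged.
t_reordered = turn_off_time(R_PULLUP, C_N1, VDD, VTHP_MAG)
```

In this simplified model the turn-off time scales with the total capacitance that has to be charged, so isolating n2 reduces it in proportion.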
SIMULATION RESULTS
Extensive SPICE simulations were carried out using a standard 2 µm BiCMOS (V_TH = ±0.8 V) process technology and a power supply of 3.3 V. Several transistor reordered structures of the 4-input OR gate of Fig. 1 (the Kuo structure) were evaluated. An aspect ratio of 15 was chosen for all the PMOS devices in the chain. An emitter size (W_E × L_E) of 8 µm × 8 µm (low perimeter-to-area ratio) was chosen, ensuring low overall β degradation due to sidewall-injected carriers [6]. Table I summarizes the simulation results for 4 different transistor reordered structures. It indicates that moving the M_ETER down the PMOS chain (a column trace in Table I) reduces the ETD and the power dissipation at the expense of an incomplete output discharge. On the other hand, increasing the output load for a fixed M_ETER location (a row trace in Table I) results in a higher ETD, higher power dissipation and higher output discharge levels. For example, at a 0.25 pF load, the ETD is largest for M_ETER@PMOS1 (the original structure in Ref. [5]), where the source potential of the M_ETER is at the apex (refer to Fig. 2(a)) of the potential slope (requiring the largest VGS transition to turn off the M_ETER), and smallest for M_ETER@PMOS5, where the source potential of the M_ETER is almost at the bottom (refer to Fig. 2(a)) of the potential slope (requiring a much smaller VGS transition to turn off the M_ETER). Although the structure for M_ETER@PMOS5 has the smallest ETD, the NPN Q1 in this case turns off prematurely, leaving the output node at around 0.2 V (incomplete pull-down). This is because the smaller ETD in this case causes the NPN Q1 to be ON for a shorter period after V_out reaches VDD/2 (refer to Fig. 1(b)), resulting in loss of load discharging (sinking) capability compared to the original structure. 
Thus the reordered structure for M_ETER@PMOS4 is considered optimal: full pull-down is achieved while the ETD is reduced by 0.24 ns compared to the original structure (at a 0.25 pF load). This results in about 0.13 mW (≈ 20%) reduction in dynamic power dissipation (at a 0.25 pF load and 3.3 V supply voltage). The simulations were carried out using the model in Ref. [8] at 27 °C using R_y = 10^12 ohm, C_y = 1 pF and T = 600 ns. This choice satisfies the required inequality R_y C_y ≫ T [8]. Also, a value of r_x = 10^-7 ohm [8] was used. Based on the above, for a 10 K-gate ASIC chip/module a reduction in power dissipation of around 1 W can be expected, which is quite significant. Table II shows a comparison of the performance enhancement achievable by transistor reordering using a power-delay product figure-of-merit. Considerable performance enhancement is noticeable for several digital sub-systems using the optimized logic gates. The simulations also show that the NPN turn-on delay (after the start of the evaluation cycle) is about 2.7 ns for the optimal reordered structure (M_ETER@PMOS4) compared to about 3.2 ns for the original structure (M_ETER@PMOS1). This results in an improvement of the gate delay-time (fall-time to 50% VDD) by around 0.5 ns (at a 0.25 pF load and 3.3 V supply voltage) for the optimal structure. Overall, the results in Table II show an average reduction in the delay-time of about 10% through this transistor reordering.
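The chip-level estimate and the time-constant condition quoted above follow from simple arithmetic, sketched below in Python. All inputs are the figures stated in the text. The combined power-delay factor at the end is only a rough illustration that assumes the quoted 20% power and 10% delay reductions apply to the same gate; it is not a value taken from Table II.

```python
# Per-gate saving and gate count quoted in the text.
SAVING_PER_GATE_W = 0.13e-3
GATE_COUNT = 10_000
chip_saving_w = SAVING_PER_GATE_W * GATE_COUNT   # roughly 1 W, as stated

# Power measurement model parameters: R_y * C_y must far exceed T.
R_Y_OHM = 1e12
C_Y_F = 1e-12
T_S = 600e-9
time_constant_ratio = (R_Y_OHM * C_Y_F) / T_S    # many orders above 1

# NPN turn-on delays for the original and the optimal reordered structure.
NPN_DELAY_ORIGINAL_NS = 3.2
NPN_DELAY_REORDERED_NS = 2.7
delay_improvement_ns = NPN_DELAY_ORIGINAL_NS - NPN_DELAY_REORDERED_NS

# Rough combined effect on the power-delay product (illustrative assumption).
POWER_REDUCTION = 0.20
DELAY_REDUCTION = 0.10
pdp_factor = (1.0 - POWER_REDUCTION) * (1.0 - DELAY_REDUCTION)
```

Under that assumption the power-delay product of the reordered gate would be about 0.72 times that of the original gate.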
The structures of Fig. 3 (the original and the reordered H&R N-cell structures) were also simulated and compared. The ETD for the reordered circuit is improved by about 1 ns (at a 0.25 pF load and 3.3 V supply). This results in a reduction in the dynamic power dissipation of about 0.10 mW (≈ 15%) for the H&R N-cell. Table III shows the performance comparison for the original and the reordered structures. Unlike the Kuo structure, in this case the transistor reordering has no effect on the delay-time (the delay-time is reordering invariant). This is because the turn-on times of the PMOS M_ETER and the NPN Q1 are reordering invariant for the H&R N-cell structure. It is worthwhile to note here that, as a consequent tradeoff, the reordered H&R N-cell structure does not suffer from any loss of load driving capability as a result of the transistor reordering (also unlike the Kuo structure).
EXPERIMENTAL RESULTS
Test structures of the original and the transistor reordered dynamic BiCMOS circuits were fabricated using the MOSIS 2 µm N-well analog CMOS process (N76Y run), which has a P-base mask layer for an NPN bipolar device option. Figure 4 shows the photomicrograph of the fabricated original and reordered test circuits for the H&R N-gate structure (Fig. 4(a)) and the Kuo gate structure (Fig. 4(b)), respectively. Layout area is found to be reordering invariant. Oscilloscope traces were recorded for the original (lower trace) and the transistor reordered (upper trace) H&R gate. An almost "leaf" shaped performance degradation pattern due to charge leakage in the original H&R structure is clearly evident. The traces show that the PMOS M_ETER (V_TH of the PMOS device being -0.988 V for the N76Y process) turns off around 2 ns earlier in the transistor reordered H&R gate structure (with no charge leakage), resulting in about 2 ns reduction in the ETD. Since the access path to the internal gate nodes is via the output pads, the oscilloscope traces also include the pad delay and the effect of the external (off-chip) loading, which with reasonable approximation can be considered to cancel in the measurement of the ETD difference. This measured ETD difference is thus reasonably close to the 1 ns reduction in the ETD observed through the SPICE simulations. The savings in the dynamic power dissipation would thus meet (and possibly exceed) those estimated by the simulation results. Figure 5(a) shows the bottom section of a MOSIS small-chip project implementing a 24-bit CLA array (a cascade of six 4-bit carry-lookahead with carry-bypass sections) using the original Kuo structure (top array) and the transistor reordered Kuo structure (bottom array), respectively. Separate VDD and GND pads are provided for each structure (as shown in the photomicrograph) in order to isolate the power supply currents drawn in each case. 
The optimal structure for each instance of the Kuo structure was decided based on obtaining the minimum ETD with full discharge to logic low at a gate loading of ≤ 0.25 pF (using the output load vs. M_ETER location data in Table I). For the power dissipation measurement, a cascade of four 24-bit CLA arrays of both the original and the reordered structures was made separately using four 65-pin PGA packaged small-project chips (from a lot of 12 supplied by MOSIS), each containing one 24-bit structure of the original and of the transistor reordered CLA array as shown. Measurements (with a true RMS digital multimeter) using a 3.3 V supply voltage and a 160 MHz clock (using a QX0-1100 crystal oscillator module and a PLL clock synthesizer) indicated an RMS power supply current of 224 mA for the top CLA array (using the original Kuo structure) compared to 179.2 mA for the bottom CLA array (using the transistor reordered Kuo structure). A reduction in the power dissipation of around 20% has thus been achieved through the transistor reordering. Figure 5(b) shows the oscilloscope traces comparing the pull-down delay for the carry-out signal for the two CLA arrays. Although the oscilloscope measurements include the access path delay due to the output pad, the bonding wires and the package traces, the difference (≈ 2.5 ns) represents the reduction in the delay-time obtained through the transistor reordering. This is in close agreement with the simulation results. The leverage in performance enhancement obtained by the transistor reordering is thus clearly evident through the experimental results.
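The 20% figure follows directly from the two measured supply currents. The short Python check below uses only the values quoted above; taking the product of the supply voltage and the measured current as the power figure is an assumption of this sketch, and the function name is illustrative.

```python
def relative_reduction(original, reordered):
    """Fractional reduction of `reordered` with respect to `original`."""
    return (original - reordered) / original


VDD_V = 3.3
I_ORIGINAL_MA = 224.0     # top CLA array, original Kuo structure
I_REORDERED_MA = 179.2    # bottom CLA array, reordered Kuo structure

# At a fixed supply voltage the power ratio equals the current ratio.
current_reduction = relative_reduction(I_ORIGINAL_MA, I_REORDERED_MA)

# Supply power estimates in mW (supply voltage times measured current).
p_original_mw = VDD_V * I_ORIGINAL_MA
p_reordered_mw = VDD_V * I_REORDERED_MA
power_reduction = relative_reduction(p_original_mw, p_reordered_mw)
```

Both ratios evaluate to about 0.20, consistent with the stated reduction of around 20%.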
CONCLUSION
Analysis, simulation and experimental verification of the performance enhancement of dynamic BiCMOS logic gates through a novel deterministic transistor reordering technique have been provided. The average reduction in power dissipation through this deterministic technique is found to generally exceed that achieved through signal probability based transistor reordering in Refs. [3, 4] by around 10%. Applying the deterministic technique presented here along with the probabilistic method (which changes logical inputs) assures considerable additional reduction in power dissipation irrespective of the input signal pattern. In addition, by taking into account the data correlations of the input streams [10, 11], further enhancements of the transistor reordering can be achieved for the dynamic BiCMOS logic gates. Improvements in the delay-time have also been achieved in the process, resulting in considerable enhancement of the power-delay product figure-of-merit. This technique may also be extended to other dynamic circuit/logic techniques. 
