I. INTRODUCTION

WITH advances in CMOS device technology, both the performance and the power consumption of integrated circuits have improved dramatically. In very high performance designs, dynamic circuits like Domino [1], [2] are used due to their high speed. However, with continued scaling of the supply voltage and transistor threshold voltage, it is more difficult to use Domino circuits because their noise margin depends on the threshold voltage variation [14].
This problem can be solved by using skewed logic circuits [3], [4]. Skewed circuits are fully complementary static logic circuits. The sizes of the PMOS and NMOS transistors are adjusted to make one of the transitions faster than the other. Changing the driving capabilities of the PMOS and NMOS transistor networks is referred to as skewing. The same result can be achieved by using different supply voltages or transistors with different threshold voltages to speed up one of the transitions. In the latter case the sizes of the transistors need not be increased, and hence the input capacitance and area are smaller, at the expense of multiple power/ground lines or a more complex process technology.
Skewed logic gates have performance comparable to that of dynamic circuits, whereas the noise tolerance of skewed logic is better because it has no floating nodes. The floating node in a Domino gate can be eliminated using a keeper device. However, the keeper cannot restore the correct state of the gate if a false transition occurs due to an input glitch. Skewed logic allows a tradeoff between the delay of the gate and its noise margins. Because of its higher noise tolerance, skewed logic is better than Domino logic for high performance low voltage/low power applications.
Similar to dynamic circuits, skewed logic falls in the category of precharge-evaluate logic families. The fast transition is used for evaluation while the slow transition can be used for precharge.
The rest of the paper is organized as follows: Section II describes operation of skewed logic. Section III discusses pipelining with skewed logic. Two variations of skewed logic are described in Section IV. Section V compares the energy-delay results for static CMOS, Domino and different kinds of skewed logic. Section VI discusses a dynamic noise margin model and compares static CMOS, Domino, and skewed logic in terms of dynamic noise margin. Section VII describes a test chip with a skewed CMOS multiplier.
II. SKEWED LOGIC
The circuit topology of a skewed logic gate is the same as that of classical static CMOS logic. Fig. 1(a) shows a series connection of NAND, NOR, and NAND gates. To speed up the high-to-low transition, the sizes of the NMOS transistors of the first NAND gate are increased and the sizes of the PMOS transistors of the pull-up network are decreased. For a fast low-to-high transition, the transistor widths of the NOR gate are changed in the opposite direction. The ratio of the worst case driving capabilities of the pull-down and pull-up networks is called the skew.
In order to achieve performance comparable to Domino circuits, the skewed gates should operate in two phases: precharge and evaluation. During precharge, all nodes are precharged to the initial state. During evaluation, the circuit performs useful work and only fast transitions can occur. To ensure that, a gate skewed for a fast high-to-low transition should be followed by a gate skewed for a fast low-to-high transition, and vice versa. An example of such a connection is shown in Fig. 1(a). The circuits shown in Fig. 1(b) and (c) should be used if fast precharge from the clock is necessary. Precharge of pipelined skewed circuits is further discussed in Section III.
Skewing of a gate affects its performance in two ways: the trip point and the driving capabilities of the transistor networks change. Fig. 2 shows the dependence of the trip point of an inverter on the skew. We used models for 0.25 µm NMOS and PMOS transistors with a 2.5 V supply voltage. A skew of 1 corresponds to the case when the ratio between PMOS and NMOS transistor sizes is equal to 2.
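The trend of the trip point with skew can be illustrated with a first-order calculation. The sketch below is not the HSPICE model used for Fig. 2; it assumes a long-channel square-law inverter, equal NMOS and PMOS threshold magnitudes of 0.5 V (an illustrative value), and that a skew of 1 corresponds to equal pull-down and pull-up drive. The function name and parameters are hypothetical.

```python
import math


def inverter_trip_point(skew, vdd=2.5, vtn=0.5, vtp=0.5):
    """Estimate the inverter trip point with a square-law model.

    skew is taken as the pull-down to pull-up drive ratio (kn/kp),
    so skew = 1 is the balanced inverter. vtn and vtp are threshold
    magnitudes in volts; both values are illustrative assumptions.
    """
    r = math.sqrt(skew)
    # Both devices saturated at the trip point: kn*(V - vtn)^2 = kp*(vdd - V - vtp)^2
    return (vdd - vtp + r * vtn) / (1.0 + r)


if __name__ == "__main__":
    for s in (1, 2, 3, 4, 5):
        print(f"skew = {s}: trip point is about {inverter_trip_point(s):.2f} V")
```

Under these assumptions the trip point moves from half the supply voltage at a skew of 1 toward the NMOS threshold as the skew grows, which is the qualitative behavior the tradeoff between speed and noise margin relies on.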
Delay dependence of both fast and slow transitions on the skew is shown in Fig. 3 for a NAND gate that is skewed for fast falling transition. Skew of 1 corresponds to the case where the gate has equal worst case high-to-low and low-to-high delays in the standard static CMOS logic mode, in which the worst-case low-to-high transition of NAND gate has a single PMOS device activated. However, the delays shown in Fig. 3 correspond to a NAND gate operated in a precharge/evaluate fashion when the precharge transition has every PMOS transistor activated. Because of that the precharge delay of the skewed NAND gate is less than the worst-case delay of rising transition of static CMOS gate.
III. PIPELINING WITH SKEWED LOGIC
A pipeline with skewed CMOS circuits can be synthesized following the same procedure as in Domino logic [5] . Fig. 4 shows a basic pipeline structure. The logic of each pipeline stage is divided into two blocks separated by latches. During the first half of the cycle, when the clock is high, logic block 1 is evaluating while latch A holds data. At the same time, logic block 2 is being precharged and latch B is transparent. During the second half of the cycle the situation is just the opposite. This technique allows propagation of data without waiting for the precharge of the next stage.
In such a pipeline, the precharge and evaluation delays of each logic block should be less than a half cycle. In the case of Domino or noise tolerant precharge (NTP) circuits (an NTP circuit is shown in Fig. 1(b) with the dashed inverter) [6], [7], it is easy to achieve a short precharge delay because each gate is connected to the clock signal. However, in the more general case of skewed logic, which as a class of circuits includes NTP circuits, not all gates need to be connected to the clock. Fig. 5 shows a logic block structure for skewed logic. In this figure the fast transition directions are designated by arrows. The gates connected to the clock have a structure similar to the NTP circuits. The topology of gates 1 and 7 is shown in Fig. 1(c). The circuit structure of gate 4 is shown in Fig. 1(b) without the dashed inverter. In this example, we assume that all gates have the same fast (evaluation) transition delay and that the slow (precharge) transition is three times longer. We also assume that the delay from the clock of a gate connected to the clock is no greater than three times the fast transition delay. The sizing for such a skew is shown in Fig. 1. Waveforms on the outputs of the gates are shown in Fig. 6. Precharge of the first, fourth, and seventh gates starts immediately after the falling edge of the clock. Hence, the precharge delay is less than a half cycle.
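The timing argument of this example can be checked with a few lines of arithmetic. The sketch below assumes that the nine gates of Fig. 5 form a single linear chain, that the block inputs are already at their precharge values at the clock edge, and that time is measured in units of the fast transition delay. The function name and the chain model are illustrative assumptions, not taken from the figures.

```python
def precharge_finish_times(n_gates, clocked, t_slow=3.0, t_clk=3.0):
    """Return the time at which each gate in a linear chain is precharged.

    Gates are numbered from 1. A gate in `clocked` is precharged directly
    by the clock after t_clk. Any other gate must wait for its predecessor
    to be precharged and then needs one slow transition (t_slow).
    """
    finish = {}
    for gate in range(1, n_gates + 1):
        if gate in clocked:
            finish[gate] = t_clk
        else:
            previous = finish.get(gate - 1, 0.0)  # block inputs are ready at t = 0
            finish[gate] = previous + t_slow
    return finish


N_GATES = 9
T_FAST = 1.0  # fast (evaluation) delay, used as the time unit

with_clock = precharge_finish_times(N_GATES, clocked={1, 4, 7})
without_clock = precharge_finish_times(N_GATES, clocked=set())
evaluation_delay = N_GATES * T_FAST

if __name__ == "__main__":
    print("evaluation delay:", evaluation_delay)
    print("precharge delay, gates 1, 4, 7 clocked:", max(with_clock.values()))
    print("precharge delay, no gate clocked:", max(without_clock.values()))
```

With every third gate clocked, the worst-case precharge delay in this model equals the evaluation delay of the block (nine units), whereas a chain with no clocked gates would need three times as long to precharge.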
Such a technique reduces the number of gates connected to the clock in the skewed logic circuit in comparison with Domino and NTP circuits. Therefore, skewed circuits have lower clock capacitance and lower clock power consumption. Moreover, they draw a lower peak current from the power supply.
Reducing the number of gates connected to the clock can also improve circuit performance: gates that are not connected to the clock are faster than those connected to the clock because they have fewer inputs and fewer transistors connected in series.
Another advantage is the precharge process being more evenly distributed over time. Unlike Domino or NTP, not all skewed gates are precharged simultaneously after clock edge. This can be seen from the above example (Figs. 5 and 6 ). In the beginning of the precharge half cycle only three out of nine gates are precharged. The second gate is precharged only after the precharge of the first gate is completed and so on. Distributed precharge process further reduces peak current and it simplifies physical design.
IV. MULTI-Vth AND MULTI-Vdd SKEWED CIRCUITS
The gate trip point and the transistor network driving capability can be changed by using lower threshold voltage (Vth) devices in the MOS network that supports the faster transition, while higher-Vth devices can be used elsewhere. For example, a gate skewed for a fast high-to-low transition should have low-Vth NMOS transistors and high-Vth PMOS transistors. Gate properties are affected at the cost of increased process complexity.
The same effects may be realized by using separate power supply and ground lines for the gates, as shown in Fig. 7. Gates with a fast low-to-high transition are connected to the higher supply and higher ground lines, while gates with a fast high-to-low transition are connected to the lower supply and lower ground lines. The gate trip point remains the same for a single gate but changes relatively between any two connected gates, since such gates must be connected to different power/ground lines. The output voltage swing is reduced in comparison with the single power supply case without reduction of the gate-source voltage of the transistors in the fast network. Although gate speed is improved without an increase of transistor sizes and input capacitance, the improvement is obtained at the expense of noise margin as well as an increase of leakage power consumption when the gate is in the precharge state. At the same time, dynamic power consumption is reduced because of the smaller input capacitance and output voltage swing.
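The voltage relations described for separate supply and ground lines can be worked through numerically. The sketch below assumes one particular rail assignment that is not stated in the text: the high pair spans from the supply difference up to the base supply, and the low pair spans from zero up to the base supply minus the difference. The base supply of 2.5 V and the difference of 0.4 V are the values used elsewhere in the paper; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rails:
    vdd: float  # supply line voltage in volts
    gnd: float  # ground line voltage in volts

    @property
    def swing(self):
        return self.vdd - self.gnd


def multi_vdd_voltages(vdd_base=2.5, delta=0.4):
    """Key voltages for two gate types on different rail pairs (assumed mapping)."""
    high = Rails(vdd=vdd_base, gnd=delta)        # gates with fast low-to-high transition
    low = Rails(vdd=vdd_base - delta, gnd=0.0)   # gates with fast high-to-low transition
    return {
        "output_swing": high.swing,
        # Evaluation: a fast NMOS (source on low.gnd) is driven up to high.vdd.
        "fast_nmos_vgs": high.vdd - low.gnd,
        # Evaluation: a fast PMOS (source on high.vdd) is driven down to low.gnd.
        "fast_pmos_vsg": high.vdd - low.gnd,
        # Precharge: the idle fast NMOS sees the high-rail gate's low level.
        "precharge_nmos_vgs": high.gnd - low.gnd,
        # Precharge: the idle fast PMOS sees the low-rail gate's high level.
        "precharge_pmos_vsg": high.vdd - low.vdd,
    }


if __name__ == "__main__":
    for name, value in multi_vdd_voltages().items():
        print(f"{name}: {value:.2f} V")
```

In this assumed arrangement the fast networks keep the full base gate-source drive while the output swing shrinks by the supply difference, and the idle fast transistors see a nonzero gate-source voltage during precharge, which is consistent with the higher precharge-state leakage noted above.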
V. ENERGY-DELAY COMPARISONS
In order to evaluate different circuit families, delay and energy per transition are measured. Various points are obtained by changing the overall gate size.
Static CMOS gates have an optimum (in terms of energy-delay product) ratio of about 1.5 between PMOS and NMOS transistor sizes [3]. Domino gates have a fixed keeper transistor size to pull-down network transistor size ratio of 1/6. This ratio is chosen for dynamic logic because very little change in delay or power is observed with smaller ratios. Fig. 8 compares the energy per transition, which includes precharge energy, versus delay for two cascaded two-input NAND circuits forming an AND-OR structure. This structure drives another NAND gate of the same size as a load. The plot for static CMOS gates is obtained at the optimum sizing. Similar plots are obtained for Domino gates, which consist of a dynamic inverter followed by a skewed (skew ratio 4) static inverter, for Domino gates both with and without a footer [8], [13]. However, footless Domino requires additional clock generation with additional power overhead that cannot be taken into account in this simulation.
The curves for skewed gates show fast gate delay versus energy dissipation. They are obtained by varying the skew of the gates while keeping the sum of transistor widths constant. Three curves at different total widths are plotted with the skew being varied from 1 to 5. Substantial improvement in delay is obtained without compromising the energy per transition.
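Varying the skew at a constant sum of transistor widths amounts to redistributing width between the pull-down and pull-up networks. The sketch below shows this for a simple inverter, assuming that a skew of 1 corresponds to a PMOS to NMOS width ratio of 2 and that drive strength scales linearly with width. The function name and the total width of 6.0 µm are illustrative assumptions; the NAND gates of Fig. 8 would need their own neutral ratio.

```python
def skewed_inverter_widths(skew, total_width, neutral_p_to_n=2.0):
    """Split a fixed total width into (NMOS width, PMOS width) for a given skew.

    skew is the pull-down to pull-up drive ratio, normalized so that
    skew = 1 gives a PMOS/NMOS width ratio of neutral_p_to_n.
    """
    p_to_n = neutral_p_to_n / skew
    wn = total_width / (1.0 + p_to_n)
    wp = total_width - wn
    return wn, wp


if __name__ == "__main__":
    TOTAL_WIDTH_UM = 6.0  # illustrative value
    for s in (1, 2, 3, 4, 5):
        wn, wp = skewed_inverter_widths(s, TOTAL_WIDTH_UM)
        print(f"skew = {s}: Wn = {wn:.2f} um, Wp = {wp:.2f} um")
```

Because the total width is fixed, the switched capacitance stays roughly constant while the fast network gets stronger, which is one way to read the observation that delay improves without a penalty in energy per transition.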
Skewing reduces short circuit power of the gate. To simulate and measure short circuit power we used a method similar to the method described in [9] . Fig. 9 shows that short circuit energy per transition (evaluation-precharge) of the same cascaded NAND gates reduces by about 40% when skew changes from 1 to 5.
As shown in Fig. 10, leakage current increases with skew in the precharge state and decreases in the other state. The average leakage current depends on the probability of each state of the gate. In general, the precharge state is more likely to occur because the gate may not switch during the evaluation phase.
The simulation results for AND-OR multi-Vdd gates are shown in Fig. 11. One curve (solid line) shows the dependence of energy and delay on the supply voltage difference, with a base of 2.5 V, for the case when the transistor sizes are constant (skew is equal to 1). The results for the simple skewed circuit with a skew of 1 and for the multi-Vdd circuit with a supply voltage difference of zero coincide. The second type of curve (dashed lines) shows, for multi-Vdd gates, the dependence on transistor size skewing, with the skew varying from 1 to 5 at constant voltage differences between the supply lines, while the overall transistor sizes and total area remain the same. The curve for multi-Vdd gates with zero supply voltage difference coincides with the curve for the simple skewed gate. Increasing the supply voltage difference gives linear gains in both energy per transition and delay because of the decrease of the output voltage swing of the gate. Up to 30% delay and energy reduction is possible. In contrast, an increase of the supply voltage difference reduces the drive of the MOSFET that is responsible for the precharge transition. Consequently, the speed degradation for the slow transition is greater than the improvement for the fast transition. In the precharge state, multi-Vdd gates have higher leakage current because the gate-source voltage is not equal to zero. Another drawback is the reduction in noise margin.
Similar to multi-Vdd circuits, two types of curves are plotted for multi-Vth gates in Fig. 12. The solid line depicts the dependence of delay and energy on the difference between the threshold voltages of the fast and slow transistor networks. Dashed lines show the dependence on transistor size skewing, with the skew varying from 1 to 5 at constant differences in threshold voltages. The curve for multi-Vth gates with zero threshold voltage difference coincides with the curve for the simple skewed gate. Transistor threshold reduction results in the same delay improvement as the supply voltage difference. However, the power consumption of the multi-Vth gate does not differ much from that of the simple skewed gate because the output voltage swing does not change. Multi-Vth circuits have the same disadvantages as multi-Vdd gates, i.e., high leakage power in the precharged state and reduced noise margin, although their noise margin is better than that of multi-Vdd gates.
VI. DYNAMIC NOISE MARGIN
Because of the high gain of complementary CMOS circuits, the static noise margin of skewed circuits (the point where the gain is unity) is close to the input trip point voltage. The dependence of the trip point of an inverter on the skew is shown in Fig. 2. This dependence provides a tradeoff between the noise margin and the fast transition time as the skew changes.
For a high-performance precharge-evaluate circuit, the static noise margin may not be the only metric for measuring gate robustness. Static noise margins are important for rejecting voltage offsets created on the output nodes due to leakage and noise on the power supply lines. However, they do not adequately address the problem of capacitive coupling noise [3], [10].
In order to evaluate coupling noise, we used the dynamic noise margin model proposed in [3], [10]. In this section we present a circuit structure that is used to evaluate the robustness of a gate. Fig. 13 shows the circuit structure with coupling noise. An aggressor gate forces a transition from high to low on its output. This transition is coupled from the aggressor to a victim node driven by the victim gate. The output of the victim gate must be high for a noise problem to occur. The strength of the noise spike at the victim node depends on the coupling coefficient.
The amplitude of the spike also depends on the rise or fall time of the aggressor and the drive strength of the victim gate.
The noise voltage induced at the output of the victim gate does not by itself cause a failure until it is propagated across the affected gate. If the output voltage of the affected gate rises above the noise threshold determined by the static noise margin of the succeeding gate, we assume that a failure in functionality occurs.
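The dependence of the spike amplitude on the coupling coefficient, the aggressor edge time, and the victim drive strength can be illustrated with a first-order lumped model. The sketch below is not the dynamic noise margin model of [3], [10]; it assumes a linear aggressor ramp, a victim node held by a fixed resistance, and a total victim capacitance that includes the coupling capacitance. The resistance and edge-time values are illustrative, and the function names are hypothetical.

```python
import math


def coupling_peak(vdd, c_couple, c_total, r_hold, t_edge):
    """Peak magnitude of the coupled spike on the victim node.

    Solves c_total * dV/dt + V / r_hold = c_couple * vdd / t_edge
    for a linear aggressor ramp of duration t_edge; the peak is
    reached at the end of the ramp.
    """
    tau = r_hold * c_total
    return vdd * (r_hold * c_couple / t_edge) * (1.0 - math.exp(-t_edge / tau))


def charge_sharing_bound(vdd, c_couple, c_total):
    """Upper bound on the spike for an undriven victim or an instant edge."""
    return vdd * c_couple / c_total


if __name__ == "__main__":
    VDD = 2.5            # volts
    C_TOTAL = 25e-15     # total victim capacitance, as in the simulations
    C_COUPLE = 10e-15    # illustrative coupling capacitance
    T_EDGE = 100e-12     # illustrative aggressor edge time
    for label, r_hold in (("strongly held", 1e3), ("weakly held", 20e3)):
        peak = coupling_peak(VDD, C_COUPLE, C_TOTAL, r_hold, T_EDGE)
        print(f"{label}: peak is about {peak:.2f} V")
    print(f"bound: {charge_sharing_bound(VDD, C_COUPLE, C_TOTAL):.2f} V")
```

In this model a weakly held victim node, such as a dynamic node restored only by a small keeper, approaches the charge-sharing bound, while a strongly driven static output sees a much smaller spike. A failure check would then compare the spike propagated through the affected gate with the static noise margin of the succeeding gate.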
We evaluate the dynamic noise margin of standard static CMOS circuits, Domino, skewed, multi-Vdd, and multi-Vth gates for simple inverters. The same HSPICE models for 0.25 µm CMOS technology with a 2.5 V supply voltage were used for simulation.
The peak voltage of coupling noise on the victim node is shown in the left part of Fig. 14 . The left chart shows maximum voltage of coupling spike for high-to-low transition. In the case of Domino logic the left chart shows an effect of coupling to the dynamic node. For skewed gates this corresponds to gates which are skewed for fast falling transition.
Peak voltages versus coupling coefficient are obtained under the assumption that the total victim capacitance is constant at 25 fF. Domino, skewed, multi-Vdd, and multi-Vth gates are assumed to be precharged when the aggressor transition occurs. A skew of four is imposed on the skewed gates and the Domino output inverter, whereas the keeper is sized at one sixth of the pull-down network of the Domino gates. The supply voltage difference for multi-Vdd gates is equal to 0.4 V. In multi-Vth gates, high-Vth NMOS and PMOS transistors have threshold voltages equal to 0.54 V and 0.52 V, respectively, while the threshold voltages of low-Vth transistors are 0.14 V and 0.12 V. All gates have a similar average transistor size of 2.0 µm.
The left chart in Fig. 14 shows that Domino exhibits a slightly higher peak voltage than the skewed gates, primarily because the victim node is very weakly driven by a keeper device.
A coupling capacitance of 22 fF is needed for the static noise margin to be exceeded for Domino, while the coupling capacitance must exceed 26 fF for the case of skewed logic.
The peak noise voltages at the output of the affected gate are shown in the right chart of Fig. 14. Simulation conditions are identical to those for the left chart. The static noise margins of the respective succeeding gate in Fig. 13 are shown as thin horizontal lines.
Static CMOS gates clearly offer a very high degree of noise tolerance. A failure occurs only when fF. Coupling capacitance values greater than 28 fF cause a failure on the input of the affected Domino gate in the case of a low-to-high noise spike on the affected node. In comparison, skewed gates need fF to cause a failure.
The higher performance and lower power consumption of multi-Vdd gates are obtained at the expense of reduced noise margins: fF causes a failure. Reduction of the threshold voltage in multi-Vth gates also decreases the dynamic noise margin of the gate, but the dynamic noise margin of the multi-Vth gate is better than that of multi-Vdd gates: fF can cause a failure. The results for multi-Vth gates show that skewed logic gates can be robust even in the case of lowered threshold voltage.
VII. SKEWED LOGIC MULTIPLIER TEST CHIP
In order to check the applicability of skewed logic circuits, we designed and fabricated through MOSIS a test chip with a 16 × 16 bit multiplier using design rules for 0.35 µm CMOS technology with a 3.3 V supply voltage. The multiplier, whose block diagram is shown in Fig. 15, consists of a Booth encoder, a Wallace tree, and the final adder. The latency of the multiplier is one cycle. During the first half cycle the Booth encoder produces partial products and the Wallace tree sums the partial products. The final summation is performed in the second half cycle by the final carry select adder. A photo of the test chip is shown in Fig. 16. Measured results are given in Table I. All data are presented for the highest possible clock rate.
Cells connected to the clock are designated in Fig. 15 by arrows. Out of 386 logic cells only 116 are connected to the clock. This confirms that skewed circuits have lower clock load and power dissipation than Domino circuits.
VIII. CONCLUSION
This paper describes a new noise tolerant high-performance low-power skewed static logic circuit family and its variations. Skewed circuits have better noise tolerance than Domino circuits while the performance of skewed logic is approximately comparable to that of Domino. Another advantage of skewed logic is reduced clock capacitance, clock power dissipation, and reduced peak current of power supply/ground lines. These characteristics make skewed CMOS very promising for high performance low-power/low-voltage design.
