Design Strategies for Source Coupled Logic Gates
Massimo Alioto, Member, IEEE, and Gaetano Palumbo, Senior Member, IEEE
Abstract-In this paper, a strategy for the design of source-coupled logic (SCL) gates both with and without an output buffer is proposed. Closed-form design equations to size bias currents and transistor aspect ratios to meet assigned specifications are derived from a simple SCL gate analytical delay model, shown to be sufficiently accurate by extensive simulations. The design criteria proposed are simple and provide the designer with a more profound understanding of the tradeoff between delay and power consumption. More specifically, design criteria are derived to consciously manage this tradeoff in practical design cases, i.e., when either high performance or an optimum balance with power dissipation is needed. Therefore, the strategy proposed is useful right from the early design phases, and avoids tedious simulation iterations.
Index Terms-Analytical delay model, source coupled logic (SCL) gates.
I. INTRODUCTION
THE diffusion of applications requiring digital signal processing, including digital audio and video applications, has increased market interest in high-speed high-resolution mixed-signal ICs, such as sigma-delta A/D and D/A converters [1]-[5]. The performance of these circuits, which consist of both analog and digital circuits sharing the same substrate, is strongly affected by the noise induced onto the former by the latter. Indeed, due to the supply current spikes during switching, logic gates generate power-supply switching noise [5]-[10], which couples with the analog circuits through supply lines and the substrate, thus degrading circuit resolution. In particular, traditional CMOS static logic generates a high amount of noise, and is thus not suitable for high-resolution applications. For instance, in [5], digital and analog blocks were implemented in two different chips to obtain the required resolution, even though analog circuits with high noise rejection were used and techniques such as diffusion of guard bands and separate analog and digital supply lines, pads, and wires were exploited [11]. As a consequence, alternative logic styles with reduced supply current spikes during switching are needed to reduce the amount of switching noise.
Until now, various topologies that draw an almost constant supply current have been proposed [11]-[25]. The most successful among these are the source-coupled logic (SCL) circuits [9], [11], [20], [25], based on the source-coupled pair of MOS transistors, and their modified versions such as folded source-coupled logic (FSCL) circuits or enhanced FSCL (EFSCL) [14], [25]. Indeed, compared to CMOS static logic, the SCL logic style allows switching noise to be reduced by two orders of magnitude [12], [14]. Since the low switching noise feature of SCL is obtained at the cost of static power dissipation, a design strategy for SCL gates is required to meet specifications while keeping power consumption as low as possible. Moreover, design criteria to consciously manage the power-delay tradeoff are required.
Manuscript received January 23, 2002; revised June 21, 2002. This paper was recommended by Associate Editor C.-W. Jen.
M. Alioto was with the Dipartimento Elettrico Elettronico e Sistemistico (DEES), Università di Catania, I-95125 Catania, Italy. He is now with the Dipartimento di Ingegneria dell'Informazione (DII), University of Siena, I-53100 Siena, Italy.
G. Palumbo is with the Dipartimento Elettrico Elettronico e Sistemistico (DEES), Università di Catania, I-95125 Catania, Italy (e-mail: GPalumbo@dees.unict.it).
Digital Object Identifier 10.1109/TCSI.2003.811023
In this paper, a systematic design procedure to size the bias current and transistor aspect ratio to satisfy noise margin (NM) and delay requirements is proposed. To this end, an analytical model of NM and delay is discussed for the SCL inverter both with and without an output buffer. The model affords a deeper understanding of the power-delay tradeoff, and, as the simulations confirm, it is also accurate enough for this purpose. The relationships obtained are then used to size the design parameters. More specifically, first the transistor aspect ratios are expressed as a function of the bias current to meet the NM requirement, and then the bias current is sized according to either power-efficient or high-speed design criteria. Subsequently, the results are extended to the case with the output buffer, providing an optimal size for its transistors and bias current to minimize delay, assuming practical design conditions.
The proposed design strategy gives simple closed-form expressions of design parameters. Moreover, model and design equations simply relate delay and the bias current, thus helping designers to suitably balance speed and power without resorting to tedious simulation iterations.
Modeling of the NM and delay of SCL gates with and without an output buffer is discussed in Sections II and III, respectively. Design strategies for SCL gates with and without an output buffer are presented in Sections IV and V, respectively, where design examples are provided for each criterion considered. Finally, conclusions are reported in Section VI. Moreover, to enhance the readability of this paper, two appendices have been added.
II. SCL GATES MODEL
The SCL inverter, shown in Fig. 1 (where $C_L$ models the load), is made up of an NMOS source-coupled pair whose transistors work in the saturation or cutoff region and thus approximate well the behavior of a voltage-controlled current switch. The gate bias current, $I_{SS}$, is steered to one of the two output branches and converted into a differential output voltage by two PMOS transistors working in the linear region. Defining $\Delta V$ as the voltage drop across M3 (M4) due to a drain current equal to $I_{SS}$, the logic swing of the gate, $V_{SWING}$, becomes equal to $2\Delta V$.
A. Modeling of DC Parameters
As demonstrated in Appendix I using the standard BSIM3v3 MOSFET model [26], under the static condition, PMOS transistors can be suitably approximated by an equivalent linear resistance $R_D$ given by [27] (1a)
where parameter $R_{DS}$ models the source/drain parasitic resistance and depends on the empiric model parameter $R_{DSW}$ as well as the PMOS transistor effective channel width $W_{eff,p}$, with parameter $R_{int}$ being given by (1b) that represents the "intrinsic" resistance of the PMOS transistor in the linear region (i.e., it does not account for the parasitic drain/source resistance). In (1b), the term $\mu_{eff,p}$ represents the effective hole mobility (A1.3a) defined in Appendix I, parameter $L_{eff,p}$ is the PMOS effective channel length, $C_{OX}$ is the oxide capacitance per area, and $V_{T,p}$ is the threshold voltage. Using (1), the logic swing is equal to
$$V_{SWING} = 2\Delta V = 2 R_D I_{SS} \quad (2)$$
Due to symmetry, the circuit logic threshold is equal to zero, and the associated small-signal voltage gain is $A_V = g_{m,n} R_D$, where $g_{m,n}$ is the small-signal transconductance of transistors M1-M2, each carrying a drain current $I_{SS}/2$. Transconductance for short-channel transistors can be evaluated using the well-known expression valid for long-channel devices [32], where the effective electron mobility, $\mu_{eff,n}$, given by (1) must be used instead of the long-channel mobility.
Since -and when the gate is biased around the logic threshold, voltage of transistors M1-M2 is equal to their , which can be underestimated to for the sake of simplicity. Hence, the resulting expression of the voltage gain is
where relationship (2) was substituted. From algebraic manipulations developed in Appendix II, NM can be expressed as in (4), which is proportional to parameter $\Delta V$ (i.e., half the logic swing) and roughly equal to it if $A_V$ is in the order of 4. The dc parameter expressions (2)-(4) were compared to Spectre simulation results for an SCL inverter, using a 0.35-μm CMOS process, whose main parameters are summarized in Table I. Simulations were performed with the supply voltage held fixed, with $I_{SS}$ ranging from 5 to 100 μA, and choosing the transistor aspect ratios to obtain $A_V$ ranging from 2 to 7 and voltage swings from 200 to 900 mV. The maximum error for $V_{SWING}$, $A_V$, and NM is equal to 25%, 26%, and 25%, respectively, and the average error is 15%, 13%, and 8%, respectively.
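As a rough cross-check of the dc parameters, they can be sketched numerically under a long-channel square-law approximation. This is an assumption made here for illustration only, not the BSIM3v3-based derivation of Appendices I and II: the gain is taken as $g_{m,n} R_D$ with $g_{m,n} = \sqrt{\beta I_{SS}}$, and the noise margin as the half swing minus the differential input at which the pair fully switches, $\sqrt{2 I_{SS}/\beta}$. The function name, the lumped parameter `beta`, and all numeric values are hypothetical.

```python
import math

def dc_parameters(r_d, i_ss, beta):
    """Square-law sketch of the SCL inverter dc parameters (assumption, see text).

    r_d  : PMOS equivalent resistance R_D [ohm]
    i_ss : bias current I_SS [A]
    beta : mu * C_ox * W / L of M1-M2 [A/V^2] (hypothetical lumped parameter)
    """
    delta_v = r_d * i_ss                          # half of the logic swing
    v_swing = 2.0 * delta_v                       # logic swing, V_SWING = 2 * R_D * I_SS
    g_m = math.sqrt(beta * i_ss)                  # each transistor carries I_SS / 2
    a_v = g_m * r_d                               # small-signal gain at the logic threshold
    v_full_switch = math.sqrt(2.0 * i_ss / beta)  # input that steers all of I_SS to one branch
    nm = delta_v - v_full_switch                  # same as delta_v * (1 - sqrt(2) / a_v)
    return v_swing, a_v, nm

# Hypothetical operating point: delta_v = 0.7 V, I_SS = 50 uA, target gain of 4.
beta = 4.0**2 * 50e-6 / 0.7**2
v_swing, a_v, nm = dc_parameters(14e3, 50e-6, beta)
```

Under these assumptions the noise margin is a fixed fraction of the half swing once the gain is chosen, which is why the gain and the swing can be settled before the bias current is sized.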
B. Modeling of Propagation Delay
To model the propagation delay, $\tau_{PD}$, of the SCL inverter in Fig. 1, it is useful to observe that NMOS transistors work in the saturation region most of the time, and their source voltage is the same for both input logic values (it is fixed by the NMOS transistor in the ON state). Thus, after linearizing the circuit around the logic threshold, the half-circuit concept applies, since the circuit is symmetrical and the input is differential [27].
In the half circuit obtained, shown in Fig. 2, transistor M1 (M2) is replaced by its small-signal model and transistor M3 (M4) is substituted by its equivalent resistance $R_D$. The parasitic capacitances of PMOS and NMOS transistors (associated with subscripts $p$ and $n$, respectively) affecting circuit switching are $C_{db}$, which represents the drain-bulk junction capacitance, and $C_{gd}$, which schematizes the channel and the overlap contribution between the gate and the drain.
The first-order circuit in Fig. 2 has a time constant $\tau$ that can be evaluated by applying the open-circuit time constant method [28], and whose resulting delay is $0.69\tau$, assuming a step input waveform and after neglecting the high-frequency zero. Hence, the propagation delay of the SCL gate is given by
$$\tau_{PD} = 0.69\,R_D\left(C_{gd,n} + C_{db,n} + C_{gd,p} + C_{db,p} + C_L\right) \quad (5)$$
The NMOS capacitance $C_{gd,n}$ in (5) is evaluated in the saturation region; it is thus almost equal to the overlap capacitance between the gate and the drain. Junction capacitances $C_{db,n}$ and $C_{db,p}$ can be linearized by modifying their value in a zero-bias condition via coefficients $K_j$ according to [29] (see footnote 1). Capacitance $C_{gd,p}$ is equal to the sum of the overlap contribution and the intrinsic contribution associated with the channel charge of the PMOS transistors working in the linear region. In particular, we can adopt the BSIM3v3 capacitance model [26], which expresses the capacitance as the derivative of the charge flowing into the drain with respect to the drain voltage [27] (6), obtained by considering that the gate, source, and bulk voltages are constant and using a parameter evaluated in Appendix I that is slightly greater than unity. It is apparent that the expression valid for long-channel devices (i.e., [30], [31]) is significantly different from (6).
To validate the delay model, the bias current was varied from 5 to 100 μA, the transistor aspect ratios were sized to obtain typical values of $\Delta V$ and $A_V$, and the load capacitance was set to 0 F, 50 fF, 200 fF, and 1 pF. In order to show typical behavior, the predicted and simulated delay is reported in Fig. 3 versus bias current for $C_L$ equal to 50 fF. The error of the model, plotted in Fig. 4, is always within 19%, even in the nonrealistic case of $C_L = 0$ F.
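The first-order delay model of this subsection can be sketched in a few lines: the delay is 0.69 times the time constant formed by the PMOS equivalent resistance and the total capacitance at the output node. The sketch below assumes the resistance is set by the half swing and the bias current, $R_D = \Delta V / I_{SS}$; the function name and every capacitance value are hypothetical placeholders rather than values extracted for the 0.35-μm process.

```python
def scl_inverter_delay(delta_v, i_ss, parasitics, c_load):
    """First-order SCL inverter delay: 0.69 * R_D * (parasitics + load).

    delta_v    : half of the logic swing [V]
    i_ss       : bias current [A]
    parasitics : iterable with C_gd,n, C_db,n, C_gd,p, C_db,p [F]
    c_load     : load capacitance C_L [F]
    """
    r_d = delta_v / i_ss                      # resistance that gives the desired swing
    tau = r_d * (sum(parasitics) + c_load)    # open-circuit time constant of the output node
    return 0.69 * tau

# Hypothetical parasitics (10 fF in total) with a 50 fF load at I_SS = 50 uA.
parasitics = (2e-15, 3e-15, 4e-15, 1e-15)
delay = scl_inverter_delay(0.7, 50e-6, parasitics, 50e-15)
```

In an actual design the parasitic terms are not constants, since they change with the transistor sizes that the chosen bias current imposes.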
C. Evaluation of SCL Gates Input Capacitance
In the previous subsection, the delay was evaluated with an assigned load capacitance $C_L$, which modeled the wiring capacitances and the input capacitances associated with the subsequent gates. Therefore, to evaluate the delay of cascaded gates, one must express the input capacitance of an SCL gate to compute the delay of the previous gate. As highlighted in Section II-B, the source voltage of transistors M1-M2 in Fig. 1 is independent of the input logic value; thus, the input capacitance seen from the gate of M1 (or M2) can be assumed equal to its gate-source capacitance evaluated in the saturation region (7), which deviates from the actual value by at most 15%, and typically by less than 10%.
1 Coefficient $K_j$ is equal to
$$K_j = -\frac{\phi^{m}}{(V_2 - V_1)(1 - m)}\left[(\phi - V_2)^{1-m} - (\phi - V_1)^{1-m}\right]$$
where $\phi$ is the built-in potential across the junction, $m$ is the grading coefficient of the junction, and $V_1$ and $V_2$ are the minimum and maximum direct voltages across the junction, respectively.
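The junction-capacitance linearization mentioned in footnote 1 can be illustrated with the standard voltage-averaged coefficient, i.e., the mean of $(1 - V/\phi)^{-m}$ over the voltage range spanned by the junction. The sketch below assumes this textbook form; the function name and the numeric values (built-in potential, grading coefficient, voltage range) are hypothetical and are not taken from Table I.

```python
def junction_k(phi, m, v1, v2):
    """Average of (1 - V/phi)**(-m) for V between v1 and v2 (direct junction voltages).

    phi    : built-in potential [V]
    m      : grading coefficient (must differ from 1)
    v1, v2 : minimum and maximum direct voltages across the junction [V], v1 < v2 < phi
    """
    scale = -(phi ** m) / ((v2 - v1) * (1.0 - m))
    return scale * ((phi - v2) ** (1.0 - m) - (phi - v1) ** (1.0 - m))

# Hypothetical abrupt junction swinging between 2.5 V of reverse bias and zero bias.
k_j = junction_k(phi=0.9, m=0.5, v1=-2.5, v2=0.0)
# The linearized capacitance is then k_j times the zero-bias junction capacitance.
```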
III. SCL GATES WITH AN OUTPUT-BUFFER MODEL
An output buffer can be added to the basic SCL gate as in Fig. 5 either to improve its driving capability (and its switching speed) or to shift the output common-mode voltage level. The output buffer is implemented with a source follower biased by a dedicated current source.
A. Modeling of DC Parameters
Parameters $V_{SWING}$, $A_V$, and NM of the SCL gate in Fig. 5 can be evaluated by accounting for the dc gain of the source-follower buffer, which is lower than unity because of the body effect:
$$\frac{v_{out}}{v_{in}} = \frac{1}{1 + g_{mb}/g_m} \quad (8)$$
where $g_{mb}$ and $g_m$ are, respectively, the body effect transconductance and the transistor transconductance, and their ratio is almost constant (see footnote 2).
The logic swing and the overall voltage gain are then equal to the product of (8) with (2) and (3), respectively. Therefore, the NM can be evaluated by substituting the resulting logic swing and voltage gain in (4). Simulation results showed that, with respect to the case without the buffer, the logic swing and the voltage gain were invariably reduced by 12%, as accurately predicted by (8) [27].
2 Typically, $g_{mb}/g_m$ ranges from 0.1 to 0.2, and it is equal to 0.13 with the process used, giving a buffer gain of 0.88.
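The 12% reduction quoted above follows directly from the ratio between the body-effect transconductance and the transistor transconductance. A minimal sketch, assuming the usual source-follower gain $1/(1 + g_{mb}/g_m)$ with an ideal current-source load; the function names are hypothetical, while the ratio 0.13 is the value reported in footnote 2.

```python
def buffer_gain(gmb_over_gm):
    """DC gain of a source follower loaded by an ideal current source."""
    return 1.0 / (1.0 + gmb_over_gm)

def miller_factor_cgs(gmb_over_gm):
    """Fraction of C_gs seen at the buffer input (Miller theorem, 1 - gain)."""
    return 1.0 - buffer_gain(gmb_over_gm)

gain = buffer_gain(0.13)             # close to 0.88, i.e., about a 12% reduction
cgs_fraction = miller_factor_cgs(0.13)

# Swing and gain of the buffered gate scale by the same factor.
# The unbuffered swing of 1.4 V and gain of 4 are hypothetical values.
v_swing_buffered = gain * 1.4
a_v_buffered = gain * 4.0
```

The same factor explains why only a small fraction of the gate-source capacitance loads the internal gate: the source of the follower tracks its gate almost one to one.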
B. Modeling of Propagation Delay and Validation
The propagation delay of the SCL gate with the output buffer in Fig. 5 is equal to the sum of the delay of the internal SCL gate and that of the buffer (9). The former is given by (5) setting $C_L$ to zero (see footnote 3), and the latter is evaluated by driving the buffer with the Thevenin equivalent circuit seen at the output of the internal SCL gate, modeled with a voltage source and a resistance (equal to $R_D$). More specifically, the buffer delay is evaluated by analyzing the circuit in Fig. 6, where the linearized buffer circuit is driven by the Thevenin equivalent circuit of the SCL gate and the source-bulk capacitance $C_{sb}$ was neglected with respect to $C_L$. In Fig. 6, capacitances $C_{gd}$ and $C_{gs}$ represent the gate-drain and the gate-source contributions, and are evaluated as small-signal capacitances in the saturation region. For the sake of simplicity, the body effect was taken into account only when evaluating the contribution of capacitance $C_{gs}$ to the input loop (i.e., in parallel with $C_{gd}$), neglecting that in the output loop with respect to the load capacitance. More specifically, in applying the Miller theorem to $C_{gs}$, its contribution to the input loop is obtained by multiplying $C_{gs}$ by one minus the buffer gain in (8), which results in the factor
$$1 - \frac{1}{1 + g_{mb}/g_m} = \frac{g_{mb}/g_m}{1 + g_{mb}/g_m} \quad (10)$$
3 In order to model interconnects between the inverter and the buffer, $C_L$ can be set to its equivalent value.
The transconductance of transistors implementing the buffer can be evaluated [27] as shown in (11) at the bottom of the page.
Despite the circuit in Fig. 6 having two poles and one high-frequency zero, it exhibits a dominant-pole behavior for practical parameter values (detailed analysis shows that the second pole is greater than the first by at least one order of magnitude); thus, we obtain the following expression of the buffer delay:
From (12), we can see that the buffer delay is equal to the sum of two terms: the first is proportional to $R_D$ and models the loading effect of the buffer on the internal SCL gate, while the second is inversely proportional to the buffer transconductance, and hence depends on the buffer driving capability.
The buffer delay model was validated with many simulations. The bias current of the internal SCL gate was varied from 5 to 100 μA, and the bias current of the buffer was set to 5, 20, 50, and 100 μA (the transistor aspect ratios and load capacitance were sized as in the previous subsection).
The observed model error was always lower than 20% (except in impractical cases, in which it reaches 35%), and the average error found was 15%.
IV. DESIGN STRATEGIES OF SCL GATES WITHOUT AN OUTPUT BUFFER
In SCL gates, the transistors' aspect ratio and bias current must be sized to satisfy NM and speed requirements. Using the model developed, in the following Sections IV-A and IV-B, the aspect ratio which meets the specification and the associated delay are expressed as a function of $I_{SS}$; subsequently, the bias current is set for a power-efficient (Section IV-C) or high-speed design (Section IV-D).
A. Transistor Sizing Versus $I_{SS}$
The analysis of SCL gates without a buffer carried out in Section II provides closed-form expressions of their performance that can be used in the design phase. More specifically, they help us to choose proper values for transistor aspect ratios and bias current to satisfy NM and speed requirements.
By inspection of (4), an adequate NM is obtained if and are sufficiently high. However, increasing leads to a logic swing increase that in turn determines speed degradation, thus, for an assigned value of NM, from (3) we have to choose a sufficiently high with respect to . However, from (3), high values of can only be achieved by increasing the NMOS transistor aspect ratio, thus increasing parasitic capacitances and slowing down the circuit. A good compromise for an adequate NM without excessively degrading speed performance is obtained by setting (for instance, by choosing we get NM). In the following, and are assumed to have already been chosen in a preliminary design phase on the basis of the NM requirement.
The PMOS transistor aspect ratio has to be sized to obtain the resistance value $R_D$ that, for a given value of $I_{SS}$, ensures the desired $\Delta V$. The desired resistance is achieved by properly setting the aspect ratio of the PMOS transistors, whose value may be greater or lower than that obtained for the minimum transistor size, depending on $I_{SS}$. To explain this point, we first define $R_{D,min}$ as the PMOS resistance obtained for minimum transistor size, and $I_{HIGH}$ as
$$I_{HIGH} = \frac{\Delta V}{R_{D,min}} \quad (13)$$
that, for example, is equal to 24.6 μA for the 0.35-μm technology described above when typical values of the design parameters are set. If $I_{SS} < I_{HIGH}$ (or, equivalently, the desired $R_D$ is greater than $R_{D,min}$), we have to set $W_p$ to its minimum value and increase $L_p$ according to (14) (see footnote 4), which was derived by inverting (1). If $I_{SS} > I_{HIGH}$ (i.e., $R_D < R_{D,min}$), we have to set $L_p$ to its minimum value and increase $W_p$ as shown in (15) at the bottom of the page.
The NMOS transistor aspect ratio has to be sized to guarantee a voltage gain higher than (or equal to) the desired value $A_V$, for a given $\Delta V$. By inspection of (3), assuming a minimum NMOS aspect ratio, a sufficiently high value of the gain is achieved for bias currents lower than a limit value $I_{LOW}$ (16). For example, using the considered technology and typical design values, $I_{LOW}$ results in 1.45 μA, which is very low with respect to practically used values. For $I_{SS} > I_{LOW}$, (3) shows that the NMOS aspect ratio must be increased to guarantee the assigned value of $A_V$. Hence, we have to set $L_n$ to its minimum value and size $W_n$ according to (17). It is worth noting that $I_{LOW} < I_{HIGH}$ regardless of the process used (indeed, under a rough approximation, the ratio $I_{HIGH}/I_{LOW}$ is largely greater than unity). In summary, the PMOS transistors' sizing depends on whether $I_{SS}$ is lower or higher than $I_{HIGH}$, while the NMOS transistors' sizing depends on whether $I_{SS}$ is lower or greater than $I_{LOW}$, according to the scheme reported in Table II, where three ranges of bias current can be recognized: low current (L) for $I_{SS} < I_{LOW}$, medium current (M) for $I_{LOW} < I_{SS} < I_{HIGH}$, and high current (H) for $I_{SS} > I_{HIGH}$. It is worth noting that in practical cases, logic gates are biased in the M or H range.
4 To simplify the inversion, (14) was obtained by expanding (1) in a Taylor series truncated to the first order, $R_D \approx R_{int}(1 + R_{DS}/R_{int})$ (for the 0.35-μm process described before, this leads to an error of 20%).
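The sizing scheme summarized above (Table II) amounts to a three-way case split on the bias current. The sketch below encodes that split; the two current limits are the example values quoted in the text for the 0.35-μm process (1.45 μA and 24.6 μA), while the function name, the dictionary layout, and the handling of currents exactly at a limit are illustrative assumptions.

```python
I_LOW = 1.45e-6    # [A] below this, a minimum NMOS aspect ratio already gives the gain
I_HIGH = 24.6e-6   # [A] above this, the PMOS needs less resistance than at minimum size

# Which dimension is sized in each range; the other one stays at its minimum.
SIZING_RULES = {
    "L": {"NMOS": "W and L minimum", "PMOS": "W minimum, L from (14)"},
    "M": {"NMOS": "L minimum, W from (17)", "PMOS": "W minimum, L from (14)"},
    "H": {"NMOS": "L minimum, W from (17)", "PMOS": "L minimum, W from (15)"},
}

def sizing_region(i_ss, i_low=I_LOW, i_high=I_HIGH):
    """Return 'L', 'M', or 'H' for a bias current i_ss [A]."""
    if i_ss < i_low:
        return "L"
    if i_ss < i_high:
        return "M"
    return "H"

region = sizing_region(10.7e-6)
rules = SIZING_RULES[region]
```

A bias current of 10.7 μA falls in the medium range, where the NMOS width and the PMOS length are the quantities to be sized.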
B. Expression of Delay After Transistor Sizing Versus $I_{SS}$
The value of parasitic capacitances in the delay expression (5) is affected by the transistor sizing, whose dependence on $I_{SS}$ changes according to the range (L, M, or H) which the bias current belongs to.
For now, consider an SCL gate biased in region M, where, according to Table II, $W_n$ and $L_p$ are given by (17) and (14), respectively, whilst $L_n$ and $W_p$ are minimum. Substituting these transistor sizes, the expressions of the transistor capacitances have the same dependence on bias current, that can be expressed in the following compact way:
$$C_{xy}^{(M)} = \frac{a_{xy}^{(M)}}{I_{SS}} + b_{xy}^{(M)} + c_{xy}^{(M)} I_{SS} \quad (18)$$
where $x$ and $y$ are the terminals that the capacitance considered refers to, and the superscript refers to the biasing region M. For instance, $C_{gd,n}^{(M)}$ represents the NMOS gate-to-drain capacitance expression for $I_{SS}$ ranging in the interval M, and its dependence on $I_{SS}$ is described by the associated coefficients $a_{gd,n}^{(M)}$, $b_{gd,n}^{(M)}$, and $c_{gd,n}^{(M)}$. The analytical expressions of the three coefficients for all the transistor capacitances are explicitly reported in Table III (those not reported are equal to zero). It is worth noting that they depend only on the previously assigned values of $A_V$ and $\Delta V$, as well as the process used, hence they are constant in the design.
Substituting (18) for all the capacitances into (5) using Table III, defining the coefficients $a$, $b$, and $c$ as the sums over all the transistor capacitances of the corresponding coefficients in (18) (19), and using $R_D = \Delta V / I_{SS}$, we get the following expression of propagation delay of the SCL gate versus $I_{SS}$:
$$\tau_{PD} = 0.69\,\Delta V\left(\frac{a}{I_{SS}^2} + \frac{b + C_L}{I_{SS}} + c\right) \quad (20)$$
Although (20) was derived assuming the gate was biased in region M, it can also be used in the other regions. Indeed, according to Table II, for low bias currents (both in range L and at the beginning of region M), the dominant contribution to delay is from the PMOS capacitance $C_{gd,p}$ due to the high value of $L_p$, which has the same expression (14) in both ranges L and M. Therefore, the delay equation in range M can be extended to region L without significant error. Analogously, for high bias currents (both in range H and at the end of range M), the NMOS capacitances are dominant owing to the high value of $W_n$, which has a unique expression (17) in both ranges M and H, allowing the model valid in range M to be extended to interval H. From an analytical point of view, the extension of (20) to other regions can be understood by evaluating delay in regions L and H, after applying the same procedure as for region M. It can be found that (18) still holds, and the resulting values of the three coefficients associated with each capacitance (reported in Tables IV and V, for regions L and H) justify the extension of (20), as numerical results in Table VI for the 0.35-μm CMOS process confirm.
In summary, the SCL gate delay can be approximately modeled with (20) regardless of the bias current, although this relationship has been derived assuming $I_{SS}$ in region M. As an example, for the 0.35-μm CMOS process used, this approximation leads to the error in Fig. 7 (plotted versus $I_{SS}$ in logarithmic scale in the worst case $C_L = 0$) with respect to expressions rigorously derived in each region. By inspection of Fig. 7, it is apparent that the error is always lower than 14%, and it can be shown that for more realistic load conditions, it is in the order of a few percentage points.
C. Sizing of $I_{SS}$ for Power-Efficient Design
In Section IV-A, transistor aspect ratios have been expressed on the basis of the NM requirement, with bias current being the only unknown design parameter. At this point, different criteria can be used to size , depending on the specific application. In this subsection, we discuss bias current sizing to minimize the power-delay product (PDP) that quantitatively expresses the efficiency of the tradeoff between delay and power dissipation [29] to optimally balance delay and power consumption.
Since SCL gate power consumption is dominated by the static contribution $V_{DD} I_{SS}$, the expression of PDP obtained from (20) is
$$PDP = V_{DD} I_{SS}\,\tau_{PD} = 0.69\,V_{DD}\,\Delta V\left(\frac{a}{I_{SS}} + b + C_L + c\,I_{SS}\right) \quad (21)$$
where a given load capacitance $C_L$ has been considered, and $a$, $b$, and $c$ are the coefficients of the delay expression (20). Equation (21) is minimum for the bias current given by
$$I_{SS} = \sqrt{\frac{a}{c}} \quad (22)$$
It is worth noting that the optimum current (22) does not depend on the load, but only on coefficients $a$ and $c$, which in turn depend only on the NM specification and the process. For example, using the 0.35-μm technology described above and the typical design values, (22) results in 10.7 μA (the evaluated aspect ratios of NMOS and PMOS transistors are 4.2/0.3 and 0.6/1.8, respectively). Assuming a load capacitance of 50 fF, predicted delay (20)
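The load independence of the PDP-optimal bias current can be checked numerically. The sketch below assumes that the delay has the form $0.69\,\Delta V\,(a/I_{SS}^2 + (b + C_L)/I_{SS} + c)$ and that power is the static term $V_{DD} I_{SS}$; the supply voltage, the half swing, and the three coefficients are hypothetical round numbers chosen only to make the script run, not the Table VI values.

```python
import math

V_DD = 3.3        # [V] hypothetical supply voltage
DELTA_V = 0.7     # [V] hypothetical half swing
A_COEF = 5e-20    # [F*A] hypothetical coefficient of the 1/I_SS^2 term
B_COEF = 5e-15    # [F]   hypothetical coefficient of the 1/I_SS term
C_COEF = 5e-10    # [F/A] hypothetical constant term

def delay(i_ss, c_load):
    """Assumed delay model versus bias current."""
    return 0.69 * DELTA_V * (A_COEF / i_ss**2 + (B_COEF + c_load) / i_ss + C_COEF)

def pdp(i_ss, c_load):
    """Power-delay product with static power V_DD * I_SS."""
    return V_DD * i_ss * delay(i_ss, c_load)

i_opt = math.sqrt(A_COEF / C_COEF)   # stationary point of pdp with respect to i_ss

# Coarse sweep from 1 uA to 100 uA in 0.1 uA steps, for two different loads.
sweep = [1e-6 * (1 + 0.1 * k) for k in range(991)]
best_50f = min(sweep, key=lambda i: pdp(i, 50e-15))
best_1p = min(sweep, key=lambda i: pdp(i, 1e-12))
```

With these placeholder coefficients both sweeps return the same current, since the load only adds a term to the PDP that does not depend on the bias current.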
D. Sizing of $I_{SS}$ for High-Speed Design
When high performance is required, two cases should be considered during the design of SCL gates. In the former, a delay constraint is given from considerations at gate level, and the value of bias current that allows us to satisfy this specification is obtained by solving (20) for $I_{SS}$ (obviously, no solution exists if the required delay is below the asymptotic delay reached for $I_{SS} \to \infty$). In the latter, no delay constraint is given and the delay has to be kept as close to the asymptotic value (obtained for $I_{SS} \to \infty$) as possible, in order to exploit the speed potential of the circuit and the process used, but keeping the bias current within a reasonable range.
From (20), for a given load capacitance $C_L$, it is apparent that an increase in $I_{SS}$ leads to a significant decrease in $\tau_{PD}$ only when the term of (20) that is inversely proportional to $I_{SS}$ exceeds the constant term $c$, i.e.,
$$c < \frac{b + C_L}{I_{SS}} \quad (23)$$
and in the case of strict equality, the delay is only twice the minimum achievable. Therefore, a reasonable choice to get almost the best performance without wasting too much current is obtained when strict equality is satisfied, i.e., when $I_{SS}$ is equal to
$$I_{SS} = \frac{b + C_L}{c} \quad (24)$$
that is much greater than the PDP-optimal current, especially for high $C_L$, as shown by comparing (24) with (22). This means that the criterion proposed leads to a high speed at the cost of a significantly worse PDP than the minimum achievable. For example, under the same conditions as in the previous subsection and $C_L = 50$ fF, (24) results in the high value of 110 μA, which leads to a PDP greater than the value of 50 fJ obtained using (22). The aspect ratios of NMOS and PMOS transistors are 41/0.3 and 2.4/0.3, respectively, while the predicted and simulated delays are 226 and 196.5 ps, respectively.
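Both high-speed cases discussed in this subsection reduce to simple algebra on the delay expression. The sketch below again assumes the form $0.69\,\Delta V\,(a/I_{SS}^2 + (b + C_L)/I_{SS} + c)$: the first helper returns the current at which the load-dependent term equals the constant term, and the second solves the resulting quadratic for a target delay, returning `None` when the target is below the asymptotic delay. All names and numeric coefficients are hypothetical placeholders.

```python
import math

DELTA_V = 0.7     # [V] hypothetical half swing
A_COEF = 5e-20    # [F*A] hypothetical coefficient of the 1/I_SS^2 term
B_COEF = 5e-15    # [F]   hypothetical coefficient of the 1/I_SS term
C_COEF = 5e-10    # [F/A] hypothetical constant term

def delay(i_ss, c_load):
    """Assumed delay model versus bias current."""
    return 0.69 * DELTA_V * (A_COEF / i_ss**2 + (B_COEF + c_load) / i_ss + C_COEF)

def high_speed_current(c_load):
    """Current at which (b + C_L) / I_SS equals c, so delay is about twice the asymptote."""
    return (B_COEF + c_load) / C_COEF

def current_for_delay(target, c_load):
    """Smallest positive I_SS meeting a delay target, or None if unreachable."""
    k = target / (0.69 * DELTA_V) - C_COEF   # must be positive: target above the asymptote
    if k <= 0.0:
        return None
    s = B_COEF + c_load
    # Positive root of k * I^2 - s * I - a = 0.
    return (s + math.sqrt(s * s + 4.0 * k * A_COEF)) / (2.0 * k)

asymptote = 0.69 * DELTA_V * C_COEF
i_hs = high_speed_current(50e-15)
ratio = delay(i_hs, 50e-15) / asymptote
i_req = current_for_delay(3.0 * asymptote, 50e-15)
```

The ratio is slightly above two because the $1/I_{SS}^2$ term, although small at this current, is not exactly zero.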
E. Remarks on Technology Scaling
TABLE V: CAPACITANCE COEFFICIENTS IN REGION H
TABLE VI: VALUES OF CAPACITANCE COEFFICIENTS
To evaluate the effect of scaling on power consumption and propagation delay for the design strategies proposed, let us consider the delay coefficients appearing in (22). From Table III, considering an equal scaling of NMOS and PMOS parameters, the following relationships are obtained:
where is the unity length junction capacitance between the source/drain and the substrate (including area and sidewall contributions) and is the overlap length, that is assumed to scale like the minimum length.
Although the ratio between the junction capacitance and the channel capacitance, , as well as the ratio between the threshold voltage and the power supply, increases with technology scaling [34] - [38] , to get a rough estimation we can neglect this dependence and simplify (25a) and (26a) into (25b) (26b) By using (25) and (26) in (22), the bias current that minimizes the PDP results (27) By scaling the channel length by a factor , the oxide capacitance and the power-supply voltage both scale by a factor , while aspect ratio slightly increase [34] - [38] , thus determining an increase in the optimum bias current by a factor higher than . Regarding the corresponding speed, by neglecting term with respect , from (20) we get (28a) where (7) has been used to evaluate the scaling factor of the load capacitance and represents the fan-out of the gate. From (28a), the propagation delay results equal to the sum of two terms. However, by considering that and have opposite scaling factors and neglecting the scaling factor of the aspect ratio, we can approximate (28a) by (28b) Thus, the propagation delay scales as the square of the minimum length, which is much higher than propagation delay scaling forecast for the traditional CMOS gates. Of course, consideration on PDP can be simply achieved by multiplying (27) and (28) .
Regarding the design strategy for high speed, we can simply derive that the propagation delay is only proportional to coefficient , hence (28) still holds. Moreover, from (24) the bias current results (29) where parameter represents the scaling factor of the part in the bracket which increases in a complex way as reducing the minimum channel length.
V. DESIGN STRATEGIES OF SCL GATES WITH OUTPUT BUFFER
As clarified in Section III-A, the relationships between the NMOS and PMOS aspect ratios and the bias current $I_{SS}$ obtained in Section IV are still valid in the case with an output buffer. Moreover, the delay of the internal SCL gate in (9) can be approximated by (20), where $C_L$ must be substituted with the interconnection capacitance, if any, and is otherwise set to zero. Compared to the simple SCL gate, there are two design parameters in addition to $I_{SS}$: the bias current of the buffer and the aspect ratio of its transistors, whose appropriate sizing depends on whether a buffer is added to intentionally introduce a level shifting (Section V-A) or to improve speed performance (Section V-B). In both cases, design criteria are provided to split the assigned current allowed per gate into the contributions of the internal gate and of the buffer so as to maximize speed.
A. Buffer Used as a Level Shifter
When a buffer is used to implement a level shifter, it reduces the common-mode output voltage of an SCL gate by the voltage of its transistor, equal to (30) By inverting this expression, we determine the value of the ratio from the assigned shift voltage, , resulting in (31) where has been assumed to be minimum, as occurs for practical values of . Therefore, design parameters are only the bias currents and . In particular, the current per gate , must be split into and , according to a factor that defines the amount of the total bias current used in the internal SCL (accordingly, ). For a given , factor must be optimally chosen to minimize delay, and extensive numerical analysis shows that in practical cases (typically ). Hence, is so small ( ) that terms proportional to are negligible with respect to those proportional to . Using (31), substituting and , as well as substituting (20) in (9) after neglecting its terms proportional to , leads to the following expression of derivative , shown in (32) at the bottom of the page, where has been assumed and (31) has been used. Equating (32) to zero and solving for , its value that minimizes is approximately given by (33) which shows that, by increasing the load capacitance or the gate current, the fraction of used for decreases as and . Extensive verification of (33) shows that it agrees well with optimum evaluated numerically from (32), with their difference being always lower than 25% of the latter, and typically lower than 10%. The effect of this difference on delay is reduced typically to a few percentage points since the minimum of with respect to is rather flat, as can be observed in Fig. 8 , which shows the delay of the SCL gate versus assuming pF, A, V under the conditions explained in the previous (32) design examples. In this case, (33) provides , which differs from the exact minimum 0.148 by 7%, while the optimum delay (9 ns) is overestimated only by 0.15%.
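The inversion of the level-shift expression can be illustrated with a long-channel square-law model, in which the gate-source voltage of the buffer transistor is its threshold voltage plus the overdrive needed to carry the buffer bias current. This is an assumption made here for illustration (body effect and short-channel mobility degradation are ignored, unlike the model of Appendix I); the function names and all numeric values are hypothetical.

```python
import math

def shift_voltage(i_buf, mu_cox, w_over_l, v_t):
    """Square-law V_GS of the buffer transistor carrying i_buf [A]."""
    return v_t + math.sqrt(2.0 * i_buf / (mu_cox * w_over_l))

def aspect_ratio_for_shift(i_buf, mu_cox, v_shift, v_t):
    """Aspect ratio W/L that gives the assigned shift voltage v_shift > v_t."""
    return 2.0 * i_buf / (mu_cox * (v_shift - v_t) ** 2)

# Hypothetical numbers: 100 uA buffer current, mu*C_ox = 150 uA/V^2,
# threshold voltage 0.55 V, and an assigned level shift of 0.9 V.
w_over_l = aspect_ratio_for_shift(100e-6, 150e-6, 0.9, 0.55)
check = shift_voltage(100e-6, 150e-6, w_over_l, 0.55)
```

Once the shift voltage is assigned, the aspect ratio is tied to the buffer current, which is why only the two bias currents remain as free design parameters in this case.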
B. Buffer Used to Improve Speed
In some cases, for a given total gate current , the choice of an SCL gate with a buffer (properly splitting into and ) leads to a better speed performance than an SCL gate without a buffer (biased with ). In particular, this occurs when the available gate current is low and/or the load capacitance is large, as it can be seen by comparing (20) and (9) .
When a buffer is used to improve speed, design parameters , , and must be evaluated. More specifically, design criteria are needed to split the assigned current per gate into and , and to size the buffer transistor channel width (differently from Section V-A, no constraint between and exists). To this end, let us rewrite the expression of to make its dependence on more explicit (34) This has been obtained by substituting (20) into (9), and expressing capacitances , as the product of their value at minimum (named , ) and the factor (35) that represents the channel width normalized to the minimum allowed. Moreover, from (11) , results in its value at minimum , , multiplied by (minimum channel length has been assumed, as before).
Since (34) has to be optimized both for and , for the sake of simplicity, first assume (i.e., minimum ) and evaluate the optimum ratio as in the previous subsection, and then optimize itself.
First, we minimize delay (34) with respect to the ratio setting , and, as in Section V-A, assuming as well as neglecting terms proportional to in (34) . Moreover, since the addition of a buffer makes sense only for a high , we assume it to be much greater than (frequently satisfied in practical cases). Therefore, the derivative of (34) with respect to becomes (36) , shown at the bottom of the page, that, equated to zero, gives the following value of that minimizes : From (37) , the fraction of used in decreases as when the load capacitance increases, while it decreases as when the gate current increases. Verification of (37) shows that, in the cases of interest (i.e., when the SCL gate with a buffer is faster than the one without a buffer) it agrees well with an optimum , the difference always being below 30% of the latter (typically lower than 15%), and the difference in associated delay of only a few percentage points. For example, Fig. 9 shows the delay of SCL gate versus assuming pF, A, under conditions explained in the previous design examples. In this case, (37) provides , while the exact value is 0.17, leading to about the same delay of 11.4 ns.
Once factor is optimized, further optimization is possible by properly setting the normalized transistor channel width, , to minimize (34) after substituting and . Intuitively, the contribution of the load capacitance to delay (34) can be reduced by increasing , that in turn determines an increase in the terms associated with and . Hence, an optimum value of exists, and can be found by differentiating (34) for to obtain (38) shown at the bottom of the page. Setting (38) to zero and solving for leads to its optimum value, , that minimizes delay as shown in (39) at the bottom of the page.
In the cases of interest, (39) exhibits an error lower than 25% with respect to numerical evaluation, and the delay is within 9% of numerically minimized results due to the flat minimum, as can be noted in Fig. 10, which shows the delay of an SCL gate versus assuming pF, A, under conditions explained in the previous design examples. Equation (39) gives , while the exact value is 63, and the associated delay values are 2.65 and 2.74 ns, respectively, differing by 3.3%. It is worth noting that the optimization of the buffer transistors' size has led to a delay reduction by a factor of four, compared to the case with the minimum devices dealt with above in this subsection.
In general, (39) provides values of the order of several tens, which are not always acceptable due to the area increase. However, the minimum of with respect to is very flat, as shown in Fig. 10, therefore can be reduced without a significant delay increase. Typically, reducing by a factor of two leads to a delay increase of 10%, whilst with a factor of four the delay increase is about 30%. To accurately estimate the delay increase for a given reduction of with respect to (39), it is preferable to resort to (34).
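The flat-minimum argument can be illustrated with a minimal numerical sketch. It does not implement (34): it assumes a hypothetical toy delay of the form t0 + a*w + b/w, which only mirrors the qualitative structure described above (a load-capacitance term that shrinks as the normalized buffer width w grows, and parasitic terms that grow with w). The names `delay`, `w_opt_closed_form`, `w_opt_numeric`, and `delay_penalty`, as well as the coefficients `t0`, `a`, and `b`, are illustrative assumptions, so the penalties obtained are not those quoted in the text.

```python
import math


def delay(w, t0, a, b):
    """Toy delay (assumed form, not Eq. (34)): fixed + width-proportional + load/width terms."""
    return t0 + a * w + b / w


def w_opt_closed_form(a, b):
    """Width that zeroes d(delay)/dw = a - b / w**2 for the toy model."""
    return math.sqrt(b / a)


def w_opt_numeric(t0, a, b, lo=1.0, hi=1000.0, tol=1e-6):
    """Golden-section search; valid because the toy delay is unimodal for a, b > 0."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if delay(c, t0, a, b) < delay(d, t0, a, b):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return 0.5 * (lo + hi)


def delay_penalty(k, t0, a, b):
    """Relative delay increase when the width is reduced to w_opt / k."""
    w_opt = w_opt_closed_form(a, b)
    return delay(w_opt / k, t0, a, b) / delay(w_opt, t0, a, b) - 1.0


if __name__ == "__main__":
    t0, a, b = 1.0, 0.01, 36.0  # hypothetical coefficients, arbitrary units
    print("closed-form optimum:", w_opt_closed_form(a, b))
    print("numerical optimum:  ", w_opt_numeric(t0, a, b))
    for k in (2, 4):
        print(f"width / {k}: delay penalty = {delay_penalty(k, t0, a, b):.1%}")
```

In this sketch, the larger the width-independent term `t0` is relative to the other two, the flatter the minimum becomes, which is the kind of behavior that makes a reduced buffer width acceptable in practice. For an actual design, the penalty should be evaluated with (34), as recommended above.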
VI. CONCLUSION
In this paper, a strategy for the design of SCL gates has been proposed. The strategy is based on simple analytical models for the delay and NM of SCL gates, both with and without an output buffer. The models have been validated by extensive Spectre simulations using a 0.35-μm CMOS process, and the results confirm that the models are accurate enough for design purposes.
Starting from the model equations, closed-form design equations were derived both for the transistor aspect ratios and the bias current to meet design goals concerning NM and delay. To this end, the transistor aspect ratios were first expressed as a function of bias current for an assigned value of NM, according to the practical criteria introduced. Subsequently, the bias current was sized in practical design conditions, assuming that either a high-performance or a power-efficient design is required. The design criteria developed have also been extended to the case of SCL gates with an output buffer, providing guidelines for optimally distributing the available bias current per gate between the internal stage and the buffer, as well as for sizing the buffer transistor aspect ratio.
Due to the static power dissipation of SCL gates, the power-delay tradeoff (measured by the PDP) was analytically evaluated, with the resulting design equations being quite simple, thus providing the designer with the necessary understanding of the mutual dependence between delay and power consumption. Therefore, the criteria proposed allow the designer to consciously manage this tradeoff from the early design phases, avoiding tedious simulation iterations.
APPENDIX I
According to the BSIM3v3 MOSFET model, the expression of the drain current valid for both NMOS and PMOS transistors working in the linear region is [26] (A1.1) where models source-drain parasitic resistances, Taylor expansion has been applied assuming [27] , and parameter is given by (A1.2) whose parameters are reported in the following with the subscript if referred to a PMOS transistor, and subscript in the case of an NMOS transistor. In (A1.2), parameters and are the effective channel width and length, is the oxide capacitance per area, is the threshold voltage, and are the gate-source and drain-source voltages, and is the effective carrier mobility defined as shown in (A1.3a) at the bottom of the page, where is the critical electric field at which carrier velocity becomes saturated, , , and are model parameters, and is oxide thickness. It is worth noting that terms including model the mobility degradation due to the vertical electric field in the MOS transistor, while those including model the carrier velocity saturation due to the lateral electric field. Moreover, in (A1.2) parameter is slightly greater than unity and is given by (A1.4) which depends on , and various other BSIM3v3 model parameters. Function can be simplified by considering its maximum value, , setting to its minimum value and maximizing the resulting function with respect to , with straightforward calculations (as an example, for the PMOS transistor in a 0.35-μm CMOS process and V we get ). It is worth noting that, when the gate-source voltage is small enough to neglect mobility degradation, such as for M1 and M2, (A1.3a) simplifies into (A1.3b) whose value depends on the model parameter , the effective channel length and the bias value of .
To evaluate the PMOS equivalent resistance defined as , consider (A1.2) and observing that is small for the PMOS transistors, neglect terms and both in (A1.2) and (A1. The NM is defined as (or, equivalently, as , due to its symmetry with respect to zero), where and are the input voltage values such that , while and are the corresponding output voltages (i.e., and ), as shown in Fig. 11 
