A novel high-performance time-balanced wide fan-in CMOS circuit  by Sharroush, Sherif M.
Alexandria Engineering Journal (2016) 55, 2565–2582
ORIGINAL ARTICLE
E-mail address: smsharroush@gmail.com
Peer review under responsibility of Faculty of Engineering, Alexandria
University.
http://dx.doi.org/10.1016/j.aej.2016.06.013
1110-0168 © 2016 Faculty of Engineering, Alexandria University. Production and hosting by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Sherif M. Sharroush
Dept. of Electrical Engineering, Fac. of Engineering, Port Said, Port Said Univ., Egypt
Received 15 November 2015; revised 23 March 2016; accepted 4 June 2016
Available online 12 July 2016
KEYWORDS
Area;
CMOS technology;
Power consumption;
Process variations;
Time delay
Abstract There is no doubt that static complementary CMOS logic is one of the most dominant
logic-circuit families available. However, CMOS circuits with wide fan-in suffer from a relatively
poor performance that is apparent in increased area, large time delay, and large power consump-
tion. This is typically the case with CMOS circuits containing NMOS or PMOS stacks (i.e. branches
containing a relatively large number of serially connected transistors). In this paper, a novel circuit
that depends on applying the input signals in the form of pulses with a certain width will be pre-
sented as an alternative to stack circuits. The proposed scheme will be investigated quantitatively
with the effect of the pulse width on the performance of the proposed scheme taken into account.
The proposed scheme will be compared with the conventional CMOS logic from the points of view
of area, high-to-low propagation delay, and average power consumption. The parameter variations
and second-order effects will also be taken into account. Simulation results verify the correct oper-
ation of the proposed scheme and that the percentage reduction in the average propagation delay is
15.8% and 61.25% in cases of four and eight inputs, respectively, adopting the 45 nm CMOS tech-
nology with VDD = 1 V.
1. Introduction
There are various and well known circuit families that can be
used in realizing logic gates: the static complementary CMOS,
the pass-transistor logic, the dynamic MOS logic, the dynamic-
cascode voltage swing logic, the pseudo-NMOS logic, and the
current-mode logic [1]. Among the various CMOS logic-circuit
families, the static complementary CMOS proves the most
important and robust one in realizing logic gates. However,
its performance, like the other families, degrades with increasing the number of the inputs [2]. The word “performance” here
includes the area, the low-to-high and the high-to-low propa-
gation delays, the power consumption, and the noise margin.
So, ingenious circuit techniques must be adopted in order to
improve the performance of such circuits or alternative circuits
can instead be used.
Specifically, CMOS circuits such as NAND or NOR gates
with wide fan-in contain branches with a large number of seri-
ally connected NMOS or PMOS transistors, respectively. As a
result, their performance degrades. In this paper, a novel circuit
that is based on applying the input signals in the form of pulses
with a suitable width will be proposed as a wide fan-in NAND
gate, hence the name ‘‘time balanced.” The proposed circuit has
a smaller area, time delay, and power-delay product compared
to the conventional CMOS logic when the number of the inputs
exceeds a certain limit to be determined in this paper.
The remainder of this paper is organized as follows: Sec-
tion 2 provides a quick survey of the previous solutions to
the problem at hand (the degradation of the performance of
the logic families with increasing the fan-in). The proposed
solution is presented qualitatively in Section 3 with the quan-
titative analysis presented in Section 4. The impacts of
second-order effects and process variations on the proposed
scheme are discussed in Sections 5 and 6, respectively. The pro-
posed scheme is verified by simulation in Section 7. Finally, the
paper is concluded in Section 8.
2. Previous work
The previous work related to the problem at hand can be classi-
fied into three fronts: reordering, resizing, and synthesis.
Reordering techniques can be achieved on the input and the
transistor levels in order to reduce the power consumption in
CMOS circuits. The reordering of the inputs does not modify
the circuit schematic of the gate; however, it merely changes
the order of the inputs. On the other hand, the reordering of
the transistors modifies both the order of the inputs and the
order at which the transistors are serially connected. Lowering
the power consumption by reordering schemes, however, is usually
associated with a delay penalty. This is because reordering
usually moves the late-arriving inputs farther
away from the output of the gate, thus resulting in an increase in
the delay. In [3], Prasad and Roy proposed an algorithm for
reordering the multi-pass transistors. In [4], an algorithm that
includes transitions at the internal nodes of a complex CMOS
gate to derive the optimal configuration was presented.
On the other side, some forms of transistor resizing tech-
niques depend on resizing the transistors in the gate such that
minimum power consumption is achieved, however, with no
increase in the delay. These techniques depend on evaluating
the delay in the several paths of the circuit and determining
those paths with delays that are lower than that in the critical
path (i.e. paths with a positive slack), then reducing the sizes of
the transistors in these paths. The process repeats until either
the slack becomes equal to zero or the transistors reach the
minimum possible size.
Added to the resizing techniques is the following: The low-
ermost transistor is fabricated with the largest size with the
aspect ratios of the upper NMOS transistors decreasing as
we move away from bottom to top. Several sizing schemes
including the linear and exponential sizing or a combination
of the two [5] can be adopted. The effect of sizing on the per-
formance of CMOS circuits was investigated in [6–11]. A com-
bination of both the input reordering and the transistor
resizing approaches was presented by Tan in [12]. Finally,
some synthesis techniques that depend on creating novel cir-
cuits that have the same output as the conventional stack
but with improved performance were presented in [13,14].
The scheme presented in this paper lies in the third category.
In the next section, the proposed scheme will be presented.
3. The proposed scheme
Refer to Fig. 1(a) for illustration of the proposed n-input
NAND gate. First, the dis signal will be activated to turn on MN and thus discharge any remnant charge on CL. Then,
the dis signal will be deactivated and the inputs, A1, A2, . . .,
An, will be applied, thus turning on the related NMOS transis-
tors, MN1, MN2, . . ., and MNn. Assume that the input signals
are in the form of pulses with a certain width, T. The charging
current of CL depends on the number of the activated inputs,
and the level to which CL settles is determined by both
the number of the activated inputs and the pulse width, T.
The main idea is simply as follows: T is chosen such that the
voltage across CL in case of n − 1 activated inputs, Vn−1, will
be smaller than the threshold voltage of the inverter, Vthinv,
and thus the inverter output will be at logic ‘‘1.” On the other
hand, if all the n inputs are activated, then the voltage across
CL, Vn, will be larger than Vthinv and thus the inverter output
will be at logic ‘‘0.” Of course, if the number of the activated
inputs is smaller than n − 1, the voltage across CL, VCL, will
be smaller than Vthinv and thus the inverter output will be at
logic ‘‘1” as it must be. A buffer consisting of two cascaded
inverters can be used at the output to obtain a full-swing out-
put. Note that if the power-supply voltage feeding the circuit
was applied as a pulse with a certain width and the NMOS
transistors were kept always activated, the capacitor, CL,
would discharge to ground upon turning off this pulse through
the activated NMOS transistors.
Alternatively, a sense amplifier can be used to latch the out-
put data by comparing the voltage across CL by a reference
voltage, Vref, which is ideally the arithmetic average of Vn
and Vn1, i.e.
Vref ¼ Vn þ Vn1
2
: ð1Þ
Refer to Fig. 1(b) for the sense-amplifier based scheme. It
must be noted that the reference voltage, Vref, must be activated
after the deactivation of the input signals. This is to
ensure obtaining a correct output. Were it not for this synchronization,
the output would be at logic “0” in all cases
because initially the voltage across CL is smaller than Vref. If
the number of the inputs is large such that the worst-case voltage
difference across CL, Vn − Vn−1, is smaller than the acceptable
limit for reliable operation, then the scheme of Fig. 1(a)
can be extended to any number of inputs using an OR gate
as shown in Fig. 1(c). The OR gate can be implemented using
a static CMOS NOR gate and an inverter. According to the
circuit of Fig. 1(c), there is no need to use a buffer at the out-
put of each n/2 inputs’ stage as the static CMOS inverter after
the two-input NOR gate provides a rail-to-rail output swing.
The corresponding alternatives to PMOS stacks are shown in
Fig. 2(a), (b), and (c), respectively.
4. Circuit design issues
In this section, the first version of the proposed scheme shown
in Fig. 1(a) will be investigated quantitatively assuming that
two cascaded inverters are added at the output to obtain a
rail-to-rail swing. Obviously, the robustness of the proposed
scheme depends to a large extent on the difference between
the two values of the voltage, VCL, in cases of all-activated
and all-except-one activated inputs. This difference represents
the smallest difference and thus this analysis represents the
worst-case scenario. This difference also represents the valid
range within which the threshold voltage of the first inverter
can be chosen. An expression for this range will be derived. An
expression for the optimum value of the pulse width, Topt, (at
which the voltage difference is maximum) will also be derived
along with the corresponding maximum value of the voltage
difference. In addition, the allowable range of T will also be
discussed. The proposed scheme will be compared with the
conventional CMOS logic from the points of view of area,
propagation delay, power consumption, and power-delay
product.
Figure 1 (a and b) The two proposed alternatives to NMOS stacks. (c) A circuit represents the extension of the scheme in (a) for any number of inputs using an OR gate.
Figure 2 (a and b) The two proposed alternatives to PMOS stacks. (c) A circuit represents the extension of the scheme in (a) for any number of inputs using a NOR gate.
In the following analysis, iD, vGS, vDS, and vBS represent the
drain current, the gate-to-source voltage, the drain-to-source
voltage, and the body-to-source voltage, respectively. The
small letters represent the variables as functions of time while
the capital ones represent certain values for these variables.
Capital subscripts are used in the two cases. Unless otherwise
specified, the following values will be adopted [15,16]: W
(channel width) = L (channel length) = 45 nm, n (number of
inputs) = 8, VDD (power-supply voltage) = 1 V, Vthn0 (threshold
voltage of NMOS transistors at VBS = VDS = 0 V)
= 0.25 V, Vthp0 (threshold voltage of PMOS transistors at
VSB = VSD = 0 V) = 0.32 V, k′n (process-transconductance
parameter of NMOS devices) = 638 μA/V²,
k′p (process-transconductance parameter of PMOS
devices) = 249 μA/V², γ (body-effect coefficient) = 0.4, λn
(channel-length modulation effect parameter of NMOS
devices) = λp (channel-length modulation effect parameter of
PMOS devices) = 0.1 V⁻¹, α (a factor representing the short-channel
effects) = 1.3, αs (switching activity) = 1, Cox (gate-oxide
capacitance per unit area) = 0.0172 F/m², T (width of
input pulses) = 15 ps, vsat (free-electron saturation velocity)
= 10⁵ m/s [17], and fs (frequency of switching) = 1 GHz.
Adopting the convention that the PMOS transistor has
twice the area of the NMOS one to compensate for the mobil-
ity difference and assuming that the parasitic capacitance at
each terminal is proportional to the aspect ratio of the associ-
ated transistors [18], then CL can be expressed as (4 + n)C,
where C is the parasitic capacitance associated with each termi-
nal of a minimum-sized device and will be taken equal to 1 fF.
In the following analysis, all the NMOS devices will be
assumed minimum-sized while the PMOS ones have an aspect
ratio equal to 2 unless otherwise specified.
4.1. Allowable range of Vthinv
In the following analysis, the short-channel MOSFET model
will be adopted. According to this model, the iD–vGS–vDS
relationship in the saturation region is [17]
$$i_D = W C_{ox} v_{sat} (v_{GS} - V_{thn})(1 + \lambda v_{DS}). \tag{2}$$
The saturation region according to this model occurs as long
as vDS ≥ vDSsat, with
$$v_{DSsat} = (1 - k)(v_{GS} - V_{thn}), \tag{3}$$
where k is a parameter that models the velocity-saturation
effect. The value of k depends on the MOS technology and
increases with the overdrive voltage, though it may be
regarded as constant under certain conditions [19]. For deep-submicron
devices, k varies between 0 and 1 [20].
Figure 3 The worst-case voltage difference at CL versus n.
The threshold-voltage variation with the body effect will be
approximated by the following relationship [18]:
$$V_{thn} = V_{thn0} - \gamma V_{BS}, \tag{4}$$
where Vthn0 is the threshold voltage for zero source-to-body
voltage and γ is the linearized body-effect coefficient (assuming
that the source-to-body voltages of the transistors are small
such that this effect can be linearized [18]). Taking into
account that the body terminals of the NMOS transistors
are connected to the most-negative terminal, which is 0 V,
results in
$$V_{thn} = V_{thn0} + \gamma V_S. \tag{5}$$
Neglecting the channel-length modulation effect results in
the following equation for the voltage, VCL, across the parasitic
capacitance, CL, in case of n activated inputs:
$$n W C_{ox} v_{sat} (V_{DD} - V_{thn} - V_{CL}) = C_L \frac{dV_{CL}}{dt}. \tag{6}$$
Substituting for Vthn from Eq. (5) into Eq. (6), with VS = VCL, results in
$$n W C_{ox} v_{sat} \left[ V_{DD} - V_{thn0} - (1 + \gamma) V_{CL} \right] = C_L \frac{dV_{CL}}{dt}. \tag{7}$$
Solving this equation by separation of variables and taking
into account that CL was treated here as initially discharged
results in
$$V_{CL}(t) = \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ 1 - e^{-\frac{(1 + \gamma) n W C_{ox} v_{sat} t}{C_L}} \right]. \tag{8}$$
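As a rough numerical check of Eq. (8), the sketch below integrates Eq. (7) with a forward-Euler loop and compares the result with the closed form. It uses the parameter values quoted earlier in Section 4 and the load model CL = (4 + n)C; the function names and the step count are illustrative assumptions, not part of the original analysis.

```python
import math

# Parameter values quoted in Section 4 (SI units).
W = 45e-9        # channel width, m
C_OX = 0.0172    # gate-oxide capacitance per unit area, F/m^2
V_SAT = 1e5      # saturation velocity, m/s
V_DD = 1.0       # supply voltage, V
V_THN0 = 0.25    # zero-bias NMOS threshold voltage, V
GAMMA = 0.4      # linearized body-effect coefficient
C_UNIT = 1e-15   # parasitic capacitance per terminal, F


def load_capacitance(n):
    """C_L = (4 + n) C, as stated in Section 4."""
    return (4 + n) * C_UNIT


def v_cl_closed_form(t, n_active, c_load):
    """Eq. (8): voltage across C_L after time t with n_active inputs on."""
    rate = (1 + GAMMA) * n_active * W * C_OX * V_SAT / c_load
    return (V_DD - V_THN0) / (1 + GAMMA) * (1 - math.exp(-rate * t))


def v_cl_euler(t_end, n_active, c_load, steps=20000):
    """Forward-Euler integration of Eq. (7), starting from V_CL = 0."""
    dt = t_end / steps
    v = 0.0
    for _ in range(steps):
        current = n_active * W * C_OX * V_SAT * (V_DD - V_THN0 - (1 + GAMMA) * v)
        v += current * dt / c_load
    return v


if __name__ == "__main__":
    n, pulse = 8, 15e-12
    c_load = load_capacitance(n)
    print(v_cl_closed_form(pulse, n, c_load), v_cl_euler(pulse, n, c_load))
```

For n = 8 and T = 15 ps both routes should agree to within the Euler discretization error, at a voltage somewhat above 0.35 V.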
Assuming that the width of the input pulses is T, then in
cases of all-activated inputs and all-except-one activated
inputs, the two voltages across CL (at t = T), Vn and Vn−1, will
be
$$V_n = \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ 1 - e^{-\frac{(1 + \gamma) n W C_{ox} v_{sat} T}{C_L}} \right] \tag{9}$$
and
$$V_{n-1} = \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ 1 - e^{-\frac{(1 + \gamma)(n - 1) W C_{ox} v_{sat} T}{C_L}} \right], \tag{10}$$
respectively. The difference between these two voltages is
$$\Delta V = V_n - V_{n-1} = \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ e^{-\frac{(1 + \gamma)(n - 1) W C_{ox} v_{sat} T}{C_L}} - e^{-\frac{(1 + \gamma) n W C_{ox} v_{sat} T}{C_L}} \right]. \tag{11}$$
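To make the dependence of Eq. (11) on the fan-in concrete, the following sketch evaluates ΔV for n = 2 to 16 at a fixed pulse width. It assumes the parameter values listed earlier in Section 4, T = 15 ps, and CL = (4 + n)C; the helper name delta_v is hypothetical.

```python
import math

# Parameter values quoted in Section 4 (SI units).
W = 45e-9
C_OX = 0.0172
V_SAT = 1e5
V_DD = 1.0
V_THN0 = 0.25
GAMMA = 0.4
C_UNIT = 1e-15


def delta_v(n, pulse=15e-12):
    """Eq. (11): V_n - V_(n-1) at the end of a pulse of the given width."""
    c_load = (4 + n) * C_UNIT  # C_L = (4 + n) C
    x = (1 + GAMMA) * W * C_OX * V_SAT * pulse / c_load
    scale = (V_DD - V_THN0) / (1 + GAMMA)
    return scale * (math.exp(-(n - 1) * x) - math.exp(-n * x))


if __name__ == "__main__":
    for n in range(2, 17):
        print(n, round(delta_v(n), 4))
```

Under these assumptions the printed values shrink steadily as n grows, which is the trend the text attributes to the worst-case voltage difference.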
Refer to Fig. 3 for the plot of the worst-case voltage difference
across CL versus the number of the inputs which shows a
monotonic decrease as expected. This voltage difference represents
the valid range within which the threshold voltage of the
first inverter can be chosen. So, the larger this voltage difference,
the more robust the scheme will be to the effect of the
process variations. To determine the value of T at which ΔV
is maximum, let it be Topt, simply differentiate ΔV with respect
to T and equate the first derivative to zero. So,
$$\frac{d(\Delta V)}{dT} = \frac{d}{dT} \left\{ \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ e^{-\frac{(1 + \gamma)(n - 1) W C_{ox} v_{sat} T}{C_L}} - e^{-\frac{(1 + \gamma) n W C_{ox} v_{sat} T}{C_L}} \right] \right\} = 0, \tag{12}$$
from which we obtain
$$T_{opt} = \frac{C_L \ln[n/(n - 1)]}{W C_{ox} v_{sat} (1 + \gamma)}. \tag{13}$$
Figure 4 The optimum pulse width versus n.
Figure 5 The optimum voltage difference versus n.
Figure 6 The relationship between the voltage difference, ΔV, and the pulse width, T.
Refer to Fig. 4 for the plot of Topt versus n. As shown, Topt
decreases with increasing n as expected. The maximum value of
ΔV, let it be (ΔV)max, can be determined by substituting for
Topt from Eq. (13) into Eq. (11). So,
$$(\Delta V)_{max} = \frac{V_{DD} - V_{thn0}}{(1 + \gamma)(n - 1)} \left( \frac{n - 1}{n} \right)^{n}. \tag{14}$$
Note that the maximum voltage difference does not depend
on CL. Refer to Fig. 5 for the plot of (ΔV)max versus n which
shows a monotonic decrease.
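The closed forms of Eqs. (13) and (14) can be cross-checked against a direct evaluation of Eq. (11). The sketch below does this under the parameter values of Section 4 and CL = (4 + n)C; the function names are illustrative assumptions.

```python
import math

# Parameter values quoted in Section 4 (SI units).
W, C_OX, V_SAT = 45e-9, 0.0172, 1e5
V_DD, V_THN0, GAMMA = 1.0, 0.25, 0.4
C_UNIT = 1e-15


def delta_v(pulse, n):
    """Eq. (11) evaluated for pulse width `pulse` and n inputs."""
    c_load = (4 + n) * C_UNIT
    x = (1 + GAMMA) * W * C_OX * V_SAT * pulse / c_load
    scale = (V_DD - V_THN0) / (1 + GAMMA)
    return scale * (math.exp(-(n - 1) * x) - math.exp(-n * x))


def t_opt(n):
    """Eq. (13): pulse width that maximizes the voltage difference (n >= 2)."""
    c_load = (4 + n) * C_UNIT
    return c_load * math.log(n / (n - 1)) / (W * C_OX * V_SAT * (1 + GAMMA))


def delta_v_max(n):
    """Eq. (14): maximum voltage difference, independent of C_L."""
    return (V_DD - V_THN0) / ((1 + GAMMA) * (n - 1)) * ((n - 1) / n) ** n


if __name__ == "__main__":
    n = 8
    print(t_opt(n), delta_v(t_opt(n), n), delta_v_max(n))
```

For n = 8 this gives an optimum pulse width a little under 15 ps, close to the T = 15 ps adopted in the text, and a maximum difference of roughly 26 mV.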
Figure 7 The two voltages across CL, Vn and Vn−1, versus time.
Figure 8 The voltage difference across CL, Vn − Vn−1, versus time.
4.2. Allowable range of T
The plot of ΔV = Vn − Vn−1 versus T is shown in Fig. 6. The
presence of an optimum in ΔV versus T can be expected from the
plots of Vn and Vn−1 along with the plot of ΔV versus time
as shown in Figs. 7 and 8, respectively. In fact, the NMOS
transistors charging CL act as a variable resistor whose resistance
depends on the number of the activated inputs. In case
of all activated inputs, this resistance is smaller than that in
case of all-except-one activated inputs. So, CL charges faster
in the first case but in the two cases, the steady-state voltages
reach the same value which is independent of the number of
the activated inputs. Hence, the voltage difference, ΔV,
approaches 0 V at steady state as shown in Fig. 8.
Assume that the minimum acceptable value for ΔV is
(ΔV)min, such that the scheme still operates satisfactorily in
spite of the effect of the process variations. Substituting
(ΔV)min for ΔV in Eq. (11) results in
$$(\Delta V)_{min} = \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ e^{-\frac{(1 + \gamma)(n - 1) W C_{ox} v_{sat} T}{C_L}} - e^{-\frac{(1 + \gamma) n W C_{ox} v_{sat} T}{C_L}} \right], \tag{15}$$
which is a transcendental equation of T. So, T cannot be found
explicitly in terms of (ΔV)min. However, as obvious from
Fig. 6, as (ΔV)min decreases, the allowable range of T, Trange,
increases. The relationship between (ΔV)min and Trange is
shown in Fig. 9 for n = 8 (which is obtained point by point).
Figure 9 The allowable range for the pulse width versus the smallest voltage difference across CL for n = 8.
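Because Eq. (15) has no explicit solution for T, the allowable range can be found numerically. The sketch below is one possible way to do this, using plain bisection on either side of Topt; it assumes the Section 4 parameter values and CL = (4 + n)C, and the function names are hypothetical rather than taken from the paper.

```python
import math

# Parameter values quoted in Section 4 (SI units).
W, C_OX, V_SAT = 45e-9, 0.0172, 1e5
V_DD, V_THN0, GAMMA = 1.0, 0.25, 0.4
C_UNIT = 1e-15


def delta_v(pulse, n):
    """Eq. (11) for pulse width `pulse` and n inputs."""
    c_load = (4 + n) * C_UNIT
    x = (1 + GAMMA) * W * C_OX * V_SAT * pulse / c_load
    scale = (V_DD - V_THN0) / (1 + GAMMA)
    return scale * (math.exp(-(n - 1) * x) - math.exp(-n * x))


def t_opt(n):
    """Eq. (13): pulse width at which delta_v peaks (n >= 2)."""
    c_load = (4 + n) * C_UNIT
    return c_load * math.log(n / (n - 1)) / (W * C_OX * V_SAT * (1 + GAMMA))


def bisect_root(func, lo, hi, iterations=200):
    """Plain bisection; assumes func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allowable_pulse_range(dv_min, n):
    """Return (T_low, T_high) with delta_v >= dv_min in between (dv_min > 0)."""
    peak = t_opt(n)
    if dv_min >= delta_v(peak, n):
        raise ValueError("dv_min exceeds the maximum achievable difference")

    def residual(pulse):
        return delta_v(pulse, n) - dv_min

    t_low = bisect_root(residual, 0.0, peak)          # rising side of the curve
    t_high = bisect_root(residual, peak, 50 * peak)   # falling side of the curve
    return t_low, t_high


if __name__ == "__main__":
    low, high = allowable_pulse_range(0.02, 8)
    print(low, high, high - low)
```

The upper search bound of 50 Topt is an arbitrary choice that is far enough out for ΔV to have decayed well below any practical (ΔV)min.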
4.3. The logic swing of the first inverter
From the qualitative discussion of the proposed scheme in Sec-
tion 3, it is obvious that the best robustness to process varia-
tions is achieved when the threshold voltage of the first
inverter, Vthinv, lies midway between Vn and Vn−1, that is,
$$V_{thinv} = \frac{V_n + V_{n-1}}{2}. \tag{16}$$
Now, the logic swing at the output of the first inverter, LS,
can be determined from the voltage-transfer characteristics
(VTC) of this inverter which is shown qualitatively in Fig. 10.
It is obvious that the logic swing at the inverter output can
be maximized by increasing the slope of the VTC in the transition
region. The slope of the VTC in the transition region is
given by [2]
$$\text{slope} = (g_{mN} + g_{mP})(r_{oN} \parallel r_{oP}),$$
where gmN and gmP represent the transconductances of the
constituting NMOS and PMOS transistors of the first inverter
and roN and roP represent the output resistances of the constituting
NMOS and PMOS transistors, respectively. The logic
swing at the output of the first inverter, LS, is given by
$$LS = (g_{mN} + g_{mP})(r_{oN} \parallel r_{oP}) \Delta V. \tag{17}$$
Figure 10 The voltage-transfer characteristics (VTC) of the first inverter. LS represents the logic swing at the inverter output.
In order to increase LS, the two output resistances, roN and
roP, must be increased which can be achieved by increasing the
channel lengths of both transistors. In order to obtain the best
performance, the channel width must also be increased which
translates to a larger area. The parasitic capacitances associ-
ated with these transistors increase with increasing their
dimensions. This is obviously a tradeoff between the robust-
ness of the proposed scheme on one side and the area and
speed on the other.
4.4. Area considerations
In comparing the areas of the proposed scheme with the conventional
CMOS n-input NAND gate shown in Fig. 11, we will
adopt the convention that the area of a certain transistor is equal
to its channel area [2]. Adopting the convention that the PMOS
transistor has twice the area of the NMOS one to compensate
for the mobility difference and adopting the conventional sizing
strategy of increasing the aspect ratio of the transistors in the
stack with n transistors by n in order to compensate for the delay
increase [2], then the areas of the conventional and proposed
schemes, Ac and Ap, can be approximated by
$$A_c = (n^2 + 2n) W L, \tag{18}$$
and
$$A_p = (10 + n) W L, \tag{19}$$
respectively. Refer to Fig. 12 for the plots of Ac and Ap versus
n for W = L = 45 nm. It can be concluded from this rough
estimation of the area that the proposed scheme has an area
advantage when n exceeds 3.
Figure 11 The conventional CMOS n-input NAND gate with the sizing illustrated.
Figure 13 The circuit diagram of the conventional CMOS n-input NAND gate with the sizing and the internal capacitances affecting the propagation delay illustrated.
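The area estimates of Eqs. (18) and (19) are simple enough to tabulate directly. The sketch below does so for W = L = 45 nm; the function names are illustrative assumptions.

```python
def area_conventional(n, w=45e-9, l=45e-9):
    """Eq. (18): n NMOS devices of width n plus n PMOS devices of width 2."""
    return (n ** 2 + 2 * n) * w * l


def area_proposed(n, w=45e-9, l=45e-9):
    """Eq. (19): (10 + n) minimum-size channel areas."""
    return (10 + n) * w * l


if __name__ == "__main__":
    for n in range(1, 17):
        a_c, a_p = area_conventional(n), area_proposed(n)
        print(n, a_c, a_p, "proposed smaller" if a_p < a_c else "conventional smaller")
```

In units of WL, the two expressions give 8 versus 12 at n = 2 and 15 versus 13 at n = 3, so under this rough model the curves cross between two and three inputs and the gap then widens quadratically in favor of the proposed scheme.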
4.5. High-to-low propagation delay
We will in this subsection compare the high-to-low propaga-
tion delays of the conventional and the proposed schemes,
tPHLc and tPHLp, respectively. tPHLc was derived in [21] for
the case of n + 1 series-connected NMOS transistors and
was found to be
$$t_{PHLc} = \frac{C_{outc} \left[ V_{thn} + \alpha (V_{DD} - V_{thn}) \right] \left[ 1 + W C_{ox} R_{total\,n} (1 + \gamma) v_{sat} \right]}{W C_{ox} (V_{DD} - V_{thn0}) v_{sat}} + 2.3 (n + 1) R C_{outc}, \tag{20}$$
where Rtotal n is given by
$$R_{total\,n} = \frac{n}{k'_n \left( \frac{W}{L} \right) \left( \frac{V_{DD}}{2} - V_{thn} \right)}, \tag{21}$$
and R is the equivalent resistance of each of the NMOS transistors.
Modifying the last two equations to be valid for the
case of n series-connected NMOS transistors results in the
high-to-low propagation delay of the conventional stack with
n inputs being given by
$$t_{PHLc} = \frac{C_{outc} \left[ V_{thn} + \alpha (V_{DD} - V_{thn}) \right] \left[ 1 + W C_{ox} R_{total\,n-1} (1 + \gamma) v_{sat} \right]}{W C_{ox} (V_{DD} - V_{thn}) v_{sat}} + 2.3 n R C_{outc}, \tag{22}$$
where Rtotal n−1 is the equivalent resistance of the n − 1 lowermost
transistors in the stack and is given by
$$R_{total\,n-1} = \frac{n - 1}{k'_n \left( \frac{W}{L} \right) \left( \frac{V_{DD}}{2} - V_{thn} \right)}. \tag{23}$$
In Eq. (20), Coutc is the parasitic capacitance at the output
node of the conventional stack. When adopting the previously
described convention for evaluating the parasitic capacitances
and the conventional sizing strategy of multiplying the aspect
ratio of the series-connected transistors by their number in
order to get the same performance as the inverter, we get
Coutc = 3n fF (refer to Fig. 13).
Figure 12 The relationship between Ac and Ap versus n.
For the scheme of Fig. 1(a), tPHLp contains four subcompo-
nents; the time required to charge CL, tPHLp1 = T, the high-to-
low propagation delay of the first inverter, tPHLp2, the low-to-
high propagation delay of the second inverter, tPHLp3, and the
high-to-low propagation delay of the third inverter, tPHLp4. We
have neglected the time required to initially discharge CL from
Vn to 0 V, which is very small compared to the other
subcomponents (we will return to this point in Section 7). The
high-to-low and the low-to-high propagation delays of the
inverter can be written as [22]
$$t_{PHL} = \frac{2 C_{outp}}{k'_n \left( \frac{W}{L} \right)_n (V_{DD} - V_{thn})} \left[ \frac{V_{thn}}{V_{DD} - V_{thn}} + \frac{1}{2} \ln \left( \frac{3 V_{DD} - 4 V_{thn}}{V_{DD}} \right) \right] \tag{24}$$
and
$$t_{PLH} = \frac{2 C_{outp}}{k'_p \left( \frac{W}{L} \right)_p (V_{DD} - |V_{thp}|)} \left[ \frac{|V_{thp}|}{V_{DD} - |V_{thp}|} + \frac{1}{2} \ln \left( \frac{3 V_{DD} - 4 |V_{thp}|}{V_{DD}} \right) \right], \tag{25}$$
respectively, where Coutp is the parasitic capacitance at the
inverter output. So, the high-to-low propagation delay of the
proposed scheme is
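$$\begin{aligned}
t_{PHLp} = T &+ \frac{2 C_{outp1}}{k'_n \left( \frac{W}{L} \right)_n (V_{DD} - V_{thn})} \left[ \frac{V_{thn}}{V_{DD} - V_{thn}} + \frac{1}{2} \ln \left( \frac{3 V_{DD} - 4 V_{thn}}{V_{DD}} \right) \right] \\
&+ \frac{2 C_{outp2}}{k'_p \left( \frac{W}{L} \right)_p (V_{DD} - |V_{thp}|)} \left[ \frac{|V_{thp}|}{V_{DD} - |V_{thp}|} + \frac{1}{2} \ln \left( \frac{3 V_{DD} - 4 |V_{thp}|}{V_{DD}} \right) \right] \\
&+ \frac{2 C_{outp3}}{k'_n \left( \frac{W}{L} \right)_n (V_{DD} - V_{thn})} \left[ \frac{V_{thn}}{V_{DD} - V_{thn}} + \frac{1}{2} \ln \left( \frac{3 V_{DD} - 4 V_{thn}}{V_{DD}} \right) \right],
\end{aligned} \tag{26}$$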
where Coutp1, Coutp2, and Coutp3 are the parasitic capacitances at
the outputs of the three inverters in their order. Refer to
Fig. 14 for the plot of tPHLc versus n. The high-to-low propa-
gation delay of the proposed scheme is 80.646 ps which is rel-
atively independent of n. Thus, the proposed scheme is faster
than the conventional stack for all values of n.
4.6. Average power consumption
In this subsection, the average power consumption of the con-
ventional and proposed schemes will be compared for a circuit
with n inputs. For the conventional CMOS n-input NAND
gate, refer to Fig. 13 in which the sizing of each transistor
and the parasitic capacitances at each node are shown.
Figure 14 The plot of the high-to-low propagation delay of the conventional stack versus the number of inputs.
In our estimation, the short-circuit and leakage power consumption
components will be neglected for the conventional
stack. So, the only component that will be taken into account
is the dynamic-switching power consumption associated with
charging the parasitic capacitances indicated to VDD. The total
dynamic-switching power consumption of an IC is given by
[23]
$$P_{switching} = f_s V_{DD} \sum_{i=1}^{P} \alpha_{si} C_{Li} V_{swingi}, \tag{27}$$
where P is the total number of nodes within a CMOS circuit,
CLi is the equivalent parasitic capacitance of the ith node, αsi is
the switching activity of the ith node, and Vswingi is the voltage
swing on the ith node assuming that the power supply providing
the charge is VDD. Now, for the stack of Fig. 13, Coutc =
3nC charges in all the input combinations except the one in
which all the inputs are activated which corresponds to all
off PMOS devices. So, there are 2^n − 1 input combinations
in which Coutc will charge. For the upper parasitic capacitance
with value 2nC, it charges for all the input combinations which
have A1 equal to 1 except the case of all-activated inputs in
which all the PMOS devices are off. The number of such input
combinations is obviously 2^(n−1) − 1. The same procedure can
be applied for the lower parasitic capacitances with value
2nC including the lowermost one, which charges only when
all the inputs are 1 except the lowermost one,
An. The latter case occurs in only one input combination.
Combining all these terms and dividing by the number of the
input combinations, 2^n, results in the average power consumption
of the conventional stack being given by
$$\begin{aligned}
P_{avgc} &= \frac{\alpha_s f_s V_{DD}^2}{2^n} \left[ 3nC(2^n - 1) + 2nC(2^{n-1} - 1) + 2nC(2^{n-2} - 1) + \dots + 2nC(2^{n-(n-2)} - 1) + 2nC(2^{n-(n-1)} - 1) \right] \\
&= \frac{\alpha_s f_s V_{DD}^2}{2^n} \left[ 3nC(2^n - 1) + 2nC \left[ 2^{n-1} + 2^{n-2} + \dots + 2^2 + 2^1 - (n - 1) \right] \right] \\
\Rightarrow P_{avgc} &= \frac{\alpha_s f_s V_{DD}^2}{2^n} \left[ 3nC(2^n - 1) + 2nC(2^n - n - 1) \right].
\end{aligned} \tag{28}$$
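The step from the term-by-term sum to the closed form in Eq. (28) relies on the geometric series 2 + 4 + … + 2^(n−1) = 2^n − 2. The sketch below checks the two forms against each other numerically, assuming C = 1 fF, fs = 1 GHz, VDD = 1 V, and unit switching activity as in Section 4; the function names are hypothetical.

```python
import math


def p_avg_conventional_sum(n, c_unit=1e-15, f_s=1e9, v_dd=1.0, alpha_s=1.0):
    """First line of Eq. (28): explicit sum over the stack nodes."""
    total = 3 * n * c_unit * (2 ** n - 1)          # output node, C_outc = 3nC
    for k in range(1, n):                          # the n - 1 internal nodes, 2nC each
        total += 2 * n * c_unit * (2 ** (n - k) - 1)
    return alpha_s * f_s * v_dd ** 2 * total / 2 ** n


def p_avg_conventional_closed(n, c_unit=1e-15, f_s=1e9, v_dd=1.0, alpha_s=1.0):
    """Last line of Eq. (28): closed form after summing the geometric series."""
    bracket = 3 * n * c_unit * (2 ** n - 1) + 2 * n * c_unit * (2 ** n - n - 1)
    return alpha_s * f_s * v_dd ** 2 * bracket / 2 ** n


if __name__ == "__main__":
    for n in range(2, 17):
        s, c = p_avg_conventional_sum(n), p_avg_conventional_closed(n)
        print(n, s, c, math.isclose(s, c, rel_tol=1e-12))
```

With these values the n = 8 stack comes out at a little under 40 µW of average switching power.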
Now, for the proposed scheme of Fig. 1(a), the power con-
sumption includes the dynamic-switching power consumption
associated with charging CL and the parasitic capacitances at
the outputs of the three inverters, the dc power consumption
of the first inverter, and the short-circuit power consumption
associated with the three inverters. The voltage across CL in
case of n activated inputs is given by the following equation:
$$V_n = \frac{V_{DD} - V_{thn0}}{1 + \gamma} \left[ 1 - e^{-\frac{(1 + \gamma) n W C_{ox} v_{sat} T}{C_L}} \right]. \tag{29}$$
In case of only one activated input, the voltage across CL
charges to V1 which can be found by substituting n by 1 into
Eq. (29). So, the associated switching power consumption is
$$P_{p1} = \alpha_s f_s C_L V_{DD} V_1.$$
The case of only one activated input occurs for n input
combinations. Stated another way, the case of only one acti-
vated input occurs for a number of input combinations equal
to the combination of n taken one at a time which is nC1. Sim-
ilarly, when there are only two activated inputs, CL charges to
V2 with an associated switching power consumption given by
$$P_{p2} = \alpha_s f_s C_L V_{DD} V_2. \tag{30}$$
The case of any two activated inputs occurs for a number of
input combinations equal to nC2 which is the combination of n
taken 2 at a time as the order of the activated inputs is not
important [24]. nCm is given by [24] (where n is a positive inte-
ger and m is a nonnegative integer)
$${}^{n}C_{m} = \frac{n!}{m!\,(n - m)!}. \tag{31}$$
Repeating this procedure with the other voltages results in
the average switching-power consumption of the proposed
scheme being given by
$$P_{CL} = \frac{\alpha_s f_s C_L V_{DD}}{2^n} \left[ (0)\,{}^{n}C_{0} + (V_1)\,{}^{n}C_{1} + (V_2)\,{}^{n}C_{2} + \dots + (V_{n-1})\,{}^{n}C_{n-1} + (V_n)\,{}^{n}C_{n} \right]. \tag{32}$$
The first and last terms correspond to the two cases of
no and all-activated inputs, respectively. The average
switching-power consumption associated with the three inverters
is
$$P_{p3} = \alpha_s f_s V_{DD}^2 (C_{outp1} + C_{outp2} + C_{outp3}). \tag{33}$$
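The binomial weighting in Eq. (32) can be evaluated directly with math.comb. The sketch below is a minimal illustration that assumes the Section 4 parameter values, T = 15 ps, and CL = (4 + n)C, with Vm obtained from Eq. (29) by replacing n in the exponent with the number of activated inputs m; the function names are hypothetical.

```python
import math

# Parameter values quoted in Section 4 (SI units).
W, C_OX, V_SAT = 45e-9, 0.0172, 1e5
V_DD, V_THN0, GAMMA = 1.0, 0.25, 0.4
C_UNIT = 1e-15


def v_after_pulse(m, n, pulse):
    """Eq. (29) with m of the n inputs activated; C_L is set by n."""
    c_load = (4 + n) * C_UNIT
    rate = (1 + GAMMA) * m * W * C_OX * V_SAT / c_load
    return (V_DD - V_THN0) / (1 + GAMMA) * (1 - math.exp(-rate * pulse))


def p_cl_average(n, pulse=15e-12, f_s=1e9, alpha_s=1.0):
    """Eq. (32): switching power of C_L averaged over all 2^n input patterns."""
    c_load = (4 + n) * C_UNIT
    weighted = sum(math.comb(n, m) * v_after_pulse(m, n, pulse) for m in range(n + 1))
    return alpha_s * f_s * c_load * V_DD * weighted / 2 ** n


if __name__ == "__main__":
    print(p_cl_average(8))
```

For n = 8 the result is on the order of a few microwatts, which is small next to the inverter-related terms discussed in the rest of this subsection.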
Figure 15 The plots of Pavgc and Pavgp versus n (the average power
consumption of the conventional and proposed schemes, in watts,
versus the number of inputs, n).
Note that Eq. (33) overestimates the switching-power consumption
of the first inverter as its output does not swing rail-to-rail.
Concerning the dc power consumption of the first inverter, PDC,
it is not dissipated for all of the input combinations; rather,
it is dissipated only when both the NMOS and PMOS devices of
the first inverter become activated. However, we will simplify
its estimation and assume that it is dissipated in all the input
combinations. Also, a value of VDD/2 is assumed at its input. If this
inverter is assumed to be matched, then its output will also be
at VDD/2. For these two reasons, the value of the dc power
consumption is overestimated. PDC can be written as
$$P_{DC} = \frac{V_{DD}}{2}\, k'_n \left(\frac{W}{L}\right)_n \left(\frac{V_{DD}}{2} - V_{thn}\right)^2 \left(1 + \frac{\lambda V_{DD}}{2}\right), \tag{34}$$
where the NMOS transistor of the first inverter is certainly in
saturation as its vGS and vDS are both equal to VDD/2. Now,
the short-circuit power consumption of an inverter is given
by [1]
$$P_{sc} = \frac{\alpha_s K \tau f_s \left(V_{DD} - 2V_{thn}\right)^3}{12}, \tag{35}$$
assuming a matched inverter, where τ is the rise or fall time
(assuming that they are equal) of the input waveform and K is
the device-transconductance parameter. Multiplying Psc of
Eq. (35) by 3 to account for the short-circuit power consumption
of the three inverters and adding the result to that in Eqs.
(32)–(34) results in the average power consumption of the
proposed scheme being equal to
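$$P_{avgp} = \frac{\alpha_s f_s C_L V_{DD}}{2^n}\left[(0)\,{}^{n}C_{0} + (V_1)\,{}^{n}C_{1} + (V_2)\,{}^{n}C_{2} + \dots + (V_{n-1})\,{}^{n}C_{n-1} + (V_n)\,{}^{n}C_{n}\right] + \alpha_s f_s V_{DD}^2 \left(C_{outp1} + C_{outp2} + C_{outp3}\right) + \frac{\alpha_s K \tau f_s \left(V_{DD} - 2V_{thn}\right)^3}{4} + \frac{V_{DD}}{2}\, k'_n \left(\frac{W}{L}\right)_n \left(\frac{V_{DD}}{2} - V_{thn}\right)^2 \left(1 + \frac{\lambda V_{DD}}{2}\right). \tag{36}$$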
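As a rough numerical companion to Eqs. (34)–(36), the following Python sketch evaluates the dc and short-circuit terms and adds up the four contributions. The function names and every parameter value (k'n, W/L, Vthn, λ, αs, τ, fs) are hypothetical placeholders, not the values behind Fig. 15.

```python
def dc_power(v_dd, k_n_prime, w_over_l, v_thn, lam):
    """Eq. (34): dc power of the first inverter with V_DD/2 at input and output."""
    v_half = v_dd / 2
    # Saturation current of the NMOS with v_GS = v_DS = V_DD/2.
    i_sat = 0.5 * k_n_prime * w_over_l * (v_half - v_thn) ** 2 * (1 + lam * v_half)
    return v_dd * i_sat


def short_circuit_power(alpha_s, k, tau, f_s, v_dd, v_thn):
    """Eq. (35): short-circuit power of one matched inverter."""
    return alpha_s * k * tau * f_s * (v_dd - 2 * v_thn) ** 3 / 12


def total_average_power(p_cl, p_inverters, p_sc_one, p_dc):
    """Eq. (36): Eqs. (32)-(34) plus three times Eq. (35)."""
    return p_cl + p_inverters + 3 * p_sc_one + p_dc


# Hypothetical values for illustration only.
p_dc = dc_power(v_dd=1.0, k_n_prime=200e-6, w_over_l=2.0, v_thn=0.2, lam=0.1)
p_sc = short_circuit_power(alpha_s=0.5, k=400e-6, tau=1e-9, f_s=1e9,
                           v_dd=1.0, v_thn=0.2)
print(f"P_DC = {p_dc:.3e} W, P_sc (one inverter) = {p_sc:.3e} W")
```

Keeping the four contributions as separate arguments makes it easy to see which term dominates for a given parameter set.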
An important note is in order here. VDD is reduced with
technology scaling in order to reduce the dynamic-switching
power consumption. In order not to degrade the performance,
Vthn is also reduced; however, the ratio, VDD/Vthn, reduces with
technology scaling [21]. The expected result is that the
short-circuit power consumption reduces with technology scaling,
which seems to be a power advantage for the proposed scheme.
Refer to Fig. 15 for the plots of Pavgc and Pavgp versus n for
τ = 1 ns. It is obvious from this figure that Pavgp is larger
than Pavgc for practical values of n. This can be attributed to
the static-dc as well as the short-circuit power consumption. It
is also obvious that Pavgc and Pavgp show a monotonic increase
versus n. Finally, refer to Fig. 16 for the plots of the
power-delay products of the conventional and proposed schemes
versus n, which show the superiority of the proposed scheme when
n exceeds 2.
5. Second-order effects
In this section, the second-order effects will be taken into
account. Among these effects are the channel-length modula-
tion, body-effect, drain-induced barrier lowering (DIBL),
short-channel effects, and narrow-channel effects.
5.1. The channel-length modulation effect
If this effect were taken into account, the drain-current
equation of the MOS transistors in the input paths would be
modified to include the term (1 + λn vDS). The result, of course,
is to increase the drain current for the same terminal voltages.
However, the voltage across CL upon charging will be (according
to the simulation results) around 350 mV for four inputs. Using a
power-supply voltage of 1 V results in a drain-to-source voltage
of 0.65 V. Adopting λn = 0.1 V^-1 results in a term of value
between 1.1 (when CL is initially discharged and thus
VDS = VDD = 1 V) and 1.065 (when CL is charged to 0.35 V
and thus VDS = 0.65 V) multiplied by the adopted drain-current
equation. Thus, this effect can safely be neglected without
significant loss of accuracy.
Figure 16 The plots of the power-delay products of the conventional
and proposed schemes versus n (power-delay products in joules versus
the number of inputs, n).
Figure 17 The voltage difference across CL versus the body-effect
coefficient (the voltage difference, Vn − Vn−1, in volts).
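The arithmetic behind the two limits of the channel-length-modulation factor quoted in Section 5.1 can be checked with a short Python snippet. The numbers (λn = 0.1 V^-1, VDD = 1 V, CL charged to 0.35 V) are the ones stated in the text; the function name is only illustrative.

```python
def clm_factor(lam_n, v_ds):
    """Channel-length-modulation multiplier (1 + lambda_n * v_DS)."""
    return 1.0 + lam_n * v_ds


LAMBDA_N = 0.1        # 1/V, as adopted in the text
V_DD = 1.0            # V
V_CL_CHARGED = 0.35   # V, voltage across C_L for four inputs

factor_start = clm_factor(LAMBDA_N, V_DD)                # C_L discharged
factor_end = clm_factor(LAMBDA_N, V_DD - V_CL_CHARGED)   # C_L charged
spread_percent = 100 * (factor_start - factor_end) / factor_start
print(factor_start, factor_end, round(spread_percent, 2))
```

The multiplier therefore changes by only a few percent over the whole charging interval.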
5.2. The body effect
This effect was already taken into account in the analysis
performed in Section 4. However, note that the body-effect
coefficient, γ, is a fabrication-process parameter and is given by
[22]
$$\gamma = \frac{\sqrt{2 q N_A \varepsilon_s}}{C_{ox}}, \tag{37}$$
where q is the electronic charge, NA is the doping concentration
of the p-type substrate, εs is the electric permittivity of
silicon (1.04 × 10^-12 F/cm), and Cox is the gate-oxide
capacitance per unit area. It is shown in [25] that in order
for the MOSFET to operate properly in spite of CMOS technology
scaling, the doping of the substrate, NA, must be increased.
However, the gate-oxide thickness, tox, decreases in order to
reduce short-channel effects [23] with the result that the
gate-oxide capacitance per unit area, Cox, increases. The
increase in Cox more than compensates for the increase of NA and
the net result is that the body-effect parameter, γ, decreases
with technology scaling. The effect of the weakening of the body
effect on the threshold voltage is shown in Fig. 17, which plots
the voltage difference, ΔV = Vn − Vn−1, versus γ. It is apparent
that the voltage difference increases with decreasing γ, which
seems to be an advantage gained with technology scaling.
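The scaling trend of γ described above can be illustrated numerically from Eq. (37). In the Python sketch below, the permittivity of silicon is the value quoted in the text, while the oxide permittivity (3.9 times the vacuum permittivity) and the two doping/oxide-thickness pairs are assumptions picked purely for illustration; they do not describe any specific technology node.

```python
from math import sqrt

Q = 1.602e-19       # electronic charge, C
EPS_SI = 1.04e-12   # permittivity of silicon, F/cm (value quoted in the text)
EPS_OX = 3.45e-13   # permittivity of SiO2, F/cm (assumed: 3.9 * 8.854e-14)


def body_effect_coefficient(n_a, t_ox_cm):
    """Eq. (37): gamma = sqrt(2 q N_A eps_s) / C_ox, in V**0.5.

    n_a is the substrate doping in cm^-3, t_ox_cm the oxide thickness in cm.
    """
    c_ox = EPS_OX / t_ox_cm  # gate-oxide capacitance per unit area, F/cm^2
    return sqrt(2 * Q * n_a * EPS_SI) / c_ox


# Hypothetical "older" and "scaled" processes: N_A x4, t_ox / 4.
gamma_old = body_effect_coefficient(n_a=1e17, t_ox_cm=10e-7)
gamma_scaled = body_effect_coefficient(n_a=4e17, t_ox_cm=2.5e-7)
print(f"gamma_old = {gamma_old:.3f}, gamma_scaled = {gamma_scaled:.3f}")
```

In this example the doping rises fourfold (doubling the numerator) while Cox rises fourfold, so γ halves, which is consistent with the qualitative argument in the text.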
5.3. The drain-induced barrier lowering (DIBL) effect
As the magnitude of the reverse bias voltage across the drain-
to-body pn junction is increased, the depth of the junction
depletion layer increases. A deeper depletion layer around
the drain contributes a larger amount of depletion charge to
the channel. An increased drain-to-body reverse bias voltage,
therefore, enhances the short-channel effects and lowers the
magnitude of the threshold voltage of the MOSFET transistor.
The threshold-voltage degradation caused by an increased or
decreased drain bias voltage of an N-channel or P-channel
MOSFET, respectively, is commonly referred to as drain-
induced barrier-lowering (DIBL) [26]. The threshold-voltage
variation due to the DIBL effect can be expressed as
$$V_{thn} = V_{thn0} - \eta V_{DS}, \tag{38}$$
where Vthn0 is the value of the threshold voltage at
VDS = VBS = 0 and η is the DIBL coefficient. If this
expression for Vthn were adopted in the previous analysis, the
voltage, VCL(t), can be expressed as
$$V_{CL}(t) = \frac{V_{DD}(1+\eta) - V_{thn0}}{1+\gamma+\eta}\left[1 - e^{-\frac{(1+\gamma+\eta)\, n W C_{ox} v_{sat}\, t}{C_L}}\right]. \tag{39}$$
Refer to Fig. 18 for the relationship between the voltage
difference across CL and η. η is typically on the order of 0.1. For
this value of η, the percentage variation of ΔV is 5.7%.
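A minimal Python sketch of Eq. (39) is given below, assuming the charging expression has the first-order exponential form implied by Eqs. (39) and (40). The device and circuit parameters (γ, the product W·Cox·vsat, CL, and the pulse width) are hypothetical, so the printed percentage is not expected to reproduce the 5.7% figure quoted above.

```python
from math import exp


def v_cl(t, m, eta, v_dd, v_thn0, gamma, w_cox_vsat, c_load):
    """Eq. (39) with m activated inputs; eta = 0 gives the DIBL-free case."""
    k = 1.0 + gamma + eta
    v_final = (v_dd * (1.0 + eta) - v_thn0) / k
    return v_final * (1.0 - exp(-k * m * w_cox_vsat * t / c_load))


def delta_v(t, n, eta, **params):
    """Voltage difference V_n - V_(n-1) at time t."""
    return v_cl(t, n, eta, **params) - v_cl(t, n - 1, eta, **params)


# Hypothetical parameters for illustration only.
PARAMS = dict(v_dd=1.0, v_thn0=0.2, gamma=0.3, w_cox_vsat=1e-4, c_load=10e-15)
T_PULSE = 10e-12  # assumed pulse width, s

dv_no_dibl = delta_v(T_PULSE, 4, 0.0, **PARAMS)
dv_dibl = delta_v(T_PULSE, 4, 0.1, **PARAMS)
percent_change = 100 * (dv_dibl - dv_no_dibl) / dv_no_dibl
print(f"dV(eta=0) = {dv_no_dibl * 1e3:.2f} mV, change = {percent_change:.2f} %")
```

Sweeping eta over a range of values in place of the two points used here would give a curve of the kind plotted in Fig. 18.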
5.4. The short-channel effects
As the channel length of a MOSFET is reduced with technology
scaling, the depletion regions around the source and drain
terminals become closer with the result that the total depth of
the source and drain depletion regions becomes comparable to
the effective channel length in deep-submicrometer devices
[23]. Thus, more charge is contributed to the depletion region
beneath the gate area by the source-to-substrate and the
drain-to-substrate depletion layers in short-channel devices. The
threshold voltage thus lowers with decreasing gate length in
what is known as Vthn-roll-off [27,28]. Refer to Fig. 19 for
the relationship between ΔV and Vthn0. ΔV changes by 6.13%
for Vthn0 ranging between 0.2 and 0.25 V.
Figure 18 The voltage difference across CL versus the DIBL
coefficient, η (voltage difference in volts).
5.5. The narrow-channel effects
As the channel width is reduced, more gate charge is
required to invert the channel because a larger percentage of
the gate-induced space charge is lost in fringing fields [29]. This
results in Vthn0 increasing monotonically as the channel width
decreases. It seems that this increase of Vthn0 somewhat
compensates for the reduction associated with decreasing the
channel length.
6. Effect of process variations
In this section, the effect of the process variations on the
voltage difference at the input of the first inverter will be
investigated quantitatively. The effect of the variations of each of
Vthn0 (the threshold voltage of the charging NMOS devices),
W (the channel width of the charging NMOS devices), and T
will be investigated one at a time. Assume that the variation
of Vthn is ΔVthn. So, substituting Vthn in Eq. (11) by
Vthn + ΔVthn results in the voltage difference being given by
$$\Delta V = V_n - V_{n-1} = \frac{V_{DD} - V_{thn0} - \Delta V_{thn0}}{1+\gamma}\left[e^{-\frac{(1+\gamma)(n-1) W C_{ox} v_{sat} T}{C_L}} - e^{-\frac{(1+\gamma)\, n W C_{ox} v_{sat} T}{C_L}}\right]. \tag{40}$$
The variation in ΔV due to that in Vthn is thus
$$\Delta(\Delta V)_{\Delta V_{thn}} = -\frac{\Delta V_{thn0}}{1+\gamma}\left[e^{-\frac{(1+\gamma)(n-1) W C_{ox} v_{sat} T}{C_L}} - e^{-\frac{(1+\gamma)\, n W C_{ox} v_{sat} T}{C_L}}\right]. \tag{41}$$
Figure 19 The relationship between ΔV and Vthn0 (the voltage
difference in volts versus the NMOS threshold voltage in volts).
Repeating the analysis with respect to the variations in W
and T and assuming that the variations of these two parameters,
ΔW and ΔT, are small, the following approximation
can be used (where a is a constant):
$$e^{-a(x+\Delta x)} \approx e^{-ax}\left(1 - a\,\Delta x\right). \tag{42}$$
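The variations in ΔV due to the variations in these two
parameters are thus given by
$$\Delta(\Delta V)_{\Delta W} = \frac{\Delta W \left(V_{DD} - V_{thn0}\right)}{1+\gamma}\left[e^{-\frac{(1+\gamma)\, n W C_{ox} v_{sat} T}{C_L}}\left(\frac{(1+\gamma)\, n C_{ox} v_{sat} T}{C_L}\right) - e^{-\frac{(1+\gamma)(n-1) W C_{ox} v_{sat} T}{C_L}}\left(\frac{(1+\gamma)(n-1) C_{ox} v_{sat} T}{C_L}\right)\right], \tag{43}$$
and
$$\Delta(\Delta V)_{\Delta T} = \frac{\Delta T \left(V_{DD} - V_{thn0}\right)}{1+\gamma}\left[e^{-\frac{(1+\gamma)\, n W C_{ox} v_{sat} T}{C_L}}\left(\frac{(1+\gamma)\, n W C_{ox} v_{sat}}{C_L}\right) - e^{-\frac{(1+\gamma)(n-1) W C_{ox} v_{sat} T}{C_L}}\left(\frac{(1+\gamma)(n-1) W C_{ox} v_{sat}}{C_L}\right)\right], \tag{44}$$
respectively.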
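To see how the first-order result of Eq. (43) compares with a direct re-evaluation of ΔV, the Python sketch below computes both for a small width change. All parameter values (W, Cox·vsat, CL, T, γ) are hypothetical and serve only to exercise the formulas.

```python
from math import exp


def delta_v(w, t, n, v_dd, v_thn0, gamma, cox_vsat, c_load):
    """Delta V = V_n - V_(n-1), as in Eq. (40) with no threshold shift."""
    a = (1.0 + gamma) * cox_vsat / c_load
    amp = (v_dd - v_thn0) / (1.0 + gamma)
    return amp * (exp(-a * (n - 1) * w * t) - exp(-a * n * w * t))


def variation_w_first_order(dw, w, t, n, v_dd, v_thn0, gamma, cox_vsat, c_load):
    """Eq. (43): first-order change of Delta V for a width change dw."""
    a = (1.0 + gamma) * cox_vsat / c_load
    amp = (v_dd - v_thn0) / (1.0 + gamma)
    return dw * amp * (exp(-a * n * w * t) * a * n * t
                       - exp(-a * (n - 1) * w * t) * a * (n - 1) * t)


# Hypothetical parameters for illustration only (SI units).
P = dict(t=10e-12, n=4, v_dd=1.0, v_thn0=0.2, gamma=0.3,
         cox_vsat=1e3, c_load=10e-15)
W_NOM = 100e-9
DW = 0.01 * W_NOM  # a 1 % width variation

exact = delta_v(W_NOM + DW, **P) - delta_v(W_NOM, **P)
approx = variation_w_first_order(DW, W_NOM, **P)
print(f"exact = {exact:.4e} V, first-order = {approx:.4e} V")
```

The same comparison can be repeated for T by exchanging the roles of W and T, since the two parameters enter ΔV only through their product.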
Assuming that the variations in Vthn, W, and T are uncorrelated,
the total variation of ΔV, Δ(ΔV), can be expressed
as the sum of the products of each of the sensitivities of ΔV by
the corresponding change [30]. Thus, we get
$$\Delta(\Delta V) = \Delta V_{thn}\left(\frac{\partial \Delta V}{\partial V_{thn}}\right) + \Delta W\left(\frac{\partial \Delta V}{\partial W}\right) + \Delta T\left(\frac{\partial \Delta V}{\partial T}\right). \tag{45}$$
Figure 20 The absolute value of the variation of the voltage
difference versus the percentage variation in Vthn (absolute
variation in volts).
Refer to Figs. 20–22 for the absolute variations of ΔV versus
the percentage variations in Vthn, W, and T, respectively.
Figs. 21 and 22 show the plots of the absolute variations of
ΔV due to those in W and T evaluated approximately and exactly
(without the approximation of Eq. (42)). It is obvious from
these figures that the absolute variations in ΔV corresponding
to a 100% variation in Vthn, W, and T are 8.8 mV, 6.8 mV,
and 6.8 mV, respectively. The total absolute variation of ΔV
is only 7.2 mV for a 20% variation in each of Vthn, W, and T.
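The sensitivity sum of Eq. (45) can also be sketched numerically. In the Python example below the partial derivatives are estimated by central differences rather than by the closed forms of Eqs. (41), (43), and (44); that substitution, the helper names, and all parameter values are assumptions made for brevity, so the result is not expected to match the millivolt figures quoted above.

```python
from math import exp


def delta_v(v_thn0, w, t, n=4, v_dd=1.0, gamma=0.3, cox_vsat=1e3, c_load=10e-15):
    """Delta V = V_n - V_(n-1) in the form of Eq. (40); defaults are hypothetical."""
    a = (1.0 + gamma) * cox_vsat / c_load
    amp = (v_dd - v_thn0) / (1.0 + gamma)
    return amp * (exp(-a * (n - 1) * w * t) - exp(-a * n * w * t))


def sensitivity(f, x, i, rel_step=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. x[i]."""
    h = abs(x[i]) * rel_step
    hi, lo = list(x), list(x)
    hi[i] += h
    lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)


def total_variation(f, x, dx):
    """Eq. (45): sum over the parameters of (change) times (sensitivity)."""
    return sum(dx[i] * sensitivity(f, x, i) for i in range(len(x)))


# Hypothetical nominal point (V_thn0, W, T) and a 1 % change in each.
X_NOM = (0.2, 100e-9, 10e-12)
DX = tuple(0.01 * v for v in X_NOM)

first_order = total_variation(delta_v, X_NOM, DX)
exact = delta_v(*(v + d for v, d in zip(X_NOM, DX))) - delta_v(*X_NOM)
print(f"first-order = {first_order:.4e} V, exact = {exact:.4e} V")
```

For small changes the two numbers agree closely; the gap widens as the percentage variation grows, which is the behaviour the exact and approximate curves of Figs. 21 and 22 are meant to show.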
7. Simulation results
The proposed scheme will be verified in this section by
simulation using the 45 nm CMOS technology with VDD = 1 V for
four and eight inputs. The aspect ratios of all the transistors
are taken equal to 2. The 50% point convention will be
adopted in evaluating the propagation delays. According to
the simulation, the time required to discharge CL from Vn to
0 V in case of four inputs is 20.8 ps. This represents only
3.8% of the worst-case propagation delay and thus can safely
be neglected as stated in Section 4. Refer to Fig. 23 for the plot
of the worst-case voltage difference versus time. According to
this figure, the maximum value of this difference is 35.5 mV
and occurs at a time of 7.8 ps. Figs. 24 and 25 show the
simulation results of the conventional and proposed schemes for
the two cases of logic "1" and logic "0" at the output for four
inputs using two cascaded inverters. The threshold voltage of
the first inverter is adjusted to lie between the values of Vn
and Vn−1. The threshold voltages of the NMOS transistors
of the two cascaded inverters of the added buffer are set
to 50 mV in order to speed up the emergence of the output in
case of logic "1" at the output. For eight inputs, the proposed
alternative can be achieved using two copies of the circuits
used with the case of four inputs and a two-input OR gate
as shown in Fig. 1 (c). The two-input OR gate can in turn
be achieved using a two-input static-complementary CMOS
NOR gate and an inverter. Figs. 26 and 27 are the counterparts
to Figs. 24 and 25 for eight inputs.
Figure 21 The absolute value of the variation of the voltage difference versus the percentage variation in W (evaluated exactly and using the approximate analysis, in volts).
Figure 22 The absolute value of the variation of the voltage difference versus the percentage variation in T (evaluated exactly and using the approximate analysis, in volts).
Figure 23 The simulation results illustrating the worst-case voltage difference across CL versus time.
Figure 24 Simulation results showing the output voltage versus time for the conventional and proposed schemes in case of logic "1" at
the output for four inputs.
Figure 25 Simulation results showing the output voltage versus time for the conventional and proposed schemes in case of logic "0" at
the output for four inputs.
Figure 26 Simulation results showing the output voltage versus time for the conventional and proposed schemes in case of logic "1" at
the output for eight inputs.
Refer to Table 1 for the high-to-low and the low-to-high
propagation delays of the conventional and proposed schemes,
tPHLc, tPHLp, tPLHc, and tPLHp, along with their averages, tavgc
and tavgp, and the percentage reduction for these two cases:
four inputs and eight inputs.
Table 1 The simulation results of the conventional and proposed schemes (all delays in ps) in case of four and eight inputs.
Quantity                                    Four inputs   Eight inputs
tPHLc                                       700.86        2900
tPLHc                                       136.5         227.2
tavgc                                       418.68        1563.6
tPHLp                                       158           1156
tPLHp                                       547.16        55.5
tavgp                                       352.58        605.75
% reduction in average propagation delay    15.8%         61.25%
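The averages and percentage reductions in Table 1 follow directly from the four measured delays in each column. The short Python check below recomputes them from the tabulated values; the dictionary layout and function names are only illustrative.

```python
def average_delay(t_phl, t_plh):
    """Average of the high-to-low and low-to-high propagation delays."""
    return (t_phl + t_plh) / 2


def percent_reduction(t_conventional, t_proposed):
    """Percentage reduction of the proposed scheme relative to the conventional one."""
    return 100 * (t_conventional - t_proposed) / t_conventional


# Delays in ps, copied from Table 1.
TABLE_1 = {
    "four":  {"tPHLc": 700.86, "tPLHc": 136.5, "tPHLp": 158.0, "tPLHp": 547.16},
    "eight": {"tPHLc": 2900.0, "tPLHc": 227.2, "tPHLp": 1156.0, "tPLHp": 55.5},
}

results = {}
for case, d in TABLE_1.items():
    t_avg_c = average_delay(d["tPHLc"], d["tPLHc"])
    t_avg_p = average_delay(d["tPHLp"], d["tPLHp"])
    results[case] = (t_avg_c, t_avg_p, percent_reduction(t_avg_c, t_avg_p))
    print(case, round(t_avg_c, 2), round(t_avg_p, 2), round(results[case][2], 2))
```

The recomputed reductions agree with the tabulated 15.8% and 61.25% to within rounding.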
Figure 27 Simulation results showing the output voltage versus time for the conventional and proposed schemes in case of logic ‘‘0” at
the output for eight inputs.
8. Conclusions
In this paper, a novel alternative to CMOS stacks was pre-
sented. This alternative depends on applying the inputs with
a certain pulse width. The percentage reduction in the average
propagation delay was found to be 15.8% and 61.25% for the
cases of four and eight inputs, respectively, for the 45 nm
CMOS technology. It could be concluded from the rough esti-
mation of the area that the proposed scheme has an area
advantage over the conventional stack when the number of
inputs exceeds 3. The proposed scheme has a smaller propagation
delay for all numbers of inputs and a larger power consumption
for practical numbers of inputs due to the static-dc
and short-circuit power consumption. However, the power-delay
product of the proposed scheme is smaller than that of
the conventional one when the number of inputs exceeds 2.
The second-order effects were taken into account and it was
shown that these effects do not affect the proposed scheme
significantly.
References
[1] J.E. Ayers, Digital Integrated Circuits: Analysis and Design,
CRC Press, Boca Raton, USA, 2005.
[2] A.S. Sedra, K.C. Smith, Microelectronic Circuits, seventh ed.,
Oxford University Press, New York, 2015.
[3] S.C. Prasad, K. Roy, Transistor reordering for power
minimization under delay constraint, ACM Trans. Design
Autom. Electr. Syst. 1 (2) (1996) 280–300.
[4] E. Musoll, J. Cortadella, Optimizing CMOS circuits for low
power using transistor reordering, in: Proceedings of European
Design and Test Conference, Paris, 1996, pp. 219–223.
[5] L. Ding, P. Mazumder, On optimal tapering of FET chains in
high-speed CMOS circuits, IEEE Trans. Circuit Syst. 48 (12)
(2001).
[6] M. Shoji, N.J. Warren, Apparatus for increasing the speed of a
circuit having a string of IGFETs, U.S. Patent: 4430583, Feb. 7,
1984.
[7] B.S. Cherkauer, E.G. Friedman, The effects of channel width
tapering on the power dissipation of serially connected
MOSFETs, in: IEEE International Symposium on Circuits
and Systems, 3–6 May 1993, Chicago, IL, vol. 3, 1993, pp. 2110–
2113.
[8] R.H. Krambeck, C.M. Lee, H.F.S. Law, High-speed compact
circuits with CMOS, IEEE J. Solid-State Circuits SC-17 (June)
(1982) 614–619.
[9] S. Choudhary, S. Qureshi, Power aware channel width tapering
of serially connected MOSFETs, in: International Conference
on Microelectronics, 29–31 Dec 2007, Cairo, 2007, pp. 399–402.
[10] B.S. Cherkauer, E.G. Friedman, Channel width tapering of
serially connected MOSFET’s with emphasis on power
dissipation, IEEE Trans. Very Large Scale Integr. VLSI Syst.
2 (1) (1994) 100–114.
[11] J. Yuan, C. Svensson, Principle of CMOS circuit power-delay
optimization with transistor sizing, in: IEEE International
Symposium on Circuits and Systems, 12–15 May 1996,
Atlanta, GA, vol. 1, 1996, pp. 637–640.
[12] C. Tan, J. Allen, Minimization of power in VLSI circuits using
transistor sizing, input ordering, and statistical power
estimation, in: Proceedings of International Workshop Low-
Power Design, 1994, pp. 75–80.
[13] X. Kavousianos, D. Nikolos, Novel single and double output
TSC Berger code checkers, in: 16th Proceedings of VLSI Test
Symposium, 26–30 Apr 1998, Monterey, CA, 1998, pp. 348–353.
[14] C. Metra, M. Favalli, B. Ricco, Tree checkers for applications
with low power-delay requirements, in: Proceedings of
International Symposium on Defect and Fault Tolerance VLSI
Systems, 1996, Boston, MA, 1996, pp. 213–220.
[15] W. Kuzmicz, Leakage physics and modeling – exercises,
available from the IDESA project web site <http://www.idesa-
training.org/Docs/Leakage_exercises_final.pdf>.
[16] M.V. Dunga, X. Xi, J. He, W. Liu, K.M. Cao, X. Jin, J.J. Ou,
M. Chan, A.M. Niknejad, C. Hu, BSIM4.6.0 MOSFET Model:
User’s Manual, University of California, Berkeley, 1986.
[17] D.A. Neamen, Semiconductor Physics and Devices: Basic
Principles, fourth ed., McGraw-Hill, 2012.
[18] N.H.E. Weste, D.M. Harris, CMOS VLSI Design: A Circuits
and Systems Perspective, fourth ed., Addison-Wesley,
Massachusetts, USA, 2011.
[19] K.-Y. Toh, P.-K. Ko, R.G. Meyer, An engineering model for
short-channel MOS devices, IEEE J. Solid-State Circuits 23 (4)
(1988) 950–958.
[20] A. Hamoui, Current, delay, and power analysis of submicron
CMOS circuits, Master Thesis, McGill University, Montréal,
1998.
[21] S.M. Sharroush, Design techniques for high performance MOS
digital integrated circuits, Doctor of Philosophy Thesis, Port
Said University, Port Said, Egypt, 2011.
[22] A.S. Sedra, K.C. Smith, Microelectronic Circuits, fourth ed.,
Oxford University Press, New York, 1998.
[23] V. Kursun, E.G. Friedman, Multi-Voltage CMOS Circuit
Design, John Wiley & Sons Ltd, Britain, 2006.
[24] W. Chase, F. Bown, General Statistics, fourth ed., Wiley, 1999.
[25] J.P. Uyemura, CMOS Logic Circuit Design, Kluwer Academic
Publishers, New York, 2002.
[26] Y.S. Abdalla, Design of high speed MUX/DMUX using a new
all-time-on single-ended CMOS logic, Doctor of Philosophy
Thesis, Waterloo, Ontario, Canada, 2006.
[27] Y. Tsividis, Operation and Modeling of the MOS Transistor,
second ed., McGraw-Hill, Boston, 1999.
[28] Y. Cheng, C. Hu, MOSFET Modeling & BSIM3 User’s Guide,
Kluwer Academic Publishers, Boston, 1999.
[29] R.S. Muller, T.I. Kamins, Device Electronics for Integrated
Circuits, second ed., John Wiley, New York, 1986.
[30] J.S. Rad, Design and analysis of robust variability-aware
SRAM to predict optimum access-time to achieve yield
enhancement in future nano-scaled CMOS, Doctor of
Philosophy Thesis, University of California, Santa Cruz, USA,
2012.
