Abstract - Physical delay models based entirely upon device equations are presented for small-geometry CMOS inverters with RC tree interconnection networks. Through extensive comparisons with SPICE simulation results, it is shown that the maximum relative error in delay-time calculations using the developed model is within 15% for 1.5-μm CMOS inverters with RC tree interconnection networks. Moreover, the model is applicable over a wide range of circuit and device parameters. Based upon the developed models and a mathematical optimization method, an experimental sizing program is constructed for speed improvement of interconnection lines and trees. In this program, given the size of the input logic gate and the resistances, capacitances, and structures of the interconnections it drives, users can choose one of four speed-improvement techniques and determine the suitable sizes and/or number of drivers/repeaters for a minimum delay. The four speed-improvement techniques use minimum-size repeaters, optimal-size repeaters, cascaded input drivers, and optimal-size repeaters with cascaded input drivers to reduce the interconnection delay. It is found from the sizing results of the experimental program that the required tapering factor in cascaded drivers is not e (the base of the natural logarithm) but a value in the range of 4-8. Moreover, adding a small number of large drivers/repeaters is more efficient in reducing the interconnection delay. It is also shown that the technique of optimal-size repeaters with cascaded input drivers can lead to the lowest delay.
I. INTRODUCTION
It is known that interconnection delay is a critical factor in speed improvement of CMOS VLSI/ULSI [1], [2]. To deal with this problem, efficient and analytical delay models of complicated interconnection nets among logic gates have to be developed first in order to perform the delay analysis. Then, with the aid of the delay models, optimal drivers should be designed to drive interconnection nets and minimize the delay. Evidently, the timing models must be accurate enough in calculating the interconnection delay as well as the driver delay. Otherwise, the driver scheme cannot be correctly designed and sized.
Manuscript received January 4, 1990; revised April 3, 1990.
The authors are with the Institute of Electronics and Department of Electronics Engineering, National Chiao-Tung University, Hsinchu, Taiwan 30039, Republic of China.
IEEE Log Number 9037787.
So far, many interconnection delay models [3]-[12] have been proposed. Among them, modeling logic gates and interconnection nets separately [3]-[9] or modeling a logic gate by a single linear RC circuit [10]-[12] may lead to a significant error in high-performance design [13]-[15]. Recently, a physical delay model for simple interconnection lines among small-geometry CMOS inverters has been proposed [13], [14], where analytical delay equations were derived together with the input waveshape effects [16], [17], and both the effect of a logic gate on the interconnection delay and the effect of interconnection on the gate delay were considered. It has been shown that the proposed model has good accuracy and a wide applicable range of circuit and device parameters.
It is the aim of this paper to generalize the modeling techniques [13] , [14] for the characterization of the delay of interconnection tree networks among small-geometry CMOS inverters. Through extensive comparisons with SPICE simulation results, the maximum relative error of the developed model is found to be below 15% for the delay times of CMOS inverters with different RC values in each branch of tree interconnection networks, different gate sizes, device parameters, and even input excitation waveforms. Moreover, the delay at any output node can also be accurately predicted.
A long interconnection line or tree between two logic gates degrades the total delay significantly. To improve the speed, a suitable number of drivers and/or repeaters with suitable sizes has to be added to drive the interconnection nets [2], [18]. There are four speed-improvement techniques, which use minimum-size repeaters, optimal-size repeaters, cascaded input drivers, and optimal-size repeaters with cascaded input drivers [2]. In these techniques, determining the suitable number and sizes of drivers and/or repeaters is a very important issue in design and optimization.
Based upon the developed interconnection line and tree delay models, the four speed-improvement methods [2], and the mathematical optimization technique of the Broyden-Fletcher-Shanno (BFS) method [19], an experimental CAD program is constructed to determine suitable sizes and/or numbers of drivers/repeaters for interconnection lines and trees. It is shown that the driver and/or repeater schemes can be more accurately designed in this program to achieve a higher speed than those designed by the other approach [2].
0018-9200/90/1000-1247$01.00 © 1990 IEEE
In Section II, a model formulation of interconnection trees is presented. Verifications of the developed model are then presented in Section III. The application of the developed delay models in speed improvement is described in Section IV. Finally, the discussion and conclusions are given in Section V.
II. DELAY MODEL OF INTERCONNECTION TREES

A. Waveform Generation, Timing Definition, and MOSFET Region Location
Consider a string of identical 1.5-μm CMOS inverters with RC tree interconnection networks as shown in Fig. 1. To accurately simulate the behavior of the RC tree interconnection networks under various operating conditions, each branch in the tree networks is equivalently represented by a three-step π ladder circuit [11], as will be shown later. If the fall time of the output voltage Voj, where 1 ≤ j ≤ 10, is to be characterized, its simulated characteristic waveforms [13], [14], [16] are as illustrated in Fig. 2. At any output node j, the falling waveform of the node voltage Voj has an initial delay time tdfj, fall time TFj, and fall delay time TPHLj, as indicated in Fig. 2 for j = 1 and 10.
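The three-step π ladder representation of a branch can be sketched as follows. This is only an illustration under assumed conventions: the branch resistance R and capacitance C are split uniformly into three π sections (series R/3, with C/6 at each end of a section), and the Elmore delay is used as a simple figure of merit, which is not the physical delay model developed in this paper. The function names and the numerical values are hypothetical.

```python
def pi_ladder(r_total, c_total, steps=3):
    """Return (series resistors, node capacitors) of a uniform pi ladder.

    Each pi section carries r_total/steps in series and places half of its
    capacitance at each end, so interior nodes collect two halves.
    """
    r_seg = r_total / steps
    c_half = c_total / (2 * steps)
    resistors = [r_seg] * steps
    caps = [c_half] + [2 * c_half] * (steps - 1) + [c_half]
    return resistors, caps


def elmore_delay(resistors, caps):
    """Elmore delay from the driven end (node 0) to the far end of the ladder."""
    delay = 0.0
    r_upstream = 0.0
    for r, c in zip(resistors, caps[1:]):
        r_upstream += r            # resistance between the source and this node
        delay += r_upstream * c    # this node's contribution to the far-end delay
    return delay


R, C = 2.0e3, 1.0e-12              # hypothetical branch: 2 kOhm, 1 pF
res, caps = pi_ladder(R, C)
t_ladder = elmore_delay(res, caps)
print(res, caps, t_ladder, 0.5 * R * C)
```

With this splitting, the Elmore delay of the ladder agrees with the value RC/2 of a uniformly distributed RC line up to rounding, which is one way to see why a small number of π sections can stand in for a distributed branch.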
During the fall-time period TFj, the operating regions of the MOSFET's Mn and Mp and those in the load stage are first determined from their drain-source voltages VDS and drain-source saturation voltages VDSAT.
According to the MOSFET operating regions, the falling waveform of 
B. Large-Signal Equivalent Circuit Generation and Current/Capacitance Linearization
The overall large-signal equivalent circuit during TFj is given in Fig. 3, where each branch in the tree interconnection networks is represented by a three-step π ladder circuit to simplify the overall equivalent circuit while retaining the delay accuracy, as will be verified later. In 
c, = C a n + C m p C t , = C / 6 + C / 6 + C 3 / 6 c, = C g d n t C q d p t Cg.,+ c,, 
C. Effective Dominant Pole Calculation
In Fig. 3 the feedthrough current from the input node v to the output node Voj is negligibly small so that its effect can be neglected [13], [14], [21], [22]. Then, from the S-domain nodal equations of the circuit in Fig. 3 with the input node short-circuited, the voltage Voj(S) at output 
E. Rise/Fall Time and Delay Time Formulation
The characteristic fall time TFj of the output voltage Voj(t) can be calculated from (3) and written as
In this expression, the second term in the square root represents the effective dominant zero contingent upon the output node position. The effective dominant zero is ing the effective dominant zero will lead to a significant (tfs,) and v ( t f l e ) , single-pole-response approximation is used and v(t) can be expressed as
where

$$P_{rk} = \frac{\ln(9.0)}{T_{Rk}}.$$
In the above equations, Prk and TRk represent the characteristic rise pole and rise time, respectively, at node k. Substituting the expressions of tfse and tfle into (7), the equations of V(tfse) and V(tfle) in Table I can be derived. In these two equations, an empirical and universal constant is given to the pole-delay product Prktdfj, which has been proven to be a nearly constant physical parameter. Through V(tfse) and V(tfle), TFj becomes a function of TRk and vice versa. Simple iterations are required to solve TFj and TRk. Usually the resulting iteration number is less than 5. According to the delay definition in Fig. 2, the rise propagation delay TPLHj and fall propagation delay TPHLj
where Pfsj and Pflj represent the effective dominant pole in Regions I and II, respectively, at output node j and tdfj is the initial delay at output node j.
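The relation between the characteristic rise pole Prk and the rise time TRk can be checked numerically. The sketch below assumes a normalized single-pole response 1 - exp(-Pt) and a 10%-90% rise-time definition, which is what the factor ln(9.0) implies; the function name and the pole value are hypothetical.

```python
import math


def rise_time_10_90(pole):
    """10%-90% rise time of the normalized single-pole response 1 - exp(-pole*t)."""
    t10 = -math.log(1.0 - 0.1) / pole   # time at which the response reaches 10%
    t90 = -math.log(1.0 - 0.9) / pole   # time at which the response reaches 90%
    return t90 - t10


pole = 2.0e9                            # hypothetical pole, in 1/s
tr = rise_time_10_90(pole)
print(tr, math.log(9.0) / pole)         # the two values agree up to rounding
```

Inverting the same relation gives the pole from a known rise time, which is how a rise time computed at one node can enter the single-pole approximation of the input waveform of the next stage.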
From the waveform function Voj(t), the equations of tfsi, tfse, and tfle as listed in Table I can be derived according to their definitions given in Sections II-A and B. Substituting the expression of tfsi into (3), all the
between the input node k and any output node j can be expressed as
where TROj (TROk) stands for the time interval during which Voj(t) (Vok(t)) rises from 0 to 0.5VDD at node j (k), and TFOj (TFOk) for the time interval during which Voj(t) (Vok(t)) falls from VDD to 0.5VDD.
For simplicity, empirical laws for the initial delay times tdrj and tdfj were found [13], [14], [21]-[24]. As a result, the rise propagation delay TPLHj and fall propagation delay TPHLj at any output node j can be reformulated by the simple relations
Note that the above equations are universal and can be used to calculate the delay times under various conditions with a satisfactory accuracy, as will be verified in the next section.
III. COMPARISONS WITH SPICE SIMULATIONS
A. CMOS Inverters with RC Tree Interconnections
To verify the accuracy of the developed analytical delay models, extensive comparisons between theoretical calculations and SPICE simulations were made for 1.5-μm CMOS inverters with different RC values in each branch of the tree networks, different gate sizes, device parameters, and even input excitations. Part of the comparisons is shown in Fig. 4 for the rise/fall delay times at the first output node of CMOS inverters with smaller RC values in each branch of the RC tree interconnection networks and reduced threshold voltages VTOp and VTOn. It is shown that the maximum relative error in the rise/fall delay times at any output node is 15%. Since the delay times are expressed by equations in the developed model, the CPU time consumed in the delay calculation is about two orders of magnitude smaller than that of a point-by-point full transient analysis such as SPICE.
As mentioned in the previous section, the developed model equations contain the constant product of the input pole and the initial delay. Moreover, the output fall (rise) time is a function of the input rise (fall) time. Through these relations, the input waveform effect has been implicitly incorporated into the model. Thus the model can be applied to noncharacteristic waveforms that do not deviate much from the characteristic waveforms. The general relative errors for the delay times are still below 15%. The ability to calculate the noncharacteristic waveform timing makes the developed models more practical and versatile. Although the above comparisons are all based on demonstration networks such as that in Fig. 1, the developed model can also deal with RC tree interconnection networks with different RC values in every branch of the tree networks and retain the same error characteristic.
B. CMOS Inverters with RC Line Interconnections
The developed delay model for interconnection tree networks can also be applied to calculate the signal timing of CMOS inverters with RC line interconnections and the same relative error characteristic can be retained. In this sense, the developed timing models are quite general.
IV. SPEED-IMPROVEMENT TECHNIQUES
There are many optimization methods for solving the unconstrained minimum-delay problem [19], [25], [26]. Among them, the Broyden-Fletcher-Shanno (BFS) method is an optimization method with a quadratic convergence rate, so the CPU time consumption can be reduced. This method uses only the function values and gradient vectors in generating mutually conjugate search directions. From the computational point of view, it is very suitable for the sizing of CMOS inverters with RC-tree interconnection networks.
Based upon the developed interconnection delay models and the Broyden-Fletcher-Shanno (BFS) method [19], an experimental program is constructed for speed improvement of the interconnection lines and trees. In this program, given the logic-gate sizes and the resistances, capacitances, and structures of the interconnections they drive, users can choose one of four speed-improvement techniques [2]. The program can calculate the number and/or sizes of drivers/repeaters for a minimum delay. The four improvement techniques use minimum-size repeaters, optimal-size repeaters, cascaded input drivers, and optimal-size repeaters with cascaded input drivers. It is shown that the program can accurately design the number and/or sizes of drivers/repeaters in each scheme. Also, the results are more accurate than those given in the previous literature [2].
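How a quasi-Newton search can size repeaters may be sketched as follows. The method called BFS here is more commonly referred to as BFGS (Broyden-Fletcher-Goldfarb-Shanno). The sketch is not the experimental program: as a stand-in for the physical delay model, it assumes a generic first-order RC delay expression with coefficients 0.7 and 0.4, treats the number of sections k and the size factor h as continuous, and searches over ln k and ln h to keep both positive. All names and element values are hypothetical.

```python
import math

# Hypothetical values in kOhm and pF, so that products are in ns.
R0, C0 = 10.0, 0.005        # minimum-size inverter: output resistance, input capacitance
RINT, CINT = 200.0, 100.0   # interconnection line resistance and capacitance

# Assumed delay k*[0.7*(R0/h)*(CINT/k + h*C0) + (RINT/k)*(0.4*CINT/k + 0.7*h*C0)],
# expanded into A_K*k + B_K/k + C_H/h + D_H*h.
A_K, B_K = 0.7 * R0 * C0, 0.4 * RINT * CINT
C_H, D_H = 0.7 * R0 * CINT, 0.7 * RINT * C0


def delay(x):
    """Assumed first-order delay in ns, with x = (ln k, ln h)."""
    u, v = x
    try:
        return (A_K * math.exp(u) + B_K * math.exp(-u)
                + C_H * math.exp(-v) + D_H * math.exp(v))
    except OverflowError:
        return float("inf")  # lets the line search reject an overly long step


def grad(x):
    u, v = x
    return [A_K * math.exp(u) - B_K * math.exp(-u),
            D_H * math.exp(v) - C_H * math.exp(-v)]


def bfgs(f, g_fun, x0, tol=1e-6, max_iter=200):
    """Minimal BFGS with a backtracking (Armijo) line search."""
    n = len(x0)
    x, g = list(x0), g_fun(x0)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # inverse-Hessian estimate
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))
        fx, t = f(x), 1.0
        while t > 1e-16 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = g_fun(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # update only when the curvature condition holds
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((1.0 + yHy / sy) * s[i] * s[j]
                                - Hy[i] * s[j] - s[i] * Hy[j]) / sy
        x, g = x_new, g_new
    return x


u_opt, v_opt = bfgs(delay, grad, [0.0, 0.0])
k_opt, h_opt = math.exp(u_opt), math.exp(v_opt)
print(k_opt, h_opt, delay([u_opt, v_opt]))
```

For this separable stand-in expression the optimum is also available in closed form, k = sqrt(B_K/A_K) and h = sqrt(C_H/D_H), which gives a convenient check on the search. The physical delay model of this paper has no such closed form, which is why a numerical method is needed there.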
A. Speed-Improvement Techniques in RC Line Interconnections
1) Minimum-Size Repeaters:
When the resistance of interconnection networks is comparable to or larger than the output resistance of the active driver, the propagation delay increases as the square of the interconnection length because both capacitance and resistance increase linearly with the interconnection length [2]. The use of k minimum-size inverters as repeaters makes the time delay linear with length by dividing the interconnection line into k smaller subsections, as shown in Fig. 5(b). Running the developed program, the propagation delay of an interconnection line with k minimum-size inverters as repeaters can be optimized. Table II shows the calculated and the simulated total pair delay of the original interconnection line (Fig. 5(a)) and the interconnection line with k minimum-size repeaters, with the calculated k listed in Table III. It is shown that the speed improvement in total pair delay is significant only when RintCint of the interconnection line is large. When RintCint is small, no minimum-size repeater is added (k = 1) and the total delay remains unchanged. This means that using minimum-size repeaters can improve the speed of a very long interconnection line but not a medium or short one.
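The change from quadratic to linear growth can be illustrated with a small sketch. It assumes a generic first-order RC delay expression with coefficients 0.7 and 0.4 rather than the physical model of this paper, and the inverter values, the per-unit-length line values, and the function names are all hypothetical.

```python
# Hypothetical values in kOhm and pF, so that products are in ns.
R0, C0 = 10.0, 0.005          # minimum-size inverter: output resistance, input capacitance
R_UNIT, C_UNIT = 0.2, 0.1     # line resistance and capacitance per unit length


def line_delay(length, k):
    """Assumed first-order delay of a line cut into k sections,
    each section driven by a minimum-size inverter (k = 1: no repeater added)."""
    rint, cint = R_UNIT * length, C_UNIT * length
    return k * (0.7 * R0 * (cint / k + C0)
                + (rint / k) * (0.4 * cint / k + 0.7 * C0))


def best_k(length, k_max=2000):
    """Integer number of sections that minimizes the assumed delay."""
    return min(range(1, k_max + 1), key=lambda k: line_delay(length, k))


for length in (1.0, 1000.0, 2000.0):
    k = best_k(length)
    print(length, k, line_delay(length, 1), line_delay(length, k))
```

With these assumed values the short line keeps k = 1, in line with the observation that repeaters do not help when RintCint is small. Doubling the long line nearly quadruples the unrepeated delay but only about doubles the delay with the best k.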
2) Optimal-Size Repeaters: Total pair delay can be further improved by increasing the size of repeaters because the driving capability of the repeaters is directly proportional to the size of the MOSFET's. Running the developed program, the minimum total pair delay of an RC interconnection line driven by k - 1 equally spaced optimal-size inverters as repeaters (Fig. 5(c)) with size factor h can be achieved. The calculated and the simulated total pair delay times are listed in Table II whereas the calculated values of k and h are listed in Table III. Note that the first stage of this scheme must be a minimum-size inverter so that its delay can be compared to other schemes which have the same minimum-size inverter as the first stage.
As may be seen from Table II, adding optimal-size repeaters is very efficient in reducing the overall delay of an interconnection line when RintCint is large. The reduction percentage can be as high as 98% for a long interconnection line with Rint = 200 kΩ and Cint = 100 pF.
As a comparison, the number of drivers/repeaters and their sizes in both cases of minimum-size repeaters and optimal-size repeaters are also calculated by using the formulas given in [2]. Then the total delay times of the overall circuit are obtained from SPICE simulations. Table II shows the SPICE simulated total pair delay times whereas Table III gives the corresponding number of drivers/repeaters and their sizes. It can be seen that the SPICE simulated delay times in both schemes designed through the use of the program are smaller than those designed from the formulas in [2]. Moreover, the required number of minimum-size and optimal-size repeaters in our work is less than that in [2]. For example, according to the design program, 97 minimum-size repeaters are required for an interconnection line with Rint = 200 kΩ and Cint = 100 pF and the resultant SPICE simulated delay is 3570 ns. According to the result in [2], about five times as many repeaters (500 repeaters) are required but the resultant delay is 24% higher. For optimal-size repeaters, the number of repeaters calculated by using the design program (239) is also smaller than that (515) in [2] by 53%, whereas the delay can be reduced by 30% through the suitably optimized repeater sizes. Note that the repeater size should be properly increased with the increase of RintCint in order to achieve a better speed performance.
3) Cascaded Input Drivers:
It is shown that a chain of n drivers that increase in size by a tapering factor f can be used to drive the RC loads [2], [27] or interconnection lines as shown in Fig. 5(d). Running the developed program, the optimal total pair delay of an interconnection line driven by such cascaded and tapered drivers can be calculated. Table II shows the calculated and the simulated results; the required values for f and n are listed in Table III. The program shows that the suitable tapering factor is approximately 4-8 instead of e (the base of the natural logarithm) as in the conventional tapered buffer [2]. Furthermore, because the number of inverters n must be an odd integer [16], all values of n are equal to 3, which is less than those in [2].
It is shown in [16] that the suitable tapering factor of CMOS inverters with pure capacitive loads is in the range 3-5. To drive RC loads rather than pure capacitive loads, however, a larger tapering factor has to be used because it leads to a larger transistor size and thus a smaller ON resistance, which can drive RC loads efficiently to achieve a minimum delay. So the required tapering factor is in the range 4-8. Table II shows the SPICE simulated total pair delay times in [2] whereas Table III gives the corresponding number of cascaded drivers and their sizes. It can be seen that cascaded input drivers with a larger f and a smaller n (in our work) are more efficient than those with a smaller f and a larger n (in [2]) in reducing the delay time when RintCint is not large. For example, when an interconnection line has Rint = 2.00 kΩ and Cint = 1.00 pF, the delay time of five cascaded drivers with f = e is about 18% higher than that of three cascaded drivers with f = 6.38.
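The origin of the conventional value e, and the way it shifts once device parasitics are included, can be sketched with the textbook idealized model of a tapered chain driving a pure capacitive load. In that model, which is an assumption here and not the physical model of this paper, each stage contributes a delay proportional to f + γ, where γ is the assumed ratio of a stage's own output capacitance to its input capacitance, and the number of stages is ln(CL/Cin)/ln f. The total delay is then proportional to (f + γ)/ln f, and its minimum satisfies f(ln f - 1) = γ. The function name is hypothetical.

```python
import math


def optimal_taper(gamma, lo=1.5, hi=50.0):
    """Solve f*(ln f - 1) = gamma by bisection.

    The root is the stationary point of (f + gamma)/ln f, the normalized
    delay of an idealized tapered chain with self-loading ratio gamma.
    """
    def residual(f):
        return f * (math.log(f) - 1.0) - gamma

    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid        # root lies below mid (residual increases with f)
        else:
            lo = mid
    return 0.5 * (lo + hi)


for gamma in (0.0, 0.5, 1.0, 2.0):
    print(gamma, optimal_taper(gamma))
```

Without self-loading (γ = 0) the root is e, and it grows as γ increases. This idealized capacitive-load picture does not include the interconnection resistance, so it does not by itself reproduce the range of 4-8 obtained from the program for RC loads.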
As compared to the technique of optimal-size repeaters, cascaded input drivers are less efficient in reducing the interconnection delay when RintCint is large. But cascaded input drivers are still better than minimum-size repeaters when RintCint is not very large.
4) Optimal-Size Repeaters with Cascaded Input Drivers:
As illustrated in Fig. 5(e), optimal-size repeaters with cascaded input drivers combine the structure of the optimal-size repeaters and the cascaded input drivers. Table II shows the calculated and the simulated total pair delay times whereas Table III gives the required values of f, n, k, and h.
As compared to the results in [2], a lower delay time can be achieved by optimal-size repeaters with a much smaller k and a larger h and by cascaded input drivers with the same n and a larger f. For Rint = 200 kΩ and Cint = 100 pF, the delay time obtained in this design is about 30% lower than that in [2]. Among the four techniques, this technique gives the lowest delay time, which is 8% lower than that of optimal-size repeaters, as shown in Table II.
The previous four driving schemes used in an RC interconnection line can also be applied to the RC tree interconnection networks. For simplicity, here we only demonstrate the scheme of optimal-size repeaters with cascaded input drivers. Applying this scheme to an interconnection tree, the design guidelines are as follows:
1) apply the developed delay model to determine the maximum-delay path and its connected branches;
2) to reduce the RC loads due to the RC branches connected to the maximum-delay path, optimal-size repeaters with cascaded input drivers are used to drive these RC branches;
3) to reduce the total pair delay time along the maximum-delay path, all the RC branches along the maximum-delay path except the leftmost one and those not connected to the maximum-delay path are driven by optimal-size repeaters;
4) the leftmost branch along the maximum-delay path is driven by optimal-size repeaters with cascaded input drivers.
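The first guideline, locating the maximum-delay path, can be sketched as follows. The sketch substitutes the Elmore delay for the developed delay model, which is an assumption made only to keep the example short, and it lumps each branch into one resistor and one capacitor. The tree, its element values, and the function names are hypothetical.

```python
# Hypothetical RC tree: node -> (parent, branch resistance in kOhm, node capacitance in pF).
TREE = {
    "in":   (None, 0.0, 0.0),   # driver output (root)
    "n1":   ("in", 1.0, 0.5),
    "out1": ("n1", 2.0, 1.0),
    "out2": ("n1", 0.5, 0.2),
}


def depth(tree, node):
    """Number of branches between the root and the node."""
    parent = tree[node][0]
    return 0 if parent is None else 1 + depth(tree, parent)


def downstream_cap(tree):
    """Total capacitance at and below each node."""
    cap = {node: c for node, (_, _, c) in tree.items()}
    for node in sorted(tree, key=lambda n: depth(tree, n), reverse=True):
        parent = tree[node][0]
        if parent is not None:
            cap[parent] += cap[node]   # deepest nodes are folded into parents first
    return cap


def path_to(tree, node):
    """Nodes from the root to the given node."""
    path = [node]
    while tree[path[-1]][0] is not None:
        path.append(tree[path[-1]][0])
    return path[::-1]


def elmore(tree, node, cap):
    """Elmore delay (ns) from the root to the node."""
    return sum(tree[n][1] * cap[n] for n in path_to(tree, node)[1:])


cap = downstream_cap(TREE)
parents = {p for p, _, _ in TREE.values()}
leaves = [n for n in TREE if n not in parents]
delays = {n: elmore(TREE, n, cap) for n in leaves}
worst = max(delays, key=delays.get)
print(delays, path_to(TREE, worst))
```

The branches hanging off the reported path (here the one leading to out2) are those that guideline 2) would buffer, while the branches on the path itself are handled by guidelines 3) and 4).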
As an example, the delay of an interconnection tree is to be reduced through the above design guidelines. The resultant driver/repeater scheme is shown in Fig. 6, where the number and the sizes of drivers/repeaters are determined from the developed program to reduce the delay. Table IV shows the calculated and the simulated pair delay times before and after applying the speed-improvement techniques. It can be seen that the pair delay time can be improved by more than an order of magnitude.
V. CONCLUSIONS
Physical delay models for 1.5-μm CMOS inverters with RC tree interconnection networks have been successfully developed. The CPU time of delay calculations is over two orders of magnitude shorter than that of SPICE simulations. The maximum relative error of the delay model is only 15%.
Based upon the developed delay models, the four speed-improvement techniques [2], and the BFS [19] optimization method, an experimental program has been constructed to determine the suitable number and the sizes of drivers/repeaters in each technique from the given logic-gate sizes and interconnection structures. It is shown that a tapering factor of 4-8 in cascaded input drivers can obtain a lower delay than a tapering factor of e. Moreover, a small number of drivers/repeaters with large sizes is more efficient in reducing the interconnection delay. Among the four speed-improvement techniques, the technique of optimal-size repeaters with cascaded input drivers can lead to the lowest delay. It is also shown that the developed program can determine the number and the sizes of drivers/repeaters more accurately than the formulas in [2].
In future work, the delay model of CMOS inverters with interconnection trees will be generalized to other CMOS logic gates. The design of driving schemes will also be generalized to those gates and those cases with various constraints other than the minimum delay.
