
# On Dynamic Delay and Repeater Insertion ${ }^{1}$ 

Hannu Tenhunen and Dinesh Pamunuwa<br>Royal Institute of Technology (KTH), IMIT, LECS<br>Electrum 229, SE-164 40 Kista, Sweden<br>dinesh/hannu@ele.kth.se


#### Abstract

In deep sub-micron technologies, as wires are placed ever closer and signal rise and fall times go into the sub-nanosecond region, increased cross talk has implications for data throughput and signal integrity. Depending on the data correlation on the coupled lines, the delay can either decrease or increase. Here we show that in uniform coupled lines, the response for several important switching configurations has a dominant pole characteristic. This allows easy prediction of the average, worst-case and best-case delay of buffered lines. We show that the number and sizing of repeaters can be optimised to deal with cross talk under different constraints to best match the application. Area and power issues are considered and all equations are checked against a dynamic circuit simulator (SPECTRE).


## 1. INTRODUCTION

In future generations of VLSI circuits, as the feature size shrinks to a fraction of a micrometre, the aspect ratio (width/height) of the interconnect is reduced in order to keep the resistance increase to a minimum. This means that the capacitance between wires increases, and cross talk, which couples a noise voltage onto the victim net and affects the delay, poses a serious challenge in designing VLSI systems. Our interest in this paper is in cross-talk-induced delay, specifically in a parallel line configuration where the nets are laid out alongside each other for a relatively long distance, as would occur in an intermediate or global level bus. Recently there has been a profusion of research into block-oriented architectures [1] with different modules communicating via buses, and the parallel net topology in Fig. 1 will occur very often.

Capacitive coupling can either speed up the signal or delay it, depending on the correlation between the data on the different lines. This input-dependent dynamic delay can be captured exactly only by dynamic simulators. However, when the line under consideration is reduced to a uniformly coupled two-aggressor configuration as shown in Fig. 1, certain simplifications are possible which allow delay predictions depending on the switching of the aggressors. Previous works have split the capacitance into ground and coupling components and presented closed-form delay equations for various switching configurations. However, these use a single T or $\Pi$ section, which does not represent a distributed line with reasonable accuracy. We shall show that for uniformly coupled parallel nets, when the aggressors switch simultaneously in a variety of ways, the response of the victim line has a dominant pole characteristic. This allows the delay to be modelled by a single time constant, with a changing coefficient giving measures of average, worst-case and best-case delays with over $90 \%$ accuracy. This analysis is extended to buffered lines, where we give closed-form equations which quantify the effect that repeater sizing and numbering have on the delay for different switching patterns. Finally, we show how these expressions facilitate repeater optimisation under different constraints.

## 2. DELAY MODELLING

From now on, whenever delay is mentioned we mean the $50 \%$ delay, since this corresponds to the delay of the output to the switching threshold of an inverter. Also, in all cases the victim line is assumed to switch from zero to one, without loss of generality. When a line switches up (down) from zero (one), it is assumed to have been at zero (one) for a long time. We consider a line with coupling on two sides as shown in Fig. 1. To build up our delay model for the distributed line, we first analyse the lumped model, which consists of a single section. For simultaneously switching lines, six different switching scenarios can be identified:
(a) Both aggressors switch from one to zero
(b) One switches from one to zero, the other is quiet
(c) Both are quiet
(d) One switches from one to zero, the other switches from zero to one
(e) One switches from zero to one, the other is quiet
(f) Both switch from zero to one

[^0]

Figure 1: Repeater Insertion in a long interconnect

Consider (c) above as the reference delay, where the driver of the victim line charges the entire capacitance. Cases (a) and (b) slow down the victim line, (d) is equivalent to (c), and (e) and (f) speed up the victim. Now given in (1) is the complete response of the victim line where the coefficients $A_{i}$ and $B_{i}$ take the values given in Table 1 depending on how the aggressor lines switch.

Table 1. Coefficients for different switching configurations

| $i$ | Switching <br> configuration | $A_{i}$ | $B_{i}$ | $\lambda_{i}$ | $\mu_{i}$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | (a) | $-4 / 3$ | $1 / 3$ | 1.51 | 2.20 |
| 2 | (b) | $-1$ | 0 | 1.13 | 1.50 |
| 3 | (c) | $-2 / 3$ | $-1 / 3$ | 0.57 | 0.65 |
| 4 | (d) | $-2 / 3$ | $-1 / 3$ | 0.57 | 0.65 |
| 5 | (e) | $-1 / 3$ | $-2 / 3$ | - | - |
| 6 | (f) | 0 | $-1$ | 0 | 0 |
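The $A_{i}$ and $B_{i}$ entries can be cross-checked with a short modal analysis of the lumped section. The sketch below rests on assumptions that the text does not state explicitly: each of the three lines is driven through the same resistance $R$, each has capacitance $C_{s}$ to ground, and each aggressor couples only to the victim through $C_{c}$. The symbols $\Delta_{1}$ and $\Delta_{3}$ are introduced here for the normalised aggressor steps ($-1$, $0$ or $+1$), and the victim step is $+1$.

$$
\begin{aligned}
& R\,\mathbf{C}\,\dot{\mathbf{v}}=\mathbf{u}-\mathbf{v}, \qquad
\mathbf{C}=\begin{pmatrix} C_{s}+C_{c} & -C_{c} & 0 \\ -C_{c} & C_{s}+2 C_{c} & -C_{c} \\ 0 & -C_{c} & C_{s}+C_{c} \end{pmatrix} \\
& \text{eigenvectors } (1,1,1),\ (1,0,-1),\ (1,-2,1) \text{ with eigenvalues } C_{s},\ C_{s}+C_{c},\ C_{s}+3 C_{c} \\
& A_{i}=-\frac{2-\Delta_{1}-\Delta_{3}}{3}, \qquad B_{i}=-\frac{1+\Delta_{1}+\Delta_{3}}{3}, \qquad A_{i}+B_{i}=-1
\end{aligned}
$$

The middle eigenvector has no component on the victim, which is why only two time constants appear in (1). For example, case (a) has $\Delta_{1}=\Delta_{3}=-1$, giving $A_{1}=-4/3$ and $B_{1}=1/3$, while case (f) has $\Delta_{1}=\Delta_{3}=+1$, giving $A_{6}=0$ and $B_{6}=-1$. In every case $A_{i}+B_{i}=-1$, so that $V(0)=0$.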

In cases (b) and (f), the response is a single decaying exponential, while in the other cases the slow or dominant time constant is $R\left(C_{s}+3 C_{c}\right)$. In cases (a), (c) and (d), the slower time constant is also associated with the larger coefficient, and hence the faster time constant can be neglected with good accuracy when computing the delay. This is especially so in case (a).

$$
V=1+A_{i} e^{-\frac{t}{R\left(C_{s}+3 C_{c}\right)}}+B_{i} e^{-\frac{t}{R C_{s}}} \tag{1}
$$

To state some well-known results, a lumped $R C$ circuit with no aggressors has a single-pole response and its delay is given in (2). Signal propagation along a distributed $R C$ line is governed by the diffusion equation, which does not lend itself readily to closed-form predictions of the delay at a given threshold. However, it turns out that a simple exponential is a very good predictor [2], which leads to (3) as the model for the $50 \%$ delay of a distributed $R C$ line to a step input. This is a very good approximation and is reputed to be accurate to within $4 \%$ for a very wide range of $R$ and $C$.

$$
\begin{align*}
T_{0.5, \text{lumped}} &= 0.7 R C \tag{2}\\
T_{0.5, \text{distr}} &= 0.4 R C \tag{3}\\
T_{0.5, \text{distr}} &= 0.4 R C_{s}+\lambda_{i} R C_{c} \tag{4}
\end{align*}
$$
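As an illustration of how the lumped response (1) translates into a $50 \%$ delay, the following minimal Python sketch evaluates (1) and locates the threshold crossing by bisection. The function names, the bisection bracket of ten slow time constants and the example element values are assumptions made for this sketch. The coefficient pairs are those for which (1) gives $V(0)=0$.

```python
import math

# (A_i, B_i) for switching cases (a)-(f); every pair sums to -1 so V(0) = 0.
COEFFS = {
    "a": (-4 / 3, 1 / 3),
    "b": (-1.0, 0.0),
    "c": (-2 / 3, -1 / 3),
    "d": (-2 / 3, -1 / 3),
    "e": (-1 / 3, -2 / 3),
    "f": (0.0, -1.0),
}


def victim_voltage(t, case, R, Cs, Cc):
    """Normalised victim voltage from the lumped model, equation (1)."""
    A, B = COEFFS[case]
    slow = R * (Cs + 3 * Cc)
    fast = R * Cs
    return 1 + A * math.exp(-t / slow) + B * math.exp(-t / fast)


def delay_50(case, R, Cs, Cc, iterations=200):
    """Time at which the victim crosses 0.5, found by bisection."""
    lo, hi = 0.0, 10 * R * (Cs + 3 * Cc)  # assumed bracket: V(hi) is close to 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if victim_voltage(mid, case, R, Cs, Cc) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    R, Cs, Cc = 100.0, 1e-15, 1e-15  # hypothetical values: 100 ohm, 1 fF, 1 fF
    for case in "abcdef":
        print(case, f"{delay_50(case, R, Cs, Cc) * 1e15:.1f} fs")
```

For these element values the delays are ordered as the text describes: (a) and (b) are slower than the reference case (c), (d) equals (c), and (e) and (f) are faster. Cases (b) and (f) reduce to $\ln 2$ times a single time constant, in line with the factor 0.7 in (2).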

For the kinds of $R C$ lines shown, whenever the response of the lumped model corresponding to a single section of the distributed line is or can be approximated by a waveform containing a single exponential, the response of the distributed line can also be approximated by a waveform with a single exponential. Hence we propose to model the delay of the distributed lines corresponding to (a), (b), (c), (d) and (f) with single time constant expressions. (In the case of (e) the accuracy is not high enough to justify such an approach because the lumped model does not have a dominant time constant). Since the time constants in question are linear combinations of $R, C_{s}$ and $C_{c}$, changing coefficients are sufficient to distinguish between the different cases. The delay is as given in (4) where $\lambda_{i}$ take the values in Table 1. These constants were obtained by running sweeps with the circuit analyser SPECTRE. For all $i$, the accuracy is more than $93 \%$ for a wide range of $R, C_{s}$ and $C_{c}$ values. In the interest of brevity, only a representative subset of the values for $i=1$, which is of special interest, is shown here in Table 2.
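The single time constant model (4) is simple enough to evaluate directly. The sketch below is an illustration only: the function and variable names are assumptions, and only three rows of Table 2 are copied in. The model column of Table 2 appears to correspond to $\lambda_{1}$ rounded to 1.5, so that rounded value is used for the comparison.

```python
# Fitted coefficients lambda_i from Table 1 (case (e) has no single-pole model).
LAMBDA = {"a": 1.51, "b": 1.13, "c": 0.57, "d": 0.57, "f": 0.0}


def distributed_delay(R, Cs, Cc, lam):
    """50% delay of the coupled distributed line, equation (4), in seconds."""
    return 0.4 * R * Cs + lam * R * Cc


# A few rows of Table 2: R (ohm), Cs (fF), Cc (fF), simulated delay (fs).
ROWS = [
    (10, 1, 10, 153.8),
    (100, 10, 10, 1984),
    (300, 30, 30, 17850),
]

if __name__ == "__main__":
    for R, Cs_fF, Cc_fF, sim_fs in ROWS:
        # lambda rounded to 1.5 here, which is an assumption of this sketch
        model_fs = distributed_delay(R, Cs_fF * 1e-15, Cc_fF * 1e-15, 1.5) * 1e15
        error = 100 * (sim_fs - model_fs) / sim_fs
        print(f"R={R} Cs={Cs_fF}fF Cc={Cc_fF}fF: model {model_fs:.0f} fs, "
              f"simulated {sim_fs} fs, error {error:.1f}%")
```

Because (4) is linear in $R C_{s}$ and $R C_{c}$, switching between the average, best-case and worst-case estimates only requires a different $\lambda_{i}$ from the `LAMBDA` dictionary.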

## 3. REPEATER INSERTION

To reduce delay, the long lines are broken up into shorter sections, with a repeater (an inverter) driving each section, as shown in Fig. 1. The analysis for repeater insertion is carried out by characterising the non-linear buffers by an output resistance and an input capacitance. Let the number of repeaters including the original driver be $k$, and the size of each repeater be $h$ times a minimum sized inverter (all lines are buffered in a similar fashion). The output impedance of a minimum sized inverter for the particular technology is $R_{drv_{m}}$ and the output capacitance is $C_{drv_{m}}$. Then the output impedance of an $h$ sized driver is assumed to be $R_{drv_{m}} / h$, and the output capacitance $h \times C_{drv_{m}}$.

Table 2. Comparison of simulated and predicted delay for a distributed RC line with worst-case cross talk

| $R$ <br> ($\Omega$) | $C_{s}$ <br> (fF) | $C_{c}$ <br> (fF) | $T_{d}$ <br> (simulated) <br> (fs) | $T_{d}$ <br> (model) <br> (fs) | Error <br> (%) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 10 | 1 | 10 | 153.8 | 154 | $-0.2 \%$ |
| 10 | 100 | 1 | 403 | 415 | $-2.8 \%$ |
| 10 | 100 | 10 | 546 | 550 | $-0.8 \%$ |
| 100 | 1 | 1 | 197 | 190 | $3.7 \%$ |
| 100 | 1 | 10 | 1537 | 1540 | $-0.1 \%$ |
| 100 | 10 | 10 | 1984 | 1900 | $4.2 \%$ |
| 200 | 10 | 30 | 9938 | 9800 | $1.4 \%$ |
| 300 | 30 | 10 | 8393 | 8100 | $3.5 \%$ |
| 300 | 30 | 20 | 13222 | 12600 | $4.8 \%$ |
| 300 | 30 | 30 | 17850 | 17100 | $4.2 \%$ |

With reference to Fig. 1, and using superposition with the delay equations (2), (3) and (4), the total delay for a line takes the expression given in (5). This expression follows the Bakoglu model [3] of equalising the repeaters, and can be explained as follows. The distributed and lumped resistances combine with the distributed and lumped capacitances to produce various delay terms. The terms in bold are the result of modelling cross talk in the delay. $\lambda_{i}$ and $\mu_{i}$ take the values given in Table 1, where $\mu_{i}$ is a coefficient introduced to take the Miller effect into account.[^1] It is assumed that the load $C_{L}$ is equal to the input capacitance of an $h$ sized inverter. The signal rise time has also been included. Because in general the delay per section is much greater than half the rise time, the non-zero rise (fall) time of the input signal is approximated in (5) as a simple addition. Hence the fact that the entire analysis is based on step inputs does not cause grave drops in the accuracy of the final expressions. This is ever more true for future technology generations, where decreasing feature sizes allow transistors to be gated with faster signals but also cause wire parasitics to become more pathological. This delay expression was checked against simulated values, and the accuracy was found to be limited only by the accuracy of the initial expression (4).

$$
\begin{align*}
t_{0.5} &= k\left[0.7 \frac{R_{drv_{m}}}{h}\left(\frac{C_{s}}{k}+h C_{drv_{m}}+\boldsymbol{\mu_{i} \frac{2 C_{c}}{k}}\right)+\frac{R}{k}\left(0.4 \frac{C_{s}}{k}+\boldsymbol{\lambda_{i} \frac{C_{c}}{k}}+0.7 h C_{drv_{m}}\right)\right]+\frac{t_{r}}{2} \tag{5}\\
k_{i, opt} &= \sqrt{\frac{0.4 R C_{s}+\lambda_{i} R C_{c}}{0.7 R_{drv_{m}} C_{drv_{m}}}} \tag{6}\\
h_{i, opt} &= \sqrt{\frac{0.7 R_{drv_{m}} C_{s}+1.4 \mu_{i} R_{drv_{m}} C_{c}}{0.7 R C_{drv_{m}}}} \tag{7}
\end{align*}
$$

Figure 2: Delays for different repeater insertion strategies

To find the optimum $h$ and $k$ for minimising delay, the partial derivatives of (5) with respect to $k$ and $h$ are equated to zero, resulting in (6) and (7). Case (a) is of special significance because it represents the worst-case cross talk of all the cases considered (the delay for this pattern is only 1 or $2 \%$ less than the worst-case delay caused by non-simultaneously switching aggressors). A repeater insertion strategy that is optimised for a certain pattern will not be optimal for other patterns, and of interest is exactly how it performs. Fig. 2 gives the delays for different patterns when the repeater insertion strategies are optimised for cases (a) through (f), excepting (e). The net considered here has a resistance of $1 \mathrm{k} \Omega$ and capacitances of 100 fF to ground and to each of the adjacent wires. $R_{drv_{m}}$ and $C_{drv_{m}}$ are set to $7.7 \mathrm{k} \Omega$ and 9.5 fF to match the $0.35 \mu \mathrm{m}$ technology we use for testing. The legend entry "single" refers to the conventional optimisation strategy that would be carried out by treating the total capacitance as a single lumped component. As expected, for each switching pattern the delay is minimum for the $h$ and $k$ that are optimised for that particular pattern. What is interesting here is that pattern (a) always causes the maximum delay (hence defining the maximum bit frequency over the line, as the worst case has to be expected in general), and this can be reduced by a repeater insertion strategy that is more aggressive than would be predicted as optimal by a conventional analysis. By inspecting the optimal $k$ and $h$ values for the different switching patterns and considering the delay constraints and the available resources for repeater insertion, the $k$ and $h$ values that best suit one's application can be selected.
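The delay expression (5) and the optima (6) and (7) can be evaluated for the example net as follows. This is a minimal sketch: the function names are assumptions, the element values are those quoted above, and the coefficients are the case (a) entries of Table 1.

```python
import math


def total_delay(k, h, R, Cs, Cc, Rd, Cd, lam, mu, t_rise=0.0):
    """Delay of the buffered line, equation (5)."""
    driver = 0.7 * (Rd / h) * (Cs / k + h * Cd + mu * 2 * Cc / k)
    wire = (R / k) * (0.4 * Cs / k + lam * Cc / k + 0.7 * h * Cd)
    return k * (driver + wire) + t_rise / 2


def k_opt(R, Cs, Cc, Rd, Cd, lam):
    """Delay-optimal number of repeaters, equation (6)."""
    return math.sqrt((0.4 * R * Cs + lam * R * Cc) / (0.7 * Rd * Cd))


def h_opt(R, Cs, Cc, Rd, Cd, mu):
    """Delay-optimal repeater size, equation (7)."""
    return math.sqrt((0.7 * Rd * Cs + 1.4 * mu * Rd * Cc) / (0.7 * R * Cd))


# Net and driver values quoted in the text; case (a) coefficients from Table 1.
NET = dict(R=1e3, Cs=100e-15, Cc=100e-15, Rd=7.7e3, Cd=9.5e-15)
LAM_A, MU_A = 1.51, 2.20

if __name__ == "__main__":
    k = k_opt(lam=LAM_A, **NET)
    h = h_opt(mu=MU_A, **NET)
    t = total_delay(k, h, lam=LAM_A, mu=MU_A, **NET)
    print(f"k_opt = {k:.2f}, h_opt = {h:.1f}, delay = {t * 1e12:.0f} ps")
```

For this net the worst-case optimum works out to roughly two repeaters of about 21 times minimum size. In practice $k$ has to be rounded to an integer, after which `total_delay` can be re-evaluated to see the cost of the rounding.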

To check the accuracy of our models we ran simulations for transistor models in an actual $0.35 \mu \mathrm{m}$ technology where $R_{drv_{m}}$ and $C_{drv_{m}}$ take the values given above. Fig. 3 shows the results of simulations for a range of $h$ on either side of the value predicted by (7), where the $k$ and $h$ values associated with each graph refer to $k_{i, opt}$ and $h_{i, opt}$. It can be seen that the fidelity of (6) and (7) is quite good.


Figure 3: Effect of Repeater Sizing on Delay

## 4. AREA AND POWER


Figure 4: Delay constraint matching for Net in row 1, Tab. 4.

Minimising power consumption is equivalent to minimising area, or the product $h k$. When the delay is equalised over each line segment, the problem of repeater optimisation can take two forms. Either the maximum acceptable delay for the net is specified, and the objective is to minimise $h k$ subject to the constraint $t \leq t_{\max}$, or the maximum acceptable area is specified and the objective is to minimise the delay subject to the constraint $A \leq A_{\max}$. Consider Fig. 4, which shows the variation of delay with $h$ and $k$ where the line parasitics are defined by $R=800 \Omega$ and $C_{s}=C_{c}=550 \mathrm{fF}$. The plane shows a delay constraint of 1.3 ns for that net, and any combination of $k$ and $h$ for which the curved delay surface lies below this plane meets the delay constraint. Also shown is an appropriately scaled plot of $h k$. Because $h k$ is quasi-concave in the quadrant of positive $h$ and $k$, it is not possible to find an analytical solution to the first optimisation problem, which has to be solved numerically. The solution to the second optimisation problem is obtained by solving the Kuhn-Tucker conditions [4] given in (8) through (12), where the $L_{i}$ are the Lagrange multipliers. The coefficients corresponding to case (a) have been used, as the worst case needs to be considered.

$$
\begin{align*}
0.7 \frac{R_{drv_{m}}}{h^{2}}\left(C_{s}+4.4 C_{c}\right)-0.7 R C_{drv_{m}}-L_{1} k+L_{2} &= 0 \tag{8}\\
\frac{R}{k^{2}}\left(0.4 C_{s}+1.5 C_{c}\right)-0.7 R_{drv_{m}} C_{drv_{m}}-L_{1} h+L_{3} &= 0 \tag{9}\\
L_{1}\left(h k-A_{\max}\right)=0, \quad h k \leq A_{\max}, \quad L_{1} &\geq 0 \tag{10}\\
L_{2}(h-1)=0, \quad h \geq 1, \quad L_{2} &\geq 0 \tag{11}\\
L_{3}(k-1)=0, \quad k \geq 1, \quad L_{3} &\geq 0 \tag{12}
\end{align*}
$$
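Both optimisation problems can be sketched numerically in a few lines, because (5) without the rise-time term separates into $a_{1}/h+a_{4} h+a_{2} k+a_{3}/k$. The Python sketch below is an illustration under stated assumptions, not the procedure used to produce Fig. 4. The names `delay_terms`, `min_delay_given_area` and `min_area_given_delay` are hypothetical, the area bound of 100 is an arbitrary example, $k$ is restricted to integers in the first problem, and the driver values are the $7.7 \mathrm{k} \Omega$ and 9.5 fF quoted earlier. When only the area constraint is active, substituting $k=A_{\max}/h$ leaves a one-dimensional convex problem with a closed-form minimum.

```python
import math


def delay_terms(R, Cs, Cc, Rd, Cd, lam=1.5, mu=2.2):
    """Coefficients of (5) without t_r/2: t = a1/h + a4*h + a2*k + a3/k."""
    a1 = 0.7 * Rd * (Cs + 2 * mu * Cc)
    a2 = 0.7 * Rd * Cd
    a3 = R * (0.4 * Cs + lam * Cc)
    a4 = 0.7 * R * Cd
    return a1, a2, a3, a4


def delay(h, k, terms):
    a1, a2, a3, a4 = terms
    return a1 / h + a4 * h + a2 * k + a3 / k


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def min_delay_given_area(terms, area_max):
    """Minimise delay subject to h*k <= area_max, h >= 1, k >= 1."""
    a1, a2, a3, a4 = terms
    h = clamp(math.sqrt(a1 / a4), 1.0, area_max)
    k = clamp(math.sqrt(a3 / a2), 1.0, area_max)
    if h * k <= area_max:  # area constraint inactive
        return h, k
    # Area constraint active: put k = area_max / h, leaving a 1-D convex problem.
    h = math.sqrt((a1 + a2 * area_max) / (a4 + a3 / area_max))
    h = clamp(h, 1.0, area_max)
    return h, area_max / h


def min_area_given_delay(terms, t_max, k_limit=50):
    """Scan integer k and take the smallest h that still meets t_max."""
    a1, a2, a3, a4 = terms
    best = None
    for k in range(1, k_limit + 1):
        budget = t_max - (a2 * k + a3 / k)  # delay left for a1/h + a4*h
        disc = budget * budget - 4 * a1 * a4
        if budget <= 0 or disc < 0:
            continue  # no h meets the bound for this k
        h = max(1.0, (budget - math.sqrt(disc)) / (2 * a4))
        if a1 / h + a4 * h > budget * (1 + 1e-9):
            continue  # h = 1 already exceeds the budget
        if best is None or h * k < best[0] * best[1]:
            best = (h, k)
    return best


if __name__ == "__main__":
    # Net of Fig. 4 with the driver values quoted earlier in the text.
    terms = delay_terms(R=800.0, Cs=550e-15, Cc=550e-15, Rd=7.7e3, Cd=9.5e-15)
    print(min_delay_given_area(terms, area_max=100.0))  # area bound is hypothetical
    print(min_area_given_delay(terms, t_max=1.3e-9))
```

For a fixed $k$, the feasible repeater sizes lie between the two roots of $a_{4} h^{2}-(t_{\max}-a_{2} k-a_{3}/k) h+a_{1}=0$, so the smaller root gives the least area for that $k$. Scanning a handful of integer $k$ values is then enough to solve the first problem.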

## 5. CONCLUSIONS

In this paper we have investigated the issue of dynamic delay in buffered lines and shown that splitting the capacitance into two components, as we have proposed, allows the effect of switching aggressors in a buffered net to be quantified in simple equations. The optimal $k$ and $h$ values that minimise delay for any given switching pattern were then derived. All these expressions give the designer more information about when and how to insert repeaters in long nets and are proposed as being suitable for static timing tools. The closed-form nature of the equations allows iterations to be made much more cheaply than with a dynamic simulator. For all patterns, when the coupling capacitance term $C_{c}$ is set to zero (i.e. the total capacitance is lumped into the term $C_{s}$), the equations describing $h$ and $k$ simplify to the Bakoglu equations [3]. Hence we have proposed a simple yet accurate way of distributing the capacitance and including the effect of switching aggressors.

## 6. REFERENCES

[1] D. Sylvester and K. Keutzer, "Getting to the bottom of deep submicron II: a global wiring paradigm", in Proc. ISPD, 1999, pp. 193-200.
[2] J. Rubinstein, P. Penfield and M. Horowitz, "Signal delay in RC tree networks", IEEE Trans. Computer Aided Design, vol. CAD-2, no. 3, pp. 202-211, July 1983.
[3] H. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Reading, MA: Addison Wesley, 1990.
[4] S. Dar and M. Franklin, "Optimum buffer circuits for driving long uniform lines", IEEE J. Solid State Circuits, vol. 26, pp. 32-40, Jan. 1991.
[5] Y. Ismail and E. Friedman, "Effects of inductance on the propagation delay and repeater insertion in VLSI circuits", IEEE Trans. VLSI Systems, April 2000, vol. 8, pp. 195-206.


[^0]:    1. The funding support of Sida and that of Vinnova via the Socware and Exsite Programs are gratefully acknowledged
[^1]:    1. Because of the approximate models used for the delay, the final accuracy is improved if the Miller coefficients take non-integer values as shown.
