We consider the problem of minimizing the delay in transporting a signal across a distance in a VLSI circuit. The problem can be restated as a combined buffer insertion, buffer sizing and wire sizing problem. We propose a simple buffering architecture for this problem and show that this architecture achieves a near optimal solution. We also derive simple models for a buffered wire which are suitable for high level design.
Introduction
Interconnects have a great impact on system performance in the deep sub-micron era. This dominance highlights the need to tackle interconnect problems early in the design process. When connecting any two points on the chip, we need to minimize the signal transmission delay between them. We therefore need a solution to this elementary problem of interconnect delay minimization, together with the cost it incurs in terms of area and power.
Considerable work has been done so far to come up with solutions for the interconnect delay minimization problem. Three techniques are commonly applied to reduce the delay of an existing wire topology: buffer insertion, wire sizing and simultaneous buffer insertion and wire sizing. The key to buffer insertion in optimizing delay is the isolation property of a buffer by which it decouples the capacitance of the descendants from its ancestors. If a purely capacitive load is to be driven, then a tapered configuration could be used for the buffers [1] . When the wire resistance can no longer be ignored, wire segmentation followed by buffer insertion is found to be beneficial [2] .
A simple approach for the design of optimal buffers for driving long uniform lines, as shown by Dhar et al., is to use buffers in a tapered configuration at both the source and the sink, and to place buffers equidistantly on the middle wire section [3]. However, the wire delay thus obtained is not purely a function of the wire length. We shall see later how our work, although superficially similar to theirs, overcomes this drawback.
Wire sizing techniques (both continuous and discrete) for minimizing delay have been widely studied [4], [5]. Perhaps one of the most sophisticated approaches to minimizing wire delay is to adopt a simultaneous buffer insertion and uniform wire sizing scheme. Chu et al. give closed form expressions for this problem by solving what is essentially a very complex non-linear optimization problem [6]. Their solution is in turn complex. There has been some debate in the recent past as to whether wire tapering is beneficial for interconnect synthesis. Alpert et al. have shown that wire tapering after buffer insertion improves delay by only 3.5% compared to uniform wire sizing [7]. This suggests that uniform sizing of the wire is preferable to tapering. Hence we use this strategy for solving the wire delay problem.
We will show in the subsequent sections how our solution to the delay minimization problem would be helpful in filling the gaps left by the previous investigations. In this paper we describe a pre-mid-post (p-m-p) buffer strategy for minimizing wire delay. The technique that we have proposed predicts the wire delay purely as a function of the wire length. We show that a near optimal solution is obtained for the problem using this p-m-p buffer strategy. This solution leads to simple models for the interconnect delay and the energy dissipated.
The problem definition and the delay model are described in Section 2. The experiment leading us to the pre-mid-post buffer strategy for wire delay minimization is presented in Section 3. Our results and conclusions are given in Sections 4 and 5, respectively.
Problem Formulation
We aim to propagate a signal from one point to another in the shortest possible time. For this, we intend to provide a simple solution using buffer insertion and uniform wire sizing.
The problem on hand can be defined as follows:
given a sink load and a distance 'l', from source to sink, find the optimal number of buffers, their sizes, placement and the wire width such that the total delay from source to sink is minimum.
Figure 1. Buffered Wire
In the figure above, $C_{in}$ is the input capacitance as seen looking into the wire and $C_{sink}$ is the load capacitance.
We adopt the switch resistor model for the gate and R-C model for the wire. Inductance is not presently taken into consideration. We formulate the wire delay equation using the Elmore delay model. All our data correspond to a 0.18 micron TSMC technology. We consider the data corresponding to metal 4 wires for the calculations. The wire capacitances have been obtained from the 3-D capacitance extraction tool CAPEM [8] . The optimizations have been done in MATLAB.
The following models are used in the analysis.
where $w$ is the width of the buffer.
$R_o$: wire resistance per unit length $= k_3 / w_{int}$
where $k_1$, $k_2$ and $k_3$ are constants.
$C_{o1}$: wire area capacitance per unit area
$C_1$: wire fringe capacitance per unit length
The wire capacitance model is of the form $C_o = C_{o1} w_{int} + C_1$ for the experiments on the buffer insertion problem. We consider $w_{int}$, the wire width, to be fixed for a section of the wire (uniform wire sizing).
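The wire models above can be combined with the Elmore delay of a single buffered segment. The following is a minimal sketch, not the authors' code: the buffer model is an assumption (output resistance `k1 / w`, with the load passed in directly), because the buffer equations did not survive in this text, and the function and parameter names are hypothetical.

```python
def wire_rc_per_length(w_int, k3, c_area, c_fringe):
    """Per-unit-length wire resistance and capacitance from the models in the text."""
    r_o = k3 / w_int                  # R_o = k3 / w_int
    c_o = c_area * w_int + c_fringe   # C_o = C_o1 * w_int + C_1
    return r_o, c_o


def stage_elmore_delay(w, w_int, seg_len, c_load, k1, k3, c_area, c_fringe):
    """Elmore delay of one buffer driving a uniform wire segment into c_load.

    ASSUMPTION: the buffer is a switch resistor with R_b = k1 / w.
    The wire is treated as a distributed RC line, so its own capacitance
    sees only half of the wire resistance.
    """
    r_o, c_o = wire_rc_per_length(w_int, k3, c_area, c_fringe)
    r_buf = k1 / w
    r_wire = r_o * seg_len
    c_wire = c_o * seg_len
    return r_buf * (c_wire + c_load) + r_wire * (0.5 * c_wire + c_load)
```

All quantities are in whatever consistent units the constants are given in; no technology values are implied here.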
Observations
Our experiments cover source-sink distances ranging from 2 mm to 20 mm. The wire capacitance was extracted by considering shield planes on metal 3 and metal 5. We have used buffer insertion followed by uniform wire sizing for delay minimization. We start with the buffer insertion technique and optimize the wire delay function. The input capacitance of the buffered wire is set to that of a minimum sized driver, while the sink load corresponds to a minimum sized load. We keep the buffer sizes and locations variable and the wire width at its minimum. As we increase the number of buffers, we observe the following trends.
Figure 2. Delay vs Number of buffers
The minimum ($d_{min}$) of the delay curve shown in Figure 2 corresponds to the optimal number of buffers that can be inserted for the given wire-length.
Figure 3. Width of the buffer vs Buffer Position
As we move from the second buffer (the first buffer is the minimum sized driver itself) towards the later buffers, the width of the buffer initially increases, then stabilizes, and then slightly reduces, as shown in Figure 3. Also, the distance between the first few buffers is smaller, while the distance between the rest is almost equal. This leads us to the scheme described in the following section.
Pre-Mid-Post Buffer Strategy
We break down the buffered wire design problem, which is a complex non-linear delay optimization problem as in [3] , into three sub-problems (stages) : the prebuffer stage, the middle stage and the postbuffer stage. The solutions for each of these can be easily found.
The variables to be calculated are as follows:
$n_{pre}$: number of prebuffers
$n$: number of mid buffers
$n_{post}$: number of postbuffers
$r$: total number of buffers
$w$: width of the mid buffers
$w_{int}$: width of the wire
Our wire architecture is depicted in Figure 4. For the post-buffer stage, we adopt the tapered configuration, starting with a minimum sized driver, and find the optimal number of buffers ($n_{post}$) in this stage and their sizes (Figure 5). This scheme uses a tapering factor ($f$) of 4 for calculating the buffer sizes [1]. This stage works independently of the previous stages in the architecture.
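The tapered post-buffer stage can be illustrated with a short sketch. It assumes that each buffer is `f` times wider than the previous one and that the stage count is the nearest integer to log_f(C_load / C_min); this rounding rule and the function name `tapered_chain` are illustrative assumptions, not taken from the paper.

```python
import math


def tapered_chain(c_load, c_min, f=4.0):
    """Relative widths of a tapered buffer chain driving c_load.

    c_min is the input capacitance of the minimum sized driver (width 1).
    ASSUMPTION: the number of stages is round(log_f(c_load / c_min)),
    with at least one stage (the minimum sized driver itself).
    """
    ratio = c_load / c_min
    if ratio <= 1.0:
        return [1.0]
    n_stages = max(1, round(math.log(ratio) / math.log(f)))
    return [f ** i for i in range(n_stages)]
```

For example, a load 64 times the minimum input capacitance gives three buffers of relative widths 1, 4 and 16, so every stage drives four times its own input capacitance.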
The second stage is the middle one (Figure 6), where we fix the size of all the buffers to $w$ and place the buffers equidistant from each other. We then find the optimal number of buffers ($n$), their size ($w$) and the wire size ($w_{int}$) by optimizing the delay function. We consider uniform wire sizing here, i.e. we find the single best width for the wire. Note that this stage drives a minimum capacitive load, namely the driver of the post-buffer stage. We need such a scheme in order to express the delay of the intermediate section independently of the sink capacitance.
The first stage is the prebuffer stage, which has a tapered configuration, starting with a minimum sized driver (Figure 7). The optimal number of buffers ($n_{pre}$) in this stage as well as the size of the buffers can be found by the following expression:
The load seen by this stage would be dependent on the buffer width solution from the intermediate stage.
Optimization Procedure for the Middle Section p-m-p Architecture
To arrive at the architecture that we are proposing, we optimized the interconnect delay function (bis) in Equation (3), where we use the following notations:
The delay equation, based on the Elmore delay model, is a posynomial in $w$ and $w_{int}$. Minimizing a posynomial can be cast as a convex program, in which a local minimum is also the global minimum, so convex optimization ensures that we arrive at the global minimum. This optimization procedure was also used in [9] to obtain an exact solution to the transistor sizing problem. We have used a program written in MATLAB and based on the algorithm described in [10]. Using this optimization scheme we found the buffer size and the wire size that minimize the delay. The buffer sizes are expressed relative to the minimum sized driver and the wire sizes relative to the minimum wire width of the given metal layer. This procedure was repeated for different values of $n$ to obtain the number of buffers corresponding to minimum delay. Table 1 shows the number of buffers, their sizes ($w$), the wire sizes ($w_{int}$) and the delay (mid delay) for the intermediate section of our architecture.
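The procedure above (optimize the sizes for a fixed $n$, then repeat over $n$) can be sketched in a simplified form. This is not the MATLAB program or the algorithm of [10]: it is a one-variable illustration under stated assumptions. The wire width is held fixed, so only the buffer width is optimized; every stage is assumed to drive another buffer of the same width, ignoring the minimum sized load on the last stage; the buffer model `R_b = k1 / w`, `C_b = k2 * w` is assumed; and a golden-section search in the variable log(w), where a posynomial is convex, stands in for the convex solver. All function names are hypothetical.

```python
import math


def chain_delay(n, w, length, r_o, c_o, k1, k2):
    """Elmore delay of n identical stages: buffer of width w plus wire of length/n."""
    seg = length / n
    r_buf, c_buf = k1 / w, k2 * w              # ASSUMED buffer model
    r_wire, c_wire = r_o * seg, c_o * seg
    stage = r_buf * (c_wire + c_buf) + r_wire * (0.5 * c_wire + c_buf)
    return n * stage


def golden_section_min(f, lo, hi, iters=80):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return 0.5 * (a + b)


def optimal_width(n, length, r_o, c_o, k1, k2, w_max=1000.0):
    """Best buffer width in [1, w_max], searched in u = log(w)."""
    u = golden_section_min(
        lambda u: chain_delay(n, math.exp(u), length, r_o, c_o, k1, k2),
        0.0, math.log(w_max))
    return math.exp(u)


def best_buffering(length, r_o, c_o, k1, k2, n_max=30):
    """Scan n = 1..n_max and return (n, w, delay) with the smallest delay."""
    best = None
    for n in range(1, n_max + 1):
        w = optimal_width(n, length, r_o, c_o, k1, k2)
        d = chain_delay(n, w, length, r_o, c_o, k1, k2)
        if best is None or d < best[2]:
            best = (n, w, d)
    return best
```

Under these simplifications the optimal width has the closed form sqrt(k1 * c_o / (r_o * k2)), which does not depend on the wire length, and the best number of buffers grows in proportion to the length. Both properties are specific to this simplified model; they are in line with the trends reported in the paper but are not a substitute for its two-variable optimization.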
Simplified p-m-p Buffered Wire Architecture
We note from Table 1 that as the wire-length is varied there is only a small variation in the buffer width and the wire width of the intermediate section. Therefore, we propose a simpler p-m-p model. For a given metal layer, the middle section of a buffered wire has a fixed buffer width and wire width for any source-sink distance. The simple wire model for the given metal layer thus has a standard wire width and middle section buffer width applicable to different lengths of the interconnect. The prebuffer stage and the post-buffer stage still have the cascaded buffer configuration. For a given load, the number of postbuffers remains the same for all lengths since this stage is driven by a minimum sized buffer. Moreover, since the middle section buffer width varies only slightly, the number of prebuffers can also be fixed. Hence the only parameter that varies with length is the number of intermediate buffers ($n$).
Unconstrained Optimization
To compare the results from our simplified p-m-p architecture with the optimal solution, we performed another set of optimization experiments. We calculated the total delay of the wire, which included the delay incurred in all the three sections of our wire architecture. To get the optimal delay for a signal traversing through the three sections, we performed unconstrained delay optimization with (3r − 1) variables, where r is the total number of sections. Here, we kept the section lengths, buffer widths and the wire widths variable.
Equation 4 gives the function to be optimized (bisws) and corresponds to the delay of a buffered wire of length $l$, driven by a minimum sized driver and with a sink capacitance $C_{si}$.
Since this delay function is also a posynomial in $w$ and $w_{int}$, we again used convex optimization to arrive at the optimal solution. The optimization was performed for a fixed number of buffers ($r$) for the given length of wire. Our simple wire architecture is compared, in the following section, with the unconstrained (reference) solution for delay as well as for the energy dissipated. We also describe simple models for the energy dissipated and the delay of the wire.
Results and Discussion
In Table 2, we list the total delay from our simplified p-m-p architecture (pmp) as well as that from an unconstrained optimization (cvx). We also show the optimal number of buffers and their fixed size for different lengths. We find that the total number of buffers ($r$) varies linearly with wire-length. Our experiments use load capacitances of 10 fF and 100 fF. We find that our solution for the delay minimization problem is within 7% of the optimal one for most of the lengths. The delay of the prebuffer stage and the middle stage is independent of the load capacitance. Also, the total wire delay varies linearly with length, as seen from Figure 8. From this result, we can deduce that the total delay is a strong function of the wire-length and a weak function of the sink capacitance. The interconnect delay (in ns) can be modeled as shown in Equation 5.
If
From our data, $x_0 = 0.13 \times 10^{-3}$ and $x_1 = 0.4$. This model is only valid for lengths (in µm) greater than our critical length of 4000 µm.
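As a hedged illustration of how such a model might be used in high level design, the sketch below assumes the linear form delay = x0 * l + x1, with l in µm and the delay in ns. The linear form is an assumption inferred from the statement that the total delay varies linearly with length; the equation itself is not reproduced in this text, and the function name is hypothetical.

```python
X0 = 0.13e-3                  # slope, ns per um (constant quoted in the text)
X1 = 0.4                      # intercept, ns (constant quoted in the text)
CRITICAL_LENGTH_UM = 4000.0   # the model is stated to hold only above this length


def buffered_wire_delay_ns(length_um):
    """Estimated delay of a p-m-p buffered wire, ASSUMING delay = X0 * l + X1."""
    if length_um <= CRITICAL_LENGTH_UM:
        raise ValueError("model is only valid for lengths above 4000 um")
    return X0 * length_um + X1
```

Under this assumed form, a 10 mm wire (10000 µm) would be estimated at about 1.7 ns.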
The energy dissipation per transition for different wirelengths is shown in Table 3 . This depicts the total energy dissipated from our simplified wire architecture and that 
where $y_0 = 2.19 \times 10^{-3}$ and $y_1 = 1.1$. We have thus arrived at a simple delay and energy model for a buffered wire.
We have also calculated the energy-delay product for the simplified p-m-p architecture and the unconstrained solution. From Figure 9 we see that the energy-delay product for our architecture is slightly better than that offered by the unconstrained solution. This result compensates for the small cost that we pay in terms of a sub-optimal solution for the interconnect delay minimization problem. 
Conclusions
In this paper we have described a simple model for an interconnect in terms of its delay and energy dissipated. We have obtained a near optimal yet simple solution for delay minimization of a 2-pin net by simultaneous buffer insertion and uniform wire sizing. Our architecture for a buffered wire has three stages: the first and the last consist of cascaded buffers, and the intermediate section consists of equally spaced buffers with a fixed wire width. We have further simplified the interconnect model and have shown that for a given metal layer, a single wire width and middle section buffer width can be adopted over a range of wire-lengths. We have expressed the wire delay as a function of the wire-length and the load capacitance. This simple model of the wire would aid us in developing other building blocks for the wire at higher levels of abstraction.
We have, so far, used the pessimistic Elmore delay model for the delay calculations. We now plan to use the SPICE model to optimize our wire architecture. Also, an interesting extension would be to use our wire model for multi-pin nets. We intend to exploit the simplicity of our solution in addressing other on-chip signaling issues and to make VLSI design interconnect aware.