Abstract-This paper addresses the critical problem of global wire optimization for nanometer scale very large scale integration technologies, and elucidates the impact of such optimization on power dissipation, bandwidth, and performance. Specifically, this paper introduces a novel methodology for optimizing global interconnect width, which maximizes a novel figure of merit (FOM) that is a user-defined function of bandwidth per unit width of chip edge and latency. This methodology is used to develop analytical expressions for optimum interconnect widths for typical FOMs for two extreme scenarios regarding line spacing: 1) spacing kept constant at its minimum value and 2) spacing kept the same as line width. These expressions have been used to compute the optimal global interconnect width and quantify the effect of increasing the line width on various performance metrics such as delay per unit length, total repeater area and power dissipation, and bandwidth for various International Technology Roadmap for Semiconductors technology nodes.
In order to improve performance, designers tend to use wires that are wider than the minimum-sized global interconnects prescribed by the technology. Increasing the width of an interconnect proportionally reduces its resistance per unit length, but it also increases its capacitance per unit length. However, for global interconnects in nanometer technologies, where the aspect ratio of wires is approximately 2-2.5, the increase in width reduces the resistance-capacitance (RC) time constant of the line and therefore improves the delay per unit length [7]. On the other hand, these "fat" wires consume a large share of routing resources and can adversely affect the wireability of the chip. For further improvement in performance, the spacing of global interconnects can also be increased, which to some extent offsets the increase in line capacitance caused by the larger line width. However, this increase in spacing further degrades the wireability of the chip. Furthermore, the delay per unit length of wide wires may also degrade due to inductance effects. Therefore, in determining the wire widths at the global tier, the number of interconnects per unit chip edge should be taken into account along with the delay per unit length. For instance, the ratio of the number of interconnects per unit chip edge to the delay per unit length, which represents the rate of data transfer per unit chip edge, can be a useful metric to optimize. This paper introduces a new methodology for determining the optimum width of global interconnects for a given technology that maximizes a user-defined figure of merit (FOM), a known function of the delay per unit length and the rate of data transfer per unit chip edge. As a first step, we develop semi-analytical expressions for line capacitance per unit length as a function of line width and spacing.
Using these models, in Section II we obtain the functional dependence of the delay per unit length of an optimally buffered interconnect on line width. This, in turn, gives the functional dependence of the chosen FOM on line width, which is optimized analytically to yield the optimum interconnect width. We carry out this optimization for various FOMs and various International Technology Roadmap for Semiconductors (ITRS) technology nodes for two extreme scenarios: 1) interconnect spacing is kept at its minimum, and 2) interconnect spacing is kept equal to the interconnect width. The optimization results indicate that the rate of data transfer per unit chip edge is very close to optimum when the line width is the minimum prescribed by the ITRS. However, to optimize the other FOMs considered, the line width needs to be increased. We also quantify the improvement in delay per unit length, total repeater area, and power dissipation, and the degradation in the per-unit-width bit transfer rate, for this optimum width compared to minimum-width lines. We show that these changes are fairly insensitive to technology scaling.
II. METHODOLOGY
Consider a uniform interconnect with resistance per unit length $r$ and capacitance per unit length $c$, buffered by identical repeaters as shown in Fig. 1. Assume that for a minimum-sized repeater the input capacitance is $c_0$, the output parasitic capacitance is $c_p$, and the output resistance is $r_s$. Therefore, for a repeater of size $h$ (i.e., $h$ times a minimum-sized repeater), the total output resistance is $r_s/h$, the total output parasitic capacitance is $h c_p$, and the total input capacitance is $h c_0$. If the line segment is of length $l$, then the time constant of that segment is [8]

$$\tau = \frac{r_s}{h}\bigl(h(c_p + c_0) + c\,l\bigr) + r\,l\left(h c_0 + \frac{c\,l}{2}\right),$$

and the latency, or delay, of that section is proportional to $\tau$. Now consider a long interconnect of given length $L$ that is uniformly buffered with interbuffer interconnect length $l$. The total number of segments is then $L/l$, and the total delay through the line is proportional to $(L/l)\,\tau = L\,\tau_l$, where $\tau_l$ is the delay per unit length, given by

$$\tau_l = \frac{\tau}{l} = \frac{r_s(c_0 + c_p)}{l} + \frac{r_s c}{h} + r c_0 h + \frac{r c\,l}{2}.$$
Note that optimizing the delay of an interconnect of fixed length is equivalent to optimizing $\tau_l$. This delay per unit length is optimal when $l = l_{opt} = \sqrt{2 r_s (c_0 + c_p)/(r c)}$ and $h = h_{opt} = \sqrt{r_s c/(r c_0)}$, and is then given by

$$\tau_{l,opt} = \sqrt{r c}\left(2\sqrt{r_s c_0} + \sqrt{2 r_s (c_0 + c_p)}\right).$$

Note that this optimal delay per unit length is a function of the interconnect parameters $r$ and $c$, which in turn are functions of the interconnect width $w$ and spacing $s$. In the present study we do not explicitly consider crosstalk. The effect of crosstalk would be to change the effective value of $c$ depending on whether the neighboring interconnects are quiet or making a transition. For global interconnects, this assumption is somewhat justified because long global interconnects would be properly shielded to yield predictable delays, and $c$ can therefore be assumed to be a function of interconnect geometry only. Earlier studies quantifying the optimal buffering schemes for optimal delay per unit length [5] always considered minimum-sized global wires. However, for further improvement in performance, designers have the option of increasing the wire width and/or spacing. Increasing $w$ for a given $s$ decreases the line resistance per unit length $r$ and increases the line capacitance per unit length $c$. However, the relative decrease in $r$ is much larger than the relative increase in $c$, so the product $rc$, and therefore the optimal delay per unit length, decreases. On the other hand, the increased pitch decreases the wireability of the chip. Previous work on wire sizing for delay and power optimization can be found in the literature, e.g., [9]. However, those authors considered a discrete set of wire widths and neglected both leakage and short-circuit power in their power estimates. Wire width optimization has also been considered in [10] and [11]. However, as pointed out in the next paragraph, the interconnect technology model in [10] is not realistic, and the optimization metric in both approaches is not flexible enough to apply to a wide variety of design characteristics; hence their optimization results may not be meaningful. Furthermore, [11] does not provide any model for interconnect delay and power dissipation, and therefore its formulation is not very transparent.
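The closed-form optimum of the buffered line can be checked numerically. The sketch below assumes the Elmore time constant $\tau = (r_s/h)(h(c_p+c_0)+cl) + rl(hc_0 + cl/2)$ for one segment (the coefficients used in [8] may differ, which would change constants but not the $\sqrt{rc}$ dependence), and the parameter values are illustrative SI numbers, not data from this paper.

```python
import math


def delay_per_length(l, h, r, c, r_s, c_0, c_p):
    """Elmore time constant of one repeater-driven segment divided by its length l."""
    tau = (r_s / h) * (h * (c_p + c_0) + c * l) + r * l * (h * c_0 + c * l / 2)
    return tau / l


def optimal_buffering(r, c, r_s, c_0, c_p):
    """Closed-form optimum segment length, repeater size and delay per unit length."""
    l_opt = math.sqrt(2 * r_s * (c_0 + c_p) / (r * c))
    h_opt = math.sqrt(r_s * c / (r * c_0))
    tau_l = math.sqrt(r * c) * (2 * math.sqrt(r_s * c_0) + math.sqrt(2 * r_s * (c_0 + c_p)))
    return l_opt, h_opt, tau_l


# Illustrative values only: r in ohm/m, c in F/m, r_s in ohm, c_0 and c_p in F.
params = dict(r=4.0e4, c=2.0e-10, r_s=1.0e4, c_0=1.0e-15, c_p=0.5e-15)
l_opt, h_opt, tau_l = optimal_buffering(**params)

# A brute-force scan around the closed form should find nothing faster.
best = min(
    delay_per_length(l_opt * fl, h_opt * fh, **params)
    for fl in (0.8, 0.9, 1.0, 1.1, 1.2)
    for fh in (0.8, 0.9, 1.0, 1.1, 1.2)
)
```

Because $\tau_l$ separates into a term in $l$ and a term in $h$, each optimum is the usual $a/x + bx$ balance, which is why the scan's minimum lands on the closed-form point.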
We will consider two scenarios. In the first case, the line width can be changed but the line spacing is kept constant at its minimum value $s_{\min}$. In this case, increasing the wire width does not strongly degrade the wireability of the chip. In the second case, the line spacing is kept equal to the line width, $s = w$, for all $w$. The second case is a less popular option for designers but serves as a limiting case. We assume that the line thickness $t$ and the interlayer dielectric thickness $t_{ild}$ (Fig. 2) cannot be changed. This is in contrast with [10], where it was assumed that $t$ and $t_{ild}$ can be varied arbitrarily. That assumption is not realistic, since for a given process technology and a given layer, $t$ and $t_{ild}$ typically cannot be changed by designers, while they are free to choose $w$ and $s$. It has been shown that with the ITRS scaling scheme, minimum-sized global interconnects are becoming increasingly resistive and inductive effects are decreasing rapidly [12]. It was also shown that inductive effects on delay may become significant only for sufficiently wide lines. Therefore, we initially assume that inductance effects can be ignored for the purpose of delay and power dissipation calculations, and then verify whether this is indeed the case for the computed optimum line widths.
Let $B$ denote the bandwidth, i.e., the rate at which bits can be transmitted across a unit length of interconnect per unit width of chip edge. The rate at which bits can be transmitted per unit length by one interconnect is inversely proportional to its delay per unit length, i.e., rate of bit transmission $\propto 1/\tau_l$. We assume that the lines are always optimally buffered for a given line width. Therefore, rate of bit transmission $\propto 1/\tau_{l,opt}$.
The number of such lines present in a chip edge of width $W$ is $W/p$, where the metal pitch is $p = w + s$ for line width $w$ and spacing $s$. Therefore

$$B \propto \frac{1}{(w+s)\,\tau_{l,opt}}.$$
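The aim of a global interconnect design scheme is to have a large bandwidth $B$ while keeping the delay per unit length small. As an example, an appropriate FOM to maximize is $B/\tau_{l,opt}^{\,n}$ for some $n \ge 0$. Larger values of $n$ give more importance to delay per unit length at the expense of the rate of bit transfer per unit width. In our study we carry out the analysis for $n = 0$, 1, and 2. In other words,

$$\mathrm{FOM} \propto \frac{1}{(w+s)\,\tau_{l,opt}^{\,n+1}}.$$

Line resistance per unit length is inversely proportional to line width, $r = \rho/(w t)$, where $\rho$ is the metal resistivity and $t$ the line thickness. Line capacitance per unit length is also a function of $w$, i.e., $c = c(w)$. Using the above, the optimal delay per unit length can be written in terms of $w$ as

$$\tau_{l,opt} = K\sqrt{\frac{c(w)}{w}},$$

where $K$ is a constant for the given metal layer. Therefore

$$\mathrm{FOM} \propto \frac{1}{w+s}\left(\frac{w}{c(w)}\right)^{(n+1)/2}.$$

Setting the derivative of $\ln(\mathrm{FOM})$ with respect to $w$ to zero, it follows that the optimum width $w_{opt}$ satisfies

$$\frac{n+1}{2}\left(\frac{1}{w} - \frac{1}{c}\frac{dc}{dw}\right) = \frac{1}{w+s}\left(1 + \frac{ds}{dw}\right). \qquad (1)$$

Note that the optimum width depends only on the line capacitance and the line spacing. The optimum delay per unit length at this optimum line width is

$$\tau_{l,opt}(w_{opt}) = K\sqrt{\frac{c(w_{opt})}{w_{opt}}}.$$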
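When $c(w)$ has no convenient closed form (for example, a table fitted to field-solver data), equation (1) can instead be solved by maximizing the FOM numerically. This minimal sketch assumes $\mathrm{FOM} \propto (w/c)^{(n+1)/2}/(w+s)$, a FOM that is unimodal on the search bracket, and a golden-section search (a generic optimizer, not a method from this paper). The capacitance model $c = 1 + 0.1w$ with $s = 1$ (widths in units of the minimum width) is hypothetical.

```python
import math


def fom(w, n, cap, spacing):
    """FOM up to a constant: B / tau_l**n with tau_l ~ sqrt(c/w) and B ~ 1 / ((w + s) tau_l)."""
    s = spacing(w)
    tau_l = math.sqrt(cap(w, s) / w)  # the layer constant K is dropped; it cannot move the optimum
    return 1.0 / ((w + s) * tau_l ** (n + 1))


def argmax_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:  # maximiser lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
        else:        # maximiser lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)


# Hypothetical normalised example: widths in units of w_min, minimum-spaced lines (s = 1).
def cap_min_spaced(w, s):
    return 1.0 + 0.1 * w  # c_f + c_pp * w with illustrative c_f = 1.0, c_pp = 0.1


def fixed_spacing(w):
    return 1.0


w_star = {n: argmax_golden(lambda w, n=n: fom(w, n, cap_min_spaced, fixed_spacing), 0.05, 50.0)
          for n in (0, 1, 2)}
```

For this linear $c(w)$ the numerical optima agree with the closed-form roots of (1) for minimum-spaced lines, which makes the search a useful cross-check when a fitted capacitance model is used instead.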
As expected, as $w$ increases, the delay decreases and asymptotes to a constant value for large $w$. Since $r \propto 1/w$, the interbuffer interconnect length can be written in terms of interconnect width as

$$l_{opt} = \sqrt{\frac{2 r_s (c_0 + c_p)}{r\,c}} \propto \sqrt{\frac{w}{c(w)}}.$$

As the line width increases, the interbuffer interconnect length increases initially and then asymptotes to a constant. This implies that for a given line length, the number of repeaters decreases. The buffer size is given by

$$h_{opt} = \sqrt{\frac{r_s\,c}{r\,c_0}} \propto \sqrt{w\,c(w)}.$$
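The repeater area of a single interconnect is proportional to $h_{opt}$ and inversely proportional to $l_{opt}$, i.e.,

$$\text{repeater area of a single interconnect} \propto \frac{h_{opt}}{l_{opt}} = \frac{c(w)}{\sqrt{2 c_0 (c_0 + c_p)}} \propto c(w).$$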
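The width dependence of $l_{opt}$, $h_{opt}$, and the per-line repeater area can be tabulated directly from these proportionalities. This sketch normalizes every quantity to its value at the minimum width, so the layer constants cancel; the capacitance model $c = 1 + 0.1w$ (widths in units of the minimum width, fixed spacing) is a hypothetical example.

```python
import math


def relative_buffering(w, cap, spacing, w_min=1.0):
    """l_opt, h_opt and per-line repeater area at width w, relative to w = w_min.

    With r ~ 1/w: l_opt ~ sqrt(w / c), h_opt ~ sqrt(w * c), so h_opt / l_opt ~ c.
    """
    def c_of(x):
        return cap(x, spacing(x))

    l_rel = math.sqrt(w / c_of(w)) / math.sqrt(w_min / c_of(w_min))
    h_rel = math.sqrt(w * c_of(w)) / math.sqrt(w_min * c_of(w_min))
    area_rel = h_rel / l_rel  # repeater area per line ~ h_opt / l_opt
    return l_rel, h_rel, area_rel


def cap_min_spaced(w, s):
    return 1.0 + 0.1 * w  # hypothetical c_f = 1.0, c_pp = 0.1 in units of w_min


def fixed_spacing(w):
    return 1.0


table = [(w, *relative_buffering(w, cap_min_spaced, fixed_spacing)) for w in (1, 2, 5, 10, 100)]
```

In this example $l_{opt}$ rises toward the limit $\sqrt{11}$ (its value as $c/w \to c_{pp}$), while $h_{opt}$ keeps growing, so each repeater gets larger even as fewer repeaters are needed per unit length.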
The total repeater area for a given metal layer is the product of the number of interconnects and the repeater area of a single interconnect. The number of interconnects on a metal layer is inversely proportional to the pitch $w + s$. Therefore

$$\text{total repeater area} \propto \frac{c(w)}{w+s}.$$
Power dissipation per unit length for a single line is given by [13] as the sum of switching (dynamic), short-circuit, and leakage components, $P_{line} = P_{\mathrm{dynamic}} + P_{\mathrm{short\,circuit}} + P_{\mathrm{leakage}}$, evaluated at the optimal repeater size $h_{opt}$ and interbuffer length $l_{opt}$.
In the expression from [13], $V_{dd}$ is the power supply voltage, $f$ is the clock frequency, $a$ is the switching factor (or activity factor), i.e., the fraction of repeaters on a chip that switch during an average clock cycle, $I_{off_n}$ ($I_{off_p}$) is the leakage current per unit width of NMOS (PMOS), $W_n$ is the width of the NMOS transistor in a minimum-sized inverter, and $I_{\mathrm{short\,circuit}}$ is the per-unit-width short-circuit current. Fixed values are assumed for $a$, $I_{\mathrm{short\,circuit}}$, and the remaining device parameters. The total power dissipation per unit length in the global interconnects of a given layer is the product of the above quantity and the number of global lines, which is inversely proportional to $w + s$. The total power dissipation can therefore be expressed as $P_{total} \propto c(w)/(w+s)$. Note that this has the same form as the dependence of total repeater area on line width. Therefore, increasing the line width reduces both total repeater area and power dissipation by the same factor.
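Since total repeater area and total repeater power both scale as $c(w)/(w+s)$, one function gives the ratio reported for either metric. This sketch assumes only that proportionality; the spacing-equals-width capacitance model $c = 0.5 + 0.1w + 1/s$ with $s = w$ (widths in units of the minimum width) is hypothetical.

```python
def total_repeater_ratio(w, cap, spacing, w_min=1.0):
    """Total repeater area (and, by the same form, power) at width w relative to w_min.

    Per line, repeater area and power scale as c(w); the number of lines scales as 1 / (w + s).
    """
    def per_unit_edge(x):
        s = spacing(x)
        return cap(x, s) / (x + s)

    return per_unit_edge(w) / per_unit_edge(w_min)


# Hypothetical normalised lines (widths in units of w_min).
def cap_equal_spacing(w, s):
    return 0.5 + 0.1 * w + 1.0 / s  # illustrative c_f + c_pp * w + c_s / s


def equal_spacing(w):
    return w


ratios = {w: total_repeater_ratio(w, cap_equal_spacing, equal_spacing) for w in (1.0, 2.0, 4.0, 8.0)}
```

Doubling the width in this example cuts the total repeater area and power to 37.5% of the minimum-width value, because both the per-line cost grows slowly and the line count halves.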
We now consider the following two cases separately. 1) Minimum-spaced lines.
2) Line spacing is the same as line width.
A. Minimum-Spaced Lines
In an interconnect system, the technology determines the interlayer dielectric thickness, the metal line thickness, and the minimum metal width and spacing at a given metal layer. For higher performance or throughput, one can increase the line width in order to decrease the line resistance. However, in order not to severely limit the wireability of the chip, the wires should be minimum spaced. This is especially true for deep-submicron technologies, where designs are mostly wire-limited at the global tiers. The aspect ratio of minimum-sized global interconnects is two to three, the interlayer dielectric thickness is 2-3 times larger than the minimum interwire spacing on a given metal layer, and adjacent metal layers are orthogonal to each other [1]. It follows that the capacitance of a minimum-spaced global line is dominated by components that do not scale with line width (Fig. 2). Therefore, increasing the line width without changing the spacing does not significantly increase the interconnect capacitance. For instance, as shown in Fig. 3 for the 130 nm technology, the increase in global line width considered there raises the interconnect capacitance per unit length by only 22%. This is because the parallel-plate component of $c$ to the upper and lower metal layers, which is proportional to line width, is a small fraction of the total line capacitance of a minimum-sized wire (Fig. 2).
Line capacitance per unit length can be written as $c(w) = c_f + c_{pp} w$, where $c_f$ represents the total fringing and sidewall capacitance, which is independent of $w$, and $c_{pp} w$ represents the parallel-plate capacitance to the top and bottom layers. Also, $s = s_{\min}$ is constant. Therefore, from (1), $w_{opt}$ is the positive root of $2c_{pp}w^2 + (1-n)c_f w - (n+1)c_f s_{\min} = 0$, i.e.,

$$w_{opt} = \frac{(n-1)c_f + \sqrt{(n-1)^2 c_f^2 + 8(n+1)\,c_f c_{pp} s_{\min}}}{4c_{pp}}.$$

Note that for $n = 0$, the FOM is the rate of bit transfer per unit width $B$ itself. Therefore, for minimum-spaced lines, $B$ has a maximum at a particular line width, given by the above expression with $n = 0$. This is in sharp contrast to the findings in [10], where it was reported that $B$ asymptotes to a fixed value as width decreases.
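The closed form for minimum-spaced lines can be checked against the stationarity condition it comes from. This sketch evaluates the positive root of the quadratic and verifies, by a central finite difference, that $\ln(\mathrm{FOM})$ has zero slope there; the coefficients $c_f = 1$, $c_{pp} = 0.1$, $s_{\min} = 1$ (widths in units of the minimum width) are hypothetical.

```python
import math


def w_opt_min_spaced(n, c_f, c_pp, s_min):
    """Positive root of 2*c_pp*w**2 + (1 - n)*c_f*w - (n + 1)*c_f*s_min = 0."""
    a = 2.0 * c_pp
    b = (1.0 - n) * c_f
    k = -(n + 1.0) * c_f * s_min
    return (-b + math.sqrt(b * b - 4.0 * a * k)) / (2.0 * a)


def log_fom_min_spaced(w, n, c_f, c_pp, s_min):
    """ln FOM up to an additive constant for c(w) = c_f + c_pp * w and fixed spacing."""
    return 0.5 * (n + 1) * math.log(w / (c_f + c_pp * w)) - math.log(w + s_min)


# Hypothetical normalised coefficients (widths in units of w_min).
c_f, c_pp, s_min = 1.0, 0.1, 1.0
w_opt = {n: w_opt_min_spaced(n, c_f, c_pp, s_min) for n in (0, 1, 2)}
```

With these numbers the $n = 0$ optimum falls slightly below one minimum width, illustrating (but not reproducing) the kind of infeasible sub-minimum optimum reported for $B$ in Section IV.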
B. Line Spacing Equal to Line Width
This case is similar to the previous one except that the line capacitance is given by $c(w) = c_f + c_{pp} w + c_s/s$ with $s = w$, where the first term represents the constant fringing capacitance, the second term represents the parallel-plate capacitance to the top and bottom layers of metal, which is proportional to the width, and the last term represents the parallel-plate capacitance to the neighboring wires, which is inversely proportional to the spacing. Also, $w + s = 2w$. For this case, from (1), $w_{opt}$ is the positive root of $2c_{pp}w^2 - (n-1)c_f w - 2n\,c_s = 0$, i.e.,

$$w_{opt} = \frac{(n-1)c_f + \sqrt{(n-1)^2 c_f^2 + 16\,n\,c_{pp} c_s}}{4c_{pp}}.$$

For $n = 0$, the optimum width is zero, which means that the FOM, which is then the rate of bit transfer per unit width $B$, keeps increasing as $w$ decreases, so $w$ should be kept at its minimum for maximum $B$.
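The same check works when spacing tracks width. This sketch confirms that the root is exactly zero for $n = 0$ (the FOM then only grows as $w$ shrinks) and that the root is a stationary point of $\ln(\mathrm{FOM})$ for $n = 1, 2$; the coefficients $c_f = 0.5$, $c_{pp} = 0.1$, $c_s = 1$ are hypothetical normalized values.

```python
import math


def w_opt_equal_spacing(n, c_f, c_pp, c_s):
    """Positive root of 2*c_pp*w**2 - (n - 1)*c_f*w - 2*n*c_s = 0 (spacing equal to width)."""
    a = 2.0 * c_pp
    b = -(n - 1.0) * c_f
    k = -2.0 * n * c_s
    return (-b + math.sqrt(b * b - 4.0 * a * k)) / (2.0 * a)


def log_fom_equal_spacing(w, n, c_f, c_pp, c_s):
    """ln FOM up to an additive constant for c(w) = c_f + c_pp*w + c_s/w and pitch 2w."""
    c = c_f + c_pp * w + c_s / w
    return 0.5 * (n + 1) * math.log(w / c) - math.log(2.0 * w)


c_f, c_pp, c_s = 0.5, 0.1, 1.0  # hypothetical normalised coefficients
w_opt = {n: w_opt_equal_spacing(n, c_f, c_pp, c_s) for n in (0, 1, 2)}
```

For $n = 0$ the FOM reduces to $1/(2\sqrt{w\,c(w)})$ up to a constant, and $w\,c(w) = c_f w + c_{pp} w^2 + c_s$ grows with $w$, which is why no interior optimum exists.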
III. PARAMETER EXTRACTION
We used FASTCAP [14] to extract the capacitance per unit length of global interconnects as a function of line width for ITRS 2001 technology nodes down to 45 nm, for both cases (i.e., when lines are minimum spaced and when line spacing is equal to line width). Device parameters were extracted using SPICE simulation, similar to [12]. A five-stage ring oscillator with a given length of global interconnect of a given width between each stage was simulated. The interconnect length and inverter size were varied to obtain the minimum stage delay per unit length. The repeater parameters $r_s$, $c_0$, and $c_p$ were calculated from the resulting optimal interconnect length $l_{opt}$, repeater size $h_{opt}$, and delay per unit length $\tau_{l,opt}$.
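The extraction step can be sketched by inverting the optimal-buffering relations. This is an illustration, not the authors' procedure: it assumes the Elmore-based forms $l_{opt} = \sqrt{2r_s(c_0+c_p)/(rc)}$, $h_{opt} = \sqrt{r_s c/(r c_0)}$ and $\tau_{l,opt} = 2\sqrt{r_s c_0 rc} + \sqrt{2 r_s (c_0+c_p) rc}$, and it substitutes a synthetic round trip with made-up values for the SPICE measurements.

```python
import math


def extract_repeater_params(l_opt, h_opt, tau_l, r, c):
    """Recover r_s, c_0, c_p from a simulated optimum (l_opt, h_opt, tau_l) of a line with r, c.

    Uses sqrt(2 r_s (c_0 + c_p) r c) = l_opt * r * c, so tau_l - l_opt*r*c = 2 sqrt(r_s c_0 r c),
    together with r_s / c_0 = h_opt**2 * r / c.
    """
    rc = r * c
    excess = tau_l - l_opt * rc  # equals 2 * sqrt(r_s * c_0 * r * c) under the assumed model
    if excess <= 0:
        raise ValueError("tau_l too small for this r, c and l_opt under the assumed model")
    r_s = excess * h_opt / (2.0 * c)
    c_0 = excess / (2.0 * h_opt * r)
    c_p = l_opt ** 2 * rc / (2.0 * r_s) - c_0
    return r_s, c_0, c_p


# Round trip with hypothetical SI values: synthesise the "measured" optimum, then invert it.
r, c = 4.0e4, 2.0e-10
true_params = (1.0e4, 1.0e-15, 0.5e-15)
r_s0, c_00, c_p0 = true_params
l_meas = math.sqrt(2 * r_s0 * (c_00 + c_p0) / (r * c))
h_meas = math.sqrt(r_s0 * c / (r * c_00))
tau_meas = 2 * math.sqrt(r_s0 * c_00 * r * c) + math.sqrt(2 * r_s0 * (c_00 + c_p0) * r * c)
recovered = extract_repeater_params(l_meas, h_meas, tau_meas, r, c)
```

Three measured quantities determine the three unknowns uniquely under this model; if the delay model of [8] uses different coefficients, the same inversion applies with those coefficients substituted.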
IV. RESULTS
The methodology outlined above was used to optimize global interconnect width for maximum FOM for ITRS 2001 technology nodes down to 45 nm. Device models were found to be extremely unreliable at the 32 nm and 22 nm nodes, and these nodes were therefore not included in this study. NMOS and PMOS off currents were estimated similarly to [15]. The relevant technology parameters are shown in Table I. was assumed to be equal to across all technology nodes. Table II shows the calculated optimum width $w_{opt}$ as a ratio of the minimum width $w_{\min}$ for various technologies for all cases. Note that for minimum-spaced lines, the optimum value of $w$ that maximizes $B$ ($n = 0$) is approximately 13% less than $w_{\min}$ for all technologies. This is clearly not feasible; however, as shown in Fig. 5, $B$ at $w_{\min}$ is only 0.3% lower than its optimal value. This was also found to be true across all the technologies considered.
Also note that for all cases, the optimum interconnect width remains below the width at which inductive effects become significant [12], so inductance effects are not significant. To further verify this, Fig. 6 plots the critical inductance $\ell_{crit}$ (see Appendix) as a function of line inductance for minimum-width and 7.5 times minimum-width global lines at the 130 nm technology node. As pointed out in [12], if the line inductance is less than $\ell_{crit}$, the interconnect system is overdamped and inductive effects are negligible. From Fig. 6, we observe that even for $w = 7.5\,w_{\min}$, the interconnect is overdamped over most of the practical range of line inductance values. However, this may not be true for wider lines. Further, note that the $w_{opt}/w_{\min}$ values are similar across all technology nodes when line spacing is kept minimum for $n = 1$ and 2, while they increase with technology scaling when line spacing is equal to line width. Also note that we have not included the trivial and infeasible result for $n = 0$ and $s = w$ (i.e., $w_{opt} = 0$). Also, for $n = 0$ and $s = s_{\min}$, the FOM at $w_{\min}$ is only 0.3% lower than the optimal value at $w_{opt} \approx 0.87\,w_{\min}$. In the following series of results (Tables III-VI), we always report the ratio of a performance metric at $w_{opt}$ to the corresponding value at $w_{\min}$; we therefore exclude the case $n = 0$ from now on. Table III shows the optimum delay per unit length at $w_{opt}$ as a fraction of the optimum delay per unit length at $w = w_{\min}$. As expected, increasing the width of the wires reduces the delay per unit length significantly. When the line width is increased from $w_{\min}$ to the $n = 1$ optimum in Table II (for both cases), the delay improvement is significant. However, as the line width is increased further to 7-8 $w_{\min}$ (the $n = 2$ optimum for both cases in Table II), the incremental improvement in delay is much smaller. This is expected since $\tau_{l,opt} \propto \sqrt{c(w)/w}$, and as $w$ becomes very large, the line capacitance is dominated by the parallel-plate component $c_{pp} w$, so $c(w)/w$ approaches the constant $c_{pp}$. Also note that the relative improvements in delay are not very sensitive to technology scaling.
Table IV shows the total repeater area of all interconnects at the global tier when $w = w_{opt}$ as a fraction of the total repeater area when $w = w_{\min}$. As pointed out earlier, this fraction is also the ratio of the total repeater power dissipation at the global tier at $w_{opt}$ to that at $w_{\min}$. It can be observed that as the line width is increased ($w_{opt}$ increases with $n$ in Table II), the total repeater area (and power dissipation) decreases dramatically, even though the size, and therefore the area and power dissipation, of a single repeater increases. This is because wider wires result in a large increase in the optimal interbuffer interconnect length and also in fewer interconnects at a given tier. Therefore, the total repeater power dissipation is reduced dramatically. Table V shows the rate of bit transfer per unit width $B$ at $w_{opt}$ as a fraction of $B$ at $w_{\min}$. As indicated in Fig. 5, for minimum-spaced lines $B$ peaks at approximately $0.87\,w_{\min}$ and is only 0.3% lower at $w_{\min}$. Therefore, if the primary goal of the design is to maximize $B$, minimum-sized, minimum-spaced wires as prescribed by the ITRS should be used. However, if the delay needs to be improved, the wire width should be increased at the expense of $B$. Note that the ratio of $B$ at $w_{opt}$ to $B$ at $w_{\min}$ is fairly insensitive to technology scaling. Table VI shows the ratio of the optimized FOM at $w_{opt}$ to the FOM at $w_{\min}$. If this ratio were very close to 1, the above optimizations would not be significantly improving the user-specified FOM and would therefore not be very useful. However, Table VI shows that these ratios differ substantially from 1, indicating a nontrivial improvement in the FOM at $w_{opt}$ compared to $w_{\min}$, which further emphasizes the utility of these optimizations. Also note that, except for one of the FOMs in the case of spacing equal to width, the improvement in FOM at the optimum width is fairly insensitive to technology scaling.
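The delay, bandwidth, and FOM ratios reported in Tables III, V, and VI follow from the same proportionalities, with all layer constants cancelling in the ratio. This sketch computes them for a hypothetical minimum-spaced line with $c = 1 + 0.1w$ and $n = 1$ (widths in units of $w_{\min}$); its outputs illustrate the trends only and are not the paper's tabulated values.

```python
import math


def metric_ratios(w_opt, n, cap, spacing, w_min=1.0):
    """Ratios at w_opt relative to w_min of tau_l, B and FOM (constant factors cancel)."""
    def tau_l(w):
        return math.sqrt(cap(w, spacing(w)) / w)        # tau_l,opt up to the constant K

    def bandwidth(w):
        return 1.0 / ((w + spacing(w)) * tau_l(w))      # B up to a constant

    def fom(w):
        return bandwidth(w) / tau_l(w) ** n

    return {
        "delay": tau_l(w_opt) / tau_l(w_min),
        "bandwidth": bandwidth(w_opt) / bandwidth(w_min),
        "fom": fom(w_opt) / fom(w_min),
    }


def cap_min_spaced(w, s):
    return 1.0 + 0.1 * w  # hypothetical c_f = 1.0, c_pp = 0.1


def fixed_spacing(w):
    return 1.0


# n = 1 optimum of this hypothetical c(w): positive root of 0.2 w^2 - 2 = 0.
r1 = metric_ratios(math.sqrt(10.0), 1, cap_min_spaced, fixed_spacing)
```

The pattern matches the discussion above: delay and bandwidth ratios both fall below 1 while the FOM ratio exceeds 1, since the FOM trades bandwidth for latency by construction.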
V. CONCLUSION
In conclusion, we have developed a new methodology for optimizing global interconnect width that maximizes a user-specified FOM, which is a function of the data rate per unit chip edge and the interconnect delay per unit length. Using this methodology, we have developed expressions for the optimum interconnect width for typical FOMs for two extreme scenarios regarding line spacing: 1) spacing kept constant at its minimum value and 2) spacing kept equal to line width. We have used these expressions to compute the optimal global interconnect width and quantified the effect of increasing the line width on delay per unit length, total repeater area, power dissipation, and bandwidth. As expected, an increase in line width decreases the optimal delay per unit length (i.e., latency), total buffer area, and power dissipation, but severely degrades the rate at which bits can be transmitted per unit chip edge, i.e., bandwidth. We also observed that in most cases, the relative increase in line width (from $w_{\min}$ to $w_{opt}$), the relative improvement in delay per unit length, total repeater area, and power dissipation, and the relative degradation in the data rate per unit chip edge are fairly insensitive to technology scaling. This work has significant implications for signaling and design optimization of global interconnects in future nanometer-scale technologies.
APPENDIX CRITICAL INDUCTANCE
Consider a uniform line with resistance, capacitance, and inductance per unit length of $r$, $c$, and $\ell$, respectively, driven by a repeater with series resistance $R_s$ and output parasitic capacitance $C_p$, and driving an identical repeater with load capacitance $C_L$ (Fig. 7). For a given technology, let the output resistance, output parasitic capacitance, and input capacitance of a minimum-sized repeater be $r_s$, $c_p$, and $c_0$, respectively. Therefore, if the repeater size is $h$ times that of a minimum-sized repeater, $R_s = r_s/h$, $C_p = h c_p$, and $C_L = h c_0$. The transfer function derivation from [12] is outlined here for completeness. The ABCD parameter matrix of a uniform transmission line of length $l$ is given by [12]

$$T_{line} = \begin{bmatrix} \cosh\theta l & Z_0 \sinh\theta l \\ \dfrac{\sinh\theta l}{Z_0} & \cosh\theta l \end{bmatrix},$$

where $s$ is the complex frequency (in this Appendix, $s$ denotes the complex frequency rather than line spacing), $\theta = \sqrt{(r + s\ell)\,s c}$, and $Z_0 = \sqrt{(r + s\ell)/(s c)}$. Therefore, the ABCD parameter matrix of the configuration in Fig. 7 is

$$\begin{bmatrix}1 & R_s\\ 0 & 1\end{bmatrix}\begin{bmatrix}1 & 0\\ sC_p & 1\end{bmatrix} T_{line} \begin{bmatrix}1 & 0\\ sC_L & 1\end{bmatrix},$$

and the input-output transfer function is the reciprocal of its (1,1) element:

$$H(s) = \frac{1}{(1 + sR_sC_p)\cosh\theta l + \dfrac{R_s}{Z_0}\sinh\theta l + sC_L\left[(1 + sR_sC_p)\,Z_0\sinh\theta l + R_s\cosh\theta l\right]}.$$

The step response of this system is $H(s)/s$ in the Laplace domain. However, computing the response in the time domain is analytically intractable. The above transfer function is therefore approximated by a second-order Padé approximation as

$$H(s) \approx \frac{1 + a_1 s}{1 + b_1 s + b_2 s^2}, \qquad (2)$$

where $a_1$, $b_1$, and $b_2$ are functions of $r$, $\ell$, $c$, $l$, $R_s$, $C_p$, and $C_L$. The 50% delay is the time at which the step response of (2) reaches half its final value, and this transfer function can be used to calculate it [16]. Long VLSI interconnects are typically broken up into buffered segments of equal length driven by identical repeaters. For minimum total delay in these long interconnects, the delay per unit length of the optimally buffered segment should be minimized. The driver size and interconnect length can be numerically optimized to give the minimum delay per unit length [16], [17].
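The exact transfer function can be evaluated numerically from the ABCD cascade, which is useful for checking any reduced-order fit. This sketch uses Python's `cmath` and hypothetical segment values; as a sanity check, for small real $s$ it should satisfy $H(s) \approx 1 - s\,\tau_{Elmore}$, since inductance enters only at order $s^2$.

```python
import cmath


def transfer_function(s, r, ell, c, length, R_s, C_p, C_L):
    """V_out/V_in for driver (R_s, C_p) -> uniform RLC line -> load C_L, via ABCD cascade.

    H(s) = 1 / A_total, where A_total is the (1,1) entry of
    [series R_s][shunt C_p][line][shunt C_L] with the output port open.
    """
    z = r + s * ell            # series impedance per unit length
    y = s * c                  # shunt admittance per unit length
    theta = cmath.sqrt(z * y)  # propagation constant
    z0 = cmath.sqrt(z / y)     # characteristic impedance
    ch = cmath.cosh(theta * length)
    sh = cmath.sinh(theta * length)
    k = 1 + s * R_s * C_p
    a_total = k * ch + R_s * sh / z0 + s * C_L * (k * z0 * sh + R_s * ch)
    return 1 / a_total


# Hypothetical segment (SI units): repeater of size h built from r_s = 1e4, c_p = 0.5 fF, c_0 = 1 fF.
h = 200.0
seg = dict(r=4.0e4, ell=4.0e-7, c=2.0e-10, length=2.0e-3,
           R_s=1.0e4 / h, C_p=h * 0.5e-15, C_L=h * 1.0e-15)
```

Evaluating along $s = j\omega$ gives the frequency response; the Padé coefficients in (2) would instead be fitted to the low-order moments of this function.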
The second-order transfer function given by (2) and discussed in [16], [17] is critically damped, overdamped, or underdamped when $b_1^2 - 4b_2$ is equal to, greater than, or less than zero, respectively. The response of an overdamped system is very similar to that of an RC line, whereas for an underdamped system the behavior differs significantly from that of an RC line, i.e., inductive effects are significant. Since $b_1$ and $b_2$ are functions of the repeater size $h$ and segment length $l$, whose delay-optimal values in turn depend on the line inductance $\ell$, it has been shown [16] that for the optimum values of $h$ and $l$, where the interconnect delay is minimum for a given line inductance, a value $\ell_{crit}$ can be obtained for which the system is critically damped [16]. If the line inductance is less than $\ell_{crit}$, the system is overdamped, whereas if it is greater than $\ell_{crit}$, the system is underdamped.
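The damping test on (2) depends only on its denominator, since the numerator $1 + a_1 s$ does not move the poles. This small sketch classifies a given pair $(b_1, b_2)$, assumed to have $b_2 > 0$, and returns the two poles; finding $\ell_{crit}$ would additionally require $b_1$ and $b_2$ as functions of $\ell$ at the optimal $h$ and $l$, which is not reproduced here.

```python
import cmath


def damping(b1, b2, rel_tol=1e-12):
    """Classify 1 / (1 + b1 s + b2 s^2) by the sign of b1**2 - 4*b2 and return its poles."""
    disc = b1 * b1 - 4.0 * b2
    if abs(disc) <= rel_tol * b1 * b1:
        kind = "critically damped"
    elif disc > 0:
        kind = "overdamped"      # two real poles: RC-like response
    else:
        kind = "underdamped"     # complex-conjugate poles: inductive ringing
    root = cmath.sqrt(disc)
    poles = ((-b1 + root) / (2.0 * b2), (-b1 - root) / (2.0 * b2))
    return kind, poles
```

For example, `damping(3.0, 1.0)` reports an overdamped response with two real poles, while `damping(1.0, 1.0)` reports an underdamped one with a complex-conjugate pair.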
