Abstract: This paper presents a detailed empirical study and an analytical derivation of the voltage waveform and energy dissipation of global lines driven by CMOS drivers. It is shown that at high clock frequencies, where the output voltage at the termination point of the transmission line may not reach its steady-state value within the clock period, driver sizing can reduce energy dissipation while still meeting a dc noise margin. This is in sharp contrast with the steady-state analysis, which states that driver size has no impact on the energy dissipation per output change. In addition, we propose a new design metric: the product of energy, delay, and a measure of ringing in lossy transmission lines. In particular, this paper provides closed-form expressions for the energy dissipation, the 50% propagation delay, and the percentage of maximum undershoot when the circuit exhibits underdamped behavior. This metric is used to formulate the driver sizing problem for minimum energy-delay-ringing product. Experimental results obtained from HSPICE simulations verify the accuracy of our models.
that the global wires exhibit transmission line effects, including electromagnetic coupling. Meanwhile, as technology feature sizes continue to shrink, many new effects are being observed in nanometer technologies. Some significant deep sub-quarter-micrometer effects are caused by increasing cross-coupling capacitance and coupling inductance. So far, the well-known $CV^2$ model has been used as the interconnect energy model, where $C$ includes the capacitance of the interconnect as well as the capacitances of the driving and driven circuitry, and $V$ is the voltage swing. This model, however, fails to predict the interconnect energy dissipation in the current range of clock frequencies, where the signal transients do not settle to a steady-state value because of the short clock cycle time. Moreover, this model considers neither the coupling noise imposed by neighboring wires nor other transmission line properties. As we will see in this paper, these effects must be taken into account in the energy calculations; otherwise, the results will be erroneous.
An analytical interconnect energy model that considers event coupling was proposed in [3]. The authors used nodal equations for a system of interconnects to obtain the state vector of the system, and the state vectors were then used in the interconnect energy dissipation expression. This approach does not capture transmission line effects, and it also assumes that the system reaches the steady state. In [4], the authors showed that distributed RC circuits do not capture all the behaviors of lossy transmission lines that the transmission line equations do capture. Taylor et al. proposed a deep-submicrometer (DSM)-aware power estimation methodology using a three-wire lookup table [5]. The dissipated energy of each individual interconnect is computed considering the capacitive coupling effects of the immediately adjacent wires.
Using detailed SPICE simulations of all possible types of transitions on a group of three adjacent wires, the authors created a three-wire lookup table. The total energy dissipation was then obtained as the sum of the energy dissipations of the individual interconnects [5]. However, [5] does not consider the transient behavior of the interconnect in the energy calculations.
In this paper, accurate expressions for the energy dissipation of coupled interconnects are obtained while accounting for the transmission line effects on the energy dissipation. We provide empirical evidence as well as detailed closed-form analytical expressions for the energy dissipation of a lossy transmission line that is driven by a CMOS inverter and terminated by a CMOS load. We show that this circuit configuration behaves like a new RLC circuit topology, and therefore that energy dissipation at a given clock frequency can be reduced by driver sizing. The effect of driver sizing is to change the output behavior (from overdamped to underdamped or vice versa). By carefully selecting the $W/L$ ratio of the line driver, we may thus reduce the overall energy dissipation of the line and the line driver. This is accomplished by forcing the input transition to start when the output is in the undershoot region (for the underdamped case), or when the output has met the dc noise margin but has not yet reached the steady state (for the overdamped case). Clearly, other critical design metrics, such as the propagation delay and the ringing, must also be taken into consideration during driver sizing. We therefore present a driver sizing technique that minimizes the energy-delay-ringing (EDR) product.
Section II presents two circuit models for the lossy transmission line: an RLC circuit configuration called the RLC-π circuit, and an RLC circuit. In Section III, the RLC-π and RLC circuits are used to derive the total energy dissipation of a transmission line driven by a CMOS inverter for large and small $W/L$ ratios of the driving transistors, respectively. In Section IV, a new metric, the EDR product, is introduced and shown to be well suited to energy optimization under a delay constraint. Simulations and experimental results provided throughout this section confirm the accuracy of our model and the usefulness of our metric. Finally, Section V presents the conclusions of our paper.
II. ENERGY DISSIPATION OF PASSIVE RLC CIRCUITS
A common way of studying the parasitic effects of an on-chip interconnect on the performance of a VLSI circuit is to model the interconnect as a large number of cascaded RLC ladder sections. Therefore, a relevant starting point for studying the energy dissipation of on-chip interconnects is the energy dissipation of the passive RLC circuit depicted in Fig. 1 when it is excited by a unit-step voltage. Depending on the relative values of the circuit elements, this circuit exhibits one of the two possible transient responses, which are also depicted in Fig. 1. The total energy delivered by the input source $v_{in}(t)$ to the passive circuit is
$$E = \int_0^{\infty} v_{in}(t)\, i(t)\, dt \qquad (1)$$
where $i(t)$ is the current drawn from the source.
In the next two sub-sections, we obtain the total as well as the dissipated energy for both underdamped and overdamped RLC circuits. 
A. Energy Dissipation of an Underdamped RLC Circuit
In the underdamped case, the voltage and current transient waveforms oscillate toward their steady-state values. This transient behavior occurs when $R < 2\sqrt{L/C}$. In terms of energy, the energy stored in the capacitor and/or the inductor is transferred back and forth between the reactive elements. If the circuit is lossless ($R = 0$), this energy transfer continues endlessly. With a resistor present in the circuit, however, a portion of the energy is dissipated in the resistor. To obtain the energy dissipated by the circuit, we first obtain the total energy generated by the input source, given by (2), where $i_{ud}(t)$ is the current flowing through the underdamped circuit of Fig. 1. Suppose that the input source to the RLC circuit is a periodic rectangular waveform, which is nearly the case in digital integrated circuits. The total energy delivered by the input source during a low-to-high transition of the input source is given by (3), where $\alpha = R/(2L)$ is the damping constant, $\omega_0 = 1/\sqrt{LC}$ is the resonant frequency, and $\omega_d = \sqrt{\omega_0^2 - \alpha^2}$ is the oscillation frequency. Fig. 2 shows the energy variation as a function of the fundamental period $T$ for an underdamped RLC circuit excited by a periodic rectangular voltage signal. Note that for small periods, the $CV^2$ energy model gives a wrong value. The energy dissipated in the low-to-high transition of the input source is given by (4). As $T$ increases, the second term inside the bracket in (3) and (4) becomes smaller, and in the limit, the energy expression reduces to its steady-state value.
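To illustrate numerically why the $CV^2$ model fails for short periods, the following minimal Python sketch applies a single step of amplitude $V$ at $t = 0$ to the series RLC circuit of Fig. 1, starting from rest, and stops at $\tau$ (for example, $\tau = T/2$). It uses the textbook series RLC step response with the $\alpha$, $\omega_0$, and $\omega_d$ defined above. The function name, the element values, and the zero-initial-state assumption are ours, so this is not a transcription of (3) and (4), which may treat the periodic excitation differently.

```python
import math

def underdamped_step_energy(R, L, C, V, tau):
    """Return (delivered, dissipated) energy at t = tau for a step of amplitude V
    applied at t = 0 to a series RLC circuit initially at rest.

    Assumes the underdamped case R < 2*sqrt(L/C) and zero initial conditions.
    """
    alpha = R / (2.0 * L)                 # damping constant
    w0 = 1.0 / math.sqrt(L * C)           # resonant frequency
    if alpha >= w0:
        raise ValueError("circuit is not underdamped")
    wd = math.sqrt(w0**2 - alpha**2)      # oscillation frequency
    decay = math.exp(-alpha * tau)
    # Capacitor voltage and loop current of the series RLC step response.
    vc = V * (1.0 - decay * (math.cos(wd * tau) + alpha / wd * math.sin(wd * tau)))
    i = V / (wd * L) * decay * math.sin(wd * tau)
    delivered = C * V * vc                # V * q(tau), since q(tau) = C * vc(tau)
    stored = 0.5 * C * vc**2 + 0.5 * L * i**2
    return delivered, delivered - stored  # the remainder has been dissipated in R


if __name__ == "__main__":
    R, L, C, V = 10.0, 1e-9, 1e-12, 1.0   # illustrative values only
    for tau in (0.05e-9, 0.1e-9, 0.2e-9, 1e-8):
        e_src, e_diss = underdamped_step_energy(R, L, C, V, tau)
        print(f"tau={tau:.2e} s  delivered/CV^2={e_src / (C * V**2):.3f}  "
              f"dissipated/CV^2={e_diss / (C * V**2):.3f}")
```

For long $\tau$ the delivered and dissipated energies settle to $CV^2$ and $\frac{1}{2}CV^2$. Near the first overshoot ($\tau = \pi/\omega_d$) the delivered energy exceeds $CV^2$, and near the first undershoot ($\tau = 2\pi/\omega_d$) it falls below $CV^2$.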
B. Energy Dissipation of an Overdamped RLC Circuit
In the overdamped case, the resistance is large enough (i.e., $R > 2\sqrt{L/C}$) to eliminate resonances from the current and voltage waveforms. The total energy delivered by the input source is the same as (1), which is rewritten here for convenience as (5), where $i_{od}(t)$ is the current flowing through the overdamped circuit of Fig. 1. As in the underdamped case, consider a periodic rectangular waveform at the input. The total energy delivered by the input source is given by (6). Fig. 3 shows the energy variation as a function of the fundamental period. The error caused by using the $CV^2$ model is smaller in the overdamped case than in the underdamped case. However, the underdamped response occurs more frequently in practice, because the resistive loss of on-chip interconnects is small, particularly for circuits fabricated in copper technology.
The energy dissipated in the low-to-high transition of the input source is given by (7). Once again, as $T$ increases, the energy expressions in (6) and (7) approach their steady-state values.
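A companion sketch for the overdamped case follows, again under our own assumptions: a single step applied at $t = 0$ to a series RLC circuit at rest, with hypothetical element values. The two real natural frequencies $s_{1,2} = -\alpha \pm \sqrt{\alpha^2 - \omega_0^2}$ replace the damped oscillation, so the delivered energy rises monotonically toward $CV^2$ and never overshoots it.

```python
import math

def overdamped_step_energy(R, L, C, V, tau):
    """Return (delivered, dissipated) energy at t = tau for a step of amplitude V
    applied at t = 0 to a series RLC circuit at rest (overdamped: R > 2*sqrt(L/C))."""
    alpha = R / (2.0 * L)
    w0 = 1.0 / math.sqrt(L * C)
    if alpha <= w0:
        raise ValueError("circuit is not overdamped")
    root = math.sqrt(alpha**2 - w0**2)
    s1, s2 = -alpha + root, -alpha - root           # two real, negative natural frequencies
    e1, e2 = math.exp(s1 * tau), math.exp(s2 * tau)
    vc = V * (1.0 - (s2 * e1 - s1 * e2) / (s2 - s1))  # monotonic rise toward V
    i = V / L * (e2 - e1) / (s2 - s1)                 # i = C dvc/dt, using s1*s2 = 1/(LC)
    delivered = C * V * vc
    stored = 0.5 * C * vc**2 + 0.5 * L * i**2
    return delivered, delivered - stored


if __name__ == "__main__":
    R, L, C, V = 200.0, 1e-9, 1e-12, 1.0            # illustrative values only
    for tau in (1e-11, 1e-10, 1e-9, 1e-8):
        e_src, _ = overdamped_step_energy(R, L, C, V, tau)
        print(f"tau={tau:.0e} s  delivered/CV^2={e_src / (C * V**2):.4f}")
```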
C. Frequency-Domain Analysis
Some observations can be made from the foregoing analysis. First of all, in the steady state, the energy dissipated in a passive RLC circuit excited by a step input of amplitude $V$ is $\frac{1}{2}CV^2$, irrespective of the circuit response (i.e., overdamped or underdamped). From another perspective, the capacitor charges up to the input step voltage $V$ and, in the steady state, behaves as an open circuit. Therefore, the total stored energy appears as electric field energy across the capacitor ($\frac{1}{2}CV^2$). An important task is to find a simple equivalent circuit for a given RLC circuit that can be used directly to obtain the total energy generated by the input source. To find such an equivalent circuit, first consider the driving-point admittance of the RLC circuit of Fig. 1,
$$Y(s) = \frac{sC}{1 + sRC + s^2LC} \qquad (8)$$
The low-frequency behavior of $Y(s)$ represents the equivalent dc driving-point admittance of the circuit in the steady state. This simple observation will be used later to simplify the driving-point admittance and to derive the energy dissipation of a coupled lossy transmission line. A simple calculation reveals that, for the RLC circuit shown in Fig. 1, $Y(s) \approx sC$ as $s \to 0$. The current flowing into this low-frequency equivalent is thus an impulse, $i(t) = CV\delta(t)$, and the total energy delivered by the source is
$$E = \int_{0^-}^{\infty} V \cdot CV\delta(t)\, dt = CV^2 \qquad (9)$$
Consequently, the energy transferred from the step voltage source to the RLC circuit of Fig. 1 is the same as the energy delivered by the step source to the low-frequency component of the driving-point admittance of this RLC circuit. For the RLC circuit in Fig. 1, this low-frequency component is the capacitor.
As a generalization, consider a circuit consisting of $n$ RLC sections in cascade that is excited by a step voltage of amplitude $V$, as shown in Fig. 4(a). The equivalent circuit of each RLC subsection consists solely of the capacitor $C_i$ of that subsection. As a consequence, the equivalent circuit of Fig. 4(a) is the all-capacitive circuit shown in Fig. 4(b). The total energy delivered by the source in the steady state is
$$E = V^2 \sum_{i=1}^{n} C_i \qquad (10)$$
Equation (10) agrees with the well-known $CV^2$ model for the on-chip interconnect. Although (10) restates a well-known result, the reasoning used to derive it plays a key role in our analysis of the interconnect energy dissipation.
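The equivalence behind (10) can be checked numerically. In the sketch below we assume that each section of Fig. 4(a) is a series R-L branch followed by a shunt capacitor to ground; the helper names and element values are hypothetical. At low frequency, $Y(j\omega)/(j\omega)$ tends to $\sum C_i$, which is the all-capacitive equivalent of Fig. 4(b).

```python
import math

def ladder_input_admittance(sections, omega):
    """Input admittance of cascaded RLC sections at angular frequency omega.

    sections: list of (R, L, C) from the source to the far end; each section is a
    series R-L branch followed by a shunt capacitor C to ground.
    """
    s = 1j * omega
    z_next = None                     # impedance looking into the following section
    for R, L, C in reversed(sections):
        y_node = s * C if z_next is None else s * C + 1.0 / z_next
        z_next = R + s * L + 1.0 / y_node
    return 1.0 / z_next

def steady_state_step_energy(sections, V):
    """Energy delivered by a step of amplitude V once the ladder settles, as in (10)."""
    return V**2 * sum(C for _, _, C in sections)


if __name__ == "__main__":
    sections = [(5.0, 1e-10, 2e-13)] * 4          # illustrative per-section values
    omega = 2 * math.pi * 1e3                     # a low frequency
    y = ladder_input_admittance(sections, omega)
    print("Y/(j*omega) =", (y / (1j * omega)).real, "F")
    print("sum of C_i  =", sum(c for _, _, c in sections), "F")
    print("E from (10) =", steady_state_step_energy(sections, 1.0), "J")
```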
III. ENERGY DISSIPATION OF LOSSY TRANSMISSION LINES
So far, our attention has focused on the energy analysis of single passive RLC circuits. There are, however, two major questions that also need to be addressed. In present-day digital and mixed-signal integrated circuits, the global on-chip interconnects must provide the required connectivity and performance for clock rates of 1.0-3.0 GHz, which lie in the microwave frequency range. Analyzing on-chip wiring effects at these frequencies therefore requires electromagnetic-field theory, and the first question is whether the transmission line effects of on-chip interconnects affect the energy dissipation. In addition, high wiring density and high operating frequencies result in strong capacitive and inductive coupling. Consequently, the second question is whether electromagnetic coupling has any impact on the energy dissipation. This section addresses both questions.
The critical global interconnections, such as clock lines, control lines, and data buses (which can be 32-128 bits wide) between the processor and the on-chip cache, amount to more than 100 K connections [2]. The propagation delay of signals traveling through these global wires is comparable to the time of flight. In other words, the line length is comparable to the wavelength of the propagated signal, $\lambda$, which is on the order of 0.7-2.2 cm. This implies that transmission line properties must be taken into account. It was shown in [4] that any two uniform parallel conductors (the signal path and the return path) used to transmit electromagnetic energy can be considered a transmission line. The return path can be a ground plane, a ground conductor, or a mesh of ground lines on many layers. The solutions to Maxwell's equations for the electric and magnetic fields around the conductors are current and voltage waves, which are functions of the characteristic impedance, $Z_0$, and the propagation constant, $\gamma$. Consider a single transmission line as shown in Fig. 5. Note that long on-chip wires are implemented in the top-level metal layer (for instance, metal 6 in 0.18-μm technology). The top-level metal layers are far above the substrate and are isolated from the lower-level metal layers by interlayer dielectric material. The shunt resistance is therefore very large and is orders of magnitude larger than the capacitive reactance at current clock frequencies. Hence, the per-unit-length shunt conductance is ignored in the circuit representation of Fig. 5.
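The voltage and current waves in the frequency domain at any point $x$ along the line are expressed as combinations of incident and reflected waves:
$$V(x) = V^+ e^{-\gamma x} + V^- e^{\gamma x} \qquad (11)$$
$$I(x) = \frac{1}{Z_0}\left(V^+ e^{-\gamma x} - V^- e^{\gamma x}\right) \qquad (12)$$
where, with the shunt conductance neglected, $Z_0 = \sqrt{(R + sL)/(sC)}$ and $\gamma = \sqrt{(R + sL)\,sC}$, and $R$, $L$, and $C$ are the per-unit-length resistance, inductance, and capacitance of the line. The load termination determines how much of the wave is reflected upon arrival at the end of the wire. The reflection coefficient $\Gamma_L$ gives the fraction of the incident wave that is reflected back into the line as a result of the impedance mismatch between the line and the load impedance $Z_L$:
$$\Gamma_L = \frac{Z_L - Z_0}{Z_L + Z_0} \qquad (13)$$
The concept of the reflection coefficient can be generalized to define the reflected and incident quantities at any arbitrary point along the line.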
The driving-point impedance $Z_{in}$ of the lossy line terminated by a load impedance $Z_L$ is the ratio of the voltage and current waves at the input (source) end:
$$Z_{in} = Z_0\,\frac{Z_L + Z_0 \tanh(\gamma l)}{Z_0 + Z_L \tanh(\gamma l)} \qquad (15)$$
where $l$ is the line length. In the above equation, the load impedance is normally that of the input capacitance $C_L$ of the driven CMOS circuits following the line, i.e., $Z_L = 1/(sC_L)$.
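The quantities $Z_0$, $\gamma$, $\Gamma_L$, and $Z_{in}$ above can be evaluated directly with complex arithmetic. The minimal sketch below assumes $G = 0$ and uses the per-unit-length inductance and capacitance quoted later in this section (870.53 nH/m and 60.57 pF/m) with a 100-fF load. The per-unit-length resistance of 20 kΩ/m is a hypothetical value chosen only for illustration. Two sanity checks follow from (13) and (15): a matched load gives $Z_{in} = Z_0$ and $\Gamma_L = 0$, and a vanishingly short line gives $Z_{in} \approx Z_L$.

```python
import cmath
import math

def line_parameters(r, l, c, omega):
    """Return (Z0, gamma) for per-unit-length R, L, C with the shunt conductance G = 0."""
    s = 1j * omega
    z0 = cmath.sqrt((r + s * l) / (s * c))        # characteristic impedance
    gamma = cmath.sqrt((r + s * l) * s * c)        # propagation constant
    return z0, gamma

def reflection_coefficient(zl, z0):
    """Load reflection coefficient, as in (13)."""
    return (zl - z0) / (zl + z0)

def input_impedance(r, l, c, length, zl, omega):
    """Driving-point impedance of a lossy line terminated by zl, as in (15)."""
    z0, gamma = line_parameters(r, l, c, omega)
    t = cmath.tanh(gamma * length)
    return z0 * (zl + z0 * t) / (z0 + zl * t)


if __name__ == "__main__":
    r, l, c = 2.0e4, 870.53e-9, 60.57e-12       # r (ohm/m) is a hypothetical value
    omega = 2 * math.pi * 1e9
    zl = 1.0 / (1j * omega * 100e-15)          # 100-fF capacitive load
    z0, _ = line_parameters(r, l, c, omega)
    print("|Gamma_L| =", abs(reflection_coefficient(zl, z0)))
    for length in (0.5e-3, 2e-3, 9e-3):
        print(f"l = {length * 1e3:.1f} mm  |Z_in| = {abs(input_impedance(r, l, c, length, zl, omega)):.1f} ohm")
```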
To account for electromagnetic coupling effects in the interconnect energy dissipation, the total per-unit-length inductance and capacitance of the line are modified accordingly. The effect of capacitive coupling is predicted by considering the switching transients of the immediately neighboring wires. The effect of nonadjacent lines is ignored because capacitive coupling is a near-field effect, and the adjacent aggressor lines act as shields for the nonadjacent wires. In contrast, inductive coupling is a far-field effect: nonadjacent lines produce a considerable amount of inductive coupling on the victim line. This makes the analysis of inductive coupling particularly difficult. In addition, the current return paths cannot be easily identified in the circuit [4], which complicates the problem of inductive coupling even further.
The effect of capacitive coupling is taken into account by an alternative adaptation of the Miller theorem [6] to the traveling-wave equations, as illustrated in Fig. 7(b). Comparing the voltage and current waveforms in Fig. 7(a) with those in Fig. 7(b) verifies the accuracy of (16).
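For reference only, the sketch below implements the standard lumped Miller-factor approximation, which may differ from the traveling-wave adaptation in (16). Each adjacent coupling capacitance $C_c$ is replaced by $kC_c$ to ground, where $k = 1 - \Delta V_{agg}/\Delta V_{victim}$. This gives $k = 0$ when the neighbor switches in the same direction, $k = 1$ when it is quiet, and $k = 2$ when it switches in the opposite direction. Only adjacent lines are included, as stated above. The function name and the pF/m values are hypothetical.

```python
def miller_capacitance(c_ground, couplings, victim_swing):
    """Effective per-unit-length capacitance of a victim line (lumped Miller approximation).

    c_ground: capacitance to substrate/ground.
    couplings: iterable of (c_coupling, aggressor_swing) for the adjacent lines only.
    victim_swing: voltage change on the victim line (must be nonzero).
    """
    total = c_ground
    for c_c, aggressor_swing in couplings:
        miller_factor = 1.0 - aggressor_swing / victim_swing
        total += miller_factor * c_c
    return total


if __name__ == "__main__":
    # Illustrative values in pF/m: one neighbor switching opposite, one quiet.
    print(miller_capacitance(40.0, [(10.0, -1.0), (10.0, 0.0)], 1.0))  # 40 + 2*10 + 1*10
```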
Inductive coupling between the lines is a far more complicated phenomenon than capacitive coupling. First of all, unlike capacitive coupling, it is a far-field effect; therefore, the inductive coupling of nonadjacent lines must be accounted for. To simplify the analysis, inductive coupling is addressed for bus structures in which the constituent parallel lines have identical geometries and carry the same currents. For example, in a set of $n$ coupled bus lines, the total per-unit-length inductance of the $i$th line, which is magnetically coupled to the other lines, is
$$L_i = L_{ii} + \sum_{j=1,\, j \ne i}^{n} L_{ij} \qquad (17)$$
where $L_{ii}$ is the self-inductance of line $i$ and $L_{ij}$ is the mutual inductance between lines $i$ and $j$. Note that if the currents are not equal, an $n$-port coupling cannot be reduced to a single scalar form, i.e., the coupling must be expressed in matrix form.
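The following sketch shows the algebraic summation of self- and mutual inductances for equal-magnitude currents. Each line carries a direction sign, so a mutual term adds when two currents flow the same way and subtracts when they are opposed. With all directions equal it reduces to the summation in (17). The function name and the inductance matrix (in nH/m) are hypothetical.

```python
def effective_inductance(l_matrix, i, directions):
    """Per-unit-length inductance seen by line i when all lines carry equal-magnitude currents.

    l_matrix: symmetric matrix with self-inductances on the diagonal, mutual ones elsewhere.
    directions: +1 or -1 per line; the sign product decides whether a mutual term adds or subtracts.
    """
    return sum(l_matrix[i][j] * directions[i] * directions[j] for j in range(len(l_matrix)))


if __name__ == "__main__":
    l_matrix = [[500, 200, 80],
                [200, 500, 200],
                [80, 200, 500]]                      # illustrative values in nH/m
    print(effective_inductance(l_matrix, 1, [1, 1, 1]))    # all currents in the same direction
    print(effective_inductance(l_matrix, 1, [-1, 1, -1]))  # neighbors carry the return current
```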
According to (15), the input impedance of a transmission line is a nonlinear (transcendental) function of frequency. Unfortunately, direct substitution of this expression into the energy equation (the integral of the voltage-current product) does not yield a closed-form expression for the energy dissipation of the lossy transmission line. It is nevertheless possible to simplify (15), using observations similar to those in Section II.C, and obtain an accurate expression for the energy dissipation.
Observation 1: If the transitions of the input waveform to a circuit are sufficiently spaced apart so as to allow the circuit to come very close to its steady-state response, then the total energy delivered by the input source is obtained using the driving-point impedance of the circuit evaluated at low frequencies.
This observation is used here to simplify (15). We evaluate $Z_{in}$ at low frequencies by expanding $\tanh(\gamma l)$ in a Taylor series around $\gamma l = 0$ and truncating the higher order terms. Depending on the order of the truncation, two stable equivalent circuits are extracted.
To account for the interconnect propagation delay, the signal transfer function from the input of the line driver to the output terminal of the interconnect must also be calculated. Therefore, in addition to an equivalent circuit for the driving-point impedance, an equivalent circuit must also be constructed for the signal transfer function. This is discussed in Section III.B.
A. First-Order Truncation
The first-order Taylor expansion of $\tanh(\gamma l)$ is $\gamma l$. This leads to the approximate rational function (18), where $C_t$ is the total interconnect capacitance, including the Miller capacitance of the neighboring lines that are capacitively coupled to the line and the interconnect-to-substrate capacitance. Using (18), a series RLC circuit is synthesized, as depicted in Fig. 8, whose resistance and inductance are defined as follows:
(19)
where $R_t$ is the line resistance and $L_t$ is the total inductance of the lossy line, including the self- and mutual inductances, obtained from (17). The inductive couplings between transmission lines are accounted for by an algebraic summation of each line's self-inductance and all mutual inductances between that line and the other lines, taking into account the direction of the currents flowing through the lines.
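As a check on the first-order truncation, substitute $\tanh(\gamma l) \approx \gamma l$ into (15) with $Z_L = 1/(sC_L)$, and use $Z_0\gamma = R + sL$ and $\gamma/Z_0 = sC$. Our own algebra then gives $Z_{in} \approx \frac{1}{s(C_t + C_L)} + (R_t + sL_t)\frac{C_L}{C_t + C_L}$, where $R_t$, $L_t$, and $C_t$ are the total line resistance, inductance, and capacitance. This is a series RLC impedance, consistent with Fig. 8, although the notation of (18) and (19) may differ. The sketch compares it with the exact (15) at 100 MHz. The per-unit-length resistance is again a hypothetical value.

```python
import cmath
import math

def exact_input_impedance(r, l, c, length, c_load, omega):
    """Driving-point impedance (15) of a lossy line (G = 0) with a capacitive load."""
    s = 1j * omega
    z0 = cmath.sqrt((r + s * l) / (s * c))
    gamma = cmath.sqrt((r + s * l) * s * c)
    zl = 1.0 / (s * c_load)
    t = cmath.tanh(gamma * length)
    return z0 * (zl + z0 * t) / (z0 + zl * t)

def first_order_series_rlc(r, l, c, length, c_load):
    """Series R, L, C obtained from tanh(x) ~ x (our derivation, see the lead-in)."""
    r_t, l_t, c_t = r * length, l * length, c * length
    k = c_load / (c_t + c_load)
    return r_t * k, l_t * k, c_t + c_load

def series_rlc_impedance(R, L, C, omega):
    s = 1j * omega
    return R + s * L + 1.0 / (s * C)


if __name__ == "__main__":
    r, l, c = 2.0e4, 870.53e-9, 60.57e-12      # r (ohm/m) is a hypothetical value
    length, c_load = 2e-3, 100e-15
    for freq in (1e8, 1e9, 1e10):
        omega = 2 * math.pi * freq
        z_exact = exact_input_impedance(r, l, c, length, c_load, omega)
        z_rlc = series_rlc_impedance(*first_order_series_rlc(r, l, c, length, c_load), omega)
        print(f"{freq:.0e} Hz  relative error = {abs(z_rlc - z_exact) / abs(z_exact):.2e}")
```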
B. Second-Order Truncation
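The second-order truncation of the Taylor series expansion of $\tanh(\gamma l)$ for small values of $\gamma l$ is
$$\tanh(\gamma l) \approx \gamma l - \frac{(\gamma l)^3}{3} \qquad (20)$$
Recall that $\gamma$ is a function of $s$. Replacing $\tanh(\gamma l)$ in (15) with its second-order truncation leads to the following relationship: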
To assess how accurately (21) predicts the frequency variation of the actual driving-point impedance of a lossy line, the driving-point impedance of a single lossy microstrip line driving a 100-fF capacitor is calculated using both (15) and (21). The impedance calculations are carried out for three different wire lengths (0.5 mm, 2 mm, and 9 mm). Fig. 9 compares the second-order approximation given by (21) with the actual impedance given by (15). (On-chip global interconnects longer than 1-2 mm are normally broken into shorter segments, with line buffers inserted between the segments.) For the wire length of 0.5 mm, (21) accurately follows the frequency variation of the actual input impedance given by (15) over the whole frequency range of Fig. 9. For the wire length of 2 mm, the truncated second-order impedance follows the actual impedance of the line for frequencies up to 12 GHz.
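The length dependence described above can be reproduced qualitatively with a short sketch. It substitutes the truncation (20) for $\tanh(\gamma l)$ directly into (15) and compares the result with the exact impedance. It does not reproduce the rational function (21) itself, whose coefficients are not restated here. The per-unit-length resistance is hypothetical, and the inductance and capacitance are the values quoted later in this section.

```python
import cmath
import math

def input_impedance(r, l, c, length, c_load, omega, tanh_fn=cmath.tanh):
    """Driving-point impedance (15); tanh_fn allows a truncated series to be swapped in."""
    s = 1j * omega
    z0 = cmath.sqrt((r + s * l) / (s * c))
    gamma = cmath.sqrt((r + s * l) * s * c)
    zl = 1.0 / (s * c_load)
    t = tanh_fn(gamma * length)
    return z0 * (zl + z0 * t) / (z0 + zl * t)

def tanh_two_terms(x):
    """Truncation (20): tanh(x) ~ x - x**3 / 3."""
    return x - x**3 / 3.0

def relative_error(r, l, c, length, c_load, freq):
    """Relative deviation of the truncated impedance from the exact one at freq (Hz)."""
    omega = 2 * math.pi * freq
    exact = input_impedance(r, l, c, length, c_load, omega)
    approx = input_impedance(r, l, c, length, c_load, omega, tanh_two_terms)
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    r, l, c = 2.0e4, 870.53e-9, 60.57e-12       # r (ohm/m) is a hypothetical value
    for length in (0.5e-3, 2e-3, 9e-3):
        err = relative_error(r, l, c, length, 100e-15, 10e9)
        print(f"l = {length * 1e3:.1f} mm  relative error at 10 GHz = {err:.2e}")
```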
For the second-order rational function given in (21), a circuit can readily be synthesized. More specifically, for a lossy transmission line whose driving-point impedance near dc is expressed by (21), a stable equivalent circuit realization called the RLC-π circuit is synthesized; its topology is shown in Fig. 10. The input impedance of the RLC-π circuit shown in Fig. 10 is given by (22), in which a series combination of two of the circuit elements appears. The input impedance of the coupled lossy transmission line in Fig. 10 in the lower frequency range is given by (23). Note that the capacitive loop in this topology does not introduce any transmission zero into the transfer function, as is also evident from (22).
Fig. 11 shows the magnitude response of the driving-point admittance of a lossy transmission line that is electromagnetically coupled to a similar line; the electrical parameters of the line are also indicated in Fig. 11. First, the circuit is simulated using Star-HSPICE. Next, the magnitude response of the driving-point admittance of the equivalent RLC-π circuit is calculated. According to Fig. 11, this circuit accurately follows the frequency variation of the magnitude response of the line admittance at lower frequencies, up to 32 GHz. Therefore, according to Observation 1, energy calculations of the lossy transmission line using the RLC-π circuit yield accurate results. Finally, the magnitude response of the driving-point admittance of the equivalent RLC circuit is calculated and compared with those of the RLC-π equivalent circuit and the lossy coupled line, as also shown in Fig. 11. The RLC-π circuit models the lossy line more accurately than the RLC circuit over a broader range of frequencies. The discrepancy between the RLC-π circuit and the lossy line at higher frequencies arises because the RLC-π circuit cannot model the wave reflection phenomenon in a distributed lossy transmission line.
The RLC-π circuit is also used to calculate the propagation delay of the interconnect, as will be illustrated in Section IV.B. Using the RLC-π circuit to model a lossy interconnect introduces some error. Recall that time-domain ringing and plateaus occur because of the impedance mismatch between the line driver and the transmission line and the multiple wave reflections at the output terminal of the transmission line. Lumped circuits cannot model the wave reflection phenomenon, which in turn leads to errors in the time- and frequency-domain estimates of the line behavior. However, this error is negligible for the practical range of line geometries and clock frequencies. To verify this, the circuit in Fig. 10 is simulated using HSPICE. To highlight the accuracy of the RLC-π circuit, a wire length of 4 mm is chosen, which is longer than the unbuffered lengths used in practice. The interconnect has a per-unit-length inductance of 870.53 nH/m, a per-unit-length capacitance of 60.57 pF/m, and a per-unit-length resistance of 41.83 m. The frequency- and time-domain responses of the voltage at the output terminal of the lossy line are compared with those of the RLC-π circuit, and Fig. 12(a) and (b) show the results of these comparisons. In the frequency domain, the output voltage of the RLC-π circuit closely matches that of the lossy line up to 40 GHz. In the time domain, the output voltage of the RLC-π circuit exhibits the same propagation delay and rise and fall times as the output response of the line.
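The exact line response that the lumped models are compared against can be evaluated from the standard relation $V_{out}/V_{in} = 1/[\cosh(\gamma l) + (Z_0/Z_L)\sinh(\gamma l)]$ for a line driven by an ideal source and loaded by $Z_L$. As a low-frequency check of the kind of truncation applied below, its first moment equals $R_t(C_t/2 + C_L)$, the familiar Elmore-type delay of a distributed line. This is a standard result, offered here as a sketch rather than the paper's delay model. The resistance value and function names are hypothetical.

```python
import cmath
import math

def line_transfer_function(r, l, c, length, c_load, omega):
    """V_out/V_in of a lossy line (G = 0) driven by an ideal source and loaded by c_load."""
    s = 1j * omega
    z0 = cmath.sqrt((r + s * l) / (s * c))
    gamma = cmath.sqrt((r + s * l) * s * c)
    zl = 1.0 / (s * c_load)
    x = gamma * length
    return 1.0 / (cmath.cosh(x) + (z0 / zl) * cmath.sinh(x))

def first_moment(r, c, length, c_load):
    """Low-frequency delay moment R_t * (C_t / 2 + C_L) of the distributed line."""
    r_t, c_t = r * length, c * length
    return r_t * (0.5 * c_t + c_load)


if __name__ == "__main__":
    r, l, c = 2.0e4, 870.53e-9, 60.57e-12      # r (ohm/m) is a hypothetical value
    length, c_load = 4e-3, 100e-15             # the 4-mm example length
    for freq in (1e8, 1e9, 1e10, 4e10):
        h = line_transfer_function(r, l, c, length, c_load, 2 * math.pi * freq)
        print(f"{freq:.0e} Hz  |H| = {abs(h):.3f}")
    print("first moment (s):", first_moment(r, c, length, c_load))
```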
The propagation delay of the lossy line is derived from the voltage transfer function of the line, $H(s) = V_{out}(s)/V_{in}(s)$. Applying to the voltage transfer function the same second-order truncation used for the impedance yields
The element values of the equivalent RLC-π delay model are specified as follows:
IV. DRIVER SIZING FOR OPTIMUM EDR PRODUCT
Consider the circuit shown in Fig. 10, consisting of an inverter driving a lossy transmission line. The load consists of CMOS fanout gates that are connected to the output port of the lossy transmission line and are represented by their input gate-source capacitances. The electromagnetic coupling effects are treated as discussed in Section III. The output voltage waveform at the load varies significantly as a function of the driver $W/L$ ratio. In practice, depending on the driver size and the interconnect wire sizing, the output behavior may change from an overdamped response toward an underdamped response. In other words, although the steady-state output voltage values are the same, the transient waveforms differ drastically depending on the electrical parameters of the line and the line driver, which are in turn functions of their geometrical parameters. Note that the energy dissipation varies as a function of the clock cycle time [(3) and (6)]. If the output waveform has not reached its steady state at the clock edges, the energy dissipated in the clock cycle may be lower (in the undershoot region) or higher (in the overshoot region, for an underdamped response) than the steady-state value of $CV^2$. The output voltage waveforms for four different driver ratios are shown in Fig. 13, and Fig. 14 shows the variation of the energy dissipation per clock period for different driver ratios.
As a consequence, by changing the $W/L$ ratio of the driver, we can change the characteristics of the output voltage and, thereby, the amount of energy dissipated per clock period. There are three crucial design metrics to consider: 1) the energy dissipation in a clock period; 2) the 50% propagation delay of the output voltage; and 3) the signal level of the undershoot in an underdamped response, which should be less than the noise margin (in a low-to-high transition, the noise margin is approximately the threshold voltage of the PMOS device, $|V_{tp}|$). From Fig. 13, one can observe that the overdamped response does not exhibit ringing but has a larger delay than the underdamped response. On the other hand, the underdamped response has a lower delay but may cause dc noise margin violations. Moreover, the 50% propagation delay alone is not a good metric for an underdamped system because of the damped oscillations. To take these effects into account, we propose a new metric. For the overdamped response, since all the waveforms rise or fall monotonically, the best performance metric is the energy-delay product. For the underdamped response, however, the delay must incorporate the settling time of the oscillations as well as the percentage of maximum undershoot, to capture noise-margin violations. To obtain a single metric for both the underdamped and overdamped responses, we use the product of energy, delay, and a ringing factor, and we call this cost function the "EDR product."
A. Analytical Derivation of Output Voltage and Energy Dissipation
Papers [7], [8], and [9] attempted to find the optimum driver size that minimizes the EDR product. However, they could not find a unified closed-form expression for energy, delay, and ringing, and consequently they were not able to obtain an analytical solution for the optimum driver size. In contrast, this paper provides a closed-form driver sizing solution that minimizes the EDR product.
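To make the three ingredients of the EDR product concrete, the sketch below evaluates them for a generic second-order model $H(s) = 1/(1 + a_1 s + a_2 s^2)$. It assumes that the second-order approximation used later in this section has no finite zeros. The response is normalized to its final value, time units are arbitrary, and the function names are ours. The 50% delay is found by bisection. For the underdamped case, the first minimum of the step response occurs at $t = 2\pi/\omega_d$, with an undershoot below the final value of $e^{-2\pi\zeta/\sqrt{1-\zeta^2}}$. The ringing factor $1 + \text{undershoot}$ is our own placeholder, not the paper's definition.

```python
import math

def second_order_params(a1, a2):
    """Natural frequency and damping ratio of H(s) = 1 / (1 + a1 s + a2 s^2)."""
    return 1.0 / math.sqrt(a2), a1 / (2.0 * math.sqrt(a2))

def step_response(a1, a2, t):
    """Normalized unit-step response of H(s) = 1 / (1 + a1 s + a2 s^2) for t >= 0."""
    wn, zeta = second_order_params(a1, a2)
    if zeta < 1.0:                                    # underdamped
        wd = wn * math.sqrt(1.0 - zeta**2)
        return 1.0 - math.exp(-zeta * wn * t) * (
            math.cos(wd * t) + zeta / math.sqrt(1.0 - zeta**2) * math.sin(wd * t))
    if zeta == 1.0:                                   # critically damped
        return 1.0 - (1.0 + wn * t) * math.exp(-wn * t)
    root = math.sqrt(zeta**2 - 1.0)                   # overdamped: two real poles
    p1, p2 = wn * (-zeta + root), wn * (-zeta - root)
    return 1.0 - (p2 * math.exp(p1 * t) - p1 * math.exp(p2 * t)) / (p2 - p1)

def delay_50(a1, a2):
    """First time at which the step response reaches 0.5, found by bisection."""
    wn, zeta = second_order_params(a1, a2)
    if zeta < 1.0:
        hi = math.pi / (wn * math.sqrt(1.0 - zeta**2))  # the response rises monotonically up to the first peak
    else:
        hi = 1.0 / wn
        while step_response(a1, a2, hi) < 0.5:          # monotonic response: expand the bracket
            hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if step_response(a1, a2, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_minimum(a1, a2):
    """(time, undershoot below the final value) of the first minimum; (None, 0.0) if not underdamped."""
    wn, zeta = second_order_params(a1, a2)
    if zeta >= 1.0:
        return None, 0.0
    wd = wn * math.sqrt(1.0 - zeta**2)
    return 2.0 * math.pi / wd, math.exp(-2.0 * math.pi * zeta / math.sqrt(1.0 - zeta**2))

def edr_product(energy, delay, undershoot):
    """Energy x delay x ringing factor; the factor (1 + undershoot) is a placeholder assumption."""
    return energy * delay * (1.0 + undershoot)


if __name__ == "__main__":
    for a1 in (0.6, 1.0, 3.0):                       # hypothetical coefficients with a2 = 1
        t_min, u = first_minimum(a1, 1.0)
        print(a1, round(delay_50(a1, 1.0), 4), t_min, round(u, 4))
```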
Consider the circuit in Fig. 10, which is composed of an inverter driving a lossy transmission line. The load is another CMOS gate connected to the output port of the lossy transmission line. The electromagnetic coupling effects are treated as discussed in Section III. Because the operating regions of the NMOS and PMOS transistors of the line driver change during the low-to-high and high-to-low transitions of the driver's output, we must distinguish between these two transitions. During the low-to-high transition of the driver's output voltage, the PMOS transistor conducts and provides a low-impedance path from the supply to the load. During the high-to-low transition, the NMOS transistor is on and no additional energy is drawn from the power supply.
We calculate the energy drawn from the power supply during a low-to-high transition. This energy is the total energy dissipated per clock period for a CMOS gate driving another CMOS circuit through a lossy coupled transmission line. The energy delivered by the power supply through the gate in a low-to-high transition of the driver's output voltage is given by (26), where $i_p(t)$ is the current flowing from the power supply through the PMOS transistor to the load during the low-to-high transition of the output. This current is obtained from the driving-point admittance of the circuit, as in (27), where the admittance is the one seen from the power supply terminal into the source connection of the PMOS transistor of the driver. Fig. 15 shows the equivalent simplified model of the circuit shown in Fig. 10.
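The energy integral (26) amounts to $V_{dd}$ times the charge drawn from the supply during the transition window. The minimal sketch below integrates a supply current numerically with the trapezoidal rule. The only assumption is that the current is available as a function of time. The RC current in the usage example is a hypothetical stand-in for the current obtained from (27), chosen because its integral is known in closed form.

```python
import math

def supply_energy(v_dd, current, t_end, steps=10000):
    """E = V_dd * integral of current(t) from 0 to t_end, using the trapezoidal rule.

    current: callable returning the supply current in amperes at time t (seconds).
    """
    h = t_end / steps
    total = 0.5 * (current(0.0) + current(t_end))
    total += sum(current(k * h) for k in range(1, steps))
    return v_dd * total * h


if __name__ == "__main__":
    r_ohm, c_f, v_dd = 100.0, 1e-13, 1.0           # hypothetical RC stand-in values
    i_rc = lambda t: (v_dd / r_ohm) * math.exp(-t / (r_ohm * c_f))
    print("settled  :", supply_energy(v_dd, i_rc, 1e-9) / (c_f * v_dd**2), "x CV^2")
    print("t = RC   :", supply_energy(v_dd, i_rc, r_ohm * c_f) / (c_f * v_dd**2), "x CV^2")
```

Over a long window the result approaches $CV_{dd}^2$. If the window ends at $t = RC$, the result is $(1 - e^{-1})CV_{dd}^2$, so an unsettled transition draws less energy than the $CV^2$ model predicts.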
As for the inverter, it is known that the operating regions of the conducting transistors change during the input transition, which makes the analysis cumbersome. However, the conducting transistors of the driver operate in the triode region for a large portion of the transition time if the line driver has sufficient current drive capability [10]. We therefore assume that the conducting transistor remains in the triode region for the entire input transition, and we model it as an ideal switch in series with its drain-to-source resistance, together with an equivalent capacitance formed by the parallel combination of the diffusion capacitances of the PMOS and NMOS devices (Fig. 15).
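The sketch below illustrates how driver sizing moves the response between the overdamped and underdamped regimes. It uses the deep-triode estimate $R_{on} \approx 1/[k'(W/L)(V_{GS} - |V_T|)]$ and treats the driver resistance plus the line as a single series RLC loop with damping ratio $\zeta = (R/2)\sqrt{C/L}$. This simplification ignores the diffusion capacitance and is not the third-order model (28). All device and line values are hypothetical.

```python
import math

def triode_on_resistance(k_prime, w_over_l, v_gs, v_t):
    """Deep-triode estimate R_on = 1 / (k' * (W/L) * (V_GS - |V_T|))."""
    return 1.0 / (k_prime * w_over_l * (v_gs - abs(v_t)))

def damping_ratio(r_total, l_total, c_total):
    """Damping ratio of a series RLC loop: zeta = (R/2) * sqrt(C/L); zeta < 1 is underdamped."""
    return 0.5 * r_total * math.sqrt(c_total / l_total)


if __name__ == "__main__":
    k_prime, vdd, vtp = 100e-6, 1.8, -0.45           # hypothetical PMOS parameters
    r_line, l_line, c_total = 40.0, 1.74e-9, 2.2e-13  # hypothetical line-plus-load values
    for w_over_l in (10, 50, 200):
        r_on = triode_on_resistance(k_prime, w_over_l, vdd, vtp)
        zeta = damping_ratio(r_on + r_line, l_line, c_total)
        kind = "underdamped" if zeta < 1.0 else "overdamped"
        print(f"W/L={w_over_l:4d}  R_on={r_on:7.1f} ohm  zeta={zeta:.2f}  {kind}")
```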
The structure of the RLC-π circuit makes the impedance calculations straightforward. For instance, the diffusion capacitances of the driving CMOS circuit are placed directly in parallel with the capacitor of the RLC-π circuit, so no additional calculation is required. We first derive the transfer function of the circuit depicted in Fig. 15, which is given by (28).
Fig. 16. Comparison between the output voltage calculated using (28) and that approximated using (29) for both underdamped and overdamped cases.
From (28), it is readily seen that the system is described by a third-order transfer function. Closed-form analytical solutions for the energy and delay of a third-order system are very complicated, and numerical analysis is sometimes the only possible approach. On the other hand, as explained below, for different interconnect lengths and driver ratios, the output voltage of the circuit behaves like that of a second-order circuit over the frequency range of interest. Several experiments, shown later in this section, that use practical geometrical values for an interconnect and line driver in CMOS technology indicate that the magnitude of the third-order coefficient in the transfer function is negligible compared to the lower order coefficients over the frequency range of interest. We can therefore neglect the third-order term and use the second-order approximation (29). Using this approximation, we find closed-form expressions for the output voltage waveform and the energy of the system.
The above second-order approximation of the third-order denominator of (28) remains valid in nanometer technologies, where device dimensions are scaled down to a few tens of nanometers, because the parasitic capacitances of the devices and interconnects scale down with technology, which lowers the third-order coefficient in (28).
increases with technology scaling, although by a smaller scaling factor, because mobility degradation and voltage scaling are partially compensated by the reduction of the gate-oxide thickness. As an example, Fig. 16 shows the approximated output waveforms for two different wire lengths and driver ratios, with a step function applied at the input of the line driver. The average error between the actual and approximated output waveforms for different interconnect lengths and driver ratios is depicted in Fig. 17. The normalized error between the actual output voltage $v_{out}(t)$ at the output terminal of the lossy interconnect driven by the CMOS inverter and the approximated output voltage $\hat{v}_{out}(t)$ is calculated as follows:
The average error over 2050 simulation runs is 0.93% at three different clock frequencies: 1 GHz, 1.66 GHz, and 2 GHz. The maximum observed error is 3.4%.
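The effect of dropping the third-order term can be checked numerically. The sketch below is a minimal illustration, not the paper's procedure: it assumes a normalized denominator $1 + b_1 s + b_2 s^2 + b_3 s^3$ with hypothetical coefficient values (time in arbitrary normalized units), integrates both unit-step responses with a fixed-step fourth-order Runge-Kutta scheme, and uses a hypothetical normalized error (sum of absolute differences divided by the sum of absolute reference values) in place of the paper's error definition.

```python
def step_response(coeffs, t_end, dt):
    """Unit-step response of H(s) = 1 / (1 + b1*s + b2*s**2 [+ b3*s**3]).

    coeffs is [b1, b2] or [b1, b2, b3]. The ODE
        b_n * y^(n) + ... + b1 * y' + y = 1
    is written in phase-variable form and integrated with classic RK4.
    Returns the samples y(0), y(dt), ..., y(t_end).
    """
    n = len(coeffs)
    a = [1.0] + list(coeffs)  # a[k] multiplies the k-th derivative of y

    def deriv(x):
        # x = [y, y', ..., y^(n-1)]; solve the ODE for the highest derivative.
        top = (1.0 - sum(a[k] * x[k] for k in range(n))) / a[n]
        return x[1:] + [top]

    def shifted(x, k, h):
        return [xi + h * ki for xi, ki in zip(x, k)]

    x = [0.0] * n  # the circuit starts at rest
    ys = [0.0]
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(x)
        k2 = deriv(shifted(x, k1, dt / 2))
        k3 = deriv(shifted(x, k2, dt / 2))
        k4 = deriv(shifted(x, k3, dt))
        x = [xi + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]
        ys.append(x[0])
    return ys


def normalized_error(y_ref, y_apx):
    """Hypothetical metric: sum|y_ref - y_apx| / sum|y_ref| over all samples."""
    num = sum(abs(r - a) for r, a in zip(y_ref, y_apx))
    den = sum(abs(r) for r in y_ref)
    return num / den


if __name__ == "__main__":
    b1, b2 = 1.0, 0.5          # hypothetical second-order coefficients
    t_end, dt = 20.0, 5e-3     # normalized time units
    y2 = step_response([b1, b2], t_end, dt)
    for b3 in (0.01, 0.1):     # small vs. larger third-order coefficient
        y3 = step_response([b1, b2, b3], t_end, dt)
        print(f"b3={b3}: normalized error = {normalized_error(y3, y2):.4%}")
```

With these illustrative numbers the error stays small for the smaller $b_3$ and grows as $b_3$ increases, which is the trend the second-order approximation relies on.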
According to these simulation results, the equivalent second-order approximation can be used to find the output waveform while introducing only a negligible approximation error. Similarly, for the driving-point input admittance , we have (31), where:
Substituting the approximated transfer function of (11) into (13) leads to the following:
As with any second-order circuit, we distinguish between overdamped and underdamped responses and derive analytical models for both.
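Which of the two responses occurs follows from the discriminant of the second-order denominator. As a sketch, assuming the approximation is written as $1 + b_1 s + b_2 s^2$ (a hypothetical notation, since the symbols of (29) are not reproduced here), the response is overdamped when $b_1^2 > 4b_2$. For a single lumped series RLC stage charging a capacitor, $b_1 = RC$ and $b_2 = LC$, so the condition reduces to the textbook $R > 2\sqrt{L/C}$, which is why a large driver output resistance suppresses ringing. The component values below are illustrative only.

```python
import cmath
import math


def classify_second_order(b1, b2, rel_tol=1e-12):
    """Classify the response of H(s) = 1 / (1 + b1*s + b2*s**2).

    Returns (kind, zeta, poles), where zeta = b1 / (2*sqrt(b2)) is the
    damping ratio and poles are the roots of b2*s**2 + b1*s + 1 = 0.
    """
    disc = b1 * b1 - 4.0 * b2
    scale = max(b1 * b1, 4.0 * b2)  # relative tolerance for the b1^2 = 4*b2 case
    if disc > rel_tol * scale:
        kind = "overdamped"
    elif disc < -rel_tol * scale:
        kind = "underdamped"
    else:
        kind = "critically damped"
    root = cmath.sqrt(disc)
    poles = ((-b1 + root) / (2.0 * b2), (-b1 - root) / (2.0 * b2))
    zeta = b1 / (2.0 * math.sqrt(b2))
    return kind, zeta, poles


def rlc_coefficients(r, l, c):
    """Lumped series R-L driving a grounded C: 1 / (1 + RC*s + LC*s^2)."""
    return r * c, l * c


if __name__ == "__main__":
    l, c = 1e-9, 1e-12                 # illustrative 1 nH, 1 pF
    print("critical R =", 2.0 * math.sqrt(l / c), "ohm")
    for r in (50.0, 100.0):            # driver + line resistance, ohms
        kind, zeta, poles = classify_second_order(*rlc_coefficients(r, l, c))
        print(f"R={r:5.1f} ohm: {kind}, zeta={zeta:.3f}")
```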
1) Overdamped Response:
In the overdamped case and the output resistance of the line driver are sufficiently large to suppress the resonant oscillation superimposed on the current and voltage waveforms. In fact, if , the equivalent circuit model for the line and line driver in Fig. 15 exhibits an overdamped response. To obtain the total energy drawn from the power supply using (26), the input current to the circuit is first obtained by solving the characteristic differential equation of the RLC-circuit, which in this case yields an overdamped decaying waveform. The voltage response at the output terminal of the RLC-circuit and the total energy are the same as those described above and . It is seen that if a CMOS inverter driving a lossy coupled line exhibits an underdamped oscillatory response, and if , then the energy expression becomes:
B. Analytical Derivation of Delay and Ringing
Earlier in Section IV, a new cost function, the EDR product, was proposed to provide a unified definition of the energy-delay product for both the underdamped and the overdamped responses of the circuit. To find the optimum EDR product, we also need the 50% delay and the ringing. Calculating these time-domain parameters is straightforward once the voltage response of the circuit in Fig. 15 is known. The 50% propagation delay for either of the two possible responses (i.e., underdamped and overdamped) is calculated by setting the output voltage equal to half of its final value in (15) and (16), respectively. No ringing exists in the overdamped case. For the underdamped case, the ringing is evaluated at the time at which the output waveform reaches its first minimum, and is given by:
Ringing
(39)
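For a standard second-order response, both quantities can be computed directly. The sketch below assumes the approximated transfer function is normalized as $\omega_n^2/(s^2 + 2\zeta\omega_n s + \omega_n^2)$ (equivalently $\zeta = b_1/(2\sqrt{b_2})$ and $\omega_n = 1/\sqrt{b_2}$ for a denominator $1 + b_1 s + b_2 s^2$) and uses the textbook unit-step responses rather than the paper's (15), (16), and (39). The 50% delay is found by bisection; in the underdamped case the search is confined to $[0, \pi/\omega_d]$, where the response rises monotonically. The extrema of the underdamped response occur at $t_k = k\pi/\omega_d$ with deviation $e^{-k\pi\zeta/\sqrt{1-\zeta^2}}$ from the final value. Which $k$ corresponds to the "first minimum" depends on the transition direction ($k = 2$ for a rising output, $k = 1$ for a falling one), so it is left as a parameter.

```python
import math


def step_response(t, zeta, wn):
    """Unit-step response of wn^2 / (s^2 + 2*zeta*wn*s + wn^2) at time t."""
    if zeta < 1.0:
        s = math.sqrt(1.0 - zeta * zeta)
        wd = wn * s
        return 1.0 - math.exp(-zeta * wn * t) * (
            math.cos(wd * t) + zeta / s * math.sin(wd * t))
    if zeta == 1.0:
        return 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    r = math.sqrt(zeta * zeta - 1.0)
    p1, p2 = -wn * (zeta - r), -wn * (zeta + r)  # two real, negative poles
    return 1.0 + (p2 * math.exp(p1 * t) - p1 * math.exp(p2 * t)) / (p1 - p2)


def delay_50(zeta, wn, iters=100):
    """50% propagation delay of the normalized response, by bisection."""
    if zeta < 1.0:
        hi = math.pi / (wn * math.sqrt(1.0 - zeta * zeta))  # first peak
    else:
        hi = 1.0 / wn
        while step_response(hi, zeta, wn) < 0.5:  # monotonic: expand bracket
            hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if step_response(mid, zeta, wn) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ringing(zeta, k=2):
    """Fractional deviation from the final value at the k-th extremum.

    Zero for zeta >= 1, since an overdamped response does not ring.
    """
    if zeta >= 1.0:
        return 0.0
    return math.exp(-k * math.pi * zeta / math.sqrt(1.0 - zeta * zeta))


if __name__ == "__main__":
    for zeta in (0.3, 0.7, 1.5):
        print(f"zeta={zeta}: t50*wn = {delay_50(zeta, 1.0):.4f}, "
              f"ringing = {100 * ringing(zeta):.2f}%")
```

Because the response depends on time only through $\omega_n t$, the delay scales as $1/\omega_n$ for a fixed damping ratio.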
C. Driver Sizing for Optimum EDR Product
Using (34), (37), and (39), the EDR product is calculated for the equivalent circuit shown in Fig. 15, which models a lossy interconnect driven by a CMOS inverter. Consider and , where and are technology-dependent constants and is the driver size. The optimum driver size is obtained by setting the derivative of the EDR product with respect to the driver size to zero (40). Equation (40) is nonlinear in the driver size; hence, the optimum is obtained by solving this nonlinear algebraic equation numerically.
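Because (40) must be solved numerically, any one-dimensional minimizer can be applied once the EDR product can be evaluated. The sketch below substitutes golden-section search, which minimizes the EDR product directly (for a unimodal EDR curve this is equivalent to solving the stationarity condition), together with a deliberately simple stand-in model: a single lumped RLC stage whose driver resistance is $r_0/h$ and whose driver diffusion capacitance $c_0 h$ is added to the load, with $h$ the driver size. All parameter values, the energy term (supply voltage times the charge drawn during one half-period), and the ringing factor $1 + \text{undershoot}$ are hypothetical choices for illustration, not the paper's (34), (37), and (39).

```python
import math

# Hypothetical technology / line parameters (illustrative values only).
R_LINE, L_LINE, C_LINE = 20.0, 2e-9, 200e-15   # ohm, H, F
C_LOAD = 20e-15                                 # F
R0, C0 = 2000.0, 2e-15                          # driver: R = R0/h, C = C0*h
VDD, T_HALF = 1.0, 0.5e-9                       # V, s


def response(t, zeta, wn):
    """Unit-step response of a normalized second-order system."""
    if zeta < 1.0:
        s = math.sqrt(1.0 - zeta * zeta)
        return 1.0 - math.exp(-zeta * wn * t) * (
            math.cos(wn * s * t) + zeta / s * math.sin(wn * s * t))
    if zeta == 1.0:
        return 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    r = math.sqrt(zeta * zeta - 1.0)
    p1, p2 = -wn * (zeta - r), -wn * (zeta + r)
    return 1.0 + (p2 * math.exp(p1 * t) - p1 * math.exp(p2 * t)) / (p1 - p2)


def edr(h):
    """Hypothetical energy x delay x ringing product for driver size h."""
    r = R_LINE + R0 / h
    c = C_LINE + C_LOAD + C0 * h
    wn = 1.0 / math.sqrt(L_LINE * c)
    zeta = 0.5 * r * math.sqrt(c / L_LINE)
    energy = VDD * VDD * c * response(T_HALF, zeta, wn)  # VDD * charge drawn
    # 50% delay by bisection on the monotonic rising part of the response.
    hi = math.pi / (wn * math.sqrt(1.0 - zeta ** 2)) if zeta < 1.0 else 1.0 / wn
    while response(hi, zeta, wn) < 0.5:
        hi *= 2.0
    lo = 0.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if response(mid, zeta, wn) < 0.5 else (lo, mid)
    delay = 0.5 * (lo + hi)
    undershoot = 0.0 if zeta >= 1.0 else math.exp(
        -2.0 * math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
    return energy * delay * (1.0 + undershoot)


def golden_section_min(f, a, b, rel_tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > rel_tol * (abs(a) + abs(b)):
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    h_opt = golden_section_min(edr, 1.0, 200.0)
    print(f"optimum driver size h = {h_opt:.2f}, EDR = {edr(h_opt):.3e}")
```

Golden-section search needs only function values, which suits an EDR model built from piecewise (underdamped/overdamped) expressions whose derivative is awkward to write in closed form.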
Recall that the electrical parameters of a lossy transmission line are functions of its geometrical parameters, such as wire width, wire thickness, and wire length. Similarly, is a function of the MOS gate aspect ratios. Consequently, the energy dissipation of a lossy line driven by a CMOS inverter is a function of the physical parameters of both the line and the driver. Using the BSIM3v3 equations for the MOS transistor and the accurate closed-form expressions derived in [10], the energy is thus expressed in terms of the geometrical parameters of the interconnect and the MOS transistors.
Comparisons between our analytical models for delay and energy and HSPICE simulations are shown in Figs. 18(a) and (b) and 19(a) and (b), respectively. These experiments show that, for different interconnect widths and driver sizes, our delay equation is highly accurate (only 1.7% average error) and the energy equation has an average error of only 4.3%. Fig. 20(a) shows the energy variation as a function of the fundamental period, , for five different gate aspect ratios of the driver, with the interconnect width fixed at 5 μm. Fig. 20(b) shows the energy variation with respect to the clock period for five different metal widths of the interconnect, for a fixed driver aspect ratio of 60. The input is a periodic rectangular voltage signal. Note that for small clock periods, the steady-state energy model yields an incorrect value. As the metal width of the interconnect decreases, the dependence of the dissipated energy per clock period on the clock period gradually changes from a damped oscillatory function to a damped exponential function. Note that, because of the large gate aspect ratios used in practice, the overdamped response is rarely observed, since the line driver's resistance is very small. A similar transition occurs as the W/L of the transistors decreases. As expected, varying the transistor size and line width also changes the steady-state value of the energy dissipation. Fig. 20(a) and (b) suggest that, for a given clock period, the transistor sizes and the interconnect metal width can be chosen so that the dissipated energy per clock period reaches its undershoot value, which is beneficial from both the speed and the energy points of view. In practice, however, this is difficult, because process variations shift the design away from the optimum undershoot point.
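The link between the output waveform and the energy per transition can be illustrated with a much simpler model than the paper's. As a sketch, assume a single lumped series RLC stage charging a capacitor $C$ from an ideal step of amplitude $V$, starting from zero charge. Every coulomb drawn from the supply ends up on the capacitor, so the energy delivered by the supply up to time $T$ is $E(T) = V \cdot C\,v(T)$. It therefore exceeds $CV^2$ exactly when $v(T)$ is above $V$ (overshoot) and falls below it in the undershoot region, while an overdamped response never exceeds $V$, so $E(T) \le CV^2$. The normalized values are hypothetical, and the model ignores the state carried over between cycles of a periodic input.

```python
import math


def output_voltage(t, r, l, c, v=1.0):
    """Capacitor voltage of a series R-L-C stage after a step of height v."""
    wn = 1.0 / math.sqrt(l * c)
    zeta = 0.5 * r * math.sqrt(c / l)
    if zeta < 1.0:
        s = math.sqrt(1.0 - zeta * zeta)
        y = 1.0 - math.exp(-zeta * wn * t) * (
            math.cos(wn * s * t) + zeta / s * math.sin(wn * s * t))
    elif zeta == 1.0:
        y = 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    else:
        q = math.sqrt(zeta * zeta - 1.0)
        p1, p2 = -wn * (zeta - q), -wn * (zeta + q)
        y = 1.0 + (p2 * math.exp(p1 * t) - p1 * math.exp(p2 * t)) / (p1 - p2)
    return v * y


def supply_energy(t, r, l, c, v=1.0):
    """Energy drawn from the supply up to time t: v * (charge on c)."""
    return v * c * output_voltage(t, r, l, c, v)


if __name__ == "__main__":
    l, c = 1.0, 1.0                     # normalized units, so C*V^2 = 1
    r = 0.4                             # zeta = 0.2: underdamped
    wd = math.sqrt(1.0 / (l * c) - (r / (2 * l)) ** 2)
    for label, t in (("first peak", math.pi / wd),
                     ("first trough", 2 * math.pi / wd)):
        print(f"{label}: E/CV^2 = {supply_energy(t, r, l, c):.3f}")
```

Sampling the half-period at the first peak gives an energy above $CV^2$, and sampling it at the first trough gives an energy below it, mirroring the oscillatory energy-versus-clock-period curves.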
Accurate expressions for the energy dissipation of a lossy transmission line driven by CMOS drivers allow us to propose a new design guideline for area-efficient wire and transistor sizing that achieves minimum energy under a noise-margin constraint. However, considering energy dissipation alone is misleading: wire and transistor sizing for minimum energy may result in unacceptable delay and insufficient voltage swing, which may in turn lead to logic and circuit failures. Fig. 21 shows the EDR product per clock cycle of an inverter driving a lossy transmission line with a purely capacitive load termination. The small positive slope of the EDR product with respect to the driver ratio is due to the direct relationship between the energy and the diffusion capacitance of the device.
As shown in Fig. 21, the EDR product changes with the driver ratio. It also depends on the clock period, as demonstrated in Fig. 20(a) and (b). Fig. 21 shows that, for a predetermined interconnect width, there is an optimum driver ratio that minimizes the EDR product. In previous energy calculations of interconnects driven by CMOS circuits, it was normally assumed that the transients in the current and voltage waveforms had settled to their steady-state values, so the energy was simply equal to $CV^2$. Sections II.A and II.B showed that this expression can yield quite inaccurate results for the dissipated energy of interconnects in high-frequency ULSI circuits. Figs. 22 and 23 show that modeling a lossy transmission line with a single RLC circuit still does not provide accurate results for the energy dissipation of a lone lossy line driven by a large CMOS inverter, in either the underdamped or the overdamped case. These figures show the dissipated energy of a single lossy transmission line for various line lengths when the line is modeled by the RLC-circuit and compare it with that obtained using a single RLC circuit. For small clock cycles, the single RLC circuit model is unable to give a good energy estimate, for both overdamped and underdamped circuits. Figs. 22 and 23 also reveal that, for both underdamped and overdamped circuits, when the clock cycle time is sufficiently long, the energies obtained with the RLC and RLC-circuit models are both close to $CV^2$.
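The difference between a single lumped RLC circuit and a distributed description can be explored with an $n$-section RLC ladder. The sketch below is an illustration under stated assumptions, not the paper's RLC-circuit model: an ideal step $V$ drives the line through a driver resistance, the line's total $R$, $L$, and $C$ are split evenly into $n$ sections (series R-L, shunt C), the far end is left open, and the state equations are integrated with RK4. By charge conservation, the supply energy up to time $T$ is $V\sum_k C_k v_k(T)$, and $n = 1$ reduces to the single lumped RLC stage. All values (normalized so that $CV^2 = 1$) and the function name are hypothetical.

```python
def ladder_supply_energy(n, t_end, v_dd=1.0, r_drv=1.0, r_line=1.0,
                         l_line=1.0, c_line=1.0, dt=5e-3):
    """Supply energy up to t_end for an n-section RLC ladder (RK4 integration)."""
    r, l, c = r_line / n, l_line / n, c_line / n

    def deriv(i, v):
        di, dv = [], []
        for k in range(n):
            v_in = v_dd if k == 0 else v[k - 1]       # upstream node voltage
            r_k = r + (r_drv if k == 0 else 0.0)       # driver in series with section 1
            di.append((v_in - r_k * i[k] - v[k]) / l)  # series R-L branch
            i_out = i[k + 1] if k + 1 < n else 0.0     # open far end
            dv.append((i[k] - i_out) / c)              # shunt capacitor
        return di, dv

    def step(x, dx, h):
        return [xi + h * d for xi, d in zip(x, dx)]

    i, v = [0.0] * n, [0.0] * n
    for _ in range(int(round(t_end / dt))):
        k1i, k1v = deriv(i, v)
        k2i, k2v = deriv(step(i, k1i, dt / 2), step(v, k1v, dt / 2))
        k3i, k3v = deriv(step(i, k2i, dt / 2), step(v, k2v, dt / 2))
        k4i, k4v = deriv(step(i, k3i, dt), step(v, k3v, dt))
        i = [x + dt / 6 * (a + 2 * b + 2 * cc + d)
             for x, a, b, cc, d in zip(i, k1i, k2i, k3i, k4i)]
        v = [x + dt / 6 * (a + 2 * b + 2 * cc + d)
             for x, a, b, cc, d in zip(v, k1v, k2v, k3v, k4v)]
    # All charge drawn from the supply is stored on the shunt capacitors.
    return v_dd * c * sum(v)


if __name__ == "__main__":
    for t in (1.0, 3.0, 30.0):
        lumped = ladder_supply_energy(1, t)
        ladder = ladder_supply_energy(8, t)
        print(f"T={t:5.1f}: lumped E/CV^2 = {lumped:.4f}, 8-section = {ladder:.4f}")
```

For long intervals both estimates approach $CV^2$; for short intervals they can differ, which is where the choice of line model matters.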
V. CONCLUSION
This paper presented accurate expressions for the interconnect energy dissipation and propagation delay in high-performance ULSI circuits. We showed that at high clock frequencies, where the output voltage at the termination point of a transmission line does not reach its steady-state value within the clock period, it is possible to reduce energy dissipation while meeting a dc noise margin by driver sizing. This phenomenon is mostly due to the voltage behavior at the termination point of the transmission line, which may be either underdamped or overdamped. More precisely, if the clock period is chosen such that the output voltage in the underdamped case is in its overshoot region, the energy dissipation per transition in the clock cycle will be higher than $CV^2$. On the other hand, if the output voltage is in its undershoot region when the clock changes, the energy dissipation per transition will be lower than $CV^2$. In the overdamped case, the energy is always less than or equal to $CV^2$. In addition, we proposed a new design metric, the product of energy, delay, and ringing in lossy transmission lines, which is used in the driver sizing problem formulation.
He is also a Research Assistant in the System Power Optimization and Regulation Technology (SPORT) Laboratory, University of Southern California. His research focuses on timing analysis models and algorithms considering process variation.
