In deep-submicron technology, global interconnect capacitances have started reaching several orders of magnitude greater than the intrinsic capacitances of the CMOS gates. The dynamic power consumption of a CMOS gate driving a global wire is the sum of the power dissipated due to (dis)charging (i) the intrinsic capacitance of the gate, and (ii) the wire capacitance. The latter is referred to as on-chip signaling power consumption.
INTRODUCTION
It is widely accepted that in deep-submicron CMOS technology interconnects are a limiting factor for achieving higher integration levels. Transistor feature sizes are shrinking, which lowers the parasitic capacitances of the devices (gate, drain and bulk capacitances). In contrast, because of the decreasing spacing between metal layers and the increasing number of metallization levels, interconnect parasitics are increasing as the technology scales down. Furthermore, as the integration levels increase, the average wire length is increasing steadily over time. In addition, the supply voltage V_dd and the threshold voltage V_th of the transistor are being aggressively scaled down, which lowers the power dissipation of the CMOS gates without degrading their speed. However, as a result of technology scaling, the power and delay caused by interconnect capacitance have begun to reach several orders of magnitude larger than those attributed to the intrinsic capacitance of the gates. Moreover, as V_dd and V_th are reduced, the noise margin, NM, of the gates decreases, as reported in Table I and in Ref. [19]. In Table I the parameters H and W represent the height and width of the metal layer, respectively, and the spacing between metal layers is assumed to be equal to W.
Due to the increasing integration density per standard die size and the increasing metalization levels, noise in DSM is increasing because of factors such as high number of switching devices, high current density per unit area and the increasing coupling noise between adjacent cells. This is due to the decreasing spacing between metal layers as shown in Table I . As a result it is expected that the magnitude of crosstalk noise escalates as the technology advances [2] .
DSM noise has two negative impacts: (i) it causes functional failure of the circuits by, e.g. inverting logic levels or false switching of a quiet line which tends to increase the glitch power consumption [12] ; and (ii) it increases the delay in the critical path of the circuits.
Thus, performing noise analysis and avoidance early in the design flow has become crucial to ensuring a functionally correct and low-cost design. Noise avoidance can be addressed in a hierarchical fashion, i.e. from the algorithm, architecture, and circuit levels down to the layout level [15-17,19].
In Ref. [15], an Algorithmic Noise Tolerance (ANT) scheme that allows for low-energy implementation of frequency-selective filters in the presence of DSM noise is proposed. This concept is very efficient; however, its chief drawback is that it is only applicable to one class of DSP algorithms, namely FIR filters.
In Ref. [19], a circuit technique that enhances the noise immunity of dynamic circuits was reported. This technique permits a lowering of the supply voltage without compromising the immunity of the circuit to DSM noise. This is indeed a very appealing technique for dynamic CMOS; however, the energy-saving scheme addresses only gate power consumption.
In Ref. [11] it was demonstrated that the power needed for sending the signal over global wires had started to reach several orders of magnitude greater than the power consumed by the signaling gate. In Refs. [11, 12] , a signaling scheme has been described based on low-voltage signaling combined with buffer insertion. Buffer insertion techniques have three ultimate goals: (i) to combat crosstalk noise; (ii) to decrease the interconnection delay; and (iii) to reduce the power consumption while driving the interconnect at an optimum or a given delay budget.
In Ref. [11] , these power-consumption problems were thoroughly studied. The advantage of buffer insertion in its ability to combat the speed loss caused by lowering the supply voltage has been shown. These studies have been extended to include inverter chains.
In Ref. [12], the potential of buffer insertion to combat crosstalk noise has been shown, and a heuristic algorithm for buffer insertion that drives the interconnect at very low energy while still meeting the noise budget has been derived.
This paper is a continuation of our previous works [11, 12] . The main contributions to the research field contained in this paper are outlined below:
1. An accurate delay-estimation algorithm is presented for a global wire modeled as a distributed RLC network.
2. An accurate crosstalk-noise estimation algorithm is described.
3. Two buffer-insertion algorithms are proposed, one for delay optimization and the other for noise optimization.
The rest of the paper is organized as follows. In the second section, the problem is formulated and our work is put into perspective with existing approaches. In the fourth section, an interconnection delay model is described and a buffer insertion algorithm for delay optimization is presented. This is followed by the derivation of a crosstalk noise estimation model and then the description of a buffer insertion algorithm for noise optimization. Subsequently, a heuristic algorithm for robust buffer insertion that allows the interconnect to be driven at a reduced voltage swing while meeting the delay and noise budgets is reported. In the fifth section, the experimental results are described. Finally, the sixth section concludes our paper.
ENERGY EFFICIENT ON-CHIP SIGNALING
Consider two inverters, INV_1 and INV_2, communicating through a wire of length d as shown in Fig. 1. Without loss of generality, we approximate the wire by a lumped capacitance of value C_w = Cd, where C is the per-unit-length value of the wire capacitance. According to [26], the load seen by the driving inverter, INV_1, is given by
where C dp and C gp are the drain and gate capacitance of the PMOS transistor, respectively. The dynamic power consumed by the driving inverter is given by
where a is the switching probability, P intr is the intrinsic power consumption, and P(d, f ) is the power consumed due to signaling over a wire of length d at a given frequency f. The latter is referred to as on-chip signaling power consumption [11] .
Similarly to the expression for the dynamic power consumption, the expression for the propagation delay is found to be
where b is given by Eq. (4) below.
and k'_n and k'_p are, respectively, the transconductance parameters of the NMOS and PMOS transistors. W_tr and L_tr are, respectively, the channel width and length of the NMOS transistor, where it is assumed that the PMOS transistor is three times wider than the NMOS transistor [26].
As the transistor feature sizes are decreasing, C w is getting higher than C intr . Under these circumstances the first-order approximation of power consumption and delay of the driving inverter reduces to
Four important observations come to light from Eq. (5): (i) the dynamic power consumption is proportional to the square of the signaling voltage V_dd; (ii) the dynamic power consumption is linearly proportional to the wire length d; (iii) the delay is inversely proportional to the size of the inverter; and (iv) the delay is linearly proportional to the wire length d and to V_dd when the interconnect is modeled as a lumped capacitance. In the case of an RC interconnect, the interconnection delay is proportional to the square of the wire length. However, for the case of an RLC transmission line, it was found in Ref. [14] that the interconnection delay is linearly proportional to the wire length. These observations reveal that the most efficient way of saving power while driving the interconnect at the optimum delay is to reduce the supply voltage V_dd. However, this comes at the expense of an increased propagation delay and a reduced noise margin of the gate. In order to tackle these problems, a buffer with an appropriate size must be inserted. This problem is formally described in the next section.
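Observations (i) and (ii) can be illustrated with a short numerical sketch. It assumes the usual first-order expression for the signaling power, a * C * d * V_dd^2 * f, which is consistent with the proportionalities stated above but is not copied from Eq. (5). The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def signaling_power(activity, c_per_mm, length_mm, vdd, freq):
    """First-order on-chip signaling power (assumed form): a * C_w * V_dd^2 * f.

    The wire is treated as a lumped capacitance C_w = C * d.
    """
    c_wire = c_per_mm * length_mm
    return activity * c_wire * vdd ** 2 * freq


# Hypothetical operating point, for illustration only.
activity = 0.5          # switching probability a
c_per_mm = 0.156e-12    # wire capacitance per mm, in farads
length_mm = 5.0         # wire length d
freq = 100e6            # clock frequency f, in hertz

p_full = signaling_power(activity, c_per_mm, length_mm, 2.5, freq)
p_reduced = signaling_power(activity, c_per_mm, length_mm, 1.5, freq)

# Quadratic dependence on V_dd: the saving depends only on the voltage ratio.
saving = 1.0 - p_reduced / p_full
print(f"power at 2.5 V: {p_full:.3e} W")
print(f"power at 1.5 V: {p_reduced:.3e} W")
print(f"relative saving: {saving:.0%}")
```

Under this assumed model the saving obtained from voltage scaling alone is independent of the wire length, the activity and the frequency, while the absolute power grows linearly with d.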
Problem Formulation
Consider a CMOS gate, referred to as a transmitter, that sends signals to another CMOS gate situated at a distance d from the transmitter. The medium used for signaling between the transmitter and the receiver is the On-Chip Interconnect (OCI). In order to define the efficiency of the communication (signaling) between the transmitter and the receiver, the following definition has been adopted.
Definition 1 Let s_t(t) denote a transmitted signal, s_r(t) denote the received signal and δ denote the propagation delay of the signal s_t(t) sent over a distance d, where d is the distance between the receiver and the transmitter. The communication between the transmitter and the receiver is said to be perfect if and only if at the instant of time t the received signal s_r(t), sent by the transmitter at t − δ, is within the noise margins of the receiver, thereby satisfying the following inequalities: |s_t(t − δ) − s_r(t)| ≤ NM_H if s_t(t − δ) is a logic high, or |s_t(t − δ) − s_r(t)| ≤ NM_L if s_t(t − δ) is a logic low [25].
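The condition in Definition 1 can be checked sample by sample, as in the minimal sketch below. It assumes that the sent and received waveforms have already been aligned by the propagation delay, so that each received sample is compared with the value transmitted one delay earlier. The function names and the example voltages are hypothetical.

```python
def within_noise_margin(v_sent, v_received, is_high, nm_high, nm_low):
    """Check the inequality of Definition 1 for one delay-aligned sample pair."""
    margin = nm_high if is_high else nm_low
    return abs(v_sent - v_received) <= margin


def is_perfect_communication(samples, nm_high, nm_low):
    """samples: iterable of (v_sent, v_received, is_high) tuples, already
    aligned so that v_received at time t pairs with v_sent at t minus the delay."""
    return all(
        within_noise_margin(v_s, v_r, high, nm_high, nm_low)
        for v_s, v_r, high in samples
    )


# Hypothetical 2.5 V signaling with 0.8 V high and 0.6 V low noise margins.
clean = [(2.5, 2.1, True), (0.0, 0.3, False)]
noisy = [(2.5, 2.1, True), (0.0, 0.9, False)]
print(is_perfect_communication(clean, nm_high=0.8, nm_low=0.6))
print(is_perfect_communication(noisy, nm_high=0.8, nm_low=0.6))
```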
If j(d, D) is used to denote the energy required to send the data over a distance d under a specific delay D, then the power saving in DSM can be formally described as the optimization problem depicted in Fig. 2.
Related Work
A lot of effort has been recently devoted to addressing the problem of "optimum" signaling over global wires. The techniques addressing the problem are rigorously reviewed below.
In Ref. [3], two buffer insertion algorithms that minimize the power consumption while driving an interconnect at a given delay budget have been described. These algorithms compute the optimum inverter sizes {w_1, w_2, ..., w_n} such that the total energy consumption (dynamic and static) is minimized and constrained by a given delay budget. The first algorithm is used in semi-custom design, where the buffers are selected from a buffer library. The second algorithm is used in full-custom design, where the buffer sizes are computed using an optimization algorithm based on the Lagrange multiplier. It has been found that a buffer chain with a non-constant taper factor consumes less energy than a buffer chain with a constant taper factor equal to e. These buffer insertion algorithms are very efficient for signaling over capacitive interconnects.
FIGURE 2 Parasitic capacitances of two inverter pairs communicating through a capacitive interconnect.
In Ref. [4], the short-circuit power consumption for signaling over a resistive interconnect has been addressed, and the following three schemes for driving a resistive interconnect were compared: (i) uniform repeaters; (ii) tapered buffers; and (iii) tapered repeater buffers. It was found that the uniform repeaters are more efficient than the others. In order to save power, the approach proposed by Adler et al. is to trade speed for power, and a maximum power saving of 15% has been achieved at the expense of a 4% increase in the propagation delay.
The aforementioned reports suffer from several limitations. Firstly, the interconnect has been inadequately modeled (e.g. lumped RC or purely capacitive interconnect). Secondly, locations of the buffers have been computed without considering crosstalk noise. Lastly, the power savings approach is to optimally select the number and sizes of the repeaters.
The second problem in the last paragraph above has been addressed in Ref. [5], where three algorithms for buffer insertion have been described. The first algorithm inserts buffers for delay optimization, the second algorithm inserts buffers for noise optimization, and the third algorithm, which is a dynamic programming algorithm, inserts buffers for both delay and noise optimization. The third algorithm is an improvement of Van Ginneken's algorithm [8]. The buffer insertion algorithm for noise optimization uses Devgan's crosstalk noise prediction algorithm [27], whereas the buffer insertion algorithm for delay optimization is based on the traditional Elmore delay. In addition, the wires have been modeled as a lumped RC network. This work is the first of its kind to address noise and delay optimization based on buffer insertion. However, power savings have not been addressed. In addition, Devgan's noise estimation algorithm has been found in Ref. [28] to be inefficient for some interconnect benchmarks.
A buffer insertion algorithm for noise optimization that is based on an inaccurate crosstalk noise prediction algorithm can have two negative impacts. If the noise prediction algorithm is pessimistic, then the buffer insertion algorithm inserts more buffers than needed. This will certainly result in an increased propagation delay, power consumption and silicon area. Conversely, if the noise prediction algorithm is optimistic, then the buffer insertion algorithm inserts fewer buffers than needed. Consequently, the design will suffer from the presence of the noise.
Indeed, the experimental results reported in Ref. [5] show that for some interconnect benchmarks, Devgan's algorithm is pessimistic which results in identifying 37 more nets with noise violations than the actual case.
In Refs. [9, 10] , the dynamic power consumed by a driver while driving a global wire has been reduced by scaling the supply-voltage. While a sizable power reduction has been achieved, the speed degradation caused by voltage scaling has not been properly addressed.
In addition, the scheme does not guarantee a perfect communication, because the buffer locations (i.e. wire segmenting) are computed without accounting for the crosstalk noise.
In order to efficiently address the aforementioned problems, an algorithm that solves the optimization problem shown in Fig. 2 was described in Ref. [12]. The cornerstone of the scheme is reduced-swing signaling combined with repeater insertion and resizing. The locations of the buffers, which take into account both crosstalk noise and delay, are computed using a heuristic algorithm called VIJIM. The interconnect has been modeled as lumped RLC, and an algorithm for delay estimation has been described. VIJIM uses the crosstalk noise estimation described in Ref. [28]. However, the deployed crosstalk noise estimation algorithm does not take into account the inductive effect of the wire. In addition, the wire in Ref. [11] is modeled as lumped RLC.
OPTIMUM SIGNALING OVER AN ON-CHIP INTERCONNECT
The expressions for D and j(d, D) must be derived before an algorithm for solving the optimization problem OPT-SIG shown in Fig. 2 can be developed. These expressions depend on the interconnect parameters (R, L, and C) and the interconnect model (capacitive versus distributed RLC versus lumped RLC). They also depend on the number and sizes of the buffers used to drive the interconnect at a given delay budget. The number and size of the buffers are estimated based on the delay budget D. In order to minimize the quantity j(d, D), the approach in this work is based on reduced voltage-swing signaling. This, however, comes at the expense of an increased propagation delay and a decreased noise margin of the buffer. As explained earlier, in order to restore the speed, the approach adopted here is to properly resize the inverter. The optimum number of inverter stages and the size of each inverter are computed using the algorithm described in Ref. [11]. In order to assess the efficiency of our proposed scheme, the following signaling schemes have been implemented in a 0.25 μm, 2.5 V CMOS process using full-custom design: (i) S_1: full swing (2.5 V) [3, 5]; (ii) S_2: reduced swing (1.5 V) [9, 10]; and (iii) S_3: reduced-swing signaling (1.5 V) driven by a buffer which is four times larger than the minimum-sized buffer.
For each scheme, j(d) and t_p have been measured using the HSPICE simulator. The transistor model is the BSIM3v3 MOS model from UC Berkeley. The results are reported in Table II. Tables II and III clearly show that our method achieves, on average, over 70% energy savings without substantial speed degradation.
In this section it has been shown by using closed-form equations that the power consumed by an inverter (or buffer) is the sum of the on-chip signaling power consumption and the intrinsic power dissipation. It was also found that the delay is the sum of the delay caused by the intrinsic capacitance of the driver and the delay caused by the wiring capacitance. Consequently, it was argued that the most efficient way to reduce the on-chip signaling power consumption under a given delay constraint is to reduce the supply voltage in combination with buffer resizing. The buffer resizing compensates for the delay degradation caused by scaling the supply voltage. The technique proposed in this section is very efficient for signaling over a capacitive wire in the absence of crosstalk noise. However, this model may not be valid for contemporary and future technologies, because the wire needs to be modeled as distributed RLC. Additionally, after detailed place and route, or even after global routing, the OCI will be coupled to adjacent wires. This suggests that a more robust buffer insertion scheme that solves OPT-SIG must be developed. This is the goal of the following sections.
ROBUST BUFFER INSERTION: THE VIJIM ALGORITHM
The aim in this section is to provide answers to the following questions: (i) what is the optimal wire-segmenting strategy for delay optimization if the wire is modeled as a distributed RLC network?; (ii) what are the optimal buffer locations to suppress the crosstalk noise for non-critical nets?; and (iii) what are the optimal locations, number and sizes of the buffers that would allow voltage scaling for critical nets that suffer from the presence of crosstalk noise? In order to answer these questions, the problem must be described in a mathematical way.
Consider a wire E of length d surrounded by N_e wires, denoted by E_{e,i}, i = 1, ..., N_e. Let S0 and SI denote the source and the sink nodes of the wire E. Without loss of generality, assume that S0 and SI are minimum-sized inverters. The optimum signaling scheme is an application (function), denoted by M similarly to the notation used in Ref. [5], that takes as input E and its surrounding wires (E_{e,i}), the delay constraint D, the supply voltage V_dd, and the noise margin h. Given these parameters, the algorithm returns the locations, number, and sizes of the buffers. Thus, the application M is the solution of the optimization problem OPT-SIG, shown in Fig. 2. In order to solve OPT-SIG, the first task is to derive closed-form expressions for j(d, D) and for the interconnection delay δ.
Let N_b be the number of buffers to be placed on the wire. Each buffer b_i, ∀i ∈ {1, ..., N_b}, is located at a distance d_{b,i} from the source node and has a width W_{b,i}. Without loss of generality, we assume that the noise margin of the buffers is constant. However, our approach can be easily extended to handle the case of buffers with different noise margins. The number of buffers N_b depends on the wire length, the crosstalk noise induced into the wire and the delay budget D. Based on the alpha-power-law model, two delay-constrained optimization cases for computing the optimum number and sizes of the buffers have been reported in Ref. [3]. In the next section, we follow the same formulation; however, the long-channel model is used. In addition, a closed-form expression for the short-circuit power consumption described in Ref. [20] is utilized. If we assume that the N_b buffers have the same switching activity a and work at the same clock speed f, then the power needed to send the signal from S0 to SI is accurately modeled using Eq. (6)
where C_min is the gate capacitance of the minimum-sized inverter, t is the slope of the input voltage, V_th is the threshold voltage and ϑ = L_tr[(1/k'_n) + (1/3k'_p)] (cf. Eq. (4)).
There are two solutions for OPT-SIG. The first solution is to search for the optimum sizes of the buffers from a given buffer library. This is a discrete non-linear optimization problem that can be solved by, e.g. exhaustive search if the number of buffers is reasonably small, or can be further transformed into an ILP optimization problem. This is the solution used in semi-custom design. In full-custom design, OPT-SIG is solved using Lagrange multipliers or heuristic algorithms, and the buffer is then designed based on the optimum value obtained.
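The library-based (semi-custom) variant can be sketched as a plain exhaustive search. The delay and energy models below are deliberately simplified, normalized first-order RC expressions introduced only for illustration; they are not Eq. (6) or the models of Ref. [3], and the function names, the library and the numerical values are hypothetical.

```python
from itertools import product


def chain_delay(sizes, c_load, r_min=1.0, c_min=1.0):
    """Normalized RC delay of a buffer chain: each stage of width w has
    resistance r_min / w and drives the gate of the next stage (or the load)."""
    total = 0.0
    for i, w in enumerate(sizes):
        c_next = sizes[i + 1] * c_min if i + 1 < len(sizes) else c_load
        total += (r_min / w) * c_next
    return total


def chain_energy(sizes, c_load, vdd, c_min=1.0):
    """Normalized switching energy: total switched capacitance times V_dd^2."""
    return (sum(sizes) * c_min + c_load) * vdd ** 2


def best_chain(library, n_stages, c_load, delay_budget, vdd):
    """Try every size assignment from the library and keep the lowest-energy
    one that meets the delay budget. Returns (energy, sizes) or None."""
    best = None
    for sizes in product(library, repeat=n_stages):
        if chain_delay(sizes, c_load) > delay_budget:
            continue
        energy = chain_energy(sizes, c_load, vdd)
        if best is None or energy < best[0]:
            best = (energy, sizes)
    return best


# Hypothetical library of four buffer widths (multiples of the minimum size).
print(best_chain((1, 2, 4, 8), n_stages=2, c_load=64.0, delay_budget=20.0, vdd=1.0))
print(best_chain((1, 2, 4, 8), n_stages=2, c_load=64.0, delay_budget=10.0, vdd=1.0))
```

The search space grows as the library size raised to the number of stages, which is why the exhaustive approach is only practical when the number of buffers is small.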
The first task towards solving the OPT-SIG optimization problem is to partition the wire into an optimum number of segments. Each wire segment is then driven by a buffer with an appropriate size. By using the Elmore delay, it has been shown that in order to drive an interconnect at an optimal delay, the interconnect needs to be partitioned into N_b + 1 equally spaced segments if the buffer, the source and the sink have the same electrical parameters. The value of N_b is computed using a closed-form expression [6]. The algorithm described in Ref. [6] guarantees an optimum solution if the interconnect is modeled as a distributed RC line and the library contains one type of buffer. However, this technique suffers from the following limitations:
1. As was proved in Ref. [5] and experimentally shown in Ref. [12], buffer insertion based on delay optimization does not guarantee a perfect communication.
2. This result is not necessarily true for the case of an RLC interconnect.
3. It has been found in many published reports that tapered buffer repeaters perform better than uniform repeaters [3, 7, 11].
In the semi-custom design, the last point made above is irrelevant, because the choice of the buffer sizes is limited by the size given in the library.
In the work done here, the interest has been focused on a full-custom design solution where the sizes of the buffers are computed using the algorithm described in Ref. [13]. In order to derive the algorithm for buffer insertion that solves OPT-SIG, accurate crosstalk noise and delay estimation algorithms are presented in the subsequent sections. After that, two buffer insertion algorithms for delay or noise optimization are described.
Interconnection Delay
If the interconnect is modeled as a distributed RLC circuit, the delay estimation is based on the two-pole approximation of the transfer function of the RLC transmission line. The derivation steps and the delay models are described in Appendix A. The steps undertaken to obtain a closed-form expression of the interconnection delay are similar to those used in Ref. [29].
The two main closed-form expressions for the interconnection delay are given by Eqs. (7) and (9) .
where K r is given by
The selection of the appropriate equation for estimating the interconnection delay is done in the following way. If b_1^2 ≥ 4b_2, then the interconnection delay must be estimated using Eq. (7); otherwise Eq. (9) must be used. In fact, in the interconnect benchmarks used in this paper it was found that b_1^2 ≥ 4b_2. Thus, in the sequel our objective is to derive a buffer insertion algorithm for delay optimization based on Eq. (7).
Problem 1 Given a wire of length d and a non-inverting buffer that has a resistance R_b, an intrinsic capacitance C_b and an intrinsic delay K_b, provide a necessary and sufficient condition under which the interconnection delay decreases when the buffer is inserted.
Similarly to the conditions given in Ref. [6], what is of interest here is finding the conditions under which it is worthwhile inserting a buffer to reduce the interconnection delay. In Ref. [6], the Elmore delay has been used to obtain conditions for buffer insertion.
As mentioned above, the interconnect is assumed to have real poles, thus, Eq. (7) is the best-suited model for delay estimation. The problem is that Eq. (7) is a nonlinear equation and thus an approximated linear and closed-form equation is needed in order to derive an algorithm for buffer insertion.
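The selection rule between Eqs. (7) and (9) amounts to testing whether the poles of the two-pole approximation are real. The sketch below assumes that the denominator of the approximated transfer function has the form 1 + b_1 s + b_2 s^2, which is consistent with the condition b_1^2 ≥ 4b_2 but is not reproduced from Appendix A. The function names and the coefficient values are hypothetical.

```python
import cmath


def two_pole_roots(b1, b2):
    """Poles s_1, s_2 of 1 / (1 + b1*s + b2*s**2), i.e. roots of the denominator."""
    root = cmath.sqrt(b1 * b1 - 4.0 * b2)
    s1 = (-b1 + root) / (2.0 * b2)
    s2 = (-b1 - root) / (2.0 * b2)
    return s1, s2


def select_delay_model(b1, b2):
    """Real poles (b1^2 >= 4*b2) select Eq. (7); complex poles select Eq. (9)."""
    return "Eq. (7)" if b1 * b1 >= 4.0 * b2 else "Eq. (9)"


# Hypothetical, normalized coefficients.
for b1, b2 in [(3.0, 2.0), (1.0, 1.0)]:
    s1, s2 = two_pole_roots(b1, b2)
    print(b1, b2, select_delay_model(b1, b2), s1, s2)
```

When b_1 and b_2 are positive and the discriminant is non-negative, both poles are real and negative, which corresponds to the overdamped response assumed in the remainder of this section.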
An inspection of Eq. (7) reveals that in order to achieve this goal, a simple expression must be found for K r . In Appendix A, an empirical value for K r has been found to be equal to 0.69 and therefore, Eq. (7) can be reduced to Eq. (10).
Once the approximated quasi-linear equation for Eq. (7) has been found, the goal is to obtain a linear relation between the delay and the interconnect length. For that, the following Lemma is needed.
Lemma 1 Given a wire of length d which is modeled as a distributed RLC circuit. Assuming that the poles of its approximated transfer function are real and that Eq. (10) accurately estimates the interconnection delay, then its delay can be reduced to
Proof Consider the denominator given in Eq. (10). As s_1 and s_2 are real, b_1 and b_2 are positive real numbers. We know the first-order polynomial approximation for the function f(x) = √(...).
If we substitute this in Eq. (10), we obtain Eq. (11) . This concludes the proof.
Theorem 1 Consider a wire E of length d driven by a source that has a resistance R_0. Let us assume that E is terminated with a capacitance C_L and let us further assume that Eq. (11) accurately predicts the interconnection delay. Then the buffer b must be inserted whenever the inequality (12) holds.
where K_b, C_b, and R_b are, respectively, the intrinsic delay, capacitance, and resistance of b.
Proof Let us assume that the buffer is located at a distance x from the source (distance d − x to the sink), and that the sink has an input capacitance denoted by C_L as shown in Fig. 3. The intrinsic delays of the source and sink are denoted by K_b and K_s, respectively. The delays after and before buffer insertion are denoted by t_b and t, respectively, where the subscript b means "after" buffer insertion. The expressions for t_b and t are given by Eqs. (13) and (14), respectively.
The optimum value for x, denoted by x_opt, is obtained by solving ∂t_b/∂x = 0. The expression for x_opt is given by Eq. (15).
If we substitute x by x_opt in the expression of t_b, the delay after buffer insertion is smaller than the delay before buffer insertion (t_b < t) if and only if the length of the interconnect satisfies the inequality given by Eq. (16).
d .
Theorem 2 Consider a wire connecting two identical buffers (source and sink). If the inserted buffer is identical to the source, then it must be inserted half-way between the source and the sink.
Proof If we replace C_L by C_b and R_0 by R_b in Eq. (15), we obtain x_opt = d/2. This concludes the proof.
In fact, if the source, the sink and the buffers satisfy the conditions given in Lemma 2, then given N_b buffers, the optimum way to reduce the interconnection delay is to place the buffers equidistantly at d_{b,i} = i·d/(N_b + 1), i = 1, ..., N_b.
Based on the Elmore delay, in Ref. [6] it was shown that the buffer b must be inserted along the wire E to reduce the interconnection delay if the length of E satisfies the following relation
If the wire E satisfies Eq. (17), then, as shown in Ref. [6], the buffer must be placed at the optimum location given by Eq. (18), where the subscript "A" is used to refer to the buffer location computed using Alpert's algorithm. Careful examination of Eqs. (15) and (18) shows that the two equations are indeed identical. However, Eq. (15) was obtained by finding an approximation of the interconnection delay given in Eq. (18). The resulting approximated equation is independent of the inductance. This suggests that for certain interconnect topologies, the error between Eqs. (7) and (10) can be unacceptably high. Figure 4 shows the impact of the inductance on the interconnection delay. A rigorous analysis of the impact of the inductance on the buffer insertion scheme and the interconnection delay is reported in Ref. [14].
Under this circumstance, the buffer insertion algorithm based on Eq. (15) (or Eq. (18)) can lead to a non-delay optimized design. In order to tackle this problem, Eq. (7) must be used to find x opt .
To summarize, ideally for a given wire of length d modeled as a distributed RLC circuit, if the source and the sink on the wire have different characteristics than the buffer, then Eq. (16) must be recursively used to place the buffers for delay optimization. Otherwise, the buffers must be inserted equidistant onto the wire.
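The placement rules summarized above can be sketched numerically. Because Eqs. (13)-(15) are not reproduced in this text, the sketch assumes the standard Elmore delay of a uniform distributed RC line with a single buffer at distance x from the source; the closed-form optimum is obtained by setting the derivative of that assumed delay to zero. All function names and parameter values are hypothetical.

```python
def buffered_elmore_delay(x, d, r, c, r0, rb, cb, cl, kb):
    """Assumed Elmore delay of a wire of length d with one buffer at distance x.

    r, c: wire resistance and capacitance per unit length.
    r0: source resistance; rb, cb, kb: buffer resistance, input capacitance
    and intrinsic delay; cl: sink capacitance.
    """
    y = d - x
    source_to_buffer = r0 * (c * x + cb) + r * x * (c * x / 2.0 + cb)
    buffer_to_sink = kb + rb * (c * y + cl) + r * y * (c * y / 2.0 + cl)
    return source_to_buffer + buffer_to_sink


def optimal_buffer_location(d, r, c, r0, rb, cb, cl):
    """Stationary point of the delay above, clamped to the wire [0, d]."""
    x = d / 2.0 + ((rb - r0) * c + r * (cl - cb)) / (2.0 * r * c)
    return min(max(x, 0.0), d)


def equidistant_locations(d, n_buffers):
    """Buffer positions that split the wire into n_buffers + 1 equal segments."""
    return [i * d / (n_buffers + 1) for i in range(1, n_buffers + 1)]


# Hypothetical values: lengths in mm, resistances in ohms, capacitances in farads.
d, r, c = 6.0, 1.59, 0.156e-12
print(optimal_buffer_location(d, r, c, r0=100.0, rb=100.0, cb=5e-15, cl=5e-15))
print(optimal_buffer_location(d, r, c, r0=101.0, rb=100.0, cb=5e-15, cl=5e-15))
print(equidistant_locations(d, 2))
```

With identical source, buffer and sink parameters the stationary point reduces to d/2, in agreement with Theorem 2. A stronger source (smaller r0 relative to rb) moves the buffer towards the sink, and a weaker source moves it towards the source. Since the assumed model contains no inductance, it says nothing about the RLC case discussed above.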
The chief drawback of the proposed buffer insertion is that the placement of the buffer has been computed under the assumption that the noise at the buffer locations satisfies the perfect communication requirements. However, in order to guarantee a hazard free circuit, at each buffer location the crosstalk noise must be checked. This should be done with the help of the crosstalk noise estimation algorithm. In the sequel, an efficient crosstalk noise estimation algorithm is derived.
Efficient Crosstalk Estimation
The crosstalk noise is referred to as the noise induced by an active line (called aggressor) into a quiet line (referred to as victim). Over the last decade many efficient algorithms for crosstalk noise estimation have been reported, e.g. Refs. [24, 27, 28, 30, 31] .
The crosstalk noise algorithm described in Ref. [28] , which is an improved version of the algorithm described in Ref. [27] , has been used in Ref. [12] to estimate the location of the buffers. While the algorithms published in Refs. [27, 28, 30, 31] are very efficient when the interconnect is modeled as distributed RC network, the efficiency of these algorithms has not been assessed when the interconnect is modeled as distributed RLC network. Thus, the aim of this section is to quantify the impact of the inductance on the crosstalk noise, then based on this, derive a new efficient crosstalk noise estimation. The derivation steps are described in Appendix B.
Consider the two coupled interconnects of length d shown in Fig. 5. Each wire is modeled as a distributed RLC network. Two adjacent wires are coupled via a coupling capacitance. In Ref. [32], the validity of this model has been experimentally checked using measured data from a manufactured VLSI chip implemented in a 0.25 μm, 2.5 V CMOS process. The measured data has been compared against HSPICE and the results showed a good agreement between the HSPICE model and the measured data. However, in Ref. [32] HSPICE has been used to characterize the crosstalk noise for technology that has a feature size below 0.25 μm. A closed-form expression for the crosstalk noise is very useful for interconnect-driven optimization such as routing and buffer insertion [21].
In order to quantify the effect of the inductance L on the peak crosstalk noise, the interconnect shown in Fig. 5 has been approximated by a 10-stage RC ladder network (referred to as M_0) and a 10-stage RLC ladder network (referred to as M_1). The per-unit-length values (PUV) of L, R, C and C_c are, respectively, 0.321 nH/mm, 1.59 Ω/mm, 0.156 and 0.156 pF/mm. The values of the parameters were obtained using a field solver as described in [11]. Let VRCMAX and VRLCMAX denote the maximum crosstalk noise for M_0 and M_1, respectively. Figure 6 plots these parameters as a function of the wire length. The curves for VRCMAX and VRLCMAX clearly show that in this particular case the inductance increases the crosstalk noise. Consequently, a prediction model that does not account for the on-chip inductance is an optimistic estimator, and its error can be unacceptably high.
In Appendix B, the closed-form for the normalized crosstalk noise, assuming that the aggressor and the victim line are of same length (See Fig. 5 ), was found to be V N ðt peak Þ ¼ 0:5 s þ e s þ 1 t peak 2 s 2 e s 2
where t peak is given by Eq. (20) .
s^-_{1,2} and s^+_{1,2} are, respectively, the solutions of the DFEs given by Eqs. (37) and (38) in Appendix A.
If the net has multiple aggressors, the superposition theorem can be used to compute the maximum crosstalk noise induced into it.
The chief advantage of Eq. (19) is that it accounts for the driver's resistance R 0 , the wire parasitics, i.e. R, L and C, and the loading capacitance C L . In order to derive a practical formula, the parameter b given by Eq. (21) has been neglected.
where T r is the rise time of the aggressor's voltage. In Ref. [27] , Devgan derived a closed form equation for the current induced by the aggressor coupled to the victim net, which hereafter will be referred to as Devgan's model. Devgan's noise model is given by Eq. (22) .
where d is the wire length, C is the per-unit value of the wire capacitance, l is the ratio of the coupling to the wire capacitance, and m is the slope of the aggressor net defined as
where $T_r$ is the rise time of the signal. It is clear from Eq. (22) that the induced current depends on the wire length. This suggests that, for a given net coupled to one or several aggressors, the magnitude of the crosstalk noise can become unacceptably high, which may cause serious functional failures or increase the glitch power consumption. In order to bring the noise under control, a buffer insertion algorithm for noise optimization, named BuffOpt, has been proposed in Ref. [5]. The idea behind BuffOpt is that, for a given net coupled to one or many aggressors, there is an optimal wire length beyond which the level of crosstalk noise becomes larger than the noise margin of the circuit (h). For the case of a single net coupled to one aggressor, the necessary and sufficient condition that the wire length must satisfy in order for a buffer to be inserted to suppress the crosstalk noise is given by Eq. (24).
where R 0 is the driver's resistance.
The advantage of Devgan's noise model is that the noise is linear in the interconnect length d. However, the risk of using Eq. (22) is that, for some interconnect benchmarks, the estimated crosstalk noise is several orders of magnitude higher than the actual value, which results in over-buffering the net.
Indeed, in Ref. [28] it was found that, for a 5 mm wire and an aggressor rise time of 20 ps, the crosstalk noise estimated using Devgan's algorithm is 55.82 V, whereas the correct value is only 0.39 V. Although the authors of Ref. [28] have proposed a more efficient crosstalk noise estimation algorithm, its maximum error, defined in Eq. (25), can still be as high as 51%.
where w c is the correct value and w e is the estimated value.
Based on the π model of the interconnect, a more accurate formula for the crosstalk noise has been reported in Ref. [30].
Robust and efficient buffer insertion for noise optimization using our model, given by Eq. (19), is based on a binary search technique, which works as follows. Consider a quiet wire $E_q$ driven by a source $S_o$ and terminated by a sink $S_i$ that has an input capacitance $C_{si}$. Consider also an aggressor wire $E_a$, driven by a source $S_0$ that has a resistance $R_0$, coupled to $E_q$ through a coupling capacitance whose PUV is denoted by $C_c$. Given these conditions, denote the noise margin of the sink placed on $E_q$ by h. A buffer is inserted onto the quiet line only if the crosstalk noise estimated using Eq. (19), denoted by $v_{si}$, is larger than h; otherwise there is no need for buffer insertion. Assume that the buffer has an input capacitance denoted by $C_b$. The task of the buffer insertion algorithm is to find a suitable location $d_{opt}$ for inserting the buffer. Let $d_{min}$ be a wire length such that the estimated crosstalk noise is less than h, and let $d_{max}$ be the maximum distance such that the estimated noise is more than h. Then $d_{min}$ and $d_{max}$ should satisfy the relationship given by:
where L is a very small parameter that represents the minimum distance between two buffers, and g is a displacement parameter such that g < 1. At the initialization step, $d_{min}$ is set to zero and $d_{max}$ is set to d.
The resulting algorithm, referred to as the efficient BuffOpt, is depicted in Fig. 7.
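The bisection on $d_{min}$ and $d_{max}$ described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names `find_buffer_location` and `toy_noise` are hypothetical, `toy_noise` is a made-up monotone stand-in for Eq. (19), `min_gap` plays the role of the minimum buffer distance, and the displacement parameter g is replaced by plain halving.

```python
from typing import Callable, Optional


def find_buffer_location(noise: Callable[[float], float], d: float,
                         eta: float, min_gap: float) -> Optional[float]:
    """Bisect for the longest wire length whose estimated noise is below eta.

    noise   -- estimated peak crosstalk noise versus wire length; assumed
               non-decreasing in length (hypothetical stand-in for Eq. (19))
    d       -- total wire length
    eta     -- normalized noise margin of the sink (h in the text)
    min_gap -- minimum distance between two buffers, used as search tolerance
    Returns None when the unbuffered wire already meets the noise margin.
    """
    if noise(d) <= eta:
        return None  # no buffer needed
    d_min, d_max = 0.0, d  # initialization step described in the text
    while d_max - d_min > min_gap:
        mid = 0.5 * (d_min + d_max)  # plain halving (assumption: g = 0.5)
        if noise(mid) < eta:
            d_min = mid  # noise still acceptable: raise the lower bound
        else:
            d_max = mid  # noise too high: lower the upper bound
    return d_min


def toy_noise(length_mm: float) -> float:
    """Hypothetical noise model, linear in length. NOT Eq. (19)."""
    return 0.3 * length_mm
```

With `toy_noise`, a 1 mm wire and a margin of 0.2, the search settles just below 0.2/0.3 mm; a 0.5 mm wire needs no buffer. Because the interval is halved at every step, the number of noise evaluations grows only logarithmically with the number of candidate locations.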
VIJIM Algorithm
In the previous subsections, two algorithms for buffer insertion were described. The first, described in the section "Interconnection Delay", inserts buffers for delay optimization, whereas the second, described in the section "Efficient Crosstalk Estimation", inserts buffers for noise optimization. For critical nets, a buffer insertion algorithm that optimizes both delay and noise must be used. Given a delay budget D, if the buffer must be inserted at a location $d_{opt,time}$ to satisfy the delay requirements, it may happen that the noise level at $d_{opt,time}$ exceeds the noise margin of the buffer. In this case, the buffer must be shifted to a distance $d_{opt,noise}$ that is smaller than $d_{opt,time}$.
In the "Energy Efficient On-chip Signaling" section it was found that the power needed for signaling, that is, the power needed to transmit the signal over a wire of length d, is proportional to the wire length and to the square of the supply voltage. It was suggested that the supply voltage must be reduced in order to minimize the power dissipation while still satisfying the delay budget. The potential of buffer resizing for compensating the resulting delay degradation has been shown. The closed-form expressions described in the "Energy Efficient On-chip Signaling" section are only valid for signaling over capacitive wires. Therefore, the proposed delay and buffer resizing algorithm cannot be efficient for signaling over a distributed RLC wire. The most efficient way to drive a distributed RC wire at an optimal delay is hence reduced to the problem of selecting an appropriate resizing algorithm. In the following paragraph two popular resizing algorithms are reviewed.
The concept of geometric ratio sizing is a well-known design technique for choosing buffer sizes. In Ref. [26], a technique for choosing the number and sizes of inverters to drive an interconnect wire at an optimal delay was demonstrated. Using simplified inverter charging/discharging models, it was concluded in Ref. [26] that successive inverters driving the interconnect line should be sized up in a geometric progression. According to this technique, if a load of capacitance $C_L$ is to be driven, the first driver/inverter of size S is followed by uniformly spaced inverters of sizes $eS, e^2S, e^3S, \ldots, e^nS$, where $n = \log_2(C_L/(S C_g))$ and $C_g$ is the gate capacitance of a minimum-sized inverter. It was later proved in Ref. [13] that this technique cannot achieve an optimal delay. By taking into account the slope of the input signal, the RC interconnect models and a more exact description of the inverter charging/discharging behavior, it was found in Ref. [13] that, in order to achieve an optimal delay, the inverters need to be sized with a pseudo-fixed ratio. The pseudo ratio should be of the form $a \times (1+e)^i$ for sizing stage i. This implies that the inverter sizes vary as $S, a(1+e)S, a^2(1+e)^3S, \ldots, a^n(1+e)^{n(n+1)/2}S$. If the length of the interconnect is d, then the inverters need to be located with uniform spacing $d/n$. In Ref. [13], a detailed technique was then developed for a minimal-power chained-driver configuration achieving a specified delay. While this is also our objective in this paper, we note that the inductive effects and noise for buffer insertion have been totally ignored in Ref. [13]. A robust algorithm for buffer insertion that solves the optimization problem given in Fig. 2 is described in Fig. 8.

FIGURE 9 Comparison between VIJIM and BuffOpt of Ref. [5] for the case of normalized $h = 0.2$, $D = \infty$, R = 35 Ω/cm, L = 3.47 nH/cm, and C = 5.16 pF/mm. These parameters are chosen from Table V, where it is assumed that $C_L$ = 153 fF, $C_c = 2 \times C$, $R_0$ = 2000, $C_b$ = 153 fF, and d = 1 mm. The VIJIM algorithm inserts a buffer at 0.344 mm from the source, whereas BuffOpt inserts a buffer at 0.13 mm from $R_0$. Before the buffer is inserted, the amplitude of the normalized crosstalk noise is ca. 0.24. After VIJIM and BuffOpt, the amplitude of the normalized crosstalk noise has been reduced to, respectively, 0.20 and 0.14.
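The two sizing rules reviewed above (the geometric progression and the pseudo-fixed ratio $a(1+e)^i$) can be compared numerically with a short sketch. The function names and the sample values of `ratio`, `a` and `e` below are illustrative assumptions, not values taken from Refs. [13] or [26].

```python
from typing import List


def geometric_sizes(s0: float, ratio: float, n: int) -> List[float]:
    """Sizes S, ratio*S, ratio**2*S, ..., ratio**n*S (fixed stage ratio)."""
    return [s0 * ratio ** i for i in range(n + 1)]


def pseudo_fixed_sizes(s0: float, a: float, e: float, n: int) -> List[float]:
    """Sizes where stage i is a*(1+e)**i times larger than stage i-1.

    The cumulative product gives a**i * (1+e)**(i*(i+1)/2) * S for stage i,
    which matches the sequence quoted in the text.
    """
    sizes = [s0]
    for i in range(1, n + 1):
        sizes.append(sizes[-1] * a * (1.0 + e) ** i)
    return sizes
```

For example, `pseudo_fixed_sizes(1.0, 2.0, 0.1, 3)` ends with a last stage of $2^3 \times 1.1^6$ times the first inverter, so the stage ratio itself grows slowly from stage to stage instead of staying constant.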
The VIJIM algorithm relies on a well-characterized interconnect, an accurate delay estimation algorithm and an accurate crosstalk noise estimation algorithm. These functions are given as arguments to the VIJIM algorithm: f-compute(delay, OCI, a, b) and f-compute(noise, OCI, b).
The VIJIM algorithm works as follows. At the ith iteration, VIJIM uses a fast search technique to locate the "best" position of the ith buffer such that the signal received from the (i − 1)th buffer, $V_{i-1}$, satisfies the perfect communication and delay constraints (see Definition 1). The location of the buffer is determined within a given precision (s), which can be set at the onset of VIJIM. The convergence speed of VIJIM, that is, the time needed to locate $b_{opt}$ given a, depends on the value of g. In other words, for a small step value VIJIM converges faster on the location of the ith buffer, provided that $b_{opt}$ is closer to the initial position of b than to a. Given N possible locations of a buffer to be placed on a net of length d, the VIJIM algorithm (and the efficient BuffOpt) finds the optimal buffer location ($d_{opt}$) in $\log_2(N)$ steps.
EXPERIMENTAL RESULTS
In the experiments carried out in this paper, the wire resistance, inductance and capacitance (R, L and C) are obtained from the measurement results published in Ref. [33]. Closed-form expressions for R, L and C as a function of the wire width (W) have been obtained by least-squares estimation. The results are shown in Figs. 9-11. The expressions for R, L and C are given, respectively, by Eqs. (27)-(29).
The second interconnect benchmark consists of the PUV of R, C and $C_c$ of two adjacent M3 interconnects used in a real microprocessor designed in a 0.25 µm CMOS process [30].
In order to estimate the on-chip inductance, we used a practical formula that relates the capacitance to the inductance, described in Ref. [1]. The PUV interconnect parameters are reported in Table IV. The third interconnect benchmark is the one published in Ref. [34] and is presented in Table V.
Accuracy of the Delay Model
The delay estimation algorithm has been implemented in C++. Each interconnect of length $d_p = 10$ mm is approximated by an RLC circuit of value $R_p = R d_p$, $L_p = L d_p$ and $C_p = C d_p$. The delay estimation algorithms given by Eqs. (7) and (9) have been compared to the delay obtained (measured) using the HSPICE simulator and to the Elmore delay. In addition, the delay models developed here (Eqs. (7) and (9)) have been compared against the delay model proposed by Ismail et al. [14], referred to as Yehea's model. For each delay model, the error is computed using
where t is the delay estimated using Elmore's, Yehea's or our delay model, and |x| denotes the absolute value of x. Table VI compares our delay model (denoted by $t_d$) to the Elmore delay (denoted by $t_E$) and the HSPICE delay (denoted by $t_{spice}$). The interconnect parameters are those reported in Ref. [33] (see Eqs. (27)-(29)). Tables VII and VIII compare the three models, (i) the Elmore delay, (ii) the delay model developed here (given by Eq. (10)) and (iii) Yehea's delay model (denoted by $t_Y$), to the HSPICE delay.
In the interconnect benchmarks developed here, it was found that $b_1^2 \ge 4b_2$, which means that the interconnection delay is given by Eq. (7). In order to compare Yehea's delay model and our model given by Eq. (9) to HSPICE, it is important to find an interconnect benchmark such that $b_1^2 < 4b_2$. Table IX reports the comparison of the three delay models to HSPICE for the case when $b_1^2 < 4b_2$. The interconnect parameters are obtained from measurement results reported in Ref. [33]. The interconnect parameters are those given in Table V.
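The error metric and the average/maximum statistics used in the comparison tables can be sketched as follows. The exact form of the error equation is not reproduced in this text, so the sketch assumes the usual relative error with respect to the HSPICE reference, expressed in percent; the function names and the sample numbers are hypothetical and are not taken from the tables.

```python
from typing import Iterable, List, Tuple


def relative_error_percent(t_ref: float, t_est: float) -> float:
    """Assumed metric: 100 * |t_ref - t_est| / t_ref, with t_ref from HSPICE."""
    return 100.0 * abs(t_ref - t_est) / t_ref


def error_statistics(pairs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (average error, maximum error) over (reference, estimate) pairs."""
    errors: List[float] = [relative_error_percent(r, e) for r, e in pairs]
    return sum(errors) / len(errors), max(errors)
```

Reporting both numbers matters because a model can have the lowest average error while another has the lowest maximum error.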
Finally, Table X reports a statistical comparison between Elmore delay, Yehea's and the model proposed here. From the results presented in the aforementioned table we found that our delay model has the lowest average prediction error. However, Yehea's delay model has the lowest maximum prediction error.
Accuracy of the Crosstalk Noise Estimation Algorithm
In order to evaluate the accuracy of our noise model, the reference crosstalk noise, denoted by $V_N$, is obtained using the HSPICE simulator. The interconnect is modeled as a distributed RLC network. Each interconnect of length $d_p = 10$ mm is approximated by an RLC circuit of value $R_p = R d_p$, $L_p = L d_p$ and $C_p = C d_p$.
The error between the estimated crosstalk noise and the correct value, i.e. $V_N$, is computed using Eq. (31).
where "model" refers to the crosstalk noise model and $V_{model}$ is the crosstalk noise predicted using "model". Let $V_N$, $V_{ABK}$ and $V_{our}$ be, respectively, the crosstalk noise reported by HSPICE, the crosstalk noise estimated using Kahng's algorithm [30], and the crosstalk noise estimated using our algorithm. The maximum crosstalk noise occurs at a specific time, referred to as the peak time. Let $t_{peak}$, $t_{ABK}$ and $t_{our}$ be, respectively, the peak time obtained via the HSPICE simulator, the peak time estimated using Kahng's algorithm, and the peak time estimated using our algorithm. Tables XI-XIII report the comparison of Kahng's and our crosstalk noise estimation algorithms against HSPICE.
Robust Buffer Insertion
Buffers are inserted in a given net and its aggressors such that the constraints depicted in Fig. 2 are met at the lowest possible signaling power consumption. From the previous experiments it is known that the crosstalk and interconnection delay estimation algorithms proposed here are very accurate. Before describing the results for energy-efficient signaling, it is appropriate to discuss the robustness of the proposed buffer insertion scheme in the presence of crosstalk noise without delay constraints, that is, with the setting $D = \infty$ in the VIJIM algorithm.
In the case of non-critical nets (i.e. $D = \infty$), there is no need for buffer resizing, which means that the buffer resizing algorithm must be halted. As a result, the VIJIM algorithm reduces to the efficient BuffOpt depicted in Fig. 7. For this special case, the comparison between VIJIM and the BuffOpt of Ref. [5] is reported in Table XIV. Figure 12 shows the impact of buffer insertion on the crosstalk noise using both VIJIM and BuffOpt.
The following definition has been used to measure the efficiency and robustness of the buffer-insertion algorithms for noise estimation.
Definition 2 Consider two algorithms, denoted $A_1$ and $A_2$, that insert buffers for noise optimization. $A_1$ is said to be more efficient than $A_2$ (or vice versa) if $A_1$ inserts the buffer at the optimal location, denoted by $d_{opt}$, such that after buffer insertion the noise at $d_{opt}$ is equal to or slightly less than h.
From the results presented in Table XIV and Fig. 12, we see that, for large $R_0$, VIJIM is more efficient than BuffOpt. This is because the noise metric used by BuffOpt is too pessimistic. Thus, for some nets, this technique leads to a design that suffers from over-buffering, which unquestionably increases the cost (silicon area and power dissipation) of the IC and may also increase the power supply noise. For example, when $R_0 = 1000$ or 2000, $d = 200$ µm and $h = 0.2$, the net satisfies the noise budget requirements (see Table XIV), which means that there is no need for buffer insertion. However, BuffOpt inserts a buffer at the location $d_{opt} = 130$ µm.
TABLE XIII Comparison between the estimated normalized crosstalk reported in Ref. [30] and our model against HSPICE for the case of an interconnect that belongs to Case 4 in Table IV

The results are obtained by fixing $C_L$ and $C_b$ to 0.135 fF and the normalized $h = 0.2$. The cell that contains "-" means that no buffer is inserted by the corresponding algorithm.

FIGURE 12 Measured and estimated interconnect capacitance as a function of the wire width.

The results described earlier in the evaluation of the crosstalk noise estimation proposed here reveal instead that our model underestimates the crosstalk noise. The maximum of this estimation error is less than 10% (9.9%). Let $h_{our}$ denote the maximum underestimation error, which means that in our case $h_{our} = -10\%$. If we assume that the normalized noise margin is h, then VIJIM does not insert a buffer if the estimated crosstalk noise, $v_{max}$, satisfies Eq. (32).
From the results tabulated in Table XIV, we found that for $R_0 = 100$ and for $d = 500$ µm or $d = 1000$ µm, VIJIM inserts buffers at a non-optimal location, as the amplitude of the crosstalk noise is still higher than h after buffer insertion. In order to tackle this problem, the value of h in the VIJIM algorithm is reduced from 0.2 to 0.18. The results after this correction are reported in Table XV.
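The reduction of h from 0.2 to 0.18 is consistent with tightening the noise margin by the 10% worst-case underestimation of the noise model. The sketch below makes that arithmetic explicit. It assumes the derated margin has the form h(1 − |underestimation|), which is an inference from the numbers in the text and not a reproduction of Eq. (32); the function names are hypothetical.

```python
def derated_margin(eta: float, max_underestimate: float) -> float:
    """Noise margin tightened by the model's worst-case underestimation.

    eta               -- normalized noise margin (h in the text)
    max_underestimate -- worst-case underestimation as a fraction (0.10 = 10%)
    """
    return eta * (1.0 - max_underestimate)


def needs_buffer(estimated_noise: float, eta: float,
                 max_underestimate: float) -> bool:
    """True when the estimate exceeds the derated margin (assumed rule)."""
    return estimated_noise > derated_margin(eta, max_underestimate)
```

With `eta = 0.2` and a 10% underestimation the derated margin is 0.18, so an estimated noise of 0.19, which would pass against 0.2, triggers buffer insertion.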
In order to minimize j(d, D), our approach is to select the optimum supply voltage from a given set of possible supply voltages. In our experiments, we are given three supply voltages: 2.5 V (full swing), 1.8 V and 1.5 V. For each wire length, the optimum locations and sizes of the buffers are computed using VIJIM. The resizing algorithm used in this version of VIJIM is the one proposed in Ref. [13]. The results, given in Table XVI, show that 1.5 V yields the lowest j(d, D). This observation raises the following question: why not just use the lowest supply voltage that guarantees perfect communication? In order to answer this question, we have conducted the same experiments using a supply voltage of 1 V. The results, tabulated in Table XVI, show that j 15 (1.5 V) is very close to j 1 (1 V). This is because, in order to compensate for the delay caused by over-scaling the supply voltage, more buffers are needed at 1 V than at 1.5 V.
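Since the signaling power over the wire is proportional to the wire length and to the square of the supply voltage, the wire-only part of the saving can be estimated with a one-line ratio. The sketch below is a back-of-the-envelope illustration under that proportionality alone; it ignores the energy of the inserted buffers, which is exactly the term that makes 1 V no better than 1.5 V in the experiments. The function name is hypothetical.

```python
def wire_energy_saving(v_low: float, v_ref: float) -> float:
    """Fractional saving in wire signaling energy when scaling v_ref -> v_low.

    Assumes energy proportional to d * V**2 for the same wire length d,
    so the length cancels. Buffer energy is NOT included.
    """
    return 1.0 - (v_low / v_ref) ** 2
```

Scaling from 2.5 V to 1.5 V gives a wire-only saving of 1 − 0.36 = 0.64, in line with the saving of over 60% reported in the conclusion. The same formula predicts an even larger wire-only saving at 1 V, so the near-equality observed in Table XVI has to come from the extra buffers.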
CONCLUSION
In this paper a scheme has been proposed for combating crosstalk noise when driving an RLC wire at an optimal delay or a given delay budget while reducing the on-chip signaling power consumption. The core of this scheme is low-voltage signaling combined with buffer insertion and resizing. The buffer insertion algorithm inserts a properly resized buffer at an appropriate location to achieve the following goals: (i) compensate for the delay degradation caused by lowering the supply voltage; (ii) reduce the interconnection delay; and (iii) eradicate the crosstalk noise.
An accurate delay and crosstalk noise model for coupled distributed RLC wires has been presented here. A fast algorithm VIJIM that inserts buffers for delay and noise optimization at a reduced voltage swing signaling has been presented.
TABLE XV Comparison of VIJIM to BuffOpt for the case when $D = \infty$ and the interconnect parameters are given by the column $W_1$ in Table V. The results are obtained by fixing $C_L$ and $C_b$ to 0.135 fF and the normalized $h = 0.18$ to fix some noise violations reported in Table XIV for the case of the VIJIM algorithm. The cell that contains "-" means that no buffer is inserted by the corresponding algorithm.

The experimental results for a set of interconnect benchmarks show that the proposed delay model has a performance comparable to that of Yehea's algorithm [14]. The average error of our delay estimation algorithm is 6.42%, whereas the average error of Yehea's algorithm is 6.6%. More importantly, with the proposed delay model it is possible to obtain a condition for buffer insertion and the optimal buffer location for delay optimization.
The results also show that the proposed crosstalk noise model is very accurate with respect to the HSPICE results and more accurate than state-of-the-art published algorithms. For on-chip signaling in a 0.25 µm CMOS process, and in the presence of crosstalk noise, the VIJIM algorithm was found to be more efficient than the state-of-the-art algorithm that inserts buffers for noise optimization [5]. The experimental results show that over 60% energy saving can be achieved, while driving the interconnect at an optimal delay and satisfying the noise requirements, if the supply voltage is reduced from 2.5 V down to 1.5 V.
assumed to be homogeneous with the same electrical parameters R, L and C. Thus, it is enough to just derive the DFE for TML 1 .
Let us consider the nodes $(N_1^{x}, N_1^{x+dx}, N_0)$, where $N_0$ refers to the ground. If we apply Kirchhoff's voltage law, we obtain the following equations
where C, L, R, C C are, respectively, the PUV of the wire capacitance, inductance, resistance, and crosstalk capacitance.
If i 1 is substituted in Eq. (33) by its expression given by Eq. (34), the following equation is obtained:
By using the symmetry of the coupled transmission lines, the DFE for TML 2 is found to be
By doing this useful transformation, we can see that the problem of computing $v_1(t)$ and $v_2(t)$ of the two coupled transmission lines is simply reduced to the problem of solving two independent Telegraph equations for two transmission lines, $T_1$ and $T_2$, driven by, respectively, $v^{-}$ and $v^{+}$.
Closed Form Expression for the Output Voltage of a Transmission Line
From the derivation described in Ref. [22] we know that the transfer function for a transmission line of length d, shown in Fig. 13 , is given by
where Z s is the source impedance,
is the characteristic impedance for the transmission line, and s is the Laplace variable.
Consider the case of a source of impedance $Z_s = R_0$ driving an interconnect of length d terminated by an impedance $Z_L = 1/(sC_L)$. Letting $r = Rd$, $l = Ld$ and $c = Cd$, the expression for the transfer function then becomes
Let $F(s) = \mathcal{L}(f(x))$ be the Laplace transform of $f(x)$, as defined in [23]: $F(s) = \int_0^{\infty} f(x)\, e^{-sx}\, dx$.
In order to derive the time-domain expression $v_d(t) = \mathcal{L}^{-1}\{V(d, s)\}$, Eq. (40) must be expressed as follows
where $A_i = P(s_i)/Q'(s_i)$ and $N_0$ is the approximation order. Now, our objective is to express the transfer function H(s) given by Eq. (40) in the polynomial form given by Eq. (42).
Let us define G as
Eq. (40) is then reduced to
If only the second-order polynomial of s, is considered, approximate expressions for D 1 (s ) and D 2 (s ) are obtained. The second-degree polynomial for s that approximates H(s ) is given by:
and b 2 is given by
Given b 1 and b 2 , the transfer function can be factorized as
where s 1 and s 2 are the solutions of the following equation
In order to solve Eq. (49), the first step is to compute the discriminant $D = b_1^2 - 4b_2$.
Inspection of Eq. (48) reveals that there are three cases to consider: (i) case number one: $s_1$ and $s_2$ are real and $s_1 \neq s_2$, which is equivalent to the condition $D > 0$; (ii) case number two: $s_1 = s_2$, or alternatively $D = 0$; (iii) case number three: $s_1$ and $s_2$ are complex numbers, which corresponds to the condition $D < 0$, and thus $s_1$ is the complex conjugate of $s_2$. Mathematically this means that $s_1 = \bar{s}_2$. In order to derive a time-domain expression for the output voltage $v_d(t) = \mathcal{L}^{-1}(H(s)V_i(s))$, let us assume that the input voltage follows an exponential function, which means that $v_i(t) = V_{dd}\,(1 - e^{-t(2.30/T_r)})$. The expression for $V_i(s)$ is then given by Eq. (50).
where $b = -2.3/T_r$, and $T_r$ is the rise time of the input voltage.
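The three pole cases and the exponential input can be checked numerically with the sketch below. It assumes that the second-order approximation of the transfer function has the denominator $1 + b_1 s + b_2 s^2$, whose discriminant is $b_1^2 - 4b_2$; this form is inferred from the discriminant in the text and is not a reproduction of Eqs. (48) and (49). The function names are hypothetical.

```python
import cmath
import math
from typing import Tuple


def poles(b1: float, b2: float) -> Tuple[int, complex, complex]:
    """Classify and return the roots of 1 + b1*s + b2*s**2 = 0 (assumed form).

    Returns (case, s1, s2) with case 1: distinct real poles (D > 0),
    case 2: double pole (D == 0), case 3: complex-conjugate poles (D < 0).
    """
    disc = b1 * b1 - 4.0 * b2
    root = cmath.sqrt(disc)  # complex square root handles D < 0
    s1 = (-b1 + root) / (2.0 * b2)
    s2 = (-b1 - root) / (2.0 * b2)
    if disc > 0:
        case = 1
    elif disc == 0:
        case = 2
    else:
        case = 3
    return case, s1, s2


def exp_input(t: float, v_dd: float, t_r: float) -> float:
    """Input voltage v_i(t) = V_dd * (1 - exp(-2.3 * t / T_r))."""
    return v_dd * (1.0 - math.exp(-2.3 * t / t_r))
```

The factor 2.3 is close to ln 10, so `exp_input` reaches about 90% of $V_{dd}$ at $t = T_r$, which is why $T_r$ can be read as a rise time for this exponential waveform.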
The predicted output voltage for case number one is given by Eqs. (51) and (52). Figure 14 shows the plot of $v_d(t)$ given by Eq. (51) versus the measured $v_d(t)$ given by the HSPICE simulator. Figure 15 shows that $v_d(t)$ computed using the method proposed in this paper behaves better than the ABK formula. However, in many situations the value of $T_r$ is very unlikely to be higher than a few nanoseconds; thus, in the rest of the derivation, the ABK formula given in Eq. (52) is used as an approximation for $v_d(t)$ if $s_1$ and $s_2$ are real valued such that $s_1 \neq s_2$.
Using the expression given by Eq. (52), our next goal is to derive the equation for v d (t ) if s 1 and s 2 are complex numbers (case2). Mathematically it means that ; and f ¼ p þ arctanð2l=aÞ: Figure 15 plots v d (t ) computed using Eq. (53) versus the one obtained via HSPICE simulator. In summary, the output voltage, v d (t ), of the transmission line can have three possible expressions. These expressions are presented in Table XVII .
Closed Form Expression for the Interconnection Delay
Given the expression for the output voltage, v d (t ), the delay is defined as
From the closed-form expressions for $v_d(t)$ given in Table XVII, we see that the delay $t_d$ may take on one of the three forms given in Table XVII. Closed-form delay expressions for the three different cases have been derived by Kahng et al. in Ref. [29] (referred to as the ABK delay model). In the sequel, the ABK delay models are reviewed; however, only the first two cases given in Table XVII are considered.
If s 1 and s 2 are real then the approximated expression for t d , is reduced to
where K r is given by the following equation
Kahng et al. have found that $K_r$ is constant for a wide range of interconnect models used in their experiments. A least-squares estimation (linear regression) was then applied to obtain an empirical value of $K_r$. In their interconnect benchmarks, $K_r$ was found to be equal to 2.3. This expression is appropriate for the 90% delay threshold. In this work, we are interested in the 50% delay model. In order to derive the interconnect delay for the 50% delay threshold, we fix $v_{th} = 0.5$. Thus, the expression for $K_r$ takes the form $\log(1 + x)$, where
The function $\log(1 + x) \approx x - 0.5x^2$. Thus, the approximated expression for $t_d$ given by Eq. (56) is reduced to Eq. (58).
An empirical value for K r has been found to be equal to 0.69. Thus, the expression for the interconnection delay given by Eq. (58) becomes
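The two empirical constants quoted above, 2.3 for the 90% threshold and 0.69 for the 50% threshold, are close to what a single dominant pole would give. The sketch below illustrates this under the simplifying assumption of a first-order response $1 - e^{-t/\tau}$, which is not the full two-pole model of this appendix. It also checks the second-order expansion of $\log(1 + x)$ used in the derivation. The function names are hypothetical.

```python
import math


def threshold_constant(v_th: float) -> float:
    """Time, in units of tau, at which 1 - exp(-t/tau) reaches v_th."""
    return -math.log(1.0 - v_th)


def log1p_second_order(x: float) -> float:
    """Second-order Taylor approximation log(1 + x) ~ x - 0.5 * x**2."""
    return x - 0.5 * x * x
```

`threshold_constant(0.9)` is ln 10 and `threshold_constant(0.5)` is ln 2, which agree with the empirical 2.3 and 0.69 to two significant figures. The Taylor approximation is only accurate for small x: at x = 0.1 it is off by roughly 0.0003, and the error grows quickly as x approaches 1.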
If $s_1$ and $s_2$ are complex numbers, then the 50% delay must be computed using Eq. (53). Computing the delay for this case requires some approximations in order to find a linear expression of the delay as a function of the wire length. The approach taken in Ref. [29] was to substitute the Elmore delay for t in the exponential function and then to approximate the sine function by its first-order Taylor series. Here, we have instead used a second-order Taylor series for the function given by Eq. (53). The expression for the delay is found to be
where e = 1/2r.
APPENDIX B: CLOSED FORM EXPRESSION FOR THE CROSSTALK NOISE
The objective in this Appendix is to derive a closed-form expression for the crosstalk noise. From Appendix A, we know that there are three possible cases for the output voltages v þ ðd; tÞ and v 2 ðd; tÞ: Those cases are summarized in Table XVII . This means that the expression of the crosstalk noise can have six different cases. However, in this paper we assume that TML 1 and TML 2 belong to the same class. In other words, if the poles of TML 1 belong to Case 1 then TML 2 belongs to Case 1 (see Table XVII ). In this model, only the expression for the crosstalk noise if TML 1 and TML 2 belong to Case 1 is elaborated.
Let us denote by $E_1$ and $E_2$ the DC voltages for $v_1(t)$ and $v_2(t)$, respectively. Following the same techniques used by Sakurai et al. to derive the crosstalk noise estimation model [31], if we set $E_1$ equal to zero, we obtain the normalized peak noise, which is given by Eq. (19).

His research interests include all aspects of VLSI implementations of broadband access networks. He is currently working on VLSI adaptive digital filters, equalizers and beamformers, error control coders and cryptography architectures, low-power digital systems and computer arithmetic. He has published over 320 papers in these areas. He has authored the text book VLSI Digital Signal Processing Systems (Wiley, 1999) and co-edited the reference book Digital Signal Processing for Multimedia Systems (Dekker, 1999). He received the 2001 W.R.G. Baker prize paper award from the IEEE, a Golden Jubilee medal from the IEEE Circuits and Systems Society in 1999, a 1996 Design Automation Conference best paper award, the 1994 Darlington and the 1993 Guillemin-Cauer best paper awards from the IEEE Circuits and Systems society, the 1991 paper award from the IEEE signal processing society, the 1991 Browder Thompson prize paper award from the IEEE, and the 1992 Young Investigator Award of the National Science Foundation.
Dr Parhi has served on editorial boards of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-PART II: ANALOG AND DIGITAL SIGNAL PROCESSING, the IEEE TRANSACTIONS ON VLSI SYSTEMS and the IEEE SIGNAL PROCES-SING LETTERS. He is an editor of the JOURNAL OF VLSI SIGNAL PROCESSING. He is the guest editor of a special issue of the IEEE TRANSACTIONS ON SIGNAL PROCESSING, two special issues of the JOURNAL OF VLSI SIGNAL PROCESSING and a special issue of JOURNAL OF ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING. He served as technical program co-chair of the 1995 IEEE Workshop on Signal Processing and the 1996 ASAP conference, and is general chair of the 2002 IEEE Workshop on Signal Processing Systems. He serves on numerous technical program committees, and was a distinguished lecturer of the IEEE Circuits and Systems society (1994 -1999) . He is a fellow of IEEE. Finland, as an associate professor. He was also a coordinator of National Microelectronics Programme of Finland during 1987 -1991. Since January 1992 he has been with the Royal Institute of Technology (KTH), Stockholm, Sweden, where he is a professor of electronic system design. His current research interests are VLSI circuits and systems for wireless and broadband communication, and related design methodologies and prototyping techniques. He has made over 400 presentations and publications on IC technologies and VLSI systems worldwide, and has over 16 patents pending or granted. He has served in TPCs of many conferences and was the conference chairman of European Solid State Circuits Conference in 2000. Currently he is the dean of School of Information Technology at KTH.
