As process technologies scale into the deep submicrometer region, crosstalk delay is becoming increasingly severe, especially for global on-chip buses. To cope with this problem, accurate delay models of coupled interconnects are needed. In particular, delay models based on analytical approaches are desirable, because they are not only largely transparent to technology, but also explicitly establish the connections between delays of coupled interconnects and transition patterns, thereby enabling crosstalk-alleviating techniques such as crosstalk avoidance codes. Unfortunately, existing analytical delay models, such as the widely cited model in [1], have limited accuracy and do not account for loading capacitance. In this brief, we propose analytical delay models for coupled interconnects that address these disadvantages.
I. INTRODUCTION
Crosstalk caused by coupling capacitance between adjacent wires introduces additional delay in multiwire buses. As process technologies scale into the deep submicrometer region, the coupling capacitance between adjacent wires, and hence the crosstalk delay, increases greatly. According to the International Technology Roadmap for Semiconductors [2], gate delay decreases with scaling, while global wire delay increases. Hence, the crosstalk delay problem is becoming increasingly severe in VLSI designs, especially for global on-chip buses, and will become the performance bottleneck in many high-performance VLSI designs.
This brief focuses on analytical delay models applicable to general RC-coupled interconnects. Although various delay models of interconnects have been proposed in the literature [1], [3]-[7], few are comparable with our work in this brief. Some delay models [3], [7] do not consider crosstalk from adjacent wires. In addition, most previously proposed delay models are based on numerical approaches [3], [5]-[7]. They can achieve high accuracy, but they have several drawbacks, such as bulky lookup tables, dependence on technology, poor portability, and high complexity. A widely cited analytical delay model proposed in [1] and [4], which uses a methodology similar to that in [8], appears to be the previous delay model most comparable to our work in this brief.
Based on the model in [1] and [4], the delay of the kth wire (k ∈ {1, 2, ..., m}) of an m-bit bus is given by
$$
T_k = \begin{cases}
\tau_0\left[(1+\lambda)\Delta_1^2 - \lambda\Delta_1\Delta_2\right], & k = 1\\
\tau_0\left[(1+2\lambda)\Delta_k^2 - \lambda\Delta_k(\Delta_{k-1}+\Delta_{k+1})\right], & k \neq 1, m\\
\tau_0\left[(1+\lambda)\Delta_m^2 - \lambda\Delta_m\Delta_{m-1}\right], & k = m
\end{cases} \tag{1}
$$
where λ is the ratio of the coupling capacitance between adjacent wires to the ground capacitance of each wire, $\tau_0$ is the intrinsic delay of a transition on a single wire, and $\Delta_k$ is 1 for a 0 → 1 transition, −1 for a 1 → 0 transition, or 0 for no transition on the kth wire. We observe that in this model, the delay of the kth wire depends on the transition patterns of wires k − 1, k, and k + 1 only. Because all possible values of $T_k$ in (1) are $(1 + i\lambda)\tau_0$ for i ∈ {0, 1, 2, 3, 4}, all transition patterns on wires k − 1, k, and k + 1 can be divided into five classes according to their corresponding i. These five classes are denoted as iC for i ∈ {0, 1, 2, 3, 4} (this classification was also used in [9]). With this model, various crosstalk avoidance codes (CACs) [9]-[12] have been proposed, based on the central idea of achieving a reduced delay by limiting transition patterns over the bus, at the expense of additional wires. However, the model in [1] has two significant drawbacks. First, it has limited accuracy. In a bus with more than three wires, it tends to overestimate the delays of 1C-4C patterns and underestimate the delay of 0C patterns. This is partly because the model depends on only three wires and partly because of the Elmore delay [13] used in its derivation. The second drawback of the model in [1] is that it does not account for the loading capacitance. It has been shown that the loading capacitance is crucial in practice and can affect the total delay for all patterns.
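As a minimal sketch of the classification induced by the model in [1], the following Python snippet evaluates the coefficient of λ in the internal-wire case of (1) for all 27 transition patterns on wires k − 1, k, and k + 1 and groups them into the classes iC. The function name `crosstalk_class` and the encoding of transitions as +1, 0, and −1 are illustrative choices, not part of [1].

```python
from itertools import product

def crosstalk_class(left, mid, right):
    """Return i such that T_k = (1 + i*lambda) * tau_0, or None if wire k is idle.

    Transitions are encoded as +1 (0 -> 1), -1 (1 -> 0), or 0 (no transition).
    Internal-wire case of the model in [1]:
    T_k / tau_0 = (1 + 2*lambda) * mid**2 - lambda * mid * (left + right).
    """
    if mid == 0:
        return None  # no transition on wire k, so there is no delay to classify
    return 2 - mid * (left + right)

# Group every pattern on three adjacent wires by its class index.
classes = {}
for pattern in product((-1, 0, 1), repeat=3):
    i = crosstalk_class(*pattern)
    if i is not None:
        classes.setdefault(i, []).append(pattern)

for i in sorted(classes):
    print(f"{i}C: {len(classes[i])} patterns")
```

Only patterns with a transition on the middle wire are classified, which is why 18 of the 27 patterns appear in the output.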
To address these disadvantages of the model in [1], in this brief we propose analytical delay models for coupled interconnects. Our delay models differ from the model in [1] in three aspects. First, we eschew the Elmore delay used in the model in [1]. Second, we consider either three wires or five wires in our delay models for improved accuracy. Because of these two differences, our models have significantly better accuracy than the model in [1]. Third, we consider the buffer effects (driver resistance and loading capacitance). Our delay models also maintain the simplicity of the model in [1], and the transition patterns are divided into several categories based on their delays. Hence, our delay models are easy to use and conducive to the design of crosstalk avoidance codes (CACs). Although our delay models consider three and five adjacent wires in this brief, they are applicable to buses with any number of wires.
The rest of this brief is organized as follows. In Section II, we propose our delay models and then modify them to account for the buffer effects. In Section III, we present extensive simulation results for our delay models. Concluding remarks are provided in Section IV.
II. DELAY MODEL

A. System Model
In this brief, we focus on global interconnects with regular structures, which have uniformly distributed parameters and are routed in parallel in the same metal layer without turns. Partially coupled buses are not considered in this brief. In addition, our delay models do not consider the inductance effect for two reasons. First, it seems difficult to derive a closed-form expression of the signals on the bus when the inductance effect is taken into account. Second, according to the criteria in [14], the inductance effect is significant in some cases but negligible in others. For our work with 5-mm-long buses based on a 45-nm technology, the inductance effect is negligible.
In this brief, we use the distributed RC model for interconnect modeling. For an m-wire bus, $V_i(x, t)$ is the transient signal at a position x along wire i for i ∈ {1, 2, ..., m}. r and c are the resistance and capacitance per unit length, respectively. In addition, $\lambda c$ is the coupling capacitance per unit length between two adjacent wires. The output resistance of a driver is approximated as a linear resistor, $R_S$, and the loading due to a receiver is modeled as a capacitance, $C_L$. In this brief, we focus on a uniformly distributed bus and hence assume the parameters r, c, and λ are the same for all the wires. See the extended version of this brief in [15] for more details.
We use the 50% delay, which is defined as the time difference between the respective instants when the input signal and the corresponding output signal cross 50% of the supply voltage $V_{dd}$, and denote by $T^{iC}_m$ the worst delay of the middle wire (wire $(m+1)/2$) of an m-wire bus for all iC patterns. We assume ideal step signals are applied on the bus directly. In this brief, we use the same classification iC for i = 0, 1, 2, 3, 4 as in [1] and focus on the worst 50% delay of any wire for all classes to formulate our delay models. We consider only the neighboring wires for crosstalk, because farther wires have weaker coupling effects.
In this section, we first derive delay models by assuming that the buffer effects (driver resistance and loading capacitance) are negligible. Then, in Section II-E, we modify the delay models to account for the buffer effects, which are crucial in real practice. It has been shown that the buffer effects would increase the total delay for all patterns.
B. Internal Wires for Three-Wire Model
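In [16], the crosstalk of two coupled lines was described by partial differential equations (PDEs), and a technique for decoupling these highly coupled PDEs was introduced by using eigenvalues and corresponding eigenvectors. Using the same technique as in [16], we obtain the differential equations describing a three-wire bus with length L as follows:
$$
\frac{\partial^2}{\partial x^2}\mathbf{V}(x,t) = r\,\mathbf{C}\,\frac{\partial}{\partial t}\mathbf{V}(x,t), \qquad
\mathbf{V} = [V_1, V_2, V_3]^T, \qquad
\mathbf{C} = c\begin{bmatrix} 1+\lambda & -\lambda & 0\\ -\lambda & 1+2\lambda & -\lambda\\ 0 & -\lambda & 1+\lambda \end{bmatrix}.
$$
With ideal step inputs and negligible buffer effects, the boundary conditions are given by
$$
V_i(x,0) = V^p_i, \qquad V_i(0,t) = V^f_i \ (t > 0), \qquad \left.\frac{\partial V_i(x,t)}{\partial x}\right|_{x=L} = 0, \qquad i \in \{1, 2, 3\},
$$
where $V^p_i$ and $V^f_i$ are the initial and final voltages of the transition on wire i, respectively.
We find the three eigenvalues of $\mathbf{C}/c$, $p_1 = 1$, $p_2 = 1 + \lambda$, and $p_3 = 1 + 3\lambda$, and their corresponding eigenvectors
$$
e_1 = [1, 1, 1]^T, \qquad e_2 = [1, 0, -1]^T, \qquad e_3 = [1, -2, 1]^T.
$$
Using appropriate initial conditions, we solve the decoupled equations for the signals $V_i(x, t)$. By solving $V_2(L, t) = 0.5V_{dd}$, we can approximate the 50% delay of a three-wire bus for different transition patterns.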
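The eigenvalues quoted above can be checked with exact arithmetic. The sketch below assumes the normalized capacitance matrix C/c of a three-wire bus has the usual tridiagonal form (ground capacitance on the diagonal plus coupling to each neighbor) and that the eigenvectors are (1, 1, 1), (1, 0, −1), and (1, −2, 1); the helper names and the value λ = 12.2 are illustrative.

```python
from fractions import Fraction

def cap_matrix(lam):
    """Normalized capacitance matrix C/c of a three-wire bus (assumed form)."""
    return [
        [1 + lam, -lam, 0],
        [-lam, 1 + 2 * lam, -lam],
        [0, -lam, 1 + lam],
    ]

def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def is_eigenpair(a, value, vector):
    """Check a @ vector == value * vector exactly."""
    return matvec(a, vector) == [value * x for x in vector]

lam = Fraction(61, 5)  # example coupling factor, 12.2
eigenpairs = [
    (1, [1, 1, 1]),             # all wires switch together
    (1 + lam, [1, 0, -1]),      # outer wires switch in opposite directions
    (1 + 3 * lam, [1, -2, 1]),  # middle wire opposes both neighbors
]
for value, vector in eigenpairs:
    print(float(value), vector, is_eigenpair(cap_matrix(lam), value, vector))
```

Using `Fraction` avoids floating-point tolerance in the equality check; the same check passes for any rational λ.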
In this brief, we use ↑ to denote a transition from 0 to the supply voltage $V_{dd}$ (normalized to 1), - to denote no transition, and ↓ to denote a transition from $V_{dd}$ to 0.
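For the 0C pattern ↑↑↑, the output of wire 2 is given by [16]
$$
V_2(L,t) = 1 - \frac{4}{\pi}\sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1}\,e^{-(2n+1)^2 t/\tau}, \qquad \tau = \left(\frac{2}{\pi}\right)^2 rcL^2.
$$
For the 50% delay, keeping only the first exponential term is accurate enough. So we have
$$
V_2(L,t) \approx 1 - \frac{4}{\pi}\,e^{-t/\tau}.
$$
Similarly, we keep only the first exponential term as the solution for the other cases.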
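To see why one exponential term suffices for the 50% delay, the sketch below compares the delay obtained from a truncated version of the full series with the single-term closed form. It assumes the standard step response of an open-ended distributed RC line driven by an ideal step, V(L, t) = 1 − (4/π) Σ (−1)^n/(2n+1) · exp(−(2n+1)² t/τ), with time measured in units of τ = (2/π)² rcL²; this normalization and the function names are assumptions made for illustration.

```python
import math

def far_end_voltage(t, terms=50):
    """Step response at x = L of an open-ended distributed RC line.

    Time t is in units of tau = (2/pi)**2 * r*c*L**2 (assumed normalization).
    """
    series = sum(
        (-1) ** n / (2 * n + 1) * math.exp(-((2 * n + 1) ** 2) * t)
        for n in range(terms)
    )
    return 1.0 - (4.0 / math.pi) * series

def delay_50_percent(terms=50, lo=0.1, hi=5.0, iterations=60):
    """Bisection on the rising response to find the time where V(L, t) = 0.5."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if far_end_voltage(mid, terms) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

full = delay_50_percent(terms=50)
one_term = math.log(8.0 / math.pi)  # closed form from keeping the first term only
print(f"50-term series: {full:.5f} tau, first term only: {one_term:.5f} tau")
```

The higher-order terms decay as exp(−9t/τ) or faster, so by the time the output reaches 50% they contribute very little.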
Similarly, the closed-form expressions of wire 2 and approximate delays for the other classes are derived and summarized in Table I, where $T^{iC}_3$ is the approximate delay for the iC pattern by our three-wire model.
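As a worked example of how the delays of the other classes follow, consider the 4C pattern ↓↑↓ on a three-wire bus. The sketch below assumes the eigenvectors e_1 = (1, 1, 1)^T and e_3 = (1, −2, 1)^T of C/c, voltages normalized to V_dd = 1, and a single-term response for each decoupled mode with τ the time constant of the 0C case; it is an illustration under these assumptions rather than a reproduction of Table I. The result agrees with the 4C delay ln(32/(3π))(1 + 3λ)τ quoted in Section II-F.

```latex
\begin{align*}
% Transition vector of the pattern: wire 2 rises, wires 1 and 3 fall.
\Delta &= (-1,\ 1,\ -1)^T = -\tfrac{1}{3}\,e_1 - \tfrac{2}{3}\,e_3, \\
% Single-term response of the mode with eigenvalue p_i (p_1 = 1, p_3 = 1 + 3\lambda).
g_i(t) &\approx 1 - \frac{4}{\pi}\,e^{-t/(p_i \tau)}, \\
% Middle-wire entries of e_1 and e_3 are 1 and -2, respectively.
V_2(L,t) &\approx -\tfrac{1}{3}\,g_1(t) + \tfrac{4}{3}\,g_3(t)
          = 1 + \frac{4}{3\pi}\,e^{-t/\tau} - \frac{16}{3\pi}\,e^{-t/((1+3\lambda)\tau)}, \\
% For lambda that is not small, the fast term e^{-t/\tau} has died out at the crossing.
\tfrac{1}{2} &\approx 1 - \frac{16}{3\pi}\,e^{-T/((1+3\lambda)\tau)}
\quad\Longrightarrow\quad
T \approx \ln\!\left(\frac{32}{3\pi}\right)(1+3\lambda)\,\tau .
\end{align*}
```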
C. Internal Wires for Five-Wire Model
To further improve the accuracy of the delay, we include two extra adjacent wires and approximate the delay by considering the influence of all five wires. Each wire has three kinds of transition: ↑, -, and ↓. Hence, for such a five-wire bus, there are $3^5$ transition patterns. To maintain the simplicity of our models, we still divide them into five classes (iC, i ∈ {0, 1, 2, 3, 4}) based on the transition patterns of the middle three wires (wires 2, 3, and 4). Hence, for each pattern on the middle three wires, there are nine different five-wire transition patterns in the same class, one for each combination of transitions on wires 1 and 5.
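The pattern counts in this paragraph can be reproduced by direct enumeration. The following sketch lists all five-wire patterns and groups them by the transitions on wires 2, 3, and 4; the symbols follow the notation of this brief, and the variable names are illustrative.

```python
from itertools import product

SYMBOLS = "↑-↓"  # rising, no transition, falling

# Every transition pattern of a five-wire bus, written wire 1 to wire 5.
all_patterns = ["".join(p) for p in product(SYMBOLS, repeat=5)]

# Group the five-wire patterns by the transitions on the middle three wires.
by_middle = {}
for pattern in all_patterns:
    by_middle.setdefault(pattern[1:4], []).append(pattern)

print(len(all_patterns))      # number of five-wire patterns
print(len(by_middle["↓↑↓"]))  # five-wire patterns that share the middle pattern ↓↑↓
print(by_middle["↓↑↓"])
```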
Because the interconnect is a linear system, any pattern can be decomposed into a combination of patterns with transitions on a single wire. However, this approach would result in expressions that are hard to analyze. Instead, we propose to group these individual wires to form some special patterns, which can be analyzed easily.
Definition 1-Reducible Transition Pattern (RTP): An RTP in the five-wire model is defined as a transition pattern that can be reduced to a transition pattern in the three-wire model. For instance, ↓-↑-↓ can be reduced to ↓↑↓ with a half coupling factor $\lambda/2$. The set of RTPs is given by {↑↑↑↑↑, ↓↓↓↓↓, ↓-↑-↓, ↑-↓-↑} for the five-wire model. For the transition ↑↑↑↑↑ (similarly for ↓↓↓↓↓), the delay is given by $\ln(8/\pi)\,\tau$. For ↓-↑-↓ (similarly for ↑-↓-↑), the delay is given by ln(16/(3π) …
Definition 2-Single Transition Pattern (STP): An STP is defined to be a transition pattern with transitions on only one wire. For our five-wire model, we focus on the set of STPs with transitions on wire 2 or 4, {-↑---, -↓---, ---↑-, ---↓-}.
The expressions of wire 3 can be approximated by considering wires 2, 3, and 4 as a three-wire model. Let $V^i_j(x, t)$ be the signal on wire j because of coupling from wire i. For example, by ignoring coupling from wires 1 and 5 in -↑---, the output of wire 3 is approximated by $V^2_3(L, t)$, which is obtained by considering only wires 2, 3, and 4.
The following approach is used to derive the delay of the five-wire bus. First, we decompose the worst pattern in each class into a combination of an RTP and STP(s). Then, we combine the expressions of the RTP and STP(s) for the middle wire based on the conclusion of our three-wire model. Finally, we evaluate the expression of the middle wire to approximate its delay.
Because the performance is limited by the worst-case delay in each class, we need to approximate the delays of only the worst patterns in each class. We use simulation to identify the worst patterns in all classes. The worst patterns for each class and their decompositions with RTPs and STPs are shown in Table II .
The closed-form expressions of wire 3 and approximate delays for all classes in a five-wire bus are derived and summarized in Table I, where $T^{iC}_5$ is the approximate delay for the iC pattern by our five-wire model.
D. Boundary Wires
In the previous derivation, we focus on middle wires and consider four neighboring wires (two to the left and two to the right) for crosstalk. For boundary wire 1 (wire m) of an m-wire bus, we consider wires 2 and 3 to the right (wires m − 2 and m − 1 to the left) for crosstalk, and use the same classification as in (1) [1]. Note that for wires 1 and m, there are only three classes of patterns, 0C, 1C, and 2C. With a similar technique, the closed-form expressions of wire 1 (wire m) and approximate delays for all classes are derived and summarized in Table III, where $T^{iC}_{b1}$ is the approximate delay for the iC pattern. For wire 2 (wire m − 1) of an m-wire bus, we consider wire 1 to the left and wires 3 and 4 to the right (wires m − 3 and m − 2 to the left and wire m to the right) for crosstalk. Similarly, the closed-form expressions of wire 2 (wire m − 1) and approximate delays for all classes are derived and summarized in Table IV, where $T^{iC}_{b2}$ is the approximate delay for the iC pattern.
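TABLE V. CLOSED-FORM EXPRESSIONS AND APPROXIMATE DELAYS FOR THREE- AND FIVE-WIRE BUSES WITH BUFFER EFFECTS.
$$
\begin{aligned}
V_2(L,t) &= 1 - b_1 B_1 e^{-t/\tau_1} - b_2 B_2 e^{-t/\tau_2},\\
V_3(L,t) &= 1 - b_3 B_3 e^{-t/\tau_1} - b_4 B_4 e^{-t/\tau_2} - b_5 B_5 e^{-t/\tau_3},\\
B_1 = B_3 &= 1.01\,\frac{R_T + C_T + 1}{R_T + C_T + \pi/4},\qquad
B_2 = B_4 = 1.01\,\frac{R_T + C^*_T + 1}{R_T + C^*_T + \pi/4},\qquad
B_5 = 1.01\,\frac{R_T + C^\dagger_T + 1}{R_T + C^\dagger_T + \pi/4},\\
\tau_1 &= \frac{RC\left(R_T C_T + R_T + C_T + (2/\pi)^2\right)}{1.04},\\
\tau_2 &= \frac{(1+3\lambda)RC\left(R_T C^*_T + R_T + C^*_T + (2/\pi)^2\right)}{1.04},\\
\tau_3 &= \frac{\left(1+\tfrac{3}{2}\lambda\right)RC\left(R_T C^\dagger_T + R_T + C^\dagger_T + (2/\pi)^2\right)}{1.04},\\
R_T &= \frac{R_S}{R},\qquad C_T = \frac{C_L}{C},\qquad C^*_T = \frac{C_L}{(1+3\lambda)C},\qquad C^\dagger_T = \frac{C_L}{\left(1+\tfrac{3}{2}\lambda\right)C},\qquad C = cL,\qquad R = rL,\\
f_1 &= -\ln\!\left(\frac{1}{4} + \frac{1}{2}\sqrt{\frac{1}{4} + \frac{3}{2B_4}}\right),\qquad
f_2 = -\ln\!\left(\frac{1}{8} + \frac{1}{2}\sqrt{\frac{1}{16} + \frac{3}{2B_4}}\right).
\end{aligned}
$$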
E. Revised Models With Consideration of the Buffer Effects
In the previous derivation, the buffer effects are ignored under the assumption that the driver resistance and loading capacitance are relatively small. In practice, the values of resistance and capacitance vary with the structure of the buffers. In this brief, we consider drivers and receivers implemented as noninverting inverter chains. The loading capacitance $C_L$ and driver resistance $R_S$ are due to the first and last stage inverters in the chain, respectively. The buffer strength is measured by the size of the inverter normalized to that of the smallest inverter. For global interconnects in submicrometer technology, the loading capacitance is not significantly large in comparison with that of the interconnect. According to [17], for a 45-nm technology [18], the loading capacitance $C_L$ induced by an inverter of 100 times the minimum size is 25 fF. In this brief, we consider loading capacitance as large as 100 fF. For significantly large $C_L$, the delay due to $C_L$ would dominate the total propagation delay and all classes of patterns would collapse into one class. With consideration of the buffer effects of $R_S$ and $C_L$, the revised models for three- and five-wire buses are shown in Table V.
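The buffer-dependent quantities $B_1$ and $\tau_1$ of Table V can be evaluated directly from $R_T = R_S/R$ and $C_T = C_L/C$. The sketch below does so and, for the pattern in which all wires rise together, solves $1 - B_1 e^{-t/\tau_1} = 0.5$. Using this single-mode response for that pattern is an assumption (the coefficients $b_i$ of Table V are not reproduced here), the driver resistance of 50 Ω is a hypothetical value, and the function names are illustrative; the wire parameters are those of Section III.

```python
import math

def table_v_terms(r_total, c_total, r_s, c_l):
    """Return (B_1, tau_1) following the definitions listed with Table V.

    r_total = r*L and c_total = c*L are the total wire resistance and
    ground capacitance; r_s and c_l are the driver resistance and load.
    """
    r_t = r_s / r_total
    c_t = c_l / c_total
    b1 = 1.01 * (r_t + c_t + 1.0) / (r_t + c_t + math.pi / 4.0)
    tau1 = r_total * c_total * (r_t * c_t + r_t + c_t + (2.0 / math.pi) ** 2) / 1.04
    return b1, tau1

def delay_all_rising(r_total, c_total, r_s, c_l):
    """50% delay when all wires rise together (assumed single-mode response)."""
    b1, tau1 = table_v_terms(r_total, c_total, r_s, c_l)
    return tau1 * math.log(2.0 * b1)

R = 13.75 * 5.0      # ohm: r = 13.75 ohm/mm over L = 5 mm
C = 8.263e-15 * 5.0  # farad: c = 8.263 fF/mm over L = 5 mm
for c_l in (0.0, 25e-15, 100e-15):
    t = delay_all_rising(R, C, r_s=50.0, c_l=c_l)  # r_s = 50 ohm is hypothetical
    print(f"C_L = {c_l * 1e15:5.1f} fF -> delay = {t * 1e12:.3f} ps")
```

With $R_S = 0$ and $C_L = 0$ the same function returns roughly 0.37RC, close to the unbuffered single-term value, which is a quick sanity check on the constants.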
F. Discussions
In the derivation of our five-wire model above, we focus on the worst-case patterns of the middle wires only. We also derive delay models for boundary wires. In the following, we show that our five-wire model can be easily applied to approximate the delays of an m-wire bus (m > 5). First, we use our five-wire delay model as a sliding window to scan the internal wires (wires 3 through m − 2) to identify the longest delay. Then, for boundary wires (wires 1, 2, m − 1, and m), we use the models in Tables III and IV for delay approximation. The delay of an m-wire bus is then given by the largest delay among all wires. For example, for a pattern ↑↓↑↓↓↓ of a six-wire bus, the classes for wires 1 through 6 are given by 2C, 4C, 4C, 2C, 0C, and 0C, respectively. Thus, the worst-case class is given by 4C. According to our models in Tables I, III, and IV, the worst-case delay is given by the larger one of the two delays $6.540(1 + (2 - \sqrt{2})\lambda)\tau$ and $\ln(32/(3\pi))(1 + 3\lambda)\tau$.
In previous subsections, we assume simultaneous transitions on all the wires. However, for global buses where buffer insertion techniques are usually used to reduce their delay [19], simultaneous signal transitions on the bus cannot be guaranteed. We observe two possible scenarios with regard to the impact of asynchronous transitions on the delay of our three- and five-wire models. When the time differences are relatively small, the delay is increased only by the time differences. When the time differences are sufficiently large, they can change the worst delay of a class to that of a different class. Intuitively, this is because large time differences change the intended transition patterns into different patterns. This is consistent with the observation in [20].
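The class-identification step of the six-wire example above can be sketched as follows. The snippet assigns each wire its class according to the classification of the model in [1], treating boundary wires as having a single neighbor, and then picks the worst class; it covers only the classification, not the delay expressions of Tables I, III, and IV. The function name `wire_classes` is illustrative.

```python
def wire_classes(pattern):
    """Class index i (as in iC) for each wire of a bus, following the model in [1].

    pattern is a string over '↑' (rising), '↓' (falling), and '-' (no transition).
    Boundary wires have one neighbor, so their class is at most 2C.
    Wires without a transition get None.
    """
    delta = [{"↑": 1, "↓": -1, "-": 0}[s] for s in pattern]
    m = len(delta)
    classes = []
    for k, d in enumerate(delta):
        if d == 0:
            classes.append(None)
            continue
        neighbors = [delta[j] for j in (k - 1, k + 1) if 0 <= j < m]
        # Each neighbor adds 0 (same direction), 1 (idle), or 2 (opposite).
        classes.append(sum(1 - d * n for n in neighbors))
    return classes

classes = wire_classes("↑↓↑↓↓↓")
print([f"{i}C" for i in classes])
worst = max(i for i in classes if i is not None)
print("worst-case class:", f"{worst}C")
```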
III. PERFORMANCE EVALUATION
We evaluate the performance of our delay models, and compare it with that of the model in [1] in three scenarios. First, because our five-wire model focuses on five adjacent wires, we consider a fivewire bus. This scenario is also motivated by partial coding schemes [9] - [11] , which divide a wide bus into sub-buses and separate them by shielding wires. The second scenario is buses with more than five wires. We have run extensive simulations on buses with an odd number of wires (up to 33 wires). Our conclusions are the same regardless of the number of wires. For brevity, we present our simulation results for a 33-wire bus. In the third scenario, we assume the transition patterns are limited to those of CACs and consider the worst-case delays for all wires of an 8-wire bus. More simulation results can be found in the extended version of this brief [15] .
All the simulation results in this brief are obtained from HSPICE based on a 45-nm technology with 10 metal layers [18]. We focus on global buses in the top metal layer 10 with metal layer 8 below as the ground. The bus parameters are obtained from structure 1 in [21]. All wires are uniformly distributed with a length L = 5 mm, width w = 0.8 μm, spacing s = 0.8 μm, thickness t = 2 μm, and height to ground h = 4.82 μm. The unit length resistance, inductance, capacitance, and coupling capacitance are given by r = 13.75 Ω/mm, l = 1.736 nH/mm, c = 8.263 fF/mm, and $c_c$ = 101.136 fF/mm, respectively. The permittivity of the dielectric between metals is $K_{ILD}$ = 2.5. Because the model in [1] does not account for the loading capacitance, we assume $C_L$ = 0 fF for simulations in comparison with the model in [1]. The coupling factor is given by $\lambda = c_c/c \approx 12.2$. The buses are divided into 100 sections to characterize the distributed RC model.
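As a quick arithmetic check of the derived quantities, the following sketch computes the coupling factor and the total wire resistance and ground capacitance from the per-unit-length parameters listed above. The variable names are illustrative.

```python
length_mm = 5.0
r_per_mm = 13.75          # ohm/mm
c_per_mm = 8.263e-15      # F/mm, ground capacitance
cc_per_mm = 101.136e-15   # F/mm, coupling capacitance between adjacent wires

coupling_factor = cc_per_mm / c_per_mm  # lambda = c_c / c
R_total = r_per_mm * length_mm          # R = r * L
C_total = c_per_mm * length_mm          # C = c * L
rc_ps = R_total * C_total * 1e12        # RC product in picoseconds

print(f"lambda = {coupling_factor:.2f}")
print(f"R = {R_total:.2f} ohm, C = {C_total * 1e15:.3f} fF, RC = {rc_ps:.3f} ps")
```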
A. Five-Wire Bus
For a five-wire bus, the worst delays of all classes of transition patterns based on our five-wire model are compared with those of the model in [1] as well as the delays simulated by HSPICE in Table VI, where $T_d$ denotes the simulated worst-case delay of wire 3 for all iC patterns, $T^{iC}_5$ the approximate delay for the iC pattern by our five-wire model, and $T_3$ the delay by the model in [1]. The error percentages of our model and the model in [1] are also shown in Table VI. For a five-wire bus, the maximum and minimum errors by our model are 34.41% and 1.59%, respectively, in comparison with 84.28% and 16.50% by the model in [1]. As Table VI shows, our five-wire model is more accurate than the model in [1] for all patterns in a five-wire bus. In particular, although the delays in the model in [1] were claimed to be upper bounds on the actual delays, our simulation results in Table VI show that this claim is invalid for the 0C patterns. A method that achieves a delay of $\tau_0$ by surrounding each data wire with two shield wires with the same transition was proposed in [20]. Because the transition patterns for each data wire are always in the 0C class, the delays of the data wires are $\tau_0$ according to the model in [1]. In contrast, the delay for the data wires can be as large as $0.165(1 + 3\lambda)\tau$ by our model; when λ is large, the model in [1] severely underestimates the delay, while our model is more accurate.
B. 33-Wire Buses
For a 33-wire bus, we focus on the delay of the middle wire (wire 17). We make the following three assumptions to avoid a time-consuming search over all $3^{33}$ transition patterns: 1) the worst patterns in each class are symmetric; 2) closer wires have a greater coupling effect on the middle wire; 3) the middle three wires are initialized to an iC pattern with all other wires in opposite transitions to the middle wire. Then, the patterns with the largest delays are obtained via Alg. 1, where m is odd and $P_i$ is the updated transition pattern after the ith iteration. Though it is difficult to verify the three assumptions for 33-wire buses due to the prohibitive complexity, we did verify by exhaustive search for 9- and 11-wire buses that the worst cases for all the classes based on Alg. 1 are indeed the worst cases.
The worst transition patterns for each class in a 33-wire bus, with respect to the three assumptions above, are listed in the second column of Table VII, where the patterns on wires 16, 17, and 18 are shown in parentheses. The simulated worst-case delays $T_d$ of wire 17 are compared with those of our five-wire model and the model in [1]. The maximum and minimum errors by our model are 45.23% and 5.95%, respectively, in comparison with 86.87% and 7.61% by the model in [1], as shown in Table VII. Again, for all classes except 1C, our five-wire model outperforms the model in [1]. The model in [1] also has a large error percentage for 0C. Based on extensive simulation results, we conjecture that our five-wire model would be more accurate than the model in [1] for buses with any number of wires.
C. Performance of CACs
In the simulation results above, we assume the transition patterns are arbitrary. Herein, we assume the transition patterns are limited to those of CACs. We evaluate the performance of our delay model for three families of CACs [9]-[11]: one lambda codes (OLCs), forbidden pattern codes (FPCs), and forbidden overlap codes (FOCs). With our five-wire model, the worst delays of the aforementioned CACs are shown in Tables I, III, and IV. With the model in [1], the worst delays of the aforementioned CACs are approximated by $(1 + \lambda)\tau_0$, $(1 + 2\lambda)\tau_0$, and $(1 + 3\lambda)\tau_0$, respectively. Because the number of transition patterns is a quadratic function of the number of codewords, it is time-consuming to simulate a large bus to obtain the worst-case delays on all wires. Hence, for each CAC, we simulate an 8-wire bus. The numbers of codewords of OLC, FPC, and FOC are given by 16, 68, and 149, respectively. The total numbers of transition patterns for OLC, FPC, and FOC are given by 240, 4556, and 22052, respectively. As shown in Table VIII, our delay models are more accurate than the model in [1] for all three families of CACs.
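The quadratic growth of the number of transition patterns can be illustrated with a short calculation. The quoted totals are consistent with counting ordered pairs of distinct codewords, n(n − 1); this reading is an interpretation inferred from the quoted numbers, and the function name is illustrative.

```python
def transition_count(codewords):
    """Ordered pairs of distinct codewords, i.e., possible bus transitions."""
    return codewords * (codewords - 1)

for name, n in (("OLC", 16), ("FPC", 68), ("FOC", 149)):
    print(name, n, transition_count(n))
```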
IV. CONCLUSION
In this brief, we propose improved analytical delay models for coupled interconnects. We first derive closed-form expressions of the signals on the bus, with the distributed RC model, and then approximate the delays of different patterns by evaluating these closed-form expressions. We focus on three- and five-wire models, and simulation results show that our models have better accuracy than the model in [1]. Although our models are based on three- and five-wire buses, they are not limited to these two cases. For a bus with more than five wires, our five-wire model can still approximate delays better than the model in [1].
I. INTRODUCTION
The synchronous paradigm has been classically employed in the design of digital circuits because of its ability to abstract time as a discrete amount. This is enabled at the cost of ensuring balanced clock distribution to all registers of a circuit, and fulfilling a set of timing constraints related to the clock signal. Though this was relatively easy to achieve for most technology nodes in the last decades, fully synchronous design is getting increasingly overconstrained because of the advance of technology into deep submicrometer (DSM) nodes and the consequent increases in process variations and in sensitivity to voltage and temperature variations. Asynchronous circuits can be more tolerant to such variations and can become increasingly more relevant for the VLSI research community [1] .
The C-element is a fundamental primitive for building asynchronous logic and implementing the synchronization required by most handshaking protocols used in clockless or asynchronous design styles. It provides the basis for the local handshake needed in asynchronous data exchange. Besides, recent works show that C-elements are not only useful for clockless synchronization but can also be employed in antiglitch mechanisms, clock generators, memory circuits, and clock gating schemes for synchronous circuits [2]-[5]. Three classic static CMOS implementations are Martin's weak feedback [6], Sutherland's pull-up pull-down [7], and van Berkel's [8]. Static C-elements are typically preferred for asynchronous design because they guarantee that the information inside them can be stored for unbounded periods. However, dynamic implementations of the C-element [1] provide gains in terms of area, power, and delay figures, by avoiding the use of active memory structures.
As far as we could verify, investigations of dynamic C-element behavior in the literature are limited. This brief's two main original contributions are 1) a comprehensive analysis of the electrical behavior of the dynamic C-element and 2) a new design technique to ensure the robustness of this component.
II. C-ELEMENT BEHAVIOR
Most of the asynchronous design techniques proposed to date require devices other than ordinary logic gates and flip-flops available
