ABSTRACT
Most previous works consider only capacitance effects on the bus structure when reducing delay; under that assumption, the worst-case switching pattern, which incurs the largest delay, occurs when adjacent wires simultaneously switch in opposite directions. However, using an RLC circuit model for the bus structure, we find that the worst-case switching pattern with the largest on-chip bus delay occurs when all wires simultaneously switch in the same direction. This worst-case pattern is, in contrast, the best-case pattern of a coupling RC model. Further, the best-case switching pattern with the RLC model occurs when the central wire of the bus switches in the opposite direction from all other wires, which all switch in the same direction. This best-case pattern is exactly the worst-case pattern with the RC model. See Figure 1 for examples of the worst- and best-case switching patterns on a 5-bit bus. Hence, as inductance cannot be neglected in today's high-performance circuit design, it is very important to consider RLC effects when developing encoding schemes to reduce bus delay. Based on these best- and worst-case patterns, we propose a new encoding scheme for on-chip buses that minimizes coupling delay when inductance effects dominate. The key idea is to alleviate inductive coupling effects by transforming the data sequences transmitted over on-chip buses. At the same time, the encoder and decoder architectures should be of low complexity so that the power and delay overheads of the codec circuitry are compensated by the significant reduction of bus delay.
The rest of this paper is organized as follows. Section 2 describes the parameters and assumptions used in our study of the bus structure and then gives the working flow. Section 3 presents simulations using the RLC model. The method and circuitry of our encoding (decoding) scheme are described in Section 4, and simulation results are shown in Section 5. Finally, Section 6 concludes the paper and discusses our future work.
2. PRELIMINARY
In this work, we used the bus structure shown in Figure 1 to conduct our simulations. We assumed that all drivers (receivers) have a uniform size and that all signal wires have a uniform width, spacing, and length. The length, width, and pitch of the signal wires were 2000 µm, 0.8 µm, and 2 µm, respectively. The width and pitch of the power/ground wires were 2 µm and 13 µm, respectively. The heights of all wires were set to 2 µm, and the signal rise/fall time was set to 100 ps. With these feasible parameters [5, 6, 15], we used the well-known 3D field solver FastCap [13] to extract the self and coupling capacitance and FastHenry [11] to extract the resistance, self inductance, and coupling inductance. With these extracted RLC parameters, we constructed the coupling RLC and RC circuit models. Both circuit models were constructed as π-segments using series resistance (or series resistance and inductance for the RLC model) and shunt capacitance. Finally, the circuits were simulated using HSPICE. The overall flow is illustrated in Figure 2. In our simulations, we assumed that synchronous latches are located at the transmitter side. Thus all signals on the bus switch at the same time, which is a very common assumption for buses [12]. Unlike [8], which assumes that aggressors can switch at arbitrary moments and that victims are quiet, we assume that all signal wires switch at the same time and may have arbitrary switching patterns, i.e., switching high or switching low. In addition, [8] tries to find the switching pattern and switching time resulting in the worst-case noise (WCN), defined as the maximum crosstalk noise peak on a quiet victim net, whereas our work tries to find the simultaneous switching pattern that causes the maximum transition delay on a switching victim. Therefore, our work is significantly different from [8].
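The π-segment construction described above can be sketched as follows. This is only an illustration of how a distributed wire is commonly split into a ladder of π-segments; the function name `pi_ladder`, the segment count, and the numeric totals in the usage note are hypothetical placeholders, not the values extracted by FastCap and FastHenry in this work.

```python
def pi_ladder(r_total, l_total, c_total, n_segments):
    """Split one wire into n_segments identical pi-segments (illustrative only).

    Each segment carries series R/n and L/n and a shunt C/(2n) at each end,
    so interior nodes, shared by two segments, see C/n to ground.
    Pass l_total=0.0 to obtain the RC model of the same wire.
    """
    r_seg = r_total / n_segments
    l_seg = l_total / n_segments
    c_half = c_total / (2 * n_segments)
    # n_segments + 1 nodes: the two end nodes get one half-capacitor each,
    # every interior node gets two.
    node_c = [c_half] + [2 * c_half] * (n_segments - 1) + [c_half]
    return {
        "series_r": [r_seg] * n_segments,
        "series_l": [l_seg] * n_segments,
        "node_c": node_c,
    }
```

For example, `pi_ladder(50.0, 2e-9, 400e-15, 10)` (placeholder totals in ohms, henries, and farads) returns ten series R and L elements and eleven node capacitances whose sums equal the totals passed in. Coupling capacitance and mutual inductance between neighboring wires are not modeled in this sketch.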
3. SIMULATIONS WITH THE RLC CIRCUIT MODEL
In this section, we first simulate all switching patterns on the 5-bit bus structure considering the RLC effects of the bus interconnects, and then increase the wire capacitance to see whether the worst-case switching pattern changes as the wire capacitance becomes dominant. The simulation results of the extracted RLC circuit model for the 5-bit bus are shown in Table 1. From Table 1, we observe that the worst-case switching pattern changes from *↓↑↓* (for the RC circuit model) to ↑↑↑↑↑ and the best-case switching pattern changes from *↑↑↑* (for the RC circuit model) to ↓↓↑↓↓. The worst-case and best-case switching patterns are thus completely different under RC and RLC effects. Therefore, as the process technology keeps shrinking and the clock frequency continues increasing, it is very important to consider RLC effects on the bus structure when deriving encoding schemes to reduce bus delay. Otherwise, the encoding schemes might not improve, or might even worsen, the on-chip bus delay because of the redundant logic and wires.
Further, we also observe that the largest overshoot noise occurs for the pattern ↑↑↑↑↑, as shown in Table 1 .
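The relationship between the RC and RLC extreme patterns can be checked by enumerating the 32 simultaneous switching patterns of a 5-bit bus. The following is a minimal sketch that encodes only the pattern definitions stated in the text, with the central wire taken as the victim; it does not reproduce the HSPICE delays of Table 1, and the names `classify` and `show` are illustrative.

```python
from itertools import product

UP, DOWN = +1, -1
VICTIM = 2  # index of the central wire on a 5-bit bus

def classify(pattern):
    """Label a 5-wire pattern according to the definitions in the text."""
    v = pattern[VICTIM]
    neighbors = (pattern[VICTIM - 1], pattern[VICTIM + 1])
    others = pattern[:VICTIM] + pattern[VICTIM + 1:]
    return {
        "rc_worst": all(n == -v for n in neighbors),  # *↓↑↓* or *↑↓↑*
        "rc_best": all(n == v for n in neighbors),    # *↑↑↑* or *↓↓↓*
        "rlc_worst": all(o == v for o in others),     # ↑↑↑↑↑ or ↓↓↓↓↓
        "rlc_best": all(o == -v for o in others),     # ↓↓↑↓↓ or ↑↑↓↑↑
    }

def show(pattern):
    """Render a pattern with the arrow notation used in the paper."""
    return "".join("↑" if p == UP else "↓" for p in pattern)

all_patterns = list(product((UP, DOWN), repeat=5))
rlc_worst = [show(p) for p in all_patterns if classify(p)["rlc_worst"]]
rlc_best = [show(p) for p in all_patterns if classify(p)["rlc_best"]]
```

Every pattern labeled `rlc_worst` is also labeled `rc_best`, and every pattern labeled `rlc_best` is also labeled `rc_worst`, which matches the observation that the two models rank these patterns in opposite order.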
Why does the worst-case switching pattern ↑↑↑↑↑ result in the largest bus delay when considering RLC effects on the 5-bit bus? Theoretically speaking, this is mainly due to two factors: (1) Inductance becomes dominant due to higher frequency and longer interconnects. The significant frequency fs [3] is defined as fs = 0.35/tr, where tr is the signal transition time. Hence, the frequency of interest is 3.5 GHz, as the rise time is set to 100 ps. Therefore, for simulations in which the wire length is 2000 µm and the frequency is 3.5 GHz, the inductance effects are much more significant than the capacitance effects. (2) It is also due to the long-range effect of inductance. From Faraday's Law [2], as shown in Equation (1), the electromotive force induced in a closed circuit is equal to the negative rate of increase of the magnetic flux linking the circuit. We have
where Vj is the electromotive force induced in circuit loop j due to the 
is also positive. In conclusion, from Equation (1), the induced voltage on the victim loop is negative; that is, the induced current on the victim wire flows in the reverse direction of the victim current. Hence, when all neighboring wires simultaneously switch in the same direction as the victim wire, each of them induces a current in the opposite direction on the victim wire, as shown in Figure 1(a). Therefore, the charging current of the victim wire is reduced, which implies that the charging time (delay) increases due to the long-range coupling. We can conclude that as the inductance of the wires becomes more significant than the capacitance, the worst-case switching pattern with the maximum delay occurs when all wires simultaneously switch in the same direction. Meanwhile, these patterns also result in the largest noise between the wires.
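For reference, the two quantities used in this argument can be written out as follows. The first line is the arithmetic behind the 3.5 GHz figure. The second is the textbook form of Faraday's law for two magnetically coupled loops; the symbols Φ_ij (flux linking loop j due to loop i), M_ij (mutual inductance), and I_i (aggressor current) are assumed notation for illustration and may differ from the notation of Equation (1).

```latex
% Significant frequency for a 100 ps transition time
\[
  f_s = \frac{0.35}{t_r} = \frac{0.35}{100\ \mathrm{ps}} = 3.5\ \mathrm{GHz}
\]

% Textbook Faraday's law with mutual inductance (assumed notation)
\[
  V_j = -\frac{d\Phi_{ij}}{dt} = -M_{ij}\,\frac{dI_i}{dt}
\]
```

Under this form, a positive M_ij together with a rising aggressor current (dI_i/dt > 0) gives V_j < 0, which agrees with the sign argument in the text.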
4. THE BUS-INVERT SCHEME
Inspired by Stan's low-power bus-invert method [14], which reduces transition activity to lower bus transition power, we propose a bus-invert method to reduce the on-chip bus delay due to coupling effects when inductance effects dominate. Our bus-invert method inverts the input data when the number of bits switching in the same direction is more than half of the number of signal bits. The remaining problem is how to implement the coding architecture with low complexity. For the implementation, we propose the encoder architecture shown in Figure 3.
There are three types of possible signal transitions: type I: ↑ (switching from "0" to "1"), type II: ↓ (switching from "1" to "0"), and type III: 0 (no switching). If we refer to xi(n) as an input signal and to xi(n-1) as its previous input signal, then type I is (xi(n), xi(n-1)) = (1, 0), type II is (xi(n), xi(n-1)) = (0, 1), and type III is (xi(n), xi(n-1)) = (0, 0) or (1, 1). With the inputs xi(n) and xi(n-1), the codeword generator generates (qL, qH) = (0, 1) for type I, (1, 0) for type II, and (0, 0) for type III. All qL's are then inputs to the majority voter (L) and all qH's to the majority voter (H). Finally, from the output of majority voter L or H, we can detect whether the number of type I or type II transitions is more than half of the number of signal bits. If one of the majority voters' outputs is high, the input signal should be inverted. The majority voters can be implemented by using either a tree of full-adders or resistors combined with a voltage comparator [14]. Since the additional invert line also contributes transitions, it should be considered as well. Let N be the total number of signal bits of a bus, excluding the invert line. The output of the majority voter is asserted when ⌈(N+1)/2⌉ inputs are high. If N is odd, the encoder architecture is as shown in Figure 3(b). Hence, after encoding, the worst-case switching pattern occurs when (N+1)/2 signal bits switch in the same direction, where N is odd. If N is even, the encoder architecture is somewhat different, as shown in Figure 3(a). The major differences are that the encoder needs an extra input INV(n-1), and that INV(n) = INV(n-1)' if INV_t is high and INV(n) = INV(n-1) if INV_t is low. Hence, after encoding, the worst-case switching pattern is that N/2 signal bits switch in the same direction, where N is even.
The circuitry of the receiver is relatively simple because it only needs to conditionally invert the received data to recover the correct data value. If N is odd, the received data need to be inverted only when the invert line is high. If N is even, the received data need to be inverted only when the invert line has a transition.
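The encoding and decoding rules above can be sketched behaviorally as follows. This is a minimal sketch under two assumptions that go beyond the text: transitions are counted against the word previously driven onto the bus, and the invert line is level-signaled as in the odd-N receiver rule (the transition-signaled INV line of the even-N architecture is not modeled). All function names are illustrative, and the sketch does not describe the gate-level circuit of Figure 3.

```python
from itertools import product
from math import ceil

def codeword(prev_bit, new_bit):
    """Return (qL, qH): (0, 1) for type I (rising), (1, 0) for type II
    (falling), and (0, 0) for type III (no switching)."""
    q_l = int(prev_bit == 1 and new_bit == 0)
    q_h = int(prev_bit == 0 and new_bit == 1)
    return q_l, q_h

def majority(q_bits, n):
    """Asserted when at least ceil((N+1)/2) of the inputs are high."""
    return sum(q_bits) >= ceil((n + 1) / 2)

def encode(prev_word, data):
    """Return (word to transmit, INV bit) for one bus cycle.

    Assumption: prev_word is the word previously driven onto the bus.
    """
    n = len(data)
    q = [codeword(p, x) for p, x in zip(prev_word, data)]
    invert = majority([ql for ql, _ in q], n) or majority([qh for _, qh in q], n)
    word = [1 - x for x in data] if invert else list(data)
    return word, int(invert)

def decode(word, inv):
    """Receiver: invert the received word when the INV line is high."""
    return [1 - b for b in word] if inv else list(word)

def same_direction_counts(prev_word, word):
    """Count rising and falling data bits between two consecutive bus words."""
    rising = sum(p == 0 and w == 1 for p, w in zip(prev_word, word))
    falling = sum(p == 1 and w == 0 for p, w in zip(prev_word, word))
    return rising, falling
```

For a 5-bit bus, `encode([0, 0, 0, 0, 0], [1, 1, 1, 1, 1])` detects five type I transitions, transmits the complemented word, and sets the INV bit, so no data line switches. Under the stated assumptions, at most two data lines switch in the same direction for any pair of consecutive 5-bit words, and together with the INV line this gives the (N+1)/2 bound quoted for odd N.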
5. SIMULATION RESULTS
With the parameters given in Section 2, we conducted our simulations by varying the number of bus signal bits, with and without the proposed bus-invert method. The simulation results are shown in Figures 4 and 5.
From Figure 4, we observe that coupling inductance has a greater impact on bus delay as the number of bus bit lines increases. For a tight LC cross-coupled bus, as shown in Figure 4, the increase (%) of the worst-case switching delay grows approximately linearly with the number of bus bit lines. Hence, for a high-frequency, tight LC cross-coupled bus, the delay due to signals simultaneously switching in the same direction should be considered.
As shown in Figure 5, our encoding method can significantly reduce the worst-case switching delay. Moreover, it obtains an even better reduction rate as the number of bus bit lines increases. We also compare our method with the conventional shield insertion technique, which inserts one shielding wire in the middle of the bus. As shown in Figure 5, our method outperforms the conventional shield insertion technique when the number of bus bit lines is greater than 4. However, since the encoder architectures for even-bit and odd-bit buses are slightly different, the delay reductions are also slightly different. For an N-bit bus, if N is odd, the worst-case switching pattern after encoding is (N+1)/2 signal bits (including the INV line) switching in the same direction. If N is even, the worst-case pattern after encoding is that only N/2 signal bits (including the INV line) switch in the same direction. Hence, the reduction of worst-case delay for even-bit buses is more significant than that for odd-bit buses when the number of bits is larger than 5 (see Figure 5). We should also note that for the 2-bit bus, our encoding method worsens the worst-case delay because the additional INV line introduces large additional coupling to the victim line. In other words, the worst-case delay after encoding for 2 bit lines plus one INV line is larger than that for 2 bit lines alone.
Although our bus-invert method is mainly intended for delay reduction, it is also effective for inductive noise reduction. As shown in Table 2, the average reduction of maximum noise is about 17%, which is about the same as that of the well-known shield insertion technique. Further, our method has the side effect of reducing ground-bounce noise (not included in Table 2), because the worst-case ground-bounce noise occurs when all signal wires switch in the same direction (large changes in charging or discharging current). Our method can avoid such a worst-case configuration, while the shield insertion technique cannot. Since previous encoding works for bus delay reduction such as [1, 17, 20] only consider coupling capacitance, the real worst-case pattern due to LC coupling might still occur with their encoding. Besides, they might use more than one additional line in their encoding schemes when the number of bit lines is more than 8. Therefore, their encoding schemes might not be suitable for high-performance bus applications when inductance effects become significant.
6. CONCLUSION & DISCUSSIONS
In this paper, we have shown that the inductance effects have changed the worst-case switching pattern with the maximum bus delay. For a 5-bit bus structure, the worst-case switching pattern is *↓↑↓* or *↑↓↑* considering RC effects, but the worst-case pattern changes to ↑↑↑↑↑ or ↓↓↓↓↓ considering RLC effects. Hence, we shall consider both the RC and the RLC effects to derive effective encoding schemes for bus delay optimization.
We have also conducted simulations considering RLC effects on the bus structure when wire capacitance becomes dominant. We have observed that the worst-case switching pattern is also different from the one considering RC effects. The difference is due to long-range inductive coupling.
We have also proposed a bus-invert method to reduce the worst-case on-chip bus delay with the dominance of inductance coupling effect. Simulation results have shown that our encoding method can significantly reduce the worst coupling delay of a bus. In the future, we intend to develop a more sophisticated bus-invert scheme to further reduce the inductive coupling delay.
Our encoding scheme is recommended for cases in which buses or parallel signal wires are on the order of thousands of µm long and operate at GHz frequencies or above. At such working frequencies, the gate delay overhead of our encoder should be small enough. If we choose the full-adder tree to implement the majority voter, the delay of the majority voter is O(log_3 N) × (full-adder delay), where N is the total number of signal bits of a bus. In other words, if N is very large, our encoder may cause timing violations. To solve this problem, we can divide the original bus into sub-buses by inserting ground wires between them. The overall problem is thus a gate-delay-dependent (and hence process-dependent) optimization problem, which we shall address in our future work.
