Crosstalk noise is one of the serious reliability concerns in nanoscale integrated circuits. Repeater insertion together with shielding wires is a typical method to suppress crosstalk noise associated with global data bus. A new crosstalk-noise-aware bus coding scheme with ground-gated repeaters is proposed in this paper to minimize the routing overhead as well as power consumption of data bus systems. A subset of 4-to-6 crosstalk-noise-aware codes is selected to minimize the number of simultaneous data transitions. The routing overhead is reduced by 12.31% with the new bus coding scheme compared to the conventional data bus with shielding wires. Furthermore, the leakage power and worst-case active power consumptions are reduced by 12.5% and 18.26%, respectively, with the new crosstalk-noise-aware data bus system as compared to the previously published bus coding system in an industrial 40nm CMOS technology.
INTRODUCTION
Higher quality-of-experience requirements for integrated circuits lead to an increased number of on-chip transistors, which boosts both performance and functionality. Chip areas therefore tend to increase in each new CMOS technology generation despite the miniaturized transistors [1]-[3]. The global interconnects, which become longer as chips grow larger, suffer from serious signal integrity issues such as crosstalk noise, which is primarily caused by capacitive coupling between interconnects [4], [5]. Crosstalk noise causes longer propagation delay, higher power consumption, and degraded noise margin in nanoscale CMOS integrated circuits.
Repeater (inverter/buffer) insertion is the most commonly used technique to deal with signal integrity issues in global interconnects [6]-[9]. By inserting repeaters along a long wire, each segment of the divided interconnect is strongly driven by the inserted buffers. The immunity of interconnects to crosstalk noise is thereby enhanced. Hundreds of thousands of repeaters are inserted along global interconnects in state-of-the-art microprocessors, a significant portion of which are used for data buses [2], [3]. These repeaters consume a large amount of dynamic power in active mode and leakage power in idle mode. Development of a novel low-power and robust data bus system, where a large number of repeaters drive long wires, is therefore highly desirable.
A new crosstalk-noise-aware bus coding scheme with low-power ground-gated repeaters is proposed in this paper. The new bus coding scheme is employed to minimize the crosstalk noise between neighboring global interconnects with minimal aid of power or ground shielding. The routing overhead is thereby reduced with the new bus coding scheme compared to the conventional power and ground wire shielding techniques. The sizes of the sleep transistors are miniaturized with the new bus coding scheme, thereby reducing the leakage power consumption compared to a previously published data bus system with ground-gated repeaters. Furthermore, the switching probability of the data along the global buses is reduced with the proposed bus coding scheme, thereby reducing the active power consumption compared to the previously published ground-gated repeater technique for driving long global interconnects in complex nanoscale CMOS integrated circuits.
The paper is organized as follows. The existing crosstalk noise reduction techniques are reviewed in Section 2. The new crosstalk-noise-aware bus coding scheme with low-power ground-gated repeaters is presented in Section 3. Different crosstalk-noise-aware data bus systems are optimized and evaluated in Section 4. The paper is summarized in Section 5.
EXISTING CROSSTALK NOISE REDUCTION TECHNIQUES
Since crosstalk noise is a critical reliability issue in integrated circuits, various bus coding techniques have been proposed to mitigate the impact of crosstalk noise. The analysis of crosstalk noise is presented in Section 2.1. The existing crosstalk-noise-aware bus coding schemes are introduced as well. In Section 2.2, the previously published crosstalk-noise-aware bus coding scheme with ground-gated buffers is reviewed.
Crosstalk Noise on Data Bus and Noise Reduction
The amplitude of crosstalk noise is affected by the switching signal patterns on adjacent wires. There are three possible signal switching scenarios on a single wire: switching from 0 to 1 (↑), switching from 1 to 0 (↓), or no switching activity (-). Combining all the possible signal switching patterns on two adjacent wires, there are four types of switching scenarios, as listed in Table I. For Type 1, there is no switching activity on either wire, thereby resulting in no crosstalk noise. For Type 2, the voltage level on one wire switches either from 0 to 1 or from 1 to 0, while the other wire has no switching activity. Crosstalk noise is induced on the quiet wire due to the coupling capacitance between the two wires. For Type 3, both wires switch in the same direction, so that the coupling capacitance is only weakly charged or discharged. There is therefore marginal crosstalk between the two adjacent wires for the Type 3 switching scenario. For Type 4, the two wires switch in opposite directions. In this case, the effective coupling capacitance between these two wires is significantly higher than in the other three scenarios. The maximum (worst-case) crosstalk noise therefore occurs under the Type 4 switching scenario.
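The four switching types can be illustrated with a minimal sketch. The function names switching_type and has_type4, and the representation of bus words as strings of '0' and '1', are illustrative assumptions rather than part of the original work.

```python
def switching_type(a_old, a_new, b_old, b_new):
    """Classify the switching scenario on two adjacent wires (Types 1 to 4)."""
    da = a_new - a_old  # +1 rising, -1 falling, 0 quiet
    db = b_new - b_old
    if da == 0 and db == 0:
        return 1  # neither wire switches
    if da == 0 or db == 0:
        return 2  # one wire switches, the other is quiet
    if da == db:
        return 3  # both wires switch in the same direction
    return 4      # wires switch in opposite directions (worst case)


def has_type4(old, new):
    """True if any pair of adjacent wires sees Type 4 switching in old -> new."""
    return any(
        switching_type(int(old[i]), int(new[i]), int(old[i + 1]), int(new[i + 1])) == 4
        for i in range(len(old) - 1)
    )
```

For instance, has_type4("01", "10") is True because the first wire rises while the second falls, whereas has_type4("1011", "1000") is False because the two switching wires fall together.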
Existing crosstalk noise reduction techniques typically aim to eliminate the worst-case crosstalk noise by avoiding the Type 4 signal switching scenario. The conventional method to suppress crosstalk noise between interconnects is to put a power/ground wire (shielding wire) between every two adjacent signal wires on a data bus [2] , [3] , as shown in Fig. 1 . Since the shielding wires are permanently connected to either VDD or ground, only Type 1 or Type 2 switching scenarios in Table I occur between the shielding wire and the signal wire. The conventional shielding method is therefore effective in eliminating crosstalk noise. However, large routing overhead is caused due to the additional wide shielding wires since the current electronic design automation (EDA) tools have difficulty in reusing those shielding power/ground wires.
In order to reduce the large routing overhead that is caused by the additional shielding wires, a bus coding methodology is proposed in [10]. Instead of transmitting the data on the original data bus directly, an encoder is inserted at the source of the data bus to encode the input data before data transmission, as shown in Fig. 2. The encoded data is transmitted on the data bus to the sink, where a decoder is used to translate the encoded data into the original value. These codes are commonly referred to as Crosstalk Avoidance Codes (CACs) [14]. A unique feature of the encoded data is that no Type 4 switching scenario exists between any two adjacent wires on the bus since crosstalk-noise-aware bus coding is employed in this scheme. No shielding wire needs to be inserted between two adjacent signal wires on the bus, thereby significantly reducing the number of global wires in the bus system. A similar technique, which is commonly referred to as forbidden pattern coding, is proposed in [11] to prohibit the 010 and 101 patterns in codewords, thereby eliminating the Type 4 switching scenario. While reducing the number of global wires along the bus compared to the conventional shielding method, the forbidden pattern coding technique requires a similar number of global wires compared to the crosstalk avoidance coding technique in [10]. A variety of follow-up works propose different techniques to generate either the Crosstalk Avoidance Codes or the Forbidden Pattern Codes (FPCs) in an efficient way [12]-[15]. However, these works focus only on the coding scheme and overlook the power consumption of the bus system. In this paper, a power-friendly crosstalk-noise-aware bus coding scheme is proposed not only to suppress the crosstalk noise (by eliminating the Type 4 switching scenario), but also to minimize the power consumption of the data bus system.
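The forbidden-pattern condition used in [11] is a per-codeword test, which can be sketched as follows. The helper names is_forbidden_pattern_free and fpf_codewords are hypothetical, and the sketch only enumerates codewords that avoid the 010 and 101 patterns; it does not reproduce the code generation methods of [11]-[15].

```python
from itertools import product


def is_forbidden_pattern_free(word):
    """True if the codeword contains neither the 010 nor the 101 pattern."""
    return "010" not in word and "101" not in word


def fpf_codewords(n):
    """List all n-bit words that are free of the forbidden patterns."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in words if is_forbidden_pattern_free(w)]
```

For n = 3, only 010 and 101 themselves are excluded, so six of the eight 3-bit words remain.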
Crosstalk-Noise-Aware Bus Coding with Ground-Gated Buffers
Power/ground gating is the most commonly used circuit technique for leakage power reduction [18]-[20]. In a power/ground gated circuit, high threshold voltage (HVT) sleep transistors (header and/or footer) are used to cut off the power supply and/or the ground connections to an idle circuit block. Ground gating is used in [9] to suppress the leakage currents that are produced by the repeaters inserted along the data bus. The 3-to-4 crosstalk-noise-aware coding scheme that is proposed in [10] is adopted in [9]. A wide data bus is divided into multiple groups. Each group contains three bits, which are encoded into four crosstalk-noise-aware bits (the last two bits are not encoded in the 32-bit data bus case, see Fig. 3). A power/ground shielding wire must be inserted between adjacent groups of the 4-bit encoded data. Footer sleep transistors are used for repeater leakage reduction, as shown in Fig. 3. Since two adjacent buffers along the same wire are separated by a long distance, sharing a single footer sleep transistor among all repeaters is difficult. Therefore, a column of repeaters along the bus, which are physically close to each other, shares a footer sleep transistor as shown in Fig. 3. The ground gating bus coding scheme in [9] is effective in both suppressing crosstalk noise and reducing the leakage power consumption of the interconnect buffers. However, the coding scheme that is used in [9] is far from optimized for either leakage power or dynamic power reduction when the ground gating technique is employed. In this paper, while employing the ground gating technique for the buffers along the bus, alternative power-efficient crosstalk-noise-aware bus coding schemes are explored to achieve lower power consumption compared to the 3-to-4 crosstalk-noise-aware coding scheme that is proposed in [10] at the same crosstalk noise level.
Fig. 3. Crosstalk noise reduction with 3-to-4 noise-aware bus coding [9]. Ground gating is applied to the repeaters. HVT sleep transistors are represented with a thick line in the channel region.
NEW CROSSTALK-NOISE-AWARE BUS CODING SCHEME WITH GROUND-GATED REPEATERS
A new crosstalk-noise-aware bus coding scheme with low-power ground-gated repeaters is proposed in this section to reduce both the leakage power consumed by the inserted repeaters in idle mode and the dynamic power consumed by the repeaters in active mode.
For the crosstalk-noise-aware bus coding system that is shown in Fig. 2, the encoded data are defined as codewords [10]. A codeword is connected to another codeword if a transition between these two codewords does not incur the Type 4 switching scenario. The neighbor set of a codeword is the set of codewords to which the codeword is connected. The degree (dn) of this neighbor set is the number of codewords inside this set [10]. The relationship between the data width n and dn for noise-aware bus coding is listed in Table II. For example, when n = 4 and dn = 8, eight 4-bit codewords exist. Each of these eight codewords can make a transition to any other codeword within this set without incurring Type 4 switching. Since dn = 8, the maximum number (k) of input bits that can be encoded is 3 (2^3 = 8), as listed in Table II. The eight 3-bit input data can be encoded to eight 4-bit crosstalk-noise-aware data. A 3-to-4 encoder is used for encoding while a 4-to-3 decoder is used for decoding.
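The n = 4 example can be checked with a brute-force sketch. It assumes that dn corresponds to the largest set of mutually connected n-bit codewords; the function names connected and largest_codebook are illustrative, and the exhaustive search is only practical for small n.

```python
from itertools import product


def connected(a, b):
    """True if a transition between codewords a and b causes no Type 4 switching."""
    for i in range(len(a) - 1):
        # Type 4: one adjacent wire pair goes 01 -> 10 or 10 -> 01.
        if {a[i:i + 2], b[i:i + 2]} == {"01", "10"}:
            return False
    return True


def largest_codebook(n):
    """Brute-force the largest set of mutually connected n-bit codewords."""
    words = ["".join(bits) for bits in product("01", repeat=n)]
    best = []

    def extend(chosen, candidates):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for idx, w in enumerate(candidates):
            rest = [c for c in candidates[idx + 1:] if connected(w, c)]
            # Only recurse if this branch can still beat the best set so far.
            if len(chosen) + 1 + len(rest) > len(best):
                extend(chosen + [w], rest)

    extend([], words)
    return best


codebook = largest_codebook(4)
d_n = len(codebook)          # size of the largest connected set for n = 4
k = d_n.bit_length() - 1     # largest k such that 2**k <= d_n
```

Under these assumptions the search returns eight 4-bit codewords, so k = 3 input bits can be encoded, in agreement with the 3-to-4 case described above.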
When 3-to-4 bus coding is utilized (the coding scheme that is used in [9] ), the eight crosstalk-noise-aware codes are listed in Table III . When one coded data transitions to another coded data, the new data is propagated through the repeaters along the bus. Provided that ground gating technique (with footer sleep transistors) is applied to the repeaters, the sizes of the sleep transistors are determined by the data patterns that are propagated through the repeaters. For example, when the coded data "1011" transitions to "1000", two repeaters among the eight repeaters in the two consecutive columns experience high-to-low transition at the outputs, as shown in Fig. 4(a) . The discharging currents go through the NMOS transistors inside the repeaters as well as the footer sleep transistor to the ground. With higher discharging currents, larger sleep transistors are required to maintain high data transmission rate along the data bus. The worst-case discharging scenario occurs when the coded data "1111" transitions to "0000" or when the coded data "0000" transitions to "1111", as shown in Fig. 4(b) .
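The discharging count for a given transition can be sketched as follows, assuming that every repeater is an inverter so that the second column restores the polarity of the first. The function name discharging_repeaters is hypothetical, and old and new denote the coded words at the inputs of the first of two consecutive columns.

```python
def discharging_repeaters(old, new):
    """Count high-to-low output transitions in two consecutive inverter columns."""
    count = 0
    for o, n in zip(old, new):
        o, n = int(o), int(n)
        col1_old, col1_new = 1 - o, 1 - n                 # outputs of the first column
        col2_old, col2_new = 1 - col1_old, 1 - col1_new   # outputs of the second column
        count += (col1_old, col1_new) == (1, 0)           # first-column output falls
        count += (col2_old, col2_new) == (1, 0)           # second-column output falls
    return count
```

Under this assumption each switching bit discharges exactly one of its two repeaters, so the count equals the number of differing bits: 2 for "1011" to "1000", and 4 for both "1111" to "0000" and "0000" to "1111".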
For a 32-bit data bus, the bus is divided into 12 groups for encoding. Each of the ten groups with three input bits is encoded to four bits for transmission on the bus. The last two groups, with one bit in each group, are not encoded. A power/ground wire is inserted between adjacent groups since the boundary wires between adjacent groups cannot be guaranteed to avoid the Type 4 switching scenario. The total number of wires that are used for a 32-bit data bus is 55 (42 wires for data transmission + 13 shielding wires), as listed in Table II. For each group of the 4-bit encoded data, the maximum number of discharging repeaters in every two consecutive columns is four, as shown in Fig. 4(b). Therefore, the maximum number of discharging repeaters in every two consecutive columns is 42 assuming a 32-bit data bus (which is encoded to a 42-bit data bus). The 3-to-4 coding scheme, however, maximizes the worst-case number of discharging repeaters. The footer sleep transistors therefore have to be sized significantly large to avoid serious performance degradation along the data bus.
Alternative crosstalk-noise-aware bus coding schemes are explored in this paper for reducing the discharging currents through the footer sleep transistors, thereby minimizing the sizes of the sleep transistors. Provided that the 4-to-6 coding scheme is employed, 21 crosstalk-noise-aware codes exist, as listed in Tables II and III. For 4-bit input data, 16 noise-aware codes are sufficient for the coding. Five codes among the 21 noise-aware codes are therefore redundant. Algorithm 1 is proposed to select the 16 required noise-aware codes while minimizing the discharging currents through the repeaters with this 4-to-6 coding scheme. Based on the analysis for the 3-to-4 coding scheme, the amount of discharging current that is caused by the transition from one encoded data to another encoded data is maximized when the number of different bits in the two encoded data is maximized. For example, there are 6 different bits between "111111" and "000000" while there are 4 different bits between "101111" and "000010". The transition from "111111" to "000000" therefore produces higher discharging currents as compared to the transition from "101111" to "000010".
The encoded data that have the largest numbers of different bits compared to other encoded data are eliminated with Algorithm 1. For the 4-to-6 coding scheme, the maximum number of different bits between any two of the selected 16 codes is four. For a case of 32-bit data bus, the data bus is divided into eight groups for encoding. The eight groups with four input bits are encoded to six bits for transmission on the bus. The maximum number of discharging repeaters in every two consecutive columns is 32 assuming a 32-bit data bus (which is encoded into a 48-bit data bus). The maximum number of discharging repeaters is reduced by 23.81% with the 4-to-6 encoding scheme as compared to the 3-to-4 encoding scheme. The total size of the sleep transistors is therefore reduced with the 4-to-6 encoding scheme as compared to the 3-to-4 encoding scheme, thereby suppressing the leakage power consumption of the repeaters along the bus.
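The worst-case counts quoted above follow from simple arithmetic, sketched below. The helper name worst_case_discharging is illustrative; the group counts and per-group maxima are taken from the text, and each uncoded bit is assumed to contribute at most one discharging repeater.

```python
def worst_case_discharging(groups, max_diff_per_group, uncoded_bits=0):
    """Worst-case discharging repeaters in two consecutive columns of the bus."""
    return groups * max_diff_per_group + uncoded_bits


# 3-to-4 coding: ten coded groups (up to 4 differing bits each) plus 2 uncoded bits.
three_to_four = worst_case_discharging(groups=10, max_diff_per_group=4, uncoded_bits=2)
# 4-to-6 coding: eight coded groups, at most 4 differing bits among the selected codes.
four_to_six = worst_case_discharging(groups=8, max_diff_per_group=4)

reduction_pct = 100 * (three_to_four - four_to_six) / three_to_four
```

This gives 42 and 32 discharging repeaters, respectively, and a reduction of about 23.81%.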
With the alternative bus coding scheme, such as 4-to-6 coding, the number of different bits between any two noise-aware codes is reduced as compared to the 3-to-4 coding. The amount of switching activities is therefore also reduced when the data on the global bus transitions from one to another. The dynamic power consumption, which is linearly related to the switching activities of the signals, is thereby also suppressed with these alternative bus coding schemes as compared to the 3-to-4 coding that is employed in [9].

for i := 1 to ini-1
    for j := i to ini
        diff := difference(NA_Code(i), NA_Code(j))
        Store diff in an array diff_ar
    end for
end for
while ini > req
    Eliminate the maximum value in diff_ar and the two corresponding noise-aware codes
    ini := ini - 2
    if ini < req
        Add one of the two just eliminated noise-aware codes back to NA_Code and stop
    end if
end while
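The elimination loop listed above can be sketched in Python as follows. This is an interpretation rather than the authors' implementation: the earlier steps of Algorithm 1 are not reproduced here, the pair with the largest difference is recomputed over the remaining codes in every iteration, and ties are broken by enumeration order. The candidate set is also an assumption, since the codes of Table III are not restated in the text; candidate_codes builds one possible set of 21 mutually connected 6-bit codes by forbidding "01" at even wire boundaries and "10" at odd ones.

```python
from itertools import combinations, product


def difference(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_codes(n):
    """Assumed candidate set: no '01' at even boundaries, no '10' at odd ones."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [
        w for w in words
        if all(w[j:j + 2] != ("01" if j % 2 == 0 else "10") for j in range(n - 1))
    ]


def select_codes(na_code, req):
    """Repeatedly drop the pair of codes with the largest mutual difference."""
    codes = list(na_code)
    while len(codes) > req:
        a, b = max(combinations(codes, 2), key=lambda pair: difference(*pair))
        codes.remove(a)
        codes.remove(b)
        if len(codes) < req:
            codes.append(a)  # add one of the two eliminated codes back and stop
            break
    return codes


candidates = candidate_codes(6)
selected = select_codes(candidates, 16)
```

With 21 candidates and 16 required codes, the loop removes three pairs and then restores one code, leaving exactly 16. In this candidate set the first pair removed is "000000" and "111111", the only pair that differs in all six bits.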
The maximum number of different bits between any two encoded data (equivalent to the maximum number of discharging repeaters) and the total number of wires that are required for a 32-bit data bus with different encoding schemes are listed in Table II . The maximum number of different bits between any two noise-aware codes can be minimized to 32 with multiple coding schemes, such as 2-to-3, 4-to-6, 6-to-9, and 8-to-12 coding methods, in a 32-bit bus system (the bus width is 32 before coding). The total number of wires that are used for a 32-bit bus system is minimized to 49 with 8-to-12 coding method among the coding schemes that are listed in Table II . However, the encoder and decoder with the 8-to-12 coding method have significant delay, power consumption, and area overheads as compared to coding schemes such as 4-to-6 coding. 4-to-6 coding scheme provides a better tradeoff among the effectiveness in suppressing power consumption, the amount of wires used for the bus coding system, and the complexity of encoder and decoder. The 4-to-6 coding scheme is therefore employed in this study and compared with the conventional bus system without coding as well as the bus system with 3-to-4 coding scheme.
OPTIMIZATION AND EVALUATION OF CROSSTALK-NOISE-AWARE BUS CODING SYSTEMS
Different on-chip data bus coding schemes are optimized and evaluated in this section. An industrial 40nm low-power multi-threshold voltage CMOS technology is used in this study.
Optimization of Bus Coding Systems
The design and optimization of the encoders, decoders, repeaters, and wires in the bus system are discussed in this section. For a particular coding scheme, the selected crosstalk-noise-aware codes (without redundant codes) form an unordered set. The mapping from the original codes to the selected encoded codes determines the design complexity and power consumption of the encoder and decoder. A heuristic searching algorithm is used to find the most power-efficient mapping between the original data and the coded data. Assume that X and Y are two binary codes with the same length. X and Y are both either original codes or encoded codes. Define a function dif_bit(X, Y) that measures the number of bit positions in which the two codes X and Y differ. For example, dif_bit(000, 111) = 3 and dif_bit(011, 110) = 2. We define a new parameter, logic similarity (LS), which is given by (1) for the 3-to-4 coding case.
[Equation (1): LS for the 3-to-4 coding case, expressed with dif_bit over the encoded data; the expression is not legible in the source.]
[General expression of LS: not legible in the source.]
Here Bi and Ei are the original data and the encoded data, respectively, and k is the number of input bits that are to be encoded. For 3-to-4 encoding, k = 3. The value of LS is related to the complexity and the dynamic power consumption of the encoder/decoder. A larger LS corresponds to a more complicated CODEC and higher power consumption due to the higher switching activities. To achieve the lowest power consumption with the encoders/decoders, an exhaustive search is performed to find the mapping from the original codes to the encoded codes with the lowest LS for both 3-to-4 encoding and 4-to-6 encoding. The mappings with the 3-to-4 coding scheme and the 4-to-6 coding scheme are listed in Table IV.
The global interconnects used for the data bus and the ground shielding wires are routed in the top metal layer (Metal 8) of this industrial 40nm CMOS technology. The width of each global interconnect is 0.4µm, which is the minimum width that is allowed in the layout design rule. The distance between adjacent global interconnects is 1.5µm, which is the minimum distance that is allowed in the layout design rule. The thickness of Metal 8 is 1.25µm. Assume that after buffer insertion, the bus interconnect is divided into N segments, each of which has a length of l. By adopting the optimization method in [8] and the parameter values of the 40nm CMOS technology, the optimum length lopt of interconnect between every two repeaters is 1095µm. When low threshold voltage (LVT) transistors with minimum channel length (L = 40nm) are used, the optimum width of the NMOS transistor in the repeaters is 21µm. The width of the PMOS transistor in the repeaters is sized to be 42µm to achieve symmetric pull-up and pull-down strengths in the repeaters.
Evaluation of Bus Coding Systems
Different on-chip data bus coding schemes, including the conventional data bus with ground shielding, the previously published 3-to-4 crosstalk-noise-aware bus coding with ground-gated repeaters, and the newly proposed 4-to-6 crosstalk-noise-aware bus coding with ground-gated repeaters, are evaluated in this section. A 32-bit data bus is used in this study. For the conventional data bus with ground shielding, two types of repeaters are examined in this paper, namely repeaters with LVT short-channel (L = 40nm) transistors and repeaters with LVT long-channel (L = 45nm) transistors for leakage current suppression. Repeaters with LVT short-channel (L = 40nm) transistors are used for the two data bus systems with encoding since the ground gating technique is effective in suppressing the leakage currents produced by the LVT repeaters. The length of the data bus is assumed to be 8mm [9]. LVT repeaters are inserted along the data bus every 1095µm. Eight columns of repeaters are employed in each data bus. The layouts of the four data bus systems (including encoders and decoders) are drawn with Cadence Virtuoso. Post-layout SPICE simulations are performed to evaluate the different data bus systems. The delay and power consumption of the encoders and decoders are included in the evaluation. The simulation temperature is 90˚C.
For the 3-to-4 and 4-to-6 bus coding schemes, LVT transistors are used for the design of the encoders and decoders. High threshold voltage (HVT) NMOS transistors are used as footer sleep transistors for the encoders, decoders, as well as the repeaters of the 3-to-4 and 4-to-6 bus coding systems. The footer sleep transistors are sized to achieve lower than 10% speed overhead along the critical signal path with the 3-to-4 and 4-to-6 bus coding schemes as compared to the conventional short-channel-repeater-based ground-shielded data bus without any gating. The sleep transistor widths for each column of repeaters in the 3-to-4 and 4-to-6 bus coding systems are 272µm and 216µm, respectively. The total widths of sleep transistors that are used in the 3-to-4 and 4-to-6 data bus systems are 2328µm and 1912µm, respectively, including the sleep transistors used for encoders and decoders as well. The new 4-to-6 bus coding scheme therefore reduces the sleep transistor size by 17.87% compared to the 3-to-4 bus coding scheme. For the data bus with long channel repeaters, the channel widths of the repeaters are increased to achieve lower than 10% speed overhead along the critical signal path compared to the conventional data bus with short channel repeaters. The sizes of the long channel repeaters are (W = 52.8μm, L = 45nm) for PMOS transistors and (W = 26.4μm, L = 45nm) for NMOS transistors.
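The sleep transistor figures can be cross-checked with a short calculation. The sketch assumes that the number of repeater columns is the bus length divided by the optimum segment length, rounded up, and that the total sleep transistor width is the per-column width times the number of columns plus a remainder attributed to the encoders and decoders. The constant and function names are illustrative.

```python
from math import ceil

BUS_LENGTH_UM = 8000        # 8mm data bus
SEGMENT_LENGTH_UM = 1095    # optimum interconnect length between repeaters

# Assumed relation: one repeater column per (possibly partial) segment.
columns = ceil(BUS_LENGTH_UM / SEGMENT_LENGTH_UM)


def codec_sleep_width(total_um, per_column_um):
    """Sleep transistor width left for the encoders and decoders (assumed split)."""
    return total_um - columns * per_column_um


codec_3to4 = codec_sleep_width(total_um=2328, per_column_um=272)
codec_4to6 = codec_sleep_width(total_um=1912, per_column_um=216)
reduction_pct = 100 * (2328 - 1912) / 2328
```

Under these assumptions the bus has eight repeater columns, the repeater sleep transistors account for 2176µm and 1728µm of the two totals, and the overall reduction is about 17.87%.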
The large LVT repeaters in the conventional data bus system with ground shielding consume significant leakage power, as listed in Table V. The leakage power consumption of each data bus system is the average of the values obtained when all the input data are "0" and when all the input data are "1". When repeaters with long-channel transistors (L = 45nm) are used, the leakage power consumption of the data bus system is reduced by 49.01% as compared to the short-channel-repeater-based data bus with ground shielding. Alternatively, due to the utilization of sleep transistors, the leakage power consumption is reduced by ~99% with both the 3-to-4 and 4-to-6 bus coding systems as compared to the conventional ground-shielded data bus system without any gating, regardless of whether short-channel or long-channel repeaters are used in the conventional system. Furthermore, the 4-to-6 bus coding system reduces the leakage power consumption by 12.5% as compared to the 3-to-4 bus coding system due to the smaller sleep transistors.
The operating frequency of data bus systems is assumed to be 1GHz for the simulation of active power consumption. The highest active power is consumed when the number of switching bits is 42 and 32 on the coded bus of the 3-to-4 and 4-to-6 encoded data bus systems, respectively. The number of switching bits along the data bus is reduced by the 4-to-6 bus coding scheme compared with the 3-to-4 bus coding scheme. The 4-to-6 bus coding system thereby achieves 18.26% active power savings as compared to the 3-to-4 bus coding system under the worst-case switching scenarios. The long-channel-repeater-based data bus system with ground shielding consumes the highest worst-case active power among all the data bus systems that are evaluated in this paper primarily due to the larger repeater size. The worst-case active power consumption is increased by 30.74% by the long-channel-repeater-based data bus system with ground shielding as compared to the new 4-to-6 data bus system.
Exhaustive testing of the average active power consumption with the different data bus systems is infeasible. The average power consumption is related to the average switching probabilities of the data buses. By considering the code mapping that is listed in Table IV, the average number of switching events in a 32-bit data bus system can be obtained by exhaustively counting the switching bits between any two codes (after coding). The new 4-to-6 bus coding achieves a 9.24% reduction in the average number of switching events compared to the 3-to-4 coding scheme. Sixty-four groups of random data patterns are applied to the four data bus systems to evaluate the average active power consumption. As listed in Table V, the 4-to-6 bus coding system achieves 6.36% active power savings as compared to the 3-to-4 bus coding system with random data patterns.
With the newly proposed 4-to-6 data bus system, the number of ground shielding wires is reduced significantly as compared to the conventional data bus with ground shielding. The 4-to-6 data bus system therefore reduces the total number of global wires by 12.31% compared to the conventional data bus with ground shielding, thereby avoiding routing congestion in the physical design of highly complicated integrated circuits in advanced technology nodes. The total silicon area can also be reduced with lower routing congestion by the proposed 4-to-6 bus coding scheme.
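The wire counts behind this comparison can be sketched as follows. The shield count of one wire between adjacent groups plus one on each edge of the bus is an assumption inferred from the 13 shielding wires quoted for the 12 groups of the 3-to-4 system; the function name total_wires is illustrative.

```python
def total_wires(data_wires, groups):
    """Data wires plus shields: one between adjacent groups and one on each edge."""
    return data_wires + groups + 1


conventional = total_wires(data_wires=32, groups=32)            # every bit shielded
three_to_four = total_wires(data_wires=10 * 4 + 2, groups=12)   # 42 data wires
four_to_six = total_wires(data_wires=8 * 6, groups=8)           # 48 data wires

saving_pct = 100 * (conventional - four_to_six) / conventional
```

Under this assumption the conventional, 3-to-4, and 4-to-6 systems use 65, 55, and 57 wires, respectively, which reproduces the 12.31% reduction of the 4-to-6 system relative to the conventional ground-shielded bus.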
In order to evaluate the overall performance of the different data bus systems, a comprehensive Electrical Quality Metric (EQM) is evaluated next. EQM is defined in terms of the following quantities. Delay is the critical signal path propagation delay. Pleakage is the leakage power consumption of the different data bus systems. Pworst_active and Paverage_active are the worst-case active power consumption and the average active power consumption, respectively, of the data bus systems. Nwire is the total number of global interconnects on each data bus. The normalized EQM (with respect to the conventional short-channel-repeater-based data bus with ground shielding) is listed in Table V. The new 4-to-6 encoded data bus system achieves the highest EQM. The EQM is enhanced by 232.21x, 141.59x, and 23.06% with the 4-to-6 encoded data bus system as compared to the conventional short-channel-repeater-based data bus with ground shielding, the long-channel-repeater-based data bus with ground shielding, and the 3-to-4 encoded data bus system, respectively.
CONCLUSIONS
A new 4-to-6 crosstalk-noise-aware bus coding scheme with ground-gated repeaters is proposed in this paper for robust low-power data bus systems. The proposed crosstalk-noise-aware bus coding scheme reduces the number of global wires that are used in the system compared to the conventional data bus with ground shielding, avoiding routing congestion in the physical design of highly complicated integrated circuits. The new 4-to-6 bus coding scheme reduces the number of switching data bits in the system, thereby enabling 17.87% miniaturization of sleep transistors, 12.5% leakage power savings, 18.26% worst-case active power savings, and 6.36% average active power savings, as compared to the previously published 3-to-4 bus coding scheme. The 4-to-6 bus coding scheme also enhances the overall electrical quality (EQM) of the data bus system by 23.06% compared to the 3-to-4 bus coding scheme. The newly proposed crosstalk-noise-aware bus coding scheme therefore provides a robust and low-power solution to on-chip data transmission in complex nanoscale CMOS integrated circuits.
