Introduction
Power dissipation is one of the most important design specifications for low-power System-on-a-Chip (SoC) designs. Load capacitance (C_L) and coupling capacitance (C_C) are two major sources of dynamic power dissipation and wire propagation delay. If coupling capacitance increases, dynamic power consumption and wire propagation delay on a bus also increase. The effect of coupling capacitance is data dependent: crosstalk effects increase or decrease according to the relative switching between adjacent bus wires (Vittal and Marek-Sadowska 1997, Macchiarulo et al. 2002, Kaul et al. 2004, Ghoneima et al. 2006, Lyuh and Kim 2006). Thus, the overall performance of the system may be degraded. Consequently, the crosstalk effect is an important factor during the design process.
There are many approaches to reducing inter-wire capacitances (Cong 2001, Elgamel and Bayoumi 2003). The first approach applies bus coding schemes to reduce the dynamic power consumption, mainly by reducing switching activity on a bus. Such coding methods are applied to two different bus modes, i.e., data buses and address buses. Among the previous well-known bus coding techniques (Benini et al. 1997, 2001a, 2001b, Ramprasad et al. 1999, Fornaciari et al. 2000), the INC-XOR (Ramprasad et al. 1999), T0 (Benini et al. 1997), and T0-XOR (Fornaciari et al. 2000) coding schemes are designed for instruction address buses because the instruction address is predictable. The hihrTS (Youngsoo et al. 2001a), bus-invert (BI), and partial bus-invert (PBI) coding schemes are designed for data buses, whose values are generally random. In the method of Micea and Wayne (1995), the number of transmitted transitions does not exceed half of the bus width. When the number of transitions would be more than half of the bus width, the output word is inverted and the control line is set to 'High.' Otherwise, the original data are transmitted and the control line is set to 'Low.' The PBI method is an extension of the BI method that partitions the bus lines into two parts and codes each part by the BI method. It enhances the advantage of the BI method, but it requires redundant control lines. Moreover, crosstalk effects are not considered in Ramprasad et al. (1999), Benini et al. (1997), Fornaciari et al. (2000), Youngsoo et al. (2001a), Micea and Wayne (1995), and Youngsoo et al. (1998, 2001b).
In recent years, system engineers have turned to bus coding methods for crosstalk reduction. These methods are separated into two modes, covering 2-bit and 3-bit conditions. For a 2-bit bus, the methods in Ayoub and Orailoglu (2005), Lyuh and Kim (2006), Huang and Hwang (2004), Kim et al. (2000), Macchiarulo et al. (2001), and Zhan et al. (2002) are applied to encode data buses. Similarly, there are many well-known bus coding technologies (Hirose and Yasuura 2000, Macchiarulo et al. 2002, Tu et al. 2006) applied to 3-bit buses. For example, Hirose and Yasuura (2000) apply repeater technology to reduce total bus delay. Khan et al. (2006) use the alternate complement value and shield wires to reduce crosstalk effects. Macchiarulo et al. (2002) use wire placement to reduce crosstalk effects on address buses.
The second approach widens the pitch between bus lines, but the total area of the bus system may grow too large. The third approach uses place-and-route tools to avoid routing bus lines side by side. In SoC systems, however, the interconnect complexity and routing time make it difficult to minimize coupling capacitances. The fourth approach inserts a shielding line (V_DD/Ground) between two adjacent signal lines (Vittal and Marek-Sadowska 1997, Kaul et al. 2004, Ghoneima et al. 2006). Thus, the fourth approach results in a larger bus width. In SoC systems, the crosstalk effect is an important issue that influences dynamic power dissipation and wire propagation delay (Lajolo 2001, Huang and Hwang 2004). Yu and Lin (2006) develop a comprehensive study on the viability of on-chip bus encoding methods from the perspectives of energy, delay, and peak noise reduction. Lin (2008) derives a theoretical analysis of the BI coding for coupling reduction. Lin's discoveries include a set of closed-form formulas to compute the number of couplings per bus transfer for a non-partitioned bus versus a partitioned bus.
In this article, we mainly aim at the reduction of crosstalk effects and transition activity on a bus. Thus, the bus coding method is applied to reduce the dynamic power consumption and wire propagation delay in signal transitions. Although the coding methods in Lyuh and Kim (2006), Khan et al. (2006), Macchiarulo et al. (2002), Huang and Hwang (2004), Macchiarulo et al. (2001), Duan et al. (2001), and Zhan et al. (2002) focus on the reduction of crosstalk effects, these techniques have many shortcomings. For instance, they must add redundant control lines and hardware area, and they cannot reduce coupling and switching activities at the same time. Therefore, we present two new low-power bus coding methods to enhance previous coding methods. First, our proposed methods are not complex. Second, they simultaneously reduce coupling and switching activities. Third, the proposed methods use a smaller number of redundant shielding lines to obtain the same efficiency as the previous methods. Finally, their coding efficiencies are better than those of the other coding methods. By testing various random data streams, the experimental results show that the first proposed coding scheme reduces coupling activity by 25.7-36.4% and switching activity by 4.5-8.5%. It reduces total power dissipation more than the other bus coding methods when the load capacitance is more than 0.3 pF/bit with UMC 0.09-μm CMOS technology. Moreover, the second proposed coding scheme reduces coupling activity by 28.4-38.4% and switching activity by 10.1-14%. It reduces total power dissipation more than the other bus coding methods when the load capacitance is more than 0.2 pF/bit with UMC 0.09-μm CMOS technology. For a 0.8 pF/bit load capacitance, both proposed methods reduce total power consumption by 19.3-30.9% when systems are implemented with UMC 0.09-μm CMOS technology. Similarly, both proposed methods also reduce total power consumption more than the other bus coding methods with TSMC 0.18-μm CMOS technology. Meanwhile, the crosstalk detector bus invert (CDBI) and the enhanced CDBI (ECDBI) schemes reduce total propagation delay by up to 31.8% and 34.2%, respectively, on a 32-bit data bus.
The rest of this article is organized as follows. In Section 2, we review and define the power expression, the bus models, and the dynamic power consumption of three adjacent bus wires. In Section 3, we present two new bus coding techniques and describe their advantages; the proposed methods greatly decrease dynamic power dissipation and wire propagation delay. The simulation and implementation results and comparisons of different bus coding methods are shown in Section 4. Finally, we state a conclusion.
C.-H. Fang and C.-P. Fan
Power expression and bus models for deep submicron buses
After the bus coding technique is applied to encode the bus data, the dynamic power consumption on a 3-bit bus with the encoded data is calculated as follows:
where C_L is the load capacitance, C_C the coupling capacitance, V_DD the supply voltage, f the clock frequency, and the remaining parameter is a capacitance ratio, which is defined as follows.
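For orientation, one form of Equation (1) that is consistent with the symbol definitions in this section is sketched here. It is an assumption rather than a verified copy of the published equation; in particular, the symbol λ for the capacitance ratio and its definition as C_C / C_L are assumed, chosen so that a larger ratio gives coupling activity more weight, as the later discussion of Equation (1) requires.

```latex
P_{D,\mathrm{coded}}
  = \left( T_S\, C_L + T_C\, C_C \right) V_{DD}^{2}\, f
  = \left( T_S + \lambda\, T_C \right) C_L\, V_{DD}^{2}\, f ,
\qquad
\lambda = \frac{C_C}{C_L}
```

Under this form, T_S weights the load-capacitance term and T_C weights the coupling-capacitance term, matching the roles given to the two activities in the text.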
The value of the capacitance ratio depends on the technology. For example, the interconnect width, the pitch, the aspect ratio, and the dielectric thickness all affect the value. In Equation (1), T_S denotes the average switching activity for the load capacitance, and T_C the average coupling activity for the coupling capacitance. The T_S value lies between 0 and 1, and the un-coded T_S value equals 1. Similarly, the T_C value lies between 0 and 1, and the un-coded T_C value equals 1. The coupling activity describes the relative transitions among three adjacent bit lines. The signal transitions on 3-bit lines are classified into five types, as shown in Table 1. We apply Equation (2) as follows (Hirose and Yasuura 2000):
where ΔV_2 is the voltage variation of the center wire, ΔV_1 and ΔV_3 the voltage variations of the adjacent wires, E the power supply voltage, equaling the rail-to-rail signal voltage in CMOS circuits, and C_eff the coupling capacitance variation. For example, the Type-1 crosstalk capacitance equals C_C, the Type-2 crosstalk capacitance equals 2C_C, etc. By the above-mentioned definition, the dynamic power consumption on un-coded buses is defined as follows:
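As a hedged sketch, the following forms of Equations (2) and (3) agree with the definitions in the text: they give C_C for Type-1 and 2C_C for Type-2 crosstalk, and they use the un-coded activities T_S = T_C = 1. The exact published expressions are not reproduced here, and the symbol λ = C_C / C_L is an assumed name for the capacitance ratio.

```latex
C_{\mathrm{eff}}
  = C_C \cdot \frac{\lvert \Delta V_2 - \Delta V_1 \rvert
                   + \lvert \Delta V_2 - \Delta V_3 \rvert}{E}
\qquad (2)

P_{D,\mathrm{un\text{-}coded}}
  = \left( C_L + C_C \right) V_{DD}^{2}\, f
  = \left( 1 + \lambda \right) C_L\, V_{DD}^{2}\, f
\qquad (3)
```

With this form of Equation (2), a center wire rising by E while both neighbors fall by E gives C_eff = 4C_C, and three wires switching in the same direction give C_eff = 0.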
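Equation (4) gives the power reduction ratio as follows.
$$\text{Power reduction ratio} = \frac{P_{D,\text{un-coded}} - P_{D,\text{coded}}}{P_{D,\text{un-coded}}} \times 100\% \qquad (4)$$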
where P_D,un-coded in Equation (3) is the dynamic power consumption on a bus with un-coded data, and P_D,coded in Equation (1) is the dynamic power consumption on a bus with encoded data. Following some of the literature (Vittal and Marek-Sadowska 1997, Duan et al. 2001, Kaul et al. 2004, Ayoub and Orailoglu 2005, Ghoneima et al. 2006, Lyuh and Kim 2006, Tu et al. 2006), we explain the crosstalk characteristics of an interconnecting bus by the simple case of three adjacent wires shown in Figure 1. We only consider coupling and load capacitances, and neglect parasitic capacitances. The dynamic power dissipation generated by the load capacitance is proportional to the number of signal transitions of the bus lines. On the other hand, the dynamic power dissipation generated by the coupling capacitance depends on the signal transitions between coupled interconnects. There are five kinds of crosstalk effects, shown in Table 1. We use Equation (2) to enumerate all kinds of serious crosstalk effects. Type-1 crosstalk produces less serious crosstalk effects than the other types, and Type-4 crosstalk produces the worst crosstalk effects of all types. Thus, we use the Type-1 crosstalk effect as a standard to define the other crosstalk effects. Serious crosstalk effects include Type-2, Type-3, and Type-4 crosstalk.
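The classification and the power reduction ratio described in this section can be illustrated with a minimal Python sketch. It rests on assumptions: the crosstalk type is taken as the multiple of C_C given by the sum of the voltage differences between the center wire and each neighbor (normalized by E), the coded power is assumed proportional to T_S + λ·T_C, and the un-coded power to 1 + λ. The names `crosstalk_type`, `power_reduction_ratio`, and `lam` are hypothetical and do not come from the article.

```python
def crosstalk_type(dv1: int, dv2: int, dv3: int) -> int:
    """Return k such that the effective coupling capacitance is k * C_C.

    Each argument is a wire's voltage change normalized by the supply
    voltage E: -1 (falling), 0 (no change) or +1 (rising). dv2 is the
    center wire. This normalization is an assumption of the sketch.
    """
    for dv in (dv1, dv2, dv3):
        if dv not in (-1, 0, 1):
            raise ValueError("normalized voltage change must be -1, 0 or 1")
    return abs(dv2 - dv1) + abs(dv2 - dv3)


def power_reduction_ratio(t_s: float, t_c: float, lam: float) -> float:
    """Percentage reduction of dynamic power relative to the un-coded bus.

    Assumes coded power ~ (t_s + lam * t_c) and un-coded power ~ (1 + lam),
    with the common factor C_L * V_DD^2 * f cancelling out.
    """
    coded = t_s + lam * t_c
    uncoded = 1.0 + lam
    return (uncoded - coded) / uncoded * 100.0


if __name__ == "__main__":
    # Center wire rises while both neighbors fall: the worst case.
    print("worst case type:", crosstalk_type(-1, 1, -1))
    # Activities after a 7.8% switching and 25.9% coupling reduction, lam = 3.
    print("reduction (%):", power_reduction_ratio(0.922, 0.741, 3.0))
```

Because the technology-dependent factor cancels in the ratio, only the two activities and the capacitance ratio are needed for this estimate.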
Proposed low power bus coding methods
Section 2 described the different kinds of crosstalk effects. In this section, we present two new low-complexity bus coding techniques specially devised for three adjacent wires. The proposed methods not only improve system reliability but also reduce total power consumption and total propagation delay.
CDBI method
We now propose the first crosstalk detector coding method, called the CDBI method. It targets crosstalk effects and reduces dynamic power dissipation and propagation delay by reducing coupling and switching activities. The n-bit bus is separated into clusters of 4-bit data. Each 4-bit cluster is encoded by the CDBI method according to the relationship between the original data and the last encoded data (i.e., b(t) and B(t − 1)), and an extra control bit (i.e., the INV line) is generated so that the data can be decoded uniquely. Figure 2 shows the 4-bit bus encoder circuits. The encoder circuits are composed of a crosstalk detector, a selector, and a NOT gate. First, the crosstalk detector judges whether the b(t) value (i.e., the bus value to be sent on the bus at time t) will cause crosstalk effects relative to the previous bus state B(t − 1) (i.e., the encoded bus value sent on the bus lines at time t − 1). Second, the crosstalk detector generates a control signal, OC, and the INV line is set to the OC signal. Then, the encoder circuit uses the OC signal to select the encoded value. Therefore, the B(t) value does not produce the worst crosstalk effects. The internal circuits of the crosstalk detector are composed of the basic logic gates shown in Figure 3(a). We combine the functions of the selector circuit and the NOT gate to obtain the optimum circuits shown in Figure 3(b). Thus, the CDBI method has low implementation complexity, and it uses the crosstalk detector for a low-power design and the minimization of the worst crosstalk effects.
The algorithm of the CDBI method is described as follows. The crosstalk detector first receives the b(t) and B(t − 1) values and compares them with each other. The purpose of the crosstalk detector is to observe whether adjacent wires generate the transition 01 → 10 or 10 → 01. If such a transition happens, a three-input AND gate outputs the 'High' signal because the adjacent wires produce crosstalk effects. The crosstalk detector can be grouped into three contrast circuits. Each contrast circuit detects crosstalk effects between wires 1 and 2, between wires 2 and 3, or between wires 3 and 4. After the three three-input AND gates are operated, the contrast circuits produce 3-bit outputs, and these outputs pass through a three-input OR gate to produce the OC signal, which is exported from the crosstalk detector. The purpose of the three-input OR gate is to determine whether any two adjacent wires cause the transition 01 → 10 or 10 → 01. If the output of the three-input OR gate is 'High,' then the OC signal is also set to 'High.' Therefore, the B(t) value (i.e., the encoded bus value to be sent on the bus lines at time t) takes the b̄(t) value (the bitwise complement of b(t)) and INV is set to 'High' when the OC signal is 'High'; otherwise, the B(t) value takes the b(t) value and INV is set to 'Low' when the OC signal is 'Low.' The proposed CDBI algorithm is shown as follows:
if (OC == 1) B(t) = b̄(t), INV = 1; else B(t) = b(t), INV = 0.
Each 4-bit encoded datum needs an additional control bit. Therefore, after the bus encoding is used, the n-bit bus is extended to n + n/4 bits, i.e., the n-bit bus encoding has a wire redundancy of 25%. The CDBI method generates two complementary values (b(t) and its bitwise complement b̄(t)), and the OC signal chooses between them: the OC signal is set to 1 if the b(t) value produces crosstalk effects; otherwise, it is set to 0. We observe that one of the two values greatly reduces Type-4, Type-3, and Type-2 crosstalk relative to the B(t − 1) value. In other words, the CDBI method turns Type-2, Type-3, and Type-4 crosstalk into Type-0 and Type-1 crosstalk, which need not be considered.
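The CDBI encoding of one 4-bit cluster can be sketched in Python as follows. This is a behavioral sketch under stated assumptions, not the gate-level circuit of Figures 2 and 3: each contrast circuit is assumed to raise its output when both wires of an adjacent pair toggle and their new values differ (which is equivalent to a 01 → 10 or 10 → 01 transition), clusters are held as integers with bit i standing for one wire, and the function names are hypothetical.

```python
WIDTH = 4                    # cluster size used by the CDBI method
MASK = (1 << WIDTH) - 1      # keeps values inside one 4-bit cluster


def crosstalk_detected(prev_encoded: int, data: int, width: int = WIDTH) -> int:
    """Return OC: 1 if any adjacent wire pair would switch 01->10 or 10->01."""
    toggled = prev_encoded ^ data     # bit i is 1 when wire i changes value
    differ = data ^ (data >> 1)       # bit i is 1 when new wires i, i+1 differ
    # One bit per adjacent pair (i, i+1): both wires toggle in opposite directions.
    pair_hits = toggled & (toggled >> 1) & differ
    pair_mask = (1 << (width - 1)) - 1  # only width-1 adjacent pairs exist
    return 1 if pair_hits & pair_mask else 0


def cdbi_encode(prev_encoded: int, data: int) -> tuple[int, int]:
    """Encode one cluster; returns (B(t), INV) from B(t-1) and b(t)."""
    oc = crosstalk_detected(prev_encoded, data)
    if oc:
        return (~data) & MASK, 1      # send the bitwise complement
    return data, 0                    # send the data unchanged


if __name__ == "__main__":
    # 0101 -> 1010 makes every adjacent pair switch in opposite directions.
    print(cdbi_encode(0b0101, 0b1010))
    # 0000 -> 1111 toggles every wire, but all in the same direction.
    print(cdbi_encode(0b0000, 0b1111))
```

In the first call the complemented word equals the previous bus state, so no data wire switches at all; only the INV line may change.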
ECDBI method
The CDBI method generates a slight amount of Type-4 crosstalk and greatly reduces Type-2 and Type-3 crosstalk in 4-bit clusters. However, it reduces switching activity only slightly. For this reason, we propose a second bus coding method, called the ECDBI coding method, to improve the coding efficiency of the CDBI method. It aims at further reductions of crosstalk effects and switching activity. The architecture of the ECDBI method is shown in Figure 4. Unlike the CDBI method, it includes a switching detector and a two-input OR gate. The internal circuits of the joint crosstalk/switching detector are composed of the basic logic gates shown in Figure 5. The upper part performs the major function of the crosstalk detector, and the lower part the major function of the switching detector. The purpose of the switching detector is to reduce switching activity. The purpose of the two-input OR gate is to judge whether the b(t) value produces crosstalk effects or its number of transitions exceeds half of the bus width. For this reason, the ECDBI method greatly reduces not only crosstalk effects but also switching activity. Thus, it can further minimize dynamic power dissipation.
Journal of the Chinese Institute of Engineers
The algorithm of the ECDBI method is explained in detail as follows. First, the b(t) value is transmitted to the crosstalk detector, and at the same time it is also transmitted to the switching detector. The crosstalk detector calculates the Hamming distance value between the b(t) and B(t − 1) values. It detects whether the B(t − 1) and b(t) values produce crosstalk effects. If such a transition happens, the OC signal is set to 'High,' the B(t) value takes the b̄(t) value (the bitwise complement of b(t)), and the INV signal is set to 'High'; otherwise, the OC signal is set to 'Low,' the B(t) value takes the b(t) value, and the INV signal is set to 'Low.' The proposed ECDBI algorithm is shown as follows:
if (OC == 1 or EBW == 1) B(t) = b̄(t), INV = 1; else B(t) = b(t), INV = 0,
where the EBW signal is obtained from the output of the switching detector. The crosstalk detector in the ECDBI method serves the same purpose as that in the CDBI method. The switching detector calculates the Hamming distance between the b(t) and B(t − 1) values and judges whether it exceeds half of the bus width. The EBW signal is set to 1 if the number of transmitted transitions exceeds half of the bus width; the B(t) value then takes the b̄(t) value (the bitwise complement of b(t)), and the INV signal is set to 'High.' Otherwise, the EBW signal is set to 0, the B(t) value takes the b(t) value, and the INV signal is set to 'Low.' The circuit of the switching detector in the ECDBI method is the same as that in the PBI method. Therefore, the ECDBI method possesses the advantages of the CDBI and PBI methods simultaneously. Except for the switching detector circuit, the other functions of the ECDBI method are the same as those of the CDBI method, so we omit their discussion.
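A behavioral Python sketch of the ECDBI encoder for one 4-bit cluster is given below. It assumes that the two-input OR gate combines the crosstalk detector output OC with the switching detector output EBW, that EBW is raised when the Hamming distance between b(t) and B(t − 1) over the four data wires is greater than two, and that the INV line itself is not counted in that distance. These details and all function names are assumptions of the sketch rather than statements about the circuits in Figures 4 and 5.

```python
WIDTH = 4                    # cluster size shared with the CDBI method
MASK = (1 << WIDTH) - 1


def crosstalk_detected(prev_encoded: int, data: int) -> int:
    """OC: 1 if any adjacent wire pair would switch 01->10 or 10->01."""
    toggled = prev_encoded ^ data
    differ = data ^ (data >> 1)
    pair_hits = toggled & (toggled >> 1) & differ
    return 1 if pair_hits & ((1 << (WIDTH - 1)) - 1) else 0


def exceeds_half_width(prev_encoded: int, data: int) -> int:
    """EBW: 1 if more than half of the cluster wires would toggle."""
    hamming = bin((prev_encoded ^ data) & MASK).count("1")
    return 1 if hamming > WIDTH // 2 else 0


def ecdbi_encode(prev_encoded: int, data: int) -> tuple[int, int]:
    """Encode one cluster; returns (B(t), INV) from B(t-1) and b(t)."""
    inv = crosstalk_detected(prev_encoded, data) | exceeds_half_width(prev_encoded, data)
    if inv:
        return (~data) & MASK, 1      # send the bitwise complement
    return data, 0                    # send the data unchanged


if __name__ == "__main__":
    # Four same-direction toggles: no opposite transitions, but EBW fires.
    print(ecdbi_encode(0b0000, 0b1111))
    # Two toggles in the same direction: neither detector fires.
    print(ecdbi_encode(0b0000, 0b0011))
```

The first call shows the case that separates the two methods: a pure crosstalk detector would send 1111 unchanged, while the added switching detector inverts it so that no data wire toggles.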
The bus coding technology must provide both bus encoding and decoding functions; otherwise, it cannot be used for system applications. Figure 6 shows the 4-bit bus decoder circuits. The decoder circuit is realized with low complexity. The internal circuits of the decoder consist of the selector and a NOT gate. The decoder receives the B(t) value, its bitwise complement B̄(t), and the INV line. When the B(t) value is transmitted to the decoder, the decoder uses the INV signal to decode the original bus data, b(t). If the INV line is set to 'High,' the b(t) value equals the B̄(t) value. Otherwise, when the INV line is set to 'Low,' the b(t) value equals the B(t) value. The CDBI method uses the same decoding circuit as the ECDBI method. Furthermore, we simplify the decoding circuit to obtain the optimum circuit, which is the same as the one given in Figure 3(b). The proposed decoding scheme is described as follows: if (INV == 1) b(t) = B̄(t) else b(t) = B(t).
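The decoding rule can be sketched in a few lines of Python. The sketch assumes 4-bit clusters stored as integers and an INV flag of 0 or 1 per cluster; the function names and the list-of-pairs bus representation are hypothetical choices made for illustration.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def decode_cluster(encoded: int, inv: int) -> int:
    """Recover b(t) from B(t) and INV: complement the word when INV is 1."""
    if inv == 1:
        return (~encoded) & MASK
    return encoded & MASK


def decode_bus(clusters: list[tuple[int, int]]) -> list[int]:
    """Decode every (B(t), INV) pair of a bus split into 4-bit clusters."""
    return [decode_cluster(encoded, inv) for encoded, inv in clusters]


if __name__ == "__main__":
    # An 8-bit bus as two clusters: the first was inverted by the encoder.
    print(decode_bus([(0b0101, 1), (0b0011, 0)]))
```

Decoding needs only the current word and its INV bit, so the decoder keeps no state from previous transfers.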
Figure 7(a) shows an 8-bit un-coded bus, which does not apply any shielding wires or coding technology. The un-coded bus generates many severe crosstalk patterns. Directly increasing the wire width and spacing of an un-coded bus has some drawbacks. According to Weste and Harris (2005), Cong (2001), and Elgamel and Bayoumi (2003), many factors affect load and coupling capacitances; both are slightly increased when the bus width triples. Load capacitance is slightly increased and coupling capacitance is slightly reduced when the bus spacing doubles. Although the bus width and spacing are increased, these procedures still produce the worst crosstalk patterns, of at least 2C_C.
Thus, shielding wires and coding technologies are used to reduce the worst crosstalk pattern. The coding methods apply the INV line to reduce the worst crosstalk pattern within 4-bit clusters. However, when the bus is divided into n/4 clusters and expanded spatially to carry redundant bits, the worst crosstalk pattern may still happen in the inter-cluster region or between redundant bits and clusters. Therefore, some coding methods use shielding wires. Figure 7(b) shows an 8-bit bus using the method in Khan et al. (2006), which simultaneously uses coding technology and three redundant shielding lines to reduce the worst crosstalk pattern. The 8-bit bus width is extended to a 13-bit bus width, i.e., 8 + 2 + 3 = 13. In Figure 7(c), the proposed methods use only one redundant shielding line to reduce the worst crosstalk pattern between inter-cluster regions. It is noted that the interconnect routing scheme in Figure 7(c) is applied to all compared coding methods in this article for equitable comparisons. We find that a 32-bit bus can be encoded with 47 wires by our proposed methods, compared with 55 wires as mentioned by existing papers (Khan et al. 2006, Lyuh and Kim 2006). For this reason, our proposed coding methods use a smaller number of redundant shielding lines to obtain the same efficiency as the previous methods.
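The wire counts quoted above can be reproduced with a short Python sketch. The counting rules are inferred from the stated totals and are assumptions: the proposed layout is taken to use one INV line per 4-bit cluster plus one shielding line between neighboring clusters, and the layout of Figure 7(b) is taken to use one control line per cluster plus 2·(n/4) − 1 shielding lines, which matches the 13-wire and 55-wire figures. The function names are hypothetical.

```python
def proposed_wire_count(n_bits: int, cluster: int = 4) -> int:
    """Data wires + one INV line per cluster + one shield between clusters."""
    if n_bits % cluster != 0:
        raise ValueError("bus width must be a multiple of the cluster size")
    clusters = n_bits // cluster
    return n_bits + clusters + (clusters - 1)


def shielded_reference_wire_count(n_bits: int, cluster: int = 4) -> int:
    """Data wires + one control line per cluster + assumed 2*clusters - 1 shields."""
    if n_bits % cluster != 0:
        raise ValueError("bus width must be a multiple of the cluster size")
    clusters = n_bits // cluster
    return n_bits + clusters + (2 * clusters - 1)


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, proposed_wire_count(width), shielded_reference_wire_count(width))
```

Under these rules the difference between the two layouts is exactly n/4 wires, i.e., one shielding line saved per cluster.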
Experimental results and comparisons
In this section, we evaluate and compare the performance of the bus coding methods in terms of switching activity, coupling activity, area, power consumption, and the total propagation delay of the codec. To simulate different data bus coding techniques, we adopt 8-bit, 16-bit, and 32-bit data bus lines and compare their dynamic power dissipation and wire propagation delay. The test streams include random data, image files, PPT files, MP3 files, and PDF files.
Power reduction results and comparisons
In the simulation with random patterns, the BI method achieves good self-switching reduction because it is suitable for data buses. It mainly uses an extra control line to minimize switching activity, which greatly reduces the dynamic power dissipation. It also slightly reduces coupling activity. According to Ayoub and Orailoglu (2003), coupling activity is more serious than switching activity. For reducing dynamic power dissipation, the PBI method is better than the BI coding method because it uses smaller bus segments to increase coding efficiency. However, it needs many extra lines. For instance, the PBI method on an 8-bit bus calls for two extra control lines. The simple-odd/even bus invert (OEBI) method (Zhan et al. 2002) is equivalent to the PBI method. Therefore, it has the same efficiency and number of bus lines as the PBI method. The calculated-OEBI method (Zhan et al. 2002) uses complex circuits to reduce the worst crosstalk pattern. Although it can greatly reduce coupling activity and slightly reduce switching activity at the same time, it increases hardware cost considerably. The method in Duan et al. (2001) uses a codebook to reduce the worst crosstalk pattern. Although it can eliminate Type-4 and Type-3 crosstalk, it greatly increases Type-0, Type-1, and Type-2 crosstalk. Besides, it also results in a [...] and our proposed methods use the same bus lines. In other words, the original data bus is changed from 8-bit to 11-bit width, from 16-bit to 23-bit width, or from 32-bit to 47-bit width. We calculate the worst crosstalk couplings and switching activities from the data on the buses at the same time. The different bus coding efficiencies are shown in Figures 8 and 9. Figure 8 shows the reduction of coupling and switching activities for 8-bit image data, and Figure 9 shows the reduction of coupling and switching activities for 8-bit random data.
In Figure 8, the PBI method reduces more switching activity than the other coding methods do. In reducing Type-2 crosstalk, the method in Duan et al. (2001) is worse than the other methods. In reducing Type-3 crosstalk, the methods in Khan et al. (2006), Duan et al. (2001), Zhan et al. (2002), and our proposed methods are better than the other coding methods. In reducing Type-4 crosstalk, the methods in Micea and Wayne (1995), Kim et al. (2000), and Zhan et al. (2002) are worse than the other methods. In producing Type-1 crosstalk, the method in Khan et al. (2006) and our proposed methods are better than the other methods. Although Type-1 crosstalk is increased, it has less deleterious effects than the other coupling modes. In producing Type-0 crosstalk, the proposed ECDBI method and the methods in Youngsoo et al. (1998, 2001b), Duan et al. (2001), and Zhan et al. (2002) are better than the other methods. Although Type-0 crosstalk is greatly increased, its occurrence does not cause dynamic power dissipation. Type-0 and Type-1 crosstalk were discussed in Section 2. In Figure 9, the results for all coding methods are similar to those in Figure 8.
In Figure 8, the proposed CDBI method reduces Type-2 crosstalk by 59%, Type-3 by 73.8%, and Type-4 by 95%, but Type-1 and Type-0 are increased by 31% and 13.5%, respectively. The reduction of T_S is 7.8% and that of T_C is 25.9%. Furthermore, the proposed ECDBI method reduces Type-2 crosstalk by 69.5%, Type-3 by 72.1%, and Type-4 by 95.2%, but Type-1 and Type-0 are increased by 28.9% and 51.4%, respectively. The reduction of T_S is 14% and that of T_C is 28.5%. Thus, the proposed methods efficiently reduce coupling and switching activities at the same time. By testing various random data streams, the experimental results show that the proposed CDBI method reduces coupling activity by 25.7-36.4% and switching activity by 4.5-8.5%. Moreover, the ECDBI method reduces coupling activity by 28.4-38.4% and switching activity by 10.1-14%.
The comparisons in the 16-bit and 32-bit bus modes show the same coding efficiencies as the 8-bit statistics in Figures 8 and 9, so they are omitted.
Table 2 shows the comparison of dynamic power parameters in the coupling and switching activities with various bit widths for each benchmark. The sizes of the test files, i.e., random data, image files, PPT files, MP3 files, and PDF files, are 4.78, 1.63, 8.1, 2.78, and 0.63 MB, respectively. To compare our methods with the un-coded data bus, we simulate the five test files to calculate the averages of coupling activity and switching activity, together with the capacitance ratio parameter. The parameter is set to 10, which is in agreement with previous papers (Vittal and Marek-Sadowska 1997, Hirose and Yasuura 2000, Duan et al. 2001, Victor and Keutzer 2001, Ayoub and Orailoglu 2005, Ghoneima et al. 2006, Lyuh and Kim 2006). In this article, the parameter ranges from 1 to 5. By Equation (1), the dynamic power dissipation from coupling activities is larger than that from switching activities when the parameter becomes larger. The method in Micea and Wayne (1995) is an exception because it mainly aims at switching activities. The proposed CDBI method and the methods in Khan et al. (2006), Micea and Wayne (1995), Youngsoo et al. (1998, 2001b), Kim et al. (2000), Duan et al. (2001), and Zhan et al. (2002) are worse than the methods in Youngsoo et al. (1998, 2001b) when the parameter is 1. The coding efficiency of the proposed CDBI method is equivalent to that of the method in Khan et al. (2006): the difference in coding efficiency between the two methods is less than 1%.
The CDBI method is better than the methods in Micea and Wayne (1995), Youngsoo et al. (1998, 2001b), Kim et al. (2000), Duan et al. (2001), and Zhan et al. (2002) when the parameter exceeds 2. Thus, it reduces coupling and switching activities by 20.3% to 34.1% in comparison with the un-coded method.
In reducing coupling and switching activities, the proposed ECDBI method achieves better coding efficiency than the other methods with various bit widths for each benchmark. It reduces coupling and switching activities by 23.8% to 36.4%, compared with the un-coded method. Moreover, it reduces coupling and switching activities by 1.1% to 20.9%, compared with the other coding methods, and it also improves the efficiency of the CDBI method by 2.3% to 3.5%, as shown in Table 2.
Every data bus coding method may incur some overheads, including area and power. In Table 3, we estimate the actual overheads of different bus encoder and decoder circuits with various bit widths. We use Verilog HDL to model and simulate the bus encoder and decoder circuits, and we also simplify all coding hardware to obtain the optimum circuits. Then, we use the Design Vision™ logic synthesizer with TSMC 0.18-μm and UMC 0.09-μm CMOS standard cell technologies to evaluate the power consumption and the area of the bus codec circuits. The operating frequency of the bus codecs is set to 125 MHz, and the operating voltages are 1.8 and 1 V for the 0.18-μm and 0.09-μm technologies, respectively. According to Table 3, the chip area of the calculated-OEBI circuit is larger than that of the other coding methods. Thus, it causes the most serious hardware overheads. The proposed CDBI method has the same capability as the method in Khan et al. (2006), but the hardware cost of the CDBI method is less than that of the method in Khan et al. (2006). The area of the ECDBI codec is a little larger than that of the CDBI codec. These coding methods with various bit widths are selected for benchmarks, and their power reduction efficiencies for both coupling and switching activities are shown in Figure 10, with the capacitance ratio parameter set to three for estimations of dynamic power dissipation. We use Equation (3) as the un-coded base power to evaluate the percentage of power reduction efficiency with various bit widths for each benchmark, and we also use Equations (1), (3), and (4) to calculate the reduction of power dissipation. The proposed CDBI method reduces dynamic power consumption more than the methods in Micea and Wayne (1995), Kim et al. (2000), and Zhan et al. (2002) do, but it is not better than the methods in Khan et al. (2006), Youngsoo et al. (1998, 2001b), and Zhan et al. (2002).
It reduces dynamic power consumption by 21% to 29.2% with various bit widths for each benchmark. The proposed ECDBI method reduces dynamic power consumption more than the other methods in each benchmark, by 24.6% to 32.2% with various bit widths. Thus, the ECDBI method reduces dynamic power by 1.5-19.4% when compared with the other coding methods. The power consumption of the ECDBI codec is also a little larger than that of the CDBI codec.
The power consumption at the bus encoder is defined as P_enc, and that at the bus decoder is defined as P_dec. To estimate the total power dissipation, the additional power dissipation on the bus encoder and decoder sides must be considered. In Equation (1), P_D,coded is generated by the reduced switching and coupling activities after the bus coding methods are used. For various widths, Figure 11 compares the total power ratio (including P_enc, P_D,coded, and P_dec) of the different bus coding methods versus the load capacitance per bit line with TSMC 0.18-μm CMOS technology. When the 0.18-μm technology is applied, the CBI, BI, PBI, simple-OEBI, calculated-OEBI, Khan, and Duan coding methods reduce total power consumption by 6.4-8.8%, 11.1-11.2%, 17.5-27.1%, 15.8-25.7%, 11.5-14.7%, 17-25.5%, and 15.3-25.4% with various bit widths at a 0.8 pF/bit load capacitance, respectively. The proposed CDBI and ECDBI methods reduce total power consumption by 17.5-26.6% and 20.4-29.1% with various bit widths at a 0.8 pF/bit load capacitance, respectively. Figure 12 shows that all coding methods require less power when the technology is scaled down. When the 0.09-μm technology is applied, the power consumption of the bus encoder and decoder shown in Table 3 [...] 9-10.3%, 13.1-13.3%, 19.1-28.7%, 17.4-27.3%, 12.8-21.9%, 18.2-27.7%, and 16.9-26.2% with various bit widths at a 0.8 pF/bit load capacitance, respectively. The proposed CDBI and ECDBI methods reduce total power consumption by 19.3-28.5% and 22.2-30.9% with various bit widths at a 0.8 pF/bit load capacitance, respectively.
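Why a coding method only pays off above a certain load capacitance can be illustrated with a hedged Python sketch of the total power ratio. It assumes the per-line dynamic power has the form (T_S + λ·T_C)·C_L·V_DD²·f, that λ stays fixed while C_L varies, and that the codec power P_enc + P_dec does not depend on the load. The codec power values in the example are hypothetical placeholders, not numbers from Table 3, and all names are illustrative.

```python
def bus_dynamic_power(t_s, t_c, lam, c_load, v_dd, freq, n_lines):
    """Bus dynamic power in watts under the assumed (T_S + lam*T_C) model."""
    return n_lines * (t_s + lam * t_c) * c_load * v_dd ** 2 * freq


def total_power_ratio(t_s, t_c, lam, c_load, v_dd, freq, n_lines, p_enc, p_dec):
    """(P_enc + P_D,coded + P_dec) / P_D,un-coded; below 1 means a net saving."""
    coded = bus_dynamic_power(t_s, t_c, lam, c_load, v_dd, freq, n_lines)
    uncoded = bus_dynamic_power(1.0, 1.0, lam, c_load, v_dd, freq, n_lines)
    return (p_enc + coded + p_dec) / uncoded


def break_even_load(t_s, t_c, lam, v_dd, freq, n_lines, p_enc, p_dec):
    """Load capacitance per line (farads) at which the ratio equals 1."""
    saved_per_farad = n_lines * ((1.0 + lam) - (t_s + lam * t_c)) * v_dd ** 2 * freq
    return (p_enc + p_dec) / saved_per_farad


if __name__ == "__main__":
    # Hypothetical codec power of 30 uW (encoder) and 20 uW (decoder).
    args = dict(t_s=0.922, t_c=0.741, lam=3.0, v_dd=1.0, freq=125e6,
                n_lines=8, p_enc=30e-6, p_dec=20e-6)
    c0 = break_even_load(**args)
    print("break-even load (F):", c0)
    print("ratio at 0.8 pF:", total_power_ratio(c_load=0.8e-12, **args))
```

Below the break-even load the fixed codec power outweighs the bus saving, and above it the saving grows with C_L, which is the qualitative behavior described for Figures 11 and 12.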
Delay reduction results and comparisons
Section 4.1 discusses the improvement of power reduction by diminishing the worst crosstalk effects.
Since the proposed bus codec is inserted, the performance impact includes both the wire propagation delay reduction obtained by diminishing the worst crosstalk effects and the extra circuit delays generated by the encoder/decoder. According to Elgamel and Bayoumi (2003), the crosstalk effect is noticeable only in sub-micron technology (i.e., 0.25 μm or below), so we simulate the wire propagation delay and the delay of the bus codec with TSMC 0.18-μm CMOS technology using HSPICE. In our simulations, the interconnect wires are modeled with the RC circuit model, and the gates are modeled with the MOSIS MOS circuit model. We suppose that all drivers and receivers have a uniform size, and that all signal wires and routes have uniform length, width, and spacing. In the 0.18-μm technology, the length, width, thickness, and spacing of the un-coded signal wire are 1300, 0.99, 0.53, and 1.37 μm, respectively, whereas those of the coded signal wire are 1300, 0.72, 0.53, and 0.96 μm, respectively. We suppose that synchronous latches are located at the transmitter side, so all signals on the buses switch at the same time. A 32-bit bus is adopted as the example for analysis. Table 4 shows the propagation delay of the wire and of the different bus codecs. The wire propagation delay is reduced by exploiting bus coding methods. The methods in Kim et al. (2000), Micea and Wayne (1995), Youngsoo et al. (1998, 2001b), Zhan et al. (2002), Khan et al. (2006), and Duan et al. (2001), and the proposed CDBI and ECDBI codecs reduce wire propagation delays by 24.3%, 29.7%, 51.4%, 42.2%, 59%, 48.7%, 59.5%, and 62.2%, respectively. The total propagation delay is the sum of the wire propagation delay without the worst crosstalk patterns, the encoder delay, and the decoder delay. The total propagation delay of the proposed methods is smaller than that of the other coding methods. The delay reduction ratio is evaluated by comparing the total propagation delay of each bus coding method with the un-coded value. The proposed CDBI and ECDBI methods reduce total propagation delays by 31.8% and 34.2%, respectively. The proposed CDBI and ECDBI methods are therefore more efficient in reducing the total propagation delay than the methods in Kim et al. (2000), Micea and Wayne (1995), Youngsoo et al. (1998, 2001b), Zhan et al. (2002), Khan et al. (2006), and Duan et al. (2001).
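The delay bookkeeping described above can be sketched as follows. This is a minimal sketch, assuming that the un-coded reference is the un-coded wire propagation delay alone (no codec is present in that case). The function names are hypothetical, and the numbers in the example are placeholders chosen for easy arithmetic, not values from Table 4.

```python
def total_propagation_delay(wire_delay, encoder_delay, decoder_delay):
    """Total delay of a coded bus: coded wire delay plus codec delays (same time unit)."""
    return wire_delay + encoder_delay + decoder_delay


def delay_reduction_percent(coded_total_delay, uncoded_wire_delay):
    """Reduction of the total delay relative to the un-coded bus, in percent.

    ASSUMPTION: the un-coded bus has no encoder/decoder, so its total delay
    equals its wire propagation delay.
    """
    if uncoded_wire_delay <= 0:
        raise ValueError("un-coded wire delay must be positive")
    return 100.0 * (uncoded_wire_delay - coded_total_delay) / uncoded_wire_delay


if __name__ == "__main__":
    # Placeholder values in picoseconds, for illustration only.
    coded_total = total_propagation_delay(400.0, 150.0, 150.0)
    print(f"{delay_reduction_percent(coded_total, 1000.0):.1f}% total delay reduction")
```

The sketch also shows why the total delay reduction is smaller than the wire delay reduction: the encoder and decoder delays are added back on top of the shortened wire delay.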
Conclusion
Two novel bus coding methods are proposed to reduce dynamic power dissipation and wire propagation delay on buses efficiently. The proposed methods reduce dynamic power dissipation and wire propagation delay more than previous bus coding schemes do. The proposed CDBI coding method has the same coding efficiency as the method of Khan et al. (2006), but the chip area of the CDBI method is smaller. The experimental results show that the CDBI coding method reduces coupling activity by 25.7% to 36.4% and switching activity by 4.5% to 8.5% on 8-bit to 32-bit data buses. It reduces total power dissipation more than the other bus coding methods when the load capacitance is more than 0.3 pF/bit with UMC 0.09-μm CMOS technology. Furthermore, the proposed ECDBI coding method reduces coupling activity by 28.4% to 38.4% and switching activity by 10.1% to 14% on 8-bit to 32-bit data buses. It reduces total power dissipation more than the other bus coding methods when the load capacitance is more than 0.2 pF/bit with UMC 0.09-μm CMOS technology. For a 0.8 pF/bit load capacitance, the two proposed methods reduce total power use by 19.3-30.9% when systems are implemented with UMC 0.09-μm CMOS technology. Similarly, both proposed methods reduce total power dissipation more than the other bus coding methods with TSMC 0.18-μm CMOS technology. Meanwhile, the CDBI and ECDBI schemes reduce total propagation delay by up to 31.8% and 34.2%, respectively, on 32-bit data buses.
Nomenclature
b(t): the original data
b̄(t): the complement value of b(t)
B(t−1): the encoded bus value to be sent on bus lines at time t−1
B(t): the encoded bus value to be sent on bus lines at time t
B̄(t): the complement value of B(t)
C_C: the coupling capacitance
C_L: the load capacitance
C_eff: the coupling capacitance variation
E: the power supply voltage, equaling the rail-to-rail signal voltage in CMOS circuits
f: clock frequency
INV: an extra control bit
P_D,un-coded: the dynamic power consumption on a bus with un-coded data
P_D,coded: the dynamic power consumption on a bus with encoded data
P_enc: the power consumption at the bus encoder
P_dec: the power consumption at the bus decoder
T_S: the average values of switching activity for load capacitance
T_C: the average values of coupling activity for coupling capacitance
V_DD: the supplying voltage
the capacitance ratio
ΔV_1: the voltage variations of the adjacent wires
ΔV_2: the voltage variation of the center wire
ΔV_3: the voltage variations of the adjacent wires
