Abstract-A new bus-invert coding circuit, called Two-bit Bus-Invert Coding (TBIC), is presented. TBIC partitions a bus into a set of two-bit sub-buses and applies the bus-invert (BI) algorithm to each sub-bus. Unlike ordinary BI circuits, TBIC does not use an invert-line; it sends the coding information through a bus-line instead. To transmit 3 bits of information over 2 bus-lines, TBIC allows one bus-line to take a mid-level state, called the M-state. TBIC improves the performance of the BI algorithm by suppressing the generation of overhead transitions. TBIC reduces bus transitions by about 45.7%, which is 83% greater than the maximum achievable performance of ordinary BI with invert-lines.
I. INTRODUCTION
Low-power design is one of the hottest issues in VLSI design, especially for chips in mobile devices. Although the low-power design of circuits and functional modules is essential to reduce the power consumption of VLSI chips, the design of low-power buses is no less important, because a substantial part of the total power is dissipated by the buses. Efforts in low-power bus design aim either to decrease the dynamic power per activation or to reduce the number of bus activations.
Bus-Invert (BI) [1] coding is one of the well-known techniques that reduce the number of bus-transitions. BI is a simple coding that reduces the transitions of bus-lines with the following algorithm: if sending a datum would activate more than half of the bus-lines, BI transmits its complement. The coding information, called the inv-bit, indicates the inversion state of the transmitted data and is transmitted simultaneously with the datum.
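The BI rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit of any cited work: the function names are hypothetical, and the code assumes that the Hamming distance also counts the invert-line, which would have to return to 0 if the datum were sent uninverted.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bi_encode(data: int, prev_bus: int, prev_inv: int, width: int):
    """Return (bus_value, inv_bit) for one transfer on a `width`-bit bus.

    Assumption: the distance includes the invert-line, which would have
    to toggle back to 0 if the datum were sent uninverted.
    """
    mask = (1 << width) - 1
    distance = hamming(data & mask, prev_bus) + prev_inv
    if 2 * distance > width:          # more than half of the lines would toggle
        return (~data) & mask, 1      # send the complement and raise inv
    return data & mask, 0             # send the datum as is


def bi_decode(bus: int, inv: int, width: int) -> int:
    """Recover the datum from the bus value and the inv-bit."""
    mask = (1 << width) - 1
    return (~bus) & mask if inv else bus & mask
```

For an 8-bit bus that currently holds 0 with the invert-line at 0, sending 0b11111011 would toggle seven lines, so the sketch sends the complement 0b00000100 with inv=1, which toggles one bus-line and the invert-line.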
Because of its simplicity and usability, many enhanced BI algorithms [2] [3] [4] [5] [6] [7] have been developed. Almost all of these variations use an auxiliary line, called the invert-line, to send the inv-bit. However, the invert-line has two major drawbacks. The first is the increase of bus area due to the additional line. The second is the performance degradation due to the transitions of the invert-line itself.
Sending the inv activates either the invert-lines or some bus-lines. These activations are the overhead that BI coding pays to reduce the transitions of the bus-lines. However, this overhead transition (OT) is the major factor limiting the performance of BI circuits. It is known that the invert-line generates many OTs, so that it significantly degrades the performance of BI.
To remove the invert-lines from the implementation of the BI circuit, the Selectively Activated Flip-Driver (SAFD) [8, 9] sends the inv through the bus by using a special bus-driver called a flip-driver. SAFD increases the performance by effectively suppressing the generation of OTs, so that it can reduce bus transitions by 35%.
According to a theoretical analysis [9, 10], BIC can reduce bus transitions by up to 50% for independent data if no OT is generated. The ordinary BIC with invert-lines, however, can reduce transitions by a maximum of 25% because of the OTs generated by the invert-lines.
A new BI implementation scheme, called Two-bit Bus-Invert Coding (TBIC), is presented in this paper. TBIC divides an n-bit bus into n/2 sub-buses of width 2, and BI coding is applied independently to each sub-bus. TBIC transmits the inv through a bus-line to avoid increasing the bus-width. Furthermore, it is devised to generate as few OTs as possible.
II. TWO-BIT BUS-INVERT CODING WITH A MID-LEVEL STATE BUS-LINE
According to the theoretical performance analysis [9], the performance of BIC generally decreases as the bus-width increases. Therefore, a wide bus needs to be partitioned into several narrower sub-buses. For the transmission of independent data, the maximum achievable reduction ratio of BIC is 50%, obtained when a bus is partitioned into a set of two-bit sub-buses, on the condition that the coding circuit does not generate OTs [9]. When a bus is partitioned into two-bit sub-buses, the ordinary BIC with invert-lines also reaches its maximum performance. However, it requires a 50% increase of bus-width and achieves a reduction ratio of only 25%, which is half of the maximum achievable value.
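The 50% and 25% figures for two-bit sub-buses can be checked by exhaustive enumeration over uniformly distributed, independent data. The sketch below is an illustration under stated assumptions, not the analysis of [9]: the function name is hypothetical, the ideal case assumes the inv costs no transition, and the invert-line case assumes that the line itself counts toward the Hamming distance.

```python
from fractions import Fraction
from itertools import product


def two_bit_averages():
    """Average transitions per transfer on a 2-bit sub-bus, uniform data.

    Returns (raw, ideal, data_lines, invert_line) as exact fractions.
    """
    raw = ideal = data_lines = invert_line = Fraction(0)
    # Every combination of current bus value, invert-line value, next datum
    cases = list(product(range(4), range(2), range(4)))
    for bus, inv, datum in cases:
        d = bin(bus ^ datum).count("1")       # data lines that would toggle
        raw += d                              # uncoded transfer
        ideal += 0 if d == 2 else d           # ideal BI: inv costs nothing
        if d + inv > 1:                       # invert: inv-line ends at 1
            data_lines += 2 - d
            invert_line += 1 - inv
        else:                                 # no inversion: inv-line ends at 0
            data_lines += d
            invert_line += inv
    n = len(cases)
    return raw / n, ideal / n, data_lines / n, invert_line / n
```

Under these assumptions the averages are 1, 1/2, 1/2 and 1/4 transitions per transfer: the ideal case halves the transitions, while the invert-line adds back a quarter of the raw count, leaving a net reduction of 25%.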
The Two-Bit Bus-Invert Coding (TBIC) scheme with a mid-level state bus-line is developed to increase the reduction ratio as much as possible and to get rid of the problems of the invert-line. To maximize performance, TBIC partitions a bus into a set of two-bit sub-buses and applies the TBIC algorithm independently to each sub-bus. To avoid increasing the bus-width, TBIC does not use invert-lines. Instead, it uses one of the bus-lines to send the inv-bit. In addition, TBIC is intended to minimize the overhead transitions of the bus-lines when sending the coding information. For this purpose, TBIC introduces a mid-level state on the line carrying the inv-bit. With these two bus-lines, TBIC can send and receive three bits of information, i.e., two data bits and one bit of coding information (inv), as follows.
Encoder Circuits
Because only two bits of a datum are involved in TBIC, the decision logic is simple: inverted transmission occurs only when both bits differ from the current values of the corresponding bus-lines. The inversion information, inv, should be sent through the two bus-lines in the same cycle. Eight different states are required to transmit three bits, but only six stable states are possible with an N-line (two levels) and an M-line (three levels, including the M-state). To get eight different states, TBIC uses the history (sequence) of the two bus-lines. Fig. 2(a) shows the encoder circuit of TBIC. It has a 3-bit register, and the entries of the register are named R0, R1, and RM. R0 and R1 store the values to be transmitted through the bus-lines B0 and B1 (let B0 be the M-line, and B1 be the N-line). RM controls the transmission value of the M-line. If RM=1, the M-line goes to the M-state; otherwise, the value in R0 is transmitted through the M-line. Table 1 shows the truth table for inv and the next values of the registers. The truth table is designed to minimize the transitions of the bus-lines. According to the table, no more than one bus-line changes in any transmission cycle. The inv, and the next values of the register are determined by the following logic functions:
Fig. 2(b) shows a simple example circuit for the bus-driver. The M-line is driven by a normal bus-driver when RM=0, while it is driven by the mid-level generator circuit when RM=1.
Decoder Circuits
Fig. 3(a) shows the decoder circuit for TBIC. The decoder also has a 3-bit register, of which the entries are R0, R1, and RM. Initially, RM is reset to 0, and the values of the M-line (B0) and the N-line (B1) are stored in R0 and R1 of the register, respectively. The inversion state of the received data is decided by the values of the bus-lines and the register.
At first, the level detector checks the voltage level of M-line to determine the value of M and B0. If the M-line is at mid-level, M is set to 1, and B0 is set to the value of R0. Otherwise, M is reset to 0, and B0 becomes the value of the M-line.
The inversion state of the received bits is determined by the result of the level detector and the previous bus states stored in the register. If M=0, inv=0. When M=1, inv=1 if either RM=0 or B1=R1. The logic function for inv is given by
$\mathit{inv} = M \cdot \left(\overline{RM} + \overline{B1 \oplus R1}\right)$
The results of the level detector and the value of the N-line are stored in the register for decoding in the next cycle. Fig. 3(b) is an example of the mid-level detector circuit corresponding to the mid-level generator in Fig. 2(b). The switching threshold of the first inverter on the upper path is set lower than the mid-level voltage, while that on the lower path is set higher than the mid-level voltage. Therefore, the outputs of the upper path and the lower path differ from each other when the M-line is at the mid-level voltage, so that M becomes 1. If the M-line is in the 1 or 0 state, both outputs have the same value, so that M becomes 0. When M is 1, the value stored in R0 is used as the received bit.
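The level-detector rule and the inv decision stated above can be written as a small behavioural sketch. It models only those two steps; the register update and the recovery of the data bits are not modelled. The function names and the use of the string "M" for the mid-level state are assumptions made for illustration.

```python
def level_detect(m_line, r0):
    """Level detector: return (M, B0) from the M-line level and stored R0.

    `m_line` is 0, 1, or the string "M" for the mid-level state
    (this encoding is an assumption of the sketch).
    """
    if m_line == "M":
        return 1, r0          # mid-level: M=1 and B0 takes the stored R0
    return 0, m_line          # normal level: M=0 and B0 is the line value


def decode_inv(m, rm, b1, r1):
    """inv is 0 when M=0; when M=1, inv is 1 if RM=0 or B1 equals R1."""
    return int(m == 1 and (rm == 0 or b1 == r1))
```

For example, a mid-level M-line with RM=0 in the register gives inv=1 regardless of the N-line, whereas with RM=1 the inversion is signalled only if the N-line has kept its stored value.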
The circuits in Figs. 2(b) and 3(b) are just example circuits to show the operation of the mid-level generator and the level detector. For high-speed applications, a faster mid-level generator and/or a more sensitive level detector is required. As will be described in the next section, the exact voltage level of the M-state is not critical in operation, although it affects the power consumption of overhead transitions. Using this property, various pairs of a mid-level generator and a level detector are possible. The mid-level generator circuit and the corresponding level-detector circuit in the decoder should be designed together, considering speed, power, size, etc.
III. DYNAMIC BUS POWER WITH TBIC
Because of the existence of M-state transitions, counting the number of overhead transitions of TBIC is more complicated than for the ordinary BIC. For convenience, let us define a bus-transition as a transition of a bus-line required to transmit data, and an overhead transition (OT) as a transition required to transmit coding information. Since the inv is transmitted through the M-line, all transitions of the N-line are bus-transitions. The M-line can move between 0 and 1, between 0 and M, and between 1 and M. All transitions between 0 and 1 are bus-transitions, but it is not clear whether a transition between M and one of the normal states (0 and 1) is a bus-transition or an OT.
To distinguish OTs from bus-transitions, let us compare the waveforms of the bus-lines in TBIC and in the ideal BIC. Assume that no OTs happen in the ideal BIC, so that all transitions of the ideal BIC are bus-transitions. Fig. 4 shows the waveforms of the bus-lines for the ideal BIC circuit and TBIC. As we can see in Fig. 4, the M-line shows quite a different waveform from B0 of the ideal BIC, while the N-line and B1 have the same waveform.
For convenience, the transitions of the M-line are classified into three patterns: direct transition (DT), via-transition (VT), and round transition (RT). A DT is a transition between 0 and 1. Both VT and RT are composed of two transitions involving the M-state. The first transition of a VT or an RT is a transition from a normal state (0 or 1) to M. If the second transition, from M, goes back to the state it originated from, the pattern is classified as an RT. If the second transition goes to the other normal state, it is classified as a VT. For example, the combined transitions a and b in Fig. 4 form a VT, while m and n form an RT.
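The classification above can be illustrated with a short counting routine over a sequence of M-line levels. This is a sketch with a hypothetical function name; it assumes one level per cycle, encodes the mid-level state as the string "M", treats a level held over several cycles as no transition, and assumes the sequence starts at a normal level.

```python
def classify_m_line(levels):
    """Count DT, VT and RT patterns in a sequence of M-line levels.

    `levels` holds one entry per cycle: 0, 1, or "M".
    Assumption: the sequence starts at a normal level (0 or 1).
    """
    counts = {"DT": 0, "VT": 0, "RT": 0}
    origin = None          # normal level the line left when it entered M
    prev = levels[0]
    for cur in levels[1:]:
        if cur == prev:
            continue                         # level held: no transition
        if prev == "M":                      # second half of a VT or an RT
            counts["RT" if cur == origin else "VT"] += 1
        elif cur == "M":                     # first half: remember the origin
            origin = prev
        else:                                # direct 0 <-> 1 transition
            counts["DT"] += 1
        prev = cur
    return counts
```

For the level sequence 0, M, 1, 0, M, M, 0, 1 the routine finds one VT (0→M→1), one RT (0→M→0) and two DTs.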
Every DT has a matching transition in B0 of the ideal BIC. A VT also has a matching transition in B0 of the ideal BIC, at the position of its second transition. The dynamic power dissipated by a VT is the same as the power dissipated by a DT. For example, the combined power consumed by a and b is the same as the power dissipated in a DT. Note that the dynamic power for a VT is independent of the voltage level ($V_M$) of the M-state. The DTs and VTs of the M-line can be regarded as bus-transitions because they require the same amount of bus power as the corresponding transitions in the ideal BIC circuit.
As can be seen in Fig. 4, however, there are no matching transitions in B0 of the ideal BIC for an RT. Therefore, RTs are the OTs of TBIC. Let us denote a 0→M→0 transition by RT-0 and a 1→M→1 transition by RT-1, and let $\Delta V_{ML} = V_M$ and $\Delta V_{MH} = V_{DD} - V_M$ be the voltage swings between the M-state and the 0 and 1 states, respectively.
The dynamic bus power for an RT-0 such as m and n in Fig. 4 is the sum of the power of its two transitions, $P_{RT\text{-}0} = P_m + P_n$, and the power for an RT-1 is obtained similarly from its two transitions.
The power for an RT depends on $V_M$. If 1→M→1 transitions and 0→M→0 transitions occur at equal rates, the average power of an RT is minimum when $V_M = V_{DD}/2$, in which case $\Delta V_{MH} = \Delta V_{ML} = V_{DD}/2$.
Under this condition, the power dissipated in an RT is 1/2 of that of a DT, which means that the effective number of OTs is half of the number of RTs. This helps to increase the performance of TBIC. Fig. 5 shows the simulation result of the M-line driver circuit. The simulation is performed by HSPICE with IBM's "1.2V-0.13µm 8RF-LM" model parameters [11]. The mid-level voltage is in the range of 0.5~0.7V. Although this variation may slightly affect the power dissipation of OTs, it is not a serious problem in operation.
IV. EXPERIMENTS
Simulations are performed to estimate the performance of TBIC and to measure the amount of OTs occurring in TBIC. With multimedia VLSI chips in mind, audio and video files are used in the experiments. The experiments are carried out with 9 different files. Three of them are random binary files generated by a random number generator. Another three are music files in the MP3 format: "For Elise", "Under the Sea", and "The Cup of Life". The remaining three are movie trailers in the MOV format: "Kung Fu Panda 2", "Transformers 3", and "Water for Elephants".
The simulations count the number of transitions of bus-lines during the transfer of each file through 16-bit, 32-bit, 64-bit and 128-bit buses. The performance of TBIC is compared to that of the ordinary partitioned BI (PBI) scheme, which uses invert-lines. Two PBIs are used: the first (PBI-1) is partitioned into 8-bit sub-buses, a choice usually made to get moderate performance with a small increase of bus-width, and the second (PBI-2) is partitioned into 2-bit sub-buses, which provides the maximum performance.
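A transition-counting simulation of this kind can be sketched for the two PBI schemes as follows. The sketch is not the simulator used in the experiments: it uses seeded pseudo-random 16-bit words in place of the files, the function names are hypothetical, and it assumes that each invert-line counts toward the Hamming distance of its sub-bus.

```python
import random


def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def simulate_pbi(words, bus_width, sub_width):
    """Count raw, data-line and invert-line transitions for partitioned BI."""
    n_sub = bus_width // sub_width
    mask = (1 << sub_width) - 1
    prev_raw = 0
    bus = [0] * n_sub          # current value on each sub-bus
    inv = [0] * n_sub          # current value on each invert-line
    n_raw = n_bus = n_ot = 0
    for word in words:
        n_raw += popcount(word ^ prev_raw)      # uncoded transfer
        prev_raw = word
        for i in range(n_sub):
            datum = (word >> (i * sub_width)) & mask
            d = popcount(datum ^ bus[i])
            if 2 * (d + inv[i]) > sub_width:    # invert this sub-bus
                new_bus, new_inv = datum ^ mask, 1
            else:
                new_bus, new_inv = datum, 0
            n_bus += popcount(new_bus ^ bus[i])  # data-line transitions
            n_ot += new_inv ^ inv[i]             # invert-line transitions
            bus[i], inv[i] = new_bus, new_inv
    return n_raw, n_bus, n_ot


rng = random.Random(0)
words = [rng.getrandbits(16) for _ in range(20000)]
results = {w: simulate_pbi(words, 16, w) for w in (8, 2)}
reduction = {w: 100 * (1 - (b + ot) / raw) for w, (raw, b, ot) in results.items()}
```

With random data, `reduction[8]` and `reduction[2]` should land near the roughly 18% and 25% levels discussed for PBI-1 and PBI-2, although the exact values depend on the random sequence.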
For these three BIC schemes, the number of transitions of bus-lines and the number of overhead transitions are obtained by simulation. Through the simulation, the following numbers are obtained: $N_{RAW}$, the number of transitions when the data are sent without coding; $N_B$, the number of bus-transitions; $N_{OT}$, the number of overhead transitions; and $N_T$, the total number of transitions. The simulation results are shown in Table 2. Averaged values are used to simplify the table; the values in the bin, mp3, and mov columns are the averages over the three files of the random binary, MP3, and MOV formats, respectively. $P_B$, $P_{OT}$, and $P_T$ in Table 2 are the percentages of $N_B$, $N_{OT}$, and $N_T$ against $N_{RAW}$, respectively. The reduction ratio $R$ ($= 100 - P_T$) represents the percentage reduction of $N_{RAW}$ by the applied BIC scheme, so that it can be used as the performance measure of the scheme.
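The percentages reported in Table 2 follow directly from the transition counts. A minimal sketch, with a hypothetical function name and assuming that the total count is the sum of bus-transitions and overhead transitions:

```python
def bic_metrics(n_raw: float, n_b: float, n_ot: float):
    """Return (P_B, P_OT, P_T, R) in percent of N_RAW.

    Assumption: N_T = N_B + N_OT, so P_T = P_B + P_OT.
    """
    p_b = 100.0 * n_b / n_raw      # bus-transitions
    p_ot = 100.0 * n_ot / n_raw    # overhead transitions
    p_t = p_b + p_ot               # total transitions
    return p_b, p_ot, p_t, 100.0 - p_t
```

For instance, 1000 raw transitions reduced to 500 bus-transitions with 250 overhead transitions give a reduction ratio of 25%.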
As we can see in Table 2, there is no significant performance difference among the three data formats. PBI-1 reduces transitions by around 18%, while PBI-2 reduces transitions by around 25%. As expected, the performance of PBI-1 is poorer than that of PBI-2, since partitioning into narrower sub-buses provides a higher reduction ratio. PBI-2 reduces bus transitions by 25% at the expense of a 50% increase of bus-width. The reduction ratio of TBIC is about 45.7%, which is about 2.5 times that of PBI-1, and 83% greater than that of PBI-2.
The discrepancy in the reduction ratio comes from the difference in $P_{OT}$. Note that the $P_B$ of TBIC and PBI-2 are almost the same: both schemes reduce bus-transitions by about 50%. However, the $P_{OT}$ of PBI-2 is about 25% of $N_{RAW}$, which is about half the number of the reduced transitions. Therefore, in PBI-2, about 50% of the performance is lost to the transitions of the invert-lines.
The number of RTs in TBIC is about 8.3% of $N_{RAW}$, and the effective number of OTs is about 4.2% when $V_M = V_{DD}/2$. Theoretically, the maximum reduction ratio achievable by the BI algorithm is 50%. The reduction ratio of TBIC is about 45.7%, which is only 4.3 percentage points below the theoretical maximum reduction ratio.
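As a rough consistency check of the figures quoted in this section, the following arithmetic uses only the rounded values given in the text, so small differences from the reported numbers are due to rounding:

```latex
\begin{align*}
P_{OT}^{\mathrm{eff}} &\approx \tfrac{1}{2} \times 8.3\% \approx 4.2\% \\
R_{\mathrm{TBIC}} &= 100\% - \left(P_B + P_{OT}^{\mathrm{eff}}\right)
  \approx 100\% - (50\% + 4.2\%) \approx 45.8\% \\
\frac{R_{\mathrm{TBIC}}}{R_{\mathrm{PBI\text{-}2}}} &\approx \frac{45.7}{25} \approx 1.83 \\
\frac{P_{OT}^{\mathrm{eff}}}{P_{OT,\mathrm{PBI\text{-}2}}} &\approx \frac{4.2}{25} \approx 0.17
\end{align*}
```

The third line corresponds to the statement that TBIC is 83% better than PBI-2, and the last line to the effective OT count of TBIC being about 17% of that of the invert-lines.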
This shows that an implementation that prevents the generation of OTs is very important for improving the performance of the algorithm.
V. CONCLUSIONS
A new bus-invert coding circuit called TBIC (Two-bit Bus-Invert Coding with a mid-level state bus-line) is presented in this paper. TBIC aims to remove the problems of the invert-line and to approach the maximum performance of the BI algorithm. To avoid increasing the bus-width, TBIC removes the invert-line by transmitting the coding information through a bus-line. To send 3 bits over two bus-lines in a cycle, a mid-level state called the M-state is added to the normal 1 and 0 states. The encoding and decoding logic of TBIC is developed based on the transitions among the 0, 1, and M states. The simulation results show that TBIC can reduce bus-transitions by 45.7%. This reduction ratio is 83% greater than the maximum reduction ratio achievable by ordinary BIC with invert-lines. The large performance improvement of TBIC comes from the effective suppression of OTs. The number of OTs generated in TBIC is only about 17% of the OTs generated by invert-lines.
