Abstract: Dynamic power dissipation on I/O buses is an important issue for high-speed communication between chips. Coding techniques can reduce the number of transitions on the bus, which in turn reduces the dynamic power. Bus-invert coding is one popular technique for interchip buses, where the dominant contribution is from the self-capacitance of the wires. This algorithm uses an invert line to signal whether the bus data are in their original or an inverted form. While the method appears to be a greedy algorithm, we show that it is, in fact, an optimal strategy. To do so, we first represent the bus and invert line using a trellis diagram. Then, we show that applying bus-invert coding to a sequence of words gives the same result as would be obtained by using the Viterbi algorithm, which is known to be optimal. We also show that partitioning an $N$-bit bus into $P$ subbuses and using bus-invert coding on each subbus can be described as applying the Viterbi algorithm on a $2^P$-state trellis.
Optimality of Bus-Invert Coding

I. INTRODUCTION
THE dynamic power dissipation due to wire-to-ground coupling of a multiwire bus is given by the following equation:

$$P = \alpha \, C_L \, V_{DD}^{2} \, f \tag{1}$$

where $C_L$ is the load capacitance of a single bus wire, $V_{DD}$ is the power supply voltage, $f$ is the transition frequency, and $\alpha$ is the activity factor [1]-[8]. Assuming that $V_{DD}$ has been decreased to its optimal value and $C_L$ has been optimized through layout, (1) indicates that further reductions in power dissipation are available by applying coding, which lowers the total amount of switching activity on the bus. Moreover, reducing the number of transitions on buses can also help to minimize noise on the bus, as shown in [4].
The general idea of coding for low power is to transmit the data bits together with a small number of additional control bits over a correspondingly wider bus, in such a way as to reduce the Hamming distance between successively transmitted words. It should be noted that low-power coding is not equivalent to error control coding, although there exist some relationships between these two sets of problems [10]. Significant prior contributions, including [1]-[7], can be divided into two application areas, i.e., minimizing power on data buses or on address buses. The bus models used also vary, with a wire-to-ground capacitance model adopted in [1]-[5], and a wire-to-wire capacitance model used in [6] and [7]. A transition signaling scheme is incorporated in [3] together with low-power coding, whereas the others use level signaling. The redundancy introduced as a result of the coding can be placed in space (as additional bit lines) or in time (as extra transfer cycles), as shown in [2]. Bus-invert coding can be profitably applied to interchip data buses that are modeled as wire-to-ground capacitances and where wire-to-wire capacitances are assumed to be a less important factor. The idea is to send either the original or the inverted form of the next data word, making that decision by calculating the Hamming distance between two consecutive words on the bus. One additional signal line is needed for the receiver to be able to correctly interpret the bus values. With this scheme, a 50% reduction in peak power is guaranteed and up to a 25% reduction in average power can be achieved [1].
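The encoding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [1]: the function names, the all-zero noninverted starting bus value, and the choice to count the invert line as one extra wire when comparing costs are all assumptions made for the example.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width, initial=0):
    """Greedy bus-invert encoding (illustrative sketch).

    Returns (encoded, total), where encoded is a list of
    (bus_value, invert_bit) pairs and total is the number of wire
    transitions, with the invert line counted as one extra wire.
    """
    mask = (1 << width) - 1
    prev_bus, prev_inv = initial & mask, 0  # assumed start: noninverted
    encoded, total = [], 0
    for w in words:
        w &= mask
        # Cost of sending w as-is versus inverted, invert line included.
        cost_plain = hamming(prev_bus, w) + int(prev_inv != 0)
        cost_inv = hamming(prev_bus, w ^ mask) + int(prev_inv != 1)
        if cost_inv < cost_plain:
            bus, inv, cost = w ^ mask, 1, cost_inv
        else:
            bus, inv, cost = w, 0, cost_plain
        encoded.append((bus, inv))
        total += cost
        prev_bus, prev_inv = bus, inv
    return encoded, total


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [bus ^ mask if inv else bus for bus, inv in encoded]
```

For instance, encoding the 8-bit words `0x00, 0xFF, 0x0F` this way costs 5 transitions in total, whereas sending them unencoded would cost 12 transitions on the data lines.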
The authors in [1] state that bus-invert coding is optimal in the sense that given the same redundancy (one extra bus line), no other coding can achieve a larger reduction in the number of transitions between adjacent data words. However, the following question still remains: is there an optimal method for an entire sequence of transitions, i.e., one that considers the total number of transitions required to transmit an entire sequence of specific data words? We will show that bus-invert coding is, in fact, optimal in this sense.
The remainder of the paper is organized as follows. In Section II, the importance of a trellis diagram and the applicability of the Viterbi algorithm are explained. In Section III, we map bus-invert coding onto the trellis diagram and find the minimum weight path through the trellis (which translates to the minimum number of transitions) using the Viterbi algorithm. Section IV shows that partitioning an $N$-bit bus into $P$ subbuses and using bus-invert coding on each subbus can be described as applying the Viterbi algorithm on a $2^P$-state trellis diagram. Finally, in Section V, the paper is summarized and our conclusions are given.
II. TRELLIS DIAGRAM AND VITERBI ALGORITHM
The trellis diagram is a representation of a finite-state machine (FSM) that shows all possible state transitions over time [9]. The diagram contains states and branches (i.e., connections between pairs of states) that correspond to a particular state transition in a particular clock cycle. Each transition is triggered by an input and produces an output at a given time index $t$. A sequence of branches through the trellis diagram from a beginning state to an end state is called a path, and an input sequence corresponds to a unique path. A branch metric is associated with each branch, and one can calculate the path metric, which is the sum of the branch metrics along that path.

1549-7747/$25.00 © 2008 IEEE
The Viterbi algorithm is a computationally efficient algorithm that relies on the special structure of the trellis to achieve a complexity that grows only linearly with the length of the input sequence [9]. For every state at each time $t$, the Viterbi algorithm calculates the shortest path that leads to that state and eliminates all other paths as being suboptimal. This is accomplished by calculating a state metric $S_j(t)$ for each state $j$ at time $t$ that is defined as the accumulated distance along the minimum path leading into that state. The state metrics at time $t$ can be computed in terms of the state metrics at time $t-1$ via the equation

$$S_j(t) = \min_{i} \bigl[ S_i(t-1) + b_{i,j}(t) \bigr] \tag{2}$$

where $i$ is a predecessor state of $j$ and $b_{i,j}(t)$ is the branch metric for the transition from state $i$ to $j$. The next phase of the algorithm involves tracing back to determine the optimal path, which is called the survivor path for that state. Trellis diagrams and the Viterbi algorithm have been widely used in communications applications such as channel coding and source coding, as well as in many other areas [9].
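The state-metric recursion (2) and the traceback phase can be sketched as follows. This is a generic illustration under stated assumptions: the trellis is taken to be fully connected, and `viterbi_min_path` and the `branch_metric(t, i, j)` callback are hypothetical names introduced here, not notation from [9].

```python
def viterbi_min_path(num_states, num_stages, branch_metric, start_state=0):
    """Minimum-weight path through a fully connected trellis (sketch).

    branch_metric(t, i, j) is the cost of moving from state i at time
    t-1 to state j at time t, for t = 1..num_stages.
    Returns (best_metric, states), where states[t-1] is the state
    occupied at time t on the minimum-weight path.
    """
    inf = float("inf")
    metric = [inf] * num_states
    metric[start_state] = 0
    back = []  # back[t-1][j] = predecessor kept for state j at time t
    for t in range(1, num_stages + 1):
        new_metric, pred = [], []
        for j in range(num_states):
            # Recursion (2): keep only the cheapest way to reach state j.
            best_i = min(range(num_states),
                         key=lambda i: metric[i] + branch_metric(t, i, j))
            pred.append(best_i)
            new_metric.append(metric[best_i] + branch_metric(t, best_i, j))
        metric = new_metric
        back.append(pred)
    # Trace back the survivor path from the best final state.
    state = min(range(num_states), key=lambda j: metric[j])
    best = metric[state]
    path = []
    for pred in reversed(back):
        path.append(state)
        state = pred[state]
    path.reverse()
    return best, path
```

Each stage examines every pair of states once, so the work per stage is fixed by the trellis and the total work scales with the number of stages.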
III. OPTIMALITY OF BUS-INVERT CODING
Consider an $(N+1)$-bit bus, where the additional line is the signal bit (Sb) that indicates whether the information is in its original or an inverted form. Here, we define information as the original $N$-bit data before appending the signal bit, while a codeword is defined as the $(N+1)$-bit encoded data. The codeword is constructed by appending either 0 to the information (i.e., noninverted codeword) or 1 to the inverted information (i.e., inverted codeword). Sb plays the role of the invert bit used in bus-invert coding. We construct a trellis diagram where each top node corresponds to a noninverted codeword state and each bottom node corresponds to an inverted codeword state at time $t$, where $1 \le t \le T$. The first node corresponds to the initial state in which the codeword is initialized as the noninverted codeword. Each input corresponds to the Sb value at time $t$ and each output corresponds to the $N+1$ bits of either the noninverted or the inverted codeword at time $t$. For each stage of the trellis, label each of the branches with a weight equal to the Hamming distance between the codewords of the two nodes that the branch connects. To obtain the minimum number of transitions on the data bus over $T$ time slots, we apply the Viterbi algorithm as described in Section II.
An example is presented to illustrate this procedure. An 8-bit data bus over 16 time slots is shown in Fig. 1(a). The data are a uniformly distributed random sequence of values, and there are 64 transitions during the 16 time slots. Fig. 1(b) shows the same sequence of data as in Fig. 1(a) coded using the bus-invert method. In the latter case, there are only 53 transitions during the 16 time slots. Fig. 2 shows the mapping of the data sequence of Fig. 1(a) onto the trellis diagram. It is a two-state trellis with 16 stages. Each codeword vector lists the bits B0 through B7 followed by Sb, and at every time step the inverted codeword vector is the bitwise complement of the noninverted one. The branch metric values for the first node to the noninverted state and for the first node to the inverted state are 4 and 5, respectively. Therefore, the state metrics after the first stage for the noninverted and inverted states are also 4 and 5, respectively. After the final stage, the state metrics for the noninverted and inverted states are 54 and 53, respectively, which directly translates to the number of transitions made along each survivor path. Therefore, to minimize the total number of transitions over all of the time slots, the survivor path that leads to 53 transitions is selected. Fig. 2 shows the survivor path that leads to this optimal solution over the 16 time slots. Tracing back the survivor path through noninverted and inverted states to the starting node gives the sequence of codewords that must be transmitted over the 16 time slots in order to achieve this minimum value. Fig. 3 shows the codewords obtained after tracing back the survivor path.
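The comparison made in this example can be reproduced in outline with the sketch below, which runs both greedy bus-invert coding and the Viterbi algorithm on a two-state trellis. It is illustrative only: the data are freshly generated random words rather than the sequence of Fig. 1(a), Sb is stored as the least significant bit of each codeword, the bus is assumed to start at the all-zero noninverted codeword, and all function names are hypothetical.

```python
import random


def hamming(a, b):
    return bin(a ^ b).count("1")


def codewords(word, width):
    """Return the (width+1)-bit (noninverted, inverted) codewords."""
    mask = (1 << width) - 1
    plain = (word & mask) << 1                    # data bits, then Sb = 0
    inverted = plain ^ ((1 << (width + 1)) - 1)   # complement, so Sb = 1
    return plain, inverted


def greedy_bus_invert(words, width):
    """Bus-invert coding: pick the closer codeword at every step."""
    prev, bits, total = 0, [], 0
    for w in words:
        cands = codewords(w, width)
        costs = [hamming(prev, c) for c in cands]
        sb = 1 if costs[1] < costs[0] else 0
        bits.append(sb)
        total += costs[sb]
        prev = cands[sb]
    return bits, total


def viterbi_bus_invert(words, width):
    """Two-state trellis: state 0 = noninverted, state 1 = inverted."""
    prev_cands = (0, None)          # only the noninverted start node exists
    metric = [0, float("inf")]
    back = []
    for w in words:
        cands = codewords(w, width)
        new_metric, pred = [], []
        for j in (0, 1):
            options = [(metric[i] + hamming(prev_cands[i], cands[j]), i)
                       for i in (0, 1) if prev_cands[i] is not None]
            m, i = min(options)
            new_metric.append(m)
            pred.append(i)
        metric, prev_cands = new_metric, cands
        back.append(pred)
    state = 0 if metric[0] <= metric[1] else 1
    total = metric[state]
    bits = []
    for pred in reversed(back):     # trace back the survivor path
        bits.append(state)
        state = pred[state]
    bits.reverse()
    return bits, total


random.seed(0)
data = [random.randrange(256) for _ in range(16)]
print(greedy_bus_invert(data, 8) == viterbi_bus_invert(data, 8))
```

With an 8-bit bus the codewords are 9 bits wide, so the two candidate costs at each step can never be equal and both procedures return the same invert bits and the same transition count.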
We note that the survivor path of the trellis graph yields exactly the same set of invert bit values that bus-invert coding generated. Thus, we see that bus-invert coding is an optimal method for this sequence. The question then becomes: will the bus-invert and Viterbi algorithms always produce the same result? In Fig. 2, we observe the following symmetry property: at any time $t$, the branch metric for the transition N $\rightarrow$ N has the same value as for I $\rightarrow$ I, while the branch metric for N $\rightarrow$ I has the same value as for I $\rightarrow$ N. Thus, the cheaper branch leaving the N node and the cheaper branch leaving the I node are either both horizontal or both diagonal. So, regardless of whether we are coming from N or I, we will make the same decision about whether the bits should be inverted between time $t-1$ and time $t$. Because every path through the trellis pays at least the smaller of the two branch metric values at each stage, a path that always takes the cheaper branch has minimum weight. This corresponds precisely to the fact that in bus-invert coding, the inversion decision at time $t$ does not depend on the inversion status at time $t-1$. Thus, we come to the conclusion that the Viterbi algorithm will produce the same set of decisions as in bus-invert coding.

Fig. 2. Two-state trellis diagram for the data sequence of Fig. 1(a). N and I denote noninverted codeword and inverted codeword states, respectively. The dotted line shows the survivor path after applying the Viterbi algorithm, which results in 53 transitions, the minimum number of transitions during the 16 time slots.

The reason for the existence of the symmetry property of the branch metric values is explained as follows. Consider two $(N+1)$-bit codewords, $X$ and $Y$, at time $t-1$ and $t$, respectively, having codeword bits $x_k$ and $y_k$, where $0 \le k \le N$. Note that $x_k$ and $y_k$ are in GF(2). The complemented forms of the previous codewords are denoted by $\bar{X}$ and $\bar{Y}$, having codeword bits $\bar{x}_k$ and $\bar{y}_k$. The Hamming distance between two codewords is computed by counting the number of 1s at the output of an XOR operation on each pair of codeword bits. Equations (3) and (4) show certain well-known properties of the XOR operation

$$\bar{x}_k \oplus \bar{y}_k = x_k \oplus y_k \tag{3}$$

$$\bar{x}_k \oplus y_k = x_k \oplus \bar{y}_k = \overline{x_k \oplus y_k} \tag{4}$$

Therefore, the following relationships between Hamming distances are obtained:

$$d_H(\bar{X}, \bar{Y}) = d_H(X, Y)$$

$$d_H(\bar{X}, Y) = d_H(X, \bar{Y}) = (N+1) - d_H(X, Y)$$
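The Hamming-distance relationships that follow from the XOR properties can be spot-checked numerically. The sketch below is only a sanity check on random codewords, not a proof; `check_symmetry` is a hypothetical helper, and `width` stands for the full codeword width (data bits plus Sb).

```python
import random


def hamming(a, b):
    return bin(a ^ b).count("1")


def check_symmetry(width, trials=1000, seed=0):
    """Return True if the complement symmetry holds on random codewords."""
    rng = random.Random(seed)
    full = (1 << width) - 1
    for _ in range(trials):
        x = rng.randrange(full + 1)
        y = rng.randrange(full + 1)
        x_bar, y_bar = x ^ full, y ^ full
        d = hamming(x, y)
        # Complementing both codewords leaves the distance unchanged.
        if hamming(x_bar, y_bar) != d:
            return False
        # Complementing exactly one codeword gives width - d.
        if hamming(x_bar, y) != width - d:
            return False
        if hamming(x, y_bar) != width - d:
            return False
    return True


print(check_symmetry(9))  # 9-bit codewords, as for an 8-bit data bus
```

The second relationship also shows why the two branch metric values at any stage sum to the codeword width, as with the values 4 and 5 in the 9-bit example.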
IV. $2^P$-STATE TRELLIS DIAGRAM
In order to decrease the average power dissipation for wide buses, the bus can be partitioned into several narrower subbuses [1]. Each of these subbuses can then be coded independently using its own signal line. If an $N$-bit bus is partitioned using a partition factor of $P$, it will produce $P$ subbuses, where each subbus contains $N/P$ data lines. The resulting bus width is $N + P$ bits, having a total of $P$ invert bits.

Fig. 4. Same sequence of data as in Fig. 1(a) using a partition factor of 2 and applying bus-invert coding.

Fig. 4 shows the same sequence of data as in Fig. 1(a) using a partition factor of $P = 2$, resulting in two subbuses where each subbus is coded using the bus-invert method. Here, each subbus contains four of the eight data lines together with its own invert line. In this case, there are only 47 transitions over the same 16 time slots.
Bus-invert coding on a partitioned bus can be directly translated into two 2-state trellis diagrams, one for each subbus, as shown in Fig. 5. Tracing back the survivor path for each separate trellis diagram leads to the codewords that must be transmitted at each time step in order to achieve the minimum number of transitions on each subbus.
However, the aforementioned partitioning example can also be viewed as a four-state trellis graph, as shown in Fig. 6. In this trellis graph, there are four states: "both subbuses noninverted" (NN), "subbus0 noninverted and subbus1 inverted" (NI), "subbus0 inverted and subbus1 noninverted" (IN), and "both subbuses inverted" (II). The two invert bits for the NN, NI, IN, and II states are 00, 01, 10, and 11, respectively. After the final stage, the state metrics for the NN, NI, IN, and II states are 48, 47, 49, and 48, respectively, which directly translates to the transitions made along each survivor path. Therefore, to minimize the transitions over the 16 time slots, the survivor path that leads to 47 transitions is desired. Fig. 6 shows the survivor path that leads to the minimum number of transitions over the 16 time slots. Tracing back the survivor path through the NN, NI, IN, and II states to the starting node translates into the set of codewords that must be transmitted over the 16 time slots in order to achieve the minimum number of transitions. Fig. 7 shows the codewords obtained after tracing back the survivor path. It leads to the same result as applying bus-invert coding to each of the two subbuses.
In general, for a partitioning factor of $P$, we can construct a $2^P$-state trellis graph. There are two states corresponding to "all subbuses noninverted" and "all subbuses inverted." The other $2^P - 2$ states correspond to various noninverted and inverted patterns on the subbuses.
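The equivalence between per-subbus bus-invert coding and a single joint trellis can be illustrated with the sketch below. It is a minimal example under stated assumptions: the bus width is divisible by the partition factor, subbus 0 is taken to be the least significant group of lines, each subbus stores its invert bit as the least significant bit of its codeword, every subbus starts at the all-zero noninverted codeword, and the function names are hypothetical.

```python
from itertools import product
import random


def hamming(a, b):
    return bin(a ^ b).count("1")


def split(word, width, parts):
    """Split a width-bit word into `parts` equal subwords, LSB group first."""
    sub = width // parts
    mask = (1 << sub) - 1
    return [(word >> (k * sub)) & mask for k in range(parts)]


def partitioned_greedy(words, width, parts):
    """Bus-invert coding applied independently to each subbus."""
    full = (1 << (width // parts + 1)) - 1
    prev = [0] * parts
    total = 0
    for w in words:
        for k, s in enumerate(split(w, width, parts)):
            plain = s << 1          # subbus data, then its invert bit = 0
            inverted = plain ^ full
            cost, best = min((hamming(prev[k], c), c)
                             for c in (plain, inverted))
            total += cost
            prev[k] = best
    return total


def joint_viterbi(words, width, parts):
    """Viterbi algorithm on the trellis with 2**parts invert patterns."""
    full = (1 << (width // parts + 1)) - 1
    states = list(product((0, 1), repeat=parts))

    def codeword(w, state):
        return tuple((s << 1) ^ (full if inv else 0)
                     for s, inv in zip(split(w, width, parts), state))

    def dist(c1, c2):
        return sum(hamming(a, b) for a, b in zip(c1, c2))

    start = (0,) * parts
    metric = {start: 0}
    prev_cw = {start: (0,) * parts}
    for w in words:
        cw = {s: codeword(w, s) for s in states}
        metric = {j: min(metric[i] + dist(prev_cw[i], cw[j]) for i in metric)
                  for j in states}
        prev_cw = cw
    return min(metric.values())


random.seed(0)
data = [random.randrange(256) for _ in range(16)]
print(partitioned_greedy(data, 8, 2), joint_viterbi(data, 8, 2))
```

The two totals agree because the joint branch metric is the sum of the per-subbus distances, so minimizing over the joint states is the same as minimizing each subbus separately. The joint trellis has `2**parts` states, so its cost grows quickly with the partition factor, while the per-subbus computation stays linear in it.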
V. CONCLUSION
In this paper, we have shown that a symmetry property of the branches in the trellis diagram leads to the result that the Viterbi algorithm produces the same result as bus-invert coding. Thus, we find that bus-invert coding is optimal for a sequence of data words. We have also shown that partitioned bus-invert coding can be viewed as applying the Viterbi algorithm on a $2^P$-state trellis diagram.
