This letter proposes a low-overhead MSB-controlled inversion coding technique to reduce the transition activity in a matrix transposer. A family of designs is identified in which this technique is applied to different bit slices of the matrix data and the optimal design within the family is determined using transition activity analysis for DCT and IDCT applications.
MSB-controlled inversion coding algorithm:
Because MSB-CIC works best for those bits that have high spatial correlation, which for DCT/IDCT data tends to be in the most significant bits, we characterize a family of designs in which the MSB-CIC is applied to the most significant k bits. 
, and $c_i^l = b_i^l$ otherwise. As mentioned earlier, the different values of k identify a family of designs for which we expect MSB-CIC to be most effective.
Notice that when the magnitude of $B_i$ is small, the Hamming distance between adjacent $C_i$ data will be smaller than that between adjacent $B_i$ data. Consequently, the idea behind our MSB-CIC algorithm is to reduce the transition activity in the matrix transposer by transposing $C_i$ instead of $B_i$ and then reconstructing $B_i$ after transposition.
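The coding step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `width` parameter, and the convention that k counts bits from the MSB and includes the MSB itself are assumptions made for this example. Under those assumptions, the bits below the MSB within the top k bits are inverted whenever the MSB is 1, and all other bits pass through unchanged.

```python
def msb_cic(word: int, width: int, k: int) -> int:
    """Encode a `width`-bit two's complement pattern (given as an unsigned int).

    Assumed convention: the k most significant bits form the coded slice,
    and every bit of that slice except the MSB is XORed with the MSB.
    """
    if not 1 <= k <= width:
        raise ValueError("k must be between 1 and width")
    msb = (word >> (width - 1)) & 1
    # Mask covering the k most significant bits, excluding the MSB itself.
    mask = ((1 << (k - 1)) - 1) << (width - k)
    return word ^ mask if msb else word


# The MSB is never modified, so applying the same operation again decodes.
msb_cid = msb_cic


def to_pattern(value: int, width: int) -> int:
    """Return the two's complement bit pattern of a signed value."""
    return value & ((1 << width) - 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    width, k = 12, 12
    b1, b2 = to_pattern(1, width), to_pattern(-1, width)
    c1, c2 = msb_cic(b1, width, k), msb_cic(b2, width, k)
    print("uncoded distance:", hamming(b1, b2))
    print("coded distance:  ", hamming(c1, c2))
```

For the adjacent values 1 and -1 in a 12-bit word with k = 12, the uncoded patterns differ in 11 bit positions while the coded patterns differ in only 2, which is the effect the paragraph above describes for small-magnitude data of alternating sign.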
Low-power matrix transposer:
The proposed low-power matrix transposer is composed of a conventional matrix transposer together with additional MSB-CIC coding and decoding (MSB-CID) circuits, as shown in Fig. 1. Most commonly, matrix transposition is implemented using a two-dimensional array of transposition cells (TCs), each containing a multiplexer and a register [3] [4]. The multiplexers inside the TCs and the output multiplexers route data either vertically or horizontally depending on the status of the row/column control signal, sw. For clarity, we decompose the conventional matrix transposer into an upper slice (kth and higher bits), in which the input data is encoded, and a lower slice ((k-1)th and lower bits), in which the input data is not encoded.
Fig. 1 Proposed low-power matrix transposer (4×4)
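The routing behaviour of a conventional TC array can be sketched behaviourally in Python. This is an illustrative model only: the class name `Transposer`, the shift directions, and the assumption that sw toggles once per N×N block are choices made for this example and are not taken from Fig. 1. Each cycle, one row of N words enters the array; with sw = 0 the registers shift horizontally, and with sw = 1 they shift vertically, so a block loaded in one direction is read out in the other.

```python
class Transposer:
    """Behavioural model of an n x n array of register cells (hypothetical)."""

    def __init__(self, n: int):
        self.n = n
        self.reg = [[0] * n for _ in range(n)]

    def step(self, inputs, sw: int):
        """Clock the array once; return the n words leaving the array."""
        n = self.n
        if sw == 0:
            # Horizontal mode: read column 0, shift left, load column n-1.
            out = [self.reg[r][0] for r in range(n)]
            for r in range(n):
                self.reg[r] = self.reg[r][1:] + [inputs[r]]
        else:
            # Vertical mode: read row 0, shift up, load row n-1.
            out = list(self.reg[0])
            self.reg = self.reg[1:] + [list(inputs)]
        return out


def transpose_stream(matrices, n: int):
    """Feed matrices row by row, toggling sw per block; return the transposes."""
    array = Transposer(n)
    blocks = []
    sw = 0
    # One extra all-zero block flushes the last matrix out of the array.
    for matrix in matrices + [[[0] * n] * n]:
        blocks.append([array.step(row, sw) for row in matrix])
        sw ^= 1
    # The first block read out is only the initial register content.
    return blocks[1:]


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    for row in transpose_stream([a], 4)[0]:
        print(row)
```

In this model a new block is loaded while the previous one is read out, so the array is fully used in every cycle; every word passes through N registers on its way through the array, which is why the bit transitions between adjacent words matter for power.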
As shown in Fig. 2, the MSB-CIC and MSB-CID circuits are implemented using a bank of exclusive-OR gates, one gate for each of the k most significant bits, excluding the MSB itself. The MSB is passed directly into the matrix transposer and serves as the conditional inversion control signal for all the other coded bits. Unlike conventional conditional inversion coding, MSB-CIC does not require an additional control signal for coding and decoding.
Fig. 2 Circuits for MSB-controlled inversion coding: (a) MSB-CIC circuit; (b) MSB-CID circuit
Results: We first gathered statistics of the intermediate data to be transposed between two 1-D DCT/IDCT processors for 10 frames of three image sequences (i.e., flower garden, football, and table tennis), each having 720×480 resolution, or equivalently 81,000 8×8 blocks. As illustrated in Fig. 3, we observed that for more than 50% of the DCT data, the 8 most significant bits are SEBs, and that for more than 80% of the IDCT data, all bits are SEBs. These statistics reflect the fact that most of the DCT/IDCT data is small in magnitude, so that most of its bits are SEBs. Therefore, a large fraction of the transition activity can probably be attributed to adjacent data values that alternate between small positive and small negative numbers.
Fig. 3 Average probability distribution of SEBs: (a) DCT data; (b) IDCT data
We then determined the optimal value of k for the same data through a bit-level hardware model of the conventional and proposed architectures. The transition activity is measured in both the TC array and the MSB-CIC and MSB-CID circuits. It is assumed that the switched capacitance of one TC is twice that of a single exclusive-OR gate. The percentage reduction of the transition activity for the MSB-CIC is measured by t(C) - t(O), where t(C) is the transition reduction ignoring the transition activity of the MSB-CIC and MSB-CID circuits and t(O) is the fraction of transition activity of the overhead circuits. Fig. 4 shows the average reductions with and without the overhead for all values of k. The results indicate that the optimal value of k for the DCT and IDCT data of the matrix transposition is the 9th and 12th bit from the MSB, respectively. Although not shown, our experiments suggest that the optimal value of k does not vary significantly across different image sequences. The resulting reduction in transition activity for the optimal value of k is approximately 33% for DCT data and 46% for IDCT data. Because the bit patterns in the LSBs are randomly distributed, extending the range of the MSB-CIC beyond the optimal k bits from the MSB lowers the efficiency: the additional reduction in transition activity no longer outweighs the additional overhead.
Fig. 4 Optimal k-bit position extraction from the MSB-CIC: (a) DCT data; (b) IDCT data
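The measure t(C) - t(O) can be illustrated with a deliberately simplified Python sketch. It is not the bit-level hardware model used for Fig. 4: here the TC array is collapsed to a single register per bit, the overhead is taken to be the output toggles of the encoder and decoder exclusive-OR gates, and the function names, the `width` parameter, and the convention that k counts bits from the MSB (including the MSB) are assumptions for this example. Only the 2:1 weighting of a TC against an exclusive-OR gate comes from the text.

```python
def msb_cic(word: int, width: int, k: int) -> int:
    """XOR the bits below the MSB in the top k bits with the MSB (assumed form)."""
    mask = ((1 << (k - 1)) - 1) << (width - k)
    return word ^ mask if (word >> (width - 1)) & 1 else word


def transitions(stream) -> int:
    """Total number of bit toggles between consecutive words of a stream."""
    return sum(bin(a ^ b).count("1") for a, b in zip(stream, stream[1:]))


def net_reduction(samples, width: int, k: int,
                  tc_weight: float = 2.0, xor_weight: float = 1.0):
    """Return (t_C, t_O, t_C - t_O) for a stream of signed samples.

    Simplification: one register per bit stands in for the whole TC array.
    """
    word_mask = (1 << width) - 1
    raw = [s & word_mask for s in samples]
    coded = [msb_cic(w, width, k) for w in raw]

    baseline = tc_weight * transitions(raw)
    if baseline == 0:
        return 0.0, 0.0, 0.0

    # Reduction in register toggles, ignoring the coding circuits.
    t_c = (baseline - tc_weight * transitions(coded)) / baseline

    # Overhead: encoder XOR outputs carry the coded bits, decoder XOR
    # outputs carry the reconstructed original bits of the coded slice.
    slice_mask = ((1 << (k - 1)) - 1) << (width - k)
    overhead = xor_weight * (
        transitions([c & slice_mask for c in coded])
        + transitions([w & slice_mask for w in raw])
    )
    t_o = overhead / baseline
    return t_c, t_o, t_c - t_o


if __name__ == "__main__":
    stream = [1, -1, 2, -3, 0, -1, 4, -2]  # synthetic, not measured data
    for k in (1, 4, 8, 12):
        print(k, net_reduction(stream, 12, k))
```

Sweeping k over a data set and picking the value with the largest net figure mirrors the selection procedure described above. The absolute numbers from this sketch are not comparable with those reported for Fig. 4, because in the real array each word passes through many TCs and the relative weight of the overhead is correspondingly smaller.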
Conclusions: We have presented the MSB-CIC technique, which reduces transition activity with low hardware overhead, and have shown its on-chip realization for reducing the power of a matrix transposer. Whereas bus-invert coding techniques are recommended only for system-level, highly capacitive buses [2], the proposed MSB-CIC technique can be successfully applied to on-chip two's complement buses that typically transmit small numbers.
The technique can be viewed as an efficient partial and local transformation of two's complement data into sign-and-magnitude data. Thus, our matrix transposer design can seamlessly be incorporated into existing typical data-paths that are designed for two's complement data. This feature is important since most existing multimedia standards assume two's complement representation. 

