Narrow Bus Encoding for Low-Power DSP Systems
Youngsoo Shin, Kiyoung Choi, and Young-Hoon Chang
Abstract-High levels of integration in integrated circuits often lead to the problem of running out of pins. Narrow data buses can be used to alleviate this problem provided that the degraded performance due to wait cycles can be tolerated. We address bus coding methods for low-power core-based systems incorporating narrow buses. We show that transition signaling combined with bus-invert coding, which we call BITS coding, is particularly suitable for the data patterns of typical DSP applications on narrow data buses. The application of BITS coding to real circuit design is limited by the extra bus line introduced, which changes the pinout of the chip. We propose a new coding method, which does not require the extra bus line but retains the advantage of BITS.
Index Terms-CMOS, low-power, switching-activity, system-level.
I. INTRODUCTION
Recently, core-based design of digital systems has become prevalent, in order to cope with ever-decreasing time to market. A system designed with cores often contains many components, such as core processors, digital signal processors (DSPs), peripheral interface circuits, and application-specific integrated circuits (ASICs). The number of pins, which directly contributes to the size and cost of a chip, is one of the problems with such a high level of integration because pin count naturally increases with the number of components integrated into the chip. Communication with off-chip devices by means of narrow data buses is one way to reduce the number of pins, at the cost of reduced performance due to an increased number of wait states. Narrow buses are also used to build low-cost high-volume embedded systems. For example, a system requiring a 16-bit wide external memory can use an 8-bit wide narrow bus if the increased delay for memory transfer can be tolerated. The cost of the system is reduced because a cost-effective 8-bit wide memory and bus can be used and fewer pins are required.
Power consumption is another problem with high integration, especially in portable systems such as cellular phones and personal digital assistants (PDAs). It is well known that a lot of power is consumed driving the off-chip bus, due to the large off-chip driver, the pad capacitance, and significant off-chip capacitance [1]. Power consumed by off-chip driving becomes dominant as devices are scaled down, because off-chip capacitance does not depend on process technology but on the package and printed circuit board (PCB) technologies.
Bus encoding reduces the power consumption of a bus by encoding the information transferred on the bus in such a way that the encoded version has fewer transitions than the original. There are various low-power coding methods for data buses: bus-invert (BI) code [2], [3] for uncorrelated data patterns, transition signaling [4] for data patterns where the probabilities of 0 and 1 are very different, and probability-based mapping [5], [6] followed by transition signaling for patterns with nonuniform probability densities.
In the case of instruction address patterns, Gray code [7], T0 code [8], and inc-xor [6] are efficient. For special-purpose applications, where the information about the sequence of patterns is available a priori, the characteristics of patterns can be exploited to reduce bus transitions. The Beach solution [9] and the partial bus-invert (PBI) code [10] perform well in this case.
In this paper, we study bus coding schemes for low-power core-based systems incorporating narrow buses. Although various low-power bus coding techniques have been proposed, none of them has explicitly addressed the problem of coding patterns on a narrow bus. In contrast to a bus of full width, the narrow bus exhibits one very different property: correlations between consecutive patterns, if they exist, are entirely lost. We show that transition signaling combined with bus-invert (BI) coding [3], which we call BITS coding in this paper, is particularly suitable for this situation.
The extra bus line used in BITS coding makes the coding difficult to use in real circuit design because it implies changes to the interface specification of the chip. Furthermore, power reduction is obtained at the cost of the overhead of encoding and decoding circuits, which introduce delay, take up area and, indeed, themselves require power. To overcome the need for the extra bus line and the overhead of encoding and decoding circuits, while retaining the advantage of BITS coding, we propose a new coding method called half-identity half-reverse and transition signaling (hihrTS) coding. In this technique, coding logic is greatly simplified and the overhead of power, delay, and area accompanying the encoder and decoder circuits is much reduced. More importantly, hihrTS coding does not use any spatial or temporal redundancy and thus provides a coding method that is efficient in a broad class of circuit designs. Furthermore, when a narrow data bus is used between a processor core and an off-the-shelf external memory, data can be stored in the memory in hihrTS-encoded form and then used by the processor after loading and decoding, which is difficult with BITS coding.
II. CODING FOR A NARROW BUS

A. BI and BITS Codings
If data patterns are randomly distributed in time and mutually independent in space, transferring them across a narrow bus (each pattern in multiple cycles) does not destroy the original randomness of the patterns. BI code (for which an encoder and decoder are shown in Figs. 1(a) and 2(a), respectively) still performs well in this case. In BI encoding, if the number of transitions between the current pattern and the previous pattern on the bus exceeds half the bus width, the current pattern is transferred with each bit inverted. An extra bus line, shown in Fig. 1(a), is used to signal the inversion.
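The BI rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of Fig. 1(a): the function names, the all-zero initial bus state, and the representation of the extra line as a separate flag are assumptions of the sketch.

```python
def bus_invert_encode(patterns, width, initial=0):
    """Bus-invert encode unsigned `width`-bit patterns.

    Returns (bus_value, invert_flag) pairs; the flag models the extra bus line.
    The initial bus state is assumed to be all zeros.
    """
    mask = (1 << width) - 1
    prev_bus = initial
    encoded = []
    for x in patterns:
        # Hamming distance between the new pattern and what is on the bus now.
        transitions = bin((x ^ prev_bus) & mask).count("1")
        if transitions > width // 2:
            bus, inv = (~x) & mask, 1   # send the inverted pattern
        else:
            bus, inv = x & mask, 0      # send the pattern unchanged
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion wherever the flag is set."""
    mask = (1 << width) - 1
    return [((~bus) & mask) if inv else bus for bus, inv in encoded]


data = [0x00, 0xFF, 0x0F, 0xF0]
coded = bus_invert_encode(data, 8)
restored = bus_invert_decode(coded, 8)
```

For this toy input the second and fourth patterns are sent inverted, so the data lines never see more than four transitions per cycle.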
When data patterns do not follow this ideal property, as is often the case in telecommunications, speech, and image processing applications, BITS code (for which an encoder and decoder are shown in Figs. 1(b) and 2(b), respectively) can be shown to perform better. In BITS encoding, if the number of 1s in the current pattern is larger than half the bus width, then each bit of the pattern is inverted (with the extra line set to signal the inversion) and then transition-encoded. Otherwise, each bit of the pattern is transition-encoded without alteration.
As an example, consider the decimal representation in Fig. 3(a), which is drawn from samples of human speech [11]. Fig. 3(b) shows a typical 16-bit two's complement fixed-point representation with an 8-bit integral part and an 8-bit fractional part. As shown in the figure, the low-order bits tend to be random whereas the high-order bits tend to have high spatial and temporal correlations. However, the temporal correlations of the high-order bits are lost when the original patterns are transferred on a narrow bus (an 8-bit wide bus in this example) since the high-order and low-order bits appear on the bus lines alternately, as illustrated in Fig. 3(c). Instead, the patterns of high-order bits are more spatially correlated; that is, the probability that 0s or 1s dominate each pattern increases compared to the original patterns.
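A minimal Python sketch of BITS encoding on a narrow bus follows. Several details are assumptions of the sketch rather than statements about Fig. 1(b): the high-order half of each word is sent first, the bus starts at all zeros, and the extra line carries the inversion flag as a plain level.

```python
def split_words(words, word_bits=16, bus_bits=8):
    """Split each word into bus-width chunks, most significant chunk first
    (the transfer order is an assumption of this sketch)."""
    mask = (1 << bus_bits) - 1
    return [(w >> shift) & mask
            for w in words
            for shift in range(word_bits - bus_bits, -1, -bus_bits)]


def bits_encode(patterns, width, initial=0):
    """BITS: invert patterns with more 1s than half the width, then
    transition-signal the result. Returns (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    bus, out = initial, []
    for x in patterns:
        inv = 1 if bin(x).count("1") > width // 2 else 0
        y = (~x & mask) if inv else x   # bus-invert step
        bus ^= y                        # transition signaling: each 1 toggles a line
        out.append((bus, inv))
    return out


def bits_decode(encoded, width, initial=0):
    """Recover the original patterns from bus values and invert flags."""
    mask = (1 << width) - 1
    prev, out = initial, []
    for bus, inv in encoded:
        y = bus ^ prev                  # lines that toggled carry a 1
        out.append((~y & mask) if inv else y)
        prev = bus
    return out


halves = split_words([0xFFFE, 0x0001, 0xFFFF])
coded = bits_encode(halves, 8)
restored = bits_decode(coded, 8)
```

Sign-extended halves such as 0xFF and 0x00 both end up as the all-zero pattern before transition signaling, so they cause no transitions on the data lines.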
If we apply BI encoding to the patterns in Fig. 3(c), the number of transitions is reduced from 37 to 30, as illustrated in Fig. 4(a), where bold face indicates the value of the signal on the extra line. Further reduction can be obtained if we apply BITS encoding, as illustrated in Fig. 4(b). This is because it is highly probable that 0s or 1s dominate each pattern due to sign extension [see Fig. 3(c)], and BITS encodes 1 as a transition in the former case and 0 as a transition in the latter case.
B. hihrTS Coding
Although substantial power saving is possible with BITS coding, its application to real circuit design is severely limited by the extra bus line, which calls for a change in the pinout and interface specification of the original chip. Another problem is that the overhead of BITS coding cannot be neglected, as will be illustrated in the implementation of the encoder and decoder circuits in the next section. We tackle these two problems by focusing on the first: the solution to the second problem is obtained as a byproduct.
The advantage of BITS coding mainly comes from the fact that it is highly probable that 0s or 1s dominate each pattern. This is graphically shown in Fig. 5(a), which plots the distribution of patterns on an 8-bit wide narrow bus for noisy speech signals represented by 16-bit two's complement fixed-point numbers with an 8-bit integral part and an 8-bit fractional part. Note that the values along the horizontal axis show the values of the 8-bit patterns represented in decimal as if they were two's complement signed integers. Fig. 5(b) shows distributions for music signals (16-bit two's complement signed integers). Although different kinds of signals are used for Fig. 5(a) and (b), the distributions of 8-bit patterns look very similar: patterns around 0 occur most frequently. This implies that, during BITS encoding, most inversions occur for the patterns near the left-hand side of 0, where patterns with high weight (the number of 1s) are concentrated.
Recognizing that a coding method that does not require spatial redundancy (redundant bus lines) implies a one-to-one and onto mapping from the space of all $n$-bit patterns to the same space, we construct a mapping called the hihr (half-identity half-reverse) mapping, which uses an identity function for the set of patterns on the right-hand half (from $0$ to $2^{n-1}-1$). We map the set of patterns on the left-hand half (from $-2^{n-1}$ to $-1$) to the same set with the order reversed (e.g., $-2^{n-1}$ is mapped to $-1$). Thus, patterns near the left-hand side of $0$, which have high weight and thus are candidates for inversion in BITS encoding, are mapped to those having low weight. Since these patterns occur frequently, reducing their weights effectively reduces the total weight.
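The mapping just described can be sketched as follows. Reversing the order of the negative half of the two's complement range is the same as keeping the MSB and inverting the remaining bits; the function names and the unsigned-integer representation of patterns are assumptions of the sketch.

```python
def hihr(x, n):
    """Half-identity half-reverse mapping on an n-bit pattern (unsigned int).

    Patterns with MSB = 0 are left unchanged; patterns with MSB = 1 keep the
    MSB and have their lower n-1 bits inverted, which reverses the order of
    the negative half of the two's complement range.
    """
    msb = 1 << (n - 1)
    if x & msb:
        return x ^ (msb - 1)
    return x


def to_signed(x, n):
    """Interpret an n-bit unsigned value as a two's complement integer."""
    return x - (1 << n) if x & (1 << (n - 1)) else x


# -1 (0xFF) has weight 8; it maps to -128 (0x80), which has weight 1.
mapped = hihr(0xFF, 8)
mapped_signed = to_signed(mapped, 8)
```

Because the mapping is its own inverse, the same function (and the same logic) can serve on both the encoder and the decoder side.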
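Note that the majority voter used in BITS encoding to decide whether to encode 0 or 1 as having a transition, which is the main overhead, is not needed in hihrTS encoding (hihr mapping followed by transition signaling).⁴ In other words, for an $n$-bit pattern $x^{(t)}$ with bits $x^{(t)}_{n-1} \ldots x^{(t)}_{0}$, hihrTS encoding [see Fig. 1(c)] can be carried out by

$$y^{(t)}_i = \begin{cases} x^{(t)}_i & \text{if } x^{(t)}_{n-1} = 0 \text{ or } i = n-1 \\ \overline{x^{(t)}_i} & \text{otherwise} \end{cases} \qquad (1)$$

$$X^{(t)} = T\left(y^{(t)}, X^{(t-1)}\right) = y^{(t)} \oplus X^{(t-1)} \qquad (2)$$

where $X^{(t)}$ is the pattern driven onto the bus and $T(a, b)$ denotes a transition encoding of $a$ with respect to $b$.
The decoding process of hihrTS can be carried out by

$$x^{(t)}_i = \begin{cases} y^{(t)}_i & \text{if } y^{(t)}_{n-1} = 0 \text{ or } i = n-1 \\ \overline{y^{(t)}_i} & \text{otherwise} \end{cases} \qquad (3)$$

where $y^{(t)} = X^{(t)} \oplus X^{(t-1)}$. The application of hihrTS encoding for our example is illustrated in Fig. 4(c).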
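A compact Python sketch of the hihrTS encoder and decoder is given below, assuming the hihr mapping keeps the MSB and inverts the lower bits when the MSB is one, and that transition signaling is an XOR with the previous bus value. Function names and the all-zero initial bus state are assumptions of the sketch.

```python
def hihr(x, n):
    """hihr mapping: identity if MSB is 0, otherwise invert the lower n-1 bits."""
    msb = 1 << (n - 1)
    return x ^ (msb - 1) if x & msb else x


def hihrts_encode(patterns, n, initial=0):
    """hihr mapping followed by transition signaling; no extra bus line."""
    bus, out = initial, []
    for x in patterns:
        bus ^= hihr(x, n)        # every 1 in the mapped pattern toggles a line
        out.append(bus)
    return out


def hihrts_decode(bus_values, n, initial=0):
    """Recover the toggled lines, then undo the mapping (hihr is its own inverse)."""
    prev, out = initial, []
    for b in bus_values:
        out.append(hihr(b ^ prev, n))
        prev = b
    return out


data = [0x00, 0x03, 0xFF, 0xFE]
coded = hihrts_encode(data, 8)
restored = hihrts_decode(coded, 8)
```

Unlike BI and BITS, the decision to invert depends only on one bit of the current pattern, so no population count or comparison against half the bus width is needed.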
The main observation behind hihrTS encoding is that the guess made from the MSB (that a pattern whose MSB is one has high weight and should be inverted) is highly likely to be correct when the MSB corresponds to the actual sign of the pattern [the patterns carrying the high-order bits in Fig. 3(c)]. Furthermore, even for the remaining patterns [those carrying the low-order bits in Fig. 3(c)], the probability of an incorrect guess⁵ is kept low. Specifically, if an $n$-bit wide pattern is completely random, it can be shown that the probability of an incorrect guess is given by

$$P_{\text{incorrect}} = \frac{1}{2^{n-1}} \sum_{k=0}^{n/2-2} \binom{n-1}{k} \qquad (4)$$

For example, the probability is 0.227 for an 8-bit pattern and 0.125 for a 4-bit pattern. Although hihrTS encoding obtains less transition reduction due to incorrect guesses, the overall power consumption (including the power consumed by the encoder itself) is comparable to that of BITS encoding because the hihrTS encoder consumes less power than the BITS encoder. More importantly, the extra bus line is not used in hihrTS coding and, as we see in the next section, the delays of both the encoder and the decoder for hihrTS coding are short; hihrTS coding thus lends itself to adoption as a coding method in a broad range of circuit designs.
⁴ In the coding framework proposed in [6], hihrTS encoding is equivalent to setting one of the two mapping functions f to the hihr mapping and the other to identity, without function F.
⁵ Because inversion is determined not by the weight but by the MSB of a pattern in hihrTS encoding, even a pattern with its weight less than half the bus width is inverted if its MSB is equal to one. This is an incorrect decision because the number of transitions is increased rather than decreased.
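The quoted probabilities can be checked by brute-force enumeration. In the sketch below, a guess is counted as incorrect when the MSB-based decision disagrees with the weight-based decision of BITS (MSB one but weight below half the width, or MSB zero but weight above half). This reading of "incorrect guess", the function names, and the closed form used for comparison are assumptions of the sketch; the closed form is written for even $n$ only.

```python
from math import comb


def incorrect_guess_probability(n):
    """Fraction of all n-bit patterns for which the MSB-based decision
    disagrees with the weight-based (majority) decision."""
    bad = 0
    for x in range(1 << n):
        weight = bin(x).count("1")
        msb = x >> (n - 1)
        if (msb == 1 and 2 * weight < n) or (msb == 0 and 2 * weight > n):
            bad += 1
    return bad / (1 << n)


def closed_form(n):
    """Closed form matching the enumeration for even n:
    2 * sum_{k=0}^{n/2-2} C(n-1, k) / 2^n."""
    return 2 * sum(comb(n - 1, k) for k in range(n // 2 - 1)) / (1 << n)


p8 = incorrect_guess_probability(8)   # 58/256, about 0.227
p4 = incorrect_guess_probability(4)   # 2/16 = 0.125
```

Both values agree with the figures quoted in the text for 8-bit and 4-bit patterns.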
C. Experimental Results
To evaluate the efficiency of hihrTS encoding in bus transition reduction, we perform experiments for the following set of sample patterns.
• Input and output signals of a noise canceller [11], which receives two signals (noisy speech and reference noise signals) as inputs and produces a noise-cancelled speech signal as an output. Each signal is encoded as a 16-bit two's complement fixed-point number with an 8-bit integral part and an 8-bit fractional part.
• Music signals from WAVE files, which store 16-bit PCM samples as two's complement signed integers.
• Patterns of the real and imaginary parts of data processed by a 128-point complex FFT processor, which is one of the blocks in an audio decoder designed with VHDL [12]. Each pattern is encoded as a 20-bit two's complement signed integer and is extracted through VHDL simulation.
• The temporary result of speech processing from a DSP core, which is part of an industrial example, a digital hearing aid (DHA) system [13], to be stored in external SRAM. The pattern, encoded as 16-bit two's complement signed integers, is extracted through VHDL simulation.
For off-chip communication using a narrow bus, we assume that two cycles are needed to transfer each pattern, meaning that the width of the narrow bus is half the width of each pattern.
The resulting percentage reduction in the number of transitions compared to the unencoded case for BI, BITS, and hihrTS is shown in Table I. BITS encoding obtains a substantial reduction compared to unencoded patterns and BI-encoded patterns.
hihrTS encoding performs as well as BITS, although it relies on a fairly simple encoding mechanism and does not use the extra bus line. This is an important result because, if we take the power consumed by coding logic into account, hihrTS coding obtains similar power saving (in some cases, more power saving) without the extra bus line. We will elaborate on this in the next section. We also report the result obtained with probability-based mapping (pbm) [5], [6] in the fourth column, although it is of limited practical use due to the severe complexity of the associated encoding and decoding logic. Because pbm requires a representative data set to obtain a mapping function, we use one half of each pattern to obtain the mapping function, and the other half to obtain the percentage reduction in the number of transitions compared to the unencoded case.
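The transition-reduction metric used in this kind of experiment can be sketched as follows for hihrTS on a two-cycle narrow bus. The four samples are made-up toy values, not data from the paper, so the resulting number says nothing about Table I. The transfer order (high-order half first), the all-zero initial bus state, and all function names are assumptions of the sketch.

```python
def hihr(x, n):
    """hihr mapping: identity if MSB is 0, otherwise invert the lower n-1 bits."""
    msb = 1 << (n - 1)
    return x ^ (msb - 1) if x & msb else x


def split_words(words, word_bits=16, bus_bits=8):
    """Two-cycle transfer: high-order half first (assumed order).
    Negative Python ints are treated as two's complement values."""
    mask = (1 << bus_bits) - 1
    return [(w >> s) & mask for w in words
            for s in range(word_bits - bus_bits, -1, -bus_bits)]


def transitions_unencoded(bus_values, initial=0):
    """Count line toggles when patterns are driven onto the bus as they are."""
    total, prev = 0, initial
    for v in bus_values:
        total += bin(v ^ prev).count("1")
        prev = v
    return total


def transitions_hihrts(bus_values, n):
    """With transition signaling, each cycle toggles exactly as many lines
    as there are 1s in the mapped pattern."""
    return sum(bin(hihr(v, n)).count("1") for v in bus_values)


def percent_reduction(base, coded):
    return 100.0 * (base - coded) / base


samples = [3, -2, 5, -7]                  # toy 16-bit signed samples
halves = split_words(samples)
base = transitions_unencoded(halves)      # 26 for this toy input
coded = transitions_hihrts(halves, 8)     # 11 for this toy input
reduction = percent_reduction(base, coded)
```

Small-magnitude samples of alternating sign are close to a best case for the scheme, since their sign-extended high-order halves map to patterns of weight zero or one.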
III. IMPLEMENTATION OF CODING LOGIC
Bus coding inherently introduces area, delay, and power overheads due to the encoding and decoding circuits. This overhead should be kept as low as possible in order for the coding to be used in a broad class of circuit designs. We now present the implementation of encoding and decoding circuits for BI, BITS, and hihrTS codings for an 8-bit wide bus, and compare their power, area, and delay overheads. The coding circuits are designed with VHDL and synthesized using Synopsys Design Compiler. Layouts are obtained using Cadence Silicon Ensemble. The circuits are mapped onto a 3.3-V gate library developed for a TSMC CMOS process. We assume a 100-MHz clock frequency.
Note that the latches and output drivers on the encoder side, except for those involved in the extra line of the BI and BITS encoders (see Fig. 1), are also present in the unencoded case and, thus, do not contribute to the overhead of the encoder and decoder circuits.
The area, delay, and power consumption of the encoder and decoder for each coding method are summarized in Table II. The power is simulated using IRSIM, with sample patterns used as input vectors. The delay is measured using HSPICE. The delay of BITS encoding may limit the application of BITS coding to systems that are already delay optimized or clocked at very high speed. hihrTS takes much less time for each encoding and decoding operation.
The power consumed by off-chip driving consists of two parts [1]. One is the power used to drive the off-chip capacitance, bonding wires, and the pad capacitance. The other is the power consumed by the encoder, latches, and output drivers. To evaluate the off-chip driving power, each circuit in Fig. 1 is loaded with the total off-chip capacitance. Then the circuits are simulated with IRSIM, applying example patterns as input vectors. The power consumed on the decoder side is also obtained by applying the encoded version of the example patterns as input vectors to each circuit in Fig. 2. The results are shown in Fig. 6. We vary the off-chip capacitance from 10 pF, which is typical for multichip module technology, to 30 pF, which is typical for an advanced package and PCB [1], and compare the percentage reduction in the total power consumption (off-chip driving power plus decoder-side power) for each coding method, compared to the uncoded case.
Compared to Table I, which only counts the number of transitions and, thus, does not take the effect of encoders and decoders into account, the difference between BITS and hihrTS is very small, due to the reduced power consumption of the hihrTS encoder. In particular, hihrTS obtains more power saving when the off-chip capacitance is smaller than 20 pF, meaning that hihrTS can be used even for on-chip data buses having moderate capacitive loading.
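To see why the off-chip capacitance matters in this comparison, a first-order estimate of the power spent charging and discharging the bus lines can be written down. The sketch below uses the standard approximation that each transition dissipates about $\frac{1}{2} C V_{dd}^2$; it ignores the encoder, decoder, latch, and driver power that the IRSIM simulations account for. The function name, parameter names, and the example activity of two transitions per cycle are assumptions of the sketch; the 3.3-V supply, 100-MHz clock, and 10 to 30 pF range come from the text.

```python
def switching_power(transitions_per_cycle, c_load, vdd=3.3, f_clk=100e6):
    """First-order dynamic power in watts for a capacitively loaded bus.

    transitions_per_cycle: average number of line toggles per clock cycle,
                           summed over all bus lines
    c_load:                capacitance per line in farads
    """
    return 0.5 * c_load * vdd ** 2 * f_clk * transitions_per_cycle


# Load-dependent part of the power for an assumed activity of two
# transitions per cycle, over the capacitance range used in the text.
estimates_mw = {c_pf: 1e3 * switching_power(2.0, c_pf * 1e-12)
                for c_pf in (10, 20, 30)}
```

Because this term scales linearly with capacitance while the coding-logic power does not, a simpler encoder pays off most at small loads, which is consistent with the trend described above.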
IV. CONCLUSION
In this paper, we address bus coding methods targeting highly integrated low-power systems incorporating narrow buses. For data patterns found in typical DSP applications, we show that transition signaling combined with BI coding (BITS coding) can achieve significant power saving for narrow buses. Since the application of BITS coding in circuit design is limited by the extra bus line and the overhead of the encoder and decoder circuits, we propose hihrTS coding, which does not require the extra bus line but retains the advantage of BITS coding. Although hihrTS coding employs a much simpler encoder circuit, overall power saving is shown to be comparable with BITS.
