Abstract: This paper proposes a loop prediction encoding method for decreasing power consumption on the instruction memory address bus. The loop prediction encoding is based on detecting and predicting program loops. The experimental results show that our method decreases switching activity by 81.5% on average, with small overheads in performance and area.
[13] Texas Instruments Inc: TMS320C62x DSP Library Programmer's Reference (2003) http://www.ti.com/lit/ug/spru402b/spru402b.pdf.
Introduction
With the rapid increase in the complexity of chips and the popularity of handheld and portable devices, power consumption has become one of the main design criteria, especially in battery-driven applications such as mobile phones, laptops and PDAs that require longer battery life. Reliability concerns and packaging costs have also made power optimization even more important in current designs. Moreover, with the trend towards System on a Chip (SoC) applications, power has become a critical parameter that needs to be considered along with speed and area.
In a processor, a considerable amount of power is consumed on the off-chip or on-chip buses. It is estimated that the power dissipated on the buses of an IC ranges from 10% to 80% of the total power dissipation, with a typical value of 50% for circuits optimized for low power [1]. This is because the capacitance of these bus lines is usually several orders of magnitude higher than the capacitance of transistors; a typical bus capacitance is 50 pF [2]. It is therefore desirable to add some logic that encodes the address before sending it over the bus and, in this way, decreases the switching activity on address buses.
Conventionally, instructions are stored in memory in sequential order. To execute a program, the processor fetches instructions from memory one after another, and each instruction fetch causes some bit lines on the instruction memory address bus to switch. Moreover, in embedded systems, programs spend most of their execution time in small loops; generally, 10% of the code accounts for 90% of the execution time [3]. Take the simple program shown in Fig. 1 as an example. The instructions are stored in memory, and each instruction is identified by its decimal memory address. The instructions at addresses 29, 30, 31, 32 and 33 form a loop that is executed N+2 times. As can be seen, if loops are correctly predicted, the encoded bus can keep its value for a long time and the switching activity of the address bus is substantially reduced. The main contributions of loop prediction bus encoding are:
1) This paper proposes an instruction memory address bus encoding method. By using loop prediction, the method reduces switching activity and power consumption for both sequential instructions and backward jump instructions.
2) To reduce the area overhead, this bus encoding method employs a loop base register and a loop offset register to calculate the prediction address, which keeps the storage needed for a detected loop small.
The rest of this paper is organized as follows. Section 2 reviews previous address bus encoding approaches. Section 3 proposes the Loop Prediction (LP) encoding method and its implementation. Section 4 presents experimental results that verify the proposed approach. Finally, Section 5 concludes the paper.
Related work
In this section we will review similar works in address bus encoding and compare various encoding approaches.
There is another class of encoding approaches that avoids the use of redundant bits. These approaches exploit the de-correlating characteristic of the Exclusive-OR function. The most efficient of these codes is T0-XOR, which was proposed by Fornaciari et al. in [5]. The encoder works as follows:
The following notation is used throughout this paper: $b^{(t)}$ is the address value to be sent on the bus at time t (the source word at time t), and $\oplus$ is the logical XOR operation. It can easily be seen that when the addresses are sequential, no switching activity occurs. In the same work, the authors proposed another encoding approach, called Offset-XOR. The encoder works as follows:
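As a minimal sketch of the two encoders, assume the forms $B^{(t)} = b^{(t)} \oplus (b^{(t-1)} + S)$ for T0-XOR and $B^{(t)} = (b^{(t)} - b^{(t-1)}) \oplus B^{(t-1)}$ for Offset-XOR, where $B^{(t)}$ is the encoded bus word and $S$ is the stride. These forms, the 16-bit width, the reset values and all function names are assumptions made for illustration only.

```python
WIDTH = 16                      # assumed bus width in bits
MASK = (1 << WIDTH) - 1
STRIDE = 1                      # assumed stride S between sequential fetches


def t0_xor_encode(addresses):
    """Assumed T0-XOR form: B(t) = b(t) XOR (b(t-1) + S)."""
    encoded, prev_b = [], 0     # previous address assumed to be 0 after reset
    for b in addresses:
        encoded.append(b ^ ((prev_b + STRIDE) & MASK))
        prev_b = b
    return encoded


def offset_xor_encode(addresses):
    """Assumed Offset-XOR form: B(t) = (b(t) - b(t-1)) XOR B(t-1)."""
    encoded, prev_b, prev_bus = [], 0, 0
    for b in addresses:
        prev_bus ^= (b - prev_b) & MASK   # offset in two's complement
        encoded.append(prev_bus)
        prev_b = b
    return encoded


def transitions(words):
    """Number of bit flips between consecutive bus words."""
    return sum(bin(x ^ y).count("1") for x, y in zip(words, words[1:]))


sequential = [29, 30, 31, 32, 33]
print(transitions(sequential))                      # plain binary bus: 10
print(transitions(t0_xor_encode(sequential)[1:]))   # T0-XOR after first word: 0
print(transitions(offset_xor_encode(sequential)))   # Offset-XOR: 4
```

Under these assumptions a sequential run costs no transitions with T0-XOR once the first word has been sent, while Offset-XOR toggles one bit per fetch because the offset 1 contains a single 1.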
The bus encoding problem was even generalized by Ramprasad et al. in [6] . In this paper, the authors presented an encoding framework where an encoding method is abstracted as a two-step process: de-correlating and encoding. Data to be transferred over the bus is first de-correlated for high entropy, which then leads to minimal bus bit switching activity. With different data de-correlating algorithms and different encoding approaches, the framework can generate many bus encoding schemes.
In practice, one out of every seven instructions is a control transfer instruction [7], so it is not always possible to calculate the current address by incrementing the previous one. Building on the T0 and Offset-XOR encoding techniques, Aghaghiri proposed three irredundant bus encoding approaches in [8]. If a backward jump is encountered in an instruction trace, the address offset D (the difference between the current and the previous address) is negative. In addition, based on statistics reported in [7], more than 95% of all branches in any program have offsets that need fewer than 8 bits to be binary encoded. Therefore, this negative number tends to have a small magnitude, and its two's-complement representation contains many 1's. To solve this "small negative offset" (backward jump) problem, the paper proposed the Offset-XOR-SM encoding approach. The encoder works as follows:
The LSBInv (x) function inverts all bits of x except the most significant bit (MSB).
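The effect of LSBInv on a small negative offset can be illustrated with a short sketch. It assumes that Offset-XOR-SM applies LSBInv to D only when D is negative and then XORs the result with the previous bus word; this form, the 16-bit width and the helper names are assumptions for illustration.

```python
WIDTH = 16                       # assumed bus width in bits
MASK = (1 << WIDTH) - 1


def lsb_inv(x):
    """Invert every bit of x except the most significant bit."""
    return x ^ (MASK >> 1)


def offset_xor_sm_step(b, prev_b, prev_bus):
    """One encoder step under the assumed Offset-XOR-SM form."""
    d = (b - prev_b) & MASK      # offset D in two's complement
    if d >> (WIDTH - 1):         # MSB set: D is negative (backward jump)
        d = lsb_inv(d)
    return d ^ prev_bus


def ones(x):
    """Number of 1 bits, i.e. bus lines toggled by XORing x onto the bus."""
    return bin(x).count("1")


d = (29 - 33) & MASK             # backward jump from 33 to 29, D = -4
print(hex(d), ones(d))                    # 0xfffc 14
print(hex(lsb_inv(d)), ones(lsb_inv(d)))  # 0x8003 3
```

For this jump the raw offset would toggle 14 of the 16 lines, whereas the inverted form toggles only 3.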
To solve the backward jump problem, Sun proposed irredundant sorting encoding in [9]. This approach reorders the ten least significant bits of the address offset according to the value of the address offset.
In [10] Hui et al. presented a design approach to enhance the switching reduction efficiency by using a shifted Gray code for a given application.
In [11] Hajkazemi et al. proposed a power-aware bus encoding method. This method is based on some modifications in tree updating stage of the adaptive Huffman encoding.
The T0 encoding and bus-invert (BI) encoding were applied to an SoC in [12], and power savings of about 10% were obtained.
Loop prediction encoding
The existing address bus encoding approaches mainly focus on optimizing sequential instructions [4, 5, 6] or backward jump instructions [8, 9, 10] .
To further decrease the switching activity on the address bus, the LP encoding method exploits the observation that programs spend most of their execution time in small loops: generally, 10% of the code accounts for 90% of the execution time. This suggests that a large reduction in switching activity can be achieved by concentrating on loops. As a result, the switching activity can be reduced for both sequential instructions and backward jump instructions.
However, the loop start also needs to be detected. Let us reconsider the example shown in Fig. 1. In fact, a large number of backward jump instructions mark the start of a loop. The relationship between backward jumps and loops is shown in Fig. 2: for most applications, nearly 80% of backward jump instructions start a loop.
This section first explains the address bus switching activity reduction by using LP encoding. Then we present the detailed algorithm and depict the hardware implementation.
Address bus switching activity reduction
The address bus Switching Activity (SA) can be measured by the bit switching between all pairs of consecutive address code-words on the bus. For an n-instruction address sequence, the total SA is given by the following equation:
$A_i$ is the $i$th instruction address. The total SA of a program can also be divided into two parts, as the following equation shows:
$w_m$ and $w_h$ are the proportions of out-of-loop programs and loop programs, respectively, with $w_m \in [0, 1]$, $w_h \in [0, 1]$ and $w_m + w_h = 1$.
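The quantities above can be made concrete with a small sketch. It assumes that bit switching is counted as the Hamming distance between consecutive addresses and that the loop proportion is the fraction of fetches falling inside the loop's address range; the trace is a hypothetical one modelled on the loop at addresses 29 to 33, and all names are illustrative.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def switching_activity(trace):
    """Total SA: bit flips summed over all consecutive address pairs."""
    return sum(hamming(x, y) for x, y in zip(trace, trace[1:]))


def loop_proportion(trace, loop_lo, loop_hi):
    """Fraction of fetches whose address lies in [loop_lo, loop_hi]."""
    inside = sum(1 for a in trace if loop_lo <= a <= loop_hi)
    return inside / len(trace)


# Hypothetical trace: two instructions, a 5-instruction loop run 3 times, exit.
trace = [27, 28] + [29, 30, 31, 32, 33] * 3 + [34]

print(switching_activity(trace))        # 44
print(loop_proportion(trace, 29, 33))   # 15 of 18 fetches are in the loop
```

Even in this short trace the loop accounts for 15 of the 18 fetches, which is why removing the switching inside loops dominates the total.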
If the loop programs are correctly predicted, the address bus remains unchanged; that is, $SA_{\text{loop}}$ is zero. Hence, switching activity and power consumption are reduced. To estimate the SA saving ratio, we assume that $SA_{\text{out-of-loop}}$ is roughly equal to $SA_{\text{loop}}$.
Based on the above equations and analysis, the SA saving ratio can be easily obtained:
Eq. (6) indicates that if the proportion of loop programs is large enough, the LP encoding can substantially reduce switching activity and power consumption.
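The saving ratio can be rederived from the definitions in this subsection. The following is a sketch under the stated assumptions only; the symbols $SA_{\text{LP}}$ (SA of the encoded bus) and $R$ (saving ratio) are names introduced here for illustration.

```latex
\begin{align}
SA &= w_m \, SA_{\text{out-of-loop}} + w_h \, SA_{\text{loop}} \\
SA_{\text{LP}} &= w_m \, SA_{\text{out-of-loop}}
  \qquad \text{(loops correctly predicted, so the loop term vanishes)} \\
R &= \frac{SA - SA_{\text{LP}}}{SA}
   = \frac{w_h \, SA_{\text{loop}}}{w_m \, SA_{\text{out-of-loop}} + w_h \, SA_{\text{loop}}}
   \approx \frac{w_h}{w_m + w_h} = w_h
\end{align}
```

With $SA_{\text{out-of-loop}} \approx SA_{\text{loop}}$ and $w_m + w_h = 1$, the saving ratio reduces to the loop proportion $w_h$; a program that spends 90% of its fetches in loops would save roughly 90% of its switching activity under ideal prediction.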
The LP encoding algorithm
The LP encoding algorithm is shown in Table I. In LP encoding, if there is a backward jump, the encoder sets JumpFlag to one; otherwise, JumpFlag is zero. Moreover, two extra bits, called LoopStart and LoopHit, are added to the bus. LoopStart informs the decoder that there is a backward jump, and LoopHit informs it that the prediction address is hit.
In step 1, if the most significant bit of D is one (i.e., JumpFlag is one) for the first time, a backward jump has occurred. The control unit of the LP module moves from the idle state to the detect state, the register bank unit of the LP module saves $b^{(t)}$ to the loop base register, and the loop count register begins to count the length of the possible loop. At the same time, the loop offset register needs to save each address of the possible loop in every clock cycle. The memory space needed for an n-instruction loop on an m-bit-wide bus can be expressed by the following equation:
Eq. (7) indicates that storing full addresses causes a large area overhead. Note, however, that most instructions inside a loop are sequential or unchanged, as shown in Fig. 1. Therefore, a single bit suffices to record this information. For example, if $b^{(t)}$ is sequential, the value of the loop count register in the current clock cycle is taken as the index address and the corresponding bit of the loop offset register is set to one. Otherwise, if $b^{(t)}$ is unchanged, the corresponding bit is set to zero. The optimized memory space can be expressed by the following equation:
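One hedged reading of the two storage costs discussed above is given below. The symbols $M_{\text{full}}$ and $M_{\text{opt}}$ are names introduced here for illustration, and the expressions follow only from the description of storing either a full m-bit address or a single bit per loop instruction.

```latex
\begin{align}
M_{\text{full}} &= n \times m \ \text{bits} \\
M_{\text{opt}}  &= n \times 1 = n \ \text{bits}
\end{align}
```

For instance, a 16-instruction loop on a 32-bit address bus would need 512 bits in the first case and 16 bits in the second, not counting the loop base register.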
In step 2, if JumpFlag is one for the second time, the control unit of the LP module moves from the detect state to the predict state and starts to calculate the prediction address. In the first clock cycle, the prediction address is equal to the value of the loop base register. If the prediction address is equal to $b^{(t)}$ (i.e., there is a hit), $B^{(t)}$ remains unchanged and power consumption on the address bus is saved. The prediction count register then begins to count, and its value is taken as the index address to read the corresponding bit of the loop offset register. If the prediction count register does not exceed the loop count register, the bit read from the loop offset register is added to the prediction address. Otherwise, the prediction address is reset to the value of the loop base register. If the prediction address is not equal to $b^{(t)}$ (i.e., there is a miss), $b^{(t)}$ is directly assigned to $B^{(t)}$. The encoder and decoder then discard the saved information and wait for the next backward jump instruction. More formally, our encoding method can be described as follows:
$B_p^{(t)}$ is the prediction address of the LP module. The corresponding decoding method can be formally defined as follows:
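The steps above can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the hardware design: outside the predict state the address is sent unencoded, a detected loop is abandoned if a step is neither sequential nor unchanged, a miss that is itself a backward jump restarts detection, and register sizes are unbounded. The class and function names are hypothetical.

```python
class LoopPredictor:
    """Behavioural model of the LP module, shared by encoder and decoder."""

    def __init__(self):
        self.state = "idle"          # idle -> detect -> predict
        self.base = 0                # loop base register
        self.offsets = []            # loop offset register, one bit per step
        self.pred = 0                # current prediction address
        self.idx = 0                 # prediction count register
        self.prev = 0                # previous address, used to form the step

    def predict(self, jump):
        """Predicted address for this cycle, or None if none is available."""
        if self.state == "detect" and jump:
            return self.base         # second backward jump: predict the base
        return self.pred if self.state == "predict" else None

    def _advance(self):
        if self.idx < len(self.offsets):
            self.pred += self.offsets[self.idx]
            self.idx += 1
        else:                        # end of recorded body: wrap to the base
            self.pred, self.idx = self.base, 0

    def update(self, b, jump, hit):
        if self.state == "predict":
            if hit:
                self._advance()
            else:
                self.state = "idle"  # miss: discard the saved loop
        elif self.state == "detect":
            if jump and hit:
                self.state, self.pred, self.idx = "predict", self.base, 0
                self._advance()
            elif jump or b - self.prev not in (0, 1):
                self.state = "idle"  # step not representable in one bit
            else:
                self.offsets.append(b - self.prev)
        if self.state == "idle" and jump:
            self.state, self.base, self.offsets = "detect", b, []
        self.prev = b


def lp_encode(addresses):
    """Return (bus, LoopStart, LoopHit) triples for an address trace."""
    lp, words, bus, prev = LoopPredictor(), [], 0, None
    for b in addresses:
        jump = prev is not None and b < prev   # JumpFlag: sign of D
        hit = lp.predict(jump) == b
        if not hit:
            bus = b                            # miss: send the address itself
        words.append((bus, int(jump), int(hit)))
        lp.update(b, jump, hit)
        prev = b
    return words


def lp_decode(words):
    """Recover the address trace from (bus, LoopStart, LoopHit) triples."""
    lp, addresses = LoopPredictor(), []
    for bus, loop_start, loop_hit in words:
        b = lp.predict(bool(loop_start)) if loop_hit else bus
        addresses.append(b)
        lp.update(b, bool(loop_start), bool(loop_hit))
    return addresses


trace = [27, 28] + [29, 30, 31, 32, 33] * 4 + [34]
words = lp_encode(trace)
print(sum(hit for _, _, hit in words))   # cycles in which the bus is frozen
print(lp_decode(words) == trace)
```

In this hypothetical trace the first pass through the loop runs in the idle state, the second pass is used for detection, and the third and fourth passes are fully predicted, so the bus holds its value for ten consecutive cycles. Sharing one predictor class between both sides mirrors the requirement that encoder and decoder stay in the same state.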
The LP encoding implementation
The LP encoder architecture is shown in Fig. 3. The only input signal is $b^{(t)}$. The sub unit detects the backward jump, and MSB(D) is directly assigned to JumpFlag. The LP encoder employs an LP module to predict the instruction memory address. The LP module is divided into two main parts. One is the control unit, a Finite State Machine (FSM) that controls the state transitions of the LP module and calculates the output signals. The other is the register bank unit, which is composed of five registers. The working process of the encoder is specified in the previous subsection. Similarly, Fig. 4 shows the architecture of the LP decoder. $B^{(t)}$ and LoopStart are the inputs of the LP module, and LoopStart plays the same role as JumpFlag in the encoder. The LP decoder selects the decoded bus value based on the value of LoopHit.
Experiment results
To evaluate the proposed encoding method, some classic DSP benchmarks [13] are used. The LP encoding method is evaluated at gate level with the power analysis tool (PrimeTime PX) based on the switching activity. The gate-level netlist is generated by the synthesis tool (Design Compiler) with the technology library of the SMIC 90 nm process. To demonstrate the effectiveness of this method, the Offset-XOR-SM bus encoding is also implemented in our experiment. The simulation results are given in Table II. In Table II, the address bus SA with binary address encoding is given in column 2. Column 3 shows the SA with Offset-XOR-SM address bus encoding, and column 4 shows its percentage SA reduction compared with binary encoding. Columns 5 and 6 list the results obtained with the LP address bus encoding method.
As can be seen, the SA reduction rate of LP varies from application to application, ranging from 45.04% to 96.56%. The improvement of the LP encoding over Offset-XOR-SM is shown in Table II. The LP encoding is especially effective for applications that contain many loops.
The area and delay overheads are shown in Table III. As can be noted, the delay overhead of LP encoding is comparable to that of Offset-XOR-SM, but the area overhead of LP encoding is larger than that of the existing approach. Considering the reduction in switching activity obtained with LP encoding, we think this extra area overhead is acceptable.
The original address bus power consumption is given by Eq. (11), where x is the SA of $b^{(t)}$.
For a given encoding scheme, we employ Eq. (12) to calculate the power consumption as the sum of the encoder, bus and decoder power.
In Eq. (12), $P_{\text{c-encoder}}$ is consumed in the transmitting module and $P_{\text{c-decoder}}$ is consumed in the receiving module. Thus, the power con-
The $P_{\text{c-encoder}}$ and $P_{\text{c-decoder}}$ values of the two encoding methods are reported in Table IV. Since the load capacitance of the bus is normally several orders of magnitude higher than that of standard cells [2], effectively reducing $P_{\text{c-bus}}$ substantially reduces the total power consumption.
In our experiments, the total power savings for different values of bus capacitance are shown in Fig. 5. It can be seen that the LP encoding brings higher power savings than the Offset-XOR-SM encoding. As the bus capacitance increases, the power saving of each encoding approaches its SA reduction rate.
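The convergence of the power saving towards the SA reduction rate can be illustrated with a short sketch. It assumes that bus power takes the usual dynamic-power form $\frac{1}{2} C V^2 f x$ and that encoder plus decoder power is a constant independent of the bus capacitance; all numeric values below are hypothetical and are not measurements from this work.

```python
def bus_power(c_bus, sa, vdd=1.0, freq=1.0):
    """Assumed dynamic bus power: 0.5 * C * Vdd^2 * f * SA."""
    return 0.5 * c_bus * vdd ** 2 * freq * sa


def power_saving(c_bus, sa_orig, sa_enc, p_codec):
    """Relative saving of an encoded bus versus the unencoded bus.

    p_codec is the assumed constant encoder-plus-decoder power.
    """
    p_orig = bus_power(c_bus, sa_orig)
    p_encoded = bus_power(c_bus, sa_enc) + p_codec
    return 1.0 - p_encoded / p_orig


# Hypothetical numbers: the encoding removes 80% of the switching activity.
sa_orig, sa_enc, p_codec = 100.0, 20.0, 5.0
for c_bus in (1.0, 10.0, 100.0):
    print(c_bus, round(power_saving(c_bus, sa_orig, sa_enc, p_codec), 3))
```

With these illustrative values the saving rises from 0.7 to 0.79 to 0.799 as the capacitance grows, approaching the assumed SA reduction rate of 0.8 because the fixed encoder and decoder power becomes negligible relative to the bus power.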
Conclusions
An instruction memory address bus encoding method based on loop prediction has been proposed in this paper. The method effectively reduces switching activity for both sequential instructions and backward jump instructions. This paper analyzed bus switching reduction and identified the close relationship between backward jump instructions and loops: backward jump instructions generally mark the start of a loop. Hence, this paper proposed an LP encoding algorithm with which loops can be detected and predicted. The simulation results show that the switching activity reduction rate ranges from 45.04% to 96.56%. The area overhead has also been considered. To reduce it, this paper exploited the pattern of the instructions inside a loop and employed the loop base register and the loop offset register to calculate the prediction address.
The method proposed in this paper incurs small area and delay overheads in comparison with the power saving on the instruction memory address bus. Therefore, it is highly feasible and efficient.
