It is now widely accepted that interconnects introduce delays and consume power and chip resources. To deal with these problems, several studies have addressed performance optimization. However, as the results presented in this paper show, such techniques are not based on good criteria for interconnect performance optimization. We have, therefore, developed a high-level estimation tool based on transistor-level characteristics, which provides fast and accurate figures for both delay and power consumption. These results allowed us to create a new interconnect consumption model and also to determine new key issues that have to be taken into account for future performance optimizations.
INTRODUCTION
Today, Systems on Chip (SOCs) are increasingly complex and require many computational resources, implying a large volume of data to be stored or transmitted. To transfer this data from memory to processor or from one processor to another, on-chip interconnect buses or networks have to be used. In state-of-the-art SOCs, interconnect can represent up to 50% of the total power consumption [1, 2]. Moreover, transistor and wire dimension scaling has a strong impact on propagation time; indeed, the propagation time of a wire becomes higher than that of a gate [2]. Therefore, estimation and optimization of the power and delay due to interconnections have become a major issue in SOC design.
With the increase of the die size and the device count, more wires (which are getting longer) are needed for interconnections. One of the means used to help compensate for the interconnection density is to increase the number of metal layers. However, wire dimension scaling in turn increases wire resistance, which can be compensated by modifying the aspect ratio (wires are made thicker than they are wide). As a result, the crosstalk capacitance increases for the upper metal layers, which is accompanied by extra interconnect power consumption and propagation time [3]. In older technologies (over 250 nm), interconnects were made of aluminium and were separated from each other by an insulator whose relative permittivity was close to 4. With the change in manufacturing processes [4], aluminium has often been replaced by copper, whose conductivity is greater. With this change, the propagation time due to interconnects has decreased. Along with this replacement of aluminium by copper, insulators with low permittivity appeared. The advantage of these insulators is that they reduce crosstalk phenomena and wire-to-substrate capacitances. Despite these technological modifications, interconnects are now known to be a bottleneck. Thus, it is essential to take interconnect power consumption and delay into account during the first design stages of a system.
In this paper, after presenting a power consumption model of buses, we propose a new estimation tool that allows the user to obtain a number of different results about the power consumption of interconnect networks. Furthermore, we suggest a new transition classification from the power consumption point of view, which is different from the classical one defined from a propagation time point of view. Then, based on this new classification and on other statistical metrics, we suggest new ways to optimize interconnect performance (delay and power consumption).
This paper is organized as follows. Section 2 presents physical parameters and power performance models for wire and bus. Crosstalk effects are carefully presented and modelled. The estimation flow and our estimation tool are introduced in Section 3. In Section 4, the tool is used to analyze the criteria used by performance optimization techniques; some new ways for optimization are also discussed. The last section concludes this paper.
High-Level Interconnect Delay and Power Estimation, Courtay et al.
CHARACTERIZATION FLOW: FROM WIRE TO BUS
The first step for interconnect modelling is to represent the interconnect behaviour as realistically as possible. In order to obtain the highest precision in time and power consumption, experiments must be carried out at the physical level. Therefore, we decided to model interconnects at the transistor level using a SPICE simulator. This section introduces the physical modelling, from the wire to the complete bus system with all parasitic phenomena.
The Wire
The physical parameters that allow wires to be modelled are the resistance, the capacitance and the inductance. Their variation depends on the wire characteristics (metal composition, wiring level) as well as on its dimensions. Inductance has an impact only for very deep submicron technologies (below 45 nm) and for extremely long wires [5]. Furthermore, it is an important phenomenon when current peaks or large voltage variations occur (typically in the clock tree and the power lines). The simulations presented in this paper satisfy the conditions proposed in Ref. [6], which define the range of interconnect lengths in which inductance effects are not significant. Therefore, inductance is not taken into account in our model dedicated to interconnect buses. Thus, considering only an RC model for the wire gives accurate results for the technologies and bus lengths used in this paper.
It is possible to characterize the wire with elementary parameters which can be found in manufacturers' Design Kits. These parameters are the sheet resistance $R_{sq}$ [Ω/□], the wire-to-substrate capacitance per unit area $C_{sq}$ [F/m²] and the edge capacitance per unit length $C_e$ [F/m]. Using these three parameters, it is possible to compute the wire resistance and capacitance from its dimensions (see Fig. 1), where the length (L) and the width (W) are expressed in meters [m]. Note that $C_{sq}$ and $C_e$ depend on the height (H) between the wire and the substrate and thus on the metal level used for this interconnect. $C_s$ represents the total wire-to-substrate capacitance as expressed in Eq. (4) and $C_c$ the crosstalk capacitance as expressed in Eq. (5), according to the physical dimensions of the wire: length L, width W, thickness T and height H.
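The total resistance of the wire is given by the following equation:
$$R = R_{sq} \cdot \frac{L}{W} \quad (1)$$
where $R_{sq}$ denotes the sheet resistance of the metal layer.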
The total capacitance of the wire is the sum of two capacitances: the total wire-underside-to-substrate capacitance $C_{pp}$ (parallel-plate capacitance) and the total wire-edges-to-substrate capacitance $C_f$ (fringing capacitance):
$$C_{pp} = C_{sq} \cdot W \cdot L \quad (2)$$
$$C_f = 2 \cdot C_e \cdot L \quad (3)$$
The total wire-to-substrate capacitance can be expressed by:
$$C_s = C_{pp} + C_f \quad (4)$$
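The computation of the wire parasitics from its dimensions can be sketched as follows. This is only an illustration: it assumes the usual sheet-resistance form R = R_sq · L / W and counts the edge capacitance once per wire edge (factor 2), which is an assumption about how the design kit defines it. The function name, argument names and numerical values are hypothetical and are not taken from any design kit.

```python
def wire_parasitics(length_m, width_m, r_sheet, c_area, c_edge):
    """Return (R, C_pp, C_f, C_s) for a single wire.

    r_sheet: sheet resistance in ohm/square (assumed parameter)
    c_area:  wire-to-substrate capacitance per unit area, F/m^2
    c_edge:  edge capacitance per unit length and per edge, F/m (assumed)
    """
    resistance = r_sheet * length_m / width_m
    c_pp = c_area * width_m * length_m      # parallel-plate part
    c_f = 2.0 * c_edge * length_m           # fringing part, two edges (assumption)
    return resistance, c_pp, c_f, c_pp + c_f


# Hypothetical values for a 1 mm long, 0.2 um wide wire.
R, C_pp, C_f, C_s = wire_parasitics(1e-3, 0.2e-6, 0.1, 4e-5, 4e-11)
```

With these illustrative numbers the fringing term dominates the parallel-plate term, which is the typical situation for narrow wires.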
Now, we consider the distribution of R and C along the wire in order to model its behaviour as accurately as possible. The lumped model is a simple interconnect model that places the total values of R and C found previously end to end. However, it is much less accurate in terms of propagation time than a model in which R and C are distributed. In a distributed model, the R and C values can be split indefinitely. For the π3 model, which consists in splitting the wire resistance into three and the wire capacitance into four, the values obtained in terms of time are close to the experimental ones. We have retained the π3 model for our experiments because of its simplicity and its precision (timing estimation error below 5%) [7].
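The difference between the lumped model and a three-section distributed ladder can be illustrated with a first-order Elmore delay estimate. This is a sketch only: the Elmore metric is not the method used in this paper (which relies on SPICE simulation), and the capacitor weights chosen here (half-size capacitors at both ends of the ladder) are an assumption about how the capacitance is split into four.

```python
def pi3_ladder(r_wire, c_wire):
    """Split a wire into 3 series resistors and 4 shunt capacitors."""
    resistors = [r_wire / 3.0] * 3
    # Assumed weights: C/6 at both ends, C/3 on the two inner nodes.
    capacitors = [c_wire / 6.0, c_wire / 3.0, c_wire / 3.0, c_wire / 6.0]
    return resistors, capacitors


def elmore_delay(resistors, capacitors):
    """Elmore delay at the far end of an RC ladder driven by an ideal source.

    capacitors[0] sits on the driven node and therefore sees no resistance.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistors, capacitors[1:]):
        upstream_r += r            # resistance between the source and this node
        delay += upstream_r * c
    return delay


R_WIRE, C_WIRE = 500.0, 88e-15     # hypothetical wire values
lumped = elmore_delay([R_WIRE], [0.0, C_WIRE])
distributed = elmore_delay(*pi3_ladder(R_WIRE, C_WIRE))
```

Under these assumptions the lumped estimate is R·C, whereas the three-section ladder gives R·C/2, which is why a lumped model overestimates the wire delay.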
The Bus
Consider an n-bit bus, which consists of n parallel wires of the same length and at least 2n buffers (n input buffers, n output buffers and possibly others, if bufferization is used), and which allows data propagation between two cells. Using several wires gives rise to a new capacitive coupling between the wires. The coupling capacitance between two adjacent wires, known as the crosstalk capacitance $C_c$, depends on the facing areas; it therefore depends on the following dimensions: wire thickness (T), wire length (L) and wire spacing (S).
$$C_c = \varepsilon_0 \cdot \frac{T \cdot L}{S} \quad (5)$$
where $\varepsilon_0$ represents the SiO2 permittivity.
When transitions occur on adjacent wires, unwanted noise is generated by the coupling capacitance. The noise due to crosstalk is relatively localized. In general, a system with crosstalk is modelled by neglecting higher-order effects on non-adjacent wires. Thus, we only consider the effect on three wires, as represented in Figure 1. The coupling capacitance between wires is also distributed over the nodes of the distributed π3 RC model defined previously, as represented in Figure 2. The crosstalk phenomenon and its associated effects are explained in more detail below.
The effects due to crosstalk can be summarized into three categories.
• The first one is that crosstalk induces noise; indeed, the coupling capacitance between adjacent wires introduces a permanent link between them. When a transition occurs on a wire (the aggressor), its neighbours (the victims) are affected, because a voltage peak is generated on them [8]. There are two categories of coupling, positive and negative, which represent the situations where the amplitude of the noise exceeds a positive or a negative voltage value on the victim, respectively. The noise peaks above GND or below $V_{dd}$ are the most troublesome, because they can cause errors if they cross the threshold voltage of the buffer at the end of the bus (cf. Fig. 3(a)). With technology shrinking, noise due to crosstalk increases compared to the overall noise, since the coupling capacitance rises as well. As a result, the voltage peak generated by the coupling capacitance becomes more and more significant compared to the voltage swing on the bus.
• A second issue is the increase in propagation time. When the victim and its aggressor(s) switch simultaneously, a voltage peak is generated. Depending on the configuration of the transitions, this peak can slightly accelerate (simultaneous transitions in the same direction) or slow down (simultaneous transitions in opposite directions) the propagation on the victim wire (cf. Fig. 3(b)). A transition classification has been carried out according to the propagation time on the victim; it is presented in Table I. In the best case, when the wires switch in the same direction, the delay is the one obtained without crosstalk (i.e., g = 1). However, data transmissions on the bus must be clocked while taking the worst-case propagation time into account (i.e., g = 1 + 4r). Considering a real-world case, $C_c = C_s$ (i.e., r = 1), the propagation time can be increased fivefold or more [7].
• Finally, the last issue is the increase in power consumption. Indeed, the power consumption depends linearly on the capacitance presented by a device. Since the wire capacitance $C_{eff}$ depends on the crosstalk capacitance value (cf. Table I), crosstalk contributes to the increase in dynamic power consumption [9].

Table I. Effective capacitance $C_{eff}$ and delay factor g of the victim wire and corresponding transition patterns. g represents the delay factor and r the ratio of crosstalk capacitance to wire-to-substrate capacitance. ↑ represents a rising transition, ↓ a falling one and - means that there is no transition on the wire. Transition patterns are written as (transition on aggressor wire 1, transition on the victim wire, transition on aggressor wire 2).

| $C_{eff}$ | Transition patterns | g |
|---|---|---|
| $C_s$ | (↑,↑,↑), (↓,↓,↓) | 1 |
| $C_s + C_c$ | (-,↑,↑), (↑,↑,-), (-,↓,↓), (↓,↓,-) | 1 + r |
| $C_s + 2C_c$ | (-,↑,-), (-,↓,-), (↓,↑,↑), (↑,↑,↓), (↑,↓,↓), (↓,↓,↑) | 1 + 2r |
| $C_s + 3C_c$ | (-,↑,↓), (↓,↑,-), (-,↓,↑), (↑,↓,-) | 1 + 3r |
| $C_s + 4C_c$ | (↓,↑,↓), (↑,↓,↑) | 1 + 4r |

The last parameters that have to be defined for bus modelling are the resistance as well as the input and output capacitances of the buffers involved in the bus. The number of buffers depends on the bus length and on the buffering technique used. These parameters can easily be found using the transistor dimensions and the parasitic parameters provided in technology libraries, together with the formulas described in Ref. [7]. We have seen previously that changes in the resistive and capacitive parameters introduce delays in the data propagation time. Thus, when this delay becomes too critical, especially for long interconnects, designers have to use buffer insertion methods to accelerate the data propagation. Much work has already been done on buffer insertion for interconnects [10-12]. These methods are based on the propagation time formula given in Ref. [13]. Their aim is to find the optimal number ($K_{opt}$) and strength ($H_{opt}$) of buffers in order to obtain the best temporal performance.
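The delay-factor classification of Table I can be sketched in a few lines. Each transition is encoded as +1 (rising), -1 (falling) or 0 (no transition), and the class index j is the sum of the relative swings between the victim and each aggressor, so that $C_{eff} = C_s + j \cdot C_c$ and g = 1 + j·r. The function names and the numerical encoding are illustrative choices, not part of the original tool.

```python
RISE, FALL, QUIET = 1, -1, 0


def crosstalk_class(aggressor_1, victim, aggressor_2):
    """Return j such that C_eff = C_s + j * C_c for a switching victim.

    Returns None when the victim does not switch, since the
    classification only covers transitions of the victim wire.
    """
    if victim == QUIET:
        return None
    return abs(victim - aggressor_1) + abs(victim - aggressor_2)


def delay_factor(aggressor_1, victim, aggressor_2, r):
    """Delay factor g = 1 + j * r, with r = C_c / C_s."""
    j = crosstalk_class(aggressor_1, victim, aggressor_2)
    return None if j is None else 1 + j * r


best = delay_factor(RISE, RISE, RISE, r=1.0)    # same direction everywhere
worst = delay_factor(FALL, RISE, FALL, r=1.0)   # both aggressors opposed
```

With r = 1, the best case gives g = 1 and the worst case g = 5, matching the fivefold slowdown mentioned above.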
Knowing the physical parameters of the wires and the bus, the propagation time and power consumption can be modelled. The first step of the modelling process is to identify which parameters impact significantly on delay and power consumption.
• The first parameter is the technology used and its associated number of metal layers. Each metal layer has its own physical characteristics (dimensions) and usage: the lowest, thinnest layers are used for local interconnections, and the highest, thickest and widest layers are used for global interconnections and power distribution [2].
• The second parameter is the metal layer used in the considered technology. It is well known that wire resistance and capacitance vary with the metal layer, since dimensions (thickness, spacing, height and width) differ with the layer used.
• The third parameter is the wire length, since it impacts capacitance and resistance. When interconnects are quite long or when time is critical, it is necessary to insert repeaters along the wires; thus, both repeated and non-repeated lines have to be modelled. Wire length impacts the driver, repeater and termination buffer sizes, which are included in our model.
• Since crosstalk capacitances affect power consumption and propagation time, as shown in Table I, the different kinds of transitions are also parameters to be modelled. As previously noted and shown in Figure 2, the crosstalk capacitances are spread out along the wire using the π3 model.
Using these parameters, power consumption and delay modelling can be realized at the circuit level using SPICE simulations (we used ELDO v5.7 in this paper). These simulations have been done for three different technologies (130, 90 and 65 nm). The results obtained with SPICE, in terms of time and energy consumption, have been summarized in multi-input tables for the various parameters mentioned previously. These tables are used by the high-level estimation tool that will be presented in the next section.
ESTIMATION FLOW

Estimation Flow Presentation
A tool, called Interconnect Explorer, has been developed for high-level estimation of interconnect performance. This tool is based on energy and timing multi-input tables. These tables depend on input parameters (technology, metal layer, wire length, buffer and repeater size, transition type) and their values are obtained with transistor-level characterization. The estimation flow used by Interconnect Explorer is shown in Figure 4 and detailed below.
When using Interconnect Explorer, users have to choose their bus configuration by setting the following parameters in the tool configuration window (Fig. 5): technology, metal layer, bus length, bus width, frequency and bufferization type. Users must also provide an input file which contains the data that the bus is handling.
Some additional plug-ins have been included in this tool to compute the commutation rate per bit on the bus as well as the probability of appearance of each transition class (defined in Table I). Commutation rates per bit are obtained by using the data input file. We compute the activity on each wire as the ratio of the number of transitions on the wire to the total number of data words.
Similarly, the probability of appearance of each transition class is obtained by computing the ratio of the number of occurrences of each transition class to the total number of transition occurrences.
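The two statistics described above can be sketched as follows. The sketch assumes that the data file has already been parsed into a list of integer bus words, and that a wire at the edge of the bus sees a quiet neighbour on its missing side; both points are assumptions, as is every name used here, since the internals of the plug-ins are not described.

```python
from collections import Counter


def bus_statistics(words, n_bits):
    """Return (activity per wire, probability of each transition class).

    words:  successive integer values sent on the bus
    n_bits: bus width; bit i of a word is carried by wire i
    """
    transitions = [0] * n_bits
    classes = Counter()
    for prev, curr in zip(words, words[1:]):
        # +1 for a rising wire, -1 for a falling wire, 0 for a quiet wire.
        delta = [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(n_bits)]
        for i, d in enumerate(delta):
            if d == 0:
                continue
            transitions[i] += 1
            left = delta[i - 1] if i > 0 else 0             # assumed quiet edge
            right = delta[i + 1] if i < n_bits - 1 else 0   # assumed quiet edge
            classes[abs(d - left) + abs(d - right)] += 1
    activity = [t / len(words) for t in transitions]
    total = sum(classes.values())
    probability = {j: classes[j] / total for j in range(5)} if total else {}
    return activity, probability


activity, probability = bus_statistics([0b000, 0b111, 0b101, 0b010], n_bits=3)
```

The class index follows Table I: index j corresponds to the delay factor 1 + j·r.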
After configuration, Interconnect Explorer provides users, in the output window (Fig. 6), with results in terms of:
• energy consumption,
• static power consumption,
• average dynamic power consumption,
• maximum dynamic power consumption,
• instantaneous dynamic power consumption,
• maximum frequency allowed on the bus (determined by the worst-case transition),
• area of the bus (area for wires and buffers),
• commutation rate per bit (useful to evaluate performance optimization techniques), and
• percentage of appearance of each transition class of Table I (the same remark as above).
In the next subsection, Interconnect Explorer is used to check the transition classification presented in Table I.
Transition Table Validity
In the state of the art on performance optimizations for interconnects, most of the proposed techniques eliminate the most penalizing transition classes of Table I (i.e., 1 + 3r and 1 + 4r). For instance, we can cite the shielding [14, 15], skewing [16] and temporal coding [17] techniques. As these techniques assume that the transitions that are the most penalizing for delay are also the most penalizing for power consumption, we checked this assumption using Interconnect Explorer.
Experiments have been carried out using our tool on the metal layers reserved for buses, with a length of 1 mm in the 65 nm technology (note that the results are the same for other technologies and metal layers). Table II shows the propagation time and the power consumption for different transition patterns of bus data. Results are given for the simple bufferization case of Figure 7(a). Firstly, the results of Table II (transitions are ordered from the lowest to the highest value) show that the temporal transition classification, according to the capacitance seen by the victim wire, is the same as the one presented in Table I. Secondly, it is important to note that the transition classification from a power consumption point of view is not the same as the delay classification. In the right part of Table II, on power consumption, two parts can be seen:
• In the upper part of the table, transitions are exclusively rising and are ordered from the lowest to the highest capacitance. It is very important to note that the power consumed by rising transitions is always similar, whereas for falling ones it increases with the capacitance.
To understand why falling transitions consume more power than rising ones, it is necessary to know when current is drawn from the power supply. Two cases can be considered: interconnect lines can be simply bufferized (one buffer at the input and another at the end of the line, Fig. 7(a)) or fully bufferized (Fig. 7(b)), according to the desired performance. Depending on the transition type (rising or falling), the line capacitance (or the capacitance of the line segments in the full bufferization case) is charged or not, as illustrated by Figure 7.
Fig. 7. Line or segments charged (in bold) according to the bufferization state: (a) simple bufferization and (b) full bufferization. The upper or lower part of the inverter symbol is blackened depending on which inverter transistor (PMOS for the upper part, NMOS for the lower part) is activated by the wire transition type.
• For simple bufferization (cf. Fig. 7(a)), when a rising transition occurs, the NMOS transistor is activated, and thus the line capacitance is not charged through the power supply. In the other case, when a falling transition occurs, it is the PMOS transistor which is activated, and thus the line capacitance is charged by current coming from the power supply.
• For full bufferization (cf. Fig. 7(b)), the number of buffers inserted in the line must be even, so that the signal at the output of the line is the same as at its input. Thus, there is always an additional segment to be charged with falling transitions compared to rising ones. Consequently, falling transitions are more penalizing in terms of power consumption than rising ones.
Our experiments allow us to conclude that the transitions which are the worst from the point of view of delay and power consumption are not the same, since falling transitions consume more energy than the rising ones (all are classified according to the importance of the capacitance presented by the line).
In the rising case, the power consumption varies by only 5.6% around the average value, so all rising transitions can be classified in the same category. In fact, the power consumption for rising transitions is due to the short-circuit ("shortcut") path between the power supply and the ground during output switching. During a falling transition (a rising transition at the output of an inverter), the charging energy of the wire capacitance drawn from the power supply depends on the transition type, and thus on the wire capacitance value. This stored energy is then released to the ground during the next rising transition.
A key point for future power optimization could be to encode data such that falling transitions on the bus occur with the lowest crosstalk capacitance (e.g., transition (↓, ↓, ↓)) and thus consume as little energy as possible. Table III shows the transition pattern classification for power consumption.
Table III. Transition pattern classification for energy consumption, where j is the transition pattern type (varying from 0 to 4) and $E_{ij}$ is the energy consumption of a type-j transition pattern on wire i.

With this new transition pattern classification, a more accurate dynamic power consumption model is defined in this section. The energy consumed on an $N_{bit}$-bit bus is defined by
$$E = \sum_{i=0}^{N_{bit}-1} \sum_{j=0}^{4} P_{ij} \cdot E_{ij} \quad (6)$$
where $P_{ij}$ is the probability of having a type-j transition pattern on wire i and $E_{ij}$ is the corresponding energy consumption. Note that j varies from 0 to 4, as shown in Table III, whereas i varies from 0 to $N_{bit} - 1$. For a full transition cycle (a rising transition on a wire is, of course, followed by a falling one), the energy consumption $E_{ij}$ can be computed by the following equation:
$$E_{ij} = E_{shortcut} + C_{L_{ij}} \cdot V_{dd} \cdot V_{swing} \quad (7)$$
where $E_{shortcut}$ is the energy consumption due to the shortcut path between the power supply and the ground during output switching for a rising transition, and the $C_{L_{ij}} \cdot V_{dd} \cdot V_{swing}$ term represents the energy consumption due to charging the load capacitance and to the shortcut. In this term, $V_{dd}$ represents the supply voltage, $V_{swing}$ the switching voltage, and $C_{L_{ij}}$ the load capacitance of a type-j transition pattern on wire i, which can be computed as $C_{L_{ij}} = C_s + j \cdot C_c$, as shown in Table III, with $j \in \{0, \dots, 4\}$. The dynamic power consumption can be computed by
$$P_{dynamic} = E \cdot F \quad (8)$$
where F is the data sending frequency on the bus. By substituting Eq. (6) in Eq. (8), we obtain for $P_{dynamic}$:
$$P_{dynamic} = F \cdot \sum_{i=0}^{N_{bit}-1} \sum_{j=0}^{4} P_{ij} \cdot E_{ij} \quad (9)$$
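The energy and power model described above can be sketched numerically. The sketch assumes that the energy of a type-j pattern is the shortcut energy plus the load-capacitance term with $C_L = C_s + j \cdot C_c$, that the bus energy is the probability-weighted sum over wires and pattern types, and that the dynamic power is this energy multiplied by the data frequency. All names and values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pattern_energy(j, c_s, c_c, v_dd, v_swing, e_shortcut):
    """Energy of a type-j transition pattern over a full transition cycle."""
    c_load = c_s + j * c_c
    return e_shortcut + c_load * v_dd * v_swing


def bus_energy(prob, c_s, c_c, v_dd, v_swing, e_shortcut):
    """Probability-weighted energy; prob[i][j] is P_ij for wire i, type j."""
    return sum(
        p_ij * pattern_energy(j, c_s, c_c, v_dd, v_swing, e_shortcut)
        for row in prob
        for j, p_ij in enumerate(row)
    )


def dynamic_power(energy, frequency):
    """Dynamic power for a data sending frequency F."""
    return energy * frequency


# Two-wire example with normalized, purely illustrative values.
prob = [
    [1.0, 0.0, 0.0, 0.0, 0.0],   # wire 0 only sees type-0 patterns
    [0.0, 0.0, 0.0, 0.0, 1.0],   # wire 1 only sees type-4 patterns
]
energy = bus_energy(prob, c_s=1.0, c_c=1.0, v_dd=1.0, v_swing=1.0, e_shortcut=0.5)
power = dynamic_power(energy, frequency=2.0)
```

In this toy case the type-4 wire costs 5.5 units against 1.5 for the type-0 wire, which illustrates how strongly the pattern type weighs on the total.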
ANALYSIS OF PERFORMANCE OPTIMIZATION TECHNIQUES
In this section, a non-exhaustive state of the art of performance optimization techniques at different abstraction levels is discussed. Then, the impact of these techniques on the key parameters affecting timing and consumption is analyzed using Interconnect Explorer.
Technological Level
Wire Sizing
Since it is known that wire capacitances depend on physical dimensions, the first optimization method to be considered here consists in changing the wire dimensions:
• Height (H) and width (W) to reduce the wire-to-substrate capacitance ($C_s$);
• Thickness (T) and spacing (S) to reduce the crosstalk capacitance ($C_c$).
The method suggested in Ref. [18] consists in modifying the spacing between wires in a non-uniform way; i.e., a wire i and its neighbour i + 1 are separated by a space $S_1$, the wire i + 1 and its neighbour i + 2 are separated by a space $S_2$, and so on. Results show a reduction of the bus power consumption of up to 30% as well as a decrease in propagation time, since the crosstalk capacitance decreases. Unfortunately, this is achieved at the cost of extra bus area, since the spacing between wires has to be increased, and the design rules have to be modified as well.
Spatial Shielding
Shielding consists in inserting additional wires between bus wires. These additional wires can have logical levels which are either fixed (static shielding) or which change with the transmitted data (dynamic shielding).
The first type of static shielding consists in inserting, between each pair of bus signals, a wire connected to the ground or to the supply voltage, which eliminates all cross-transitions (such as types 1 + 3r and 1 + 4r). So, there is a strong acceleration of the data transmission, since only 1 + 2r transitions remain. On the other hand, all transitions of type 1 and 1 + r are eliminated as well. Indeed, each victim wire that switches has two neighbours whose levels do not change, so the least consuming transitions are replaced by more consuming ones. The data activity remains unchanged.
An evolution of this technique can be found in Ref. [14], where the shield wires are alternately connected to the ground and to the supply voltage. Performance in terms of speed and power consumption is the same as with the previous technique. Besides shielding, one advantage of this technique is that it interleaves the power supply lines throughout the chip according to the pattern GND, $S_1$, $V_{dd}$, $S_2$ (where $S_i$ is the signal on the ith wire).
In Ref. [15] , the selected shielding technique is as follows: the shield wire has the logical level of the logic AND of its two neighbours. Since the shield wire level moves with each data transmission, this shielding is called dynamic shielding.
Another very simple method of shielding consists in duplicating each signal, i.e., transmitting on the shield wire the same signal as its neighbour [19]. The acceleration brought about by this technique is greater than that of the technique presented in Ref. [14], because the case where the two aggressors are stable is eliminated.
In conclusion, the main advantage of shielding techniques is that they considerably speed up data transmission on the bus, since the worst cases of Table I are eliminated. On the other hand, they are not efficient in terms of area, since the number of wires doubles, and they have a limited impact on energy.
Logical/Circuit Level
Signal Skewing
The solution presented in Ref. [16] consists in intentionally shifting the signals to avoid simultaneous transitions on two neighbouring wires. Even and odd wires are shifted in time; thus, a wire (even or odd) changes when its two neighbours are stable. In this manner, the worst-case transition is 1 + 2r. The acceleration brought about by this technique is very limited, because the data transmission is slowed down: some latency is added between the transmission of the even and odd bits. Simulation results indicate an acceleration of 5 to 20%. This technique has been evaluated only through simulations and needs a complex design for the transmitter and receiver clocks; moreover, no implementation is proposed.
Dynamic Voltage Scaling
The authors of Ref. [20] propose to use different values of the supply voltage for the buffers in order to limit voltage excursions on the lines. The principle of the method is the dynamic adaptation of the supply voltage (Dynamic Voltage Scaling, DVS) of these repeaters according to the operating frequency imposed on them. Simulation results show an average 4.6-fold reduction of the dynamic power, together with a 15.2% latency reduction. This technique implies the addition of several analog control blocks to control the voltage switching.
Architectural Level
The majority of the optimization techniques have been proposed at the architectural level. The n bits carried by the bus are coded into m bits, with m ≥ n, such that the coded data activity is lower than the original one. The various data coding techniques are optimized either for the address bus or for the data bus.
Gray Code (Address Bus)
The idea of this coding technique, presented in Refs. [21, 22], is to have only one transition on the bus for each pair of consecutively accessed addresses. This coding is called the Gray code or reflected binary code. The experiments carried out in Ref. [22] report a 33% activity reduction and a 77% power consumption reduction on the bus. However, for wide buses, the decoder has a long critical path, because its gates are cascaded from MSB to LSB.
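The Gray coding principle can be sketched as follows. This is a generic reflected-binary implementation written for illustration, not the circuit of Refs. [21, 22]; the function names are arbitrary. The decoder loop mirrors the MSB-to-LSB cascade mentioned above, since each decoded bit depends on the previously decoded, more significant bit.

```python
def binary_to_gray(value):
    """Encode an address: each output bit needs a single XOR."""
    return value ^ (value >> 1)


def gray_to_binary(gray, n_bits):
    """Decode with an XOR chain running from the MSB down to the LSB."""
    value = 0
    bit = 0
    for i in reversed(range(n_bits)):
        bit ^= (gray >> i) & 1      # depends on the previously decoded bit
        value |= bit << i
    return value


def toggled_wires(a, b):
    """Number of bus wires that switch between two successive words."""
    return bin(a ^ b).count("1")


# Going from address 7 to address 8 on a 4-bit bus.
plain = toggled_wires(7, 8)
coded = toggled_wires(binary_to_gray(7), binary_to_gray(8))
```

For this pair of consecutive addresses, plain binary toggles four wires whereas the Gray-coded bus toggles one.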
T0 Code (Address Bus)
The idea suggested in Ref. [23] is to add a wire called INC that is set to 1 when the accessed addresses are consecutive. This technique reduces the activity to 0 when the accessed addresses are consecutive, which greatly reduces consumption on the bus. Unfortunately, the coder and the decoder are rather complex (register files, adders, multiplexers) and, for buses of reasonable length, their power consumption is greater than the power saved on the bus. An evolution of this technique can be found in Ref. [24], where several incremental steps are defined for consecutive accesses.
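A minimal behavioural sketch of the T0 principle is given below. It assumes that the address lines simply keep their previous value while INC is 1, and that the decoder regenerates the address by adding the stride to the last decoded address. The `stride` parameter and all function names are illustrative assumptions, not details taken from Ref. [23].

```python
def t0_encode(addresses, stride=1):
    """Return a list of (bus_value, inc) pairs for an address stream."""
    encoded = []
    prev_addr = None
    bus = 0
    for addr in addresses:
        if prev_addr is not None and addr == prev_addr + stride:
            inc = 1            # consecutive access: the bus lines are frozen
        else:
            inc = 0
            bus = addr         # non-consecutive access: send the address
        encoded.append((bus, inc))
        prev_addr = addr
    return encoded


def t0_decode(encoded, stride=1):
    """Rebuild the address stream from (bus_value, inc) pairs."""
    decoded = []
    prev_addr = None
    for bus, inc in encoded:
        addr = prev_addr + stride if inc else bus
        decoded.append(addr)
        prev_addr = addr
    return decoded


stream = [100, 101, 102, 103, 200, 201]
coded_stream = t0_encode(stream)
```

The decoder needs an adder and a register holding the last address, which hints at the codec overhead discussed above.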
Bus Invert/Partial Bus Invert (Data Bus)
The idea of the coding technique presented in Ref. [25] is to count the number of bits that change between data n − 1 at clock cycle t − 1 and data n at clock cycle t. If this number (the Hamming distance) is greater than half of the bus width, then the data n sent at cycle t is inverted. It is necessary to send to the decoder an additional line, called INV, indicating whether the received data must be inverted. This technique, called Bus Invert, is efficient for wide buses with high data activity. However, if Bus Invert is applied to the whole bus, it is likely to be triggered less often, because the MSBs are often correlated. Therefore, in Ref. [26], Bus Invert is only applied to the part of the bus which has the greatest activity; this coding is called Partial Bus Invert.
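The Bus Invert decision can be sketched as follows. The sketch assumes that the Hamming distance is measured against the word actually present on the bus (i.e., the previously encoded word) and that the bus starts at 0; both points, as well as the function names, are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, n_bits):
    """Return a list of (sent_word, inv) pairs."""
    mask = (1 << n_bits) - 1
    encoded = []
    prev = 0                                   # assumed initial bus state
    for word in words:
        if hamming(prev, word) > n_bits / 2:
            sent, inv = (~word) & mask, 1      # invert to limit toggling
        else:
            sent, inv = word, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, n_bits):
    """Undo the inversion using the INV line."""
    mask = (1 << n_bits) - 1
    return [(~sent) & mask if inv else sent for sent, inv in encoded]


data = [0b0000, 0b1111, 0b1110, 0b0001]
coded_data = bus_invert_encode(data, n_bits=4)
```

After encoding, at most half of the data wires toggle between two successive words, at the price of the extra INV wire.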
Code Book (Data Bus)
The aim of the Code Book coding technique presented in Ref. [27] is to store i old values transmitted on the bus, and to transmit at the current cycle the value which has the least Hamming distance w.r.t. those transmitted at the i previous cycles. It is then also necessary to send the code to the decoder on 2 i additional wires. The results show a decrease in bus consumption, but only for extremely long wires (more than 7.5 cm), which are not found in SOCs nowadays.
Temporal Shielding (Data Bus)
The temporal shielding presented in Ref. [17] (referred to below as Code 0) consists in forcing the bus down to 0 between successive data words. In this manner, no cross-transition is left, so that the worst case is 1 + 2r. On the other hand, it requires twice the frequency of the uncoded case, as two words are transmitted for only one useful one. Moreover, forcing the bus to 0 between successive data words introduces undesired transitions; therefore, the activity, and thus the power consumption, increases.
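The effect of temporal shielding on the worst-case transition class can be checked with a short sketch. It inserts an all-zero word after each data word and evaluates the highest class index j (delay factor 1 + j·r) seen on any wire, assuming a quiet neighbour at the bus edges. The function names and the test data are illustrative only.

```python
def temporal_shield(words):
    """Force the bus to 0 after each transmitted data word."""
    shielded = []
    for word in words:
        shielded.extend([word, 0])
    return shielded


def worst_class(words, n_bits):
    """Highest class index j observed on any switching wire."""
    worst = 0
    for prev, curr in zip(words, words[1:]):
        delta = [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(n_bits)]
        for i, d in enumerate(delta):
            if d == 0:
                continue
            left = delta[i - 1] if i > 0 else 0             # assumed quiet edge
            right = delta[i + 1] if i < n_bits - 1 else 0   # assumed quiet edge
            worst = max(worst, abs(d - left) + abs(d - right))
    return worst


data = [0b101, 0b010, 0b101]
before = worst_class(data, n_bits=3)
after = worst_class(temporal_shield(data), n_bits=3)
```

On this alternating pattern the unshielded bus reaches class 4, while the shielded stream never exceeds class 2, because all switching wires move in the same direction within a cycle. The stream is, however, twice as long.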
In Ref. [17], the authors propose another temporal coding called Code 1, which aims at reducing the data activity of the previous Code 0 technique. A two-bit sequence on a wire is coded into a four-bit sequence, as shown in Table IV. The worst-case transition of Code 1 will be a 1 + 2r transition. The results show a 6.7% power consumption reduction on the bus for a 1 mm wire, but the coded data activity of Code 1 is still slightly higher than the original data activity.

Table IV. Correspondence between original blocks and coding schemes.

| Original block | Code 1: coded block | Code 2: 1st coded block | Code 2: 2nd coded block |
|---|---|---|---|
| 00 | 0000 | 0001 | 0000 |
| 01 | 0001 | 0011 | 1000 |
| 10 | 0011 | 0111 | 1100 |
| 11 | 0111 | 1111 | 1110 |
To avoid the activity increase introduced by the coding, the authors propose a final evolution of Code 1, called Code 2. For the same sequence of two bits, Code 2 introduces two alternative coded blocks of four bits, as shown in Table IV.
During the consecutive transmission of two two-bit sequences, the first and the second coded blocks are used alternately. The coded data activity is then the same as the original one, with a worst-case transition of 1 + 2r. The bus power consumption reduction is better than for both preceding techniques (up to 18.7% on the bus). Unfortunately, the authors of these temporal codes provide no data about the codec power consumption.
Table V summarizes the advantages and drawbacks of the techniques presented in this section. The power/energy consumption, activity and timing are numerically quantified when figures are given in the referenced papers. If only a tendency is shown, it is represented by arrows; otherwise, it is represented by a question mark. In the next section, our tool is used to quickly evaluate the impact on timing and power consumption of some of the techniques presented in Table V.
Optimization Techniques Analysis by Interconnect Explorer
To quickly evaluate which techniques perform best, Interconnect Explorer has been applied to some of the previously presented techniques for data buses. The results have been obtained using stimuli data files (picture, music, speech) for the 65 nm technology, on two bus metal layers (metal layers 2 and 4) with a length of 1 mm. Table VI shows the estimation results for activity, worst-case capacitance, bus area (excluding codec area), propagation time, and energy consumption on the bus. Here, Partial Bus Invert is Bus Invert applied to the three least significant bits. As shown in Table VI, all the presented techniques reduce the propagation time. On the other hand, the largest reduction in activity and bus energy consumption is obtained with Partial Bus Invert (11.4%). Therefore, one optimization could be to split the bus and apply coding techniques only where consumption is the highest. This key issue for future optimization is discussed in the next subsection.
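For illustration, Partial Bus Invert can be sketched as Bus Invert restricted to the k least significant wires. The sketch assumes the classical majority rule (invert when more than half of the selected wires would toggle with respect to the previously transmitted value) and one extra invert line. It is not the implementation evaluated with Interconnect Explorer, and all names are hypothetical.

```python
def partial_bus_invert(words, width, k):
    """Encode a word stream; returns (bus_word, invert_flag) pairs.

    Only the k least significant bits are candidates for inversion
    (assumed majority rule); the upper bits are sent unchanged.
    """
    low_mask = (1 << k) - 1
    bus_mask = (1 << width) - 1
    prev_low = 0  # assumed initial state of the coded wires
    coded = []
    for word in words:
        low = word & low_mask
        high = word & bus_mask & ~low_mask
        invert = 2 * bin(low ^ prev_low).count("1") > k
        if invert:
            low ^= low_mask
        prev_low = low
        coded.append((high | low, int(invert)))
    return coded


def partial_bus_invert_decode(coded, k):
    """Recover the original words from (bus_word, invert_flag) pairs."""
    low_mask = (1 << k) - 1
    return [word ^ low_mask if flag else word for word, flag in coded]


words = [0b00000000, 0b00000111, 0b10000110]
coded = partial_bus_invert(words, width=8, k=3)
print(coded)  # [(0, 0), (0, 1), (129, 1)]
print(partial_bus_invert_decode(coded, k=3) == words)  # True
```

The toggles of the invert line itself are not counted here; a fair energy comparison would have to include them, together with the codec overhead.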
Where Does Power Consumption Come From?
Bus power consumption is essentially dynamic and mainly depends on the capacitance of the wire and on its data activity. The top of Figure 8 presents the activity of each wire of a bus for image data. The power consumption of the least significant bits is given relative to the overall power consumption. As an example, the four least significant bits consume 84% of the total power for image data. Finally, the percentage of time that each transition class appears is given at the bottom of Figure 8. It can be noticed that, for non-random data, power consumption is primarily located on the least significant bits (more than 50% of the power consumption for the three least significant bits) [28]. Optimization techniques that try to reduce power consumption on all bits will therefore have a very limited effect on the most significant bits, since their activity is low. It is important to note that other data flows, such as pictures, music and speech, exhibit the same behaviour. Figure 9 illustrates, for Bus Invert and Code 2, the bus energy ratio between coded and non-coded data as a function of the number of bits to which the techniques are applied. For our example, the optimal case is when the technique is applied to 9 bits.
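The per-wire activity discussed above can be estimated from a stimuli stream with a few lines of code. The sketch below counts the toggles of each wire and the share carried by the k least significant bits. It assumes that all wires have the same capacitance, so the toggle share is only a rough proxy for the power share, and it uses a simple counter sequence as a stand-in for real stimuli files.

```python
def wire_toggles(words, width):
    """Number of toggles on each wire; index 0 is the least significant bit."""
    counts = [0] * width
    for prev, curr in zip(words, words[1:]):
        changed = prev ^ curr
        for i in range(width):
            counts[i] += (changed >> i) & 1
    return counts


def lsb_share(counts, k):
    """Fraction of all toggles that occur on the k least significant wires."""
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0


stimuli = list(range(16))           # placeholder data: a 4-bit counter
counts = wire_toggles(stimuli, 4)
print(counts)                       # [15, 7, 3, 1]
print(round(lsb_share(counts, 2), 3))  # 0.846
```

Even for this artificial sequence, most of the activity sits on the lowest wires, which is the kind of imbalance the text reports for image, music and speech data.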
Another interesting result is that techniques which aim to remove the worst cases of Table I (i.e., 1 + 3r and/or 1 + 4r) remove only a negligible part of the total transitions. For example, the 1 + 4r transition type appears only 1% of the time, so eliminating it has a negligible impact on the global power consumption. Removing these transitions decreases the bus propagation time but does not systematically decrease power consumption. Indeed, 1 + 3r and 1 + 4r transitions are replaced by other coded transitions that can consume more power (for instance, a rising transition can be transformed into a falling one).
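The share of each transition class in a data stream can be measured as sketched below. The classification is the classical delay-oriented one, which Table I is assumed to follow: a switching wire sees 1 + kr, where each neighbour adds 0 to k if it switches in the same direction, 1 if it is quiet and 2 if it switches in the opposite direction. Only interior wires (those with two neighbours) are classified, and the function names are hypothetical.

```python
from collections import Counter


def wire_delta(prev, curr, i):
    """+1 for a rising edge on wire i, -1 for a falling edge, 0 if the wire is quiet."""
    return ((curr >> i) & 1) - ((prev >> i) & 1)


def transition_class(prev, curr, i):
    """Return k such that wire i sees 1 + k*r, or None if wire i does not switch."""
    own = wire_delta(prev, curr, i)
    if own == 0:
        return None
    k = 0
    for j in (i - 1, i + 1):
        neighbour = wire_delta(prev, curr, j)
        if neighbour == 0:
            k += 1          # quiet neighbour
        elif neighbour == -own:
            k += 2          # neighbour switching in the opposite direction
    return k


def class_histogram(words, width):
    """Count how often each class 1 + k*r occurs on the interior wires."""
    hist = Counter()
    for prev, curr in zip(words, words[1:]):
        for i in range(1, width - 1):
            k = transition_class(prev, curr, i)
            if k is not None:
                hist[k] += 1
    return hist


hist = class_histogram([0b000, 0b010, 0b101], width=3)
print(dict(hist))  # {2: 1, 4: 1}
```

Dividing each count by the total gives the percentage of time each class appears, which can then be compared with the 1% figure quoted above for 1 + 4r.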
Moreover, we noticed that, when presenting the results of power consumption optimization techniques, authors do not always take into account the extra power consumption introduced by the codec. Most of the proposed techniques have a considerable hardware overhead (register file, adders, multiplexers and so on), which involves extra power consumption to carry out the data coding. To be efficient, the power consumption overhead due to the codec should remain lower than the gain generated by these optimization methods.
In Ref. [29], the authors show that, to be efficient, techniques must be used on buses with extremely long wires, which contrasts with the wire lengths that can be found in SOC.
CONCLUSION
This paper has first presented the physical parameters that are important for wire and bus modelling, such as resistance and capacitance. Crosstalk effects have been discussed, and the analysis of their impact on delay and power consumption has shown that they must be taken into account to obtain accurate models at the bus level. We have presented our delay and power consumption modelling methodology, supported by the estimation tool we developed (Interconnect Explorer). Our preliminary results show that the classical transition pattern classification has to be used carefully in the case of power consumption optimization.
The state of the art on delay and power optimization techniques has shown that techniques can optimize either time or power on the bus, or both, with a small gain for the best ones. Our tool has allowed us to compare various techniques proposed by other authors and to underline that these techniques should be applied where activity is the strongest on buses (i.e., on the least significant bits), and that eliminating worst-case transitions has a negligible impact on the global power consumption.
To conclude, our future work on performance (time and power consumption) optimization techniques will focus on the following four key issues:
• Do not focus only on 1 + 3r and 1 + 4r transitions, since they are not dominant in the total number of transitions.
• Focus on the lines with the largest data activity (i.e., the LSBs), because these are the most power-consuming lines.
• Try to avoid the most consuming falling transitions as much as possible: a key point for power optimization can be to encode data so that falling transitions on the bus occur with the lowest crosstalk capacitance and thus consume as little energy as possible.
• Keep the codec power overhead as low as possible, and therefore focus on very simple techniques.
