Abstract-This paper presents models for estimating the transition activity of signals at the output of adders in Field Programmable Gate Arrays (FPGAs), given only word-level measures of the correlation and variance of the input signals to these components. This will allow the power consumed in the output wires of these components to be estimated from a high-level description before RTL-synthesis, without resorting to time-consuming low-level simulation.
I. INTRODUCTION
To perform early dynamic power analysis given only high-level system information (e.g. a block diagram), macro-modeling techniques such as [3], [4], [5] can be used to estimate the power consumed in the logic blocks of arithmetic components such as adders and multipliers. It is shown in [5] that macro-models can also be used to estimate the power consumed by the intra-component routing wires that connect configurable logic blocks within a component.
However, the power consumed by inter-component routing wires, which connect components together, cannot be estimated using macro-models because i) the placement of components, and hence the capacitance of inter-component routing wires, is determined by physical placement decisions made after synthesis, and ii) their transition activity may not be known without performing low-level simulation. In [6] it was shown that the capacitance of these wires can be estimated using topological information about a circuit that is available at a high level.
This leaves the problem of estimating the transition activity in inter-component routing wires without performing low-level simulation. For components whose outputs are registered, activities can be obtained through high-level simulation of a system, but for those that are unregistered, extra transitions may appear at their outputs due to glitches created within the component or propagated to it from other components. The number of glitches that occur cannot currently be estimated at a high level; it can only be found by performing low-level simulation or, if possible, by taking very detailed device-level measurements.
To enable activity estimates to be made without performing low-level simulation we have developed a model that allows the bit-level activity at the output of an adder to be calculated from a closed-form expression, using only word-level information about the input signals to the adder that can be obtained from high-level simulation.
The main contributions of this paper are as follows:
• Application of the Transition Density model to the specific structure of adders in Xilinx FPGAs, in Section II-A.
• Combination of the resulting model with knowledge of the bit-level activity of typical signals in DSP systems, allowing a closed-form expression for activity to be developed, in Section II-B.
In Section III, the characterization of the model using device-level measurements is described, and finally in Section IV activity estimates made by the model are compared to those given by low-level simulation in order to verify the model's behaviour and accuracy.
II. MODEL CONSTRUCTION

A. Transition density model for Xilinx Adders
The transition density method for propagating activity estimates through a circuit was proposed in [1] and has been used in a variety of situations, including in FPGA applications [7], to provide activity estimates within a circuit. Although it allows activity to be estimated without performing a time-consuming low-level simulation of a circuit, the process of propagating transition density estimates through a circuit can still be computationally expensive for large circuits. Additionally, because delays within a block of logic are not known to the transition density model, its activity estimates can have large errors, as the effects of these internal delays cannot be predicted.
The transition density model estimates the expected number of transitions at the output of a logic block during a clock cycle. Given a logic function $f(x_1, x_2, \ldots, x_n)$ of the boolean inputs $x_1, x_2, \ldots, x_n$, the output of $f$ will depend on input $x_i$ when the boolean difference of $f$ with respect to $x_i$, i.e. $\partial f / \partial x_i$, is equal to 1. The boolean difference of $f$ with respect to $x_i$ is defined as:
$$\frac{\partial f}{\partial x_i} = f|_{x_i = 1} \oplus f|_{x_i = 0} \qquad (1)$$
Thus we can estimate the number of transitions at the output of the logic block $f$ due to activity on the input $x_i$ by calculating the probability that the boolean difference $\partial f / \partial x_i$ is satisfied, and multiplying this by the expected number of transitions on input $x_i$, denoted $D(x_i)$. The total activity $D(f)$ at the output of logic block $f$ is the sum of the activities contributed by each input of $f$ [1]:
$$D(f) = \sum_{i=1}^{n} P\left(\frac{\partial f}{\partial x_i}\right) D(x_i) \qquad (2)$$
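The calculation described above can be sketched for a small logic function by enumerating its truth table. This is a minimal illustration and not the implementation used in this work: the function name `transition_density` and its arguments are hypothetical, and the inputs are assumed to be statistically independent so that the probability of each boolean difference can be computed from the individual signal probabilities.

```python
from itertools import product

def transition_density(f, probs, densities):
    """Estimate D(f) as the sum over inputs of P(df/dx_i) * D(x_i).

    f         -- boolean function taking n arguments that are 0 or 1
    probs     -- P(x_i = 1) for each input (inputs assumed independent)
    densities -- D(x_i), expected transitions per cycle on each input
    """
    n = len(probs)
    total = 0.0
    for i in range(n):
        p_diff = 0.0
        # Enumerate every assignment of the other n-1 inputs.
        for bits in product((0, 1), repeat=n - 1):
            lo = bits[:i] + (0,) + bits[i:]
            hi = bits[:i] + (1,) + bits[i:]
            if f(*lo) != f(*hi):  # the boolean difference is 1 for this assignment
                weight = 1.0
                for j, b in enumerate(bits):
                    k = j if j < i else j + 1  # position of this bit in the full input list
                    weight *= probs[k] if b else 1.0 - probs[k]
                p_diff += weight
        total += p_diff * densities[i]
    return total

# Two-input XOR and AND gates driven by uncorrelated inputs with P = D = 0.5.
d_xor = transition_density(lambda a, b: a ^ b, [0.5, 0.5], [0.5, 0.5])
d_and = transition_density(lambda a, b: a & b, [0.5, 0.5], [0.5, 0.5])
```

For the XOR gate every input transition reaches the output, so the densities simply add, whereas the AND gate passes a transition on one input only when the other input is 1.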
Thus applying the transition density model to a circuit involves propagating both signal probabilities (to allow boolean differences to be calculated) and transition density estimates through the logic blocks of the circuit.
The construction of a 1-bit adder on Xilinx FPGAs is shown in Figure 1. Modern FPGAs contain additional circuitry to allow fast-carry chains to be formed, as well as the Look-Up Tables (LUTs) used to implement general logic functions. When applying the transition density model to the adder circuit in Figure 1, we treat the LUT, carry-chain MUX and sum XOR gate separately, resulting in a model where the signals $P$ and $B$ are uncorrelated upon arrival at the carry-chain MUX, due to the different delays along each wire for these two signals.
Equations (3) and (4) show the transition density equations for the carry-out ($C_n$) and sum ($S_n$) outputs of full adder $n$ in a carry chain, in terms of the adder inputs $A$ and $B$, the propagate signal $P$, and the carry-in signal $C_{n-1}$. As expected, the activity in each 1-bit adder is dependent on the activity propagated to it from the previous stage in the carry chain.
Estimating activity in an adder circuit using this model would require separate calculations to be made for the carry-out and sum signals of each bit of an adder, making this method less attractive due to its computational cost, which is $\Theta(n)$, where $n$ is the number of 1-bit adders in a circuit. Also note that $P(A \oplus C_{n-1})$ becomes complex to evaluate when there is non-zero correlation between $A$ and $C_{n-1}$ (i.e. due to correlation between the $A$ input in this stage and those in earlier stages in the carry chain).
As will be seen in the following section, however, we can remove the need to make complex signal correlation calculations for $P(A \oplus C_{n-1})$ by making some assumptions about the profile of the activity of the (word-level) input signals to each adder.
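The per-bit propagation can be sketched as a single pass along the carry chain. The sketch rests on assumptions that are not taken from the equations of this paper: the carry MUX is taken to pass the carry-in when the propagate signal is 1 and the A input otherwise, the sum is taken to be the XOR of the propagate and carry-in signals, and all signals arriving at each gate are treated as independent. The names `xor_prob` and `propagate_adder` are hypothetical.

```python
def xor_prob(p, q):
    """P(x XOR y = 1) for independent bits with P(x = 1) = p and P(y = 1) = q."""
    return p * (1.0 - q) + (1.0 - p) * q

def propagate_adder(p_a, p_b, d_a, d_b, p_cin=0.0, d_cin=0.0):
    """Propagate transition densities bit by bit through a ripple carry chain.

    Assumed structure (illustrative only):
      P   = A XOR B              (LUT)
      C_n = C_{n-1} if P else A  (carry MUX)
      S_n = P XOR C_{n-1}        (sum XOR)
    Returns two lists: D(C_n) and D(S_n) for each bit, LSB first.
    """
    d_carry, d_sum = [], []
    p_c, d_c = p_cin, d_cin
    for pa, pb, da, db in zip(p_a, p_b, d_a, d_b):
        p_p = xor_prob(pa, pb)  # probability that the propagate signal is 1
        d_p = da + db           # an XOR passes every input transition
        d_sum.append(d_p + d_c)
        # MUX boolean differences: P w.r.t. the carry-in, NOT P w.r.t. A,
        # and (A XOR C_{n-1}) w.r.t. the select signal P.
        d_next = p_p * d_c + (1.0 - p_p) * da + xor_prob(pa, p_c) * d_p
        p_c = p_p * p_c + (1.0 - p_p) * pa
        d_c = d_next
        d_carry.append(d_c)
    return d_carry, d_sum

# 16 bits of uncorrelated inputs with probability and activity 0.5.
bits = 16
d_carry, d_sum = propagate_adder([0.5] * bits, [0.5] * bits, [0.5] * bits, [0.5] * bits)
```

One loop iteration is needed per bit, which is the linear cost noted above. Under these assumptions the carry activity for uncorrelated inputs climbs towards 1.5 and the sum activity towards 2.5.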
B. Incorporation of signal profile information
In the DBT model [2] the authors found that the activity of typical two's complement signals in DSP systems could be approximated by the activity of Gaussian signals with zero mean, a particular standard deviation σ, and lag-1 autocorrelation ρ. The authors showed that the bit-level activity of this set of Gaussian signals could be broken up into three regions where the bits within each region exhibit similar behaviour. The three regions are, in order from LSB to MSB: 1) spatially and temporally uncorrelated LSB bits, 2) a region with increasing spatial correlation towards the MSB, and, 3) the sign bit(s) at the MSB end of the signal.
As described in the DBT model, Region 1 can be approximated by uncorrelated bits each having an activity rate of 0.5, Region 3 can be approximated by fully spatially-correlated bits with activity $T_{msb}$, and Region 2 can be approximated by linear interpolation between Regions 1 and 3. We assume that the input signals to our adder have profiles as defined in the DBT model, in order to greatly simplify our Transition Density model of a Xilinx adder, as follows.
For those bits of the adder where both inputs $A$ and $B$ are in Region 1, we know that all input bits are uncorrelated and have an activity of 0.5, thus $A$ and $C_{n-1}$ are uncorrelated and $P(A) = D(A) = 0.5$, so:
$$P(A \oplus C_{n-1}) = 0.5 \qquad (5)$$
Similarly, $A$ and $B$ are uncorrelated and $P(B) = D(B) = 0.5$, so $P(A \oplus B) = 0.5$. As a result the carry-out and sum transition density equations simplify to:
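As a worked illustration of this simplification, the substitution can be written out under an assumed form of the general equations. The assumption, which should be checked against (3) and (4), is that the carry MUX passes $C_{n-1}$ when $P = 1$ and $A$ otherwise, and that $S_n = P \oplus C_{n-1}$ with $P = A \oplus B$.

```latex
% Assumed general forms, obtained by applying the transition density sum to a MUX and an XOR
\begin{align*}
D(C_n) &= P(A \oplus B)\,D(C_{n-1}) + \bigl(1 - P(A \oplus B)\bigr)\,D(A)
          + P(A \oplus C_{n-1})\,\bigl(D(A) + D(B)\bigr) \\
D(S_n) &= D(A) + D(B) + D(C_{n-1})
\end{align*}
% Region 1 values: P(A \oplus B) = P(A \oplus C_{n-1}) = 0.5 and D(A) = D(B) = 0.5
\begin{align*}
D(C_n) &= 0.5\,D(C_{n-1}) + 0.25 + 0.5 = 0.5\,D(C_{n-1}) + 0.75 \\
D(S_n) &= 1 + D(C_{n-1})
\end{align*}
```

Under this assumed form the carry recurrence has the fixed point $D(C) = 1.5$, which gives a sum activity approaching 2.5. Both values agree with the figures quoted later in this section.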
For those bits of the adder where both inputs $A$ and $B$ are in Region 3, we know that all the $A$ inputs are identical and have activity $T^A_{msb}$, and all $B$ inputs are identical and have activity $T^B_{msb}$. Both $T^A_{msb}$ and $T^B_{msb}$ are calculated from the lag-1 autocorrelation of signals $A$ and $B$, i.e. $\rho_a$ and $\rho_b$ respectively, as shown in [8]. If the word-level cross-correlation between the signals $A$ and $B$ is $\rho_{ab}$, then the probability that these MSB (i.e. sign) bits of $A$ and $B$ are of different value during a clock cycle, $P(A \oplus B)$, can be calculated in the same way as the activity of the sign bits of a Gaussian signal with lag-1 autocorrelation $\rho$, as shown in [8]. So $P(A \oplus B)$ is given by:
We assume there is no correlation between the inputs to this region and the inputs to earlier regions, and that as a result the $A$ and $B$ inputs in Region 3 are uncorrelated with the carry-in signal $C_1$ into the first full-adder in this region. We also assume that $P(C_1) = 0.5$, as $P(C_{n-1})$ increases to this value in the part of the adder where both $A$ and $B$ are in Region 1.
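For two zero-mean jointly Gaussian values with correlation $\rho$, the probability that their signs differ has the standard closed form $\arccos(\rho)/\pi$. The sketch below evaluates this quantity; the function name is hypothetical, and the expression in [8] may be written in a different but equivalent form (for example using an arcsine).

```python
import math

def sign_mismatch_probability(rho):
    """P(sign(X) != sign(Y)) for zero-mean jointly Gaussian X, Y with correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return math.acos(rho) / math.pi

# Applied to the lag-1 autocorrelation this gives a sign-bit activity,
# and applied to the cross-correlation rho_ab it gives P(A xor B) for the sign bits.
t_msb_example = sign_mismatch_probability(0.9)
p_a_xor_b_example = sign_mismatch_probability(0.0)
```

Uncorrelated signals give a probability of 0.5, and the probability falls towards zero as the correlation approaches 1.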
For each 1-bit adder we know that:
where $A'$, $B'$ and $C_{n-2}$ represent the adder inputs and carry-in input from the previous adder stage in the carry-chain. However, as both the $A$ and $B$ word-level inputs are in Region 3 in this part of the adder we know $A' = A$ and $B' = B$, so:
where $C_{n-2}$ can be replaced recursively using (9) until we get $P(\bar{A} B C_{n-1}) = P(\bar{A} B C_1)$, and similarly we obtain $P(A C_{n-1}) = P(A B C_1)$. So in the section of the adder where both the $A$ and $B$ inputs are in Region 3 of the DBT model:
So for this section of the adder:
Both (6) and (12) are recurrence relations, and by solving these we can obtain closed-form expressions for the transition density at the carry-out of the $n$th bit in the section of the adder where both inputs are in Region 1, and for the carry-out of the $n$th bit in the section where both inputs are in Region 3. These closed-form expressions are defined as follows.
Given the following word-level parameters of the input signals $A$ and $B$ to the adder: the lag-1 autocorrelations $\rho_a$, $\rho_b$, standard deviations $\sigma_a$, $\sigma_b$, and lag-0 cross-correlation $\rho_{ab}$, the breakpoints between Regions 1 and 2 ($A_{BP0}$) and Regions 2 and 3 ($A_{BP1}$) in input signal $A$ are defined by [2]:
and $B_{BP0}$ and $B_{BP1}$ are defined similarly for input signal $B$.
Within the adder the upper bound of the Region 1 model in (6-7) and lower bound of the Region 3 model in (12-13) are then given by (16) and (17) respectively.
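The breakpoint and region-bound calculation can be sketched as follows. The breakpoint expressions used here are the ones commonly quoted for the DBT model, namely $BP1 = \log_2(|\mu| + 3\sigma)$ and $BP0 = \log_2 \sigma + \log_2(\sqrt{1 - \rho^2} + |\rho|/8)$. They are an assumption of this sketch and should be checked against [2]. The adder bounds are taken as the minimum of the two lower breakpoints and the maximum of the two upper breakpoints, as the names of the bounds suggest. All function names are hypothetical.

```python
import math

def dbt_breakpoints(sigma, rho, mean=0.0):
    """Return (BP0, BP1) for one signal using the assumed DBT expressions."""
    bp1 = math.log2(abs(mean) + 3.0 * sigma)
    delta_bp0 = math.log2(math.sqrt(1.0 - rho * rho) + abs(rho) / 8.0)
    bp0 = math.log2(sigma) + delta_bp0
    return bp0, bp1

def adder_region_bounds(sigma_a, rho_a, sigma_b, rho_b):
    """Return (MIN_BP0, MAX_BP1): the bits where both inputs are in Region 1 or Region 3."""
    a_bp0, a_bp1 = dbt_breakpoints(sigma_a, rho_a)
    b_bp0, b_bp1 = dbt_breakpoints(sigma_b, rho_b)
    min_bp0 = min(a_bp0, b_bp0)  # below this bit both inputs are in Region 1
    max_bp1 = max(a_bp1, b_bp1)  # above this bit both inputs are in Region 3
    return min_bp0, max_bp1

# Two zero-mean, temporally uncorrelated inputs with different standard deviations.
min_bp0, max_bp1 = adder_region_bounds(1024.0, 0.0, 256.0, 0.0)
```

In practice the breakpoints would be rounded to whole bit positions before being used as loop bounds; the rounding direction is a design choice that this sketch leaves open.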
The activities in the carry chain in Regions 1 and 3 of the adder are calculated as follows. For bits $n = 1 \ldots MIN_{BP0}$, where both $A$ and $B$ are in Region 1:
For bits $n = MAX_{BP1} \ldots W$, where both input signals are in Region 3, and $W$ is the total input word-length of the adder:
We assume that for Region 3 the carry-in activity is $D(C_1) = 1.5$, as from (18) we can see that the activity in the carry chain in Region 1 increases towards this value. By using (18) with (7) and (19) with (13) we can also build equations for the transition density at the sum output of the $n$th bit for Regions 1 and 3 of the adder, whilst the activities for sum output bits in between these regions can be obtained by interpolating between the last sum activity in Region 1 and the first sum activity in Region 3.
A typical output activity profile estimated using our model is shown in Figure 2, along with the input activity profiles used. We can see that the model predicts up to almost 2.5 output transitions for some output bits, due to the combined contribution of the input activities and the carry chain. This is likely to be much higher than actually occurs in an adder on a Xilinx device, as inertial delays within the adder will merge near-simultaneous transitions on the $A$ and $B$ signals, and within the carry chain. As a result we took various device-level measurements to attempt to quantify the difference between this model and the true behaviour of a device, and used these to characterize the model, as described in the following section.
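The two steps described here, solving a first-order recurrence in closed form and interpolating across the middle region, can be sketched generically. The coefficients 0.5 and 0.75 used in the example are an assumption, chosen because they reproduce the limiting carry activity of 1.5 stated above. The interpolation is taken to be linear, which is also an assumption, and both function names are hypothetical.

```python
def linear_recurrence_closed_form(a, b, x0, n):
    """Closed-form value of x_n for the recurrence x_n = a * x_{n-1} + b."""
    if a == 1.0:
        return x0 + n * b
    return (a ** n) * x0 + b * (1.0 - a ** n) / (1.0 - a)

def interpolate_between_regions(last_region1, first_region3, count):
    """Linearly interpolate `count` activities strictly between two region endpoints."""
    step = (first_region3 - last_region1) / (count + 1)
    return [last_region1 + step * (k + 1) for k in range(count)]

# Carry activity after n bits of Region 1 under the assumed coefficients.
carry_region1 = [linear_recurrence_closed_form(0.5, 0.75, 0.0, n) for n in range(1, 9)]

# Sum activities for three intermediate bits between hypothetical endpoint values.
sum_region2 = interpolate_between_regions(2.0, 1.0, 3)
```

Because each value is obtained directly from the bit index, the activity of any bit can be evaluated without stepping through the bits below it.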
III. MODEL CHARACTERIZATION
We have available a Virtex-II Pro device (an XC2VP30-FF896-7) on a Xilinx University Program board from Digilent. This board is well suited to power consumption measurements as it allows us to measure the current drawn from only the 1.5V power supply used by the internal FPGA fabric, and not that used by the I/O pins or the remainder of the board.
To allow us to measure the activity at the output of an adder, rather than other power being consumed by the FPGA fabric, we used a simple test circuit containing an adder driven by signals from on-chip memory. Each output of the adder is connected to a chain of 50 inverters in order to maximize the power consumed due to the transition activity at its output.
Using the Xilinx FPGA Editor program on the circuit after place and route we disconnected and removed certain parts of the circuit in order to create several modified versions of it that enable us to isolate the power consumed due to activity in different parts of the adder and so measure:
• The total output activity for the adder, and,
• The total output activity for the adder when its carry chain is removed, i.e. for the XOR of the input signals.
By comparing these two activity measurements to the activity predicted by our transition density model when using the test signals provided, we were able to quantify the effect of the inertial delay of the adder logic, which can cause a lower number of glitches than predicted by the Transition Density model. As a result we have modified the model for the activity at the sum output of the adder in (4) as follows, by adding the coefficients α and β.
The coefficient α accounts for the proportion of transitions that do not propagate through the XOR gate implemented in the LUT of the adder, due to the inertial delay of the logic, whilst the coefficient β accounts for the proportion of transitions that do not propagate through the carry-chain due to its inertial delay. The two activity values taken from the board-level measurements are used to calculate α and β.
Figure 2 shows the activity predicted at the output of a 32-bit adder by both the uncharacterized and characterized transition density models, given the input activity profiles shown, and also shows the output activity estimated by the low-level simulation tool XPower. The uncharacterized model shows significant over-estimation compared to XPower because it does not account for the inertial delay of either the input XOR gate or the carry-chain. However, the characterized model corresponds well with what is predicted by XPower and should provide accurate bit-level estimates of adder output activity.
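One possible way to derive two coefficients from the two measurements is sketched below. It treats α and β as multiplicative scale factors on the input (XOR) term and the carry-chain term of the sum activity. This is an assumption of the sketch, not necessarily the form of the modified equation. The function names and the numbers in the example are hypothetical.

```python
def fit_alpha_beta(measured_total, measured_no_carry, predicted_xor, predicted_carry):
    """Fit scale factors from two total-activity measurements.

    measured_total    -- measured total output activity of the full adder
    measured_no_carry -- measured total output activity with the carry chain removed
    predicted_xor     -- model prediction of the activity contributed by the input XOR
    predicted_carry   -- model prediction of the activity contributed by the carry chain
    """
    alpha = measured_no_carry / predicted_xor
    beta = (measured_total - measured_no_carry) / predicted_carry
    return alpha, beta

def characterized_sum_activity(d_a, d_b, d_cin, alpha, beta):
    """Sum-output activity with the two contributions scaled (assumed form)."""
    return alpha * (d_a + d_b) + beta * d_cin

# Hypothetical totals, for illustration only.
alpha, beta = fit_alpha_beta(20.0, 12.0, 16.0, 16.0)
d_s = characterized_sum_activity(0.5, 0.5, 1.5, alpha, beta)
```

The measurement without the carry chain fixes the first coefficient on its own, and the difference between the two measurements then isolates the carry-chain contribution.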
IV. RESULTS
To ensure our model is able to cope with a wide variety of input signals, we compared the activity estimates made by it to those made by XPower for a 32-bit adder driven by a range of Gaussian input signals with randomly selected autocorrelation and standard deviation. The XPower tool estimates power in an FPGA by using low-level simulation to estimate the activity of elements within a circuit. Input signals 20000 samples long were used in order to ensure the adder was properly exercised during simulation. The total activities estimated by both our method and XPower are shown in Figure 3, and exhibit a high correlation, with a mean relative error of only 2.1% for the 500 pairs of input signals used.
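The comparison metric can be sketched as follows, assuming that the mean relative error is the average of the absolute relative differences between the model estimates and the reference values. The function name and the sample values are hypothetical and are not taken from the experiments.

```python
def mean_relative_error(estimates, references):
    """Average of |estimate - reference| / |reference| over all pairs."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - r) / abs(r) for e, r in zip(estimates, references))
    return total / len(estimates)

# Two made-up total-activity pairs: model estimate versus simulated reference.
mre = mean_relative_error([21.0, 19.0], [20.0, 20.0])
```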
In terms of computation time, for a 32-bit adder our activity model takes the time required to evaluate a closed-form equation, whilst estimation using low-level simulation and XPower for vectors 20000 samples long takes 197.7 seconds. Hence our method is several orders of magnitude faster than XPower, with a low enough computational cost to be used within the inner loop of high-level power consumption optimization.
Fig. 3. The total output activity estimated by XPower compared to the total output activity estimated by our model for a 32-bit adder, using 500 different pairs of input signals. The dashed lines show ±10% relative error margins. (Axes: total activity estimated by XPower; total activity estimated by model.)
V. CONCLUSION
This paper has presented a transition density-based model to calculate the activity at the output of adders in Xilinx FPGAs. We simplified the initial model by using knowledge from the DBT signal activity model, and characterized the resulting model using device-level measurements. This model is needed to allow fast estimation of the power consumed in routing wires at the output of unregistered adders in a system, and estimates made will be used during high-level design exploration for power-aware synthesis.
