Abstract
Introduction
Reconfigurable devices [1-5], such as CPLDs and FPGAs, are parallel and general hardware platforms. For embedded systems produced in limited quantities, FPGAs are much more flexible and cost-effective alternatives to application-specific integrated circuits (ASICs). More importantly, modern applications, such as multimedia and wireless communication, have different objectives under different operating modes. A multi-mode system architecture can be supported by FPGAs in a natural way with the help of dynamic reconfiguration.
With the success of battery-based personal computing devices and wireless communication systems, low power has become a key issue in embedded system design. Although its flexibility makes the FPGA a good solution for portable applications, its power consumption cannot be neglected. First, less efficient utilization of available resources makes an FPGA less power-efficient than an ASIC. Second, since routing resources in an FPGA include many switches, the interconnect load capacitance of an FPGA is 10X to 100X the capacitance in an ASIC [12]. In order to effectively use FPGAs in low power systems, efficient FPGA power estimation methods are needed. However, the problem of FPGA power estimation has not been well studied.
Acknowledgements: This work was supported by DARPA under contract no. DAAB07-00-C-L516.
The rest of this paper is organized as follows. In Section 2, we discuss previous work on FPGA and ASIC power estimation. In Section 3, we introduce our high-level power model. In Section 4, we discuss high-level FPGA power modeling in detail. In Section 5, we present an adaptive regression-based high-level power modeling technique. We demonstrate the feasibility of our methods experimentally in Section 6. Finally, we conclude in Section 7.
Related Work
Some power estimation techniques for ASICs have been presented in [6-10]. In [7], techniques are presented for estimating switching activity and power consumption in register-transfer level (RTL) circuits. A framework for exact and approximate switching activity estimation in a sequential circuit is provided in [8]. A lookup table based ASIC power estimation method is proposed in [9, 10]. To circumvent the problem of exponentially increasing storage for table lookup, a four-dimensional table is used to model power consumption.
Works in [11-14] address the power estimation/optimization problem in FPGAs. A power estimation method, which is based on power modeling of different components in an FPGA, is proposed in [12]. In this approach, the internal structure of the FPGA needs to be fully explored. Due to the periphery effect and estimation error in switching activity and interconnect capacitance, the total power estimation error may be up to 30%. The method in [14] characterizes the power dissipation of FPGAs using Manhattan distances between configurable logic blocks (CLBs) to model the interconnect power consumption. The error is usually less than 5% when compared to actual power measurement. However, since an iterative approach is used for updating the values of unknown signal parameters, the iteration process may not converge. Both of the above methods need internal FPGA configuration information, and use internal signals to model power consumption. However, since many new macro modules, such as Block RAM [1], embedded system block (ESB) [2], and standard interfaces, have been integrated into FPGAs, fully exploring the internal architecture of an FPGA has become very challenging. With these "internal" signal modeling approaches, different modules may need their own description models. Hence, it becomes difficult to construct a single fixed high-level power model template.
Overview of Our Work
In this paper, we propose a high-level FPGA power modeling approach with the following characteristics.
The model is based on input and output signal statistics to estimate the internal power consumption of FPGAs. Models for differently-configured circuits are based on the same power macromodel template. Thus, the same modeling procedure is applied to different circuits, whether combinational or sequential. An adaptive regression method is used to tackle the problem of biased training sequences. A good tradeoff between accuracy and efficiency is achieved.

There are two important concerns in high-level FPGA power modeling. First, high-level power estimators need to be more efficient than gate-level power estimators. The high-level power model we propose can be used in the inner loop of system-level synthesis, putting power estimation in the critical path of synthesis. Thus, estimation efficiency is a must. We use a regression function to estimate the FPGA power consumption. In our approach, spatial correlation information is utilized to partition input signals into different subsets. Statistics for three input signal parameters for each subset and one output signal parameter are used in our regression method to construct the high-level power model.

Second, a shortcoming of high-level power estimation methods in general is that a training set of input vectors with given statistical distributions is typically used in developing the power macromodels. Hence, the macromodel is biased towards the training set. If the input trace observed in practice has a different statistical distribution than the training set, the power estimation result may not be accurate. During normal operation, there may be hundreds of different typical input traces for different applications. We cannot use a static off-line method to build a power macromodel for each of these input traces. In our approach, we use adaptive regression methods to address this problem. An adaptive regression method has previously been used for ASICs [19], but not for FPGAs.
A low power CPLD, called Coolrunner [15], is used for experimentally validating our technique. Coolrunner is an ideal choice for battery-operated portable devices. Training sequences with different statistical distributions are generated for timing and power simulation. In order to capture the timing information for different configurations, we chose Scirocco [16], a real-delay timing simulator. A low-level power simulator [17] is invoked to estimate the power consumption. Since this low-level power simulator captures all the internal switching information for this device, its accuracy is comparable to physical power measurements. Then with all the collected information, we use an adaptive regression method to construct a high-level power model. Although our experiments are based on the Coolrunner, our method is also suitable for other reconfigurable devices, including other CPLDs and FPGAs. There are two reasons for this. First, our method does not need to explore the internal structure of the reconfigurable devices. Only input and output statistical parameters are used to construct the high-level power model. Second, although the detailed internal structures of different reconfigurable devices are different, most reconfigurable devices, including CPLDs and FPGAs, have the same macro structure: embedded logic blocks connected with global and local interconnection resources.
Fig.1: High-level power modeling for FPGAs
The adaptive regression-based high-level power modeling flow is shown in Fig. 1. First, the macromodel template analyzer is invoked to perform complexity analysis and other preprocessing steps. Then, training sequences with different statistical characteristics are generated. These sequences are used for real-delay timing simulation. With these results, the statistics analyzer is invoked to partition the input signals into subsets, and create a parameter matrix. A low-level power simulator tool is also invoked to estimate FPGA power consumption. The power results and parameter matrix form the inputs of the regression analysis step. Using feedback from the verification model, a static high-level regression power model is constructed. During the power estimation period, the input trace is analyzed to decide whether adaptive regression analysis should be used. If so, the input trace is sampled, and the sampled data are sent to the low-level power estimator, while the static high-level regression power model estimates the FPGA power consumption with the whole input sequence. Finally, adaptive regression analysis is invoked to construct the adaptive high-level regression power model.
High-Level Power Modeling
In this section, we give details of the constituent parameters used in our power model.
Partitioning of input signals
In [10], similar to our method, four kinds of parameters are used to model the input and output signal statistics of ASICs. However, the method in [10] ignores the nature of the signals. In practical circuits, different input signals may or may not be correlated. For example, if the circuit inputs include both data and control signals, each may have a very different statistical distribution. Thus, one should avoid using one average statistical parameter to cover both types of signals. In our input signal partitioning approach, we study the spatial correlation behavior between different input signals based on the input traces with different statistical distributions. If the correlation between two input signals is less than a pre-defined correlation threshold, we place them in different input subsets, as shown in Fig. 2. Then, we use statistical parameters to model the signals in each subset, as discussed next.
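As an illustration of threshold-based partitioning, the following Python sketch groups input signals greedily so that every pair inside a subset has a correlation at or above the threshold. It is a minimal sketch under stated assumptions, not the procedure used in our experiments: the use of the absolute Pearson correlation as the spatial correlation measure, the greedy grouping order, and the names `pearson` and `partition_inputs` are all hypothetical choices made for this example.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length 0/1 sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def partition_inputs(trace, threshold):
    """Partition input signals into subsets of mutually correlated signals.

    trace: list of input vectors, each a tuple of 0/1 values.
    A signal joins the first subset in which its absolute correlation with
    every member is >= threshold (assumed criterion); otherwise it starts
    a new subset. Returns a list of lists of signal indices.
    """
    columns = list(zip(*trace))  # one bit sequence per input signal
    subsets = []
    for sig in range(len(columns)):
        for subset in subsets:
            if all(abs(pearson(columns[sig], columns[other])) >= threshold
                   for other in subset):
                subset.append(sig)
                break
        else:
            subsets.append([sig])
    return subsets
```

With this grouping rule, any two signals whose correlation falls below the threshold always end up in different subsets, which is the property the text requires. The result can depend on signal order, so a different grouping strategy may be preferable in practice.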
Input subset signal probability
The FPGA power consumption can vary significantly with the statistical distribution of the input data. Hence, a "white noise" input sequence cannot be used to model the power consumption of FPGAs. In our approach, after spatial correlation partitioning is done using typical input traces, the average signal probability for each input signal subset is calculated and used in high-level power modeling.
Input subset signal transition density

Output signal transition density
The last parameter is the average output signal transition density. In order to construct a more accurate power macromodel, we use real-delay timing simulation and capture the transition density of the output signals. This allows us to capture output glitching information, which also reflects the internal FPGA glitching behavior.
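The per-subset statistics discussed in this section can be sketched in Python as follows. This is an illustrative sketch only: it assumes that signal probability is the fraction of samples in which a signal is 1, and that transition density is the number of value changes per pair of consecutive samples. The function names are hypothetical. Because it works on sampled vectors, it does not see glitches between samples; capturing those requires the event trace of a real-delay simulator, as described above.

```python
def signal_probability(bits):
    """Fraction of samples in which the signal is 1."""
    return sum(bits) / len(bits)


def transition_density(bits):
    """Average number of value changes per pair of consecutive samples."""
    if len(bits) < 2:
        return 0.0
    toggles = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return toggles / (len(bits) - 1)


def subset_average(trace, subset, stat):
    """Average a per-signal statistic over the signals in one subset.

    trace: list of vectors (tuples of 0/1); subset: list of signal indices;
    stat: a function such as signal_probability or transition_density.
    """
    columns = list(zip(*trace))  # one bit sequence per signal
    return sum(stat(columns[i]) for i in subset) / len(subset)
```

For example, `subset_average(trace, [0, 1], signal_probability)` gives the average signal probability of a subset containing signals 0 and 1, and the same helper applied to the output vectors with `transition_density` gives an average output transition density.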
Regression macromodeling function
Our high-level power model is based on the average input subset signal probability ($P_{in}$), average input subset signal transition density ($D_{in}$), average input subset signal spatial correlation ($SC_{in}$), and average output signal transition density ($D_{out}$). First-order, quadratic and cubic regression templates are three possible approaches to modeling the FPGA power, with different execution complexity and accuracy. In the simplest model, a first-order polynomial template is expressed as:

$$Power_{FPGA} = c_0 + \sum_{i=0}^{n-1} \left( c_{P_{in_i}} P_{in_i} + c_{D_{in_i}} D_{in_i} + c_{SC_{in_i}} SC_{in_i} \right) + c_{D_{out}} D_{out} \quad (3)$$

where $c_{P_{in_i}}$, $c_{D_{in_i}}$, $c_{SC_{in_i}}$, $c_{D_{out}}$ and $c_0$ are regression coefficients, and $n$ is the total number of input subsets.
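To make the role of the regression coefficients concrete, the sketch below fits a first-order template by ordinary least squares using the normal equations. It is a hedged illustration: the choice of ordinary least squares, the inclusion of an intercept, and the names `solve_linear`, `fit_first_order` and `predict` are assumptions of this example. Each row of `rows` holds the parameter values measured for one training sequence, and `power` holds the corresponding low-level power results.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_first_order(rows, power):
    """Least-squares fit of power ~ c_0 + sum_k c_k * rows[.][k].

    Returns [c_0, c_1, ..., c_k]. Assumes the parameter columns are
    linearly independent; otherwise the normal equations are singular.
    """
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * p for d, p in zip(design, power)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coeffs, row):
    """Evaluate the fitted first-order model on one parameter vector."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))
```

Higher-order templates can reuse the same fit by appending squared, cubed or product columns to each row before calling `fit_first_order`.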
In order to achieve a good trade-off between efficiency and accuracy, we design a hybrid model in our approach, which is based on the sensitivity of different parameters. First, we evaluate the sensitivity of FPGA power consumption to the different parameters. A low order term is chosen for parameters to which the FPGA power is less sensitive, while a high order term is used for parameters to which the FPGA power is more sensitive.
The pseudo-code of our algorithm is shown in Fig. 3. In this algorithm, $p$ denotes the total number of terms and $q$ the total number of first-order terms (e.g., $D_{in_i}$ is a first-order term in Equation (3)). $E_{threshold}$ is a predefined error threshold. The error minimization ratio (EMR) is defined as the error minimized per term divided by the error threshold. The heuristic procedure terminates when $error \le E_{threshold}$, $terms \ge term_{threshold}$, or $EMR \le EMR_{lowerbound}$.
Fig. 3: The power macromodeling algorithm
In the first step of the algorithm, a pool of terms is created, and the sensitivity of FPGA power to each term is evaluated dynamically in the following iterations. Then, in Steps 2-5, the initial regression model is constructed and the pertinent error compared to low-level power estimation results is calculated. If the error is larger than $E_{threshold}$, a heuristic procedure is invoked in Step 6. The priority of terms in the pool is decided dynamically according to their sensitivity (Step 7), and the term (quadratic or cubic) with the highest priority is chosen and included in the macromodel (Steps 8 and 9). The error and EMR are evaluated (Steps 11 and 13). At the end of each iteration, three criteria are used to decide whether the heuristic procedure should be stopped: first, the error is less than a predefined error threshold, $E_{threshold}$; second, the total number of terms has exceeded a predefined upper bound on the number of terms; or third, the EMR is less than a predefined threshold, $EMR_{lowerbound}$. Since different circuits mapped to an FPGA have different power consumption behavior, our algorithm lets us flexibly construct high-level power estimation models of different complexity for them, achieving a good trade-off between efficiency and accuracy. For our example circuits, a typical high-level power estimation function was found to be:
$$Power_{FPGA} = \sum_{i=0}^{n-1} \left( c_{P_{in_i} \cdot D_{in_i}} P_{in_i} D_{in_i} + c_{P_{in_i} \cdot SC_{in_i}} P_{in_i} SC_{in_i} + c_{P_{in_i}} P_{in_i} + c_{D_{in_i}^2} D_{in_i}^2 + c_{SC_{in_i}} SC_{in_i} \right) + c_{D_{out}} D_{out} \quad (7)$$

To evaluate the accuracy of the macromodel, we define the relative error between the simulated power consumption values and the corresponding estimated values.
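The control flow of the term-selection heuristic of Fig. 3 can be sketched as below. This is a simplified sketch under stated assumptions, not a transcription of Fig. 3: term priority is approximated by the error left after adding a candidate, one term is added per iteration so the error minimized per term is simply the error drop, and `fit_error` is a hypothetical callback that builds the regression model for a list of terms and returns its error against low-level power estimation results.

```python
def select_terms(first_order, pool, fit_error, e_threshold, max_terms,
                 emr_lower_bound):
    """Greedily extend a first-order model with higher-order terms.

    first_order: terms of the initial model; pool: candidate quadratic or
    cubic terms; fit_error(terms) -> model error (hypothetical callback).
    Stops when the error is within e_threshold, the term count reaches
    max_terms, the pool is empty, or the EMR drops below emr_lower_bound.
    Returns (selected_terms, final_error).
    """
    terms = list(first_order)
    remaining = list(pool)
    error = fit_error(terms)
    while error > e_threshold and remaining and len(terms) < max_terms:
        # Highest priority = candidate that leaves the smallest error
        # (a stand-in for the sensitivity ranking described in the text).
        best = min(remaining, key=lambda t: fit_error(terms + [t]))
        new_error = fit_error(terms + [best])
        # One term added, so error minimized per term is the error drop.
        emr = (error - new_error) / e_threshold
        terms.append(best)
        remaining.remove(best)
        error = new_error
        if emr < emr_lower_bound:
            break
    return terms, error
```

Passing the fit as a callback keeps the loop independent of the regression method, so the same skeleton can be used with any fitting routine and error metric.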
Adaptive Regression-Based Power Estimation
The method and algorithm described so far are based on static macromodels, which do not change from one design that uses the macro logic block to another. However, such a static macromodel is biased towards the set of training patterns used. It is most accurate when the training patterns reflect well the typical input sequences in a specific design implementation. However, this may not always be true. The macro logic block and its high-level power model are likely to be stored in a design library, which is used for different applications, in which the data statistics can vary dramatically. On the other hand, since low-level power estimation is time-consuming, as we mentioned earlier, if the high-level power models are to be used in the inner loop of system-level synthesis, only fast and relatively accurate power estimation is acceptable.
We use an adaptive regression modeling approach to tackle the above problem, as shown in Fig. 4. First, the real-delay timing simulator is invoked for the given input trace. We compute the relevant statistics of the input sequence in Step 2 to determine if the input sequence under consideration has similar statistical characteristics to the training set used to derive the static high-level power model (Step 3). If so, static regression modeling is used (Step 4). Otherwise, we invoke the adaptive regression modeling engine in Step 5, where the input sequence is sampled. A low-level power estimator is invoked to estimate the FPGA power consumption with the sampled sequence (Step 6). The static high-level power model is also used to estimate the power consumed by the FPGA for the whole input trace (Step 7). Finally, in Step 8, an adaptive regression engine is invoked to generate the adaptive high-level power model. Since the length of the sampled sequences is much shorter than the whole typical input trace, the time consumed in running the low-level power simulator is acceptable. By adapting the static high-level power model using sampled vectors, we can achieve a good trade-off between accuracy and efficiency.

The method we use is similar to the one used in [19] for ASICs. Let $S$ be the set of all input vector pairs for a macro logic block such that $|S| = N$. We sample a subset $s \subset S$ such that $|s| = n$. Let $x_i$ be the power estimation result obtained from the static high-level power model, and $y_i$ be the power estimation result obtained from the low-level power simulator. A linear function of the high-level power estimation results, $\sum x_i$, is used to estimate the low-level power simulation results, $\sum y_i$, as follows.
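$$\sum_{i \in s} y_i = n\alpha + \beta \sum_{i \in s} x_i \quad (9)$$

where $\alpha$ and $\beta$ are constants.

Thus, minimizing the least-square error $\sum_{i \in s} (y_i - \alpha - \beta x_i)^2$, we get

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \quad \text{and} \quad \hat{\beta} = \frac{\sum_{i \in s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in s} (x_i - \bar{x})^2}$$

where $\bar{y}$ and $\bar{x}$ are the averages of $y_i$ and $x_i$.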
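The least-squares fit above amounts to a simple linear regression of the low-level results on the static model's results over the sampled vector pairs. The following Python sketch shows the computation. The function names are hypothetical, and applying the fitted line to a static estimate in `adapt` is an assumption of this example about how the calibrated model would be used.

```python
def fit_adaptive(x, y):
    """Least-squares fit of y ~ alpha + beta * x over the sampled pairs.

    x: static high-level power estimates for the sampled vector pairs.
    y: low-level power simulation results for the same pairs.
    Returns (alpha, beta).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("static estimates are all equal; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def adapt(alpha, beta, static_estimate):
    """Correct a static high-level estimate with the fitted line (assumed usage)."""
    return alpha + beta * static_estimate
```

Only the sampled pairs need low-level simulation, so the cost of the correction grows with the sample size $n$ rather than with the full trace length $N$.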
Experimental Results
In this section, we present the experimental results to demonstrate the efficacy of our high-level power modeling approach.
We should first keep in mind that FPGA and ASIC libraries are different. In an ASIC library, if a component is presented as a hard core, it means that the layout and technology are fixed. When such a component is used in a system, with the same input sequence, the timing and power behavior will be the same. This may not be true for a component from the FPGA library. Since FPGAs constitute a parallel hardware platform, multiple tasks may execute concurrently on the same device. Combinations of different tasks change the task configuration. For the same macro logic block, the power consumption under different configurations may be different, i.e., power estimation cannot be strictly modeled with the same parameters or functions. Therefore, for high-level power modeling, multiple tasks need to be considered together, and macro logic block combinations, which occur with a high probability, need to be explored. The second point is that the high-level power models we construct are suitable for both combinational and sequential logic. Although in sequential logic, the initial states are difficult to ascertain, the estimation error introduced by unknown initial states is minimized in a long simulation.
A series of experiments was performed to verify our high-level power estimation model on a Sun Enterprise 4500 server with 4 GB of memory. In Table I, for each example, we show five results for static regression-based high-level power models: number of terms we use in the model, relative error, maximum error, time used to construct the model, and time used to estimate power consumption. For these examples, the average relative error is 1.7%. In Table II, we show results for adaptive regression-based power models for input traces with very different statistical characteristics than those used to derive the static macromodel. The table includes the number of terms, relative error, and the vector-pairs sampling ratio (i.e., the percentage of vector pairs from the input trace that are sampled). Comparing these results with those in Table I, we note that adaptive modeling leads to relative errors that are quite comparable to those from the static macromodel. Thus, it is indeed able to adapt to different input traces without much loss in accuracy. For these examples, the average relative error is only 3.1%. Figure 5 demonstrates the agreement between high-level and low-level power estimation for part of the experimental results.
Conclusions
We have presented efficient adaptive regression-based high-level power models to estimate FPGA power consumption. For each FPGA configuration, only four types of input/output statistics are needed, and only one compact function is needed to estimate the power consumption. Thus, the memory space required to store the high-level power model is also small. The methods and algorithms we use are general: (i) the same power macromodel template can be used for all modeled circuits; (ii) they do not need to explore the internal structure of the reconfigurable devices; and (iii) in order to improve on-line power estimation accuracy, we use adaptive regression methods to lessen the problem of biased training sequences, and achieve a good trade-off between efficiency and accuracy.
