Abstract-We propose a new power macromodel for register-transfer level (RTL) power estimation. The model is suitable for reconfigurable, synthesizable soft macros because it is parameterized with respect to the input data size (i.e., bit width) and can be automatically scaled with respect to different technology libraries and/or synthesis options. The power model is precharacterized once for each soft macro and then adapted to each specific instance by means of a single additional experiment performed by the end user. No intellectual-property disclosure is required for model scaling. The proposed model is derived from an empirical analysis of the sensitivity of power consumption to input statistics, input data size, and technology. The experiments show that, with limited approximation, the effects of these three factors on power can be decoupled. The proposed solution is innovative in that no previous macromodel supports automatic technology scaling; it yields average estimation errors around 10%.
I. INTRODUCTION
HIGH-LEVEL power estimation has become an essential task in modern design flows. Several researchers have addressed this issue, mainly at the register-transfer level (RTL), proposing solutions that are based on the construction of abstract power macromodels, starting from existing or newly synthesized implementations [1], [2]. Among others, regression-based [3], [4] and table-based [5], [6] macromodels are available.
Today's designs are increasingly based on the use of soft macros. Such macros are library macrocells for which only a parametric, synthesizable, and technology-independent description is provided. They are much more flexible than hard macros because they can be mapped onto different technology libraries and their structural characteristics (e.g., bit width) can be adapted to those of the data path. Flexibility is paid in terms of uncertainty when estimating the impact of a soft macro on the design cost metrics. For power, the lack of a presynthesized implementation does not allow accurate precharacterization based on low-level simulation.
The power consumption of a soft macro can be viewed as a function of three parameters: 1) its run-time input statistics; 2) its structural and functional parameters; and 3) its implementation, i.e., the library it is mapped onto and the synthesis script used for mapping. Most of the macromodels proposed in the literature are focused on hard macros; that is, they are typically conceived for modules with a fixed architecture and implementation. Therefore, they capture only the dependence of power on the input statistics and do not provide any flexibility to adapt the estimates to different structural parameters and technologies.
A. Bogliolo is with the University of Ferrara, Ferrara 44100, Italy (e-mail: abogliolo@ing.unife.it).
R. Corgnati, E. Macii, and M. Poncino are with Politecnico di Torino, Torino 10129, Italy (e-mail: corgnati@athena.polito.it; enrico@athena.polito.it; poncino@athena.polito.it).
Publisher Item Identifier S 1063-8210(01)06664-1.
Only a few parameterized power models have been proposed that depend on the bit width of the input data [1], [7]. In these techniques, the relation between bit width and power is modeled by specific analytical functions that need to be defined for each class of macros. On the other hand, no support is provided to adapt the power models to different synthesis scripts and technology libraries.
In this paper, we follow a different approach. We first conduct an experimental analysis of the dependence of power on the three sets of parameters discussed above. The main result of the analysis is that such parameters have an almost disjoint effect on power; therefore, they can be independently modeled with low accuracy loss. We thus propose a power modeling technique for soft macros based on the concept of scaling: A power model is first built and precharacterized for a reference instance of the macro (capturing only the dependence on input statistics) and then scaled to adapt to the actual input bit width and technology. Intellectual property (IP) issues involved in model characterization and scaling are also discussed and addressed.
The proposed approach was validated on combinational and sequential soft macros taken from the Synopsys DesignWare Library, showing that scaling can be effectively applied to adapt traditional power models, originally conceived for hard macros, to soft macros. The accuracy provided by the scaled power models is comparable with that provided by ad hoc models of the same type, directly characterized for the target parameter settings and technology.
II. PRELIMINARY ANALYSIS
Soft macros may have both size and functional parameters. Size parameters are used to adapt the bit width of the input-output data of the macro to that of the data path, while functional parameters (when present) are used to select particular functional or encoding options. For our purposes, we treat size and functional parameters in two different ways. Macros with different functional parameters are regarded as different macros for which different power models can be independently built and characterized, while macros differing only in the size parameters are regarded as different instances of the same soft macro, for which a single scalable model has to be provided. There are three main practical motivations for this choice. First, size parameters are common to all macros, while functional parameters are particular to each macro, so that their effect cannot be generalized. Second, size parameters may take a large number of different values (making it highly inefficient to characterize independent power models for each assignment), while functional parameters (if any) usually select among a few options. Third, functional parameters may completely change the functionality of the macro, while size parameters do not. In practice, we propose a power modeling approach for soft macros with only size parameters. If functional parameters are also present, the proposed approach has to be applied independently for each assignment of the functional parameters.
We represent the power consumption $P$ of a soft macro as a function of its boundary statistics ($S$), its input-output bit width ($W$), and its implementation ($T$). Each point in the input space of $P$ is associated with a different configuration of the independent variables, giving rise to a different power consumption.
For a given macro, the actual value of $P$ at a given point can be obtained by means of synthesis and simulation. An instance of the macro with bit width $W$ has to be synthesized, mapped onto technology library $T$, and simulated for power at the gate level with input statistics $S$. We studied the dependence of power on $S$, $W$, and $T$ for several soft macros (taken from the Synopsys DesignWare Library) by sampling the input space and observing the value of $P$ at each sampling point. Synopsys DesignCompiler, VSS, and DesignPower were used for synthesis, simulation, and power analysis, respectively.
The dependence on size is plotted in Fig. 1 for an adder and a multiplier with a unique size parameter representing the bit width of the two input ports. Power values for each macro are normalized to the highest obtained. Piecewise linear curves are drawn by connecting points associated with the same input statistics so that different curves refer to different input statistics.
All results in the plot are obtained by using the same technology library (from Synopsys) and synthesis options, i.e., they correspond to sampling points lying on a plane in the input space. From the plot we observe the following. 1) The relation between bit width and power depends on the macro (for instance, it is almost linear for the adder, while it is nearly quadratic for the multiplier, as expected and already observed [1]). 2) Curves of the same family (i.e., adder, multiplier) never intersect and their relative ratio is almost independent of the bit width. Observation 1 suggests that there is no general formula relating power consumption to bit width. Observation 2 suggests that the dependence on the bit width can be decoupled from the dependence on the input statistics.
To verify the assumption that power depends disjointly on input statistics and bit width, we analyze the behavior of the ratio ($R_W$) between the values of the power $P$ at points differing only in the bit width parameter $W$. In symbols,

$$R_W(W_1, W_2) = \frac{P(S, W_1, T)}{P(S, W_2, T)}$$

represents the ratio between the power consumption associated with points having the same ($S$, $T$) coordinates (statistics and technology) on two parallel planes of equations $W = W_1$ and $W = W_2$. Fig. 2 shows the value of $R_W$ for different assignments of $S$, $W$, and $T$. Each curve is associated with a ($W_1$, $W_2$) pair, while the labels on the axis denote different configurations of $S$ and $T$. We observe that for fixed values of $W_1$ and $W_2$, $R_W$ is almost independent of $S$ and $T$, so that it can be regarded as a property of the pair of parallel planes ($W = W_1$ and $W = W_2$) that does not depend on the position of the points on the plane. The assumption of disjoint dependence is thus validated, suggesting a partitioning of the modeling task into two subtasks, separately focusing on the dependence of power on $W$ and on $S$ and $T$. A second ratio, $R_T$, can be defined and studied to analyze the dependence of power consumption on technology and synthesis options.
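The decoupling check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names, the dictionary layout, and the sample power values are hypothetical, and the spread metric (relative range of the ratio) is an assumption chosen for simplicity.

```python
def size_ratio(power, w1, w2):
    """Return the ratio P(s, w1) / P(s, w2) for every statistics label s.

    `power` maps (statistics_label, bit_width) -> measured power at a
    fixed technology; this layout is an assumption of the sketch.
    """
    labels = sorted({s for (s, _w) in power})
    return {s: power[(s, w1)] / power[(s, w2)] for s in labels}


def relative_spread(ratios):
    """Relative range (max - min) / mean of the ratios.

    A small value indicates that the ratio barely depends on the
    input statistics, i.e., that the two effects are nearly decoupled.
    """
    values = list(ratios.values())
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


# Made-up measurements for two input statistics and two bit widths.
power = {
    ("low_activity", 8): 1.0, ("low_activity", 16): 2.1,
    ("high_activity", 8): 3.0, ("high_activity", 16): 6.0,
}
ratios = size_ratio(power, 16, 8)
print(ratios, relative_spread(ratios))
```

With these invented numbers the ratio stays close to 2 for both input statistics even though the absolute power differs by a factor of three, which is the behavior the analysis looks for.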
III. SCALABLE POWER MODELS
According to the analysis of Section II, we model the power consumption of a soft macro as a three-term product

$$P(S, W, T) = P_0(S) \cdot R_W(W) \cdot R_T(T) \qquad (1)$$

where $P_0(S)$ is a power model for a reference implementation of the soft macro, synthesized with size $W_0$ and technology library $T_0$. In other words, $R_W$ and $R_T$ are scaling factors that adapt $P_0$ to all possible realizations obtained with different bit widths, technology libraries, and synthesis directives.
A. Reference Power Model
When building the reference model $P_0$, the reference instance of the soft macro can be regarded as a hard macro for which a power model can be precharacterized by means of accurate low-level simulations. Hence, any activity-sensitive RTL power model traditionally used for hard macros [8] can be used to model $P_0$. For our experiments, we chose the three-dimensional (3-D) lookup table power model proposed by Gupta et al. [6] because of its simplicity and generality (its applicability to sequential macros has been recently tested [9]). As boundary statistics for the model we use three probabilities: $P_{in}$, the average input signal probability; $D_{in}$, the average input transition probability; and $D_{out}$, the average output transition probability. The dependence of $P_0$ on $P_{in}$, $D_{in}$, and $D_{out}$ is represented by means of a 3-D lookup table whose entries represent the power values associated with a triplet of boundary statistics. The model allows the designer to span the tradeoff between accuracy and model dimension simply by changing the discretization step in the three dimensions. The discretization step used for our experiments was 0.1 for $P_{in}$ and $D_{in}$; no discretization was used on the $D_{out}$ axis (the values of $D_{out}$ obtained during characterization were stored in the model together with the corresponding power values).
We remark that so far we have used $S$ to denote any $n$-tuple of boundary statistics needed to evaluate an arbitrary baseline power model $P_0$. Having chosen the 3-D lookup table model for our experiments, hereafter we will use $S$ to denote the triplet ($P_{in}$, $D_{in}$, $D_{out}$) the model is based on.
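A table of this kind can be sketched as follows. This is a hedged illustration rather than the implementation used in the experiments: the class and method names are hypothetical, the 0.1 step on the two input statistics follows the text, and resolving the output transition probability by nearest stored sample is an assumption of the sketch (the text only states that the observed values are stored alongside the power values).

```python
from collections import defaultdict

STEP = 0.1  # discretization step for the two input statistics


def _bin(x, step=STEP):
    """Index of the discretization cell containing x, for x in [0, 1]."""
    last = int(round(1 / step)) - 1
    # The small epsilon guards against values such as 0.3 / 0.1 = 2.999...
    return min(int(x / step + 1e-9), last)


class LookupPowerModel:
    """3-D table: two discretized input statistics plus stored output samples."""

    def __init__(self):
        # (p_in cell, d_in cell) -> list of (d_out, power) samples
        self.cells = defaultdict(list)

    def add_sample(self, p_in, d_in, d_out, power):
        """Store one characterization result in the cell it falls into."""
        self.cells[(_bin(p_in), _bin(d_in))].append((d_out, power))

    def estimate(self, p_in, d_in, d_out):
        """Return the power of the stored sample with the closest d_out."""
        samples = self.cells.get((_bin(p_in), _bin(d_in)))
        if not samples:
            raise KeyError("no characterization data for this cell")
        return min(samples, key=lambda s: abs(s[0] - d_out))[1]
```

Shrinking `STEP` enlarges the table and refines the estimate, which is the accuracy versus model-size tradeoff mentioned in the text.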
B. Scaling Factors
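Experiments have shown that the relation between bit width and power of a macro depends on its functionality. Hence, it is not possible to find a general analytical formula for the size scaling factor $R_W$. On the other hand, according to our modeling assumptions, the ratio $R_W$ is almost independent of $S$ and $T$. It can therefore be characterized for fixed values of these parameters and tabulated as a function of the bit width $W$:

$$R_W(W) = \frac{P(S_0, W, T_0)}{P(S_0, W_0, T_0)} \qquad (3)$$

where $S_0$ and $T_0$ are the arbitrary assignments of the fixed parameters and $W_0$ is the reference bit width. The technology scaling factor is defined analogously as $R_T(T) = P(S_0, W_0, T) / P(S_0, W_0, T_0)$. However, unlike $R_W$, $R_T$ cannot be statically precharacterized or tabulated, since it is impossible to enumerate all the technology libraries and synthesis scripts a designer may use. Hence, technology scaling has to be dynamically performed by the designer once the target technology and synthesis parameters have been defined.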
C. Model Characterization
The characterization procedure we outline in the sequel needs to be carried out once for each soft macro, either by the developer of the library of RTL macros (if the power models are to be distributed together with the library and/or there are intellectual property (IP) issues involved) or by the designer (if the library has not been precharacterized by the developer and it is completely accessible to the designer). The characterization process consists of three steps.
Characterization of $P_0$: A reference technology ($T_0$) is chosen and used to synthesize the macro for a reference bit width ($W_0$). In our experiments, $T_0$ was a Synopsys library. The reference model $P_0$ is then characterized for this reference implementation using input streams with different statistics. We used a sampling step of 0.1 in the $P_{in}$ and $D_{in}$ dimensions and, for every ($P_{in}$, $D_{in}$) pair, we generated 20 input streams of 50 patterns each with different bit-wise statistics. All input streams were simulated using Synopsys VSS and DesignPower to obtain the corresponding values for $D_{out}$ and power.
Characterization of $R_W$: The size scaling factor is characterized by synthesizing the soft macro for different configurations of the bit width parameters (ranging from 4 to 32 in our experiments) and simulating each implementation with the reference input stream (with the same statistics for all input data bits).
Characterization of $R_T$: The technology scaling factor cannot be characterized in advance. Rather, it has to be computed by the designer for each target technology. What can be statically precomputed is its denominator, i.e., the power of the reference implementation, which requires a single simulation run of the reference implementation for a given input stream.
D. Model Evaluation
The power estimate for an instance of a soft macro is given by

$$P(S, W, T) = P_0(S) \cdot R_W(W) \cdot R_T(T)$$

where $S$ denotes the boundary statistics, $W$ the actual configuration of the size parameters, and $T$ the target technology. Model evaluation consists of the following.
Technology Scaling: The user of the model must first perform technology scaling to adapt the power model to his/her own synthesis flow and technology library ($T$). To this purpose, an instance of the macro with the reference size $W_0$ has to be synthesized and simulated using the same input stream used during the third step of the characterization process. The resulting power value is then used to compute $R_T$ according to its definition. Notice that $R_T$ has to be computed once for each soft macro in the RTL library.
Size Scaling: Scaling factor $R_W$ is precharacterized and stored in a lookup table. In this way, during model evaluation, size scaling only requires a single lookup for each instance of the macro.
Power Evaluation: Finally, boundary statistics ($P_{in}$, $D_{in}$, and $D_{out}$) are collected for each instance of a soft macro during RTL simulation of the entire design and used to evaluate the corresponding models.
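The three evaluation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary-based size table, and all numeric values are hypothetical. Only the structure follows the text: one user-side experiment for technology scaling, one table lookup for size scaling, and one evaluation of the reference model.

```python
def technology_factor(power_target_tech, power_reference_tech):
    """Technology scaling factor.

    Ratio between the power of the reference-size instance on the target
    technology (measured once by the user) and on the reference
    technology (precomputed), both for the same input stream.
    """
    return power_target_tech / power_reference_tech


def estimate_power(reference_power, size_table, width, tech_factor):
    """Scale the reference-model estimate by the size and technology factors."""
    return reference_power * size_table[width] * tech_factor


# Hypothetical numbers, for illustration only.
size_table = {8: 0.5, 16: 1.0, 32: 2.2}   # precharacterized size factors
r_t = technology_factor(3.0, 4.0)         # single user-side experiment
print(estimate_power(10.0, size_table, 32, r_t))
```

Here `reference_power` stands for the value returned by the reference model for the boundary statistics collected during RTL simulation.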
E. Accuracy versus Efficiency Tradeoff
In principle, a power model for a soft macro could be viewed as a multi-dimensional lookup table defined over the entire input space. The sizeable advantage, in terms of model efficiency, provided by the scaling approach is pictorially shown in Fig. 4: the cube represents the entire input space where the original lookup table is defined, while the three orthogonal segments of Fig. 4(c) represent the linear sets of points where the three terms of our power model must be characterized. The value of power at any point of the input space can then be obtained as the product of the orthogonal models characterized on the three segments.
The efficiency improvement is paid in terms of accuracy loss due to the simplifying assumption of complete disjointness of the three effects. To improve accuracy, we can weaken the assumption by using a single scaling factor, $R_{WT}$, that models the dependence on both size and technology. The resulting model is

$$P(S, W, T) = P_0(S) \cdot R_{WT}(W, T) \qquad (4)$$

The efficiency of this modified model is represented in Fig. 4(b); the scaling factor $R_{WT}$ is now defined on a plane covered by a lookup table with one entry for each (size, technology) pair. Characterization of the scaling factor now requires one synthesis and one simulation run for each entry.
Characterization of the scaling coefficient entails the following: 1) synthesis of an instance with target size parameters; 2) mapping of the instance on the target technology; and 3) gate-level simulation using the reference input stream. The average power consumption provided by gate-level simulation is $P(S_0, W, T)$, where $S_0$ denotes the statistics of the reference input stream. The scaling factor is then obtained as

$$R_{WT}(W, T) = \frac{P(S_0, W, T)}{P(S_0, W_0, T_0)}$$

where $W_0$ and $T_0$ are the reference size and technology. Accuracy improvements are discussed in Section IV.
TABLE I. EXPERIMENTAL RESULTS
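The joint scaling table can be sketched as follows, assuming the gate-level power for the reference input stream has already been measured for every (size, technology) pair. The function name, the library labels, and the power values are hypothetical.

```python
def joint_scaling_table(measured, ref_width, ref_tech):
    """Build the joint size/technology scaling table.

    `measured` maps (bit_width, technology) -> power obtained with the
    reference input stream; each entry is normalized to the power of
    the reference instance.
    """
    base = measured[(ref_width, ref_tech)]
    return {point: power / base for point, power in measured.items()}


# Made-up measurements on two sizes and two libraries.
measured = {
    (16, "libA"): 4.0, (32, "libA"): 9.0,
    (16, "libB"): 3.0, (32, "libB"): 6.0,
}
table = joint_scaling_table(measured, 16, "libA")
print(table)
```

In this invented data set, two separate factors would predict 2.25 × 0.75 = 1.6875 for the 32-bit instance on `libB`, whereas the joint table stores the measured ratio 1.5. This is the kind of interaction between size and technology that the single-factor model is meant to capture.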
F. Intellectual Property Protection
If the soft macros are a third-party IP, the power models need to be precharacterized by the developer and made available to the designer together with the functional description of the macros. The underlying assumption behind this "user-provider" paradigm is that model characterization requires complete access to the library, while model evaluation does not require IP critical information.
Unfortunately, this assumption does not hold for technology scaling. The technology scaling factor $R_T$ cannot be precharacterized by the library developer, since the actual technology parameters can be arbitrarily defined by the designer. Conversely, evaluating $R_T$ requires synthesis and low-level simulation, so that it cannot be performed by the designer if he/she has no access to a synthesizable description of the macro (which is the provider's IP).
To enable technology scaling while protecting IP, we sacrifice some accuracy. The solution we propose is based on the observation that $R_T$ does not vary much from one macro to another. Hence, a single technology scaling model can be used for all macros at the cost of some accuracy loss. Moreover, a "dummy" macro containing no IP can be freely distributed to the designers to be used for evaluating $R_T$. From the provider's point of view, a dummy macro has to be chosen for technology scaling, synthesized with the reference size $W_0$, mapped onto the reference technology $T_0$, and simulated with a given input stream to obtain the reference power value to be supplied to the end user together with the input stream and the synthesizable description of the dummy macro. This process has to be performed only once.
From the designer's point of view, technology scaling entails synthesis and simulation (as described in Section III-D) with two main differences: the dummy macro has to be used instead of the actual macros, and the scaling factor has to be computed only once and then used to scale all models to the actual technology.
The IP-conscious power model has the form

$$P(S, W, T) = P_0(S) \cdot R_W(W) \cdot R_T^{dummy}(T) \qquad (5)$$

where $R_T^{dummy}$ is the technology scaling factor measured on the dummy macro. Its efficiency and accuracy are discussed in Section IV.
IV. EXPERIMENTAL RESULTS
We have benchmarked the accuracy of the proposed power models on combinational and sequential soft macros taken from the Synopsys DesignWare Library, except for one macro, a parameterized accumulator consisting of a register and an adder. For each benchmark, we have compared five models.
Ad Hoc: Implementation-specific models constructed for each instance of the soft macros as if they were hard macros.
(4): The model of (4) with a single scaling factor.
(1): The three-term model of (1).
(5a): The IP-conscious power model of (5), using a single dummy macro (namely, the subtractor of size 32) for technology scaling.
(5b): The IP-conscious power model of (5), using four different dummy macros for technology scaling, as explained later in this section.
All models were evaluated for three different technology libraries, four different bit widths (4, 8, 16, and 32), and 36 different input statistics [nine pairs of input statistics were randomly selected, and four input streams, with different bitwise statistics, were generated for each pair]. The resulting power estimates were then compared to those provided by gate-level simulations performed by VSS and DesignPower.
TABLE II. COMPUTATIONAL COST OF THE MODELS
Experimental results are reported in Table I in terms of average percentage error (Err) and standard deviation (Dev) of the percentage error.
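The two figures of merit can be computed as in the sketch below. The text does not spell out the exact formulas, so two points are assumptions of this illustration: errors are taken in absolute value relative to the gate-level reference, and the population standard deviation is used. The function names are hypothetical.

```python
from statistics import mean, pstdev


def percentage_errors(estimates, references):
    """Absolute percentage error of each estimate against its reference."""
    return [100.0 * abs(e - r) / r for e, r in zip(estimates, references)]


def error_summary(estimates, references):
    """Return (average percentage error, standard deviation of the error)."""
    errors = percentage_errors(estimates, references)
    return mean(errors), pstdev(errors)


# Made-up model estimates against gate-level reference values.
err, dev = error_summary([110.0, 95.0], [100.0, 100.0])
print(err, dev)
```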
The errors reported in the Ad Hoc column show the inherent accuracy of the lookup table model (without scaling). Its average error of 2.9% has been used as a baseline for evaluating the approximation introduced by size and technology scaling. As expected, (4) is the most accurate of the scaled models, with an average percentage error of 6.7%. Equation (1) introduces some additional error because of the stronger assumption of disjointness of the three parameters (statistics $S$, bit width $W$, and technology $T$), yielding an average error of 11.5%. Finally, (5a) trades accuracy for performance and IP protection, providing an average error of 16.9%.
It should be noted, however, that IP protection could also be achieved by using more than one dummy macro for technology scaling, in order to improve model accuracy. In fact, by analyzing the individual entries of Table I, we observe that model (5) provides good accuracy when applied to macros with high affinity to the one used for scaling (e.g., the adder), while errors tend to be larger for macros with a different structure (e.g., the decoder). More specifically, we were able to reduce the average error of model (5) to 12.7% [thus making it comparable in accuracy with model (1)] by aggregating the benchmarks into four categories according to the functions they implement and by using a different dummy macro for each category. Results are collected in column (5b) of the table.
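Per-category technology scaling can be sketched as follows. This is an illustration only: the category names (`arithmetic`, `logic`), the macro names, and all power values are hypothetical and do not reproduce the four categories used in the experiments.

```python
def category_factors(dummy_measurements):
    """Technology scaling factor of each category's dummy macro.

    `dummy_measurements` maps category -> (power on the target
    technology, power on the reference technology).
    """
    return {cat: target / ref
            for cat, (target, ref) in dummy_measurements.items()}


def scale_to_technology(estimates, macro_category, factors):
    """Scale each macro's estimate with the factor of its category."""
    return {macro: power * factors[macro_category[macro]]
            for macro, power in estimates.items()}


# Made-up data: one dummy macro per category, two macros to be scaled.
factors = category_factors({"arithmetic": (3.0, 4.0), "logic": (1.0, 2.0)})
scaled = scale_to_technology(
    {"adder": 8.0, "decoder": 2.0},
    {"adder": "arithmetic", "decoder": "logic"},
    factors,
)
print(scaled)
```

Using a single dummy macro corresponds to mapping every macro to the same category.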
Estimation performance is reported in Table II in terms of the number of synthesis and simulation runs required for characterization (Char) and evaluation (Eval) for each model. Numbers refer to the case of a macro for which a model of 1000 entries has to be characterized, 16 configurations of the size parameters can be specified, and three different technology libraries can be used for mapping. The Eval column gives the total number of synthesis and simulation experiments required to evaluate the models for all possible instances of the same macro (two of the entries denote that a single synthesis run is required to scale either all the macros in a library or a subset of them).
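The efficiency argument of Section III-E can be made concrete with the sizes quoted above (1000 statistics entries, 16 size configurations, three technologies). The sketch below counts characterization points implied by the geometric picture of Fig. 4 (cube, plane plus segment, three segments). It is a back-of-the-envelope assumption, not a reproduction of Table II, and it ignores the reference point shared between terms; the function names are hypothetical.

```python
def full_table_points(n_stats, n_sizes, n_techs):
    """One entry for every point of the three-dimensional input space."""
    return n_stats * n_sizes * n_techs


def joint_scaling_points(n_stats, n_sizes, n_techs):
    """Reference model plus a joint (size, technology) scaling table."""
    return n_stats + n_sizes * n_techs


def three_term_points(n_stats, n_sizes, n_techs):
    """Reference model plus two independent one-dimensional factors."""
    return n_stats + n_sizes + n_techs


sizes = (1000, 16, 3)
print(full_table_points(*sizes),
      joint_scaling_points(*sizes),
      three_term_points(*sizes))
```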
V. CONCLUSION
We have proposed a parameterizable power macromodel suitable for RTL soft macros. The main feature of the model is its capability of self-adapting to different bit widths of the data inputs, synthesis options, and/or technology libraries. The experiments have shown that model scaling provides an average error of 6.7%, compared with the intrinsic average error of 2.9% resulting from a power model ideally characterized for each instance of the soft macro. Scaling requires a single experiment to be performed by the end user to adapt the models to his/her synthesis flow. IP protection, as imposed by commercial synthesizable macro libraries, can be guaranteed by performing the scaling experiments on a public-domain "dummy" macro.
