Abstract
Introduction
Rapid increases in design complexity and reductions in design time have created a need for CAD tools that can help make important design decisions early in the design process. To do so, these tools must operate on a design description at a high level of abstraction. One design criterion that has received increased attention lately is power dissipation, owing to the growing demand for low-power mobile and portable electronics. As a result, there is a need for high-level power estimation and optimization. Specifically, it would be highly beneficial to be able to estimate power given only a functional view of the design, such as when a circuit is described only with Boolean equations. In this case, no structural information is known: the lower-level (gate-level or lower) description of the function is not available. Of course, a given Boolean function can be implemented in many ways, with varying power dissipation levels. We are interested in predicting the nominal power dissipation that a minimal-area implementation of the function would have.
For a combinational circuit, since the only available information is its Boolean function, we model its power dissipation as follows:

$$P_{avg} \propto D_{avg} \, A \, C_{avg} \tag{1}$$
where $D_{avg}$ is an estimate of the average node switching activity that a gate-level implementation of this circuit would have, $A$ is an estimate of the gate count (assuming some target gate library), and $C_{avg}$ is an estimate of the average node capacitance (including drain capacitance and interconnect loading capacitance). The estimation of $D_{avg}$ was covered in [1-3]. The problem of estimating $A$ from a high-level description of the circuit corresponds to the problem of high-level area estimation. This problem is of independent interest, as the information it provides can be very useful, for instance, during floorplanning. The estimation of the gate count (or simply, area) $A$ of single-output Boolean functions was explored in [4, 5], where the problem was addressed using a notion of complexity of the on-set and the off-set of a Boolean function. In this paper we propose an area model to predict the area complexity of multi-output Boolean functions. This area model is based on a transformation that converts the given multi-output Boolean function into an equivalent single-output function. The transformation is such that it helps us infer the area complexity of the multi-output Boolean function from the area complexity of the single-output function, thus enabling the use of the complexity-based area model of [5], developed for single-output functions.
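As a rough illustration of how the three estimates combine, the sketch below multiplies them into a single figure of merit. It assumes the model is a simple product, with supply voltage and clock frequency folded into the proportionality constant; the function name `relative_power` and the numbers in the usage lines are hypothetical.

```python
def relative_power(d_avg: float, gate_count: float, c_avg: float) -> float:
    """Return D_avg * A * C_avg, a quantity proportional to average power.

    Assumption: supply voltage and clock frequency are constant across the
    designs being compared, so they are left out of the product.
    """
    if min(d_avg, gate_count, c_avg) < 0:
        raise ValueError("activity, gate count and capacitance must be non-negative")
    return d_avg * gate_count * c_avg


# Hypothetical comparison of two candidate functions (illustrative numbers).
candidate_a = relative_power(d_avg=0.5, gate_count=100, c_avg=2.0)
candidate_b = relative_power(d_avg=0.25, gate_count=160, c_avg=2.0)
print(candidate_a, candidate_b)
```

Because the result is only proportional to power, it is meaningful for ranking alternatives under the same library and operating conditions, not as an absolute value.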
However, the proposed area model, like its single-output counterparts [4, 5], is inherently limited to circuits which do not have large exclusive-or arrays in them. Circuits with large exclusive-or arrays are also the source of problems in other CAD areas, such as BDD construction for verification. One way around the problem of exclusive-or arrays is to require that the Boolean function specification explicitly list exclusive-or gates. In that case, these can be identified up-front and excluded from the analysis, so that the proposed method is applied only to the remaining circuitry. In any case, in the remainder of this paper we will not consider circuits composed of large exclusive-or arrays.
Before leaving this section, we should mention some previous work on layout area estimation from an RTL view. Wu et al. [6] proposed a layout area model for datapath and control for two commonly used layout architectures, based on the transistor count. For datapath units, the average transistor count was obtained by averaging the number of transistors over different implementations; for control logic, the number of transistors was calculated from the sum of products (SOP) expressions for the next state and control signals.
A similar model was proposed by Kurdahi et al. [7].
Both these models consider the effect of interconnect on the overall area, while [7] also considers the effect of cell placement on the overall area. In [6, 7], the controller area is estimated from the number of AND and OR gates required to implement the SOP expression. The optimal number of gates required to implement the function can be much smaller than this sum, because it is frequently possible to apply logic optimization algorithms to obtain a much better implementation.
The Multi-Output Area Model
We aim to estimate the minimum number of gates ($A$) required for a multi-level implementation of the function, given only its high-level description (Boolean equations) and a target technology library. The area model proposed for single-output Boolean functions [4, 5] is based on the notion of complexity of the on- and off-sets of a Boolean function. One such complexity measure, which will be used in this paper, is the linear measure defined in [5].
Our approach to solving the multi-output area estimation problem is inspired by the multi-valued logic approach to the two-level minimization of multi-output Boolean functions [8]. In what follows, $\hat{f}$ denotes the single-output function obtained by applying the multiplexor transformation to the multi-output function $f$. In the first scenario, when the contribution of the multiplexor to the area of $\hat{f}$ was zero, we saw that the control inputs were absent from all the prime implicants, while in the second scenario, when the contribution of the multiplexor to the area of $\hat{f}$ was maximal, we saw that all the control inputs were present in every prime implicant of $\hat{f}$. Thus there seems to be a correlation between the influence of the multiplexor on the area of $\hat{f}$ and the number of control inputs in the prime implicants of $\hat{f}$.
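One way to picture a multiplexor transformation of this kind is sketched below: the outputs of a multi-output function are merged into a single function with extra control inputs that select which output is observed. This is only an illustrative construction under stated assumptions, not necessarily the exact transformation used in the paper. The name `mux_transform`, the representation of outputs as Python callables, and the choice of returning 0 for unused control codes are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

BoolFn = Callable[[Sequence[bool]], bool]


def mux_transform(outputs: List[BoolFn]) -> Tuple[Callable[[Sequence[bool], Sequence[int]], bool], int]:
    """Merge m single-output functions into one function with control inputs.

    Returns (f_hat, k), where k = ceil(log2(m)) is the number of control
    inputs and f_hat(x, c) evaluates the output selected by the control
    bits c (most significant bit first).
    """
    m = len(outputs)
    if m == 0:
        raise ValueError("at least one output is required")
    k = (m - 1).bit_length()  # integer-exact ceil(log2(m))

    def f_hat(x: Sequence[bool], c: Sequence[int]) -> bool:
        if len(c) != k:
            raise ValueError(f"expected {k} control bits")
        index = 0
        for bit in c:
            index = (index << 1) | int(bool(bit))
        if index >= m:
            return False  # assumption: unused control codes evaluate to 0
        return bool(outputs[index](x))

    return f_hat, k


# Three outputs over two inputs; two control bits select among them.
outs = [lambda x: x[0] and x[1], lambda x: x[0] or x[1], lambda x: not x[0]]
f_hat, k = mux_transform(outs)
print(k, f_hat((True, False), (0, 1)))
```

With this picture in mind, an output that ignores some control bits after minimization corresponds to prime implicants in which those control inputs do not appear.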
The quantity $\alpha A_{mux}$, where $A_{mux}$ is the area of the multiplexor and $\alpha$ is the coefficient defined below, represents the area contribution of the multiplexor to an optimal area implementation of $\hat{f}$. Note that after optimization it may happen that certain control inputs become redundant for certain outputs. This manifests itself as some control inputs being absent from some prime implicants of the on- and off-sets of $\hat{f}$. Thus, we may think of $\alpha A_{mux}$ as representing the area of a reduced multiplexor resulting from the optimization. This reduced multiplexor area is related to the number of remaining control signals, which leads us to a method for estimating this area, as follows.
From the above considerations, we propose that an appropriate area model for a multi-output function $f$, in terms of the area of the transformed single-output function $\hat{f}$ and the area of an $m$-to-1 multiplexor, is given by

$$A(f) = A(\hat{f}) - \alpha A_{mux} \tag{2}$$

where $A_{mux}$ is the area complexity of an $m$-to-1 multiplexor, and $0 \le \alpha \le 1$ is a coefficient that represents the contribution of the multiplexor to the area complexity of $\hat{f}$. In the following, we present an approach for estimating $\alpha$.
Note that the complexity measure [5] of an $m$-to-1 multiplexor is given by $\lceil \log_2 m \rceil + 1$, i.e., the complexity of an $m$-to-1 multiplexor is determined by the number of control inputs. This is true because every prime implicant of an $m$-to-1 multiplexor has a size given by $\lceil \log_2 m \rceil + 1$. In [5] it was observed that the area complexity (here, $A_{mux}$) is approximately exponential in the complexity measure. Hence it follows that:

$$A_{mux} \propto 2^{\lceil \log_2 m \rceil} \tag{3}$$

Let $C_i$ denote the number of control inputs in a prime implicant $P_i$. Then define $C_{on}$ to be the average number of control inputs in a prime implicant belonging to the on-set of $\hat{f}$, so that:

$$C_{on} = \frac{1}{K_{on}} \sum_{i=1}^{K_{on}} C_i \tag{4}$$

where $K_{on}$ is the number of prime implicants in the on-set of $\hat{f}$. Similarly, one can define $C_{off}$. From the above discussion it follows that $C_{on}$ and $C_{off}$ can be used to measure the area contribution of the multiplexor to an optimal area implementation of $\hat{f}$. Notice that the optimal implementation of $\hat{f}$ will contain an (implicit) reduced multiplexor whose area depends on the smaller of $C_{on}$ and $C_{off}$. Thus, we can model this area contribution, in a fashion analogous to equation (3), as:

$$\alpha A_{mux} \propto 2^{\min\{C_{on},\, C_{off}\}} \tag{5}$$

It then follows from equations (3) and (5) that:

$$\alpha = 2^{\min\{C_{on},\, C_{off}\} - \lceil \log_2 m \rceil} \tag{6}$$

It must be noted that $\alpha$ can be computed with minimal effort from the prime implicants of $\hat{f}$, and once $\alpha$ is available, $A(f)$ can be computed using (2).
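The computation of the multiplexor coefficient can be sketched in a few lines. The sketch assumes each prime implicant is represented simply as the set of variable names appearing in it, that the control inputs are known by name, and that the multi-output area is obtained by subtracting the reduced multiplexor area from the area of the transformed function. The function names and the example implicants are hypothetical.

```python
from typing import Iterable, Set


def avg_control_inputs(primes: Iterable[Set[str]], control_vars: Set[str]) -> float:
    """Average number of control inputs per prime implicant (C_on or C_off)."""
    primes = list(primes)
    if not primes:
        raise ValueError("at least one prime implicant is required")
    return sum(len(p & control_vars) for p in primes) / len(primes)


def mux_coefficient(on_primes, off_primes, control_vars: Set[str], m: int) -> float:
    """alpha = 2 ** (min(C_on, C_off) - ceil(log2(m)))."""
    k = (m - 1).bit_length()  # integer-exact ceil(log2(m))
    c_min = min(avg_control_inputs(on_primes, control_vars),
                avg_control_inputs(off_primes, control_vars))
    return 2.0 ** (c_min - k)


def multi_output_area(area_f_hat: float, area_mux: float, alpha: float) -> float:
    """Assumed model: subtract the reduced multiplexor area from A(f_hat)."""
    return area_f_hat - alpha * area_mux


# Hypothetical 4-output example with control inputs c0 and c1.
controls = {"c0", "c1"}
on_set = [{"a", "c0", "c1"}, {"b", "c0"}]   # C_on  = 1.5
off_set = [{"a", "b", "c0", "c1"}]          # C_off = 2.0
alpha = mux_coefficient(on_set, off_set, controls, m=4)
print(alpha, multi_output_area(120.0, 12.0, alpha))
```

Since no prime implicant can contain more than $\lceil \log_2 m \rceil$ control inputs, the exponent is never positive and the coefficient stays at or below one.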
High-Level Area Estimation Flow
The transformation, as stated in the previous section, does not place any restriction on the number of outputs $m$ that can be dealt with at a time. However, we have observed that in practice there is a tradeoff between the run time of the area estimation procedure and $m$. As the value of $m$ increases, the time taken to generate the prime implicants usually increases. On the other hand, using too small a value of $m$ can affect the accuracy by overestimating the area, as the sharing between all outputs is not captured. After experimenting with different values of $m$, we found that a reasonable choice was $m = 16$.
Typically, a multi-output Boolean function has outputs with varying support set sizes. Outputs whose support set size is very small, for instance 1, 2 or 3, consume very little area. For these outputs very little area optimization can be done. One can make a reliable area prediction for such outputs without having to resort to the above approach. In fact it was found that an area estimate of two gates for outputs whose support set size is two, and an estimate of three gates for outputs with support set size of three, works very well in practice. As far as outputs with support set size of one are concerned, their contribution to an optimal area implementation depends on whether or not they are realized by inversion of a primary input signal. Those which are realized by inversion are assumed to contribute an area of one gate, while the rest are assumed not to contribute to the area. The above approach yields benefits in terms of both run time and accuracy, and has been adopted in our area estimation procedure. The flow diagram for the overall area estimation procedure is given in Fig. 2. The area estimation tool reads an input description of $f$ and partitions the function into two sub-functions. One sub-function ($f_1$) comprises all outputs whose support set size is less than or equal to three, while the other ($f_2$) comprises all outputs whose support set size is greater than three. The partitioning of the network into $f_1$ and $f_2$ can be performed by a breadth-first search and is fairly inexpensive. We estimate the area of $f_1$ in the following fashion:
$$A(f_1) = \beta\,|f_1^1| + 2\,|f_1^2| + 3\,|f_1^3|$$

Here, $|f_1^1|$ is the number of outputs in $f_1$ with support set size equal to 1, $\beta$ is the fraction of these outputs which are realized by inversion of a primary input signal, $|f_1^2|$ is the number of outputs in $f_1$ with support set size equal to 2, and $|f_1^3|$ is the number of outputs in $f_1$ with support set size equal to 3. For estimating the area of $f_2$ we use the transformation-based approach described above. Let the outputs of $f_2$ be grouped into $I$ groups of size sixteen each. Let the Boolean function comprising the $i$th group of outputs be $g_i$.
We apply the multiplexor transformation to $g_i$, and compute $\alpha$, the probability, and the linear measure of the resultant $\hat{g}_i$. We then compute the area complexity of $g_i$ using (2) and (6). This procedure is repeated until all the outputs have been used up, and the area of $f_2$ is estimated as:

$$A(f_2) = \sum_{i=1}^{I} A(g_i)$$

Finally, the area of $f$ is computed as:

$$A(f) = A(f_1) + A(f_2)$$
It must be noted that the proposed area model does not account for area sharing across groups. 
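The overall flow can be summarized in a short sketch. It assumes the support set size of every output is already known, that the outputs realized by inversion are given as a set of names, and that a callback (here the hypothetical `group_area`) stands in for the transformation-based estimate of one group of outputs. None of these names come from the original tool.

```python
from typing import Callable, Dict, List, Set


def estimate_area(
    support_size: Dict[str, int],
    inverted: Set[str],
    group_area: Callable[[List[str]], float],
    group_size: int = 16,
) -> float:
    """Estimate total gate count from per-output support set sizes."""
    small = [o for o, s in support_size.items() if s <= 3]
    large = [o for o, s in support_size.items() if s > 3]

    # f1: fixed per-output estimates for support set sizes 1, 2 and 3.
    area_f1 = 0.0
    for out in small:
        size = support_size[out]
        if size == 1:
            # One gate only when the output is an inverted primary input.
            area_f1 += 1.0 if out in inverted else 0.0
        else:
            # Two gates for size 2, three gates for size 3.
            # Assumption: size 0 (constant outputs) contributes nothing.
            area_f1 += float(size)

    # f2: split the remaining outputs into groups and sum the group estimates.
    # Sharing across groups is not captured, as noted in the text.
    groups = [large[i:i + group_size] for i in range(0, len(large), group_size)]
    area_f2 = sum(group_area(g) for g in groups)

    return area_f1 + area_f2


# Hypothetical example: a placeholder estimator charging 10 gates per output.
sizes = {"o1": 1, "o2": 1, "o3": 2, "o4": 3, "o5": 5, "o6": 7}
total = estimate_area(sizes, inverted={"o1"}, group_area=lambda g: 10.0 * len(g))
print(total)  # 26.0
```

Counting inverted outputs directly is equivalent to multiplying the number of size-one outputs by the fraction realized by inversion.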
Empirical Results
The proposed area model for multi-output functions was tested on several ISCAS-89 and MCNC benchmark circuits. These circuits are listed in Table 1 which, in addition to primary input and output counts, shows the functionality of these benchmarks. These circuits were optimized in SIS using script.rugged, and mapped using the library lib2.genlib. The area predicted using the area model was compared with the SIS-optimal area.
The performance of the model on all the benchmarks in Table 1, except s13207* and s35932, is shown in Fig. 3. Circuit s13207* is a modified version of s13207, obtained by deleting the primary outputs which contain exclusive-or arrays. The SIS-optimal area of s13207* was 1367, and the estimated area for this circuit was 1045. The circuit s35932 could not be optimized in SIS in one piece. Hence the circuit was partitioned based on the support set sizes (in a fashion similar to the above discussion) and the parts were optimized separately in SIS. The resulting SIS area was 7252, and the area estimated by the area estimation tool was 8761.

[Table 1: benchmark circuits, with primary input and output counts, functionality, and execution time of the area estimation tool.]

The execution time required by our area estimation tool is also given in Table 1, in CPU seconds on a SUN SPARC5 with 24 MB RAM. We compared these run times, on the above benchmarks, with one run of SIS using script.rugged followed by SIS technology mapping. The speedup obtained is shown in Fig. 4. The figure shows a speedup between 2x and 24x. Notice that a speedup of 10x was obtained on large benchmarks like s35932 and s13207*. It must be kept in mind that the reported SIS time for s35932 was obtained after the circuit was partitioned. Strictly speaking, the circuit was not completed in SIS. Hence we believe that on large benchmarks the speedups that can be obtained in practice can be significant.
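For the two large circuits whose areas are quoted in the text, the relative error of the estimate can be worked out directly. The snippet below is only a convenience calculation on those reported numbers; the helper name `relative_error` is hypothetical.

```python
def relative_error(estimated: float, reference: float) -> float:
    """Signed relative error of an estimate with respect to a reference value."""
    return (estimated - reference) / reference


# (estimated area, SIS area) as reported in the text.
reported = {"s13207*": (1045, 1367), "s35932": (8761, 7252)}

for name, (est, ref) in reported.items():
    print(f"{name}: {100 * relative_error(est, ref):+.1f}%")
# s13207*: -23.6%
# s35932: +20.8%
```

The signs show that the model underestimates one circuit and overestimates the other.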
Estimation of Cavg
In order to estimate the power, one needs to estimate not only the area complexity but also $C_{avg}$, which is the average node capacitance (including interconnect) in a circuit. If $C_{tot}$ is the total circuit capacitance of an optimal area implementation and $A$ is the number of gates, then:

$$C_{avg} = \frac{C_{tot}}{A}$$

This quantity depends on the target gate library and on the fan-out structure of the circuit. In order to estimate it, we assume that one has access to a few area-optimal circuit implementations in the desired target library. This does not appear to be an unreasonable assumption. In this case, an estimate of $C_{avg}$ can be obtained by averaging the $C_{avg}$ values obtained from the area-optimal circuit implementations.
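A minimal sketch of this averaging step is given below. It assumes each reference implementation is summarized by its total capacitance and gate count; the function names and the numbers are hypothetical and carry no particular units.

```python
from typing import Iterable, Tuple


def average_node_capacitance(reference_circuits: Iterable[Tuple[float, int]]) -> float:
    """Average the per-circuit C_avg = C_tot / A over reference implementations."""
    per_circuit = [c_tot / gates for c_tot, gates in reference_circuits]
    if not per_circuit:
        raise ValueError("at least one reference implementation is required")
    return sum(per_circuit) / len(per_circuit)


def total_capacitance(c_avg: float, gate_count: float) -> float:
    """Predict C_tot for a new circuit from its (estimated) gate count."""
    return c_avg * gate_count


# Hypothetical reference data: (total capacitance, gate count).
c_avg = average_node_capacitance([(200.0, 100), (90.0, 30)])
print(c_avg, total_capacitance(c_avg, 40))  # 2.5 100.0
```

Averaging the per-circuit ratios weights every reference circuit equally, regardless of its size.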
In order to test the accuracy of this approach, only a few benchmarks from the benchmark set listed in Table 1 were used to obtain an estimate of $C_{avg}$. These benchmarks were s13207*, s35932, k2 and i8. This estimated value of $C_{avg}$ was used to compute $C_{tot}$, assuming that the exact value of $A$ was available.
High-Level Power Estimation
The area estimate can be used to estimate the power dissipated by a Boolean function, by combining it with average activity estimates [3] and the average node capacitance estimate. We will compare our power estimates to the power dissipated by a gate-level optimal area implementation of the Boolean function under two different timing models, namely, the zero-delay model and the general-delay timing model [3]. In the case of the general-delay timing model, the delays were obtained from a gate library and an event-driven simulation was performed. It must be noted that the activity prediction model [3] does not account for the increase in switching activity due to glitches, as is probably to be expected from a high-level model. Hence it is important to check the accuracy of the high-level power model against the zero-delay simulation results. This is shown in Fig. 7.
Since the activity prediction model [3] depends on the input switching statistics of the circuit, we varied the signal probabilities at the circuit inputs from 0.1 to 0.9. Thus, each benchmark circuit is represented by a number of data points in the figure. For the benchmarks vda and k2, the predicted power is significantly different from the actual zero-delay power in spite of the fact that the predicted total capacitance is very close to the true value of total capacitance. This is because of an over-estimation of the average activity of the circuit. The correlation plot between predicted and actual zero-delay power, obtained after removal of the power estimates corresponding to these two circuits, is shown in Fig. 8. The better agreement in this plot shows that, for all but two of the benchmarks considered, the method works rather well. These two circuits are responsible for most of the bad points in Fig. 7.

[Fig. 8: Comparison between actual zero-delay power and predicted power after deletion of points corresponding to vda and k2.]

We also compared the predicted power against the general-delay simulation results. This is shown in Fig. 9. As is to be expected, the error in the prediction increases. This is due to the possible presence, in the general-delay case, of multiple transitions per cycle at a logic node, i.e., glitches.
Conclusions
In this paper we presented a new area model to predict the area complexity of multi-output Boolean functions. This was based on transforming the multi-output function at hand to an equivalent single-output function. The advantage of this model is that no additional characterization is necessary beyond that done for single-output functions. Moreover, it offers a natural framework to account for sharing occurring in a multi-output function. The predicted capacitance was then combined with average activity estimates [3] to get high-level power estimates.
