Existing resynthesis procedures used for reducing power consumption 
Introduction
The problem of synthesizing CMOS combinational logic for low average power dissipation is the subject of considerable attention. Significant work has been completed on the topic of low power network synthesis both in the area of technology-independent optimization (e.g. [7], [8]) and in technology mapping techniques (e.g. [9], [10], [11]).
The global synthesis techniques of the previous work are presented either with no discussion of resynthesis, or with resynthesis results which offer statistically insignificant improvements. The lack of improvement during resynthesis can be attributed to the fact that the approach taken is generally only a minor modification of the original synthesis pass. That is, the more accurate circuit information obtained after synthesis is not used to construct a good node selection criterion or node resynthesis technique.
The focus of this paper is the development of a power cost function to guide node selection in resynthesis for low-power. Increased optimality in node selection is guaranteed for any resynthesis process. This cost function can be used to drive an incremental optimization procedure on a pre-optimized network. Further reduction in the network power would be expected even if the technique applied to resynthesize the selected nodes is the same as that used in the original synthesis pass. Alternatively, the cost function can be used to apply power resynthesis to a network for which the primary constraint is a critical variable such as delay.
The cost function presented here is a statistical estimation technique for predicting the expected change in power dissipation throughout the transitive fanout of a resynthesis region prior to resynthesis. The theory applies to networks with zero or arbitrary delay assumptions. It is possible to restrict the logic resynthesis step to ensure that the local change is always beneficial to network power globally [8]. However, in some cases this hard constraint may be pessimistic, placing an unnecessarily strong constraint on resynthesis freedom.
The need for development of an adequate cost function is justified in Sect. 3. In Sect. 4 the theory for construction of an estimator is presented, and in Sect. 5 this estimation strategy is formulated into a cost function for node selection in resynthesis. The accuracy of the cost function is verified by significant empirical testing in Sect. 6.
Power Dissipation in Logic Circuits
The energy dissipation of a CMOS circuit is directly related to the switching activity when a simplified model of energy dissipation is used. The assumptions in the simplified model are: (1) all capacitance is lumped at the output node of a gate; (2) current flows only from the supply rail to the load capacitor, or from the load capacitor to the ground rail; and (3) all voltage changes are full swings, i.e. from the supply rail to the ground rail voltage, or vice versa.
For a well-designed gate, the above assumptions are reasonable [3]. For a synchronous digital system, the average power dissipated by a gate $g_i$ is given by:

$$P_i = \frac{1}{2} C_i V_{dd}^2 \frac{E_i}{T} \qquad (1)$$

where $P_i$ denotes the average power dissipated by gate $g_i$, $C_i$ is the load capacitance at the output of gate $g_i$, $V_{dd}$ is the supply voltage, $T$ is the clock period, and $E_i$ is the average number of gate output transitions per clock cycle. Given a technology-mapped circuit or a circuit layout, all of the parameters in Eqn. (1) can be determined, except for $E_i$, which depends on both the logic function being performed and the statistical properties of the primary input signals.
Eqn. (1) is used by power estimation techniques such as [1], [2], [4] to relate switching activity to power dissipation. The power estimation technique used to establish the results of this paper assumes that the network primary inputs are independent. Furthermore, the probability that a primary input is 1 is taken to be 0.5. These assumptions are made for the sake of simplicity in the theoretical presentation. All the theory can be generalized to the case of arbitrary input probabilities. The generalization is presented in [5].
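As a small numerical illustration of Eqn. (1), the sketch below evaluates the average power of a single gate, assuming the usual form P = 0.5 * C * Vdd^2 * E / T for the simplified model. The function name and all numerical values (50 fF, 3.3 V, 50 ns, 0.5 transitions per cycle) are illustrative assumptions and are not taken from the experiments reported here.

```python
def average_gate_power(c_load, v_dd, t_clk, e_transitions):
    """Average power of one gate under the simplified CMOS model.

    Assumes P = 0.5 * C * Vdd^2 * E / T, with SI units throughout.
    """
    return 0.5 * c_load * v_dd ** 2 * e_transitions / t_clk


# Illustrative values only (assumptions, not data from the paper).
p = average_gate_power(c_load=50e-15, v_dd=3.3, t_clk=50e-9, e_transitions=0.5)
print(f"average power: {p * 1e6:.4f} uW")
```

Since every other parameter is fixed after mapping, the power scales linearly with the switching activity, which is why the remainder of the analysis concentrates on estimating changes in activity.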
Motivation
During network resynthesis, only those network regions which are not optimal according to a cost function assessment and which do not lie on a fixed timing critical path may be altered. The goal of the procedure is to provide maximal resynthesis freedom to a restricted set of regions which are judged "highly non-optimal". This freedom can be increased by maximizing the compatible Observability Don't Care (ODC) sets [12] for these regions through the construction of a suitable node input ordering (Sect. 5). The definition and detection of the "highly non-optimal" regions for low-power resynthesis is the main contribution of this paper.
In general, a resynthesis region consists of a set of nodes and a subset of their transitive fanin nodes. For this paper, a single output node is assumed, so a resynthesis region is a "cone" of logic with a single fanout point at its apex. The more general case is handled by the techniques of Sect. 4.1.2. The selection of an appropriate node for resynthesis must consider four effects, pictorially represented in terms of their regions of influence in Fig. 1.
1. Cone Power Change: Power change within the resynthesized section of network.
2. Functional Transitive Fanout (TFO) Effect: When the ODC set is used for resynthesis, the resulting change in functionality throughout the TFO can result in an overall increase in circuit power dissipation even if the Cone Power Change is negative.
3. Spurious Dynamic Activity TFO Effect: Resynthesis which changes the delay or function at the cone fanout may change the delay-dependent activity throughout the TFO.
4. Delay Affected Spurious Dynamic Activity: The loading effect of the resynthesis region may change, thereby affecting delays. This can alter spurious dynamic activity in circuitry other than just the TFO of the cone.
The analysis within this paper is concentrated on the Functional TFO and Spurious Dynamic Activity TFO effects for general resynthesis. The Delay Affected Spurious Dynamic Activity is omitted as it is a lesser effect, and the Cone Power Change is left to studies of specific resynthesis techniques.
The Functional TFO effect is a consequence of the non-linear relationship between power and onset probabilities. Consider a circuit with zero-delay elements. It follows from Eqn. (1) that the power consumption at a node $a$ is proportional to $\Pr(a) \cdot \Pr(\bar{a}) = \Pr(a) \cdot (1 - \Pr(a))$, where $\Pr(a)$ is the probability of node $a$ being 1. The trivial example of Fig. 2 demonstrates a case in which a local improvement results in an overall increase in power for the circuit.
In this example, the NOR gate inputs are independent, and the capacitance at each node is assumed equivalent. The power consumed at input $b$ is the highest power consumption at any of the three nodes. Suppose that during resynthesis the probability of $b$ being one is reduced to 0.4. This reduces the power consumption local to $b$. However, the probability of $z$ being one increases, which raises the power consumed at $z$ and, in this example, the overall power of the circuit. This introduces a key notion essential to the generation of a resynthesis routine targeting low power: power consumption at nodes with the highest functional switching probability is the least sensitive to small changes in onset size. In this context, choosing a node for resynthesis becomes much less obvious than it might initially appear.
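The effect can be checked numerically with a minimal sketch of a two-input NOR gate under the zero-delay proportionality Pr(x) * (1 - Pr(x)). The probabilities Pr(a) = 0.3 and an initial Pr(b) = 0.5 are assumptions chosen for illustration, since the values of Fig. 2 are not reproduced here; unit capacitance at every node and independent inputs follow the text.

```python
def switching_power(p_one):
    """Zero-delay power proxy of a node: proportional to Pr(x) * (1 - Pr(x))."""
    return p_one * (1.0 - p_one)


def nor_circuit_power(p_a, p_b):
    """Total power proxy of z = NOR(a, b): independent inputs, unit capacitance."""
    p_z = (1.0 - p_a) * (1.0 - p_b)  # z is 1 only when both inputs are 0
    return switching_power(p_a) + switching_power(p_b) + switching_power(p_z)


P_A = 0.3  # assumed value, not taken from Fig. 2
before = nor_circuit_power(P_A, 0.5)  # b starts at its maximum-power point
after = nor_circuit_power(P_A, 0.4)   # resynthesis lowers Pr(b) to 0.4

print(f"local power at b: {switching_power(0.5):.4f} -> {switching_power(0.4):.4f}")
print(f"total power:      {before:.4f} -> {after:.4f}")
```

Under these assumed values the local power at b falls while the total rises, because the power at b is flat around Pr(b) = 0.5 whereas the power at z changes to first order.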
It is possible to restrict the construction of the ODC in such a way as to guarantee that a local resynthesis step does not detrimentally influence the TFO [8]. However, this may not be desirable. The empirical results of Sect. 6 show that the number of conditions under which this restriction is beneficial is limited. Such an approach might then be overly constraining on resynthesis freedom.
Estimation of Change in Activity
There are two aspects of node activity: the delay-independent functional activity, and the delay-dependent spurious dynamic activity. An estimation scheme for changes in both forms of activity is presented in the following section. A definition of the terminology is first necessary.
Consider node $n$ embedded in a digital circuit. The circuit has a set of primary inputs $I = \{i_1, i_2, \ldots, i_{|I|}\}$. Let $V(I)$ be the set of all possible pairs of input vectors. It is assumed that the circuit is allowed to settle completely after application of any input vector.
Time $t = 0$ may now be set as the application time of the second vector in any input vector pair $v$. Let $P_v$ be the probability that vector pair $v \in V(I)$ occurs. Over the set $V(I)$, the earliest arriving transition at node $n$ occurs at time $t_n^{\min}$, the latest at time $t_n^{\max}$.
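For vector pair $v$, the total transition activity, $T_n^T(v)$, at node $n$ is the number of all $0 \to 1$ or $1 \to 0$ transitions in the interval $[t_n^{\min}, t_n^{\max}]$. The functional transition activity, $T_n^F(v)$, counts only the change between the settled values of node $n$ before and after the second vector is applied, where $F_n^t(v)$ is the logic value of node $n$ at time $t$ under input vector pair $v$. This is equivalent to the activity at node $n$ under the zero-delay model for all nodes.
Definition 4.2 The Spurious Dynamic Activity of node $n$ is the transition activity in excess of the functional activity:

$$T_n^D(v) = T_n^T(v) - T_n^F(v)$$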
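The three activity measures can be illustrated on a sampled waveform. The sketch below assumes that a node waveform for one vector pair is given as a list of logic values whose first entry is the settled value before the second vector is applied and whose last entry is the settled value afterwards, and that spurious dynamic activity is the total activity minus the functional activity. The function names and the example waveforms are hypothetical.

```python
def total_transitions(waveform):
    """Count every 0->1 or 1->0 change in the sampled waveform."""
    return sum(1 for prev, cur in zip(waveform, waveform[1:]) if prev != cur)


def functional_transitions(waveform):
    """1 if the settled values before and after differ, else 0 (zero-delay view)."""
    return int(waveform[0] != waveform[-1])


def spurious_transitions(waveform):
    """Transitions in excess of the functional one (assumed: total - functional)."""
    return total_transitions(waveform) - functional_transitions(waveform)


glitchy_rise = [0, 1, 0, 1]   # ends high after one glitch
static_hazard = [1, 0, 1]     # settles back to its initial value

for name, wave in (("glitchy_rise", glitchy_rise), ("static_hazard", static_hazard)):
    print(name, total_transitions(wave), functional_transitions(wave),
          spurious_transitions(wave))
```

Averaging these per-pair counts with weights $P_v$ over $V(I)$ would give the expected activities used in the rest of the analysis.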
The following naming conventions will be used throughout the paper:
$f_n(Z)$ is the static logic function at the output of node $n$ in terms of variables in set $Z$.
$f_n(Z)|_z$ is the Boolean cofactor of logic function $f_n(Z)$ with respect to $z \in B^{|Z|}$.
$\Omega_Z(f)$ is the set of minterms of the function $f$ contained within the space defined by the variables $Z$.
Functional Activity
Consider a theoretical circuit in which all elements are of zero delay. The total transition activity and functional transition activity are equivalent at every node. A node $n$ is a good candidate for resynthesis if a local change in activity, plus the change in activity throughout the transitive fanout of the node, reduces the overall power consumption ([5], [8]). A functional activity change is achieved by altering the function of $n$ locally within the bounds of the ODC set. A node is defined to be a good candidate in a statistical sense if there is a high expectation of a significant decrease in total power resulting from a local change in functional activity. For mathematical simplicity, the analysis presented in this section assumes the entire Boolean input space, $\Omega_I$, as a bound on the ODC set at a node. The modification to the theory to include more realistic bounds on the ODC set is presented in Sect. 5.
The Single Fanin Change
Consider the circuit element of Fig. 3. The diagonally shaded regions indicate the original function onsets. The set of nodes is embedded in a circuit with a tree graph structure. Assume that node $n_1$ is changed in a resynthesis step such that a set of minterms, $A_{n_1}$, is added to the original onset of the node. $A_{n_1}$ is disjoint from the original onset of $n_1$. Assume that prior to resynthesis the size of $A_{n_1}$ can be estimated, but that the exact elements which compose it cannot.
Define the set of inputs to $n$ as $N = \{n_1, n_2, \ldots, n_j\}$. Elements within the set $A_{n_1}$ may be added to, removed from, or not propagated through to the onset $f_n(I)$ of the node $n$.
$S_n^p(n_1)$ is the positive sensitivity and $S_n^n(n_1)$ is the negative sensitivity at $n$ with respect to $n_1$. $S_n^p(n_1)$ and $S_n^n(n_1)$ are disjoint. The standard definition of node sensitivity is derived from the union of these functions:

$$S_n(n_1) = S_n^p(n_1) \cup S_n^n(n_1)$$
To address the problem of determining the size of an expected change in fn(I), the following proposition is required: The proof of this proposition follows from basic probability theory. Similarly,
The effect of a decrease in the onset size of node $n_1$ may also be computed. The relevant lemmas are:

$$R_n(R_{n_1}) = \Omega_I(S_n^p(n_1)) \cap R_{n_1}$$
It follows that:
Sets $A_{n_1}$ and $R_{n_1}$ are disjoint by definition. Consequently, for the case in which there is both a set of points added to and removed from $f_n$, the expectations may be summed.

$$E(A_n) = \Pr(S_n^p(n_1) \setminus f_{n_1}) \cdot |A_{n_1}| + \Pr(S_n^n(n_1) \setminus f_{n_1}) \cdot |R_{n_1}|$$

$$E(R_n) = \Pr(S_n^n(n_1) \setminus f_{n_1}) \cdot |A_{n_1}| + \Pr(S_n^p(n_1) \setminus f_{n_1}) \cdot |R_{n_1}|$$

To translate this result to an expected change in power, the following assumption is required:
The functional activity power consumption of node $n$ obeys the proportionality relation

$$P(n) \propto \Pr(n) \cdot (1 - \Pr(n))$$

The assumption allows the following approximation to be made:

$$\Delta(P(n)) \propto \frac{1}{2^{2|I|}} \left( 2^{|I|} \cdot E(n) - 2 \cdot |\Omega_I(f_n)| \cdot E(n) - (E(n))^2 \right) \qquad (11)$$

where $E(n)$ is the expected net change in the onset size of $n$. These local changes in power may be summed over the transitive fanout to determine the total expected change in power. A functional power sensitivity can be formulated by linearizing the expression prior to summation.
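The form of Eqn. (11) can be checked against a direct evaluation of the proportionality Pr(n) * (1 - Pr(n)), with Pr(n) taken as the onset size divided by $2^{|I|}$. The sketch below uses exact rational arithmetic and reads $E(n)$ as a known net change in onset size; the function names and the example sizes (4 primary inputs, an onset of 5 minterms, 3 minterms added) are assumptions for illustration.

```python
from fractions import Fraction


def power_proxy(onset_size, num_inputs):
    """Pr(n) * (1 - Pr(n)) with Pr(n) = |onset| / 2^|I|."""
    p = Fraction(onset_size, 2 ** num_inputs)
    return p * (1 - p)


def delta_power_eq11(onset_size, num_inputs, change):
    """Right-hand side of Eqn. (11) for a net onset change of `change` minterms."""
    space = 2 ** num_inputs
    numerator = space * change - 2 * onset_size * change - change ** 2
    return Fraction(numerator, space ** 2)


num_inputs, onset, change = 4, 5, 3  # illustrative values only
direct = power_proxy(onset + change, num_inputs) - power_proxy(onset, num_inputs)
from_eq11 = delta_power_eq11(onset, num_inputs, change)
print(direct, from_eq11)
```

For a known change the two expressions agree exactly. The relation becomes an approximation once the change is replaced by its expectation, because the square of an expected value generally differs from the expected value of the square.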
The Multiple Fanin Change
In a general circuit structure, it is necessary to examine the effect of changes to multiple inputs to a single node. Even in the case of resynthesizing a single node in the network, this situation may arise as a consequence of reconvergent fanout.
Consider the example depicted in Fig. 5. Node $n$ is in the transitive fanout of $a$. Node $a$ is altered in a resynthesis step. In this case, inputs $n_1$ and $n_j$ are changed, their onsets being increased by sets $A_{n_1}$ and $A_{n_j}$ respectively. Consider just the elements in $A_{n_1}$ which will also be added to $\Omega_I(f_n)$. Elements $x \in A_{n_1} \cap \overline{A_{n_j}} \cap \Omega_I(S_n^p(n_1))$ will be added, but elements $x \in A_{n_1} \cap A_{n_j} \cap \Omega_I(S_n^p(n_1))$ may not be. Sensitivities for multiple input changes can be calculated, but these computations are exponential in the number of fanins to the node. This motivates the use of an approximation scheme, such as the following:
Half the points within the intersection of sets $A_{n_j}$, $A_{n_1}$ and $\Omega_I(S_n^p(n_1)) \cap \Omega_I(f_n)$ are assumed added to the onset of $f_n$. This is extended to the general case in a straightforward manner. The general case includes both addition and removal of onset minterms from arbitrary inputs. The approximation scheme is associative. It has been empirically verified to be the best of a number of possible approximation schemes.
Spurious Dynamic Activity
Resynthesis of a node may change its spurious dynamic activity as well as its functional activity. Prior to a resynthesis step, the final spurious dynamic activity local to the resynthesized node is extremely difficult to estimate with any degree of accuracy. This implies that the sensitivity of the transitive fanout of a node to these local changes is a very important consideration during a resynthesis algorithm.
The sensitivity of the spurious dynamic activity in the transitive fanout of a node $n$ to a local change in delay, spurious dynamic activity, and functional activity will be referred to as the dynamic sensitivity of node $n$. For use in the choice of node in a resynthesis algorithm, it needs to be expressed in terms of $T_n^F$ and $T_n^D$. This set of variables is the only possible choice for establishing a dynamic sensitivity, as more detailed information regarding delay and functionality is not available prior to resynthesis. The theory of the previous section provides a technique for estimating the change in functional activity. Consequently, it is desirable to isolate the estimate for spurious dynamic activity so that the estimate for total activity is the sum of the two separate estimates.
An elementary estimator for total activity at the output of node $n$ with input activities $\{T_{n_i}^T\}$ is:

$$T_n^T \approx \sum_{n_i \in N} \Pr(S_n(n_i)) \cdot T_{n_i}^T \qquad (12)$$

In general simulation, the problem with this approximation is that it totally neglects correlations between the inputs. Furthermore, it does not take into account any reduction in activity due to simultaneous input arrivals. In the case of resynthesis, however, changes are expected to be incremental. The statistical properties relating the input activity to the output spurious dynamic activity are unlikely to change significantly. The sensitivities, $S_n(n_i)$, can be viewed statistically as a measure of how significant a change in the activity of input $n_i$ is to a change in the spurious dynamic output activity.
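A minimal sketch of the elementary estimator of Eqn. (12) is given below. It assumes that the sensitivity of a node to one fanin is the Boolean difference (the output changes when that fanin alone is flipped), that the estimator is a sum over the fanins, and, for simplicity, that the fanins themselves are independent and equiprobable. The function names and the input activities are hypothetical.

```python
from itertools import product


def sensitivity_probability(func, num_inputs, index):
    """Pr(S_n(n_i)): fraction of fanin assignments for which flipping
    fanin `index` alone changes the output (uniform, independent fanins)."""
    sensitive = 0
    for bits in product((0, 1), repeat=num_inputs):
        flipped = list(bits)
        flipped[index] ^= 1
        if func(*bits) != func(*flipped):
            sensitive += 1
    return sensitive / 2 ** num_inputs


def estimate_total_activity(func, input_activities):
    """Sum over fanins of Pr(S_n(n_i)) times the fanin activity."""
    k = len(input_activities)
    return sum(sensitivity_probability(func, k, i) * t
               for i, t in enumerate(input_activities))


def and2(a, b):
    return a & b


def xor2(a, b):
    return a ^ b


activities = [0.4, 0.6]  # assumed transitions per cycle at the two fanins
est_and = estimate_total_activity(and2, activities)
est_xor = estimate_total_activity(xor2, activities)
print(est_and, est_xor)
```

The XOR gate passes every input transition, so its estimate is the plain sum of the input activities, while the AND gate passes each input transition only when the other input is 1.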
Using the results of an initial simulation of the network, a ratio can be computed which relates the actual spurious dynamic output activity to the input activities.
To estimate a change in activity at the output of a node given a change in the spurious dynamic activity at the inputs, the ratio $R_n^D$ is assumed constant, giving:
The dynamic sensitivity is computed by propagating this sum throughout the transitive fanout.
Combining the Estimators
A change in functional activity at a node in a network affects the node sensitivities throughout the transitive fanout. This, in turn, affects the transmission of spurious activity.
An expected change in the functions $S_n^n$ and $S_n^p$ is computed using a technique identical to that described for estimating changes in $f_n$.
The result is again provably average for circuits described by a tree graph.
The spurious dynamic sensitivity now includes terms from the expected change in sensitivities, $\Delta(\Pr(S_n(n_i)))$. Eqn. (14) becomes (to first order):

$$\ldots + \Delta(\Pr(S_n(n_i))) \cdot T_{n_i}^{T}(\mathrm{orig})\,)$$
Node Selection for Resynthesis
The selection of a region for resynthesis depends not only upon the expected change in power when the functionality at the output node is changed, but also upon the flexibility available in the ODC and how that relates to the ability to resynthesize the region. A large ODC provides more resynthesis options than a small one, and the size of the ODC set at a node is dependent upon a node input ordering [12] . A node input ordering which maximizes the ODC subsets for the most highly non-optimal nodes maximizes resynthesis flexibility.
To define an input ordering, the maximum possible resynthesis freedom for each node needs to be defined by establishing a bound $ODC_{\max}$. The simplified theory of Sect. 4.1 assumes the entire function space $\Omega_I$ as available for resynthesis. The modification of the theory presented there is the computation of probabilities within $ODC_{\max}$ rather than $\Omega_I$. For example, Eqn. (5) becomes:
The expected benefit of resynthesizing a region with output node $n$ is an approximation which correlates the size of a change in functionality within $ODC_{\max}(n)$ to the expected change in power local to the resynthesis region. This can be established from an empirical study of the specific resynthesis algorithm used. The change of transitive fanout power corresponding to possible changes in onset probability can be computed using the statistical estimation theory. These changes can then be scaled by the probability of such an event occurring; their sum is a measure of the non-optimality, $C(n)$, of the node.
To maximize the resynthesis flexibility at the most non-optimal node, a node input ordering for the compatible ODC construction needs to be based upon the proximity and non-optimality of the transitive fanin nodes. Let $w_{n_i}$ be a weight assigned to the inputs of node $n$. Let $T_i$ be the set of highly non-optimal nodes selected for resynthesis which are in the transitive fanin of $n$. For $x \in T_i$, let $d_x$ be the number of levels (excluding buffers) between $x$ and $n$.
An input order is then assigned to give higher priority to the higher weight inputs.
Results
A program was written to empirically verify the theory of Sect. 4. The program randomly selects a set of nodes from a network. Every node in this set is resynthesized several times, each time with a random expansion or contraction of the onset of the node which guarantees a local change in functional activity. Full symbolic simulation of the circuit using a technique based on the principles outlined in [2] is performed before and after each modification. This allows a direct comparison between the actual change in power and that which the estimators predict.
Fifteen circuits from the ISCAS 89 benchmark set were examined in this experiment. They varied in size from 187 to 1096 literals. The circuits were initially mapped into the msu gate library and optimized using script.rugged within SIS. All the results except those of Table 3
Table 1: Functional Activity Estimation
Table 1 contains the results for functional activity change estimation. The first two columns indicate the form of the statistic. The first column indicates whether the change in power local to the resynthesized node was decreased or increased. The second column indicates the actual global effect on power. A multiplier in this column indicates a bound on the global change in power relative to the local change. For example, "0.5x" in Row 2 of the table implies that the beneficial change in the local power is reduced by more than a factor of 0.5 by the corresponding increase in transitive fanout power. Column 3 (True) is the probability of actual occurrence of the event indicated by the first two columns, and Column 4 (Est) is the estimated probability of this occurrence. Columns 5 and 6 indicate the accuracy of the estimator. Found is the percentage of actual occurrences correctly detected; Wrong is the percentage of the estimated points for this occurrence which are incorrect estimates (e.g. 0.10 in this column for a particular event implies that in 10% of the predicted occurrences, a different event actually occurred).
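The Found and Wrong statistics can be computed from per-trial outcomes as sketched below. The sketch assumes that each resynthesis trial is recorded as a pair of booleans (whether the event actually occurred, and whether the estimator predicted it); the function name and the sample data are hypothetical and do not reproduce any entry of Table 1.

```python
def found_and_wrong(actual, predicted):
    """Return (Found, Wrong) for one event.

    Found: fraction of actual occurrences that were also predicted.
    Wrong: fraction of predicted occurrences where the event did not occur.
    """
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    n_actual = sum(actual)
    n_predicted = sum(predicted)
    found = true_pos / n_actual if n_actual else 0.0
    wrong = (n_predicted - true_pos) / n_predicted if n_predicted else 0.0
    return found, wrong


# Hypothetical outcomes of eight trials (not data from the paper).
actual = [True, True, True, False, False, True, False, False]
predicted = [True, True, False, True, False, True, False, False]

found, wrong = found_and_wrong(actual, predicted)
print(f"Found = {found:.2f}, Wrong = {wrong:.2f}")
```

In classification terms, Found corresponds to recall and Wrong to one minus precision, so a good estimator has Found close to 1 and Wrong close to 0.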
Each row of the table presents a condition which would significantly influence the suitability of a node for resynthesis. Rows 1 and 2 are conditions under which a node becomes less suitable; Rows 3, 4 and 5 the converse. In particular, the nodes counted in Rows 3 and 4 are excellent candidates for resynthesis due to the strongly beneficial influence of local improvements in power throughout the transitive fanout. The estimator demonstrates an ability to predict better than 90% of the cases in which resynthesis becomes less suitable, and is only 7% overly conservative. More than 77% of the increased optimality conditions were detected with less than a 10% possibility of error.
Fig. 6 depicts the strong correlation of the functional estimator to simulated changes in power. The y-axis is the estimated functional power change, the x-axis the actual functional power change. Each point corresponds to the result of a different possible resynthesis step. The power has been scaled relative to a 20 MHz input vector.
The accuracy in estimating the spurious dynamic activity was also examined. Using the estimation technique to update node sensitivities improves the correlation of the estimation scheme from 0.56 to 0.93, as depicted in Fig. 7 and Fig. 8.
The ability of the combined estimation schemes to detect overall optimality conditions is summarized in Table 2. The accuracy of the results is only about 10% worse than that of the functional activity estimator alone. Table 3 contains the estimator correlation results for individual circuits tested. The respective columns are the network name (Ckt.), the area of the network measured in literals (Area), and the correlation coefficient for the functional (Func.) and dynamic (Dyn.) activity estimators.
Conclusion
In this paper two simple statistical estimation schemes useful for guiding power resynthesis algorithms are presented. The techniques estimate expected change in network power for zero or arbitrary delay assumptions. The theory has been empirically verified on the ISCAS 89 benchmark set and exhibits a very high degree of accuracy for these general networks. The use of these estimators will ensure that accurate prediction of the anticipated overall change in power can be obtained prior to resynthesis. This will enhance the optimality of any resynthesis algorithm independent of the particular resynthesis approach.
Table 3: Estimate Correlation for Change in TFO Power
