We present a novel method for estimating the power of sequential CMOS circuits. Symbolic probabilistic power estimation with an enumerated state space is used to estimate the average power switched by the circuit. This approach is more accurate than simulation-based methods. Automatic circuit partitioning and state space exploration provide improvements in run-time and storage requirements over existing approaches. Circuits are automatically partitioned to improve the execution time and to allow larger circuits to be processed. Spatial correlation is dealt with by minimizing the cutset between partitions, which tends to keep areas of reconvergent fanout in the same partition. Circuit partitions can be recombined using our combinational estimation methods, which allow the exploitation of knowledge of the probabilities of the circuit inputs.
INTRODUCTION
We present an accurate way to estimate average power dissipation in sequential CMOS circuits. We use symbolic power estimation with an enumerated state space to estimate the average power dissipated by the circuit. This approach is more accurate than simulation-based methods, which are biased by their starting state or states.
*Corresponding author, e-mail: mel@ece.neu.edu; e-mail: vohm@sequencedesign.com
Our approach differs from similar approaches in two important ways. First, we enumerate the state transition graph (STG) incrementally using the state space exploration method [2] developed for formal verification. This saves significant space in the storage of the STG. Second, we use automatic circuit partitioning, which reduces the run-time of power estimation. Combining incremental state space exploration with partitioning allows us to accurately estimate larger sequential circuits than was previously possible.
Our approach is similar to cell-based estimation, which has previously been applied to combinational circuits [6] . It differs in that we partition circuits automatically. This results in a more accurate estimation because partitions are generated to provide the best size for power estimation and do not depend on the size of a gate or macrocell.
Our work is similar to other sequential probabilistic power estimation methods, but it is more accurate and also addresses efficiency. For example, Monteiro et al. [7] construct the complete STG and compute state probabilities using the Chapman-Kolmogorov equation. They restrict themselves to circuits with a smaller number of flip-flops than we can handle. Chou and Roy [3] build an extended STG structure that includes primary input vectors in the states. This extended STG is considerably larger than a normal STG, which greatly increases the required storage space and the computation for finding state probabilities. Other researchers [9, 10] accommodate larger STGs by unrolling the circuit a few cycles to approximate the correlation. Tsui et al. [10] use Chapman-Kolmogorov to compute probabilities, while Schneider et al. [9] use Boolean equations. Najm et al. [8] run a Monte Carlo logic simulation on the sequential circuit to collect the signal and transition probabilities of the next-state lines, which are then fed into their combinational estimator. Our method is more accurate than these because it builds the whole STG; it requires less storage because the STG is built incrementally.
In Section 2 we describe combinational power estimation, which is used for sequential estimation.
In Section 
State Probabilities
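To achieve these goals we partition the circuit using the ratio-cut algorithm [11]. Ratio-cut is a modification of the Kernighan-Lin algorithm [4] for variable-sized partitions. Rather than minimizing the cutset while perfectly bisecting the graph, it seeks to minimize the ratio
$$R_{AB} = \frac{C_{AB}}{|A| \cdot |B|}$$
where $C_{AB}$ is the size of the cutset between partitions $A$ and $B$, and $|A|$ and $|B|$ are the sizes of the two partitions.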
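
The ratio-cut cost can be illustrated with a minimal Python sketch. It assumes the usual form of the ratio, cutset size divided by the product of the two partition sizes, and only evaluates that cost for a given two-way split; it is not the partitioning heuristic of [11]. The graph, the node names, and the function `ratio_cut` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple


def ratio_cut(edges: Iterable[Tuple[str, str]],
              part_a: Set[str], part_b: Set[str]) -> float:
    """Return C_AB / (|A| * |B|) for a two-way partition (A, B)."""
    if not part_a or not part_b:
        raise ValueError("both partitions must be non-empty")
    # C_AB: number of edges with one endpoint in A and the other in B.
    cut = sum(1 for u, v in edges
              if (u in part_a and v in part_b) or (u in part_b and v in part_a))
    return cut / (len(part_a) * len(part_b))


# Hypothetical 6-node graph (an assumption): two triangles joined by one edge.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# Splitting between the triangles cuts 1 edge with sizes 3 and 3.
balanced = ratio_cut(EDGES, {"a", "b", "c"}, {"d", "e", "f"})
# Peeling off a single node cuts 2 edges with sizes 1 and 5.
lopsided = ratio_cut(EDGES, {"a"}, {"b", "c", "d", "e", "f"})
```

In this toy graph the split between the two triangles has the lower ratio, which matches the intuition that the objective favors small cutsets without forcing an exact bisection.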
$I_i$ is the compatible set of input vectors for state $i$, $n_{inp}$ is the number of primary inputs to the circuit or subcircuit, $p(k)$ is the signal probability of primary input $k$, and $x_j(k)$ is the logic value on primary input $k$ in input vector $j$. The state probability of state $i$ is the probability that the circuit (or subcircuit) will be in state $i$ at any given time. The signal probability of gate $k$ is the probability that the logic value on gate $k$ is equal to 1. The signal probability of a primary input is the probability that the input is 1.
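
The following Python sketch shows one way to evaluate a state probability from these quantities. It assumes that the primary inputs are statistically independent and that the state probability is the sum, over the compatible input vectors, of the product of per-input probabilities; this form is inferred from the definitions above rather than copied from the paper's equation. The function names and the example data are hypothetical.

```python
from typing import Sequence


def vector_probability(vector: Sequence[int], p: Sequence[float]) -> float:
    """Probability of one fully specified input vector (inputs assumed independent)."""
    prob = 1.0
    for bit, p_one in zip(vector, p):
        # p_one is the signal probability p(k); use 1 - p(k) when the bit is 0.
        prob *= p_one if bit == 1 else 1.0 - p_one
    return prob


def state_probability(compatible: Sequence[Sequence[int]],
                      p: Sequence[float]) -> float:
    """Sum the probabilities of every input vector in the compatible set I_i."""
    return sum(vector_probability(vec, p) for vec in compatible)


# Hypothetical state whose compatible set holds 3 of the 4 vectors on 2 inputs.
I_i = [(0, 0), (0, 1), (1, 1)]
P_skewed = state_probability(I_i, [0.2, 0.7])   # unequal input probabilities
P_uniform = state_probability(I_i, [0.5, 0.5])  # every vector equally likely
```

With all input probabilities at 0.5, the result equals the size of the compatible set divided by the number of possible input vectors, here 3 out of 4.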
In the case where all values of $p(k)$ are 0.5 (all input vectors are equally probable), Eq. (5) reduces to
$$P_i = \frac{|I_i|}{|I_{tot}|} \qquad (6)$$
where $|I_{tot}|$ is the number of possible input vectors. The signal probability of gate output $j$ is then
$$p(j) = \sum_{i} P_i \, x_i(j) \qquad (7)$$
where $x_i(j)$ is the logic value of gate output $j$ in state $i$ and $P_i$ is the state probability for state $i$ described above. The signal probabilities are stored in a probability vector as they are computed.
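
As a small illustration of the signal probability computation, the Python sketch below weights each gate output value by the probability of the state in which it occurs and sums over the states. The state names, probabilities, gate values, and the function `signal_probabilities` are hypothetical.

```python
from typing import Dict, List


def signal_probabilities(state_prob: Dict[str, float],
                         gate_values: Dict[str, List[int]]) -> List[float]:
    """Return the probability vector p(j) = sum over states of P_i * x_i(j)."""
    n_gates = len(next(iter(gate_values.values())))
    probs = [0.0] * n_gates
    for state, p_state in state_prob.items():
        for j, value in enumerate(gate_values[state]):
            # Only states in which gate j outputs 1 contribute to p(j).
            probs[j] += p_state * value
    return probs


# Hypothetical behavior graph with three states and three gate outputs.
STATE_PROB = {"s0": 0.5, "s1": 0.25, "s2": 0.25}
GATE_VALUES = {"s0": [0, 1, 1], "s1": [1, 1, 0], "s2": [0, 0, 1]}
SIGNAL_PROB = signal_probabilities(STATE_PROB, GATE_VALUES)
```

The returned list plays the role of the probability vector mentioned above, with one entry per gate output.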
The behavior graphs need to be processed in order of dependencies. To visualize this, we can form a directed graph of the subcircuits with edges representing dependencies.
State probabilities are derived from n-step transition probabilities. The 1-step transition probability, $p_{ij}(1)$ (or $p_{ij}$), is defined as the probability that the STG will transition from state $i$ to state $j$ in one time step, and is computed directly from the signal probabilities of the primary inputs. The n-step transition probability, $p_{ij}(n)$, is the probability that the STG will transition from state $i$ to state $j$ in $n$ time steps. We compute it from a modified version of the Chapman-Kolmogorov equation:
$$p_{ij}(n) = \sum_{k} p_{ik}(n-1)\, p_{kj}$$
In the STGs of practical circuits, we observe that as $n$ grows, $p_{ij}(n)$ becomes independent of both $n$ and $i$ and converges to the steady-state probability $P_j$, which is the probability that the STG will be in state $j$ at any given time.
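
The convergence described above can be sketched in a few lines of Python. The sketch applies the standard Chapman-Kolmogorov recurrence repeatedly until successive n-step matrices stop changing; it does not reproduce the modification the authors mention. The 3-state transition matrix, the tolerance, and the function names are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def step(prev: Matrix, one_step: Matrix) -> Matrix:
    """One recurrence step: p_ij(n) = sum_k p_ik(n-1) * p_kj."""
    n = len(one_step)
    return [[sum(prev[i][k] * one_step[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]


def steady_state(one_step: Matrix, tol: float = 1e-12,
                 max_steps: int = 10_000) -> List[float]:
    """Iterate until the n-step matrix stops changing, then return one row."""
    current = one_step
    for _ in range(max_steps):
        nxt = step(current, one_step)
        delta = max(abs(a - b)
                    for row_n, row_c in zip(nxt, current)
                    for a, b in zip(row_n, row_c))
        current = nxt
        if delta < tol:
            break
    # Once converged, every row is approximately the same vector P_j.
    return current[0]


# Hypothetical 1-step transition matrix of a 3-state STG (an assumption).
P1 = [[0.5, 0.5, 0.0],
      [0.25, 0.5, 0.25],
      [0.0, 0.5, 0.5]]
P_LIMIT = steady_state(P1)
```

For a chain that never settles, such as a strictly periodic one, this sketch simply stops after `max_steps` iterations and returns the last row computed, so the result should be checked before use.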
For the example in Figure 6 we compute the total average switched capacitance for edge $(i,j)$ with:
$$C_{ij} = P_{ij1}\left(P_{jk1} C_{ij1,jk1} + P_{jk2} C_{ij1,jk2} + P_{jl} C_{ij1,jl} + P_{jm} C_{ij1,jm}\right) + P_{ij2}\left(P_{jk1} C_{ij2,jk1} + P_{jk2} C_{ij2,jk2} + \cdots\right) \qquad (9)$$
$$P_{ij} = P_{ij1} + P_{ij2} \qquad (10)$$
We compute the total average switched capacitance for edge $(i,j)$ with:
where
$$P_{ij} = \sum_{n} P_{ijn} \qquad (12)$$
is the total transition probability for the edge $(i,j)$.
Figure 7 shows the pseudo-code for this procedure and Figure 8 shows an example tertiary tree for a set of partial circuit vectors.
Recombination with Tertiary Search Trees
In the previous section, we described the construction of a tertiary search tree for a set of partial circuit vectors. The search procedure will return a list of all of the vectors that could successfully intersect with the circuit vector in question. Figure 9 shows the pseudocode for the search procedure. Although the result list is likely to contain many more vectors than will successfully intersect, it is typically much smaller than the number of vectors that would be intersected by the brute-force method. Figure 10 shows the result of using tertiary search trees for several sequential circuits. The x-axis is the normalized partition size and the y-axis is the number of actual intersections performed divided by the number of intersections that would have been required using the brute-force method. We find that for unpartitioned circuits, the number of intersections is reduced by over 80%. For reasonable numbers of partitions (up to four, in these examples), there is still a reasonable improvement.
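
The idea of a three-way search tree over partial vectors can be sketched in Python as follows. This is a simplified illustration and not the pseudocode of Figures 7 and 9: it assumes each partial vector is a string over `0`, `1`, and `X` (don't-care), that all vectors have the same length, and that two vectors can intersect when they agree at every position where both are specified. The class `Node`, the functions `build_tree` and `search`, and the example vectors are hypothetical.

```python
from typing import Dict, List


class Node:
    """Tree node with up to three children, keyed by '0', '1', or 'X'."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.vectors: List[str] = []  # vectors that terminate at this node


def build_tree(vectors: List[str]) -> Node:
    """Insert each partial vector, one symbol per tree level."""
    root = Node()
    for vec in vectors:
        node = root
        for symbol in vec:
            node = node.children.setdefault(symbol, Node())
        node.vectors.append(vec)
    return root


def search(node: Node, query: str, depth: int = 0) -> List[str]:
    """Collect stored vectors that are compatible with the query vector."""
    if depth == len(query):
        return list(node.vectors)
    symbol = query[depth]
    # A don't-care in the query matches every branch; a 0 or 1 matches
    # the same value or a don't-care stored in the tree.
    branches = "01X" if symbol == "X" else symbol + "X"
    found: List[str] = []
    for branch in branches:
        child = node.children.get(branch)
        if child is not None:
            found.extend(search(child, query, depth + 1))
    return found


# Hypothetical partial circuit vectors (an assumption for illustration).
STORED = ["01X", "0X1", "110", "X00", "1X1"]
TREE = build_tree(STORED)
CANDIDATES = search(TREE, "0X1")
```

In this sketch whole subtrees are skipped as soon as a specified symbol conflicts, which is where the saving over intersecting the query with every stored vector comes from. Unlike the procedure described above, the simplified search returns only vectors that are compatible at every position, so no extra candidates remain to be filtered.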
Partitioning in Power Estimation
In combinational estimation of partitioned circuits, we avoided recombination due to its complexity. As a result, we also expect to lose a certain degree of accuracy by neglecting some correlation in the circuit. We experimented with estimating the power in sequential circuits without recombination to determine whether the speed improvement is significant enough to justify the cost in lost accuracy. In unpartitioned estimation, with each edge we stored a list of pairs. Each pair consists of a circuit vector/input vector combination and its transition probability. When we include circuit partitions, we store a list of these pairs for each circuit partition. Some recombination is still required to combine the sequential circuit vector with each behavior graph. When we fill in the probability matrix, we will already have an edge's total transition probability when we intersect it with the first behavior graph, because all of the circuit inputs are already accounted for in the partial sequential circuit vector.
The switched capacitance computation is similar to the recombined case. We compute the switched capacitance for each partition and add each value into the capacitance matrix as it is computed.
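
The accumulation step can be pictured with a short Python sketch. It assumes that each partition reports its switched capacitance per STG edge as a mapping from a (source state, destination state) pair to a value, and that the capacitance matrix is indexed by state; both the data layout and the numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source state index, destination state index)


def accumulate(n_states: int,
               partition_caps: List[Dict[Edge, float]]) -> List[List[float]]:
    """Add each partition's per-edge switched capacitance into one matrix."""
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for caps in partition_caps:
        for (i, j), value in caps.items():
            # Contributions from different partitions to the same edge add up.
            matrix[i][j] += value
    return matrix


# Hypothetical per-partition results for a 2-state STG (arbitrary units).
PARTITION_CAPS = [
    {(0, 1): 1.5, (1, 0): 0.5},
    {(0, 1): 2.0, (1, 1): 0.25},
]
CAP_MATRIX = accumulate(2, PARTITION_CAPS)
```

Because the contributions are simply summed, the partitions can be processed one at a time without keeping their individual results in memory.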
At this point, the limiting state probabilities and the total power dissipation are computed in the same way as for unpartitioned and recombined circuits.
State-space Exploration
A third problem that must be addressed is the storage requirement of enumerating and saving the entire STG. While our method clearly requires the full enumeration of the STG, there is no
Table IV shows some typical power estimates on the partitioned benchmark circuits from Figures 1(c) and (d). Figure 12 shows that the partitioned estimates never deviate by more than 3% from the estimates on unpartitioned circuits.
Sequential Results
