The power consumption of a digital circuit can be reduced by decomposing it into subcircuits that can be turned off when inactive. Power can also be reduced by careful state encoding. Clock-gating techniques have been shown to be very effective in reducing the switching activity of sequential logic circuits. Modeling a given circuit as a finite-state machine, we formulate its decomposition into submachines as an integer linear programming (ILP) problem. A simple but powerful state encoding method is used for the submachines to further reduce power consumption. The strategy consists of partitioning the original circuit into two structural subcircuits so that each subcircuit can be tested in turn by the Computer Aided Testing (CAT) environment. In partitioning the circuit and planning the test session, the switching activity in a time interval (i.e., the average power) and the peak power consumption are minimized. To minimize the average switching activity, we search for a small cluster of states with high stationary state probability and use it to create the small sub-FSM.
INTRODUCTION
With increasing integration scale and clock frequency, power consumption has become a major design constraint for integrated circuits. Two independent factors drive this concern. First, low power consumption is essential to achieve longer autonomy for portable devices. Second, increasingly higher circuit density and higher clock frequencies are creating heat dissipation problems, which in turn raise reliability concerns and lead to more expensive packaging. Various techniques, from the transistor and gate levels to the operating system and application levels, have been studied to reduce system power consumption [1] [2]. In this work, the problem of optimizing logic-level sequential circuits for low power is considered. Several techniques for state assignment have been presented that aim at reducing the average switching activity of the present-state lines, and consequently of the internal nodes in the combinational logic block [3]. Retiming has also been tailored so that the distribution of the registers within the logic block minimizes the total amount of glitching in the sequential circuit [4].
For a sequential circuit, an effective way to reduce power dissipation is to turn off part of the circuit when it is inactive. Finite state machine (FSM) decomposition is usually applied to facilitate this approach. The states of the target FSM, M, are partitioned to form a set of submachines, and each submachine implements the functions of M in one state partition. When one submachine is active, all the other submachines are disabled, leading to a reduction of the switching activity involved in state transitions [5]-[8]. Different FSM decomposition structures have been studied. One approach lets the submachines maintain their own states and compute their own state transitions. Fig. 1 shows the circuit structure of this type of decomposition.
We refer to this approach as sequential logic decomposition (SLD). The other approach separates the state update from the state transition computation. Dedicated logic is used to maintain the circuit state and schedule submachine activity, while the submachines are used only for computing the next states and outputs [2]. The approaches in [5]-[8] decompose the target FSMs incrementally using heuristic or genetic algorithms, and are only able to explore a very limited design space.
FSM encoding has been studied for decades. Circuit power consumption can be further reduced by taking advantage of the flexibility possible in submachine state encoding. Traditionally, encoding algorithms aim to minimize area. Several encoding algorithms explicitly targeting low power have also been proposed [9] [10]. However, the submachines obtained by decomposition differ from traditional FSMs in that they have transitions to and from other submachines. Hence, the state assignment of one submachine affects the state assignments of other submachines.
Figure 1 FSM decomposition method: sequential logic decomposition
In this paper, we formulate FSM decomposition as an integer linear programming (ILP) problem with power minimization as the objective. The rest of the paper is organized as follows. Section 2 presents the energy and power modeling along with the notation. Section 3 describes ILP-based FSM decomposition. Section 4 describes power estimation of sequential circuits. Section 5 summarizes the estimation results. Finally, Section 6 presents the conclusion of the proposed approach.
ENERGY AND POWER MODELING
For current CMOS technology, dynamic power is the dominant source of power consumption, although this may change with future developments in highly scaled integration. The average energy consumed at node $i$ per switching is $\frac{1}{2} C_i V_{DD}^2$, where $C_i$ is the equivalent output capacitance and $V_{DD}$ is the power supply voltage. Therefore, a good approximation of the energy consumed in a period is $\frac{1}{2} C_i s_i V_{DD}^2$, where $s_i$ is the number of switchings during the period. Nodes connected to more than one gate have higher parasitic capacitance. Based on this fact, and as a first approximation, the capacitance $C_i$ is assumed to be proportional to the fanout $F_i$ of the node. Therefore, an estimate of the energy $E_i$ consumed at node $i$ during one clock period is $\frac{1}{2} s_i F_i c_0 V_{DD}^2$, where $c_0$ is the minimum-size parasitic capacitance of the circuit. According to this formulation, the energy consumed in the circuit after application of a pair of successive input vectors $(V_{k-1}, V_k)$ can be expressed by:
$$E_{V_k} = \frac{1}{2} c_0 V_{DD}^2 \sum_i s(i,k) F_i$$
where $i$ ranges over all the nodes of the circuit and $s(i,k)$ is the number of switchings provoked by $V_k$ at node $i$. Consider now a pseudorandom test sequence of length $Length_{test}$, where $Length_{test}$ is the test length required to achieve the targeted fault coverage. The total energy consumed in the circuit during application of the complete test sequence is:
$$E_{total} = \frac{1}{2} c_0 V_{DD}^2 \sum_k \sum_i s(i,k) F_i$$
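The fanout-weighted energy estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `vector_energy` and `total_energy`, the list-based data layout, and the example numbers are hypothetical and do not come from the paper. The per-vector energy is taken as one half of c0 times VDD squared times the sum over nodes of s(i,k) times F_i, and the total is the sum of that quantity over all vectors of the test sequence.

```python
from typing import Sequence


def vector_energy(switchings: Sequence[int], fanouts: Sequence[int],
                  c0: float, vdd: float) -> float:
    """Energy for one vector pair: 0.5 * c0 * vdd^2 * sum_i s(i,k) * F_i.

    switchings[i] is s(i,k), the number of switchings at node i caused by V_k.
    fanouts[i] is F_i, the fanout of node i.
    """
    if len(switchings) != len(fanouts):
        raise ValueError("one switching count is needed per node")
    weighted = sum(s * f for s, f in zip(switchings, fanouts))
    return 0.5 * c0 * vdd ** 2 * weighted


def total_energy(switching_matrix: Sequence[Sequence[int]],
                 fanouts: Sequence[int], c0: float, vdd: float) -> float:
    """Total energy over a test sequence: one row of switching counts per vector."""
    return sum(vector_energy(row, fanouts, c0, vdd) for row in switching_matrix)


# Hypothetical 3-node circuit and a 2-vector sequence (illustrative values only).
example_fanouts = [1, 2, 3]
example_switchings = [[1, 0, 2], [0, 1, 1]]
example_total = total_energy(example_switchings, example_fanouts, c0=2.0, vdd=1.0)
```

Dividing the total by the test length would give the average energy per vector, which is the quantity that tracks average power at a fixed clock rate.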
ILP-BASED FSM DECOMPOSITION
We formulate FSM partitioning as an Integer Linear Programming (ILP) problem. Our objective is to maximize the isolation of circuit components by minimizing the communication between the partitioned FSMs. This maximizes the number of components that can be put to sleep, thus reducing the overall power dissipation. The ILP-based partitioning technique proposed for the FSM component of the circuit shown in Fig. 2 is given in Fig. 3. This ILP formulation relies on state transition probabilities, which can be difficult to obtain in practice.
FSMD Partitioning
The proposed partitioning technique works at the behavioral level, before synthesis. The FSM described at the behavioral level is split into two or more separate FSM units. At any given time, only one FSM is active while the others are powered off. This results in significant power savings (static and dynamic). Data components, namely registers, that are used in multiple partitions are kept alive whenever any of those partitions is active. Ideally, data components should be isolated as much as possible so that they can be turned off for as long as possible. Further, when data components are shared between FSMs, their updated values need to be communicated to the newly activated FSM. This communication overhead results in power dissipation and must be minimized. A high-level architecture of the proposed FSM partitioning is presented in Fig. 4.
Figure 4 Partitioned FSM

International Journal of Computer Applications (0975 -8887) Volume 11-No.8, December 2010
Another important criterion is to minimize the number of transitions between partitions. Each time a transition occurs, we incur not only the communication penalty but also a startup delay while the capacitances of the newly activated FSM are charged up. A lookahead mechanism can be used to reduce this performance penalty. To minimize these effects, we need to partition the FSM efficiently so as to reduce the number of shared data components. To achieve such an efficient partition we have used an ILP approach. Our objective is to minimize the number of shared components between partitions and also to minimize the number of possible transitions between the partitions.
Problem Formulation
Formally, a finite state machine with datapath (FSMD), P, is a 6-tuple defined as P = {S, s An FSMD can also be represented using a state transition graph (STG). The STG of $P$ can be described as $G_P(V_P, E_P)$, where $V_P$ is the set of nodes, representing the state set $S$ of $P$, and $E_P = \{(u, v) : u, v \in V_P\}$ is the set of edges, representing the state transition set of $P$.
Each partition of $P$ is a subset of $S$, along with the transitions related to the states in that subset. In addition, we require structures for coordination and communication between the partitions, while preserving functionality. Our goal here is to partition machine $P$ into submachines $P_k$ such that the interaction between these partitions is minimized. Let the number of partitions be $M$. The set of partitions can then be identified as $P_k$, $k \in [1, M]$.
Given an FSMD, we first determine the set of variables that are shared between the various states. A variable $v \in VAR$ is considered shared between states $s_i$ and $s_j$ if the variable is read or written in state $s_i$ and read or written in state $s_j$. We refer to the bits of variable $v$ as transition bits. We thus represent the total number of transition bits between states $s_i$ and $s_j$ as $T_{ij}$.
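As a minimal sketch of how the transition-bit counts could be tabulated, the Python fragment below sums the bit widths of the variables that two states have in common. The function name `transition_bits`, the dictionary layout, and the toy FSMD are assumptions made for illustration only. Reads and writes are not distinguished, since the definition above treats a variable as shared when it is read or written in both states.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def transition_bits(state_vars: Dict[str, Set[str]],
                    var_bits: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Return T_ij for every unordered pair of states (i, j), with i sorted before j.

    state_vars maps a state name to the variables read or written in that state.
    var_bits maps a variable name to its width in bits.
    """
    t = {}
    for si, sj in combinations(sorted(state_vars), 2):
        shared = state_vars[si] & state_vars[sj]  # variables touched in both states
        t[(si, sj)] = sum(var_bits[v] for v in shared)
    return t


# Hypothetical FSMD: three states and three variables (illustrative values only).
example_state_vars = {"s0": {"a", "b"}, "s1": {"b"}, "s2": {"c"}}
example_var_bits = {"a": 8, "b": 4, "c": 1}
example_t = transition_bits(example_state_vars, example_var_bits)
```

In this toy case only `s0` and `s1` share a variable (`b`, 4 bits), so every other pair has zero transition bits.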
We represent the set of edges of the STG as $E_{ij}$, which is a binary variable. It is 1 if and only if there exists an edge (transition) from state $s_i$ to $s_j$. Let $s_{ij}$ be a binary variable which is 1 if and only if the states $s_i$ and $s_j$ are in the same partition.
The objective function is the sum of two summation terms over pairs of states, where $N$ is the total number of states in the machine $P$. The first summation term represents all transition bits between the various partitions $P_k$, and the second summation term represents all edges between the partitions. The edges between partitions are weighted by the sum of all register bits in the original machine $P$. This is because, in the worst case, all register bits may need to be communicated from one partition to the other. The constraints on the ILP fall into two categories: quality constraints and correctness constraints. The quality constraints help to guide the solver toward a useful solution, while the correctness constraints ensure that the variables have consistent values.
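To make the two terms of the objective concrete, the sketch below evaluates one plausible reading of it and minimizes it by exhaustive search on a toy STG. Several things here are assumptions rather than the paper's formulation: the cost is taken to be the transition bits of state pairs placed in different partitions, plus the number of edges crossing partitions multiplied by the total register bit count; the only quality constraint modeled is that every partition must be non-empty; and brute-force enumeration stands in for an ILP solver, which is only practical for very small machines. All names and example values are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple


def partition_cost(assign: Dict[str, int],
                   t_bits: Dict[Tuple[str, str], int],
                   edges: Sequence[Tuple[str, str]],
                   reg_bits: int) -> int:
    """Assumed objective: shared bits across partitions + reg_bits * crossing edges."""
    shared = sum(bits for (i, j), bits in t_bits.items() if assign[i] != assign[j])
    crossing = sum(1 for (i, j) in edges if assign[i] != assign[j])
    return shared + reg_bits * crossing


def best_partition(states: List[str],
                   t_bits: Dict[Tuple[str, str], int],
                   edges: Sequence[Tuple[str, str]],
                   reg_bits: int,
                   m: int = 2) -> Optional[Tuple[int, Dict[str, int]]]:
    """Try every assignment of states to m partitions and keep the cheapest one."""
    best = None
    for labels in product(range(m), repeat=len(states)):
        if len(set(labels)) < m:  # assumed quality constraint: no empty partition
            continue
        assign = dict(zip(states, labels))
        cost = partition_cost(assign, t_bits, edges, reg_bits)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


# Hypothetical 4-state machine: two tight loops joined by a single edge.
example_states = ["a", "b", "c", "d"]
example_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
example_t_bits = {("a", "b"): 8, ("b", "c"): 1, ("c", "d"): 4}
example_cost, example_assign = best_partition(
    example_states, example_t_bits, example_edges, reg_bits=13)
```

On this example the search groups `a` with `b` and `c` with `d`, cutting only the single `b`-to-`c` edge and the one transition bit shared across it. Without the non-empty constraint, placing every state in one partition would trivially give zero cost, which illustrates why quality constraints are needed.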
POWER ESTIMATION
Here we present a framework for estimating the potential power savings from partitioning an FSMD. The savings in power dissipation can be broken down into savings in static power and savings in dynamic power.
Through experimentation we have found that, at least in the circuit implementations we have used, the static power of the circuit is roughly proportional to the amount of sequential logic in the circuit. Thus, by examining the number of sequential elements in the partitions, and the proportion of time they are put to sleep, we can estimate the static power savings.
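The static estimate described above can be sketched as follows, assuming that static power is exactly proportional to the number of sequential elements that are awake, that exactly one partition is awake at any time, and that shared registers and wake-up overhead are ignored. The function name `static_power_savings`, its parameters, and the example numbers are hypothetical and are not measured data.

```python
from typing import Sequence


def static_power_savings(seq_elements: Sequence[int],
                         active_fraction: Sequence[float],
                         original_seq: int) -> float:
    """Estimated fractional static power saving relative to the unpartitioned machine.

    seq_elements[k] is the number of sequential elements in partition k.
    active_fraction[k] is the fraction of time partition k is awake (sums to 1).
    original_seq is the number of sequential elements in the original machine.
    """
    if len(seq_elements) != len(active_fraction):
        raise ValueError("one active fraction is needed per partition")
    if abs(sum(active_fraction) - 1.0) > 1e-9:
        raise ValueError("active fractions must sum to 1")
    # Time-averaged number of sequential elements that are powered on.
    awake = sum(n * f for n, f in zip(seq_elements, active_fraction))
    return 1.0 - awake / original_seq


# Hypothetical three-way partition of a machine with 60 sequential elements.
example_savings = static_power_savings([10, 20, 30], [0.5, 0.3, 0.2], original_seq=60)
```

Under these assumptions the estimate improves when the smallest partition is also the one that is active most of the time, which is consistent with the aim of isolating frequently visited states in a small submachine.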
ESTIMATION RESULTS
The CAT environment has four main functions: (i) it generates the test program; (ii) it parameterizes the wrappers; (iii) it inserts scan chains when necessary; and (iv) it generates the SoC interface integrating the cores. Table 1 presents the power savings estimated using the framework. In all cases, the original machines have been partitioned into three submachines using the ILP formulation presented in Section 3. We find that in most cases the static power savings are between 20% and 40%, while the dynamic power savings are between 10% and 30%. We are currently analyzing this situation to find potential remedies in the case where there exists a large, single loop.
CONCLUSION
In this paper, we have presented an FSM partitioning technique, using ILP, which efficiently decomposes the controller and the datapath into two or more partitions. Implementing and analyzing an ISCAS benchmark circuit in detail shows that up to 41% power savings are possible. We are currently working on implementing the proposed FSM partitioning method for the entire set of explored circuits and will report the measured power results soon. SoCs in test mode can dissipate up to twice the power they do in normal mode, since cores that do not normally operate in parallel may be tested concurrently to minimize testing time. Power-constrained test scheduling is therefore essential in order to limit the amount of concurrency during test application and to ensure that the maximum power rating of the SoC is not exceeded. The goal of the proposed strategy is to minimize the average power consumption during SoC testing and to reduce the peak power consumption. This strategy consists of partitioning the original circuit into two structural subcircuits so that each subcircuit can be tested in turn through IEEE Std 1500.
