Abstract-This paper advances the state of the art by presenting a well-founded mathematical framework for modeling and manipulating Markov processes. The key idea is based on the fact that a Markov process can be decomposed into a collection of directed cycles with positive weights, which are proportional to the probability of the cycle traversals in a random walk. Two applications of this new formalism in the computer-aided design area are studied. In the first application, the authors present a new state assignment technique to reduce dynamic power consumption in finite state machines. The technique consists of first decomposing the state machine into a set of cycles and then performing a state assignment by using Gray codes. The proposed encoding algorithm reduces power consumption by an average of 15%. The second application is sequence compaction for improving the efficiency of dynamic power simulators. The proposed method is based on the cycle decomposition of the Markov process representing the given input sequence and then selecting a subset of these cycles to construct the compacted sequence.
I. INTRODUCTION

MATHEMATICAL modeling is an essential task in the early steps of system design. Markov processes are among the most widely used models for describing the behavior of very large scale integration (VLSI) systems. These models are, for example, used to define the behavior and performance of history-dependent systems, analyze different memory-dependent characteristics of circuits such as power dissipation and latency, and design policies for system-level resource management including dynamic power management. In addition, there are many system design optimization problems that strongly depend on information about the near past and/or future of the system workload and behavior, e.g., dynamic resource binding and on-the-fly reconfiguration. These problems can be readily modeled by Markov processes.
Traditionally, a Markov process has been represented by a set of "states" and a set of "transitions" (edges) among these states. This representation is simple, compact, and easy to understand. Furthermore, using this representation, one can easily acquire information about the near past (fan-in states) and/or the near future (fan-out states) with respect to the current state of the system. However, there is another set of design problems that require information about the past and/or future trajectory of system states. One such problem is the low-power state assignment problem for VLSI circuits. In these types of problems, the customary representation of Markov processes in terms of states and transitions fails to efficiently represent the past and/or future trajectory of the system states.
This paper introduces a mathematical framework for cycle representation of Markov processes. In this representation, a Markov process is represented by using a set of directed "cycles" and their corresponding "weights." This representation translates the state trajectory of the system into a set of intracycle and intercycle transitions. It captures more information about the trajectory of the system states and can be used to compactly formulate and solve an important set of design optimization problems.
More precisely, in this paper, the theory of cycle representation of a Markov process is used to prove that: 1) a given Markov process is probabilistically equivalent to a set of weighted cycles and 2) for a given set of cycles generated according to some ordering, there is only one set of cycle weights that has a probabilistic interpretation in terms of the original Markov process. In other words, if one is not careful when generating a cycle decomposition of the Markov chain, one may generate a decomposition that is incorrect in the sense that there may exist some edge in the Markov chain whose weight is not equal to the sum of the weights of the extracted cycles that cover that edge.
The key theoretical contribution of this paper is to show how to cost-consciously decompose a Markov process into a collection of directed cycles with positive weights that are proportional to the probability of their traversal in a typical random walk. We solve two versions of this problem. In the first version, detailed knowledge about the Markov chain itself, i.e., the state diagram, state transition probabilities, and the steady-state state probabilities, exists. This problem is solved optimally by a deterministic cost-driven cycle decomposition technique. The second version of the problem assumes no prior knowledge about the state diagram of the Markov process. This problem is solved heuristically using a probabilistic cost-driven cycle decomposition technique.
Based on this mathematical framework, we present solutions for two "power-related" problems: 1) state assignment and 2) sequence compaction. To solve the low-power state assignment problem, we identify the most probable cycles in the finite state machines (FSMs) and encode the states on these cycles with Gray codes. The objective function is to minimize the weighted Hamming distance (WHD). Notice that although techniques such as those presented by Tsui et al. [7] use more accurate cost functions for two-level and multilevel logic realizations of the state machines, the minimum WHD (MWHD) metric is still relevant and valuable. In other words, although MWHD does not exhibit high absolute accuracy, it has good relative accuracy, as demonstrated in published results [5], [6] (see footnote 1). In particular, during the signoff analysis, we are interested in an evaluation metric with high absolute accuracy, whereas during synthesis and optimization, we are typically interested in a power or delay equation with high relative accuracy.
As for the sequence compaction problem, the input vector sequence is first modeled as a Markov chain, which is then decomposed into a set of cycles. The average power dissipation of the circuit under the applied vector sequence is calculated by simulating each decomposed cycle of the Markov process once and then averaging the power consumption over all cycles in the input Markov model. The mathematical framework for the proposed sequence compaction technique is different from those of [13]-[18] and results in higher compaction ratios for the same level of evaluation accuracy. The proposed technique is also clearly different from statistical techniques, which attempt to achieve the same objective through sampling strategies [19], [20].
0278-0070/$20.00 © 2006 IEEE
A. Prior Work Review
State encoding/assignment, as a crucial step in the synthesis of the controller circuitry, has been extensively studied. Early research on state assignment was focused on finding a state encoding that minimizes the circuit area [2] - [4] . In the 1990s, a number of low-power state-encoding techniques were presented [5] - [7] . Roy and Prasad were the first to address the problem of reducing switching activity of input state lines of the next state logic, during the state assignment, formulating it as an MWHD problem [5] . Olson and Kang used a linear combination of switching activity of the next state lines and the number of literals as the cost function [6] . Tsui et al. [7] used simulated annealing as a search strategy to find a low-power state encoding that accounts for both the switching activity of the next state lines and switched capacitance of the next state and output logic.
On the other hand, for power estimation, it is well known that the average power dissipation in a CMOS circuit is proportional to a summation over all gates of the product of the capacitive load and the switching activity of each gate [1]. This summation, which is often called the switched capacitance of the circuit, can be calculated directly by simulating the circuit. To produce a power estimate with good accuracy and high confidence, the direct simulation technique requires explicit weighted enumeration of all allowed pairs of input vectors, which is impractical for any circuit of reasonable size and complexity. Therefore, adopting a computationally less expensive method for estimating the average power dissipation in a CMOS VLSI circuit is crucial. There exist two main classes of techniques for power estimation: "static" and "dynamic" [9].
Footnote 1: Absolute accuracy is a measure of the estimate of some parameter of interest compared to its actual value. In contrast, relative accuracy is a measure of the accuracy of individual estimates of the parameter when compared to other estimates of the same parameter made under different conditions. For example, a proposed power estimation metric may produce absolute errors of 30% or higher for some circuit, whereas that metric may exhibit a relative error of at most 10% between estimates for two different implementations of that same circuit.
Static techniques are based on calculating the probabilistic behavior of internal nodes of the circuit as a function of the stochastic behavior of the input vectors (e.g., switching activity and spatiotemporal correlations). Examples of power estimation techniques in this category are [10]-[12]. These techniques generally provide sufficient accuracy with low computational overhead; however, important effects such as signal slew rates and the generation and propagation of hazards/glitches cannot be properly captured by these techniques. In addition, probabilistic power estimation techniques tend to have difficulty in efficiently capturing the complete set of spatiotemporal correlations at the external inputs and the reconvergent fan-out structures in the circuit, which further reduces their estimation accuracy.
Dynamic techniques explicitly simulate the circuit under a "typical" input stream. They can be applied at both the circuit level and gate level. Their main shortcoming is, however, that they are very slow. Moreover, their results are highly dependent on the simulated sequence. A number of issues appear to be important for power estimation using dynamic techniques. The input statistics, which must be properly captured, and the length of the input sequences, which must be applied, are two such issues. Generating a minimal-length sequence of input vectors that satisfies these statistics is not trivial. The reason is the elaborate set of input statistics, which must be preserved or reproduced during sequence generation for use by power simulators.
B. Paper Organization
The remainder of this paper is organized as follows: Section II provides the theoretical background for the cycle decomposition of Markov processes. Sections III and IV present the application of this mathematical framework to two instances of computer-aided design problems, i.e., 1) low-power state encoding and 2) sequence compaction. Conclusions and a summary are given in Section V.
II. CYCLE DECOMPOSITION OF A MARKOV PROCESS (CDMP)
Let us start with a simple example of a cyclic process modeling the motion of a particle on a closed curve. Let us focus on the particle's motion through p points of this curve at moments that are one unit of time apart [cf. Fig. 1(a)]. This leads us to a discretization of the curve into an infinite sequence of points
$$(v_1, v_2, \ldots, v_p, v_1, v_2, \ldots, v_p, \ldots)$$
called a directed cycle with period p. If no disturbance occurs, the passing of the particle through $(v_i, v_{i+1})$ can be codified by an infinite binary sequence, i.e.,
$$y_{(v_i, v_{i+1})} = (0, \ldots, 0, 1, 0, \ldots, 0, 1, \ldots) \tag{1}$$
where 1 (0) means that the particle is (is not) passing through $(v_i, v_{i+1})$.
The sequence is understood as a "nonrandom" sequence in the context of "Kolmogorov's theory" of complexity since both 1 and 0 appear periodically after every p steps [21]. Consider a set of possibly overlapping cycles $\{C_1, \ldots, C_r\}$ where each cycle $C_i$ ($i \le r$) is associated with some positive number $W(C_i)$ [cf. Fig. 1(b)]. Imagine that at some instant of time, the particle appears at some point $v$ that is common to t cycles, say $C_1, \ldots, C_t$ ($t \le r$). The particle may continue its way to another point $v'$, which is the intersection point of m cycles (out of the t cycles that had point $v$ as an intersection point), say $C_1, \ldots, C_m$ ($m \le t$). A natural measure of the particle's transition when moving from $v$ to $v'$ can then be defined as
$$\frac{\sum_{k=1}^{m} W(C_k)}{\sum_{k=1}^{t} W(C_k)}. \tag{2}$$
Accordingly, the binary sequence $y_{(v, v')}$ codifying the transition of the particle from $v$ to $v'$ is given by a "chaotic" sequence. Furthermore, since expression (2) provides the transition probability from $v$ to $v'$ of a Markov process ξ that behaviorally models the particle's movement, the following can be concluded: "A collection of cycles C, along with some weights assigned to each cycle, defines a Markov process ξ."
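To make the ratio in expression (2) concrete, the following Python sketch builds the state and transition probabilities induced by a small set of weighted cycles. It is only an illustration under stated assumptions: the function name `markov_from_cycles`, the two example cycles, and their weights are hypothetical, each cycle is assumed to visit a state at most once, and the state probabilities are normalized to sum to one.

```python
from collections import defaultdict
from fractions import Fraction

def markov_from_cycles(cycles, weights):
    """Return (p, P) induced by weighted cycles (hypothetical helper).

    cycles  : list of tuples of states; ("a", "b", "c") means a->b->c->a
    weights : positive cycle weights W(C), in the same order as cycles
    """
    edge_w = defaultdict(Fraction)  # sum of W(C) over cycles through edge (v, v')
    node_w = defaultdict(Fraction)  # sum of W(C) over cycles through state v
    for cyc, w in zip(cycles, weights):
        for k, v in enumerate(cyc):
            nxt = cyc[(k + 1) % len(cyc)]
            edge_w[(v, nxt)] += w
            node_w[v] += w
    total = sum(node_w.values())
    # Assumption: state probabilities are the normalized per-state weight sums.
    p = {v: w / total for v, w in node_w.items()}
    # Expression (2): weight through (v, v') divided by weight through v.
    P = {(v, u): w / node_w[v] for (v, u), w in edge_w.items()}
    return p, P

# Two overlapping cycles sharing states a and b (illustrative values only).
p, P = markov_from_cycles([("a", "b"), ("a", "b", "c")],
                          [Fraction(1), Fraction(2)])
print(P[("b", "a")], P[("b", "c")])  # 1/3 2/3
```

In this toy instance, state b lies on both cycles, so its two outgoing transitions split in proportion to the cycle weights, and the outgoing probabilities of every state sum to one.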
A. CDMP: Problem Statement
Motivated by the above example, we proceed with a formal definition of directed cycles and cycle-based representation of Markov processes.
Definition 1: A directed cycle over a countable set of states S is a periodic function C from the set Z of integers into S. Furthermore, C(i) is called a vertex of the cycle, and (C(i), C(i + 1)) is called a directed edge of the cycle.
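Each cycle C belongs to an equivalence class of cycles $\bar{C} = \{C' \mid \forall i \in Z,\ C'(i) = C(t(i))\}$, where t is a translation function over Z. Two cycles belonging to the same equivalence class are called "equivalent."
Definition 2: For a cycle C, the passage function $J_C$ is a binary function defined over the set of states S as follows:
$$J_C(i) = \begin{cases} 1, & \text{if } i \text{ is a vertex of } C \\ 0, & \text{otherwise.} \end{cases}$$
The second-order passage function can be defined as
$$J_C(i, j) = \begin{cases} 1, & \text{if } (i, j) \text{ is an edge of } C \\ 0, & \text{otherwise.} \end{cases}$$
Definition 3: Let S be a countable set of states. A sequence $\xi = (X_n)_{n \ge 0}$ of S-valued random variables on the probability space Ω is said to be a homogeneous Markov process with state space S if for any $n \ge 0$ and $Y_0, Y_1, \ldots, Y_{n+1} \in S$, we have
$$P(X_{n+1} = Y_{n+1} \mid X_n = Y_n, \ldots, X_0 = Y_0) = P(X_{n+1} = Y_{n+1} \mid X_n = Y_n).$$
Moreover, the Markov process ξ is called recurrent exactly if
$$\sum_{n=1}^{\infty} p_{ii}^{(n)} = \infty \quad \text{for all } i \in S$$
where $p_{ij}$ and $p_{ij}^{(n)}$ are the single and n-step transition probabilities from state i to state j.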
Based on the aforementioned definitions, Kalpazidou proposes the following general theorem for cycle decomposition of Markov processes (cf. [21, pp. 130-140] ).
Theorem 1: Let S be a finite set of states. Consider a homogeneous recurrent |S|-state Markov process ξ defined over a probability space with common invariant probability distribution $p_i$, $i \le |S|$; then, there exists a finite set of weighted cycles $\mathcal{C}$ such that superposing the cycles will define ξ, i.e.,
$$p_i \cdot p_{ij} = \sum_{C \in \mathcal{C}} W(C)\, J_C(i, j) \tag{5}$$
where W(C) is a positive weight assigned to cycle C, and $p_i$ and $p_{ij}$ are the steady-state probability of state $v_i$ and the conditional probability of transition from $v_i$ to $v_j$, respectively.
Proof: Given in [21].
Based on Theorem 1, the Markov process decomposition problem can be stated as follows.
CDMP Problem: Given a homogeneous recurrent |S|-state Markov process, find a set of weighted cycles C and their weights W (C i ) such that their superposition defines the Markov process.
Solutions to the CDMP problem can be classified as probabilistic or deterministic, depending on whether or not the cycle weights admit a probabilistic interpretation. In the following sections, one solution from each class will be presented.
B. Probabilistic Solution to the CDMP Problem
In [24], Qian and Qian presented a probabilistic approach for performing cycle decomposition. This approach is based on the fact that if we take an infinitely long random walk on the states of the Markov process, all possible cycles are identified. The weights for these cycles are then calculated by solving the set of equations in (5). An infinitely long random walk starting from a random state of the Markov process is initiated. The algorithm generate_cycle, which is shown in Fig. 2, is then used to find all the cycles along the generated random walk. Given the walk V on the states of Markov process ξ, all the states are traversed one by one. If a previously visited state is encountered, a cycle is identified (cf. line 4). The sequence of states on the walk that makes up the cycle C is then replaced by the single state s. State s is the boundary state of cycle C, which means that cycle C starts from and returns to state s along the given walk V of the Markov process ξ. The algorithm continues until all cycles are identified.
For example, consider a trajectory of ξ given by (a, b, c, d, b, d, c, e, f, c, e, a, . . .). Applying the generate_cycle algorithm to this trajectory extracts the cycle set shown in Fig. 3. When all cycles are extracted, the set of linear equations (5) is solved by using any linear-system solver, such as the Gauss-Seidel method (cf. [25]), to find the corresponding weight for each cycle.
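The cycle extraction step described above can be sketched in Python as follows. This is a minimal reading of the textual description, not a transcription of Fig. 2: the function name `generate_cycles` is hypothetical, the walk is assumed to be a finite prefix of the random walk, and the output for the example trajectory is what this sketch produces rather than a reproduction of Fig. 3.

```python
def generate_cycles(walk):
    """Extract cycles from a finite prefix of a random walk (list of states)."""
    path = []    # states visited since the last collapse, in visiting order
    cycles = []
    for s in walk:
        if s in path:
            k = path.index(s)
            # The states from the earlier visit of s up to now form a cycle
            # whose boundary state is s.
            cycles.append(tuple(path[k:]))
            # Replace the cycle on the walk by the single state s.
            del path[k + 1:]
        else:
            path.append(s)
    return cycles

trajectory = list("abcdbdcefcea")  # the example trajectory of the text
print(generate_cycles(trajectory))
# [('b', 'c', 'd'), ('c', 'e', 'f'), ('a', 'b', 'd', 'c', 'e')]
```

Each tuple lists the states of one cycle starting at its boundary state; the closing transition back to the first state is implicit.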
This approach to cycle decomposition requires nonpolynomial time to find all possible cycles of a given Markov process since the number of simple cycles can grow exponentially with the number of states in the Markov process. It is worth noting that cycle weights found using this approach are unique and independent of the order in which the cycles were generated. The weights have a probabilistic interpretation in the sense that W(C) is equal to the expected number of times per step that C is completed along an infinitely long sample path. In the next section, we describe a deterministic approach to cycle generation whose time complexity is polynomial in the number of states in the Markov process.
C. Deterministic Solutions to the CDMP Problem
In [21], Kalpazidou presented a deterministic approach to the CDMP problem as follows: Consider a homogeneous and recurrent Markov process ξ. Pick an arbitrary state $v_i$. Since the process is recurrent, there exists at least one state $v_j$ such that the transition probability from i to j, i.e., $p_{ij}$, is nonzero. Pick this new state $v_j$ and repeat the procedure. Since the number of states is finite, a cycle will finally be created. Set the weight for this cycle to the minimum total transition probability of any transition on the cycle and decrease the total transition probability of each transition on the cycle by this weight. Proceed in a similar manner to process other cycles until no transition with nonzero probability is left.
Theorem 2: Let ξ be a homogeneous and recurrent Markov process and C be the set of weighted cycles generated using the aforementioned deterministic approach. The set of cycles in C is a decomposition of ξ that satisfies Theorem 1.
Proof: Following the cycle generation procedure mentioned previously, in each iteration, after finding a cycle, the total transition probability of the edges on that cycle is reduced such that the total transition probability of the minimum-weight edge becomes zero. Now, if the minimum-weight edge happens to be edge ij, then $p_i \cdot p_{ij} - w(ij) = 0$, and ij will be removed from the Markov chain. However, more likely, the minimum-weight edge is some other edge, say kl, and thus, the weight of edge ij after extracting this cycle will be
$$p_i \cdot p_{ij} - w(kl).$$
After extracting a number of cycles, the weight of target edge ij is finally reduced to zero. In this way, the total transition probability of each edge is distributed among the different cycles that are extracted in each iteration, and therefore, we have
$$p_i \cdot p_{ij} = \sum_{C \in \mathcal{C}} W(C)\, J_C(i, j).$$
Notice that the cycles generated by the deterministic approach are not unique and depend on the policy for selecting the next state. Moreover, even if the same set of cycles is generated, the weight for each cycle will depend on the order in which the cycles were generated.
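The deterministic procedure and the superposition property of Theorem 2 can be checked on a small instance with the Python sketch below. The function name `decompose` and the three-state edge weights are hypothetical; edge weights are taken to be the total transition probabilities $p_i \cdot p_{ij}$, the input is assumed to be balanced (inflow equals outflow at every state, as holds at steady state), and exact fractions are used so that weights reach zero exactly.

```python
from fractions import Fraction

def decompose(w):
    """Decompose balanced edge weights {(i, j): p_i * p_ij} into weighted cycles."""
    w = {e: x for e, x in w.items() if x > 0}
    result = []
    while w:
        path = [next(iter(w))[0]]          # arbitrary state with a remaining edge
        while True:
            cur = path[-1]
            # Follow any remaining outgoing edge (exists because w is balanced).
            nxt = next(j for (i, j) in w if i == cur)
            if nxt in path:                # walk closed on itself: cycle found
                cycle = path[path.index(nxt):]
                break
            path.append(nxt)
        edges = [(cycle[k], cycle[(k + 1) % len(cycle)]) for k in range(len(cycle))]
        weight = min(w[e] for e in edges)  # cycle weight = minimum edge weight
        for e in edges:
            w[e] -= weight
            if w[e] == 0:                  # at least one edge vanishes per cycle
                del w[e]
        result.append((tuple(cycle), weight))
    return result

# Illustrative balanced edge weights (assumed values, not from the paper).
w = {("a", "b"): Fraction(3, 8), ("b", "a"): Fraction(1, 8),
     ("b", "c"): Fraction(2, 8), ("c", "a"): Fraction(2, 8)}
for cycle, weight in decompose(w):
    print(cycle, weight)
# ('a', 'b') 1/8
# ('a', 'b', 'c') 1/4
```

Summing the weights of the extracted cycles that cover each edge recovers the original edge weights, which is the property stated in Theorem 2. A different rule for choosing the next state would, in general, yield a different cycle set.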
Theorem 3: Let ξ be a homogeneous and recurrent Markov process and C be the set of weighted cycles generated using the aforementioned deterministic approach; then, |C| is $O(|S|^2)$, where |S| is the number of states in ξ.
Proof: In each iteration of the deterministic approach, after a cycle is extracted, the weight of at least one edge becomes zero. Hence, |C| is upper bounded by the number of edges in the Markov process, which is $O(|S|^2)$, and the algorithm performs $O(|S|^2)$ iterations in the worst case.
To summarize, the cycle decomposition of Markov processes can be done in two ways.
1) A probabilistic approach that requires the enumeration of all simple cycles in the given Markov process. This enumeration requires exponential time in terms of the size of the Markov process and results in cycle weights that are proportional to the probability of traversing the cycles in a random walk along the given Markov process.
2) A deterministic approach that produces a polynomial-size set of cycles in polynomial time in terms of the size of the Markov process. In this approach, cycle weights no longer carry the probabilistic meaning but still satisfy the requirements of Theorem 1.
This deterministic solution is a generic procedure without any cost function to optimize. To utilize the cycle representation of FSMs for design optimization, it is important to develop a cost-aware deterministic cycle decomposition algorithm. More precisely, we must augment Kalpazidou's algorithm to select the "right" set of cycles for the target application. One such specialization is presented in Fig. 4.
In this algorithm, the weight of each edge is the total transition probability of that edge, i.e.,
$$w(ij) = p_i \cdot p_{ij}.$$
The algorithm starts with the edge with maximum weight in line 3 and then repeatedly picks the maximum-weight edge from the set of remaining edges (line 7). This process is repeated until a cycle is generated (line 8); the cycle weight will then be set to the minimum of all the edge weights on the cycle (line 10), and the weights of all the edges on the cycle will be decreased by the cycle weight (lines 12-14). This process is repeated until there are no more edges with nonzero weights.
When a cycle is formed by successively marking maximum-weight edges as per the Dtr_CD algorithm, the cycle does not necessarily include all of the marked edges. Consider, for example, the following sequence of maximum-weight edges in a five-vertex graph: 12, 34, 15, 35, and 25. The first cycle formed here is 1251. All five edges were marked as maximum-weight edges; yet, only three of them are included in the cycle. Assume that the minimum-weight edge on the cycle is 12. Subsequently, the weight of this edge becomes zero; the weights of edges 15 and 25 are reduced by w(12), whereas the weights of edges 35 and 34 remain unchanged. The reason we choose edges of maximum weight may be explained as follows: During the state-encoding step, the extracted cycles are sorted according to their weights. Next, the cycle of maximum weight is identified and encoded. The states in this first cycle will therefore have the maximum flexibility in assuming any Gray code sequence, which means that the expected switching activity for transitions along the cycle (which have the highest occurrence probability) is minimized. This corresponds to the minimization of the WHD during state assignment.
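The maximum-weight edge selection can be sketched in Python as follows. Since the listing of Fig. 4 is not reproduced here, this is one possible reading of the textual description rather than the Dtr_CD algorithm itself: edges are treated as directed pairs, they are marked in descending order of weight, and a cycle is extracted as soon as a newly marked edge closes a directed cycle among the marked edges. The names `find_path` and `max_weight_decompose` and the example weights are hypothetical.

```python
from fractions import Fraction

def find_path(marked, src, dst):
    """Depth-first search for a directed path src -> ... -> dst over marked edges."""
    stack, seen = [(src, [src])], {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for (i, j) in marked:
            if i == node and j not in seen:
                seen.add(j)
                stack.append((j, path + [j]))
    return None

def max_weight_decompose(w):
    """Greedy cycle decomposition of balanced edge weights {(i, j): p_i * p_ij}."""
    w = {e: x for e, x in w.items() if x > 0}
    result = []
    while w:
        marked = []
        for (i, j) in sorted(w, key=w.get, reverse=True):  # heaviest edge first
            path = find_path(marked, j, i)   # does edge (i, j) close a cycle?
            marked.append((i, j))
            if path is not None:
                cycle = [i] + path[:-1]      # i -> j -> ... -> back to i
                break
        else:
            raise ValueError("edge weights are not balanced")
        edges = [(cycle[k], cycle[(k + 1) % len(cycle)]) for k in range(len(cycle))]
        weight = min(w[e] for e in edges)    # cycle weight = minimum edge weight
        for e in edges:
            w[e] -= weight
            if w[e] == 0:
                del w[e]
        result.append((tuple(cycle), weight))
    return result

# Illustrative balanced weights for a three-state chain (assumed values).
w = {(0, 1): Fraction(4, 10), (1, 0): Fraction(3, 10), (1, 2): Fraction(1, 10),
     (2, 0): Fraction(1, 10), (2, 2): Fraction(1, 10)}
print(max_weight_decompose(w)[0])  # first extracted cycle and its weight
```

In this instance, the two heaviest edges form the first cycle, so the most probable back-and-forth transition pair is extracted with the largest weight and would be encoded first.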
III. LOW-POWER STATE ASSIGNMENT
FSMs can be described by a six-tuple ψ(X, Y, S, s 0 , λ, η) , where X is the set of input symbols, Y is the set of output symbols, S is the set of states, s 0 is the initial state, λ : X × S → Y is the output function, and η : X × S → S is the next state function.
From a probabilistic point of view, an FSM, ψ, can be described by a Markov chain in which $p_i$ is the probability of being in state $s_i$, and $p_{ij}$ is the conditional probability of transition from state $s_i$ to state $s_j$. The $p_{ij}$ values are obtained from input sequence statistics and the next state function of the FSM, or they can be calculated from the state trace obtained by simulating typical input sequences on the FSM. The $p_i$ values are in turn calculated by solving the Chapman-Kolmogorov equations [22], [23].
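As a rough illustration of how the state probabilities can be obtained from the transition probabilities, the Python sketch below iterates the fixed-point relation p = pP until it settles. This power iteration is a swapped-in numerical technique chosen for brevity, not necessarily the solver used by the authors; it assumes that the chain is irreducible and aperiodic, and the name `steady_state` and the two-state matrix are hypothetical.

```python
def steady_state(P, iters=1000):
    """Approximate the stationary distribution of a row-stochastic matrix P.

    P is a list of rows, with P[i][j] = p_ij. Power iteration is assumed to
    converge, which requires an irreducible, aperiodic chain.
    """
    n = len(P)
    p = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iters):
        # One step of p <- p P.
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

# Two-state example (assumed values); the exact answer is (1/3, 2/3).
P = [[0.5, 0.5],
     [0.25, 0.75]]
print(steady_state(P))
```

For large FSMs, a direct linear solve of the balance equations together with the normalization constraint would typically be preferred over a fixed iteration count.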
The state transition graph of an FSM is a vertex/edge-weighted directed graph G(V, E), where the set of vertices V and the set of edges E correspond to the states of the FSM and the transitions between them, respectively, i.e.,
$$V = \{v_i \mid s_i \in S\}, \qquad E = \{(v_i, v_j) \mid \exists x \in X : \eta(x, s_i) = s_j\}.$$
The weight assigned to each vertex $v_i$ is the state probability $p_i$, and the weight assigned to each edge $(v_i, v_j)$ is the probability $p_{ij}$ in the Markov process. Assume states $v_i$ and $v_j$ are encoded using binary strings $b_i$ and $b_j$, respectively. The transition from state $v_i$ to state $v_j$ will have a switching activity equal to $d_{ij}$, the Hamming distance between $b_i$ and $b_j$. Since dynamic power consumption is directly related to switching activity, and a state transition in an FSM corresponds to the switching of the state bits, state encoding will have a major effect on the power consumption. The goal is to perform the state assignment in such a way that state transitions with higher probability incur less switching activity on the state bits. The objective function to minimize would then be
$$\mathrm{WHD} = \sum_{(v_i, v_j) \in E} p_i \cdot p_{ij} \cdot d_{ij}. \tag{7}$$
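A small Python sketch of this cost evaluation follows. It assumes that the weighted Hamming distance sums, over all transitions, the total transition probability $p_i \cdot p_{ij}$ multiplied by the Hamming distance between the two state codes; the function names, the two-state machine, and the candidate encodings are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which the integer codes a and b differ."""
    return bin(a ^ b).count("1")

def weighted_hamming_distance(p, P, code):
    """Weighted Hamming distance of a state encoding.

    p    : {state: steady-state probability p_i}
    P    : {(i, j): conditional transition probability p_ij}
    code : {state: integer state code}
    """
    return sum(p[i] * pij * hamming(code[i], code[j]) for (i, j), pij in P.items())

# A two-state machine that toggles on every clock (assumed example).
p = {"s0": 0.5, "s1": 0.5}
P = {("s0", "s1"): 1.0, ("s1", "s0"): 1.0}
print(weighted_hamming_distance(p, P, {"s0": 0b00, "s1": 0b11}))  # 2.0
print(weighted_hamming_distance(p, P, {"s0": 0b00, "s1": 0b01}))  # 1.0
```

The second encoding halves the cost because the only transitions in the machine now flip a single state bit.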
Implementation of an FSM is usually done using D flipflops. The input to the flip-flops would then be D = η (x, v) , where x is the input and v is the present state.
In our proposed technique for low-power state encoding, the FSM is first decomposed into a set of cycles. These cycles are then encoded to minimize the total switching according to the cost function in (7). Once the cycles are encoded, the entire FSM will be implemented using D flip-flops.
A. Cycle-Decomposition-Based Encoding
Having described a new mathematical framework for Markov process cycle decomposition, we can now proceed with the solution to the low-power state-encoding problem. As mentioned earlier, the technique for state assignment proposed here is based on the decomposition of FSM into a set of cycles. Due to the high complexity of the probabilistic method (i.e., the number of cycles in that approach can grow exponentially in the number of states of the FSM), the deterministic method will be employed here to generate the cycles.
After decomposing the FSM into a set of cycles, the CD_based_encoding algorithm of Fig. 5 is employed for the state assignment step.
After all of the cycles are generated in line 1 of the algorithm, they are sorted according to their weights (line 2), and then, a table of all Gray codes for the minimum required bit count is generated by Generate_Gray_code_table (line 3). The number of such Gray codes will be $2^{\lceil \log_2 |S| \rceil}$. The cycles will then be encoded one by one according to the sorted order by using the Encode_cycle algorithm. This algorithm assigns the codes to the states on the cycle in such a way that the Hamming distance of each state from its neighboring states is minimized. However, this is not feasible for those cycles whose states are partially encoded as part of a previously encoded cycle (line 7). In fact, when the number of previously encoded states in a cycle is sufficiently large, it makes more sense to switch from a Gray-coding scheme to an MWHD algorithm, which is precisely what CD_based_encoding does for the few unencoded states in these cycles (lines 9 and 10).
Gray codes are used to encode states in a cycle because these codes are the optimal solution with respect to the measured cost function, i.e., the MWHD. Consider the table of Gray codes shown in Fig. 6; codes are divided into two sets. A code is a "high code" if its most significant bit (MSB) is 1, and it is a "low code" if its MSB is 0. A table of $2^{n+1}$ Gray codes can be constructed from a table of $2^n$ Gray codes by applying a simple procedure.
1) Write down the table of $2^n$ Gray codes.
2) Concatenate the table above with a copy of itself, written in reverse order.
3) Add a 0 as the MSB of the entries in the first half of the table and a 1 as the MSB of the entries in the second half.
When constructing a Gray code table according to this simple procedure, the low codes are always on top, whereas the high codes are at the bottom. The line that separates the high codes from the low codes is called the "middle line."
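The three-step construction can be sketched in Python as follows. The function name `gray_table` is hypothetical, and codes are represented as bit strings so that the table reads the same way as a printed Gray code table.

```python
def gray_table(n):
    """Build the table of 2**n Gray codes by repeated reflect-and-prefix."""
    table = [""]                       # the empty table for zero bits
    for _ in range(n):
        # Steps 2 and 3: append the reversed copy, then prefix 0 to the
        # first half and 1 to the second half.
        table = ["0" + c for c in table] + ["1" + c for c in reversed(table)]
    return table

table = gray_table(3)
print(table)
# ['000', '001', '011', '010', '110', '111', '101', '100']

# The low codes occupy the top half and the high codes the bottom half.
mid = len(table) // 2
print(table[mid - 1], table[mid])  # 010 110: they differ only in the MSB
```

The same table is obtained from the closed form `i ^ (i >> 1)` for i = 0, ..., 2**n - 1, which may be more convenient when codes are handled as integers.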
The only difference between high and low codes with equal distances from the middle line is the MSB; thus, they have a Hamming distance of 1. Moreover, each code differs from its neighboring codes in only one bit (this is the well-known Gray code property). Given a cycle C, we can encode it optimally by following a "ping-pong" movement in the Gray code table, starting from the very first high code under the middle line and choosing the codes in high, low, low, high, high, low, . . . order, as shown in Fig. 7.
Fig. 8 shows the Encode_cycle heuristic algorithm for encoding cycles. For cycles in which all states are unencoded, it makes no difference which state the encoding starts from. However, for cycles where some of the states have already been encoded, the start state is important. In line 1, Find_best_rotation returns the beginning of the longest consecutive sequence of previously unencoded states in the cycle. For example, if cycle C has ten states, of which two consecutive groups of three and four states are not yet encoded, Find_best_rotation returns the state at the start of the unencoded sequence of size 4. The high and low codes are selected as candidate codes for the yet unencoded state (lines 3 and 4). Next, Find_best_code is used to compare the cost of each of the two candidate codes and to pick the best one (line 5). The process continues, traversing all states starting from the "strt" state, until all unencoded states in the cycle are visited (line 2).
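The "ping-pong" selection order can be illustrated with the Python sketch below. It covers only the code-ordering step for a cycle with no previously encoded states, not the full Encode_cycle heuristic of Fig. 8; the function name `ping_pong_codes` is hypothetical, and the Gray code table is generated with the closed form `i ^ (i >> 1)`.

```python
def ping_pong_codes(n_bits, count):
    """Return `count` Gray codes in high, low, low, high, high, low, ... order."""
    size = 1 << n_bits
    if count > size:
        raise ValueError("not enough codes for this cycle length")
    table = [i ^ (i >> 1) for i in range(size)]   # reflected Gray code table
    mid = size // 2                                # first high code is table[mid]
    order = []
    for k in range(mid):                           # k = distance from the middle line
        pair = [table[mid + k], table[mid - 1 - k]]   # (high, low) at distance k
        if k % 2:
            pair.reverse()                         # odd distances go low, then high
        order.extend(pair)
    return order[:count]

codes = ping_pong_codes(3, 6)
print([format(c, "03b") for c in codes])
# ['110', '010', '011', '111', '101', '001']

# Consecutive codes in this order differ in exactly one bit.
print(all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:])))  # True
```

The sketch checks only consecutive codes along the order; the distance of the transition that closes the cycle depends on the cycle length and is not examined here.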
Example: Fig. 9(a) shows an FSM with six states and its transition probability matrix. Algorithm Dtr_CD chooses the edges in descending order of transition probability until a cycle is formed. In this example, the first cycle detected is the cycle C1 [cf. Fig. 9(b)] since transitions St0 → St1 and St1 → St0 are the two most probable transitions. Next, the algorithm finds the minimum transition probability in the cycle, which is that of St0 → St1, and reduces the transition probability of all edges in the cycle by that value. This eliminates edge St0 → St1 and reduces the transition probability of edge St1 → St0 to 0.0154. The algorithm continues until there are no more edges left. Fig. 9(b) shows the set of cycles generated using this algorithm.
Next, the Encode_cycle algorithm is used to encode each cycle. First, cycle C1 is selected for encoding because it has the maximum weight; this cycle is encoded using the codes 010 and 110 (cf. Fig. 7). Then, cycle C2 is selected. Since state St0 in this cycle has been encoded in a previous iteration, the algorithm starts from the next unencoded state, i.e., St5, and picks the best code with respect to the code of St0 to encode St5, which is 011. Next, the algorithm encodes C3, i.e., St3 is encoded with respect to St5, resulting in code 111. For cycle C4, we note that states St0, St5, and St3 are previously encoded; therefore, Find_best_rotation selects St2 as the start state for C4 since this is the beginning state of the longest unencoded sequence in this cycle. The algorithm then traverses the cycle and encodes each state. State St2 is encoded as 101, and state St4 is encoded as 001. Since there are no more unencoded states, the algorithm stops at this point.
Note that the set of cycles and the order in which the cycles are encoded are very important. For example, if we change the encoding order to C5, C4, C3, C2, C1, and C6, then by the time we reach cycle C1, the remaining available codes for state St1 have a Hamming distance of at least two from the code of state St0. Since C1 is the most probable cycle, the resulting encoding is poor in terms of power. On the other hand, the set of cycles generated by Dtr_CD is the best for this problem because this algorithm chooses those cycles that are most probable in typical operation of the FSM.
B. Experimental Results
The cycle-based encoding algorithm was implemented in C and run on an IBM IntelliStation with a 730-MHz Pentium III processor and 256-MB memory to generate the experimental results. The total WHDs for a number of FSM benchmark circuits and for different encoding techniques are reported in Table I . For these results, we assumed uniform external input distribution and used (7) to calculate the WHD value for each state machine. The purpose of this table is to demonstrate the relative efficiency of the cycle-based encoding algorithm compared to a genetic-search-based algorithm proposed in [26] .
The first column provides the names of the FSM circuits, all of which are selected from the LGSynth89 or ISCAS89 benchmark sets. The largest circuit used for generating experimental results has 256 states. Column 2 shows the number of states in each FSM. Columns 3 and 4 report the average switching activity per state bit line and the runtime (in seconds) for the proposed cycle-based state assignment (i.e., the CD_based_encoding algorithm). All FSMs were encoded in about 1 s or less. Columns 5 and 6 report the average switching activity and runtime for a genetic search algorithm that was implemented to calculate the low-power state assignment based on the MWHD cost. For these results, parameter t (cf. line 7, Fig. 5) was set to 40% in all experiments. The results show a significant speedup compared to the genetic search algorithm with nearly the same quality of results. The case of "s208" is a notable exception, where we obtain more than 70% reduction in the WHD metric. The reason for this significant improvement is that s208 is in fact a 256-state counter, for which the cycle-based encoding performs much better than the genetic search algorithm (the quality of the genetic algorithm solution may improve if it is given more computation time). Column 7 shows the WHD result of the approach presented in [4], whereas column 8 shows the WHD result if the transition selection process in the deterministic cycle decomposition algorithm is performed at random instead of based on the probabilities. Table II shows the post-mapping area and power consumption of the FSMs using genetic-search-generated state codes versus codes generated by the CD_based_encoding algorithm.
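One plausible reading of the s208 result is that a counter is a single long cycle, and a Gray code assigned along a cycle of 2^n states toggles exactly one bit per transition. The sketch below illustrates this for a 256-state cycle; it is an illustration of the Gray-code property only, not the Encode_cycle algorithm, and the helper names `gray_code` and `toggles` are hypothetical.

```python
def gray_code(i):
    """Standard reflected binary Gray code of the integer i."""
    return i ^ (i >> 1)


def toggles(codes):
    """Total bit flips over one traversal of the cycle, including the wrap-around."""
    n = len(codes)
    return sum(bin(codes[k] ^ codes[(k + 1) % n]).count("1") for k in range(n))


binary_codes = list(range(256))                   # plain binary counting order
gray_codes = [gray_code(i) for i in range(256)]   # Gray codes along the same cycle

print(toggles(binary_codes), toggles(gray_codes))
```

Over one full traversal, the Gray assignment flips 256 bits in total (one per transition), whereas plain binary codes flip 510.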
The relative accuracy of the WHD cost function can be seen by comparing Tables I and II side by side.
TABLE I AVERAGE SWITCHING ACTIVITY
TABLE II GENETIC VERSUS CYCLE_ENCODE_CYCLE
IV. SEQUENCE COMPACTION FOR POWER SIMULATION
The statistics of the input vectors heavily impact the power dissipation in a combinational circuit [1]. Therefore, to obtain accurate power estimates for a circuit, one must use a vector sequence that captures the typical application data. Unfortunately, the length of such a sequence can be quite large. Thus, it is important to find ways to construct or otherwise extract a subsequence of much shorter length that can be simulated with a low-level simulation engine to provide highly accurate power estimates of a circuit. Two approaches have been developed to address this problem: 1) probabilistic compaction and 2) statistical sampling.
One approach for reducing the power simulation time is to compact the given long stream of bit vectors using probabilistic automata [13]. The idea is to build a stochastic state machine (SSM), which captures the relevant statistical properties of a given long bit stream, and then excite this machine by a small number of random inputs so that the output sequence of the machine is statistically equivalent to the initial one. The relevant statistical properties are, for example, the signal and transition probabilities and first-order spatiotemporal correlations among bits and across consecutive time frames. The procedure then consists of decomposing the SSM into a set of deterministic state machines and realizing it through SSM synthesis with some auxiliary inputs. The compacted sequence is generated by uniformly random excitation of these inputs.
Another algorithm for vector compaction is presented in [14]. The foundation of this approach is also probabilistic in nature: It relies on adaptive (dynamic) modeling of binary input streams as first-order Markov sources of information. The adaptive modeling technique itself is best known as dynamic Markov chain modeling. A hierarchical technique for compacting large sequences of input vectors is presented in [15]. The distinctive feature of this approach is that it introduces hierarchical Markov chain modeling as a flexible framework for capturing not only complex spatiotemporal correlations but also dynamic changes in the sequence characteristics such as different input modes. Other approaches such as [16] use the spatiotemporal correlation of the input sequence as a cost function for generating a new input sequence, which can estimate the average power within reasonable error bounds. Macii et al. [17] approximate the discrete Fourier transform of the Hamming distance between consecutive vectors in the input sequence to generate a new sequence that captures the statistical properties of the original sequence. Pinar and Liu [18] present a different approach based on a graph model, which transforms the sequence compaction problem into that of finding the heaviest weighted trail in a directed graph. The authors then present a heuristic based on minimum cost flow to solve the problem.
Burch et al. [19] and Ding et al. [20] use statistical sampling techniques as another approach for solving the sequence compaction problem; these techniques use an input model based on a Markov process to generate the input stream for simulation. The simulation is performed in an iterative manner. In each iteration, a vector sequence of a fixed length (called a sample) is simulated. The simulation results are monitored to calculate the mean value and variance of the samples. The iteration terminates when some stopping criterion is met. This approach suffers from two shortcomings. First, the required number of samples, which directly impacts the simulation runtime, is approximately proportional to the ratio between the sample variance and the square of the sample mean value. For certain input sequences, this ratio becomes large, thus significantly increasing the simulation runtime. Second, there is a general concern about the normality assumption on the sample distribution. Since the stopping criterion is based on this assumption, the simulation may terminate prematurely if the sample distribution deviates significantly from the normal distribution.
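The dependence of the sample count on the variance-to-mean-squared ratio can be illustrated with the textbook normal-approximation bound. The sketch below is an assumption-laden illustration, not the exact stopping criterion of [19] or [20]: the function name `required_samples`, the relative error target `rel_error`, and the confidence quantile `z` are all hypothetical parameters.

```python
import math


def required_samples(mean, variance, rel_error=0.05, z=1.96):
    """Samples needed so the normal-approximation confidence interval for the mean
    has half-width at most rel_error * mean (z = 1.96 for roughly 95% confidence)."""
    if mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    return math.ceil((z / rel_error) ** 2 * variance / mean ** 2)


n_low = required_samples(mean=10.0, variance=4.0)     # variance / mean^2 = 0.04
n_high = required_samples(mean=10.0, variance=100.0)  # variance / mean^2 = 1.0
print(n_low, n_high)
```

With the same mean, raising the variance from 4 to 100 raises the required number of samples from 62 to 1537 under these assumptions, which is the runtime concern raised above.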
This section presents a new sequence compaction technique to improve the efficiency of dynamic power simulation techniques. The proposed approach is based on CDMP that models the input vector. Average power dissipation is subsequently calculated by simulating each decomposed cycle of the Markov process. The mathematical framework for the proposed sequence compaction technique is different from the aforementioned techniques, and the proposed algorithm results in higher compaction ratios for the same level of evaluation accuracy.
A. Sequence Compaction Problem
As mentioned before, the probabilistic behavior of the input vectors heavily impacts the power dissipation in a combinational circuit. In our model, any input combination is regarded as a state in the Markov process. A transition from one input combination to the next is modeled by a transition between the corresponding states in the Markov process. The probability of each transition in the Markov process indicates the frequency of the corresponding input change. Associated with each transition is a value that represents the power dissipation in the circuit due to the input vector change. Therefore, the average power dissipation under the modeled input vector sequence ($q_{\mathrm{ave}}$) is calculated as
$$q_{\mathrm{ave}} = \sum_{i}\sum_{j} p_{ij}\, q_{ij}$$
where $p_{ij}$ is the probability of a transition from state $i$ to state $j$, and $q_{ij}$ is the power dissipation in the circuit due to the corresponding input change. In this section, we first show how cycle decomposition of Markov processes enables compacting a large vector sequence that is to be applied to a circuit. A string of length $k$ is a stream of $k$ consecutive vectors in the original input sequence. A cover of an input sequence is a set of strings of different lengths such that the original input sequence can be constructed by repetition and/or concatenation of different elements of this set. The size of a cover is defined as the sum of the lengths of all strings in that cover, e.g., {abc, bdc, bcb, ca} is a cover of size 11, and abc is a string of length three.
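As a small check of these definitions, the sketch below computes the size of the example cover and tests a minimum string length. The helper names `cover_size` and `satisfies_min_length` are hypothetical, and treating each vector as a single character is an assumption made only for this illustration.

```python
def cover_size(cover):
    """Size of a cover: the sum of the lengths of all its strings."""
    return sum(len(s) for s in cover)


def satisfies_min_length(cover, l_min):
    """True if every string in the cover has at least l_min vectors."""
    return all(len(s) >= l_min for s in cover)


cover = {"abc", "bdc", "bcb", "ca"}
size = cover_size(cover)
ok = satisfies_min_length(cover, 2)
print(size, ok)
```

A string of length two spans exactly one transition, so a minimum length of two is the weakest requirement under which every string contributes at least one transition to the simulation.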
Constrained Minimum Sequence-Covering (CMSC) Problem: Given an input sequence and an integer $L_{\min}$, find a cover $C$ of the input sequence such that
$$\mathrm{Length}(s) \ge L_{\min} \quad \text{for every string } s \in C$$
and Size($C$) is minimized. A sequence compaction problem can be formulated as a CMSC problem where $L_{\min}$ is 2, since each string should cover at least one transition in the input sequence. The average power consumption of the circuit is then calculated by
where $N_s$ is the number of times string $s$ is repeated, and $P_{\mathrm{ave}}$ is the average power consumption of the circuit due to string $s$. The CMSC problem can be shown to be NP-complete by reducing the satisfiability problem to it.
The Markov process modeling this input sequence is shown in Fig. 10. Simulation of this sequence will require 15 input transitions: a → b, b → c, c → b, etc. Following this sample path on the Markov process, one can easily see that the whole path can be represented by a summation of cycles C1, C2, and C3, where C1 is repeated twice and C2 is repeated three times. Therefore, the total power dissipation along the previous input transitions is equal to the power dissipation along C3, two times C1, and three times C2, the calculation of which requires only eight transitions.
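The idea of reading cycles off a sample path can be sketched as follows. The sketch assumes that a cycle is closed and removed each time the walk revisits a state that is still on the current loop-free path; this is one common way to extract cycles along a walk and may differ in detail from the probabilistic approach of Section II-B. The function names are hypothetical, and the walk is a made-up example, not the sequence of Fig. 10.

```python
from collections import Counter


def canonical(cycle):
    """Rotate a cycle so that it starts at its smallest state (rotation-invariant key)."""
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])


def cycles_along_walk(seq):
    """Count the cycles closed along a walk; also return the leftover open path."""
    path = []            # current loop-free stretch of the walk
    counts = Counter()
    for state in seq:
        if state in path:
            k = path.index(state)
            counts[canonical(path[k:])] += 1   # the walk has just closed this cycle
            path = path[:k + 1]                # continue from the revisited state
        else:
            path.append(state)
    return counts, path


walk = "abcbcbcaba"      # hypothetical input sequence, 9 transitions
counts, leftover = cycles_along_walk(walk)
print(counts, leftover)
```

For this walk, the nine transitions collapse into three distinct cycles with seven transitions in total, so each cycle needs to be simulated only once and scaled by its repetition count.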
B. Proposed Compaction Algorithm
Consider a sequence of input vectors applied to a combinational circuit. If we assume that the input sequence is long enough to accurately model the "typical" application data, then the Markov process generated using this input sequence is an accurate stochastic representation of the circuit inputs.
In practice, since the Markov process is generated based on the given input sequence and all the state and transition probabilities are calculated accordingly, it is assumed that the input sequence captures the stochastic behavior of the Markov process in the sense that the input sequence is sufficient to provide a cycle decomposition for the Markov process. Therefore, we can use the probabilistic approach described in Section II-B for the walk corresponding to the input sequence to generate the cycles that decompose the Markov process. Note that the deterministic algorithm can also be used for this purpose. However, typically, only the input sequence is given for power estimation purposes and not its generating Markov chain, which is required by the deterministic algorithm. Therefore, to use the deterministic algorithm, one is required to solve the Chapman-Kolmogorov equations (cf. [22] and [23]) to get the generating Markov chain, making the probabilistic solution more appealing.
We first prove that the cycles generated along the walk corresponding to the given input sequence provide a decomposition of the Markov process and then show how these cycles are used to calculate the average power consumption in the circuit. Notice that no a priori knowledge of the circuit structure is needed for the input vector compaction algorithm. This is in sharp contrast with statistical sampling techniques, which require the circuit netlist.
Theorem 4: Assume that a sequence of input vectors for a combinational circuit is given and that a Markov process ξ is generated based on the given sequence. The probabilistic approach applied to the given input vector sequence produces a cycle decomposition of the Markov process ξ with the following weights:
$$w_{C_k} = \frac{N_{C_k}}{\sum_{l} N_{C_l}\,|C_l|} \tag{11}$$
where $w_{C_k}$ is the weight assigned to cycle $C_k$, $N_{C_k}$ is the number of times cycle $C_k$ appears along the walk, and $|C_l|$ is the length of cycle $C_l$, i.e., the number of transitions on that cycle.
Proof: For the set of cycles with the above weights to be a decomposition of the Markov process, we have to show that [cf. (5)] The left-hand side of the above equation is equal to
where $N(i,j)$ is the number of times that the transition from $i$ to $j$ appears along the walk, and the denominator is the length of the walk itself. The right-hand side of (12), on the other hand, is equal to
Note that the second factor in (14) is exactly equal to the number of times that the transition from state $i$ to state $j$ occurs along the walk, i.e., $N(i,j)$. Therefore, (12) holds for the cycle weights defined in (11). The second equality in (5) holds since we have
and the proof is complete. The cycles and corresponding weights depend only on the walk itself (represented by the given input vector sequence) and not on the Markov process. Therefore, the cycle decomposition can be obtained without an explicitly constructed Markov process. Now that the cycles and their weights are generated, we show how the average power consumption can be calculated.
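For reference, the counting argument behind Theorem 4 can be written out compactly. This is a sketch under stated assumptions rather than a transcription of the proof: $J_{C_k}(i,j)$ is taken to be 1 if the transition $i \to j$ lies on cycle $C_k$ and 0 otherwise, $p(i,j)$ denotes the relative frequency of the transition $i \to j$ along the walk, and the walk is assumed to be closed so that every transition belongs to exactly one extracted cycle occurrence. The notation of (5) may differ.

```latex
\begin{align*}
w_{C_k} &= \frac{N_{C_k}}{\sum_{l} N_{C_l}\,|C_l|}, \\
\sum_{k} w_{C_k}\, J_{C_k}(i,j)
  &= \frac{1}{\sum_{l} N_{C_l}\,|C_l|}\,\sum_{k} N_{C_k}\, J_{C_k}(i,j)
   = \frac{N(i,j)}{\sum_{l} N_{C_l}\,|C_l|}
   = p(i,j).
\end{align*}
```

The middle step uses the fact that summing the repetition counts of all cycles containing $i \to j$ counts each occurrence of that transition along the walk exactly once.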
Theorem 5: Consider a combinational circuit and a sequence of input vectors applied to this circuit. Let C be a cycle decomposition based on the given input sequence with weights assigned as in (11). The average power dissipation of the circuit under the given input sequence is then equal to
$$q_{\mathrm{ave}} = \sum_{k} w_{C_k}\,|C_k|\,q_{C_k} \tag{15}$$
where $q_{\mathrm{ave}}$ is the average power dissipation, and $q_{C_k}$ is the average power dissipation in cycle $C_k$ per transition.
Proof: To provide a proof, we reformulate the right-hand side of (15) as a summation over transitions rather than cycles, i.e.,
Therefore
It should be noted that $w_{C_k} \cdot |C_k|$ indicates the probability of being in cycle $C_k$. Since each transition in the Markov process can be repeated along the walk and may be part of several cycles, the probability of being in a specific cycle is not the same as the sum of the transition probabilities over all transitions on that cycle. The probability of being in a cycle depends on the number of times that cycle is repeated and on the length of the cycle, i.e.,
$$p_{C_k} = \frac{N_{C_k}\,|C_k|}{\sum_{l} N_{C_l}\,|C_l|} \tag{18}$$
The numerator in the above equation is the number of transitions along the walk that are part of cycle $C_k$, whereas the denominator is the total length of the walk. Therefore, the ratio gives the probability of being in cycle $C_k$. One can now easily verify that $\sum_{k} p_{C_k} = 1$ and that the average power dissipation may be calculated as
$$q_{\mathrm{ave}} = \sum_{k} p_{C_k}\, q_{C_k}$$
Based on the above theorems, a vector compaction algorithm for efficient dynamic power evaluation is proposed in Fig. 11 . In line 1, all the cycles corresponding to the given input vector V are generated using the probabilistic approach. In line 2, some of the cycles are dropped by the Prune_cycles algorithm to achieve a higher compaction ratio; then, the average power consumption is calculated using (3).
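The calculation behind Theorems 4 and 5 can be checked numerically on a small example. The sketch below is illustrative only: the walk, the per-transition power values in `q`, and the cycle counts (obtained by tracing the walk by hand) are made up, and the variable names are hypothetical. It uses weights equal to a cycle's repetition count divided by the walk length, and the probability of being in a cycle equal to that weight times the cycle length, as described in the text.

```python
from fractions import Fraction

# Hypothetical walk over input vectors a, b, c and made-up power per transition.
walk = "abcbcbcaba"
q = {("a", "b"): 4, ("b", "c"): 6, ("c", "b"): 2, ("c", "a"): 5, ("b", "a"): 3}

# Cycles closed along this walk and their repetition counts (traced by hand).
cycles = {("b", "c"): 2, ("a", "b", "c"): 1, ("a", "b"): 1}


def cycle_edges(cycle):
    """Transitions on a cycle, including the closing transition back to the start."""
    return list(zip(cycle, cycle[1:] + cycle[:1]))


walk_len = sum(n * len(c) for c, n in cycles.items())

# Direct average over every transition of the walk.
direct = Fraction(sum(q[t] for t in zip(walk, walk[1:])), len(walk) - 1)

# Cycle-based average: weight, probability of being in the cycle, per-transition power.
w = {c: Fraction(n, walk_len) for c, n in cycles.items()}
p = {c: w[c] * len(c) for c in cycles}
q_cycle = {c: Fraction(sum(q[e] for e in cycle_edges(c)), len(c)) for c in cycles}
via_cycles = sum(p[c] * q_cycle[c] for c in cycles)

print(direct, via_cycles)
```

Exact rational arithmetic is used so that the two averages can be compared for equality; the cycle-based value needs only the seven distinct cycle transitions instead of the nine transitions of the walk.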
According to (5), eliminating some of the cycles changes the probabilities in the Markov process that models the input sequence. Marculescu et al. [14] show that by bounding perturbations in the Markov process probabilities, one can bound the error in average power estimation. Dropping cycles with small weights minimizes perturbations in the Markov process probabilities and thereby minimizes the estimation error. To achieve this in algorithm Prune_cycles (cf. Fig. 11), all cycles are first sorted in descending order of weight (line 2). Next, based on this ordering, cycles are selected until the compaction ratio reaches a user-specified level r (lines 6-9). After dropping some of the cycles, the weights of the remaining cycles are updated in lines 10-12. This is done because cycle elimination is equivalent to dropping some of the transitions from the given input sequence, which in turn influences the cycle weights according to (18). Finally, the average power consumption is calculated from the updated weights.
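The pruning step can be sketched as follows. This is a hedged illustration of the description above and not the Prune_cycles listing of Fig. 11: the function name `prune_cycles`, the interpretation of the compaction ratio as the walk length divided by the total length of the kept cycles (each simulated once), and the cycle counts are all assumptions made for this example.

```python
from fractions import Fraction


def prune_cycles(cycle_counts, ratio):
    """Keep the heaviest cycles within a transition budget and renormalize.

    cycle_counts maps a cycle (tuple of states) to its repetition count.
    Returns (probabilities of the kept cycles, achieved compaction ratio).
    """
    walk_len = sum(n * len(c) for c, n in cycle_counts.items())
    budget = Fraction(walk_len) / ratio       # transitions we may still simulate
    kept, used = {}, 0
    # A cycle's weight is proportional to its repetition count, so sort by count.
    for cycle, n in sorted(cycle_counts.items(), key=lambda kv: -kv[1]):
        if used + len(cycle) > budget:
            break
        kept[cycle] = n
        used += len(cycle)
    if not kept:
        raise ValueError("compaction ratio too high: no cycle fits the budget")
    # Renormalize over the kept cycles only: N * |C| / sum of N * |C|.
    kept_len = sum(n * len(c) for c, n in kept.items())
    probs = {c: Fraction(n * len(c), kept_len) for c, n in kept.items()}
    return probs, Fraction(walk_len, used)


# Made-up cycle counts for illustration.
cycle_counts = {("b", "c"): 5, ("a", "b", "c"): 2, ("a", "d"): 1, ("c", "d"): 1}
probs, achieved = prune_cycles(cycle_counts, 4)
print(probs, achieved)
```

In this example the two lightest cycles are dropped, and the probabilities of the remaining cycles are rescaled so that they again sum to one.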
C. Experimental Results
The proposed algorithm is implemented in the SIS (a system for sequential circuit synthesis) framework. The average power consumption is calculated using an in-house gate-level logic simulator under SIS. To make a fair comparison with [14], we used the same input vector sequence (4000 vectors) to generate the results reported in Table III. Column 1 shows the name of the circuit. Each circuit was first optimized by using script.rugged and then mapped to a 0.25-µm technology library. The average power consumption for all 4000 vectors is reported in the third column. The proposed compaction algorithm was then used to obtain the results in columns 4, 5, and 6 with compaction ratios of two, five, and ten, respectively. The average error for the compaction algorithm using the hierarchical model presented in [14] is also reported for the sake of comparison. It is interesting to note that skipping the pruning step and considering all the cycles results in a compaction ratio of two with no error in the average power estimate. More precisely, the number of transitions after cycle decomposition and compaction is already two times smaller than the original number of transitions, so a compaction ratio of two is achieved without incurring any error in the calculated power consumption.
Another set of experimental results is shown in Table IV for an in-house generated sequence of length 100 000, which is a counter sequence with the sequence restarting at a random number after a random number of vectors. The procedure for generating the results was the same as the procedure used to generate the first set of results.
The results clearly show the effectiveness of the proposed method in estimating the average power consumption of a circuit. The exact power estimate can be obtained with a compaction ratio of two. The maximum error is less than 3%, and the average error is less than 2% when the compaction ratio is five. For the case where the compaction ratio is ten, the maximum and average errors are bounded by 5% and 3%, respectively. Actual error values for different circuits are shown in Table V.
V. CONCLUSION
This paper presented a sound mathematical framework for Markov models. The new formalism is based on the cycle decomposition of Markov processes. The key result is that a Markov process can be decomposed into a collection of directed cycles with positive weights proportional to the probability of their traversal in a typical random walk. In one application, we proposed a new state assignment technique to reduce dynamic power consumption in FSMs. The proposed encoding algorithm reduces power consumption by an average of 15%. In a second application, we studied the vector compaction problem using the new mathematical framework. The key result was that a Markov process can be decomposed into a collection of directed cycles, and therefore, the input compaction ratio can be improved by an order of magnitude or higher by exercising these cycles exactly once and then calculating the total power consumption by using the corresponding weights of those cycles. Experimental results on a large number of vector sequences and test bench circuits were presented to demonstrate the effectiveness of the proposed approach. We envision a number of other applications in network and system design and analysis that can benefit from the proposed mathematical formalism. Examples include data aggregation in distributed sensor networks and low-power on-chip bus encoding.
