We present techniques to transform scheduled descriptions of control-flow intensive designs to facilitate power management. We investigate the factors that inhibit the application of power management in synthesized RTL implementations. Based on these insights, we present transformation techniques based on the concepts of variable protection, variable re-naming and re-assignment, and limited controller state memory insertion that result in inherently power-managed architectures. Our transformation techniques can be easily used in conjunction with any existing resource sharing algorithm or in the framework of existing high-level synthesis tools. Experimental results on control-flow intensive designs indicate reductions of up to 76.6% (35.6% on average) in power consumption at area overheads not exceeding 10.1% (1.1% on average) over already power-optimized designs.
Introduction
Power dissipation in today's circuits is dominated by the dynamic component, which is incurred whenever signals in the circuit undergo a logic transition. In practice, a large fraction of the transitions incurred during the operation of typical circuits are unnecessary, i.e., they have no bearing on the final result computed by the circuit. Equivalently, not all parts of a circuit may need to function during each clock cycle, i.e., some components may be idle each clock cycle. Recognizing this fact, several low-power design techniques have been proposed that are based on the idea of suppressing or eliminating unnecessary signal transitions. We use the term power management to refer to such techniques in general. Applying power management to a design typically involves two steps:
Identifying idle conditions for various parts of the circuit.
Re-designing the circuit in order to eliminate switching activity in idle components, hence avoiding unnecessary power dissipation in those parts.
Power management has been one of the most popular techniques used by designers of low-power VLSI systems [1, 2]. Many modern microprocessors have adopted the technique of gated clock signals in order to reduce unnecessary power dissipation [3]. Various automated synthesis techniques to apply clock gating and maximize its efficiency have been proposed, including gated-clock FSM synthesis [4], synthesis of circuits that use multiple non-overlapping clock domains [5], and clock tree construction to facilitate clock gating [6]. Gating clock signals may result in power savings in the clock network, as well as in logic blocks that are fed by registers that are prevented from loading unnecessary values as a result of clock gating. Pre-computation [7] is a power management technique that involves selectively computing the output values of the circuit, one or more cycles in advance, using much simpler circuitry than the original circuit itself, and using the pre-computed values to reduce internal switching activity in the succeeding clock cycle. The above techniques are applicable only to unused/idle blocks of logic that are fed by registers, i.e., they are not applicable to circuit blocks embedded within the combinational logic. Operand isolation [8, 9, 10] is a technique that can be used to save power consumption in idle embedded circuit blocks by disabling transitions at their inputs. In operand isolation, transparent latches are inserted at all the inputs of an embedded logic block, and control circuitry is added to detect the idle conditions for the block. When the block is not required to perform any useful operation, the transparent latches retain the previous cycle's values, avoiding unnecessary power dissipation in the idle block.
An automated logic-level technique, called guarded evaluation, that detects idle sub-circuits on a per-clock-cycle basis, and inserts transparent latches to perform operand isolation was presented in [8] . The operand isolation techniques described above involve the addition of extra circuitry, including transparent latches and possibly some logic to generate their enable signals. Controller-based power management [11] is a low-overhead power management technique that is based on minimally re-designing the existing control logic in order to reconfigure the multiplexer networks and functional units in the data path to minimize unnecessary switching activity.
All the techniques described above incorporate power management in a given register-transfer level (RTL) or logic-level circuit. Hence, they are limited by the underlying structure of the initial circuit. Some techniques to modify the high-level synthesis process to result in inherently power-manageable designs have also been investigated. Techniques for partitioning the register/memory constructs in a behavioral description (e.g., variables, arrays) into physical registers/memory blocks, with an aim of maximizing opportunities for power management, were presented in [12]. The integration of operand isolation into high-level synthesis was examined in [10], and found to have potential for significant power savings. A scheduling algorithm to enable power management, that attempts to schedule conditional operations before operations that depend on them in order to minimize unnecessary operation executions, was presented in [13]. The technique presented in this paper can also be classified under high-level synthesis optimization that enables power management. However, the key differences of our technique with respect to the above techniques are as follows. The technique of [12] targeted power reductions in the clock network and memory elements, while our technique targets power reduction in the data path logic. In addition, the techniques of [10] and [12] were primarily targeted towards data-flow intensive designs, and are inapplicable to designs that contain a significant mix of control flow, as described later in this section. Our technique can handle control-flow intensive designs. Finally, our technique works on scheduled designs, and hence can be applied in conjunction with any scheduling technique, including the technique presented in [13].
Consider a behavioral description where scheduling and resource sharing have been performed by a generic behavioral synthesis tool. Consider a functional unit (FU) in the RTL implementation. During the control steps (referred to as schedule or control states for control-flow intensive designs) in which the FU performs some computation from the behavioral description, the FU is said to be active. During other control steps, the FU is said to be idle. Though an FU need not perform any computation in its idle control states, the inputs to the FU may change values in the RTL implementation, causing unnecessary power dissipation in the FUs as well as the other logic (e.g., multiplexers) surrounding them. The techniques that may be used to eliminate such unnecessary power dissipation in the FUs are:
Operand isolation by inserting transparent latches at the inputs of FUs and disabling the transparent latches under conditions when the FUs are idle. The conditions under which the FUs are idle may be obtained by analyzing the behavioral description and high-level synthesis information [10, 14]. However, this approach is inapplicable to control-flow intensive designs due to the following characteristics. Power consumption is dominated by an abundance of smaller components like multiplexers, while functional units may account for a small part of the total power [15]. The power overhead due to the insertion of transparent latches is comparable to the power savings obtained when power management is applied to sub-circuits such as multiplexer networks. The signals that detect idle conditions for various sub-circuits are typically late-arriving (for example, due to the presence of nested conditionals within each controller state, the idle conditions may depend on outputs of comparators from the data path). As a result, the timing constraints which must be imposed in order to apply conventional power management techniques (the enable signal to the transparent latches must settle before its data inputs can change) are often not met. Finally, the presence of significant glitching activity at control as well as data path signals needs to be accounted for in order to obtain maximal power savings.
Re-designing the control logic may significantly reduce (but may not completely eliminate) unnecessary switching activity at the inputs of functional units as well as internal signals of multiplexer networks. However, the extent to which this is possible is very much limited by the given structure of the RTL circuit, due to the following factors: (i) the register sharing process may cause new values to be written into registers making it impossible to reduce switching activity in idle FUs by merely re-designing control logic, and (ii) controller-based power management can fail to find a good solution when a control state during which an FU is idle has several incoming transitions from other control states that require totally different values to be preserved at the inputs of the FU.
In this paper, we present transformations for scheduled behavioral descriptions (functional RTL designs) with functional unit assignment information, which significantly improve the power savings achieved when compared to RTL power management techniques such as controller-based power management alone. We demonstrate that the following factors may inhibit power management:
The sharing of registers by multiple variables, and multiple assignments to and uses of a variable in the behavioral description can significantly affect the unnecessary power dissipation in the data path.
Different sequences of states (paths across cycle or state boundaries in the scheduled description) may impose conflicting requirements on the values that need to be preserved at the inputs of FUs. This may translate into conflicting requirements on the variables that need to be stored in a register, and conflicting requirements on the values of control signals.
We demonstrate that the above phenomena can cause significant unnecessary power dissipation in FUs in spite of applying a conventional power management scheme. We introduce techniques to address the above problems, based on the following insights:
We introduce a transformation called variable protection that, when applied to scheduled behavioral descriptions, selectively extends the lifetimes of variables along paths in the schedule in order to ensure that operand values used by the last "useful" operation on a functional unit are preserved at its inputs during subsequent idle cycles.
In situations where a protected variable is (re-)assigned to a new value by an operation along a protected path, we demonstrate how to use variable re-naming and re-assignment in order to facilitate power management.
We show how to address conflicting requirements of overlapping paths in the schedule by statically re-naming variables in the scheduled behavioral description and adding limited memory of the controller state history.
We demonstrate the importance of performing the above optimizations selectively on the most "critical" paths of the schedule in order to minimize overheads and hence maximize the power savings.
We present algorithms to automatically apply the above techniques, given a scheduled behavioral description. In order to evaluate our techniques, we have incorporated them into an existing RTL power management technique, namely controller-based power management. Experimental results demonstrate that the techniques presented in this paper significantly enhance the opportunities for power management. Experimental results on control-flow intensive designs indicate reductions of up to 76.6% in power consumption at area overheads not exceeding 10.1%, over circuits produced using only controller-based power management.
The rest of this paper is organized as follows. Section 2 introduces our power management methodology, and illustrates the contributions of this paper through several examples. Section 3 presents the details of the technique and algorithms for the application of our transformations to scheduled behavioral descriptions. Experimental results are presented in Section 4, and conclusions in Section 5.
Transformations to facilitate power management
We first motivate the need to transform scheduled behavioral descriptions to facilitate power management. The technique of variable protection proceeds as follows. Consider FU MUL1 and the path S0; S1; S2; S3 in the schedule. The FU performs operation a × b in state S0. In order to avoid unnecessary power dissipation in the FU along the selected path, it is necessary to preserve or protect the values of the FU's operands from state S0 (a and b) to state S1, and similarly to protect the FU's operands from state S2 (g and g) to state S3. This is performed by extending the lifetimes of the variables, if necessary, to ensure that their values are retained long enough in the registers to which they are assigned, and ensuring that the multiplexers at the FU's inputs choose the appropriate register in states S1 and S3. A restricted version of this technique, applicable only to data-flow intensive designs, was presented in [16].
It is important to note that while variable protection enables power reduction in data path components such as FUs and the multiplexers around them, it may result in larger storage (register) requirements (since variables that are otherwise not required in some states may need to be kept alive in registers). In general, this may lead to an increase in power consumption in registers and the clock distribution network. Hence, it is important to perform variable protection selectively and judiciously in order to ensure that the power overheads are minimal and do not compromise the power savings obtained. State transition graphs (STGs) are generally used to represent scheduled behavioral descriptions. For control-flow intensive designs, STGs may be much more complex due to the abundance of nested data-dependent loops and conditional constructs in the behavioral description. For such STGs, variables that are generated and used in non-overlapping paths (or sub-graphs) of the STG may share the same register. When variable protection is applied along all paths in the STG, it inevitably results in a significant increase in the number of registers and hence incurs power overheads. On the other hand, it may often be possible to apply variable protection only along selected paths in the STG. In doing so, we can exploit the following factors:
Registers which are not utilized to store variables from the schedule in some states may be used to store protected variables, leading to minimal overheads.
We can restrict our focus to the high-probability STG paths in order to ensure that maximal power savings are obtained.
Example 2:
Consider the partial schedule shown in Figure 2(a). Let us suppose that all the add (multiply) operations shown in the schedule are assigned to a single adder ADD1 (multiplier MUL1) in the RTL implementation. Figure 2(b) shows the lifetimes of the variables (each variable's lifetime is a collection of states during which its value must be stored in a register), and a compatibility graph¹ for the variables. A partitioning of the compatibility graph into cliques, which corresponds to a valid register assignment [17], is shown in Figure 2(b) by using undashed lines to indicate the cliques ({a}, {b, c, e, f}, {d}). In the RTL implementation, adder ADD1 consumes unnecessary power in states S1, S3, and S6 due to the overwriting of b with c, and e with f, in a register that feeds it. Similarly, MUL1 consumes unnecessary power in states S0 and S5.
In order to eliminate the above unnecessary power dissipation, we applied the technique of variable protection along both the paths in the STG (S0; S1; S2; S5; S6 and S0; S3; S4; S5; S6). The modified lifetimes of variables resulting from variable protection are shown in Figure 2(c), along with the corresponding compatibility graph. For example, variable b, which is an operand of ADD1 in state S0, is protected, resulting in its lifetime being extended to include states S1, S3, and S4 as well. As can be seen from the compatibility graph, the resulting circuit requires four registers ({a}, {b, f}, {c, e}, {d}); an overhead of one data path register as compared to the original design.
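The register counts quoted in this example can be illustrated with a small Python sketch. It is not the register assignment algorithm of [17]: it uses a simple greedy clique partitioning, and the lifetimes below are hypothetical values chosen only so that the result matches the cliques quoted in the text (the actual lifetimes are those of Figure 2). The function names `compatible` and `assign_registers` are likewise introduced here for illustration.

```python
def compatible(lifetimes, u, v):
    """Two variables may share a register if their lifetimes do not overlap."""
    return not (lifetimes[u] & lifetimes[v])


def assign_registers(lifetimes):
    """Greedy clique partitioning of the compatibility graph.

    Each register is a clique: every pair of variables in it is compatible.
    The result is a valid assignment, not necessarily a minimum one.
    """
    registers = []
    for var in sorted(lifetimes):
        for reg in registers:
            if all(compatible(lifetimes, var, other) for other in reg):
                reg.append(var)
                break
        else:
            registers.append([var])
    return registers


# Hypothetical lifetimes (sets of states); NOT taken from Figure 2.
original = {
    "a": {"S0", "S1", "S3", "S5"},
    "b": {"S0"},
    "c": {"S1", "S2"},
    "d": {"S0", "S2", "S4", "S6"},
    "e": {"S3", "S4"},
    "f": {"S5", "S6"},
}

# Variable protection extends the lifetime of b to S1, S3, and S4.
protected = dict(original, b={"S0", "S1", "S3", "S4"})

before = assign_registers(original)
after = assign_registers(protected)
print(len(before), before)
print(len(after), after)
```

Under these assumed lifetimes, extending the lifetime of b makes it incompatible with c and e, so the shared register splits and one extra register is needed.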
The above overhead can be avoided as follows. Upon analysis of the state transition probabilities in the schedule STG (obtained using a simulation with several typical input traces), it was observed that the probabilities of execution of paths S0; S1; S2; S5; S6 and S0; S3; S4; S5; S6 were 0.9 and 0.1, respectively. Naturally, it makes sense to consider applying our optimizations along the more frequently executed path alone. Figure 2(d) shows the resulting variable lifetimes and compatibility graph after applying variable protection along STG path S0; S1; S2; S5; S6 only. As shown in the figure, the resulting RTL implementation contains the same number of registers as the original design. Thus, applying variable protection judiciously can lead to minimal or no register and clock network overheads.
¹ The nodes of a compatibility graph denote variables, and an edge between two nodes indicates that the corresponding variables have non-overlapping lifetimes [17, 18].
The aim of variable protection is to keep the value at the inputs of functional units stable during sequences of states that correspond to the execution of the selected STG path. In the previous examples, variable protection was performed by extending the lifetime of the variable to include all the states on the path. However, in cases when new values are generated/assigned to a protected variable in one or more of the states in the protected path, it is no longer possible to avoid unnecessary power dissipation in the targeted FUs, as shown in the following example.
Example 3:
Consider the partial schedule, and the FU ADD1 in its RTL implementation, that are shown in Figure 3(a) . The FU performs all the addition operations shown in the schedule. We applied variable protection along path S0; S2; S3; S4; S5 to variables b and c in order to eliminate unnecessary power consumption in ADD1 in states S3 and S4. Variable protection ensures that no other variables can be overwritten into the registers in which b and c are stored in the states that constitute the path. For the sake of illustration, let us assume that each variable is assigned to a separate register (this is in no way a limitation of our method, which can be applied with any arbitrary register sharing algorithm). Thus, the goals of variable protection are automatically satisfied. Figure 3 . This can happen, in general, when multiple assignments are made to variables in the schedule (i.e., the scheduled description is not in static single assignment form [19] ).
In such instances, we transform the schedule by re-naming the re-generated values of the protected variable, in order to ensure that it does not overwrite the old value, effectively resulting in a new variable. In order to ensure that the same functionality is maintained, we add re-assignment statements that assign the value of the new variable back to the original (protected) variable, along each outgoing transition from the selected STG path. For example, the schedule of Figure 3(a), with the re-naming and re-assignment transformations performed, is shown in Figure 3. Note that we may use the same re-named variable v' to store re-generated values of a variable v along all the non-overlapping STG paths in which v is protected. However, in general, variable re-naming may introduce several additional variables in the schedule, thus increasing the storage (register) requirements. Hence, it is important to perform it judiciously.
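A minimal Python sketch of re-naming and re-assignment along a single straight-line path is given below. It is a simplification under stated assumptions: each state holds one operation of the form dest := f(sources), the helper name `rename_on_path` and the suffix `_new` (standing in for v') are hypothetical, and transitions that enter the path midway are ignored.

```python
def rename_on_path(path_ops, protected, suffix="_new"):
    """Rename re-generated values of a protected variable along one path.

    path_ops: list of (state, dest, sources) in path order.
    Returns the transformed operations and the re-assignments
    (original, renamed) to place on transitions leaving the path.
    """
    renamed = protected + suffix
    transformed = []
    redefined = False
    for state, dest, sources in path_ops:
        if redefined:
            # Later readers must see the re-generated value.
            sources = tuple(renamed if s == protected else s for s in sources)
        if dest == protected:
            # Write the new value to the renamed variable so the
            # protected (old) value is not overwritten.
            dest = renamed
            redefined = True
        transformed.append((state, dest, sources))
    exit_assignments = [(protected, renamed)] if redefined else []
    return transformed, exit_assignments


# Hypothetical path in which protected variable b is re-generated in S2.
ops = [
    ("S0", "x", ("b", "c")),
    ("S2", "b", ("b", "e")),
    ("S3", "y", ("b", "d")),
]
new_ops, exits = rename_on_path(ops, "b")
for op in new_ops:
    print(op)
print("on exit:", exits)
```

The old value of b stays in its register for the whole path, while the operation in S3 reads the re-generated value from the renamed variable; the exit assignment restores the original name for code outside the path.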
While the illustrations in the examples of this section have so far considered STG paths individually for variable protection, it is possible that we may wish to protect an FU's operands along multiple overlapping paths. The variables protected along two different overlapping paths (to facilitate power management for a single FU) may or may not be the same. In either case, performing variable protection individually along the different paths is not sufficient to ensure elimination of unnecessary power consumption in the FU. We illustrate one such situation using the following example.
Example 4:
Consider again the partial schedule of Figure 3(a), and the FU ADD1 in its RTL implementation. Suppose we want to apply variable protection in order to eliminate unnecessary power dissipation in the FU along two overlapping paths S0; S1; S3; S4; S5 and S0; S2; S3; S4; S5. The operands of the FU in states S1 and S2 are a; d and b; c, respectively. Suppose that the lifetimes of these variables have been extended and re-naming/re-assignment has been performed in order to ensure that their values are preserved along the respective paths. Now consider path S0; S1; S3; S4; S5. In state S3, in order to retain the value from state S1 at the inputs of ADD1, it is necessary to configure the multiplexer networks feeding the left and right inputs of the FU to select the registers corresponding to a and d, respectively. Next, let us consider path S0; S2; S3; S4; S5. The requirements of this path mandate that in state S3, we need to configure the multiplexer networks feeding the left and right inputs of the FU to select the registers corresponding to b and c, respectively. Thus, there is a conflict between the requirements of the two paths in state S3. One remedy is to use the probabilities of state transitions S1 → S3 and S2 → S3, and specify the select signals of the multiplexer networks giving preference to the more frequent case [11].
This solution would work well if one of the transition probabilities is much higher than the other. However, it may often occur that both transitions have significant and comparable transition probabilities (this is indicated by the fact that multiple paths were selected for variable protection in the first place). Giving preference to the requirements of one path over another (e.g., always selecting b and c at the inputs of ADD1 in state S3) will lead to unnecessary power consumption in the functional unit for the other cases, as shown in Figure 4(a). Any solution to the above problem needs to recognize in state S3 whether the previous state was S1 or S2, and configure the multiplexer networks feeding ADD1 accordingly. Thus, in effect, we need to add limited state memory along the overlapping paths in order to remember the previous state. This state memory can be used to configure the multiplexer networks so that unnecessary power dissipation is avoided along both the paths. In Figure 4(b), the control logic feeding the multiplexer network at the inputs of ADD1 is modified by adding a single flip-flop (FF) (the power overheads of this step are typically minimal since we only add a 1-bit FF in order to save power in multi-bit components of the data path).
The algorithm
In this section, we present our power management methodology. Our procedure accepts as input an STG with functional unit assignment information, and outputs a new STG. The new STG can differ from the original STG in the following ways:
Variables in the STG may be renamed, and new variables introduced.
Assignment statements may be added to selected STG edges.
The structure of the STG may be altered by the addition of extra states.
However, the functional unit assignment information is preserved. The following analysis illustrates the interplay of these transformations, and their role in facilitating power management.
Power management technique outline
The inputs to our power management algorithm are (i) an STG with profiling information, i.e., probabilities of individual states and edges being encountered, and (ii) functional unit assignment information for the operations scheduled. The outputs of our algorithm are (i) a new STG which represents a power-managed schedule, and (ii) lifetimes for the variables in the STG. The pseudocode in Figure 5 presents a high-level picture of the algorithm. The algorithm first identifies critical paths in the STG for power management. The second step involves transforming individual paths to prevent the functional units from executing spurious transitions. These transforms involve the introduction of new variables and assignment statements. The modifications made to one path to facilitate power management might conflict with the modifications made to another path which shares states with the original path. The third step of our power management algorithm resolves such conflicts.
We present our power management technique with reference to a single functional unit, fu. The results derived for this case can easily be generalized to apply to all functional units in the datapath.
Path Selection:
The first step is to identify paths in the STG for the application of our technique. Many applications are characterized by the presence of a few, frequently encountered paths. A vast majority of paths have very low probabilities of occurrence [20]. The power reductions obtained from these low probability paths do not justify the addition of hardware for power management purposes, as illustrated by Example 2. Our technique considers only paths whose probabilities exceed a user-specified threshold, tp, for the succeeding steps. Paths that are chosen for power management are deemed critical. The attributes that a critical path needs to possess are summarized by the following definition.
Definition 1: A path is considered critical if it satisfies the following properties:
The first state in the path, denoted by start, represents an active cycle for the functional unit, fu, under consideration.
fu is idle in all states in the path, except start.
The probability that the path is traversed exceeds tp.
Every path, p, selected for power management is associated with a functional unit, fu, whose inputs are protected in p. Section 3.2 details our path-extraction algorithm, and relates the number of chosen paths to tp.
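Definition 1 can be illustrated with a short Python sketch. This is not the procedure of Section 3.2; it is a plain depth-first search under several assumptions that are ours: a path's probability is taken as the probability of its start state multiplied by the transition probabilities along it, only maximal critical paths are reported, and states are not revisited within a path. The function name `critical_paths`, the data layout, and the activity of the functional unit in the example are all hypothetical.

```python
def critical_paths(succ, p_state, active, tp):
    """Enumerate maximal critical paths for one functional unit.

    succ:    state -> list of (next_state, transition_probability)
    p_state: probability that each active state is encountered
    active:  set of states in which the functional unit is active
    tp:      user-specified probability threshold
    """
    found = []

    def extend(path, prob):
        extended = False
        for nxt, p_edge in succ.get(path[-1], []):
            new_prob = prob * p_edge
            # The unit must be idle in every state after the first, the
            # path probability must exceed tp, and (our assumption) a
            # state is not revisited within one path.
            if nxt in active or nxt in path or new_prob <= tp:
                continue
            extended = True
            extend(path + [nxt], new_prob)
        if not extended and len(path) > 1:
            found.append((path, prob))

    for start in sorted(active):
        if p_state[start] > tp:
            extend([start], p_state[start])
    return found


# STG shaped like the one in Example 2, with its 0.9 / 0.1 branch
# probabilities. Treating the unit as active only in S0 is an assumption.
succ = {
    "S0": [("S1", 0.9), ("S3", 0.1)],
    "S1": [("S2", 1.0)],
    "S2": [("S5", 1.0)],
    "S3": [("S4", 1.0)],
    "S4": [("S5", 1.0)],
    "S5": [("S6", 1.0)],
}
p_state = {"S0": 1.0}
active = {"S0"}

high = critical_paths(succ, p_state, active, tp=0.5)
low = critical_paths(succ, p_state, active, tp=0.05)
print(high)
print(low)
```

Raising tp prunes the low-probability branch, which mirrors the observation in Example 2 that protecting only the frequently executed path avoids the register overhead.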
Variable protection:
The second step is to ensure that, in the chosen paths, the operands of a functional unit preserve their values between its successive uses. This is done by (a) extending the lifetime of an operand until the next use of the functional unit, and (b) configuring the multiplexer feeding the functional unit to select the preserved operand. The variables which represent these operands are referred to as protected variables of the selected path. The set Protected_variables(p) contains the protected variables of path p.
Extending the lifetime of the operand is straightforward, except if it is written to between successive uses of the functional unit, as shown in Figure 6. In this figure, path Si, ..., Sk, Sk+1, ..., Sj be used. In Figure 6, this is achieved by naming v as v' in the states including and succeeding Sk on p.
In renaming operands, we need to ensure that the functionality of the STG is preserved for every thread of execution encountered. To this end, we selectively add assignment statements to transitions entering or leaving p. For instance, transition T1 entering Sk+1 is annotated with the assignment statement, v' := v, and similarly, transition T2 leaving Sk+1 is annotated with the assignment statement, v := v'. In general, any transition, T, into an operation, op, in p, must be annotated with the assignment of v' to v if the following conditions are satisfied:
v is alive in op.
Also, every transition, T, leaving op, in whose fanin a renamed variable v', originally called v, is used, is annotated with the assignment v := v', if v is alive in op. After all variables have been renamed appropriately, the select signals to multiplexers feeding the functional unit, fu, selected for power management, are specified to select the protected variables.
Conflict resolution:
The two preceding steps identify paths for power management, and protect variables in these paths by extending their lifetimes. Different paths can require conflicting modifications. These conflicts are resolved using the techniques presented in this step. We first discuss the origin of path conflicts, and then present techniques for their resolution.
We now introduce some notation to simplify future discussion.
Consider a situation in which two paths, p1 and p2, which share some nodes, are chosen for power management. Path p1 (p2) originates at state S1p1 (S1p2), and facilitates power management for functional unit fu1 (fu2). Conflicts can have the following origins:
fu1 and fu2 are the same, causing potential conflicts in the configuration of the multiplexers at their inputs, in the states common to p1 and p2. The following example illustrates this conflict, and presents a technique for its resolution.
Example 5 :
In Figure 7, paths {S1, S3, S4} (p1) and {S2, S3, S4} (p2) are assumed to be selected to facilitate power management for functional unit fu1. Therefore, fu1 must be configured to select variable u (w) at its left input in state S3 on path p1 (p2).
This conflict can be resolved with the addition of one extra flip-flop to remember path information. In general, distinguishing among n paths requires log2(n) flip-flops. At this stage, a few remarks about our technique, and its relation to the one presented in [11], are in order. If paths p1 and p2, described in the previous example, are not chosen for power management, then we do not add extra control memory for power management. In this case, the conflict is handled by selecting the variable that corresponds to the most probable path, in accordance with the technique presented in [11]. In our running example, suppose the current state is S3, and the probability of the previous state being S1 (0.8) exceeds that of the previous state being S2 (0.2). Then, the multiplexer feeding the left input of fu1 is accordingly configured to select u.
Some variable, v1, is protected in paths p1 and p2. Also, as a result of the application of step 2 (variable protection) to path p1, some functional unit, fui, uses a renamed version, v1', of v1. Example 6 illustrates this situation.
Example 6: Consider the STG fragments shown in Figures 8(a), (b), and (c). As shown in Figure 8(a), v is protected by both p1 and p2. Figure 8(b) shows the modifications made to path p1 to protect variable v. v is assigned a new value in the state preceding Sc, and is used by functional unit fu3 in Sc. Therefore, v is renamed v' in Sc and all its successors in p1. This renaming results in a pair of assignment statements, before and after Sc, which assign v to v' and vice versa. These statements conflict with the requirements of path p2, which require that v preserve its value on path p2 without being altered or reassigned.
This conflict can be resolved in two ways: . . .
. . . Additional control memory in the form of flip-flops or states can be added to remember the path leading to the state in question. Figure 9 shows the STG fragment, augmented by the addition of extra states for conflict resolution. This technique cannot be used if the first state, S1p2, of p2, is in p1. This is because p1 and p2 have the same origin, and cannot, therefore, be distinguished by the addition of extra states. The relabeling technique, discussed above, does not result in an overhead in the number of states. However, its use is limited due to the presence of situations such as those outlined in the following example.
Example 7: Consider the STG fragment shown in Figure 11. Paths p1, p2, and p3 are selected to facilitate power management for functional units fu1, fu2, and fu3, respectively. . . . sults in the addition of assignment statements which modify v on path p3, where v is also protected. We again rename v as v' on path p3, and add the appropriate assignment statements to incoming and outgoing transitions. This results in the renaming of v in state Sc, which impedes its protection on path p1. Note that, if v is renamed v' on p1, it is no longer protected in state Sd, because it acquires a new value in the state preceding Sd on path p1. Therefore, p3 must be protected by the addition of extra states, as shown in Figure 13.
We note that the conflicts between p1 and p2, and p2 and p3 are resolved by renaming, whereas that between p3 and p1 is resolved by adding extra states.
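The extra-state resolution discussed above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the STG is encoded as a dictionary from each state to its set of successors, that paths are lists of state names, and that duplicating each state of p2 that also lies on p1 is an acceptable way to "remember" the incoming path. The function name split_shared_states and the "_p2" suffix are hypothetical.

```python
def split_shared_states(transitions, p1, p2):
    """Give p2 private copies of the states it shares with p1.

    transitions: dict mapping state -> set of successor states
                 (an assumed encoding of the STG, not the paper's).
    p1, p2:      lists of state names describing two paths.
    Returns (new_transitions, new_p2); the inputs are left unmodified.
    Assumes no existing state is already named "<state>_p2".
    """
    if p2[0] in p1:
        # Same origin: extra states cannot tell the two paths apart.
        raise ValueError("first state of p2 lies on p1")

    shared = set(p1)
    new_transitions = {s: set(succs) for s, succs in transitions.items()}
    new_p2 = [s + "_p2" if s in shared else s for s in p2]

    # Each copy starts with the same outgoing transitions as its original.
    for old, new in zip(p2, new_p2):
        if new != old:
            new_transitions[new] = set(transitions.get(old, ()))

    # Redirect the transitions along p2 so that they enter the copies.
    for i in range(len(p2) - 1):
        dst, new_dst = p2[i + 1], new_p2[i + 1]
        if new_dst != dst:
            succs = new_transitions.setdefault(new_p2[i], set())
            succs.discard(dst)
            succs.add(new_dst)

    return new_transitions, new_p2


# Hypothetical fragment: Sa and Sb both lead into Sc, which branches to Sd and Se.
stg = {"Sa": {"Sc"}, "Sb": {"Sc"}, "Sc": {"Sd", "Se"}, "Sd": set(), "Se": set()}
new_stg, new_p2 = split_shared_states(stg, ["Sb", "Sc", "Sd"], ["Sa", "Sc", "Se"])
```

In this fragment, Sc is now reached only from Sb and its copy only from Sa, so assignments placed in one no longer affect the other. The cost is one extra controller state per shared state.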
Power management technique details
In this section, we present the details of our proposed power management technique. These details pertain to the path selection procedure alluded to in step 1 of Section 3.1; other steps have been adequately explained in Section 3.1.
As mentioned in Section 3. . . .
[Fragment of the Figure 14 pseudocode. Its comments state that the set in question initially contains only the probability of path {s}, which is the probability, P(s), that s is encountered; that Current_path, the path currently under consideration, is initially set to {s}; and that s_last is the last state in Current_path. The routine then iterates over each successor, succ, of s_last in stg.]
Having established a bound on the number of paths, we now outline a procedure to extract them. The complexity of the procedure is polynomial in the number of extracted paths.
The pseudocode shown in Figure 14 describes the procedure. The subroutine GLOBAL_EXTRACT_PATHS iterates over individual functional units and their active states, identifying candidate paths for power management. The subroutine GET_CRITICAL_PATHS builds the set of critical paths in an incremental fashion. It accepts as input the state transition graph, stg, the functional unit, fu, under consideration, the probability threshold, tp, and the path, Current_path, under consideration. The other two parameters pertain to the recursive nature of the subroutine, and their purpose is clarified later in this section. The routine searches for critical paths (Definition 1) which can be obtained from . . .
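The recursive, threshold-bounded character of such a search can be illustrated with a minimal sketch. It is not the GET_CRITICAL_PATHS routine of Figure 14, and it does not model the functional unit or Definition 1. It assumes the STG is a dictionary mapping each state to (successor, transition probability) pairs, that a path's probability is P(s) times the product of its transition probabilities, and that a path is kept only while this probability stays at or above tp. All names are hypothetical.

```python
def extract_paths(stg, start, p_start, tp):
    """Enumerate maximal simple paths from `start` with probability >= tp.

    stg:     dict mapping state -> list of (successor, probability) pairs
             (an assumed encoding, not the paper's).
    p_start: probability P(s) that the start state is encountered.
    tp:      probability threshold below which a path is abandoned.
    Returns a list of (path, probability) pairs.
    """
    results = []

    def extend(path, prob):
        s_last = path[-1]
        extended = False
        for succ, p_edge in stg.get(s_last, []):
            new_prob = prob * p_edge
            # Skip successors that close a cycle or fall below the threshold.
            if succ in path or new_prob < tp:
                continue
            extended = True
            extend(path + [succ], new_prob)
        if not extended:
            # No admissible successor: the current path is maximal.
            results.append((path, prob))

    if p_start >= tp:
        extend([start], p_start)
    return results


# Hypothetical STG: s branches to a (0.9) or b (0.1); a goes on to c or back to s.
example_stg = {
    "s": [("a", 0.9), ("b", 0.1)],
    "a": [("c", 0.5), ("s", 0.5)],
    "b": [],
    "c": [],
}
paths = extract_paths(example_stg, "s", p_start=0.8, tp=0.3)
```

Because every extension multiplies the path probability by a factor of at most one, pruning at tp also limits how many paths are explored.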
Experimental results
We performed experiments to evaluate our techniques using several commonly available control-flow intensive designs. As mentioned in Section 3, our technique accepts as input a scheduled STG with functional unit assignment information, and outputs a new STG. The functional unit assignment is preserved in the new STG, but variable lifetimes may be re-defined. In our case, the input control-flow intensive designs had functional unit assignment, as well as register assignment information. The enhancement of variable lifetimes sometimes invalidated the original register sharing configuration. Extra registers were used to accommodate shared variables with conflicting lifetimes.
The RTL circuits produced using our technique were compared against those produced using the technique described in [11] , which re-specifies controller don't cares to preserve input operands of idle functional units. Note that, in this case, only the controller needed to be resynthesized, and the structure of the original datapath was preserved.
The RTL circuits for the two techniques were synthesized using the VARCHSYN synthesis system. The gate-level descriptions derived upon synthesis were technology-mapped using the NEC CMOS6 0.5 µm standard cell library. The technology-mapped designs were simulated with typical input sequences using the NEC CSIM simulator to determine the power consumption. The input sequences used for simulation were obtained by first generating a zero-mean Gaussian sequence and then passing the result through an autoregressive filter to introduce the desired level of temporal correlation.
Table 1 shows the experimental results obtained. The first column gives the name of the design. The cycle time of the synthesized designs was fixed at 25 ns, and a supply voltage of 5 V was used. Major columns P and A show the power consumption and area of the synthesized circuits. Minor columns T1 and T2 refer to circuits produced by the application of the technique described in [11], and those produced by the application of our technique, respectively. Columns P.S. and A.O. represent the power savings and area overheads for designs produced using technique T2, with respect to those produced using T1. The results obtained indicate that, for the same delay, our methods achieve a 35.6% average power reduction over already power-optimized circuits at an average area overhead of only 1.1%.
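The generation of temporally correlated input sequences described above can be illustrated with a minimal sketch. The filter order and coefficients used in the experiments are not stated, so the sketch assumes a first-order autoregressive filter, x[t] = rho * x[t-1] + e[t], with a hypothetical coefficient rho. Quantizing the samples to the bit width of the design inputs is left out.

```python
import random


def correlated_sequence(n, rho, sigma=1.0, seed=0):
    """Zero-mean Gaussian noise passed through a first-order AR filter.

    x[t] = rho * x[t-1] + e[t], with e[t] drawn from N(0, sigma^2).
    rho (assumed, with |rho| < 1) sets the level of temporal correlation.
    """
    rng = random.Random(seed)
    samples, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, sigma)
        samples.append(prev)
    return samples


def lag1_correlation(x):
    """Sample correlation between consecutive elements of x."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


correlated = correlated_sequence(20000, rho=0.9)
uncorrelated = correlated_sequence(20000, rho=0.0)
```

For a first-order filter the lag-one correlation of the output approaches rho, so lag1_correlation gives a quick check that the sequence has the intended level of correlation.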
Conclusions
In this paper, we presented a suite of transformations which collectively facilitate power management for control-flow intensive RTL designs. These transformations, which act upon a scheduled and assigned design, include renaming variables, modifying the structure of the input STG, extending lifetimes of variables, and introducing extra assignment statements to preserve functionality. When applied to specific critical STG segments identified by our technique, these transformations achieved reductions of up to 76.6% in power at area overheads not exceeding 10.1%.
