Power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies. Recent research efforts have tried to integrate architecture and compiler solutions that employ power-gating mechanisms to reduce leakage power. In this approach, compilers perform data-flow analysis and insert instructions into programs to shut down and wake up components whenever appropriate for power reduction. While this approach has been shown to be effective in early studies, there are concerns about the number of power-control instructions added to programs as more components in a SoC design platform are equipped with power-gating control. In this paper, we present a Sink-N-Hoist framework in the compiler solution to generate balanced scheduling of power-gating instructions. Our solution attempts to merge several power-gating instructions into one compound instruction, thereby reducing the number of power-gating instructions issued. We perform experiments by incorporating our compiler analysis and scheduling policies into the SUIF compiler tools and by simulating the energy consumption with the Wattch toolkit. The experimental results demonstrate that our mechanisms are effective in reducing the number of power-gating instructions while further reducing leakage power compared to previous methods.
INTRODUCTION
Minimization of power dissipation can be considered at algorithmic, architectural, logic, and circuit levels [4]. Studies on low-power design are abundant in the literature, in which various techniques were proposed to synthesize designs with low transitional activities. Recently, new research directions in reducing power consumption have begun to address architecture design and instruction-level software arrangement [1, 5, 8, 11, 12, 17, 19, 20]. Several works have been proposed to reduce dynamic power dissipation, for example, software rearrangement to utilize the value locality of registers [5], the swapping of operands for Booth multipliers [12], the scheduling of VLIW instructions to reduce the power consumption on the instruction bus [11], clock gating to reduce workloads [8, 19, 20], cache sub-banking mechanisms [17], and the utilization of the instruction cache [1].
As semiconductor technology continues to scale down, leakage power gains more significance in the total power dissipation. It is predicted that leakage power will become comparable to dynamic power in only a few generations [18]. Therefore, power gating should be used in addition to clock gating to reduce both leakage power and dynamic power, as clock gating is only able to reduce dynamic power [3, 10]. Recent research efforts have tried to integrate architecture and compiler solutions that employ power-gating mechanisms to reduce leakage power [6, 13, 22, 23, 24, 25]. In this approach, compilers perform data-flow analysis and insert instructions into programs to shut down and wake up components whenever appropriate for power reduction. While this approach has been shown to be effective in early studies, there are concerns about the number of power-control instructions added to programs as more components in a SoC design platform for embedded systems are equipped with power-gating control. Note that architecture designers can customize the processor with unique operation functions [7, 9, 21], and examples of such modules are abundant: one may have extensible instructions for crypto modules, 3D graphics modules, motion estimation modules, a variety of wireless communication modules, etc.
In this paper, we present a Sink-N-Hoist framework in the compiler solution to generate balanced scheduling of power-gating instructions. Our solution attempts to merge several power-gating instructions into one compound instruction, thereby reducing the number of power-gating instructions issued. Note that power-gating instructions can significantly reduce leakage power, but they incur recovery penalties, increase the execution time of programs, and increase code size. Figure 1 illustrates an example of power-gating control. The left-hand side of the figure shows two different components in use; current practice issues power-on and power-off instructions for these two hardware components separately. The right-hand side of Figure 1 shows our scheme, which tries to merge these instructions. In this work, we provide a cost model and a software foundation to guide this process. Our solution includes a set of data-flow equations for code motion of power-gating instructions, and gives a theoretical foundation and a step-by-step framework for grouping power-gating instructions together. We perform experiments by incorporating our compiler analysis and scheduling policies into the SUIF compiler tools and by simulating the energy consumption on a platform integrating the Wattch toolkit [2]. The experimental results on the DSP-stone benchmarks demonstrate that our mechanisms are effective in reducing the number of power-gating instructions as well as in reducing power over previous methods. On average, the number of power-gating instructions is reduced by 31.2% relative to the scheme without our Sink-N-Hoist framework for merging power-gating instructions. Our framework also further reduces the energy consumption, because a block version of power-gating instructions gives better power and performance effects than the pointwise version.
The remainder of this paper is organized as follows. Section 2 describes a machine architecture for the target platform. Section 3 overviews the leakage-power reduction framework. Section 4 presents our analysis and merging techniques for reducing the amount of power-gating instructions. Section 5 gives the experimental results of our work. Finally, Section 6 concludes this work.
MACHINE ARCHITECTURE
The architecture model in our design is a modern system with an instruction set that supports the control of power gating at the component level. We focus on reducing the power consumption of certain components by invoking power-gating technology. Power gating is analogous to clock gating; it powers off devices by switching off their supply voltage rather than the clock. It can be done by forcing transistors to be off or by using multi-threshold voltage CMOS technology (MTCMOS) to increase the threshold voltage [3, 10, 14]. Figure 2 illustrates an example of our target machine architecture, based on the Alpha 21264 processor, with the integer function unit (Execution Box) and the floating-point function unit (Floating-point Box). In the adapted Alpha 21264 architecture model, power-gating functions were added to the E box and the F box. The power gating of each unit is controlled by the "Power Gating Control Register" ("PGCR" for short), a 64-bit integer register. In this case, one bit is used for the Integer Multiplier and three bits are used for the Floating-Point Function Units. Setting a power-gating bit to one causes the corresponding module to be powered on; clearing the bit to zero powers off the corresponding module in the following clock cycle. A new instruction was implemented to control the power-gated units by moving a proper value from a general-purpose register to the PGCR. The Integer ALU is always powered on, because it is responsible for performing the data movement to the PGCR.
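The PGCR mechanism described above can be illustrated with a minimal Python sketch. The bit positions and the names `BIT`, `pgcr_value`, and `is_on` are hypothetical; the text only states that the register is 64 bits wide, with one bit for the integer multiplier and three bits for floating-point units.

```python
# Hypothetical bit layout: the text only states that one PGCR bit controls the
# integer multiplier and three bits control floating-point units.
PGCR_WIDTH = 64
BIT = {"int_mul": 0, "fp_alu": 1, "fp_mul": 2, "fp_div": 3}  # assumed positions


def pgcr_value(powered_on):
    """Build the value to move into the PGCR: a set bit means 'powered on'."""
    value = 0
    for name in powered_on:
        value |= 1 << BIT[name]
    return value & ((1 << PGCR_WIDTH) - 1)


def is_on(pgcr, name):
    """Return True if the component's power-gating bit is set."""
    return bool((pgcr >> BIT[name]) & 1)


# Everything powered on, then a single register move powers off all three
# floating-point units at once (the idea behind a compound instruction).
all_on = pgcr_value(["int_mul", "fp_alu", "fp_mul", "fp_div"])
pgcr = pgcr_value(["int_mul"])
```

Because the whole register is written by one move, several components can change state in a single instruction, which is the property the merging scheme relies on.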
LEAKAGE-POWER REDUCTION FRAMEWORK
In this section, we present a compiler framework for employing power-gating mechanisms to reduce leakage power dissipation. In our earlier work, we presented a data-flow analysis framework, called Component-Activity Data-Flow Analysis, to estimate the component activities on a microprocessor within a given program [23, 24]. The analysis collects the utilization information of components at each point in the program. After that, power-gating instruction scheduling is performed to determine when, where, and whether power-gating control should be employed with respect to power reduction. Finally, power-gating instructions are inserted into the program accordingly. In this work, we present a Sink-N-Hoist framework, which is applied in the phase right before power-gating instructions are inserted, to generate balanced scheduling of power-gating instructions. Our solution attempts to merge several power-gating instructions into one compound instruction. Figure 3 presents the compiler flow of our leakage-power reduction framework.
Steps (i), (ii), and (iii) are the steps in conventional methods [23, 24], and steps (iv) and (v) are the steps proposed in this paper to merge power-gating instructions. Among the stages of the design flow, steps (i) and (ii) perform the component-activity data-flow analysis, and step (iii) decides whether and where power-gating instructions should be inserted. Next, step (iv) attempts to merge the power-gating instructions with our proposed Sink-N-Hoist framework. Finally, step (v) emits code for the grouped instructions. A motivating example of power-gating control over floating-point units (a floating-point ALU, a floating-point multiplier, and a floating-point divider) with this framework is illustrated in Figure 4, where each plot shows the status of a component along the timeline and a shadowed plot indicates that the component is in use. Three scenarios are given: the leftmost figure shows the case without power-gating control, the middle one shows the case when steps (i), (ii), (iii), and (v) in the framework are applied, and the rightmost one shows the case when all phases in the framework are applied. The number of power-gating instructions inserted decreases from six to two when the Sink-N-Hoist Analysis is applied.
In the following, we first describe the methods in steps (ii) and (iii) and then present steps (iv) and (v).
Figure 3: Compiler flow of the leakage-power reduction framework, including the following steps:
ii. Perform Component-Activity Data-Flow Analysis.
iii. Perform power-gating instruction scheduling.
iv. Perform Sink-N-Hoist Analysis.
v. Emit power-gating instructions for groups.
Component-Activity Data-Flow Analysis
The goal of the Component-Activity Data-Flow Analysis is to collect the utilization information of components at each point in a program. A set of data-flow equations is proposed to compute such information. We say a component-activity c is generated at a block b if a component is required for the execution, symbolized as COMPONENT loc (b), and it is killed if the component is released by the last request, symbolized as COMPONENT blk (b). The predicates of the data-flow equations for collecting component-activity information are given as follows:
• COMPONENT loc (b) is a set of components which are required for the first cycle of the execution.
• COMPONENT blk (b) is a set of components which are released by the execution.
• COMPONENT in (b) is a set of components which are required for the execution in the beginning of block b. It can be computed by
$$\mathrm{COMPONENT}_{in}(b) = \bigcup_{p \in \mathrm{Pred}(b)} \mathrm{COMPONENT}_{out}(p),$$
where Pred(b) is the set of predecessor program blocks of b.
• COMPONENT out (b) is a set of components which are required for the execution in the end of block b. It can be computed by
$$\mathrm{COMPONENT}_{out}(b) = \mathrm{COMPONENT}_{loc}(b) \cup \big(\mathrm{COMPONENT}_{in}(b) - \mathrm{COMPONENT}_{blk}(b)\big),$$
and can be read as, "the information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement."
• INACTIVITY(b) is a set of components which are not active at block b. In fact, INACTIVITY(b) is the complementary set to COMPONENT out (b), i.e.,
$$\mathrm{INACTIVITY}(b) = \Omega - \mathrm{COMPONENT}_{out}(b),$$
where Ω is the universal set.
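The predicates above can be sketched as a small iterative solver. This is an illustrative sketch only, not the authors' implementation: it assumes that the meet over predecessors is a set union, that the "out" set follows the quoted generate-or-pass-through reading, and that INACTIVITY is the complement of the "out" set. The function name, data layout, and example blocks are hypothetical.

```python
def component_activity(blocks, preds, loc, blk, universe):
    """Iterate the forward data-flow predicates to a fixed point.

    blocks   : list of block names
    preds    : dict block -> list of predecessor blocks
    loc, blk : dict block -> set of components generated / killed in the block
    universe : set of all power-gated components (Omega)
    """
    comp_in = {b: set() for b in blocks}
    comp_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Assumed meet operator: union over all predecessors.
            new_in = set()
            for p in preds.get(b, []):
                new_in |= comp_out[p]
            # Generated within b, or entering b and not killed in b.
            new_out = loc[b] | (new_in - blk[b])
            if new_in != comp_in[b] or new_out != comp_out[b]:
                comp_in[b], comp_out[b] = new_in, new_out
                changed = True
    # Components not active at the end of each block.
    inactivity = {b: universe - comp_out[b] for b in blocks}
    return comp_in, comp_out, inactivity


# Hypothetical three-block chain: b1 starts using fp_mul, b2 releases it.
blocks = ["b1", "b2", "b3"]
preds = {"b2": ["b1"], "b3": ["b2"]}
loc = {"b1": {"fp_mul"}, "b2": set(), "b3": set()}
blk = {"b1": set(), "b2": {"fp_mul"}, "b3": set()}
universe = {"fp_alu", "fp_mul"}
comp_in, comp_out, inactivity = component_activity(blocks, preds, loc, blk, universe)
```

In this example the multiplier is reported inactive from `b2` onward, which is where a power-off instruction would become a candidate.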
Power-Gating Instruction Scheduling
With the utilization information of components, we can insert power-gating instructions into programs at the appropriate points (i.e., the beginning and the end of an inactive block) to power off and on unused components so as to reduce the leakage power. However, both shut-down and wake-up procedures are associated with an additional penalty, especially the latter due to peak voltage requirements. The following inequality represents a cost model for deciding whether the insertion of power-gating instructions will provide energy-consumption benefits:
$$E_{off}(C) + E_{on}(C) + P_{rleak}(C) \cdot \mathrm{ITVL}_{idle} \le P_{leak}(C) \cdot \mathrm{ITVL}_{idle},$$
where functions E and P return the value of energy and power consumption, respectively; E off (C) represents the energy consumption of issuing a power-off instruction for component C and E on (C) represents the energy consumption of issuing a power-on instruction for component C; P leak (C) represents the leakage power consumption of component C in a cycle; P rleak (C) represents the leakage power consumption of component C at a reduced level in a cycle; and ITVL idle is the length of the idle interval. Accordingly, we have a break-even length of idle intervals for each component C, called BE-ITVL idle C, that sustains the above inequality and it is given by
$$\text{BE-ITVL}^{C}_{idle} = \frac{E_{off}(C) + E_{on}(C)}{P_{leak}(C) - P_{rleak}(C)}.$$
Hence, the compiler must be aware that power-gating control of a certain component C is employed only when there exists a continuous idle interval whose length is greater than BE-ITVL idle C on the component. Moreover, the latency associated with powering a component on should also be considered.
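As a worked illustration of the break-even rule, the following sketch assumes the cost model takes the form E_off + E_on + P_rleak × T ≤ P_leak × T for an idle interval of T cycles. The function names are hypothetical, and the numbers are illustrative values expressed in units of one cycle of leakage energy, not measurements.

```python
def break_even_interval(e_off, e_on, p_leak, p_rleak):
    """Idle length (in cycles) at which power gating starts to pay off.

    Assumed cost model: e_off + e_on + p_rleak * T <= p_leak * T.
    """
    if p_leak <= p_rleak:
        raise ValueError("gating cannot pay off unless it reduces leakage")
    return (e_off + e_on) / (p_leak - p_rleak)


def should_gate(idle_cycles, e_off, e_on, p_leak, p_rleak):
    """Gate only when the idle interval exceeds the break-even length."""
    return idle_cycles > break_even_interval(e_off, e_on, p_leak, p_rleak)


# Illustrative values: power-off costs 2 units, power-on costs 10 units,
# leakage is 1 unit per cycle, and residual leakage is assumed to be zero.
be = break_even_interval(e_off=2.0, e_on=10.0, p_leak=1.0, p_rleak=0.0)
gate_long = should_gate(20, 2.0, 10.0, 1.0, 0.0)
gate_short = should_gate(10, 2.0, 10.0, 1.0, 0.0)
```

With these values the break-even length is 12 cycles, so a 20-cycle idle interval is worth gating while a 10-cycle one is not. The wake-up latency mentioned above is not modeled here.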
With the component-activity information gathered and the cost model for deciding whether power-gating instructions should be employed, we now consider the scheduling mechanisms for inserting power-gating instructions into given programs. As the duration of power-gating control on components is influenced by conditional branches in programs, we propose a set of scheduling policies, Basic Blk Sched, MIN Path Sched, and AVG Path Sched, for power-gating instructions. The details are given below. A naive mechanism to control the power-gating instructions sets the on and off instructions at each basic block according to the component activities gathered by the data-flow equations. We call this scheme Basic Blk Sched. Another case to consider in power gating is that of an inactive block containing conditional branches, since the lengths of the two inactive blocks that follow the branch targets may be different. For example, only one of the branches may benefit from power gating, in
SINK-N-HOIST ANALYSIS
The main idea of the Sink-N-Hoist Analysis is to use code motion techniques to abate the problem of too many instructions being added. The approach attempts to merge several power-gating instructions into one compound instruction by 'sinking' power-off instructions and 'hoisting' power-on instructions, i.e., postponing the issue of power-off instructions and advancing the issue of power-on instructions. This results in profits mainly for code size, but also for performance and energy via grouping effects. For instance, a power-off instruction can be postponed for some cycles to be merged with other adjacent power-off instructions. Nevertheless, there should be a limit on the number of cycles an instruction is sunk or hoisted, since sinking or hoisting a power-gating instruction causes more leakage dissipation. A cost model is given below to determine the feasibility. For a component C, we have
where SINK-SLK is the number of cycles a power-off statement (or instruction) is sunk, i.e., the power-off statement is delayed for SINK-SLK cycles, E fet-dec-off (C) returns the energy consumption of fetching and decoding a power-off instruction, E exe-off (C) returns the energy consumption of executing a power-off instruction, and N is the number of power-gated components. Note that the sum of E fet-dec-off (C) and E exe-off (C) is equal to E off (C). The right-hand side of the inequality represents the energy consumed when the power-off statement is delayed for SINK-SLK cycles and merged with other (N − 1) power-off statements, while the left-hand side stands for the energy consumed when the power-off statement is called right after the end of the active interval. In consequence, we have a maximum sinkable slack for each component C, called MAX-SINK-SLK C, that sustains the above inequality and it is given by
Similarly, we have a maximum hoistable slack for each component and it is given by
Figure 5: Algorithm of the Sink-N-Hoist Analysis. 1. Perform the Sinkable Analysis and Hoistable Analysis. (Equation (3)-(6)) 2. Perform the Grouping-Off Analysis and Grouping-On Analysis. (Equation (7)-(10)) 3. Perform the Power-Gating Instruction Placement.
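The slack bound can be illustrated numerically. The sketch below is one reading of the cost model described in the text, stated as an assumption: issuing the power-off immediately costs E_fet-dec-off + E_exe-off + P_rleak × S, while delaying it by S cycles and merging it with N − 1 other statements costs E_fet-dec-off / N + E_exe-off + P_leak × S. Under that reading the largest profitable S is E_fet-dec-off × (1 − 1/N) / (P_leak − P_rleak), rounded down because the slack is an integer number of cycles. The function name and the numbers are hypothetical, and the same form is assumed (not confirmed by the text) to apply to hoisting with the power-on energies.

```python
import math


def max_slack(e_fet_dec, n_merged, p_leak, p_rleak):
    """Largest delay (in cycles) for which merging still saves energy.

    Assumed model: sharing one fetch/decode among n_merged statements saves
    e_fet_dec * (1 - 1/n_merged), while each delayed cycle costs
    (p_leak - p_rleak) of extra leakage.
    """
    if n_merged < 1:
        raise ValueError("at least one statement is required")
    if p_leak <= p_rleak:
        raise ValueError("gating must reduce leakage")
    saved = e_fet_dec * (1 - 1 / n_merged)
    return math.floor(saved / (p_leak - p_rleak))


# Illustrative values in units of one cycle of leakage energy.
slack_a = max_slack(e_fet_dec=8.0, n_merged=2, p_leak=1.0, p_rleak=0.0)
slack_b = max_slack(e_fet_dec=4.0, n_merged=2, p_leak=1.0, p_rleak=0.0)
slack_alone = max_slack(e_fet_dec=8.0, n_merged=1, p_leak=1.0, p_rleak=0.0)
```

A statement that cannot be merged with any other (N = 1) gets zero slack under this model, which matches the intuition that delaying it only adds leakage.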
With this cost constraint as the basis, we now present a set of data-flow equations to collect the information for code motion of power-gating instructions. Figure 5 shows the algorithm of the Sink-N-Hoist Analysis. The whole set of equations used is presented in Figure 6. The Sink-N-Hoist Analysis mainly consists of three phases: 1) the Sinkable and Hoistable Analysis, which computes the possible positions for each power-gating instruction, 2) the Grouping-Off and Grouping-On Analysis, which groups together the power-gating instructions that can be merged, and 3) the determination of the appropriate positions for power-gating instructions. The details are discussed as follows.
Sinkable and Grouping-Off Analysis
The predicates for collecting SINKABLE and GROUP-OFF information are given as follows. SINKABLE gives the data-flow equations to collect how far the turn-off instructions of component activities can be sunk. In addition, GROUP-OFF gives the data-flow equations to partition the turn-off instructions into groups, and we can then use this information to group them by selecting emitting instructions.
• SINKABLE loc (b) is a set of power-off statements which occur within block b and can be safely moved to the end of the block. Each statement is associated with a number, named SINK-SLK b C, which keeps an integer value of slack time for component C, indicating how many cycles the power-off statement can be sunk at the current position. The initial value of SINK-SLK b C is set as MAX-SINK-SLK C.
• SINKABLE blk (b) is a set of power-off statements which cannot be safely moved from the start to the end of block b, i.e., a set of power-off statements whose value of the associated SINK-SLK b C is zero.
• SINKABLEin(b) is a set of power-off statements which can be safely moved to the beginning of block b. The SINKABLE in (b) is computed as follows:
Meanwhile, the value of SINK-SLK is decreased by one to be in accordance with the definition. In brief, the value of SINK-SLK
where MIN is a function that returns the minimum value among its parameters.
• SINKABLE out (b) is a set of power-off statements which can be safely moved to the end of block b. The SINKABLE out (b) is computed as follows:
We now give the data-flow equations for GROUP-OFF. They partition the turn-off instructions into groups, and we can then use this information to group them by selecting emitting instructions.
• GROUP-OFF loc (b) is a set with at most one element, i.e., a singleton or an empty set, in which the element (if it exists) is an integer representing a group number and never appears in other sets of GROUP-OFF loc. The block b belongs to the group it numbered and is the beginning block of a set of successive blocks if the GROUP-OFF loc (b) is not empty. The GROUP-OFF loc (b) set is not empty only when
A simple way to ensure that all the numbers in the sets of GROUP-OFF loc of all blocks are unique is to use an integer counter and assign each element the value of the counter. Once an element is assigned, the counter increases.
• GROUP-OFF blk (b) is a universal set of integers, namely Ω, or an empty set. The set is not empty (i.e., it is set to Ω) only when
In all other cases, it will be an empty set.
• GROUP-OFF in (b) is an integer singleton, a group number, which can be assigned to the start of block b or an empty set. The GROUP-OFFin(b) is computed by
where Φ returns the value of the element of its parameter and returns infinity if the parameter is an empty set. In addition, all the GROUP-OFFout set in the same group of its predecessors can be replaced by the GROUP-OFF in (b) if the GROUP-OFF out set of the predecessor of b is not empty. This will allow opportunity for further grouping effects.
• GROUP-OFF out (b) is an integer singleton, a group number, which can be assigned to the end of block b or an empty set. In fact, the element in GROUP-OFFout(b) gives the group number that block b belongs to. The GROUP-OFF out (b) is computed by
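The flavor of the Sinkable Analysis can be conveyed with a simplified sketch. It is not the paper's equation set: it assumes an acyclic control-flow graph visited in topological order, one cycle per block, a MIN merge of slack values over predecessors, a decrement of one per block crossed, and that a statement whose slack has reached zero is blocked. All names and the example chain are hypothetical, and the output is not meant to reproduce Table 1.

```python
def sinkable_analysis(blocks, preds, loc):
    """Forward propagation of power-off statements and their remaining slack.

    blocks : block names in topological order (acyclic CFG assumed)
    preds  : dict block -> list of predecessor blocks
    loc    : dict block -> {component: maximum sink slack} for power-off
             statements that originate in the block
    Returns dict block -> {component: remaining slack at the end of block}.
    """
    sink_out = {}
    for b in blocks:
        sink_in = {}
        for p in preds.get(b, []):
            for comp, slack in sink_out[p].items():
                # Assumed merge rule: keep the minimum slack over predecessors.
                sink_in[comp] = min(slack, sink_in.get(comp, slack))
        out = {}
        for comp, slack in sink_in.items():
            # A statement with zero slack is blocked and cannot cross b.
            if slack > 0:
                out[comp] = slack - 1  # crossing b costs one cycle (assumed)
        # Statements originating in b start with their full slack.
        out.update(loc.get(b, {}))
        sink_out[b] = out
    return sink_out


# Hypothetical chain b2 -> b3 -> ... -> b7: power-off A (slack 4) originates
# in b2 and power-off B (slack 2) originates in b5.
blocks = ["b2", "b3", "b4", "b5", "b6", "b7"]
preds = {"b3": ["b2"], "b4": ["b3"], "b5": ["b4"], "b6": ["b5"], "b7": ["b6"]}
loc = {"b2": {"A": 4}, "b5": {"B": 2}}
sink_out = sinkable_analysis(blocks, preds, loc)
# Blocks where two or more statements coexist are merge candidates.
merge_blocks = [b for b in blocks if len(sink_out[b]) >= 2]
```

In this chain both statements are available at the end of `b5` and `b6`, so either block could host a single compound power-off instruction.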
In the following, we give a running example to illustrate how the analysis works. Suppose that two components, namely A and B, are considered for analysis. Given a control-flow graph as shown in Figure 7 (a), where each block in the graph contains only one statement, we can determine where power-gating statements should be located by performing steps (i), (ii), (iii), and (v) in Figure 3, which include the Component-Activity Data-Flow Analysis and power-gating instruction scheduling. In this example, it is found that component A should be powered off at B m+2 and B n+2, and component B at B m+5 and B n+5. The shadowed blocks represent that components are in use (the left half is for component A and the right half is for component B). To reduce the number of power-gating instructions issued, we then apply the Sinkable Analysis. By the definition of SINKABLE loc (b), a set of power-off statements which occur within block b, we have SINKABLE loc (B m+2 ) = {PowerOff A(4)}, SINKABLE loc (B m+5 ) = {PowerOff B(2)}, SINKABLE loc (B n+2 ) = {PowerOff A(4)}, and SINKABLE loc (B n+5 ) = {PowerOff B(2)}, where the numbers in parentheses indicate the value of the associated SINK-SLK C (in fact, the values come from MAX-SINK-SLK A and MAX-SINK-SLK B ), and the SINKABLE loc sets of the other blocks are empty. To simplify the representation, the word 'PowerOff' is removed and the value of the associated SINK-SLK C is superscripted, e.g., SINKABLE loc (B m+2 ) = {A^4}. Table 1 shows the computation results of SINKABLE blk (b), SINKABLE in (b), and SINKABLE out (b) for each block. Actually, SINKABLE out (b) indicates the set of power-off statements that can be issued at block b without energy penalties if the statements could be merged with other statements. In other
Hoistable and Grouping-On Analysis
The Hoistable Analysis and Grouping-On Analysis are similar to the Sinkable Analysis and Grouping-Off Analysis, but the Hoistable Analysis is a backward data-flow analysis. Similarly, we can define a set of predicates for collecting HOISTABLE and GROUP-ON information as follows.
• HOISTABLE loc (b) is a set of power-on statements which occur within block b and can be safely moved to the start of the block. Each statement is associated with a number, named HOIST-SLK b C.
• HOISTABLE blk (b) is a set of power-on statements which cannot be safely moved from the end to the start of block b, i.e., the set of power-on statements whose value of the associated HOIST-SLK b C is zero.
• HOISTABLE out (b) is a set of power-on statements which can be safely moved to the end of block b. The HOISTABLE out (b) is computed as follows:
Meanwhile, the value of HOIST-SLK b C would be the minimum one among the successors of b if the values of HOIST-SLK s C are inconsistent with each other, where s is a successor of block b. It means that the hoistable slack from one successor would be shrunk if other successors have a smaller hoistable slack. This is for the consideration that a power-on statement should not be hoisted so far that it causes a reverse effect. Moreover, the value of each HOIST-SLK b C is decreased by one to be in accordance with the definition. In brief, the value of HOIST-SLK b C is given by
Table 2: GROUP-OFF predicates.
• HOISTABLE in (b) is a set of power-on statements which can be safely moved to the start of block b. The HOISTABLE in (b) is computed as follows:
Meanwhile, the value of HOIST-SLK b C is given from the value of the associated HOIST-SLK b C in HOISTABLE loc (b) if there exists a power-on-C statement in HOISTABLE loc (b); otherwise, it is given from the one in HOISTABLE out (b).
• GROUP-ON loc (b) is a set with at most one element, i.e., a singleton or an empty set, in which the element (if it exists) is an integer representing a group number and never appears in other sets of GROUP-ON loc. The block b belongs to the group it numbered and is the beginning block of a set of successive blocks if the GROUP-ON loc (b) is not empty. The GROUP-ON loc (b) set is not empty only when
A simple way to ensure that all the numbers in the sets of GROUP-ON loc of all blocks are unique is to use an integer counter and assign each element the value of the counter. Once an element is assigned, the counter increases.
• GROUP-ON blk (b) is a universal set of integers, namely Ω, or an empty set. The block b is one, or the only one, of the end blocks of a set of successive blocks if the GROUP-ON blk (b) is not empty. The set is not empty only when
HOISTABLEin(s) = ∅.
• GROUP-ON in (b) is an integer singleton, a group number, which can be assigned to the start of block b or an empty set. The GROUP-ON in (b) is computed by
where Φ returns the value of the element of its parameter and returns infinity if the parameter is an empty set. In addition, all the GROUP-ON out sets of its predecessors can be replaced by GROUP-ON in (b) if the GROUP-ON out set of the predecessor of b is not empty. Note that this gives further opportunity for grouping effects.
• GROUP-ONout(b) is an integer singleton, a group number, which can be assigned to the end of block b or an empty set. In fact, the element in GROUP-ON out (b) gives the group number that block b belongs to. The GROUP-ONout(b) is computed by
Power-Gating Instruction Placement
With the SINKABLE out, HOISTABLE in, and GROUP-OFF/ON out information collected in Sections 4.1 and 4.2, we then determine how to place power-gating instructions, i.e., whether power-gating instructions should be combined together or issued separately. Figure 8 gives a brief algorithm of the power-gating instruction placement. The basic idea of the algorithm is to place power-gating instructions in a group-by-group manner. It first determines all possible policies for issuing power-gating instructions; a legal policy is one in which all power-gating instructions are issued at blocks b where SINKABLE out (b) or HOISTABLE in (b) is not empty and each type of power-gating instruction appearing within a group is issued exactly once. Next, it uses an energy-cost model, which describes the energy, such as the leakage energy, the energy of issuing power-off instructions, etc., to determine which policy has the best benefit in terms of energy consumption.
Figure 8: Power-Gating Instruction Placements (algorithm listing; its final steps determine which policy consumes the least power, best_policy = get_best_policy(policy_list), and annotate the positions of power-gating instructions, make_annotation(best_policy)).
In the following, we elaborate the idea by continuing the example in Section 4.1. With the information of SINKABLE out and GROUP-OFF out, an energy-cost model is established and evaluated for each policy of issuing power-off instructions, under the guideline that power-off instructions must be issued at a block in which SINKABLE out is not empty and each type of power-gating instruction appearing within a group must be issued exactly once, e.g., the policy could be 'powering off A at B m+2 and powering off B at B m+5' or 'powering off A and B at B m+2' in group number one. The final decision of which policy to take depends on the energy cost evaluated by the model; the one with the minimum cost is chosen for low-power consideration. Finally, power-off instructions are inserted at the appropriate points as shown in Figure 7 (b): the power-off statements within each group are merged.
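The policy selection described above can be sketched as a brute-force search. This is a minimal illustration under an assumed cost model, not the energy model used in the paper: every component's power-off is issued exactly once at one of its candidate blocks, statements issued at the same block share a single fetch/decode cost, and each cycle of delay costs the difference between full and reduced leakage. The function name, candidate sets, and energy values are hypothetical.

```python
from itertools import product


def best_policy(candidates, e_fet_dec, e_exe, p_leak, p_rleak=0.0):
    """Pick the cheapest way to issue each power-off statement exactly once.

    candidates : dict component -> {block: cycles the statement is delayed}
    Statements issued at the same block are merged into one compound
    instruction and share one fetch/decode cost (assumed cost model).
    """
    comps = sorted(candidates)
    best, best_cost = None, float("inf")
    for choice in product(*(sorted(candidates[c]) for c in comps)):
        policy = dict(zip(comps, choice))
        # One fetch/decode per distinct issue block, one execution per component.
        cost = e_fet_dec * len(set(choice)) + e_exe * len(comps)
        # Extra leakage paid while each statement is delayed.
        cost += sum((p_leak - p_rleak) * candidates[c][policy[c]] for c in comps)
        if cost < best_cost:
            best, best_cost = policy, cost
    return best, best_cost


# Hypothetical candidates: A may be issued at b2..b5, B at b5 or b6.
candidates = {
    "A": {"b2": 0, "b3": 1, "b4": 2, "b5": 3},
    "B": {"b5": 0, "b6": 1},
}
merged, merged_cost = best_policy(candidates, e_fet_dec=8, e_exe=2, p_leak=1)
separate, separate_cost = best_policy(candidates, e_fet_dec=2, e_exe=2, p_leak=1)
```

When fetch/decode is expensive relative to leakage, the search merges both power-offs at `b5`; when it is cheap, issuing them separately at their original blocks wins. Exhaustive enumeration is used only for clarity and would need pruning for groups with many components.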
EXPERIMENTAL RESULTS
We use a DEC-Alpha-compatible architecture with the power-gating control and instruction set described in Figure 2 as the target architecture for our experiments. The proposed leakage-power reduction framework is incorporated into a compiler tool built with SUIF [16] and MachSUIF [15], and evaluated with the Wattch simulator using 0.10 µm process parameters and a 1.9 V supply voltage [2]. Figure 9 illustrates the phases in the compilation and simulation framework. We incorporate the low-power optimization phase after the MachSUIF phase. As Wattch does not model leakage at the component level per se, we assume that leakage power contributes 10% of total power consumption. Furthermore, we assume that wake-up operations of power-gating control take a 20-cycle latency, although 7.5 cycles are reported in [3], and that powering off and powering on a component take two times and ten times the leakage energy per cycle, respectively. The energy consumption of fetching and decoding a power-gating instruction is assumed to be two times the leakage power. The baseline data is provided by Wattch's cc3 clock-gating power estimation, which gates the clocks of unused resources in multiported hardware to reduce dynamic power; however, leakage power is still consumed. The benchmarks used in our experiment are from the floating-point version of the DSP-stone benchmark suite [26]. Three versions are compared. The base version is the one without the power-gating mechanism. The original version is the one from previous work [23, 24] that only performs steps (i), (ii), and (iii) in Figure 3. The Sink-N-Hoist Analysis scheme is the one proposed in this work, which performs all phases in Figure 3. In addition, three policies for power-gating instruction scheduling were proposed in step (iii) of Figure 3 to deal with conditional branches in programs. Without loss of generality, we use the MIN Path Sched policy to schedule power-gating instructions in this experiment.
Figures 10-12 give the compilation and simulation results of two approaches, the original one and the Sink-N-Hoist one, when the integer ALU, floating-point adder, and floating-point multiplier are considered for power gating; the comparison baseline in these figures is the one without power-gating control. Figure 10 presents the ratio of power-gating instructions to total instructions in program code. It shows that the Sink-N-Hoist approach achieves about a 31.2% reduction (from 17.0% to 11.8%) in the number of power-gating instructions generated compared to the one without the Sink-N-Hoist framework. Moreover, our scheme also further reduces the total energy consumption compared to the one without the Sink-N-Hoist framework, because a block version of power-gating instructions gives better power and performance effects than the pointwise version. Figure 11 shows that our scheme improves the average energy reduction by 18.0% (from 7.2% to 8.5% of total power) compared to the base method. Note that the average reduction of total energy is less than 10%, but we should recall that only three types of functional units (the integer ALU, floating-point adder, and floating-point multiplier) are under power-gating control in this experiment. In fact, the base method already achieved average energy reductions of 70.4% and 72.6% for the floating-point adder and floating-point multiplier, respectively, in combined dynamic and leakage power [23, 24]. Figure 11 also shows that our scheme holds an edge over the original scheme in energy reduction, because a block version of power-gating instructions gives better power effects than the pointwise version, as illustrated in our cost model. Finally, Figure 12 shows the detailed performance impact caused by power-gating mechanisms: the performance degradation is reduced by about 2.9% (from 2.01% to 1.95%) relative to the original method.
Our method holds a small edge over the one without the Sink-N-Hoist framework due to the reduction in the number of power-gating instructions. Note that the performance penalty is not as large as the number of instructions added would suggest, because most instructions were added outside the loop kernel. Nevertheless, the reduction in the number of power-gating instructions still gives a performance edge.
CONCLUSION
In this paper, we presented a Sink-N-Hoist Analysis for merging several power-gating instructions. In summary, our experiment shows that the Sink-N-Hoist Analysis framework results in benefits for code size as well as energy consumption and performance. As compiler phases are applied one after another, our framework gives a sound theoretical foundation capable of working with other phases, such as adding more slack for low power with code motion. Finally, we are in the process of incorporating more components, such as crypto modules, into our architecture and simulator. We expect the effects of our scheme to be even more important as more extensible modules are equipped with power-gating control in this platform.
Figure 12: Performance degradation (Original vs. Sink-N-Hoist).
