Controlling value based fine-grained power gating with sleep signal optimization by Chen Lei
Controlling Value Based Fine-Grained Power
Gating with Sleep Signal Optimization
CHEN, Lei
Graduate School of Information, Production and Systems
Waseda University
July 2010

Abstract
Power gating has been shown to reduce leakage current effectively by cutting idle logic
blocks off from their power supplies. However, traditional power gating strategies require
power management units to identify the idle periods of the target logic blocks and to
generate the corresponding control signals. Moreover, to reduce both leakage power and
dynamic power, a successful VLSI design usually has to combine power gating with other
dynamic power reduction methods such as clock gating. In this dissertation, a novel power
gating method based on the controlling values of CMOS logic elements is proposed in order
to reduce both dynamic and leakage energy in active mode. Furthermore, the proposed
strategy achieves autonomous power gating, which means that neither an extra power
management unit nor signal isolation cells are required. Therefore, this controlling value
based power gating strategy is expected to be area-efficient and easy to implement.
According to early-stage estimates on standard benchmark circuits, over 50% of logic
gates can be power-gated.
Since the proposed power gating strategy is based on the controlling property of logic
elements, its effectiveness depends on the probability of taking the controlling value and
also the number of gates controlled by the sleep signals. The product of the probability of
taking the controlling value and the number of gates is called the “expected number of sleep
gates” and is used in the evaluation of the effect of the proposed method. Several heuristic
algorithms are proposed and studied in this dissertation in order to achieve a considerably
high power reduction by properly selecting sleep signals and power clusters.
One of the problems of the proposed method is that the structure of the circuit is
changed, and thus the critical path becomes longer. In this dissertation, a steady maximum
depth constraint is added to the proposed algorithms in order to prevent the depth of the
critical path from increasing. Although the steady maximum depth constraint trades off
against the expected number of sleep gates, we show that the energy saving is still
considerable. The effect of the steady maximum depth constraint is also checked
experimentally using benchmark circuits, which shows that with a 20% to 30% tolerance of
maximum depth increase, the energy saving is almost the same as that without any maximum
depth constraint.
ISCAS’85 benchmark circuits are used to measure the power saving of all implementation
algorithms developed in this dissertation. The power gating circuits are implemented
by using VDEC Rohm 0.18µm process technology and HSPICE simulations are performed.
According to the experimental results, all of these algorithms are capable of achieving more
than 13% of total power reduction with little performance penalty. In particular, the pN-based
algorithm, which considers both gate count and switching activity, is shown to reduce
total power consumption by 20% on average.
The sleep transistor sizing is also addressed in this dissertation. With the capability of
reducing both dynamic power and leakage power, the sizing issue of the proposed method is
quite different from the traditional power gating methods. Since the delay overhead caused
by the sleep transistors is not serious, the major parameters of sleep transistor sizing are the
overall power saving and the switching power of sleep transistors. By performing post-layout
simulations, the size of sleep transistor corresponding to the minimum power consumption
is investigated. As a result, in the proposed power gating method, the sleep transistor can
be sized almost the same as the transistors used in the original circuit. Therefore, even
though the controlling value based power gating is usually implemented in a fine-grained
fashion, the area penalty caused by the sleep transistors is small. For a certain benchmark
circuit, the proposed method can reduce total power by more than 26% with less than 10%
area penalty.
Contents
Abstract i
Contents iii
List of Figures vii
List of Tables ix
1 Introduction 1
1.1 Power Dissipation of CMOS LSI . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overviews of Power Reduction Methods . . . . . . . . . . . . . . . . 4
1.2.1 Dynamic Voltage and Frequency Scaling . . . . . . . . . . . . 5
1.2.2 Clock Gating . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 Guarded Evaluation / Operand Isolation . . . . . . . . . . . . 8
1.2.4 Power Gating . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Contributions of the Dissertation . . . . . . . . . . . . . . . . . . . . 12
1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . 14
2 Preliminaries 15
2.1 Controlling Values of Logic Elements . . . . . . . . . . . . . . . . . . 15
2.2 Binary Decision Diagram . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Fine-Grain Power Gating Based on Controlling Values of Logic Gates 23
3.1 Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Properties of Controlling Value Based Power Gating . . . . . . . . . . 27
3.2.1 Dynamic Power Reduction . . . . . . . . . . . . . . . . . . . . 27
3.2.2 Control Granularity and Area Overhead . . . . . . . . . . . . 30
3.2.3 Delay Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Basic Algorithm and Efficiency Estimation of Fine-Grain Power Gating 36
3.3.1 Selecting Sleep Signals and Sleep Gates . . . . . . . . . . . . . 36
3.3.2 A Basic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.3 CV-Based Optimum Power Gating for AND/OR Tree Circuits 46
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4 Heuristic Optimization of Expected Number of Sleep Gates in Con-
trolling Value Based Power Gating 51
4.1 Steady Maximum Depth Constraint . . . . . . . . . . . . . . . . . . . 52
4.2 Important Factors in CV-based Power Gating . . . . . . . . . . . . . 54
4.3 Heuristic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.1 N-based Heuristic Algorithm . . . . . . . . . . . . . . . . . . . 59
4.3.2 Probability-First Heuristic Algorithm . . . . . . . . . . . . . . 69
4.3.3 pN -Based Heuristic Algorithm . . . . . . . . . . . . . . . . . . 74
4.4 Trade-offs between Delay and Power Saving . . . . . . . . . . . . . . 81
4.5 HSPICE Simulation Results and Comparisons . . . . . . . . . . . . . 82
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5 The Sizing of Sleep Transistors in CV-based Power Gating 87
5.1 Switching Power of Sleep Transistors . . . . . . . . . . . . . . . . . . 88
5.2 Voltage of Virtual Power Supply . . . . . . . . . . . . . . . . . . . . . 92
5.3 Experimental Results on the Size of Sleep Transistor in CV-based
Power Gating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6 Conclusions and Future Work 101
6.1 Controlling Value based Power Gating Method . . . . . . . . . . . . . 101
6.2 Heuristic Optimization in CV-Based Fine-Grained Power Gating . . . 102
6.3 Sizing of sleep transistors in CV-based power gating . . . . . . . . . . 104
6.4 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Acknowledgment 107
Bibliography 109
List of Publications 118
List of Figures
1-1 The Trend of Dynamic Power and Leakage Power Dissipation.[77] . . 2
1-2 The Relation Between Delay and Power Supply Voltage[81]. . . . . . 4
1-3 The Basic Architecture of Clock Gating. . . . . . . . . . . . . . . . . 7
1-4 The Two Typical Styles of Clock Gating. . . . . . . . . . . . . . . . . 7
1-5 The basic structure of guarded evaluation[79] and operand isolation[80]. 8
1-6 Basic Architecture of Power Gating. . . . . . . . . . . . . . . . . . . . 10
1-7 Power Gating by Using Enable Signals of Clock Gating. . . . . . . . . 11
2-1 Two Types of Universal Logic Gates, NAND and NOR. . . . . . . . . 16
2-2 An Example of Binary Decision Diagram. . . . . . . . . . . . . . . . . 18
3-1 An example of controlling value based power gating. . . . . . . . . . . 24
3-2 Controlling value based power gating by using a pMOS sleep transistor. 25
3-3 A sample circuit to show the dynamic power reduction in CV-based
power gating method. . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3-4 Dynamic power reduction in CV-based power gating method. . . . . . 30
3-5 An isolation cell implemented by using an AND gate. . . . . . . . . . 31
3-6 Critical path change caused by sleep signal assignment. . . . . . . . . 33
3-7 A possible way to handle gates with fanout. . . . . . . . . . . . . . . 38
3-8 Improving energy saving by changing the structure of circuit. . . . . . 39
3-9 Application example of the basic algorithm. . . . . . . . . . . . . . . 42
3-10 Pseudo code of the basic algorithm . . . . . . . . . . . . . . . . . . . 43
3-11 Samples of AND tree circuit. . . . . . . . . . . . . . . . . . . . . . . . 46
3-12 The CV-based power gating on AND tree circuit. . . . . . . . . . . . 47
4-1 Calculating the length of the new path when assigning signal i to con-
trol the gate generating signal j. . . . . . . . . . . . . . . . . . . . . . 53
4-2 Counterexample of optimizing power reduction by adopting the best
sleep signal for each gate. . . . . . . . . . . . . . . . . . . . . . . . . . 56
4-3 An example of controlling logic gates with fanout outputs. . . . . . . 59
4-4 Pseudo code of N-based heuristic algorithm . . . . . . . . . . . . . . . 60
4-5 Pseudo code of sub-functions used in N-based heuristic algorithm . . 63
4-6 Multiple power domains can be controlled by one signal. . . . . . . . 65
4-7 Pseudo code of sub-functions used in N-based heuristic algorithm . . 66
4-8 Pseudo code of probability-first heuristic algorithm . . . . . . . . . . 70
4-9 Pseudo code of the sub-function CV prob used in probability-first heuris-
tic algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4-10 Pseudo code of pN -based heuristic algorithm . . . . . . . . . . . . . . 76
4-11 An example of controlling logic gates with fanout outputs. . . . . . . 77
4-12 Pseudo code of pN -based heuristic algorithm . . . . . . . . . . . . . . 78
4-13 Trade-off between performance and power saving in the CV-based
power gating. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5-1 A sample circuit used in the experiments for sleep transistor sizing. . 89
5-2 The relation between the width and switching power of sleep transistors. 90
5-3 The voltage fluctuation of internal signal during sleep time. . . . . . . 93
5-4 The comparison of internal signals and virtual ground by using different
sized sleep transistor. . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5-5 The relation between the size of sleep transistor and power consumption. 97
5-6 An AND gate with a sleep transistor integrated. . . . . . . . . . . . . 98
List of Tables
3.1 The propagation delay of the circuit shown in Figure 3-3 . . . . . . . . 35
3.2 The maximum number of sleep gates in the benchmarks . . . . . . . 40
3.3 Application results of the basic algorithm . . . . . . . . . . . . . . . . 45
3.4 Controlling optimization for AND tree circuits. . . . . . . . . . . . . 48
4.1 The estimated power saving results obtained by using N-based algo-
rithm (with steady maximum depth constraint) . . . . . . . . . . . . 68
4.2 The estimation results of probability-first algorithms (with steady max-
imum depth constraint) . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3 The estimation result of probability-first algorithm without steady
maximum depth constraint. . . . . . . . . . . . . . . . . . . . . . . . 73
4.4 Estimation results of pN -based algorithms (with steady maximum depth
constraint) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.5 HSpice simulation results (With steady maximum depth constraint) . 83
5.1 RMS Power Consumption versus the Width of Sleep Transistor . . . 96
Chapter 1
Introduction
1.1 Power Dissipation of CMOS LSI
Designing devices that consume less energy has always been a major task faced by
all microelectronic designers. For mobile devices, reducing power consumption does
not only lead to longer battery life, but also results in better reliability. CMOS
technology has been widely used in today’s LSI design since CMOS circuits do not
dissipate power unless they are switching. On the other hand, as process technology
scales into the deep-sub-micrometer range, threshold voltages of p and n MOSFETs
are forced to be reduced with power supply voltage in order to maintain adequate
performance and noise margins[1]-[15]. The reduction of threshold voltage leads to a
dramatic increase of leakage current in CMOS circuits, which is the main source of
static power consumption and erodes the energy saving obtained by reducing the
power supply voltage. Because leakage current grows exponentially, the leakage power
consumption is expected to match or even exceed the dynamic power consumption in the
near future [16]-[24] (Figure 1-1).
In this dissertation, a novel gate-level power reduction technique is proposed,
which can reduce both dynamic and leakage power dissipation. In order to describe
the feature and capability of the proposed technique more clearly, the two compo-
nents of power dissipation, dynamic power and leakage power, are briefly introduced
together with overviews of current power reduction methods.
Figure 1-1: The Trend of Dynamic Power and Leakage Power Dissipation.[77]
In CMOS logic circuits, there are three major sources of power dissipation, the
switching power, short-circuit power and leakage power. Equation 1.1 shows the total
power which is the summation of these three components.
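$$P_{total} = P_{switching} + P_{leakage} + P_{short\text{-}circuit} \tag{1.1}$$
$$P_{switching} = \alpha \times C_L \times V_{DD}^2 \times f_{clock} \tag{1.2}$$
$$P_{leakage} = V_{DD} \times I_{leakage} \tag{1.3}$$
$$P_{short\text{-}circuit} = V_{DD} \times I_{short\text{-}circuit} \tag{1.4}$$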
Equation 1.2 presents the switching power, where $C_L$ is the load capacitance, $V_{DD}$
and $f_{clock}$ are the supply voltage and clock frequency, respectively, and $\alpha$ is the
switching activity factor, which measures the probability of an output transition (the
switching probability). The switching power is also referred to as dynamic power. The second
power component, leakage power, is presented by Equation 1.3, where $I_{leakage}$ is the
leakage current, which can be further divided into two types: reverse-bias diode leakage
on the drains of transistors and sub-threshold leakage. The former, diode
leakage, can be calculated approximately by Equation 1.5, where $A_D$ is the area of
the drain diffusion while $J_S$ represents the leakage current density. In current
process technologies, the diode leakage is typically a small fraction of the total power
dissipation in most VLSI chips. Although it could be significant for a circuit that spends
a large amount of time idle, diode leakage is not considered in the further discussion
of this dissertation.
$$I_{diode\,leakage} = A_D \times J_S \tag{1.5}$$
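As a quick numerical illustration of Equations 1.1 to 1.4, the following minimal sketch evaluates the three power components. All numeric values are hypothetical and chosen only to exercise the formulas; they are not taken from the dissertation's experiments.

```python
def switching_power(alpha, c_load, v_dd, f_clock):
    """Equation 1.2: alpha * C_L * V_DD^2 * f_clock, in watts."""
    return alpha * c_load * v_dd ** 2 * f_clock


def leakage_power(v_dd, i_leakage):
    """Equation 1.3: V_DD * I_leakage, in watts."""
    return v_dd * i_leakage


def short_circuit_power(v_dd, i_short_circuit):
    """Equation 1.4: V_DD * I_short-circuit, in watts."""
    return v_dd * i_short_circuit


def total_power(alpha, c_load, v_dd, f_clock, i_leakage, i_short_circuit):
    """Equation 1.1: sum of the three components."""
    return (switching_power(alpha, c_load, v_dd, f_clock)
            + leakage_power(v_dd, i_leakage)
            + short_circuit_power(v_dd, i_short_circuit))


# Hypothetical operating point (assumed values, not measured data).
ALPHA, C_LOAD, F_CLOCK = 0.1, 1e-9, 100e6   # activity factor, farads, hertz
I_LEAK, I_SC = 1e-3, 0.5e-3                 # amperes

p_high = switching_power(ALPHA, C_LOAD, 1.8, F_CLOCK)
p_low = switching_power(ALPHA, C_LOAD, 0.9, F_CLOCK)
p_total = total_power(ALPHA, C_LOAD, 1.8, F_CLOCK, I_LEAK, I_SC)

print(f"switching power at 1.8 V: {p_high:.4f} W")
print(f"switching power at 0.9 V: {p_low:.4f} W")
print(f"ratio between the two:    {p_high / p_low:.1f}")
print(f"total power at 1.8 V:     {p_total:.4f} W")
```

Because the switching term is quadratic in the supply voltage, halving the voltage in this sketch divides the switching power by four, while the two current-based terms scale only linearly.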
Sub-threshold leakage, on the other hand, is a much larger component of the total
power dissipation than the diode leakage due to its exponential characteristic,
which is shown in Equation 1.6.
$$I_{sub} = K \times e^{\frac{V_{gs}-V_t}{n \times V_T}} \left\{1 - e^{-\frac{V_{ds}}{V_T}}\right\} \tag{1.6}$$
where $V_{gs}$ is the gate-to-source voltage, $V_t$ is the threshold voltage, $K$ is a constant
factor that is influenced only by the process technology, $n$ is the sub-threshold slope
factor, $V_T$ is the thermal voltage, which can be formulated as $k \times T/q$, where $k$ is
the Boltzmann constant, $T$ is the absolute temperature in kelvin and $q$ is the elementary
charge, and finally $V_{ds}$ is the drain-to-source voltage. Assuming $V_{ds}$ is much
larger than $V_T$, which makes $\{1 - e^{-V_{ds}/V_T}\}$ approximately equal to 1, it is clear that
the sub-threshold leakage current $I_{sub}$ increases exponentially as the threshold voltage $V_t$
decreases.
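The exponential dependence in Equation 1.6 can be made concrete with a minimal sketch. The thermal voltage is computed as Boltzmann constant times absolute temperature divided by the elementary charge; the process factor `k_factor`, the slope factor `n = 1.5` and the voltages below are assumed values for illustration only.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage V_T = k*T/q, in volts."""
    return BOLTZMANN * temp_kelvin / ELEMENTARY_CHARGE


def subthreshold_current(v_gs, v_t, v_ds, k_factor=1.0, n=1.5, temp_kelvin=300.0):
    """Equation 1.6 with an assumed process factor and slope factor."""
    v_th = thermal_voltage(temp_kelvin)
    return (k_factor
            * math.exp((v_gs - v_t) / (n * v_th))
            * (1.0 - math.exp(-v_ds / v_th)))


# Leakage of an OFF transistor (V_gs = 0) for two hypothetical thresholds.
i_high_vt = subthreshold_current(v_gs=0.0, v_t=0.4, v_ds=1.0)
i_low_vt = subthreshold_current(v_gs=0.0, v_t=0.3, v_ds=1.0)
leak_ratio = i_low_vt / i_high_vt

print(f"thermal voltage at 300 K: {thermal_voltage(300.0) * 1e3:.2f} mV")
print(f"leakage increase for a 0.1 V lower threshold: x{leak_ratio:.1f}")
```

Under these assumptions, lowering the threshold by only 0.1 V multiplies the sub-threshold current by more than an order of magnitude, which is the effect the text describes.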
The above explanations of the power dissipation components give us a clearer
view of possible power reduction. For switching (dynamic) power in Equation 1.2,
reducing the supply voltage $V_{DD}$ seems to be effective because the energy
consumption is proportional to $V_{DD}^2$. This is exactly what engineers focused on during
the last decade. However, according to the delay characteristic shown in Equation
1.7, reducing $V_{DD}$ significantly increases the delay, as shown in
Figure 1-2, which describes the relation between $V_{DD}$ and delay.
$$T_d = \frac{V_{DD} \times C_L}{I} = \frac{V_{DD} \times C_L}{\mu C_{ox} \times (W/L) \times (V_{DD} - V_t)^2} \tag{1.7}$$
On the other hand, the overall throughput of an LSI system can be maintained by
Figure 1-2: The Relation Between Delay and Power Supply Voltage[81].
reducing the threshold voltage ($V_t$) of the transistors together with the power supply
voltage $V_{DD}$. For example, a circuit with $V_{DD} = 1$ V and $V_t = 0.3$ V has roughly the same
performance as a circuit with $V_{DD} = 2$ V and $V_t = 1$ V. Unfortunately, as is noted
in Equation 1.6, the leakage power becomes larger if we reduce $V_t$. Therefore, there
exists an optimum pair of $V_{DD}$ and $V_t$ for each CMOS process. Since it is hard to modify
the threshold voltage, which is usually fixed in the transistors provided by manufacturer
libraries, providing two different power supply voltages or dynamically modifying the
power supply voltage during runtime are two common strategies used to approach
the optimum $V_{DD}$ and $V_t$ pair. Further discussion of current power reduction
methods follows in the next section.
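The numerical example in this paragraph can be checked directly against Equation 1.7. The sketch below drops the constant factor made of the load capacitance, mobility, oxide capacitance and W/L, which is assumed to be identical for both circuits, and compares only the voltage-dependent part of the delay.

```python
def relative_delay(v_dd, v_t):
    """Voltage-dependent part of Equation 1.7: V_DD / (V_DD - V_t)^2.

    The common factor C_L / (mu * C_ox * (W/L)) is omitted because it is
    assumed to be the same for every circuit being compared.
    """
    return v_dd / (v_dd - v_t) ** 2


delay_low = relative_delay(v_dd=1.0, v_t=0.3)    # scaled-down circuit
delay_high = relative_delay(v_dd=2.0, v_t=1.0)   # original circuit

print(f"relative delay at 1 V / 0.3 V: {delay_low:.3f}")
print(f"relative delay at 2 V / 1 V:   {delay_high:.3f}")
```

The two values differ by about 2%, consistent with the statement that the two circuits have roughly the same performance, while the lower supply voltage cuts the switching power of Equation 1.2 to a quarter.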
1.2 Overviews of Power Reduction Methods
A lot of power reduction methods have been developed during the last decade. These
methods can be classified into two categories: one is to reduce the dynamic power
and the other is to reduce the static power. In this section, several power reduction
methods are briefly introduced, including dynamic power reduction methods such as
clock gating, Dynamic Voltage and Frequency Scaling (DVFS), guarded evaluation /
operand isolation, etc., and leakage power reduction methods such as power gating.
The purpose of this section is to explain the advantages and drawbacks of these
state-of-the-art techniques in order to clarify the requirements of future low power
technology development.
1.2.1 Dynamic Voltage and Frequency Scaling
Dynamic voltage and frequency scaling, usually abbreviated as DVFS, has been
shown to be an effective dynamic power management method[76]. The idea behind
DVFS is to provide the minimum power supply voltage and clock frequency that allow the
target circuitry to accomplish a certain task while meeting the requirement on
execution time. If a task is light, a low $V_{DD}$ and a low clock frequency suffice,
while if the task is heavy, a high clock frequency is provided. In practice,
DVFS is usually used in LSI systems that are required to execute large numbers of
different types of tasks or multi-thread processing. Currently, a number of modern
microprocessors include DVFS functionality.
Two types of DVFS have been implemented: coarse-grained and fine-grained.
The former is performed at the operating system or application level, while the
latter is usually applied to each individual block in an application or a piece of software.
If the execution time of a task at any clock frequency could be accurately predicted,
DVFS would be the perfect method to reduce dynamic power. Unfortunately, it is
nearly impossible to accomplish this. In [76], the efficiency of DVFS is measured by
running some CPU-bound programs and memory-bound programs on an XScale-based
embedded system platform. For memory-bound programs, more than 70% of
CPU energy saving has been observed with 12% performance loss. For CPU-bound
programs, 15% - 60% energy saving is shown at the cost of 5% - 20% performance
loss.
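The basic DVFS decision described above can be sketched as follows, under the idealized assumption that the cycle count of a task is known exactly (which, as noted, is rarely true in practice). The operating points and the function name `pick_operating_point` are hypothetical. For a fixed cycle count, Equation 1.2 implies that the switching energy is proportional to the square of the supply voltage, so the sketch picks the feasible point with the lowest voltage.

```python
def pick_operating_point(points, cycles, deadline_s):
    """Return the (voltage, frequency) pair that meets the deadline with the
    lowest switching energy, or None if no point is fast enough.

    Energy for a fixed cycle count is proportional to V_DD^2, since
    P = alpha * C_L * V_DD^2 * f and the execution time is cycles / f.
    """
    feasible = [(v, f) for v, f in points if cycles / f <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda point: point[0] ** 2)


# Hypothetical operating points: (supply voltage in volts, clock in hertz).
OPERATING_POINTS = [(0.9, 200e6), (1.1, 400e6), (1.3, 600e6)]

light_task = pick_operating_point(OPERATING_POINTS, cycles=1e6, deadline_s=0.01)
heavy_task = pick_operating_point(OPERATING_POINTS, cycles=3e6, deadline_s=0.01)
impossible = pick_operating_point(OPERATING_POINTS, cycles=3e6, deadline_s=0.001)

print("light task:", light_task)
print("heavy task:", heavy_task)
print("infeasible task:", impossible)
```

A real DVFS controller would also have to account for mispredicted cycle counts and for the cost of switching between operating points, neither of which is modeled here.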
The disadvantages of DVFS technology are also significant. An obvious one is
that the power supply control unit and the clock supply unit can both be quite
complex. The power consumption of the power supply unit is also a problem. Another
problem of DVFS is that the proper power supply voltage and clock frequency
for each task are hard to predict, so a complex control circuit is needed to decide them
automatically. Correspondingly, the cost of developing DVFS-embedded firmware or
even software drivers is also considerable. Furthermore, the prediction and control
of power supply voltages and clock frequencies are based on the assumption that
the important information about all possible tasks, such as workload and deadline, is
already known beforehand. Thus it is rather difficult to accurately predict the voltage
and frequency in general-purpose operating systems. Inaccurate predictions
of execution time usually lead to unexpected performance penalties. Therefore, as with
most other power reduction methods, engineers have to consider and measure the
performance-energy trade-off when applying DVFS techniques.
Currently, the requirement of minimizing area is rising in the development of
portable devices. Since highly complex control and prediction mechanisms are
essential in DVFS, it is hard to apply DVFS to portable devices, and a more compact
method is necessary.
1.2.2 Clock Gating
Clock gating is another commonly used dynamic power reduction method for
synchronous circuits, in which the clock signal is stopped by an enable signal. The clock
distribution network usually consumes more than 50% of the dynamic power of sequential
circuits, so power can be reduced by stopping the clock signal. The basic methodology is to
provide the clock signal to logic blocks or registers only when they are working, and
otherwise to disable the clock signal to prevent further switching. The typical architecture
of clock gating is shown in Figure 1-3.
Clock gating can be divided into two categories: local clock gating and global
clock gating. Local clock gating provides a clock-gating mechanism to each register in
order to reduce the power dissipation individually, while global clock gating controls
the clock signal of a whole block or a module. Clock gating is simpler than
DVFS, but it also requires some level of control logic for idle state
Figure 1-3: The Basic Architecture of Clock Gating.
Figure 1-4: The Two Typical Styles of Clock Gating.
determination and a clock-gating unit.
There are two typical clock gating styles used in LSI designs: the latch-free style
and the latch-based style. The latch-free style, as shown in Figure 1-4(a),
adopts a single AND or OR gate to perform the gating, which is easy to implement.
However, due to the difference between the arrival times of the clock signal and the enable
signal, the resulting gated-clock signal can either terminate prematurely or contain
multiple clock pulses. These glitches make latch-free clock gating almost unusable.
The latch-based clock gating style uses an extra latch and suffers no glitches. As shown
in Figure 1-4(b), an extra latch is introduced so that the value of the enable signal
is held until the clock pulse is generated completely. In this case, the clock-gating
Figure 1-5: The basic structure of guarded evaluation[79] and operand isolation[80].
control signal is only required to be stable at the rising edge of the clock signal.
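The difference between the two clock gating styles can be illustrated with a small discrete-time model. This is a behavioral sketch only, assuming AND-based gating and a latch that is transparent while the clock is low; the waveforms are hypothetical and the model ignores real gate delays.

```python
def latch_free(clk, en):
    """Latch-free style: the gated clock is simply clk AND enable."""
    return [c & e for c, e in zip(clk, en)]


def latch_based(clk, en):
    """Latch-based style: enable passes through a latch that is transparent
    while the clock is low and holds its value while the clock is high."""
    gated, held = [], 0
    for c, e in zip(clk, en):
        if c == 0:          # latch transparent: follow the enable signal
            held = e
        gated.append(c & held)
    return gated


# Hypothetical waveforms, two samples per clock phase. The enable signal
# changes in the middle of two high phases (samples 3 and 7).
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]

free_out = latch_free(clk, en)
latched_out = latch_based(clk, en)

print("clock      :", clk)
print("enable     :", en)
print("latch-free :", free_out)
print("latch-based:", latched_out)
```

In the latch-free output the first pulse is cut short and a partial pulse appears in the second high phase, whereas the latch-based output contains only complete pulses or none at all.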
Since applying clock gating does not require modifications of RTL descriptions,
it is already supported by many commercial EDA tools and libraries. The on-chip
control circuit of clock gating still takes several percent of the total chip area,
although this can be smaller than that of DVFS.
1.2.3 Guarded Evaluation / Operand Isolation
Two other commonly used low power technologies are guarded evaluation [79] and
operand isolation[80], the basic ideas of which are similar. In these two techniques,
the input data of a logic block is guarded (isolated) by using some latches or gates if
the output of the block is not used. In this way, unnecessary switching of inputs is
prevented from being propagated to the logic block and the dynamic power can be
reduced.
The basic structure of these two techniques is shown in Figure 1-5. Here
S is the select signal of the multiplexer, and S and its complement S̄ are used as the
control signals of the latches guarding the inputs of blocks F1 and F0, respectively. When S
is logic-1, the output of block F1 is selected and the inputs of block F0 are guarded by the
latches controlled by S̄. In this way, block F0 is forced to be idle and block F1
works normally. On the other hand, when S takes logic-0, block F0 is enabled and
block F1 is forced to be idle.
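The guarding behavior can be sketched with a simple model that counts how many input transitions each block actually sees. The model assumes a single one-bit input stream shared by both blocks and transparent latches controlled by the select signal and its complement; the data and select sequences are hypothetical.

```python
def guarded_inputs(data, select, active_when):
    """Input stream seen by a block behind a guarding latch.

    The latch is transparent when select == active_when and otherwise
    holds the last value that passed through.
    """
    seen, held = [], data[0]
    for d, s in zip(data, select):
        if s == active_when:
            held = d
        seen.append(held)
    return seen


def transitions(stream):
    """Number of value changes between consecutive samples."""
    return sum(a != b for a, b in zip(stream, stream[1:]))


# Hypothetical input data and multiplexer select signal.
data = [0, 1, 0, 1, 1, 0, 1, 0]
select = [1, 1, 1, 1, 0, 0, 0, 0]

f1_inputs = guarded_inputs(data, select, active_when=1)  # latch driven by S
f0_inputs = guarded_inputs(data, select, active_when=0)  # latch driven by the complement of S

unguarded_total = 2 * transitions(data)
guarded_total = transitions(f1_inputs) + transitions(f0_inputs)

print("transitions without guarding:", unguarded_total)
print("transitions with guarding:   ", guarded_total)
```

Each block only sees input activity while its own output is selected, so the total number of input transitions, and with it the switching activity inside the blocks, is reduced.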
Guarded evaluation and operand isolation are effective at reducing dynamic
power but not at reducing leakage power. The two methods
are usually applied at the function module level, such as ALUs, and it does not seem worth
implementing them in a fine-grained style, since more guarding latches
must be inserted as the granularity becomes finer and the growing power consumption
of the latches might cancel the power saving. Therefore, a similar mechanism that
can force unused logic elements to be idle at a finer granularity is desired.
1.2.4 Power Gating
Power gating, also referred to as Multi-Threshold CMOS (MTCMOS), is now the prime
leakage current reduction technology [1]-[69]. The basic idea of power gating is to cut
logic blocks that are not in use off from their power supplies by using high-threshold
transistors as switches so that the leakage current can be reduced. Figure 1-6 illustrates
the architecture of power gating technology, in which a sleep transistor is inserted
between the GND line and the logic block. During the idle time of the logic block,
the sleep transistor is turned OFF by a control signal so that the leakage current
is reduced, while during the active time, the sleep transistor is turned ON so that
the logic block is reconnected to the power supply. Commonly, the idle period and
the active period of the logic block are referred to as “sleep mode” and “active mode”
in power gating technology. Besides the nMOS transistors shown in Figure 1-6, pMOS
transistors can also be used as sleep transistors. Distinguished by their position,
nMOS sleep transistors and pMOS sleep transistors are usually referred to as footer and
header transistors, respectively.
Similar to clock gating, power gating can be implemented in either a cluster-based
(fine-grain) approach or a coarse-grain approach. In the fine-grain style, a sleep
transistor is added to each cluster or even each gate, providing individual power
management at the cost of area penalty. In this case, the sleep transistors are usually
Figure 1-6: Basic Architecture of Power Gating.
embedded as a part of the standard cells of the libraries so that they can be easily
handled by the EDA tools. Therefore, it is not necessary to change the synthesis and
layout strategies, and the approach is compatible with the normal design methodology. The
coarse-grain style, on the other hand, uses just one sleep transistor to drive multiple cells
through shared virtual power supplies. This approach has less area overhead but the
layout is somewhat more complex since the sleep transistors are part of the power supply
network rather than the standard cells.
The control of the sleep transistors can be implemented either by software drivers
or by dedicated on-chip power management units; the latter, obviously, leads to
significant area overhead. It has been reported that power gating can reduce leakage
power dissipation by over 80%[16]-[49]. Further area overhead
comes from the sleep transistors themselves, which are sized much larger
than the transistors inside the logic block in order to minimize wakeup time. During
sleep mode, the floating outputs of the power-gated logic blocks may influence
the normally ON blocks and cause short-circuit current. Therefore, extra logic modules
called isolation cells are required to temporarily force the floating outputs to be
logic 1 or 0. Furthermore, the data or state stored in memory devices needs to be
protected during sleep mode so that it can be restored when the block becomes
active again. This state retention mechanism is provided by retention registers,
Figure 1-7: Power Gating by Using Enable Signals of Clock Gating.
which are also controlled by the power management unit. The isolation cells and
retention registers further add control complexity and area overhead to the power gating
method. Therefore, a leakage power reduction method with less area penalty and
control complexity is desired.
The generation of sleep signals is critical in any power gating method. Since a
current LSI design usually adopts several different low power techniques to achieve
the maximum power reduction effect, there might already exist signals for other
low power methods that can tell whether the target logic blocks are idle or active.
In [12], a power gating method that uses the control signals of clock gating to control
the sleep transistors is proposed. Figure 1-7 shows the architecture of this power
gating method. According to the experimental results shown in [12], this method
achieves approximately 83% leakage power reduction when applied to a 32-bit RISC
embedded CPU, with an area penalty of 20%.
Although the method is referred to as a run-time power gating method with locally
extracted sleep signals, its effectiveness depends on the control signal of the clock
gating, and it is hard to improve the effectiveness since it is based on a different
method. However, this method represents the current trend in power gating methods:
simpler, more localized and compatible with other low power technologies.
Another feature of today’s power gating techniques is granularity. In the early
days of power gating, the target circuit was handled as a whole and the power
supply was cut off only when all elements of the circuit were unused. This coarse-grain
style of power gating is relatively easy to implement since the control logic is
generally not complex. Unfortunately, coarse-grain power gating loses many
opportunities to further reduce power consumption because the conditions under which
the power supply can be cut off are limited. Therefore, power gating control mechanisms
that can cut off the power supply to only part of the target circuit are desired.
Currently, circuits are clustered into many blocks called “power domains”, and a sleep
transistor is inserted into each power domain, or a sleep transistor might even be inserted
into an individual gate in order to grab every opportunity to cut off the power supply.
Reducing both dynamic (switching) power and static (leakage) power
dissipation with the existing techniques, as mentioned, leads to a considerable area
penalty due to the requirement of complex power control logic. Therefore, an
area-efficient low power method is also strongly desired, and it would be even better if
this low power method were capable of reducing both aspects of power dissipation at the
same time.
A fine-grain and area-efficient low power technique, which can be easily
implemented without complex power management and can reduce both dynamic power
and leakage power, is the objective of this dissertation.
1.3 Contributions of the Dissertation
In this dissertation, a fine-grained power gating method based on controlling values
(CV) of logic elements is proposed. It introduces a completely novel mechanism for
power gating control by utilizing the controlling properties of CMOS logic elements. Two
major features of the CV-based power gating are described:
1 The proposed CV-based controlling mechanism provides the capability of self
power control for all CMOS logic circuitry. Since all sleep signals are extracted
locally, no extra power control circuitry is required. Furthermore, the architecture
of this power gating control itself provides signal isolation, so signal isolation
cells are generally not essential in the proposed method. Therefore, the
CV-based power gating method is expected to be area-efficient compared to
the previous power gating methods.
2 The power supply of logic blocks is controlled according to the logic value of the
corresponding sleep signal. In CV-based power gating, since the sleep signals
are actually original signals of the target circuit, the power control is performed
in an “on the fly” fashion, which means the power supply can be cut off during
runtime. Thus not only static power, but also dynamic power consumption can
be effectively reduced during active mode. This feature distinguishes the
CV-based power gating from other existing power gating methods. The mechanism
for reducing the dynamic power is similar to guarded evaluation[79] and operand
isolation[80], but both the granularity and the extra resources are smaller.
Based on the CV-based controlling mechanism, we focus on maximizing the
energy saving while keeping the overhead in an acceptable range. In order to achieve
this, the power consumption with the CV-based power gating is formulated
statistically with the two major factors that influence the energy saving: the size of the
power block (N) and the CV probability of the corresponding sleep signal (p). The
possibility of achieving the optimum energy saving is then discussed. Furthermore,
several heuristic algorithms are proposed to measure the importance of the factors,
including a depth-based algorithm, a size-first algorithm and a probability-first algo-
rithm. Based on these discussions, a pN-based algorithm is proposed, which is able
to achieve near-optimum energy saving within an acceptable runtime. Estimation
shows that the proposed pN-based algorithm is expected to be 1.9 times better on
average than the other heuristics in terms of energy saving while taking only 52% of
the runtime.
Furthermore, a steady maximum depth constraint is introduced in order to prevent
the delay overhead due to the critical path change after applying the proposed power
gating method. There is a trade-off between performance and energy saving. The
efficiency of the steady maximum depth constraint is checked by simulation, which shows
that with a 20% - 30% tolerance of maximum depth increase, the energy saving is
almost the same as that without a maximum depth constraint.
According to our HSPICE simulation results, even with 0.18µm process
technology, the CV based power gating achieves a total power reduction of up to 27% on
the ISCAS’85 standard benchmarks. Since the leakage current is nearly negligible in
0.18µm process, the power reduction can be considered as dynamic power reduction
only. Therefore more energy saving is expected under deep sub-micrometer process
technologies where leakage current becomes an enormous component. For the al-
gorithm comparison, the proposed pN-based algorithm performs significantly better
than other heuristics with approximately 10% more energy saving achieved.
1.4 Organization of the Dissertation
The rest of the dissertation is organized as follows.
Chapter 2 [Preliminaries] explains several key concepts and techniques involved
in this dissertation, including controlling value of logic elements and Binary Decision
Diagram (BDD). The overview of existing power gating methods is also shown in
this chapter. Chapter 3 [Fine-Grained Power Gating Based on Controlling Values of
Logic Gates] proposes the basic idea and architecture of a fine-grained power gating
method based on the controlling value of CMOS logic elements. For future optimiza-
tion, the behavior and feature of the proposed power gating method are fully analyzed
in order to show both potential and limitation. Chapter 4 [Heuristic Optimization of
Expected Number of Sleep Gates in Controlling Value Based Power Gating] focuses
on raising the energy saving potential while keeping the overhead in an acceptable
range. A steady maximum depth constraint is introduced in order to reduce the delay
overhead due to the critical path change after applying the proposed power gating
method. Three implementation algorithms are described, all of which are able to
achieve considerably high energy saving with little performance loss. The efficiency
of the algorithms is shown by simulation results. Chapter 5 [The Sizing of Sleep
Transistors in CV-based Power Gating] discusses the problem of sleep transistor siz-
ing. By performing post-layout simulation, the optimum size of the sleep transistor is
investigated experimentally. Chapter 6 [Conclusion and Future Work] concludes this
dissertation by summing-up the CV based power gating method as well as the opti-
mizing algorithms. Furthermore, some future works such as physical implementation
and verification strategies are also addressed.
Chapter 2
Preliminaries
The important point of a successful power gating is to shut down the power supply
of a proper logic block during a proper time period. So, the clustering of power-
gated blocks and the control mechanism are the key issues. In order to make the
further explanation more clear and accurate, two preliminaries about the proposed
controlling-value-based power gating method are shown in this chapter. One is the
concept of “controlling value” (CV) and the other is Binary Decision Diagram (BDD).
By using these concepts, we will explain the main proposal of this dissertation.
2.1 Controlling Values of Logic Elements
In modern digital LSI designs, Complementary-Metal-Oxide-Semiconductor (CMOS)
technology is adopted for constructing logic circuits because of its high noise immunity
and low static power consumption. In CMOS logic circuits, basic logic functions are
performed by elements called logic gates, which are implemented using nMOS and
pMOS transistors. Due to the existence of threshold voltages in transistors, the
two logic values, true and false, can be perfectly represented electronically by the
voltage level of each port. In digital logic circuits, the true and false logic values
are usually referred to as logic-1 and logic-0.
Even the most complex combinational logic circuits in today’s digital applications
are implemented by only several types of basic logic elements. These logic elements
are called universal logic gates if they can implement any Boolean function. The 2-input
NAND and NOR are two typical universal logic gates, the truth tables of which are
shown in Figure 2-1(a) and Figure 2-1(b), respectively.
Figure 2-1: Two Types of Universal Logic Gates, NAND and NOR.
By observing the truth tables of both the NAND gate and the NOR gate, it is easy
to find that there exists a certain logic value which determines the value of the output
independently of the other inputs. For the NAND gate, the output is logic-1 if any input
takes logic-0. On the other hand, if an input takes logic-1, the output is decided by
the other inputs. So logic-0 is called the controlling value of the NAND gate and logic-1
is called the non-controlling value, since logic-0 takes control of the output value if it
is given to any input. For a NOR gate, logic-1 is the controlling value and logic-0 is
the non-controlling value.
The same property can also be found in many other widely used logic elements,
such as AND gates and OR gates. For AND gates, logic-0 is the controlling value.
For OR gates, logic-1 is the controlling value. For XOR gates, unfortunately, there is
no controlling value.
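The controlling values listed above can be checked mechanically from the truth tables. The following is a minimal Python sketch for 2-input gates; the gate dictionary and the function name `controlling_value` are illustrative assumptions and are not part of the dissertation. Because all five gates are symmetric in their inputs, it is enough to test one input.

```python
# Illustrative 2-input gate models (names are assumptions for this sketch).
GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
    "XOR": lambda a, b: a ^ b,
}


def controlling_value(gate):
    """Return the input value that fixes the output regardless of the
    other input, or None if no such value exists."""
    for value in (0, 1):
        outputs = {gate(value, other) for other in (0, 1)}
        if len(outputs) == 1:  # output no longer depends on the other input
            return value
    return None


controlling = {name: controlling_value(fn) for name, fn in GATES.items()}
print(controlling)
```

The result agrees with the text: logic-0 for AND and NAND, logic-1 for OR and NOR, and no controlling value for XOR.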
The controlling value has been utilized in many applications. For example, the
previously mentioned “Isolation Cell” is one of the applications of controlling value.
In this dissertation, controlling value (CV) is used for the proposed power gating
method. The details shall be described in the next chapter.
2.2 Binary Decision Diagram
Binary Decision Diagram (BDD) is a directed acyclic graph to represent a Boolean
function and BDD package is one of the most important tools used in this disser-
tation[78]. A binary decision diagram can be considered as a data structure for
representing a Boolean function. Any Boolean function can be described as a rooted
and directed acyclic structure, like a binary tree. A BDD for a logic function has 1
root node with no entering edge, and 2 leaf nodes with no emanating edges corre-
sponding to the constant-1 node and the constant-0 node. Non-leaf nodes (including
the root node) are called decision nodes in a BDD. Each decision node is labeled by
a Boolean variable and has two edges called the 0-edge and the 1-edge, representing
an assignment of the variable to 0 and 1.
Starting from the root node, and selecting the 0-edge or 1-edge for variable nodes
depending on the value assignment for the variables, we can reach the constant-0
node or the constant-1 node. The 0 or 1 on the constant node reached is the value of
the logic function under the variable assignment given by the edge types (0-edge or
1-edge) taken along the path. Note that each node in a BDD also represents a sub logic function.
An example of a BDD with variables A, B and C is shown in Figure 2-2. The top
node represents the Boolean function (A·B) +C, and the variable node labeled with
A also represents a logic function A·B. Note that BDD’s become different graphs if
the variable ordering is different. In Figure 2-2, we use the order C < A < B, where
“<” means upper in the graph.
In many cases, a BDD may contain isomorphic subgraphs, which correspond to the
same function. The isomorphic subgraphs of a BDD can be merged to one subgraph.
The structure of a BDD depends on the order of variables. If we use the same
variable ordering on all paths, then that is called the usage of fixed variable ordering.
Generally, a BDD after the isomorphic merge and the usage of fixed variable ordering
Figure 2-2: An Example of Binary Decision Diagram.
for all paths is called Reduced Ordered Binary Decision Diagram (ROBDD). A very
useful property of ROBDD is that, for a particular logic function, the ROBDD is
unique for a given variable ordering, which gives the ROBDD a big advantage
when it comes to functional equivalence checking. Isomorphic subgraphs can be found
in the BDD’s of many Boolean functions. If we use the same variable ordering for
such BDD’s, then we can merge the isomorphic subgraphs.
The size of a BDD depends on the ordering. The number of nodes in a BDD
might end up to be linear in the best case and be exponential in the worst case.
An example is shown in [29], where the number of nodes of function
$f(x_1, \ldots, x_{2n}) = x_1 x_2 + x_3 x_4 + \cdots + x_{2n-1} x_{2n}$ varies from $2n$ to
$2^{n+1}$ due to different orders of variables.
Unfortunately, the problem of finding the best variable ordering is NP-hard, but some
efficient heuristic algorithms have been developed to tackle the problem.
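The effect of variable ordering can be illustrated on the instance of the function above with three product terms (six variables). The sketch below does not build a BDD. It assumes the standard characterization that an ROBDD has, at each level, one decision node per distinct subfunction that actually depends on that level's variable, and it counts decision nodes only (the two constant nodes are excluded). The function names and the two orderings are assumptions made for illustration.

```python
from itertools import product


def f(x):
    """f = x1*x2 + x3*x4 + x5*x6, with x a dict from variable index to 0/1."""
    return (x[1] & x[2]) | (x[3] & x[4]) | (x[5] & x[6])


def decision_node_count(func, order):
    """Count ROBDD decision nodes of `func` under the variable `order`."""
    total = 0
    for level in range(len(order)):
        fixed, rest = order[:level], order[level:]
        half = 2 ** (len(rest) - 1)
        distinct = set()
        for head in product((0, 1), repeat=len(fixed)):
            assignment = dict(zip(fixed, head))
            # Truth table of the subfunction over the remaining variables;
            # the first half has rest[0] = 0, the second half rest[0] = 1.
            table = tuple(
                func({**assignment, **dict(zip(rest, tail))})
                for tail in product((0, 1), repeat=len(rest))
            )
            if table[:half] != table[half:]:  # depends on this level's variable
                distinct.add(table)
        total += len(distinct)
    return total


good = decision_node_count(f, [1, 2, 3, 4, 5, 6])  # paired variables adjacent
bad = decision_node_count(f, [1, 3, 5, 2, 4, 6])   # paired variables separated
print(good, bad)
```

With the paired variables adjacent the count grows linearly in the number of product terms, while separating them makes it grow exponentially, which is the behaviour described in the text.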
A BDD is capable of representing any Boolean function; in other words, any
combinational logic circuit can be described in the form of a BDD as well. For each output,
a BDD can be built to represent the logic function by applying logic operations based
on the circuit structure. For a given combinational circuit, BDD is the simplified
representation of the truth table for the circuit, thus any circuit can be represented
by a unique BDD if we decide the variable ordering. BDD has already been used
widely in symbolic model checking algorithms in verification area.
We can do logic operations on BDD’s by using a recursive procedure. For example,
if we have BDD’s for f(a, b, c) and g(a, b, c) and apply the AND operation to
the two BDD’s, then we obtain the BDD for f(a, b, c)· g(a, b, c). The procedure
starts from the roots of the two BDD’s, and if the variables of the roots are the same,
it descends along the 0-edges and the 1-edges of both. After several repetitions, we reach
the 0 or 1 constant nodes, where the logic operation can be done directly.
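The recursive procedure can be sketched as follows. This is a minimal illustration and not the BDD package used in the dissertation: nodes are plain tuples `(variable, 0-edge child, 1-edge child)`, the constants are the integers 0 and 1, the ordering a < b < c is assumed, and sharing of isomorphic subgraphs relies on Python tuple equality instead of a unique table.

```python
from functools import lru_cache

ORDER = {"a": 0, "b": 1, "c": 2}  # assumed fixed variable ordering


def make_node(var, low, high):
    """Create a decision node, skipping it when both edges agree."""
    return low if low == high else (var, low, high)


def variable(name):
    """BDD of a single variable: 0-edge to constant 0, 1-edge to constant 1."""
    return (name, 0, 1)


@lru_cache(maxsize=None)
def bdd_and(f, g):
    """Recursive AND of two BDDs that share the ordering in ORDER."""
    if f == 0 or g == 0:  # terminal cases: the operation is decided here
        return 0
    if f == 1:
        return g
    if g == 1:
        return f
    # Split on the root variable that comes first in the ordering.
    top = f[0] if ORDER[f[0]] <= ORDER[g[0]] else g[0]
    f0, f1 = (f[1], f[2]) if f[0] == top else (f, f)
    g0, g1 = (g[1], g[2]) if g[0] == top else (g, g)
    return make_node(top, bdd_and(f0, g0), bdd_and(f1, g1))


def evaluate(node, assignment):
    """Follow 0-edges and 1-edges until a constant node is reached."""
    while node not in (0, 1):
        var, low, high = node
        node = high if assignment[var] else low
    return node


ab = bdd_and(variable("a"), variable("b"))
abc = bdd_and(ab, variable("c"))
print(abc)
```

When the two roots carry different variables, the BDD whose root comes later in the ordering is passed down unchanged on both branches, which is the usual way to handle the case the text does not spell out.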
However, as the logic function of current LSI keeps becoming more and more
complex, the number of nodes of BDD can increase exponentially, which is called
BDD node explosion. The explosion of BDD nodes usually causes memory overflow,
which makes it almost impossible for BDD’s to handle large circuits. So several memory
reduction methods have been proposed including the global node sharing, localized
BDD construction, etc.
In this dissertation, the Binary Decision Diagram is constructed from the circuit
data, and is used for the computation of the controlling value probability for certain
signals in the target circuits. Note that if we can compute the probability with
other methods, we can apply our method without using BDD’s. Although we also
suffered from BDD node explosion during our research, the efficiency of BDD is
still acceptable, mainly because the number of logic gates in the benchmark circuits
used in this dissertation is not larger than 10 thousand.
For each signal in a circuit, or in other words for each BDD corresponding to the
signal, the probability of taking logic-0 or logic-1 can be defined and computed as
follows. Generally, if full input patterns or random N input patterns are applied to
a circuit, and signal v takes logic-0 for n of these patterns, then we can define the
probability for signal v to take 0 as follows.
$$P_{\text{logic-0}}(v) = \frac{n}{N} \tag{2.1}$$
The probability to take logic-1 can be defined in the same way, or as
$P_{\text{logic-1}}(v) = 1 - P_{\text{logic-0}}(v)$.
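As a small worked instance of Equation (2.1), the sketch below applies the full set of input patterns to the function (A·B) + C used as the BDD example in this chapter and counts the patterns that give logic-0. The function name `signal` and the use of exact fractions are choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product


def signal(a, b, c):
    """The example function (A·B) + C."""
    return (a & b) | c


patterns = list(product((0, 1), repeat=3))  # full input patterns
N = len(patterns)                           # number of applied patterns
n = sum(1 for a, b, c in patterns if signal(a, b, c) == 0)

p_logic0 = Fraction(n, N)   # Equation (2.1)
p_logic1 = 1 - p_logic0
print(p_logic0, p_logic1)
```

Exhaustive enumeration is only practical for a handful of inputs, which is why a more compact representation of the function is useful for larger circuits.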
Since a BDD has the information same as the truth table, it can be used to
calculate the logic-0 and logic-1 probabilities. Take the BDD in Figure 2-2 as an
example. The probability of being 0 for the constant-0-node is defined as 1.0, while
the probability of being 1 for the constant-0-node is defined as 0.0. The logic-0 and
logic-1 probabilities of the constant-1-node are defined conversely. Then by observing
the node B, it is found that B has two edges pointing to constant-0 and constant-1
respectively, therefore B should have an equal chance of taking logic-0 and logic-1, so
the logic-0 and logic-1 probabilities of node B are both 0.5. Node A, on the other hand,
has a merged left edge, which counts as two edges pointing to constant-0. The right
edge of node A points to node B, which has one edge to 0 and one edge to 1. Therefore,
the total edge count of node A is 4 and 3 of them point to constant-0, which makes the
logic-0 probability of node A 3/4 (0.75) and the logic-1 probability, correspondingly,
1/4 (0.25). The root node of this BDD, node C, has its left edge pointing to node
A and its right edge pointing to constant-1. Therefore, considering the merged edges,
node C totally contains 8 edges pointing to constant-1 and constant-0 nodes, among
which 5 edges point to constant-1 and 3 edges point to constant-0. Thus the logic-1
probability of node C is 5/8 (0.625) and the logic-0 probability is 3/8 (0.375).
In general, for a variable v, the logic-0 probability and logic-1 probability can be
calculated by using the following formulas.
$$P_{\text{logic-1}}(v) = \frac{1}{2} \times \left(P_{\text{logic-1}}(v_0) + P_{\text{logic-1}}(v_1)\right) \tag{2.2}$$
$$P_{\text{logic-0}}(v) = \frac{1}{2} \times \left(P_{\text{logic-0}}(v_0) + P_{\text{logic-0}}(v_1)\right) \tag{2.3}$$
where $v_0$ and $v_1$ are the left and right children of node v, respectively. The 1/2 in the
formulas corresponds to the value probability of the variable of the node v. We can also
compute the probabilities by using a pre-assigned value probability for each variable.
The result obtained for Figure 2-2 is based on the assumption that each variable takes
logic-0 and logic-1 with equal probability.
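The recursion of Equation (2.2) can be sketched on the BDD of Figure 2-2 as described in the text (root C, then A, then B). The tuple encoding of nodes and the optional `p_var` argument are assumptions of this sketch. In particular, weighting the two children by a per-variable probability is only one plausible reading of the remark about pre-assigned value probabilities; with the default of 1/2 it reduces exactly to Equation (2.2).

```python
from fractions import Fraction

# Nodes are (variable, left/0-edge child, right/1-edge child); 0 and 1 are constants.
B = ("B", 0, 1)
A = ("A", 0, B)   # left edge of A goes directly to constant-0
C = ("C", A, 1)   # right edge of C goes directly to constant-1

HALF = Fraction(1, 2)


def p_logic1(node, p_var=None):
    """Logic-1 probability of a BDD node.

    p_var optionally maps a variable name to its probability of being 1
    (hypothetical extension); missing variables default to 1/2.
    """
    if node == 0:
        return Fraction(0)
    if node == 1:
        return Fraction(1)
    var, low, high = node
    p = HALF if p_var is None else p_var.get(var, HALF)
    return (1 - p) * p_logic1(low, p_var) + p * p_logic1(high, p_var)


def p_logic0(node, p_var=None):
    return 1 - p_logic1(node, p_var)


print(p_logic1(B), p_logic1(A), p_logic1(C))
```

The values for nodes B, A and C reproduce the edge-counting argument given for Figure 2-2.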
In this chapter, two most important preliminaries, the concept of controlling value
and the Binary Decision Diagram, are explained to prepare for the description of our
proposed power gating method.
In the following chapters, the proposed power gating method is described, which
is considered to be area-efficient, easy to implement, and capable of reducing both
dynamic power and leakage power dissipation during operation.
Chapter 3
Fine-Grain Power Gating Based on
Controlling Values of Logic Gates
The main proposal of this dissertation, a fine-grain power gating mechanism based
on the controlling values of logic gates, is described in detail in this chapter. In the
first section, the basic idea and mechanism of the proposed power gating method are
explained. Secondly, the advantages and disadvantages are discussed and some
solutions to the drawbacks of this method are addressed. Furthermore, a basic algorithm
for the proposed power gating method, which is referred to as the depth-based algorithm, is
presented to show the efficiency of the controlling value based low power technique.
3.1 Basic Idea
The idea of power gating is to cut off the power supply of idle logic blocks by using
some methodology like standby mode analysis. In this dissertation, however, we do
not only apply power gating control to the blocks in standby mode, but also cut the
power supply of logic circuitry whose outputs are not in use during active mode. In
other words, if the output values of a logic block do not have any influence on other
parts of the circuit, power gating can be applied to the block. For example, if one input
of a multiplexer is selected, the other inputs become useless during this period even
though the logic blocks generating the other inputs are not idle. Therefore, the energy
Figure 3-1: An example of controlling value based power gating.
consumed to generate the other inputs can be considered as a form of dissipation.
However, due to the complexity of current logic circuits and the potential electrical
problems caused by the floating outputs of power-gated blocks, runtime power gating
is usually not easy to implement.
In Chapter 2, the concept of controlling value is introduced, which shows that if
one input of a logic gate takes the corresponding controlling value, the output of this
logic gate is determined no matter what logic values are taken by other inputs. This
property of CMOS logic elements provides opportunities to identify temporarily
useless signals and selectively cut off the power supply to the logic elements generating
those useless signals. The sleep control methodology is simple, since the signal taking
the controlling value can be directly used as the sleep signal for the block generating the
other input. The binary-valued nature of digital circuits is well suited to this sleep
control. Since there are only two possible logic values in digital circuits (logic-0 and logic-1),
no matter which logic value the controlling value is, there are exactly two states: a state in
which this input takes the controlling value and a state in which it does not.
Therefore, any input of a logic element can be adopted as the
enable signal or sleep signal.
An AND gate case is shown in Figure 3-1. In this figure, an nMOS type transistor
is used as the sleep transistor and inserted between the ground line and the logic block
1. The sleep transistor is simply controlled by signal B, an input signal of the output
Figure 3-2: Controlling value based power gating by using a pMOS sleep transistor.
AND gate. When signal B takes the controlling value, logic-0, the sleep transistor
is turned off and the logic block 1 is cut off from the ground line, so signal A
might take an unknown value. The final output, signal S1, is nevertheless determined
by signal B and will be 0 independently of the value of A. When signal B takes logic-1, the sleep
transistor is turned on simultaneously and the logic block 1 is activated. The output
signal S1 is then determined by both input signals A and B. For a NAND gate, since
the controlling value is also logic-0, the architecture of controlling value (CV) based
power gating is the same as shown in Figure 3-1.
On the other hand, the controlling value of NOR and OR gates is logic-1, which
turns an nMOS transistor ON, so an inverter would be needed in order to control an nMOS
sleep transistor for a NOR or OR gate. The additional inverter causes further area
penalty. A pMOS sleep transistor, in contrast, can be controlled directly by the controlling
value of a NOR (OR) gate, so we adopt this structure as shown in Figure 3-2.
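The choice of sleep transistor and the resulting behaviour can be summarized in a small logic-level sketch. It is a behavioural model only, with no electrical detail: a floating signal is represented by `None`, and the names `sleep_transistor`, `block_is_powered` and `gated_and` are hypothetical. The labels "footer" (between block and ground line) and "header" (between block and supply line) are used here as shorthand for the two placements.

```python
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}
FLOATING = None  # stands for an unknown (floating) signal value


def sleep_transistor(gate_type):
    """Pick the transistor type that the sleep signal can drive directly.

    An nMOS transistor is off at logic-0, a pMOS transistor is off at
    logic-1, so the controlling value itself switches the block off.
    """
    return "nMOS footer" if CONTROLLING_VALUE[gate_type] == 0 else "pMOS header"


def block_is_powered(gate_type, sleep_signal):
    """The block is powered whenever the sleep signal is NOT the controlling value."""
    return sleep_signal != CONTROLLING_VALUE[gate_type]


def gated_and(block_value, b):
    """Output AND gate with its other input generated by a power-gated block."""
    a = block_value if block_is_powered("AND", b) else FLOATING
    s1 = 0 if b == 0 else a & b  # b = 0 fixes the output regardless of a
    return a, s1


print(sleep_transistor("NAND"), sleep_transistor("NOR"))
print(gated_and(1, 0), gated_and(1, 1))
```

The second call pair shows that the output stays well defined even while the gated input is floating.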
By using both nMOS and pMOS sleep transistors, the power gating control of
both types of logic elements, those taking logic-0 and those taking logic-1 as the controlling
value, can be implemented without any extra control circuit. This is one of the advantages of
the proposed power gating method. With the proposed method, the area penalty
of power gating is not so large. Furthermore, the proposed power gating method is
expected to be easy to implement with only transistors and wire connections required.
Since shutting down the sleep transistors does not require the corresponding power
blocks to be idle, the logic blocks can be connected to and disconnected from the power
supply dynamically during runtime. The proposed method can therefore reduce not only
sub-threshold leakage power dissipation, but also dynamic power dissipation.
In addition, the proposed CV based power gating method, as is explained in this
section, is based on the controlling property of CMOS logic gates, which are the
foundation of almost every logic circuit. Thus the CV based power gating method is
considered to be a general-purpose low power technique, which can be applied to
any logic circuit and is capable of reducing both dynamic power and leakage power
on its own, without being combined with other low power technology or any external assistance.
Even in logic circuits where there are no obvious NAND/AND/OR/NOR logic gates,
such as a multiplexer, the CV-based power gating method can still be implemented
by using the select signal of the multiplexer to control the sleep transistor for the
logic blocks generating the unselected input signals.
It is obvious that the power reduction potential of the CV-based power gating
method highly depends on how frequently the sleep signal takes controlling value. At
the same time, the size of power gated logic blocks, in other words, the number of
power-off gates, also plays an important role in this method. So, the CV-based power
gating method highly depends on the structure of the target circuit. For certain logic
circuits such as AND-tree circuits, control signals with more than 90% probability of
taking the controlling value can be found, so the efficiency of the proposed method is high.
On the other hand, for circuits like XOR-tree circuits, the proposed method is not so
effective.
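The interplay of the two factors can be made concrete with the product of the controlling value probability p and the number of gated gates N, which the abstract of this dissertation calls the "expected number of sleep gates". The candidate sleep signals and their numbers below are invented purely for illustration and do not come from any benchmark.

```python
# Hypothetical candidate sleep signals: p = probability of taking the
# controlling value, N = number of gates in the block the signal would gate.
candidates = {
    "s1": {"p": 0.90, "N": 4},
    "s2": {"p": 0.50, "N": 12},
    "s3": {"p": 0.25, "N": 20},
}


def expected_sleep_gates(p, n_gates):
    """Expected number of sleep gates: the product p * N."""
    return p * n_gates


scores = {
    name: expected_sleep_gates(c["p"], c["N"]) for name, c in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

In this made-up case neither the most probable signal nor the largest block wins, which is why both factors have to be weighed together.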
In low power technologies, the delay penalty is always an issue. The CV-based
power gating method, unfortunately, suffers not only from the wake-up delay of sleep
transistors, but also from the potential critical path change. In the worst case, all logic
gates in a circuit might be connected in series after applying the proposed method,
where critical paths become several times longer than the original critical path. The
change of critical path results in dramatic performance loss, so we need a method to
restrain the increase of the critical path. We will discuss the delay issue later.
3.2 Properties of Controlling Value Based Power
Gating
Before we further explain the detailed algorithms of the CV-based power gating, the
advantages and disadvantages of the CV-based power gating method will be discussed
first in order to clarify the motivation of several important principles involved in
the proposed method, such as sleep signal selection criteria, steady maximum depth
constraint, etc.
3.2.1 Dynamic Power Reduction
Previously, dynamic power reduction has not been discussed by any existing power
gating technique. The proposed Controlling Value (CV) based power gating method,
on the other hand, is shown to be able to reduce both dynamic and leakage power
effectively during runtime.
To achieve dynamic power reduction, an approach must be implemented so that the
switching of logic gates can be stopped dynamically during active time. In the
CV-based power gating, the dynamic power is reduced because the switching ability
decreases due to the fluctuating voltage of the virtual power supply (ground) line during
sleep mode.
During the sleep period, since the power-off logic blocks are disconnected from the
power supply, the voltage of the virtual ground line is charged up gradually to a voltage
near VDD, while the voltage of the virtual VDD line discharges gradually down to a voltage
near the ground voltage, VSS. The floating voltages of virtual power lines are usually
considered as a side-effect of power gating technology since the charge and discharge
of virtual power line also consume certain amount of energy and the floating voltage
often causes electrical problems to the power-on circuitry. Therefore, several methods
are developed to alleviate this effect. In the CV-based power gating method proposed
in this dissertation, however, we found that the fluctuating voltage of virtual power
lines can be utilized to achieve dynamic power reduction.
As is described in Chapter 1, the dynamic (switching) power dissipation in CMOS
logic circuitry can be formulated as follows:
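$$P_{\text{switching}} = \alpha \times C_L \times V_{DD}^2 \times f_{\text{clock}} \tag{3.1}$$
In the case of power gating, because of the existence of sleep transistors and virtual
power lines, the formula of dynamic power dissipation is further modified as follows:
$$P_{\text{Dynamic}} = \alpha \times C_L \times (V_{DD} - V_{\text{virtual ground}})^2 \times f_{\text{clock}} \tag{3.2}$$
$$P_{\text{Dynamic}} = \alpha \times C_L \times (V_{\text{virtual vdd}} - V_{SS})^2 \times f_{\text{clock}} \tag{3.3}$$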
where Equation (3.2) shows the case of a power-gated logic block with nMOS sleep
transistors while Equation (3.3) shows the case with pMOS sleep transistors. After
the power-gated logic blocks are put in sleep mode for a certain time period, which
is usually referred to as deep-sleep in power gating techniques, the voltages of the virtual
power lines stabilize at voltages opposite to those they take in active mode.
That means, for nMOS sleep transistors, during sleep mode, the value of
$V_{DD} - V_{\text{virtual ground}}$ is smaller than in active mode, which results in the reduction of
dynamic power. In fact, the reduction of $V_{DD} - V_{\text{virtual ground}}$ affects the switching
ability of the power-off logic gates, which makes them unable to switch even if their
inputs are still changing.
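A short numerical sketch of Equation (3.2) shows how strongly the quadratic voltage term reacts to a rising virtual ground line. All parameter values below (activity factor, load capacitance, clock frequency, and a virtual ground voltage of 1.5 V during sleep) are assumed round numbers chosen for illustration. They are not simulation results from this dissertation.

```python
def dynamic_power(alpha, c_load, v_swing, f_clock):
    """alpha * C_L * V^2 * f, with V the effective supply swing in volts."""
    return alpha * c_load * v_swing ** 2 * f_clock


# Assumed illustrative parameters (not measured values).
ALPHA = 0.2        # switching activity factor
C_LOAD = 10e-15    # load capacitance in farads
V_DD = 1.8         # supply voltage in volts
F_CLOCK = 100e6    # clock frequency in hertz

# nMOS sleep transistor case, Equation (3.2): swing = V_DD - V_virtual_ground.
active = dynamic_power(ALPHA, C_LOAD, V_DD - 0.0, F_CLOCK)   # virtual ground at 0 V
asleep = dynamic_power(ALPHA, C_LOAD, V_DD - 1.5, F_CLOCK)   # virtual ground charged up

ratio = asleep / active
print(active, asleep, ratio)
```

Under these assumptions the remaining swing of 0.3 V leaves only (0.3/1.8)² of the active-mode dynamic power, before even considering that the gates may stop switching altogether.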
Figure 3-4 shows the power dissipation chart when applying full input patterns
to the circuit shown in Figure 3-3. The curve is obtained by using HSPICE with
0.18µm process technology.
During the interval 160ns to 320ns, signal in3 is logic-0, thus the AND gate G1
is in sleep mode during this period. It is easy to observe that there is no switching
power consumed in this interval even though the input signals in1 and in2 are still
changing. In contrast, the power chart of the original circuit, which does not contain
the sleep transistor connected on node N1, shows that during the same period the
dynamic power of G1 is still dissipated although it makes no difference to the value
of primary output signal out3. Figure 3-4 illustrates that during the sleep period,
Figure 3-3: A sample circuit to show the dynamic power reduction in CV-based
power gating method.
the power-off logic gates are forced to be idle due to the reduction of power supply
voltage. The same phenomenon can also be observed in the power gated logic blocks
with pMOS header sleep transistors; the difference is that the switching inability is caused
by the drop of voltage on the virtual VDD line. Even in the light-sleep stage, which is
the beginning of sleep mode, the dynamic power can still be reduced since the power
supply voltage decreases.
The capability of reducing dynamic power by power gating technique is usually
overlooked since the traditional power gating methods are combined with other
dynamic power reduction methods, such as gated-clock. In the CV-based power
gating method, however, the logic gates can be turned ON and OFF dynamically in
an extremely fine-grain way, while the whole circuit still works normally without any
functionality problem.
Furthermore, the dynamic power reduction in CV-based power gating does not
Figure 3-4: Dynamic power reduction in CV-based power gating method.
require that the whole module is idle and it is compatible with other dynamic power
reducing methods. We can use clock gating to reduce the dynamic power of registers and
connected combinational circuits while using the proposed method to further reduce
the dynamic power during active time. The ability of reducing dynamic power
distinguishes the controlling value based power gating method from other power gating
methods.
3.2.2 Control Granularity and Area Overhead
The area overhead is nearly inevitable in any power gating method because of the
insertion of sleep transistors. Traditional power gating methods usually lead to
another area overhead caused by the extra power control circuit attached to the original
circuit in order to control the sleep transistors. In current LSI’s, the power control
circuit takes approximately 5% - 10% of total chip area. Furthermore, in order to
achieve maximum energy saving and to prevent negative influence to other power-
on circuit, the power control circuit is required to monitor the running state of the
power-gated circuit together with the signals between the power-gated part of circuit
and other part of circuit. Therefore, it is relatively difficult to implement the power
control logic.
In the controlling value based power gating proposed in this dissertation, a logic
Figure 3-5: An isolation cell implemented by using an AND gate.
block is put into sleep only when its corresponding sleep signal takes the controlling
value of its successor gate. In other words, the sleep control is performed based only
on the logic function of the original circuit. Thus no extra control logic is required
in the CV-based power gating method, which significantly reduces the area overhead.
In addition, since the power gating mechanism can be formed without extra power
control circuitry, the CV-based power gating method is much easier to
implement. In fact, only wire connections are necessary besides sleep transistors.
Besides the power control circuit, the isolation cell, which is essential in traditional
power gating methods, also leads to significant area overhead. In the CV-based power
gating method, isolation cells are not required, since the gate connected to the
power-off block behaves as an isolation cell.
As is mentioned in Chapter 1, during sleep mode, the output voltage of power-
off gates becomes floating and will result in serious electrical problems if they are
still connected to other power-on part of the circuit. In this case, it is necessary to
force the floating outputs of the power-off logic gates to logic-1 or logic-0 in order
to prevent the consequences. Commonly used isolation cells are AND or OR gates
and the isolation signals are provided by the power control circuit. Figure 3-5 shows
an example of implementing an isolation cell by using a single AND gate. When
the power-gated block is put into sleep, the isolation signal becomes logic-0 and the
output of the isolation AND gate is forced to be logic-0 as well, so that the floating
output of the power-off block is not capable of influencing other part of the circuits.
After the block wakes up, the isolation signal is set to logic-1 and will hold this value
during the active period of the power-gated block. Thus the output of the block can
go through the isolation AND gate in order to contribute to the function of the whole
circuit.
Although other, more concise structures of isolation cells have recently been
proposed and adopted in commercial LSI designs, such as implementing the isolation cells
with pull-up or pull-down transistors, the implementation of isolation cells by using AND
or OR gates is still the foundation of signal isolation.
By observing the structure of controlling value based power gating shown in
Figure 3-1 and Figure 3-2, it can be seen that the output gate itself can be considered
as the isolation cell. Take the circuit in Figure 3-1 as an example: when signal B
becomes logic-0, which is the controlling value of the output gate, the primary output
signal S1 is then determined to be logic-0. At the same time, the logic block 1, which
generates signal A, is disconnected from the ground line because the sleep transistor
is turned off by signal B. Although the signal A will be floating during this period,
the primary output S1 will still be stable because the output AND gate acts just
like an isolation cell which separates the signal A away from other part of the circuit.
Therefore, no signal isolation is required in the controlling value based power gating
method.
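The isolating behaviour of the output gate can be illustrated with a small three-valued
logic sketch. The encoding of a floating node as None and the function names and3 and
output_s1 are assumptions made for illustration only; the sketch models the logical
behaviour of Figure 3-1, not its electrical behaviour.

```python
# Three-valued AND: 0, 1, or None (None models a floating node; this
# encoding is an assumption made for illustration only).
def and3(a, b):
    # Logic-0 is the controlling value of an AND gate: it fixes the
    # output regardless of the other input.
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None  # output is undetermined when a needed input floats
    return 1


# Structure of Figure 3-1: signal B is both an input of the output AND
# gate and the sleep signal of logic block 1, which drives signal A.
def output_s1(block1_value, b):
    # The footer sleep transistor is off when B = 0, so A floats.
    a = block1_value if b == 1 else None
    return and3(a, b)


print(output_s1(1, 0))  # A floats, S1 is still determined
print(output_s1(1, 1))  # block 1 is active, S1 follows A
```

Whenever block 1 is asleep, B is logic-0 and therefore already fixes S1, which is why
the floating value of A never reaches the output.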
The CV-based power gating method requires neither power control circuitry nor
isolation cells, which makes the proposed method quite easy to implement and area
efficient.
By adding some extra logic, more gates could be powered off, for example gates with
multiple fanouts. However, we adopt a “no extra control logic” principle when applying
the CV-based power gating method, even though this leads to some loss of energy saving.
Those situations are discussed further in the next section.
3.2.3 Delay Overhead
In controlling value (CV) based power gating technique, the delay overhead comes
mainly from two aspects, the critical path change and the wake-up delay of sleep
transistors. We only show the problem of delay overhead in this section, and the
solution is described in the coming chapter.
Figure 3-6: Critical path change caused by sleep signal assignment.
Critical Path Change
One of the major advantages of the CV-based power gating method is that all sleep
signals are extracted locally from the original circuit. However, this might affect the
critical path, because the power gating control acts as another input to each
controlled gate. By adopting the original signals as sleep signals and connecting them
to other parts of the circuit, the structure of the original circuit is directly
changed, and the critical path delay might become larger than that of the original
circuit. In the worst case, the increase is significant.
Figure 3-6 shows an example of the worst case. In the example, the original
critical path is 2 → 3 → 4 → 5. After applying the CV-based power gating method,
the output signal of gate 1 is assigned as the sleep signal to control gates 2, 3 and 4
in order to achieve maximum energy saving. The critical path of this circuit then
becomes 1 → 2 → 3 → 4 → 5, which is 1.25 times as long as the original one. In
addition, all logic gates are now connected in series, which is the worst
case of critical path change.
Practical LSI circuits contain many more logic gates and have more complex logic
structures, so the consequences of critical path change might be more serious. Since
few designers are willing to trade the performance of their chip for energy saving,
the critical path change is the most serious drawback and limits the practical
application of the proposed power gating method.
Fortunately, there is a trade-off between the critical path change and the power
gating effect. For the circuit in Figure 3-6, if the sleep signal (the output of gate 1)
is only assigned to control gates 3 and 4, then the critical path becomes 1 → 3 → 4 → 5,
which has the same depth as the original critical path. Thus, by giving up one sleep
gate, we keep the performance of the original circuit while still saving considerable
power.
The detailed and advanced trade-off mechanism is described in the following sections.
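The trade-off can be checked with a small depth computation. The sketch below is only
illustrative: the netlist is a hypothetical reading of Figure 3-6 (gates 2, 3, 4 and 5
form a chain and gate 1 drives the remaining input of gate 5), and the helper names
max_depth and with_sleep_signal are not taken from the dissertation. A sleep signal is
modelled as one additional input of every gate it controls.

```python
from functools import lru_cache


def max_depth(fanin):
    """Longest path, counted in gates, of a circuit {gate: [driving gates]}."""
    @lru_cache(maxsize=None)
    def depth(gate):
        return 1 + max((depth(p) for p in fanin.get(gate, ())), default=0)
    return max(depth(g) for g in fanin)


def with_sleep_signal(fanin, source, sleep_gates):
    """Return a copy of the circuit in which `source` also feeds `sleep_gates`."""
    new = {g: list(ins) for g, ins in fanin.items()}
    for g in sleep_gates:
        new[g].append(source)
    return new


# Hypothetical topology for Figure 3-6 (assumption, see the text above).
original = {1: [], 2: [], 3: [2], 4: [3], 5: [4, 1]}

print(max_depth(original))                                   # 4
print(max_depth(with_sleep_signal(original, 1, [2, 3, 4])))  # 5
print(max_depth(with_sleep_signal(original, 1, [3, 4])))     # 4
```

Under this assumed topology, controlling gates 2, 3 and 4 lengthens the longest path
from 4 to 5 gates, while controlling only gates 3 and 4 keeps it at 4 gates.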
Delay Overhead Caused by the Switching of Sleep Transistors
The timing penalty caused by the wake-up delay of sleep transistors is an inevitable
issue of all power gating methods. In previous power gating approaches, the sleep
transistors are usually sized much larger than the transistors in the logic block in
order to raise the threshold voltage of the sleep transistors and further reduce the
leakage current during sleep mode. As a result, the transition delay of the sleep
transistor also becomes larger. Since the correct output can be obtained only after the
sleep transistor has fully woken up, the wake-up delay becomes a noticeable penalty on
circuit performance. The controlling value (CV) based power gating, on the other hand,
suffers little from the switching delay of sleep transistors.
We show an experimental result to illustrate the influence of the delay caused by the
switching of the sleep transistor in the CV-based power gating method. Table 3.1
shows the propagation delay of the circuit in Figure 3-3 during mode switching.
The table lists all cases in which both the sleep transistor state and the output
change; other cases are omitted because the output does not change. In the “delay
overhead” column, a positive delay overhead appears only in the 4th and 7th rows. This
overhead is 0.04% - 0.06%, which is small enough to be negligible. In the other cases,
the results show that the circuit is as fast as or even faster than the original circuit.
Table 3.1: The propagation delay of the circuit shown in Figure 3-3
Input            Output   Propagation delay (ps)      Delay      Sleep
in1 in2 in3      out3     Original   Power gating     overhead   transistor
000→111          1→0      138.2      61.02            -55.8%     off → on
010→111          1→0      131.6      60.34            -54.1%     off → on
100→111          1→0      131.1      57.85            -55.9%     off → on
110→111          1→0      48.61      48.64            0.06%      off → on
111→000          0→1      49.69      49.11            -1.16%     on → off
111→010          0→1      50.19      50.09            -0.20%     on → off
111→100          0→1      50.06      50.08            0.04%      on → off
111→110          0→1      50.18      50.18            0.00%      on → off
The reason for the acceleration is that the outputs of power-off logic blocks are
allowed to float instead of being forced to 0 or 1 by isolation cells. When
the logic block is activated again, its outputs can charge (discharge)
to logic-1 (logic-0) from the floating voltage instead of charging (discharging) fully
from logic-0 (logic-1). Thus the time from activation to reaching the
post-sleep output value of the power-gated logic block is shorter than that
of the original circuit. Even when the outputs of the power-off logic block are not
required to change after sleep, they can still recover to the pre-sleep value quickly
from the floating voltage, which corresponds to the cases with very little delay
overhead in Table 3.1.
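The “delay overhead” column can be read as the relative change of the propagation
delay. The following sketch recomputes it for three rows of Table 3.1; the formula
(power-gated delay minus original delay, divided by the original delay) is an
assumption that is consistent with the tabulated values, and the function name is
illustrative.

```python
# (original delay, power-gated delay) in ps, rows 1, 4 and 7 of Table 3.1
rows = {
    "000->111": (138.2, 61.02),
    "110->111": (48.61, 48.64),
    "111->100": (50.06, 50.08),
}


def delay_overhead_percent(original_ps, gated_ps):
    # Relative change of the propagation delay; negative means faster.
    return (gated_ps - original_ps) / original_ps * 100.0


for pattern, (original_ps, gated_ps) in rows.items():
    print(f"{pattern}: {delay_overhead_percent(original_ps, gated_ps):+.2f}%")
```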
In the CV-based power gating method, the delay overhead can be kept small as long as
the critical path change is handled well, since the delay overhead of the sleep
transistors is not a serious issue. In this dissertation, a steady maximum depth
constraint is introduced at the algorithm level to prevent the critical path from
increasing. Although the energy saving is smaller than without the constraint, the
controlling value based power gating method can still reduce the total power
dissipation considerably without loss of performance.
3.3 Basic Algorithm and Efficiency Estimation of
Fine-Grain Power Gating
In this section, the issue of selecting sleep signals is addressed and a basic algorithm
is developed in order to estimate the efficiency of the proposed method. Although
this basic algorithm might not be optimal, it helps to formulate the reduction
of power dissipation achieved by the CV-based power gating method. The concept of
expected sleep gates is introduced in this section as the unit of measurement. Based
on this, several experiments are performed on ISCAS’85 benchmarks and the results
are shown. 15% - 36% of logic gates can be power-gated. A timing constraint is also
considered in order to check the gains and losses.
The important parameters which influence energy saving are discussed in the next
chapter for optimizing the energy saving.
To make the explanation more concise, the following terms are used.
sleep signal: a signal that controls a sleep transistor.
sleep gate: a logic gate that is power-gated.
footer/header transistor: an nMOS/pMOS sleep transistor, respectively.
power domain: a power-gated logic block; all gates in it are sleep gates.
3.3.1 Selecting Sleep Signals and Sleep Gates
Since we are changing the structure of the original circuit when applying CV-based
power gating, the sleep signals and sleep gates must be carefully selected so that the
function of the circuit is not affected. In addition, because we insist on a method
with no extra control logic, some logic gates cannot be selected as sleep gates in some
cases. The efficiency of the CV-based power gating greatly depends on the selection
of sleep signals and sleep gates. Generally speaking, a larger power domain produces
more energy saving when in sleep. At the same time, the frequency of being in sleep
mode is also a key parameter, since more time spent in sleep mode yields more energy
saving.
In the sample circuit shown in Figure 3-1, signal B can be used to control
logic block 1. Conversely, signal A can be used to control logic block 2. It is easy to
see that we cannot apply both controls at the same time; otherwise the circuit will
malfunction because of an interlock. If both signals A and B are used as sleep
signals and both of them become logic-0, both sleep transistors are turned off
and both logic blocks are forced into sleep mode. The consequence is that both signals
A and B become floating, and then the output signal S1 also becomes a floating signal.
Finally, the floating output signal S1 might cause many problems in other parts
of the circuit, such as malfunction and short circuit current.
Since adopting all input signals as sleep signals is impractical, it is necessary to
select only one of them as the sleep signal. In this case, we need to find out the one
that can save energy the most. The detailed sleep signal selection criteria are shown
in the next chapter.
Besides the selection of sleep signals, the selection of sleep gates is also an issue.
For example, the power-gating control of a logic gate with fanouts is not easy. A logic
gate with fanouts is connected to more than one successor gate, and it can be
power-gated only when its output is unused by all of its successor gates. Thus, for
gating a fanout gate in CV-based power gating, every one of its successor gates must
have at least one of its other inputs taking the corresponding controlling value.
Figure 3-7 shows an example.
In Figure 3-7, gate 1 fans out to gate 6, gate 2 and gate 3. Therefore, gate 1
can be power-gated only when all of its fanouts FO1, FO2 and FO3 are blocked.
That requires the signals S1, S2 and S3 to take the controlling value, logic-0 in this
case, simultaneously. To ensure this, an OR gate must be inserted as shown in
Figure 3-7, which is an extra logic element. In other words, one extra gate is needed
to control a single gate, which is not acceptable from the viewpoint of power
consumption.
Figure 3-7: A possible way to handle gates with fanout.
Logic gates with fanout cannot be power-gated without extra control logic. Therefore,
in this dissertation, logic gates with fanout are not selected as sleep gates.
This sleep gate selection criterion is based on the principle that no extra control
logic is involved in the controlling value based power gating method. On the other
hand, the efficiency of the CV-based power gating method decreases for circuits
containing a large number of fanout gates, since the number of possible sleep gates is
reduced.
In some cases, changing the structure of the original circuit can give more energy
saving. An example is shown in Figure 3-8. The left circuit shown in this figure
is the original circuit, in which signal e is selected as the sleep signal to control 4
gates, A, B, C and D. Since signal e is a primary input, we assume its probability
of taking logic-0 is 0.5. The effective energy saving is 0.5 × 4 = 2.0 gates. On the
right of the figure, the structure is modified without changing the logic function of
the original circuit. In this alternate circuit, both signal e and g can be used as the
sleep signal for C and for A, B respectively, as shown in Figure 3-8. Since signal g is
the output of two serially-connected AND gates C and E, the probability for signal
g to take the logic-0 is 0.875, hence the expected energy saving from sleep signal g
is 0.875× 2 = 1.75 gates. On the other hand, the expected energy saving from sleep
signal e is 0.5 × 1 = 0.5 gates. The total expected energy saving is 1.75 + 0.5 = 2.25
gates, which is larger than the original saving of 2.0 gates.
Figure 3-8: Improving energy saving by changing the structure of circuit.
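The comparison for Figure 3-8 can be written out as a short calculation. This is a
sketch under the assumptions stated in the text: every primary input is logic-0 with
probability 0.5, and signal g is the AND of three independent primary inputs. The
function names p_zero_and and expected_sleep_gates are illustrative.

```python
def p_zero_and(k):
    """P(output = 0) of an AND over k independent inputs with P(0) = 0.5 each."""
    return 1.0 - 0.5 ** k


def expected_sleep_gates(domains):
    """domains: list of (CV probability of the sleep signal, number of sleep gates)."""
    return sum(p * n for p, n in domains)


# Original circuit: primary input e controls gates A, B, C and D.
original = expected_sleep_gates([(0.5, 4)])

# Modified circuit: g controls A and B, e controls C.
modified = expected_sleep_gates([(p_zero_and(3), 2), (0.5, 1)])

print(original, modified)  # 2.0 2.25
```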
Although changing the structure of the circuit might result in better energy saving,
it is difficult to change the structure in the logic synthesis step of current LSI
designs. Besides, if the structure of the original circuit is changed, major
verification is necessary in order to check the correctness of the functionality.
Therefore, the basic algorithm shown in this chapter does not change the structure of
the original circuit; such a mechanism is left as future work.
3.3.2 A Basic Algorithm
Before introducing the basic algorithm, we checked how many gates can be used
as sleep gates in a target circuit. We implemented a program that searches for possible
sleep gates from higher depth to lower depth in the circuit. A backward traversal is
performed from each primary output to the primary inputs, and gates are visited one by
one starting from the highest depth. When a gate is visited, its connections are
checked. If the visited gate has a successor gate of AND/NAND/OR/NOR type and does not
fan out to more than one gate, it is counted as a sleep gate.
Table 3.2: The maximum number of sleep gates in the benchmarks
ID       Total gates   Sleep gates   Percentage
C432     252           182           72.2%
C499     454           300           66.1%
C880     435           274           63.0%
C1355    590           300           50.8%
C2670    1400          889           63.5%
C3540    1983          1387          70.0%
C5315    2973          2113          71.1%
C7552    4042          2686          66.5%
In order to simplify the program, logic gates with more than two inputs are divided
into several 2-input gates. For example, a 3-input AND gate can be split into two
2-input AND gates. With this operation, the circuit is transformed into a circuit
with only 2-input gates, to which traditional binary tree traversal algorithms can be
applied.
Table 3.2 lists the number of possible sleep gates obtained by applying the program to
several ISCAS’85 benchmark circuits.
From this table, we can see that on average more than 65% of the gates in those
circuits can be power-gated. Because of the interlock situation mentioned in
3.3.1, not all candidate gates can be power-gated, but the results show the potential
of the CV-based power gating method.
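The candidate counting rule can be sketched as follows. The netlist format, the gate
names and the function candidate_sleep_gates are hypothetical and serve only to
illustrate the two conditions: a candidate drives exactly one successor, and that
successor is a gate type with a controlling value.

```python
CONTROLLABLE = {"AND", "NAND", "OR", "NOR"}

# Hypothetical netlist: gate name -> (type, list of input signals).
# Signals that are not keys of the dictionary are primary inputs.
netlist = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["c", "d"]),
    "n3": ("NAND", ["n1", "n2"]),
    "n4": ("NOT", ["n3"]),
    "n5": ("AND", ["n4", "n2"]),  # primary output
}


def candidate_sleep_gates(netlist):
    # Collect the successor gates of every gate output.
    successors = {g: [] for g in netlist}
    for gate, (_, inputs) in netlist.items():
        for sig in inputs:
            if sig in successors:
                successors[sig].append(gate)
    # Keep gates with a single successor of AND/NAND/OR/NOR type.
    return sorted(
        g for g, succ in successors.items()
        if len(succ) == 1 and netlist[succ[0]][0] in CONTROLLABLE
    )


print(candidate_sleep_gates(netlist))  # ['n1', 'n4']
```

In this example n2 is rejected because it fans out to two gates, n3 because its
successor is an inverter, and n5 because it has no successor gate.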
The basic algorithm, which avoids the interlock situation, is based on the program
described above and on the structure in Figure 3-1. It traverses the circuit from
higher depth to lower depth, in other words from primary outputs to primary inputs, and
selects the sleep signals based on the size of the power domains they are capable of
controlling. With this procedure, the size of the power domain is maximized for tree
circuits such as the one shown in Figure 3-1. In general, logic gates with higher depth
tend to have input signals with a higher probability of taking the controlling value,
and the number of logic gates that can be controlled by each input also tends to be
larger.
First, a backward traversal procedure is performed, which visits logic gates from
primary outputs to primary inputs in a preorder fashion. Logic gates with
high depth are visited first. For a visited gate, the procedure checks whether this
gate has already been selected as a sleep gate or marked as visited. If so, the gate is
skipped and its predecessor gates are visited according to the connections. Otherwise,
the number of sleep gates that can be controlled by each of its input signals is
computed. This computation is accomplished by another backward traversal procedure. For
each input of the currently visited gate, the new backward traversal procedure
recursively checks all logic gates that are involved in generating the other input. The
recursion stops when it reaches primary inputs or outputs of logic gates with fanout,
because the power gating block cannot be enlarged beyond these signals. After applying
the procedure to both input signals of the currently visited gate, the sizes of the
power domains controlled by these input signals are obtained. In order to avoid the
interlock situation, only one input can be used as the sleep signal. In this basic
algorithm, the sleep signal is chosen based on the size of its power domain, so that
the number of sleep gates found by the backward traversal is maximized.
After the decision, the selected input is marked as a sleep signal and all the
gates that are controlled by this newly selected sleep signal are marked as sleep gates.
The procedure then marks the current gate as visited and continues to visit the
unvisited and unmarked gate with the highest depth. The procedure ends when it
has traversed all the gates. Pseudo code of the basic algorithm is shown in Figure 3-10,
together with the recursive sub-function used to find the sleep gates.
Figure 3-9: Application example of the basic algorithm.
A sample circuit to illustrate the basic flow of the algorithm is shown in Figure 3-9.
The maximum depth of the sample circuit is 4. Gate1, Gate2 and Gate4 are gates
with depth 1. Gate3 and Gate6 are at depth 2. Gate5 is at depth 3. Gate7, the gate
that generates the primary output, is the only gate with depth 4. The backward
traversal procedure starts from Gate7, and the sizes of the possible power domains for
signals a and b are computed respectively. For signal a, Gate4 and Gate6 are found. For
signal b, the possible sleep gates include Gate1, Gate3 and Gate5. Although Gate2
is related to both signals, it is not included as a potential sleep gate because it
fans out to both Gate3 and Gate6. According to the previous explanation, signal b is
selected as the sleep signal since the number of gates controlled by signal b is larger
than that controlled by signal a. Therefore, signal b is marked as a sleep signal while
Gate1, Gate3 and Gate5 are marked as sleep gates, which means they will not be
checked in the further procedure. After this, Gate7 is marked as visited.
The basic algorithm then visits Gate6, which has the highest depth in the remain-
ing gates. By computing the number of potential sleep gates for both input signals,
we can see that the output of Gate2 can be used to power-gate Gate4, while the
output of Gate4 can not control any gate since Gate2 has fanout outputs. Thus the
second power domain is formed, which contains Gate4 controlled by the output of
Gate2. The next gate being visited is Gate2. Gate2 is only related to primary inputs
so no sleep gate could be found. By this, all gates are traversed, and the traversal
procedure of the basic algorithm stops. As a result, we obtained two power domains
and 4 sleep gates in total.
Basic algorithm:
01: Build BDD for the target circuit;
    ---- Compute CV probability for each signal ----
02: For all signals in the target circuit, do {
03:     Compute CV probabilities;
04: }
05: Starting from each primary output and traversing backward, {
06:     if gate(i) is not checked, do {
07:         find_sleep_gates(input_1(i), label_1);
08:         find_sleep_gates(input_2(i), label_2);
09:         if N(label_1) > N(label_2), do {
10:             for each gate(j) with label_1, do {
11:                 sleep_signal(j) ← input_2;
12:                 check(j) ← checked;
13:                 ENSG ← ENSG + CV_probability(input_2);
14:             }
15:         }
16:         else, do {
17:             for each gate(j) with label_2, do {
18:                 sleep_signal(j) ← input_1;
19:                 check(j) ← checked;
20:                 ENSG ← ENSG + CV_probability(input_1);
21:             }
22:         }
23:     }
24:     Move to one of the input gates of gate(i), repeat 06-23
25:     until all traversal branches hit primary inputs,
26:     then start again from another primary output.
27: }
End of the basic algorithm
Sub-function: find_sleep_gates(id, label)
    ---- Finds all controllable gates by tracing backward from a given signal ----
1: if fanout(id) < 2 and id ≠ checked and id ≠ primary input, do {
2:     label(id) ← label;
3:     N(label)++;
4:     find_sleep_gates(input_1(id), label);
5:     find_sleep_gates(input_2(id), label);
6: }
Figure 3-10: Pseudo code of the basic algorithm
To estimate the power saving obtained after applying the basic algorithm, the
probability of taking the controlling value (CV-probability) for each sleep signal is
calculated. Based on the mechanism of CV-based power gating, the CV-probability
can be regarded as the fraction of time during which the power domain is in
sleep. Assuming that the power dissipation of a sleep gate is completely eliminated
during sleep time, the power saving obtained by the CV-based power gating method can be
approximated in terms of the Expected Number of Sleep Gates (ENSG): if one sleep gate
can be put into sleep for δ% of the total runtime, then on average the power of
δ × 0.01 gates is saved (ENSG = 0.01δ). In this chapter, we use ENSG to represent the
power saving.
In this dissertation, as mentioned in Chapter 2, the CV-probability is computed
by using Binary Decision Diagrams (BDDs). For the circuit shown in Figure 3-9, the
BDDs of both sleep signals, b and the output of Gate2, are built. According to the
calculation strategy introduced in Chapter 2, the CV-probability of signal b is 93.75%,
while the output of Gate2 has a CV-probability of 75.00%. Hence the ENSG obtained
by applying CV-based power gating is (0.9375 × 3) + (0.75 × 1) = 3.5625, which means
that on average the power of about 3.6 gates can be saved. Considering that there are
7 gates in the circuit in total, around 50% of the total power is expected to
be saved in this circuit. Note that both the logic-0 and logic-1 probabilities of
primary inputs are set to 0.5, based on the assumption that all input patterns are
applied to the circuit. In practice, the probabilities of primary inputs can be
modified according to the application.
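The worked example can be reproduced with a compact sketch of the basic algorithm.
Several parts are assumptions rather than the author's implementation: the netlist is a
reconstruction that is merely consistent with the textual description of Figure 3-9
(all gates are 2-input AND gates, and the second input of Gate5 is a primary input),
the CV-probability is obtained by exhaustive enumeration of input patterns instead of a
BDD, and gates are simply processed in order of decreasing depth.

```python
from itertools import product

# Hypothetical netlist consistent with the description of Figure 3-9.
# Every gate is a 2-input AND; names starting with "x" are primary inputs.
NETLIST = {
    "Gate1": ("x1", "x2"), "Gate2": ("x3", "x4"), "Gate4": ("x5", "x6"),
    "Gate3": ("Gate1", "Gate2"), "Gate6": ("Gate2", "Gate4"),
    "Gate5": ("Gate3", "x7"), "Gate7": ("Gate5", "Gate6"),
}


def depth(net, sig):
    return 0 if sig not in net else 1 + max(depth(net, s) for s in net[sig])


def prob_zero(net, sig):
    # Exact P(sig = 0) by enumerating all input patterns (stand-in for the BDD).
    pis = sorted({s for ins in net.values() for s in ins if s not in net})

    def value(s, env):
        return env[s] if s in env else value(net[s][0], env) & value(net[s][1], env)

    zeros = sum(value(sig, dict(zip(pis, bits))) == 0
                for bits in product((0, 1), repeat=len(pis)))
    return zeros / 2 ** len(pis)


def find_sleep_gates(net, fanout, checked, sig, found):
    # Trace backward; stop at primary inputs, fanout gates and checked gates.
    if sig in net and fanout[sig] < 2 and sig not in checked:
        found.append(sig)
        for s in net[sig]:
            find_sleep_gates(net, fanout, checked, s, found)
    return found


def basic_algorithm(net):
    fanout = {g: sum(ins.count(g) for ins in net.values()) for g in net}
    checked, sleep_signal, ensg = set(), {}, 0.0
    for gate in sorted(net, key=lambda g: -depth(net, g)):
        if gate in checked:
            continue
        in1, in2 = net[gate]
        cone1 = find_sleep_gates(net, fanout, checked, in1, [])
        cone2 = find_sleep_gates(net, fanout, checked, in2, [])
        # Gates generating one input are controlled by the other input.
        cone, signal = (cone1, in2) if len(cone1) > len(cone2) else (cone2, in1)
        for g in cone:
            sleep_signal[g] = signal
            ensg += prob_zero(net, signal)
        checked.update(cone)
        checked.add(gate)
    return sleep_signal, ensg


signals, ensg = basic_algorithm(NETLIST)
print(signals)
print(ensg)
```

For this assumed netlist the sketch selects the output of Gate6 as the sleep signal of
Gate1, Gate3 and Gate5 and the output of Gate2 as the sleep signal of Gate4, and it
accumulates an ENSG of 3.5625, in agreement with the calculation above.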
We have implemented the basic algorithm in C and evaluated it on ISCAS’85 benchmarks
using a personal computer with an Intel Core 2 CPU at 2.4GHz and 2038MB of memory.
The results are shown in Table 3.3. In the implementation, we estimate the ENSG
and the maximum depth of the circuit, where the power gating sleep signal is treated
as an ordinary input signal of the controlled gates.
Table 3.3 shows the ENSG obtained by applying the basic algorithm to some of
the ISCAS’85 benchmark circuits. It can be observed that on average 19% of the total
energy is expected to be saved by using the basic algorithm. It also shows that
the result of the CV-based method highly depends on the structure of the original
circuit. Circuit C3540, for example, achieves an ENSG of more than 30%, while an ENSG
of only 2.1% is obtained for C6288. Note that this does not mean that the CV-based
method is ineffective for C6288; by adjusting the sleep signal selection strategy, the
situation can be significantly improved.
Table 3.3: Application results of the basic algorithm
             Original circuit          CV-based power gating with the basic algorithm
Circuit ID   Total gates   Depth       Sleep gates   Sleep signals   PG depth   ENSG
C432         252           31          166           54              42         49.9 (19.8%)
C499         454           19          190           120             37         74.4 (16.4%)
C880         435           30          213           80              45         72.2 (16.6%)
C1355        590           26          190           120             44         74.4 (12.6%)
C2670        1400          40          758           281             68         282.2 (20.2%)
C3540        1983          56          1273          307             85         672.1 (33.9%)
C5315        2973          52          1811          630             80         655.5 (22.0%)
C6288        2416          124         480           480             155        51.6 (2.10%)
C7552        4042          45          2163          995             70         691 (17.1%)
Comparing the depth before and after applying the CV-based power gating method
further confirms that the delay overhead caused by the critical path change
is an essential problem of the proposed low power technique and must be handled.
Otherwise, the average critical path increase of 56% shown in Table 3.3 will prevent
any LSI designer from using the CV-based power gating method in practice.
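The average increase quoted above follows from the two depth columns of Table 3.3. A
minimal check, assuming that the figure is the unweighted mean of the per-circuit
relative increases:

```python
# (original depth, depth after power gating) from Table 3.3
depths = {
    "C432": (31, 42), "C499": (19, 37), "C880": (30, 45), "C1355": (26, 44),
    "C2670": (40, 68), "C3540": (56, 85), "C5315": (52, 80),
    "C6288": (124, 155), "C7552": (45, 70),
}

# Relative depth increase per circuit, in percent.
increase = {c: (pg - orig) / orig * 100.0 for c, (orig, pg) in depths.items()}
average = sum(increase.values()) / len(increase)

print(f"average depth increase: {average:.1f}%")
```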
As mentioned in the calculation of ENSG, the CV-probability also plays an
important part in the estimation of the CV-based power gating method. In some
cases, the energy saving of a very large power domain can be exceeded by that of a
much smaller power domain just because the CV-probability of the smaller domain
is much higher than that of the large one. Hence the balance between the power domain
size and the CV-probability of the sleep signal must be considered.
In addition, the order of visiting the logic gates is chosen in an ad hoc manner. Some
signal with high power saving potential at lower depth might never be adopted just
because the logic gate generating this signal was previously marked as a sleep gate
when the algorithm visited some high-depth gate. Therefore, there should be no
particular order when the sleep signals are selected; all signals should be given the
opportunity of evaluation during the selection.
Figure 3-11: Samples of AND tree circuit.
3.3.3 CV-Based Optimum Power Gating for AND/OR Tree
Circuits
AND/OR tree circuits for implementing multiple-input AND/OR functions are widely
used in many applications, such as equivalence checking and carry generation in
carry-look-ahead adders. There are many structures for implementing the multiple-input
AND function. The structure of AND/OR tree circuits is well suited to the basic
CV-based power gating method. In this section, we show an optimum structure of AND
tree circuits to achieve high ENSG.
To implement an AND operation with (n+1) variables, n AND gates are required
if assuming all gates are 2-input gates. Two basic structures of implementing AND
function are shown in Figure 3-11.
Figure 3-12: The CV-based power gating on AND tree circuit.
To achieve the maximum ENSG in AND tree circuits, the circuit is divided into
two parts as shown in Figure 3-12. One part is used as the power domain and the other
is the logic block that generates the sleep signal. In the figure, the upper
part is the controlled part, which includes n − m − 1 gates, and the lower part is the
control part, which includes m gates. Note that m gates correspond to an AND function
with (m + 1) variables (inputs). For the control part, logic-0 is the controlling
value, and the probability of taking logic-0 for an AND operation with m + 1 variables
can be calculated as follows:
\[ P_0 = \frac{2^{m+1} - 1}{2^{m+1}} \times 100\% \tag{3.4} \]
P0 is the CV-probability for this structure. The controlled part includes n − m − 1
logic gates, and the ENSG obtained from this structure is formalized as follows:
\[ \mathrm{ENSG}_1 = \frac{2^{m+1} - 1}{2^{m+1}} \times (n - m - 1) \tag{3.5} \]
Note that the control part generating the sleep signal is also an AND tree circuit,
so the power gating can be applied recursively. More ENSG can be obtained by
recursively dividing the sleep signal generation part. Hence the total ENSG can be
formalized as:
\[ \mathrm{ENSG}(n) = \max_{0 \le m \le n-1} \left\{ \frac{2^{m+1} - 1}{2^{m+1}} \times (n - m - 1) + \mathrm{ENSG}(m) \right\} \tag{3.6} \]
where ENSG(n) represents the expected number of sleep gates for a circuit with n
AND gates. By varying the variable m from 0 to n − 1, the optimum result can
be obtained. We have written a simple exhaustive program to obtain the optimum
structure.
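Equation (3.6) can be evaluated with a short memoized recursion. The sketch below is
not the author's exhaustive program; it assumes ENSG(0) = 0 (a control part without
gates is a single primary input and contains nothing to power-gate) and uses
floating-point arithmetic, which is adequate for small n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ensg(n):
    """Maximum expected number of sleep gates of an AND tree with n gates."""
    if n == 0:
        return 0.0  # assumption: nothing to power-gate without gates
    best = 0.0
    for m in range(n):               # m gates generate the sleep signal
        p0 = 1.0 - 0.5 ** (m + 1)    # Equation (3.4), written as a fraction
        best = max(best, p0 * (n - m - 1) + ensg(m))
    return best


for n in (10, 16):
    print(n, ensg(n), f"{ensg(n) / n:.0%}")
```

For n = 10 and n = 16 the recursion yields 6.625 and 12.25, the values listed in the
first two rows of Table 3.4.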
Table 3.4 shows the calculation results. As the number of AND gates increases,
the ENSG rises significantly. For an AND tree with 32 gates, the power saving is
87%, and for an AND tree circuit with 2000 gates, an ENSG of over 99% is obtained.
Table 3.4: Controlling optimization for AND tree circuits.
Total gates   ENSG        Coverage (%)
10            6.625       66%
16            12.25       76%
32            27.93       87%
48            43.844      91%
64            59.78       92%
100           95.697      95%
1000          995.380     99.5%
2000          1995.316    99.7%
3.4 Summary
In this chapter, the proposal of this dissertation, a controlling value based power
gating method, is introduced in detail. The objective of the proposed method is to
reduce the energy consumed by circuitry whose outputs are temporarily unnecessary.
Basically, if one input of a logic gate takes the corresponding controlling
value, the output is determined no matter what values the other inputs take. In this
case, the power supply to the circuitry generating those inputs can be turned off in
a power-gating fashion by using the signal taking the controlling value as the sleep
signal.
In the CV-based power gating method, all sleep signals are extracted locally from
the original circuit. Therefore no external assistance, such as a power management
unit, is required, although it is essential in previous power gating methods. In
addition, since the signal taking the CV has already determined the output, the
floating signals of the power-off gates do not influence the other power-on gates,
which makes the CV-based power gating isolation-cell-free. Therefore, the CV-based
power gating method is expected to be an area-efficient low power technique.
Furthermore, the mechanism of the CV-based power gating allows the sleep gates to be
shut down dynamically during active mode. During sleep time, the decrease of the
driving voltage due to power-gating directly reduces the switching capability of the
sleep gate. Therefore, the dynamic power can also be reduced by using the proposed
method.
The drawbacks of the CV-based power gating method are also discussed in this
chapter. In CV-based power gating, some local signals of the original circuit are
adopted as sleep signals and are connected to other parts of the circuit, which changes
the structure of the original circuit and usually causes a significant critical path
increase. According to the experiments performed on benchmarks, the depth of the
circuit increases by 56% on average after applying the proposed method, which
means that the CV-based power gating may suffer from serious performance loss if the
critical path issue is not handled properly.
The efficiency of the CV-based power gating method highly depends on the selection
of sleep signals and sleep gates. In order to maintain the correctness of the
functionality, for each NAND/AND/OR/NOR gate, only one of its inputs can be
adopted as a sleep signal. In addition, logic gates with multiple fanouts are
not counted as sleep gates, since power-gating these gates requires extra control
logic, which should be avoided due to the area penalty.
In this chapter, a basic implementation algorithm is introduced, which searches for
sleep signals and sleep gates according to their depth. Sleep signals are selected
based on the size of the power domain they are capable of controlling. The power
saving is estimated in terms of the Expected Number of Sleep Gates (ENSG), which
indicates the average number of power-off gates during runtime. According to the
estimation results, the power of approximately 20% of the total gates is expected
to be saved. An application example is also shown in this chapter, in which the
CV-based power gating method is applied to AND tree circuits. An ENSG of over 99%
can be achieved by balancing the power domain size and the CV-probability.
Although it is fairly easy to implement, the basic algorithm does not achieve the
full power saving potential of the CV-based power gating method. Since the
CV-probability is also directly related to the overall power saving, it should also be
considered when applying the CV-based power gating method. Furthermore, it is confirmed
that, as mentioned, the delay overhead is extremely large in the proposed method.
All these issues are further discussed and solved in the next chapter, which focuses
on optimizing the CV-based power gating method.
Chapter 4
Heuristic Optimization of
Expected Number of Sleep Gates
in Controlling Value Based Power
Gating
The major issue of the CV-based power gating method is to optimize the sum of the
products of the controlling value probability and the number of controlled gates.
Several heuristic optimization methods for CV-based power gating are described in this
chapter. In the optimization, we also consider the maximum depth issue mentioned for
the basic algorithm. In order to maintain the performance of the original circuit, a
timing constraint is added in the application of CV-based power gating, which prevents
the maximum depth from increasing. Hence, how to achieve considerable power saving
under the timing constraint becomes the new objective for the implementation algorithms.
After introducing the timing constraint, three new heuristic algorithms are
described: the N-based algorithm, the probability-first algorithm and the pN-based
algorithm. Each of them explores a different aspect that influences power saving in the
CV-based power gating method. Finally, by considering both power domain size and
CV-probability, the pN-based algorithm is shown to reduce the total power dissipation
significantly more than the other heuristics while maintaining the performance of the
original circuit.
By measuring the power reduction using the HSPICE simulator, it is observed that
an average reduction of 26% in total power dissipation can be achieved by using the
pN-based algorithm, which is approximately 10% more than the reduction obtained by the
other heuristics. The trade-off between the power saving and the performance is also
addressed, which indicates that the power saving can be significantly improved if a
small amount of delay overhead is tolerable.
4.1 Steady Maximum Depth Constraint
The delay overhead caused by the critical path increase is considered the major
drawback of the basic CV-based power gating algorithm. As measured in
Chapter 3, the depth after applying the basic algorithm can be over 50% larger than
the depth of the original circuit. In the worst case, all logic gates in the circuit might be
connected in series through sleep signals and sleep transistors, as shown in Figure 3-1.
After applying the CV-based power gating method, the critical path of the circuit
becomes Gate2 → Gate4 → Gate6 → Gate1 → Gate3 → Gate5 → Gate7, instead
of just Gate1 → Gate3 → Gate5 → Gate7 in the original circuit.
Since few LSI designers are willing to sacrifice the performance of their chips to obtain
energy saving, the delay overhead must be handled effectively in order to make the
CV-based power gating method more practical. Since the source of the critical path
increase is the insertion of sleep transistors and the connection of sleep signals, a
simple solution is to limit the selection of sleep signals: if the critical
path becomes longer by adding a power gating signal, then that control should not be
accepted.
As is known, the length of the critical path is usually equal to or shorter than the
maximum depth of a given circuit, since the longest path of a circuit, which produces
the maximum depth, might not be activated. Therefore, if it is possible to prevent
the maximum depth of the original circuit from increasing, the critical path change
can also be restrained.
Figure 4-1: Calculating the length of the new path when assigning signal i to control
the gate generating signal j.
In this section, a timing constraint is introduced to prevent the maximum depth
from increasing during the application of the CV-based power gating technique, so that
the performance of the original circuit can be kept the same. This timing constraint
is named the Steady Maximum Depth Constraint and will be referred to as SMDC in
the following discussion. The basic idea of the SMDC is to compute the length of the
possible new path before assigning a sleep signal to control a sleep gate, and then
compare the length of the new path with the maximum depth of the original circuit.
If the new path is longer than the original maximum depth, then this assignment will
not be applied.
To implement the SMDC, the depth from the primary inputs (DPI) and the depth from the
primary outputs (DPO) are computed and stored as basic information for each signal
in the original circuit. During the application of power gating, if a signal i is about to
be assigned to control gate j, the length of the new path is calculated as follows
and illustrated in Figure 4-1:
Dnew = DPI(i) + DPO(j) + 1 (4.1)
where Dnew is the length of the new path formed by this power gating assignment.
Let DMAX be the maximum depth of the original circuit. If Dnew ≤ DMAX, then it
is safe to apply this power gating. If Dnew > DMAX, this power gating is not applied
so that the critical path will not be enlarged. By performing this check every time
before a power gating control is applied, the maximum depth can be
kept the same as that of the original circuit.
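The check of Equation (4.1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this dissertation: the netlist, the signal names and the helper names `compute_depths` and `smdc_allows` are hypothetical, and the depth convention (DPI = 0 at primary inputs, DPO = 0 at primary outputs, each gate adds 1) is assumed.

```python
def compute_depths(gates):
    """Return (DPI, DPO) for every signal of a small netlist.

    `gates` maps each gate's output signal to the tuple of its input signals.
    Signals that never appear as a key are primary inputs; signals that feed
    no gate are primary outputs. Assumed convention: DPI = 0 at primary
    inputs, DPO = 0 at primary outputs, and each gate adds 1.
    """
    fanout = {}
    for out, ins in gates.items():
        for s in ins:
            fanout.setdefault(s, []).append(out)

    dpi, dpo = {}, {}

    def get_dpi(s):
        if s not in dpi:
            dpi[s] = (1 + max(get_dpi(x) for x in gates[s])) if s in gates else 0
        return dpi[s]

    def get_dpo(s):
        if s not in dpo:
            dpo[s] = (1 + max(get_dpo(x) for x in fanout[s])) if s in fanout else 0
        return dpo[s]

    for s in set(gates) | set(fanout):
        get_dpi(s)
        get_dpo(s)
    return dpi, dpo


def smdc_allows(dpi, dpo, d_max, sleep_signal, gate):
    """Equation (4.1): accept the assignment only if Dnew <= DMAX."""
    d_new = dpi[sleep_signal] + dpo[gate] + 1
    return d_new <= d_max


# Hypothetical netlist: a chain g1 -> g2 -> g3 and a side gate h1,
# joined by the output gate "out".
gates = {
    "g1": ("a", "b"),
    "g2": ("g1", "c"),
    "g3": ("g2", "d"),
    "h1": ("e", "f"),
    "out": ("g3", "h1"),
}
dpi, dpo = compute_depths(gates)
d_max = max(dpi.values())  # maximum depth of the original circuit

print(smdc_allows(dpi, dpo, d_max, "e", "g1"))   # Dnew = 0 + 3 + 1 = 4
print(smdc_allows(dpi, dpo, d_max, "h1", "g1"))  # Dnew = 1 + 3 + 1 = 5
```

In this toy netlist, primary input e may control g1 because the new path is no longer than the original maximum depth of 4, while h1 may not, since its own depth of 1 pushes the new path to 5.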
In addition, although using SMDC could maintain the maximum depth un-
changed, some shorter paths of the circuit can still increase due to the power gating
assignment. Therefore, the depth information of all gates related to the newly as-
signed sleep signal and the sleep gate should be recomputed before we check the next
assignment. As a result, the depth information is updated every time a new power
gating control is applied.
Based on the above discussion, it is obvious that the delay overhead issue caused
by the critical path change is handled in a trade-off style with respect to the expected
number of sleep gates. The maximum depth is maintained at the cost of losing the
opportunities of power-gating certain sleep gates.
The effect of the SMDC will be shown in the following sections, together with
the evaluation of heuristic optimization algorithms. Since the number of sleep gates
might be greatly reduced under the limitation of SMDC, the capability of finding
more sleep gates while keeping SMDC is an important criterion when evaluating the
heuristic optimization algorithms.
4.2 Important Factors in CV-based Power Gating
In order to develop effective algorithms, the factors that influence the power saving
should be analyzed. Generally speaking, to estimate the amount of power saving
obtained by a fine-grained power gating method, four factors can be considered:
(1) the power saving effect of each sleep gate.
(2) the overall number of sleep gates.
(3) the average sleep duration of sleep event for each power domain.
(4) the frequency of sleep events for each power domain.
In the CV-based power gating method, we assume that a built-in sleep transistor is
inserted in each sleep gate, hence every sleep gate can be considered as an individual
cell that has its own independent virtual power lines. Therefore, it is assumed that the
power saving effect of each sleep gate in a given period of sleep duration is the same.
The power saving of different types of sleep gates might not be identical, but this
difference is not considered in the sleep signal selection algorithms. The issue of the
power saving effect of each sleep gate will be addressed in Chapter 5, where the sleep
transistor sizing problem is discussed.
In addition, for each power domain, the sleep duration is actually the interval
when the sleep signal takes CV. Since the sleep signals are extracted from the original
circuit, the exact sleep duration for each sleep event varies largely depending on input
vectors, hence the sleep duration is nearly unpredictable. For large circuits, it is
infeasible to check all possible input patterns. Hence unless the input sequences have
certain orders, it is difficult to measure the sleep duration in CV-based power gating
method.
In this section, we focus on the remaining two factors when building the heuristic
algorithms, the overall number of sleep gates and the frequency of sleep events. For
a CV-based power gating circuit, the total power reduction can be estimated using
the following formula:
$$P_{total\ reduction} = \sum_{i=1}^{n_{domain}} P_0 \times N_i \times p_i \qquad (4.2)$$
where ndomain is the number of power domains, P0 is the power reduction per unit
time obtained from one power-off gate, Ni represents the total number of sleep gates
in power domain i and pi is the CV-probability of the sleep signal that controls power
domain i. pi corresponds to the frequency of sleep events of power domain i. As is
mentioned above, P0 is assumed to be the same for all types of sleep gates. Hence
in this dissertation, achieving the optimum case of CV-based power gating control is
defined as a problem of maximizing the value of Equation 4.3.
$$\sum_{i=1}^{n_{domain}} N_i \times p_i \qquad (4.3)$$
In the CV-based power gating method, a gate might have several candidates for its
sleep signal. An example is shown in Figure 4-2, where Gate2 can be controlled by
two signals, s1 and s2. Each of them has a different CV-probability, and there is an
optimum sleep signal for each potential sleep gate. According to Equation (4.3), for a
single gate the sleep signal candidate with the highest CV-probability always results in
the best power reduction. Hence finding the optimum sleep signal for each sleep gate
seems to be the best approach. However, focusing on the power saving of each sleep gate
individually will, in many cases, fail to obtain the overall maximum power reduction. The
CV-based power gating application on the circuit shown in Figure 4-2 is such an
example, in which using the optimum sleep signal for each sleep gate does not produce
the best overall power reduction. In the original circuit shown at the top
of the figure, Gate1, Gate2, Gate3 and Gate4 can be power-gated. For Gate1, signal
s1 is the only choice. For Gate2, s2 is chosen, since s2 has a higher CV-probability (0.8)
than s1. Gate3 can be controlled by three signals, and s3
wins by having the highest CV-probability (0.9).
Since s3 is the output of Gate4, Gate4 cannot be power-gated anymore. If Gate4 were
powered off, its floating output signal s3 would lead to the malfunction of Gate3 and
cause short-circuit current. In the CV-based power gating method, the gates generating
sleep signals cannot be power-gated, since they are responsible for keeping the sleep
signals clear and stable. Thus, the final power gating structure becomes the circuit
shown in Figure 4-2(b), whose ENSG is 1 × 0.7 + 1 × 0.8 + 1 × 0.9 = 2.4.
Figure 4-2: Counterexample of optimizing power reduction by adopting the best sleep
signal for each gate.
On the other hand, if we use the structure shown in Figure 4-2(a), then the
ENSG is 4 × 0.7 = 2.8, which is better than that of the structure in Figure 4-2(b).
Note that in this assignment, s1 is the worst candidate for Gate3. Actually, neither
(a) nor (b) is the optimum power gating structure of this circuit. By using s1 as the
sleep signal of Gate1 and using s2 as the sleep signal of Gate2, Gate3 and
Gate4, the resulting ENSG is 1 × 0.7 + 3 × 0.8 = 3.1, which is higher than the results
of both (a) and (b).
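The three ENSG values quoted for Figure 4-2 can be reproduced with a few lines of Python. This is only a worked check of Equation (4.3); the function name `ensg` and the variable names are hypothetical, and each power domain is written as a pair (number of sleep gates, CV-probability) taken from the text.

```python
import math


def ensg(domains):
    """Equation (4.3): sum of N_i * p_i over all power domains."""
    return sum(n * p for n, p in domains)


# Each entry is (number of sleep gates N_i, CV-probability p_i).
structure_b = [(1, 0.7), (1, 0.8), (1, 0.9)]  # best signal per gate, Figure 4-2(b)
structure_a = [(4, 0.7)]                      # all four gates under s1, Figure 4-2(a)
structure_best = [(1, 0.7), (3, 0.8)]         # Gate1 under s1, Gate2 to Gate4 under s2

for name, domains in [("(b)", structure_b), ("(a)", structure_a),
                      ("s1 + s2", structure_best)]:
    print(name, round(ensg(domains), 2))
```

The per-gate optimum (b) has the lowest value of the three, which is the point of the counterexample.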
This example shows that the sleep signal selection might affect later assignments,
and that searching for the optimal power domain clustering is more important than
finding the best sleep signal for each gate. Accordingly, the task of an optimal
algorithm would be to enumerate all possible cases of power domain clustering and choose
the one that produces the most power saving. The circuit shown in Figure 4-2 has
only three possible ways to cluster the power domains, so it is easy to enumerate all
cases. The enumeration, however, is not so easy for large circuits. Therefore, heuristic
optimization algorithms are necessary to cope with the complexity.
Based on the above discussion, two factors are mainly considered in the heuristic
power optimization algorithms shown in the next section. One is the number of sleep
gates in a power domain (N) and the other is the CV-probability of the corresponding
sleep signal (p). Both factors are directly related to the overall power reduction
according to Equation (4.2). By focusing on either of these two factors, a different
power domain clustering would be obtained. Since the efficiency of the CV-based power
gating method highly depends on the circuit structure, the relative importance of the
two factors is hard to decide. It is necessary to develop individual heuristics focusing
on either one of these two factors and check their efficiency on various circuits.
4.3 Heuristic Algorithms
In this section, three heuristic algorithms for CV-based power gating implementation
are introduced: a power-domain-size-based (N-based) algorithm, a probability-first
algorithm and a pN-value-based heuristic algorithm. Each of them selects the
sleep signals and sleep gates based on a different aspect that influences the energy
saving obtained by CV-based power gating. The efficiency of all algorithms is
evaluated in terms of ENSG (Expected Number of Sleep Gates) and compared with
each other. Since the efficiency of CV-based power gating highly depends on the
structure of the circuit, all three algorithms have their own advantages and drawbacks.
Note that all the results shown in this section are obtained under the SMDC
(Steady Maximum Depth Constraint), which means the expected power saving is
achieved without sacrificing the performance of the original circuit. In addition, the
trade-off characteristic of the steady maximum depth constraint is checked by allowing
some delay overhead tolerance on the constraint.
Even with the SMDC, all three algorithms achieve an average energy saving of at least
15% on the ISCAS’85 benchmark circuits, and an ENSG of over 37% is
observed on some circuits. In addition to the ENSG-based estimation, the benchmark
circuits are implemented using a 0.18 µm process technology and HSPICE simulation
is performed.
The basic algorithm described in Chapter 3 is the simplest CV-based power gating
algorithm, which selects the sleep signals based on the size of the power domains they
can control. Hence it can also be considered a heuristic focusing on N, and the basic
algorithm is indeed effective at finding the maximum number of sleep gates. The
problem of this algorithm is that its handling of logic gates with multiple fanouts is
too coarse, since in some cases logic gates with multiple fanouts can be
power-gated without causing any problem. As shown in Figure 4-3, both gates fed by the
output signal s4 of Gate4 are controlled by the same sleep signal s1. In this case, as
long as Gate4 is put to sleep at the same time as Gate2 and Gate3, no electrical
problems will be caused.
Figure 4-3: An example of controlling logic gates with fanout outputs.
Besides, the visiting order of the basic algorithm is also inappropriate since all
signals in the target circuit should be given opportunities to be evaluated together
regardless of their depth.
4.3.1 N-based Heuristic Algorithm
The N-based heuristic algorithm proposed in this chapter is an improvement of the
basic algorithm. Instead of visiting and checking the signals depth-wise, the N-based
algorithm searches all power domains with respect to sleep signal candidates, and the
sleep signals are selected based on the sizes of their power domains. The signal that
can control the largest number of logic gates is selected first as a sleep signal and
the logic gates controlled by this sleep signal are clustered as the first power domain.
The gate generating the sleep signal and all sleep gates are then marked and will not
be checked in later steps. Then, for the rest of the circuit, the same
process is executed until no signal is available as a sleep signal.
N-based algorithm:
  Build BDD for the target circuit;
  ---- Compute N and p for each signal ----
  For all primary output signals, do {
    For primary output i, do {
      Traversal(i);
    }
  }
  For all signals i in the target circuit, do {
    ComputeP(i);
  }
  ---- Search for sleep signals based on their N and form power domains ----
  For all unmarked signals, do {
    if N(j) is the largest and N(j) ≠ 0, do {
      Mark(j);
      ENSG ← ENSG + N(j) × cv-prob(j);
      Update_Depth();
      UpdateN();
    }
    else if N(j) is the largest and N(j) = 0, do {
      quit loop;
    }
  }
End N-based algorithm
Sub-function: UpdateN()
  ---- Used to recompute N for each signal after a power domain is clustered ----
  For all primary outputs, do {
    For primary output signal i, do {
      Traversal(i);
    }
  }
Figure 4-4: Pseudo code of N-based heuristic algorithm
To accomplish this procedure, the number of controllable logic gates must be computed
beforehand for every signal of the target circuit. In the N-based algorithm, the number
of controllable gates is calculated by using the same
backward tracing procedure as in the basic algorithm, with some modifications. One of
the most important modifications is the addition of a fanout checking mechanism, which
significantly increases the overall number of sleep gates. The pseudo code of the
N-based heuristic algorithm and of some of its sub-functions is shown in Figure 4-4,
Figure 4-5 and Figure 4-7 respectively.
As shown in Figure 4-4, the Binary Decision Diagram (BDD) of the target circuit
is built first for the later calculation of probabilities. Then, for each signal in the
circuit, N (the number of controllable gates) and p (the logic-0 and logic-1
probabilities) are computed by the sub-functions “Traversal()” and “ComputeP()”
respectively. Note that the Steady Maximum Depth Constraint (SMDC) is applied during
the computation of N, which means that a logic gate is counted only when the new path
created by power-gating it does not increase the maximum depth. At this point, all the
information required by the N-based algorithm is prepared.
The next step is to select the sleep signals and form power domains. In the N-
based algorithm, the visiting order is decided based on N for each sleep signal. Note
that the signals generated by logic gates that are already marked as sleep gates can
not be used as sleep signals, thus will not be considered in the N-comparison during
the search. The search is performed repeatedly until there is no signal available as
a sleep signal. For each searching round, the signal with the largest N among all
the unmarked signals is selected as a sleep signal, and the logic gates that can be
controlled by the signal are then clustered as a power domain and the sleep signal
is assigned to each of them. This task is accomplished by a sub-function referred
to as “Mark()”, in which the clustered sleep gates and the gate generating the newly
selected sleep signal are also marked so that they will not be touched in later
steps.
After each power domain is set, some follow-up work is necessary since the new power
domain could affect the unmarked gates. For example, the N of each unmarked
control signal candidate might change since the already-marked gates cannot be
controlled by other sleep signals anymore. For the remaining unmarked signals, the
newly-marked sleep gates should be removed from their lists of controllable gates, so
the N of those signals decreases. The sub-function “UpdateN()” in the N-based algorithm
performs this update.
The depth of some of the unmarked signals may also be changed by the new
clustering. As mentioned before, the lengths of certain paths still increase after
some gates on these paths are controlled, even though they do not exceed the maximum
depth of the original circuit. Take the structure in Figure 4-1 as an example. For
all unmarked gates which are involved in generating the newly-assigned sleep signal
i, if DPO(j) + 1 is larger than DPO(i), the DPOs of those unmarked gates might
correspondingly be increased. On the other hand, for all unmarked gates which are
directly or indirectly driven by the newly-selected sleep gate j, if DPI(i) + 1 is
larger than DPI(j), the DPIs of those unmarked gates should be enlarged. In those
cases, the depth information of the related unmarked gates must be recomputed before
the update of N, since the depth information is required during the computation of N
due to the steady maximum depth constraint. The sub-function “Update_Depth()”
is developed to perform this re-computation. To estimate the power saving potential,
the ENSG is also calculated during the procedure.
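The selection loop described above can be sketched in Python as follows. This is a deliberately simplified illustration, not the C implementation of this dissertation: the candidate lists are assumed to be given (as if already produced by “Count()” under the SMDC), the depth update and the fanout re-check are omitted, and the function name `n_based_selection` and its data layout are hypothetical. The example data loosely follow the Figure 4-2 discussion, with s1 and s2 assumed to be primary inputs.

```python
def n_based_selection(candidates, producer, cv_prob):
    """Greedy N-based selection (simplified sketch).

    candidates[s]: set of gates that signal s could control
    producer[s]:   gate that generates s, or None for a primary input
    cv_prob[s]:    CV-probability of s
    """
    sleep_gates = set()  # gates already assigned to a power domain
    generators = set()   # gates that generate a selected sleep signal
    domains = {}         # selected sleep signal -> its power domain
    ensg = 0.0

    def n(s):
        # Role of UpdateN(): only gates that are still unmarked count towards N.
        return len(candidates[s] - sleep_gates - generators)

    while True:
        # A signal generated by a sleep gate cannot be used as a sleep signal.
        eligible = [s for s in candidates
                    if s not in domains and producer[s] not in sleep_gates]
        if not eligible:
            break
        best = max(eligible, key=n)
        if n(best) == 0:
            break
        # Role of Mark(): freeze the new domain and the signal's generator.
        domain = candidates[best] - sleep_gates - generators
        domains[best] = domain
        sleep_gates |= domain
        if producer[best] is not None:
            generators.add(producer[best])
        ensg += len(domain) * cv_prob[best]
    return domains, ensg


# Hypothetical candidate lists, loosely following the Figure 4-2 discussion.
candidates = {
    "s1": {"Gate1", "Gate2", "Gate3", "Gate4"},
    "s2": {"Gate2", "Gate3", "Gate4"},
    "s3": {"Gate3"},
}
producer = {"s1": None, "s2": None, "s3": "Gate4"}  # s3 is the output of Gate4
cv_prob = {"s1": 0.7, "s2": 0.8, "s3": 0.9}

domains, ensg = n_based_selection(candidates, producer, cv_prob)
print(domains, round(ensg, 2))
```

On this data the sketch selects s1 for all four gates, which corresponds to the structure of Figure 4-2(a) rather than the optimum, illustrating the limitation of focusing on N alone.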
Figure 4-5 shows the “Traversal” sub-function, which computes the number of
controllable gates (N). During the parsing of an input circuit file, the gates with more
than two inputs are divided into 2-input gates, so the circuit is converted to
a structure with only 2-input gates. Therefore, in the Traversal function, a preorder
traversal approach can be used to visit the signals related to a certain primary output.
Sub-function: Traversal(id)
  ---- Used to compute N for each signal visited ----
  i = input_1(id); j = input_2(id);
  if signal i ≠ primary input and marked(i) = 0 and visited(i) = 0, do {
    if fanout i = 0, do {
      Count(i, j);
    }
  }
  if signal j ≠ primary input and marked(j) = 0 and visited(j) = 0, do {
    if fanout j = 0, do {
      Count(j, i);
    }
  }
  visited(id) ← 1;
  if signal i = primary input and j = primary input, do {
    return;
  }
  else, do {
    Traversal(i);
    Traversal(j);
  }
Sub-function: Count(id, Sleep_signal)
  ---- Used to trace backward from id and find gates that can be controlled by Sleep_signal ----
  if signal id meets all of the following requirements:
    a. marked(id) = 0;
    b. id ≠ primary input;
    c. all fanout gates of id are also controllable by Sleep_signal
  then, do {
    if DPI(Sleep_signal) + DPO(id) < DMAX, do {
      N(Sleep_signal) ← N(Sleep_signal) + 1;
      add Sleep_signal to candidate_list(id);
      Count(input_1(id), Sleep_signal);
      Count(input_2(id), Sleep_signal);
    }
  }
Figure 4-5: Pseudo code of sub-functions used in N-based heuristic algorithm
In usual circuits, signals are related to multiple primary outputs, so a “visited”
flag is introduced to avoid repeated visits. For a visited gate id, which denotes its
output signal, the Traversal procedure computes the number of controllable gates for
each of its inputs using a sub-function “Count()”, which is also shown
in Figure 4-5. The “Count()” function takes two signals as parameters: one is the
current gate and the other is one of its inputs. From the input signal, the controllable
gates are traced. For example, Count(id, j) computes N for signal j, which
is one of the input signals of gate id. The “Count()” function also uses a preorder
traversal to check the controllable gates for signal j. Note that we should start from
the other input of gate id when computing the controllable block for j. For each signal
visited by the “Count()” function, the following three conditions should be met for the
visited gate to be controlled by signal j:
(1) The gate should not be marked; in other words, the gate should not have
already been selected as a sleep gate for another signal.
(2) The visited signal should not be a primary input. Primary inputs are managed as a
special kind of gate, and “Count()” stops the traversal at primary inputs.
(3) If the visited gate has multiple fanouts, then all of its fanout successor gates
are checked. As shown in Figure 4-3, if all fanout gates are also counted as
controllable gates of the candidate sleep signal j, then the visited gate with fanout
also qualifies as a controllable gate of signal j.
If the visited gate passes this qualification, then the Steady Maximum Depth
Constraint (SMDC) is checked. The length of the possible new path formed by
power-gating the visited gate with signal j is computed and compared with the
maximum depth of the original circuit. If the SMDC is not violated, the visited gate can
be counted as a controllable gate for signal j. For the convenience of later processing,
a candidate list is set up for each gate; if the visited gate can be controlled by signal
j, then j is added to its candidate list. “Count()” then continues to visit the
input gates of the currently visited gate in order to check whether the input gates
can also be controlled by signal j. Hence the “Count()” sub-function is a recursive
procedure, which stops when the visited gate does not meet the above three conditions.
In this way, “Count()” finds all gates that are controllable by signal j.
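The recursion of “Count()” can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the implementation of this dissertation: gates are named by their output signals, the netlist and the depth values are hypothetical (loosely modeled on Figure 4-3), the `root` argument (the gate that takes the sleep signal as one of its inputs) and the guard that never gates the sleep signal itself are additions made for this example, and a gate with several fanouts is accepted only once all of its fanout gates have already been collected.

```python
def count(sig, sleep, root, net, controllable):
    """Collect the gates in the fan-in cone of `sig` that `sleep` could power-gate."""
    if sig in controllable or sig == sleep:  # already collected, or the sleep signal itself
        return
    if sig in net["marked"]:                 # condition (1): not already a sleep gate
        return
    if sig not in net["gates"]:              # condition (2): stop at primary inputs
        return
    # condition (3): every fanout gate is the root or already controllable
    if not all(f == root or f in controllable for f in net["fanout"][sig]):
        return
    # SMDC as written in Figure 4-5: DPI(sleep) + DPO(sig) < DMAX
    if net["dpi"][sleep] + net["dpo"][sig] < net["d_max"]:
        controllable.add(sig)
        for x in net["gates"][sig]:
            count(x, sleep, root, net, controllable)


# Hypothetical netlist: root gate R has inputs s1 (sleep signal) and t.
# Gate s4 fans out to both g2 and g3, as in the situation of Figure 4-3.
gates = {
    "R": ("s1", "t"),
    "t": ("g2", "g3"),
    "g2": ("s4", "a"),
    "g3": ("s4", "b"),
    "s4": ("c", "d"),
}
fanout = {}
for out, ins in gates.items():
    for s in ins:
        fanout.setdefault(s, []).append(out)

net = {
    "gates": gates,
    "fanout": fanout,
    "marked": set(),
    "dpi": {"s1": 0},                             # s1 is a primary input
    "dpo": {"t": 1, "g2": 2, "g3": 2, "s4": 3},   # depth from the primary output R
    "d_max": 4,
}

controllable = set()
count("t", "s1", "R", net, controllable)
print(sorted(controllable), len(controllable))    # len(...) plays the role of N(s1)
```

The gate s4 is rejected when it is first reached through g2, because g3 has not been collected yet, and accepted when it is reached again through g3. This order dependence is a property of the sketch, not a claim about the original implementation.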
By applying the above procedure to both inputs, the number of controllable gates
can be obtained for both input signals of the gate id visited by the “Traversal()”
function. “Traversal()” then marks the current gate id as visited and moves on to visit
the input gates of gate id. The “Traversal()” procedure stops when it reaches the
primary inputs. Therefore, by performing the “Traversal()” procedure one by one
from all primary outputs, all gates in the target circuit can be visited and the N
values can be obtained for every signal, including the primary inputs, which can
also serve as sleep signals.
Figure 4-6: Multiple power domains can be controlled by one signal.
Note that one sleep signal can control multiple power domains in the CV-based
power gating, as shown in Figure 4-6. A signal with multiple fanouts can form a
power domain for every fanout. In this case, the number of controllable gates from
each fanout should be accumulated together as the N of this fanout signal.
The probabilities of taking logic-0 and logic-1 are computed by a sub-function
referred to as “ComputeP()”, the detailed procedure of which is shown in Figure 4-7.
As mentioned before, the logic-0 and logic-1 probabilities of the primary inputs can be
specified. In this section, the logic-1 and logic-0 probabilities of the primary inputs
are both 0.5. Note that the logic-0 and logic-1 probabilities of the constant-0-node are
1.0 and 0.0, respectively, and the logic-0 and logic-1 probabilities of the
constant-1-node are 0.0 and 1.0, respectively.
The “ComputeP()” function is a recursive procedure. According to Equations
(2.2) and (2.3), the function calculates the probabilities of a certain signal using the
corresponding probabilities of the left and right children in the BDD representation
of the corresponding logic function. Hence the procedure recursively traces the lower
level children of the current BDD node until it reaches the constant nodes. Since
the probabilities of the constant nodes are given beforehand, the probabilities of the
nodes whose children are all constant nodes can be calculated. The “ComputeP()”
procedure then uses these newly-computed probabilities to calculate the higher level
nodes, and so on. Finally, the logic-0 and logic-1 probabilities of
the originally requested node are obtained.
Sub-function: ComputeP(id)
  ---- Used to compute the CV-probability for every signal in the circuit ----
  *** The 0 and 1 probabilities of the constant-0-node are set to 1.0 and 0.0 beforehand.
  *** The 0 and 1 probabilities of the constant-1-node are set to 0.0 and 1.0 beforehand.
  ---- Recursively compute probability for each signal ----
  if node(id) = constant-0-node or constant-1-node, do {
    return;
  }
  else, do {
    if left_child(id) hasn’t been computed yet, do {
      ComputeP(left_child(id));
    }
    if right_child(id) hasn’t been computed yet, do {
      ComputeP(right_child(id));
    }
    Prob0(id) ← 1/2 × (Prob0(left_child) + Prob0(right_child));
    Prob1(id) ← 1/2 × (Prob1(left_child) + Prob1(right_child));
  }
Sub-function: Mark(id)
  ---- Used to assign sleep signal and mark sleep gates ----
  Search in the candidate list of every gate, do {
    if id is a candidate for gate i, do {
      sleep_signal(i) ← id;
      marked(i) ← 1;
      visited(i) ← 1;
    }
  }
Figure 4-7: Pseudo code of sub-functions used in N-based heuristic algorithm
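The recursion of “ComputeP()” can be sketched in Python as follows. This is a minimal illustration, not the implementation of this dissertation: the `Node` class, the terminal markers `ZERO` and `ONE` and the function name `compute_p` are hypothetical, the left and right children are assumed to be the 0-branch and 1-branch of a BDD node, and every primary input is assumed to have a logic-1 probability of 0.5, as in this section.

```python
class Node:
    """Internal BDD node; `low` is the 0-branch child, `high` the 1-branch child."""

    def __init__(self, var, low, high):
        self.var, self.low, self.high = var, low, high


ZERO, ONE = "const0", "const1"  # terminal (constant) nodes


def compute_p(node, memo=None):
    """Return (Prob0, Prob1) of the function rooted at `node`."""
    if memo is None:
        memo = {}
    if node is ZERO:
        return (1.0, 0.0)
    if node is ONE:
        return (0.0, 1.0)
    if id(node) not in memo:  # compute each shared node only once
        p0_low, p1_low = compute_p(node.low, memo)
        p0_high, p1_high = compute_p(node.high, memo)
        memo[id(node)] = (0.5 * (p0_low + p0_high), 0.5 * (p1_low + p1_high))
    return memo[id(node)]


# BDD of the function a AND b: if a = 0 the result is 0, otherwise test b.
node_b = Node("b", ZERO, ONE)
and_ab = Node("a", ZERO, node_b)
print(compute_p(and_ab))  # (0.75, 0.25)
```

For a two-input AND with independent inputs at probability 0.5, the output is 1 in one of four cases, which matches the computed logic-1 probability of 0.25.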
Although not shown in the N-based algorithm of Figure 4-4, a checking
mechanism is used before the calculation of ENSG: after each sleep signal is selected
and the corresponding power domain is formed, the type of the successor gate of the
newly-selected sleep signal is examined. If the successor gate is a NAND/AND
gate, the logic-0 probability of the sleep signal is the CV-probability, while if
the successor gate is a NOR/OR gate, the logic-1 probability is the CV-probability. The
chosen probability is then used together with the N of the sleep signal to compute the
ENSG. This mechanism is essential not only for the estimation of power saving but also
for real circuit design, because it decides the type of the sleep transistors.
For NAND/AND gates, nMOS footer sleep transistors should be adopted, while
for NOR/OR gates, pMOS header sleep transistors are used.
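The rule above amounts to a small lookup, sketched here in Python. The function name `cv_choice` and the returned dictionary layout are hypothetical; only the mapping itself (AND/NAND: controlling value 0 with an nMOS footer, OR/NOR: controlling value 1 with a pMOS header) is taken from the text.

```python
def cv_choice(successor_gate_type, prob0, prob1):
    """Pick the CV-probability and sleep transistor type for a sleep signal.

    prob0 / prob1 are the logic-0 / logic-1 probabilities of the sleep signal.
    """
    gate = successor_gate_type.upper()
    if gate in ("AND", "NAND"):
        # Controlling value of AND/NAND is 0.
        return {"controlling_value": 0, "cv_prob": prob0,
                "sleep_transistor": "nMOS footer"}
    if gate in ("OR", "NOR"):
        # Controlling value of OR/NOR is 1.
        return {"controlling_value": 1, "cv_prob": prob1,
                "sleep_transistor": "pMOS header"}
    raise ValueError("no controlling value defined for gate type: " + gate)


print(cv_choice("NAND", prob0=0.75, prob1=0.25))
print(cv_choice("NOR", prob0=0.75, prob1=0.25))
```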
A sub-function named “Mark()” is shown in Figure 4-7, which is used to mark
all the sleep gates after a sleep signal is selected. Using the candidate list generated
during the process of “Count”, the “Mark” function searches for all gates which
have the newly-selected sleep signal as a controlling candidate. These gates are then
marked as sleep gates and the new sleep signal is assigned to each of them to control
their sleep transistors. In addition, the gates in the new power domain are further
marked as visited so that they shall not be visited by the algorithm for further search
of sleep gates and signals.
The N-based algorithm is more complex than the basic algorithm, and much better
results are expected from the new handling strategy for fanout gates and the
order-independent open search structure. The steady maximum depth constraint is included
in the N-based algorithm, so the maximum depth is the same as that of the original
circuit. Because of this constraint, the achievable ENSG might be reduced.
Table 4.1: The estimated power saving results obtained by using N-based algorithm
(with steady maximum depth constraint)
        Circuit info.                N-based algorithm
Name    Gate    Original   No. of sleep   No. of sleep   ENSG            Depth     Run
        number  depth      gates          signals                        with PG   time
C432     252    31           72            34             43.1 (17.1%)   31        0.5s
C499     454    19           48            40             20.0 (4.4%)    19        0.8s
C880     435    30          194            58             60.7 (13.9%)   30        0.4s
C1355    590    26           48            40             20.0 (3.4%)    26        0.6s
C1908   1057    44          527            62            246.7 (23.3%)   44        1.3s
C2670   1400    39          821           139            355.0 (25.4%)   40        1.3s
C3540   1983    56         1162           196            671.5 (33.9%)   56        5.3s
C5315   2973    52         1137           248            445.4 (15.0%)   52        3.2s
C7552   4042    45         1987           443            835.9 (20.7%)   46        7.1s
We have implemented the N-based algorithm in C and run it
on a personal computer (Intel Core 2 CPU 2.4GHz, 2038MB memory,
Ubuntu Linux 9.04 operating system).
Table 4.1 shows the power saving obtained by using the N-based algorithm in
terms of Expected Number of Sleep Gates (ENSG). Even with the Steady Maximum
Depth Constraint (SMDC), the N-based algorithm can still achieve considerable
power savings, with an average ENSG of 17.4%. Note that ENSG corresponds to the
total power reduction and no delay overhead caused by the critical path increase is
observed. As predicted, the results vary largely among circuits. For C3540, over 33%
of ENSG is achieved, while for C499 and C1355, the ENSG hardly reaches 4%. The
variation of power saving with circuit structure will be discussed at the
end of this chapter, after all heuristic algorithms are shown.
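The ENSG percentages and their average can be recomputed from the gate counts and ENSG values of Table 4.1. The short Python check below uses only the numbers printed in the table; the variable names are hypothetical, and the average is taken as the plain mean of the per-circuit percentages, which is an assumption about how the 17.4% figure was obtained.

```python
# (gate number, ENSG) per circuit, copied from Table 4.1
table_4_1 = {
    "C432": (252, 43.1),
    "C499": (454, 20.0),
    "C880": (435, 60.7),
    "C1355": (590, 20.0),
    "C1908": (1057, 246.7),
    "C2670": (1400, 355.0),
    "C3540": (1983, 671.5),
    "C5315": (2973, 445.4),
    "C7552": (4042, 835.9),
}

# ENSG as a percentage of the total number of gates in each circuit
percent = {name: 100.0 * e / g for name, (g, e) in table_4_1.items()}
average = sum(percent.values()) / len(percent)

for name, value in percent.items():
    print(f"{name}: {value:.1f}%")
print(f"average: {average:.2f}%")
```

Under this assumption the mean comes out slightly above 17.4%, consistent with the reported value up to rounding.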
Compared with the basic algorithm shown in the last chapter, the average ENSG
decreases from 19% to 17% because of the application of the SMDC. However,
considering the 50% performance loss observed in the basic algorithm, the 2% reduction
is acceptable. The N-based algorithm seems to be the one that suffers the least from
the SMDC, since the loss of sleep gates caused by the addition of the SMDC is small.
The N-based algorithm is an improved version of the basic algorithm, which
focuses on finding the maximum number of sleep gates. The sleep signal selection is based
on the number of gates in each power domain. The N-based algorithm is simple, but
it includes a mechanism for handling re-convergent fanout. Although the N-based
algorithm might not be able to achieve the highest power saving, because it
focuses on only one of the aspects of power saving, it is still effective for large
circuits where more complex algorithms become too time consuming. Many sub-functions
of the N-based algorithm are reused in the heuristic algorithms introduced later.
4.3.2 Probability-First Heuristic Algorithm
The probability-first algorithm focuses on the CV-probability, which is one of the
important factors that influence the power saving in the CV-based power gating method.
In several circuits, the signals controlling large power domains might have extremely low
CV-probabilities, which lowers their power saving. As shown in Equation
(4.2), the CV-probability corresponds to the percentage of time that a power domain
spends in the power-off state. Therefore, the higher the CV-probability is, the more
power can be saved for a power domain of a given size.
Similar to the N-based algorithm, the probability-first algorithm searches
the whole target circuit for proper sleep signals. In the N-based algorithm, the only
information required to accomplish the sleep signal selection is the number of
controllable gates (N) in a power domain. The probabilities of taking logic-0 and
logic-1 are also calculated together with N before the search, but they are used only for
the ENSG computation. The probability-first algorithm, on the other hand, uses
the CV-probability of every signal in the target circuit as the key in the sleep signal
search. The probability-first algorithm selects the sleep signals based on their
CV-probabilities, from larger ones to smaller ones.
The pseudo code of the probability-first algorithm is shown in Figure 4-8. First,
the previously introduced “Traversal()” function is performed to generate the
sleep signal candidate lists, during which the qualification checks and the SMDC
check are carried out. The information N, which is not required in the
probability-first algorithm, is also
Probability-first algorithm:
Build BDD for the target circuit;
---- Compute N and p for each signal ----
For each primary output i, do {
    Traversal(i);
}
For each signal i in the target circuit, do {
    ComputeP(i);
    CVprob(i);
}
---- Search for sleep signals based on their CV-probability and form power domains ----
For all unmarked signals j, do {
    if cv-prob(j) is the largest among the unmarked signals, do {
        Mark(j);
        ** ENSG ← ENSG + N(j) × cv-prob(j);
        Update Depth;
        ** UpdateN;
    }
    if all unmarked gates have empty candidate lists, do {
        quit loop;
    }
}
End probability-first algorithm
Figure 4-8: Pseudo code of probability-first heuristic algorithm
calculated for the convenience of power saving estimation. The logic-0 and logic-1
probabilities of every signal in the target circuit are then calculated by using the
BDD generated beforehand.
A sub-function named “CVprob()” is performed after the probability calculation
in order to determine whether the logic-1 probability or the logic-0 probability should
be used as the CV-probability for each signal. At this point, all the information
needed to execute the probability-first algorithm has been prepared.
As the next step, the probability-first algorithm checks all unmarked signals in
the target circuit and the signal with the highest CV-probability will be selected.
The gates controlled by the selected signal are then clustered as a power domain and
Sub-function: CVprob(id)
---- Used to decide which probability is the controlling value probability for signal id ----
if signal id is a primary input, do {
    cv-prob(id) ← 0.5;
}
else if signal id has only one fanout, do {
    j ← the successor gate of id;
    if j is an AND/NAND gate, do {
        cv-prob(id) ← Prob0(id);
    }
    else if j is an OR/NOR gate, do {
        cv-prob(id) ← Prob1(id);
    }
}
else if signal id has multiple fanouts, do {
    cv-prob(id) ← 0;
    for each successor gate j of id, do {
        if j is an AND/NAND gate, do {
            cv-prob(id) ← max(cv-prob(id), Prob0(id));
        }
        else if j is an OR/NOR gate, do {
            cv-prob(id) ← max(cv-prob(id), Prob1(id));
        }
    }
}
Figure 4-9: Pseudo code of the sub-function CVprob used in the probability-first
heuristic algorithm
the newly-selected signal is assigned to all of them as their sleep signal. As in
the N-based algorithm, these sleep gates and the gate generating the sleep signal are
marked after the clustering so that they are not checked again during further
processing. After the power domain is formed, the depth information is updated and
the algorithm moves to the next search round. The probability-first algorithm ends
when no signal is available as a sleep signal.
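The selection loop described above can be sketched as follows. This is a minimal illustration rather than the dissertation's C implementation: the function name `probability_first`, the dictionary-based circuit model, and the example values are assumptions made for this sketch. Each signal is identified with the gate that generates it, and the depth update and the SMDC check are left out.

```python
def probability_first(cv_prob, controllable):
    """Greedy sleep-signal selection in descending order of CV-probability.

    cv_prob:      signal -> CV-probability (hypothetical input format)
    controllable: signal -> set of gates the signal could power-gate
    A signal is identified with the gate that generates it.
    Returns (list of (sleep signal, power domain), ENSG).
    """
    marked = set()   # sleep gates and gates generating sleep signals
    domains = []
    ensg = 0.0
    # CV-probabilities do not change during the search, so one sort is
    # equivalent to repeatedly picking the largest unmarked signal.
    for sig in sorted(cv_prob, key=cv_prob.get, reverse=True):
        if sig in marked:
            continue
        domain = controllable.get(sig, set()) - marked
        if not domain:
            continue          # nothing left for this signal to control
        marked |= domain
        marked.add(sig)
        domains.append((sig, domain))
        ensg += cv_prob[sig] * len(domain)
    return domains, ensg


# Hypothetical five-gate example (not taken from the benchmarks).
example_cv = {"a": 0.9, "b": 0.7, "c": 0.4}
example_ctl = {"a": {"c", "d"}, "b": {"a", "c", "d", "e"}, "c": {"f"}}
print(probability_first(example_cv, example_ctl))
```

In this example the high-probability signal `a` is taken first even though `b` could control more gates, which is the behaviour the probability-first heuristic is designed to have.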
Unlike in the N-based algorithm, updating N is not required in the
probability-first algorithm because N is not used during the search for sleep
signals. The ENSG calculation and “UpdateN()” are therefore optional and are marked
with “**”. N must nevertheless be updated before the final estimation.
Table 4.2: Estimation results of the probability-first algorithm (with steady
maximum depth constraint)
Circuit  Gates  Original depth  No. of sleep gates  No. of sleep signals  ENSG           Depth with PG  Run time
C432     252    31              70                  35                    46.6 (18.5%)   31             0.9s
C499     454    19              24                  24                    14.0 (3.1%)    19             1.5s
C880     435    30              114                 53                    59.6 (13.7%)   30             0.8s
C1355    590    26              24                  24                    14.0 (2.4%)    26             1.6s
C1908    1057   44              231                 140                   166.3 (15.7%)  44             2.0s
C2670    1400   39              365                 187                   194.5 (13.9%)  40             3.5s
C3540    1983   56              957                 204                   735.3 (37.1%)  56             10.8s
C5315    2973   52              163                 132                   126.6 (4.3%)   52             16.1s
C7552    4042   45              865                 422                   554.2 (13.7%)  46             12.1s
The sub-function “CVprob()” is described in Figure 4-9. In the CV-based power
gating method, whether the logic-0 or the logic-1 probability should be selected as
the CV-probability depends entirely on the type of the successor gates. In the
“CVprob” function, the signals are checked as follows. If the checked signal is a
primary input, the CV-probability is 0.5 regardless of the type of its successor
gates, because both the logic-0 and the logic-1 probability of a primary input are
assumed to be 0.5. For other signals with only one fanout, if the successor gate is
an AND/NAND gate, the CV-probability is the probability of taking logic-0, whereas
if the successor gate is an OR/NOR gate, the logic-1 probability is used as the
CV-probability. For signals with fanouts to multiple successor gates, we check each
fanout and its successor gate and take the highest controlling value probability over
all fanouts. This is a simplification. Since a fanout signal can power-gate one power
domain through each of its fanouts, the effective CV-probability might be higher
than the CV-probability of any single fanout. However, without the number of
controllable gates behind each fanout there is no weight for each fanout, so the
effective CV-probability is difficult to obtain. This problem of the
probability-first algorithm is addressed by the next heuristic algorithm.
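The rule above can be written compactly as a short function. The sketch below is only an illustration: the name `cv_probability`, the argument layout, and the use of gate-type strings are assumptions, and the logic-0 probability is taken as one minus the logic-1 probability. Successor gates without a controlling value (for example XOR) contribute nothing.

```python
def cv_probability(prob1, successor_types, is_primary_input=False):
    """CV-probability of one signal.

    prob1:            probability that the signal takes logic-1
    successor_types:  gate types fed by the signal, e.g. ["AND", "NOR"]
    is_primary_input: primary inputs are assumed to be 0.5 / 0.5
    """
    if is_primary_input:
        return 0.5
    prob0 = 1.0 - prob1
    best = 0.0
    for gate_type in successor_types:
        if gate_type in ("AND", "NAND"):      # controlling value is logic-0
            best = max(best, prob0)
        elif gate_type in ("OR", "NOR"):      # controlling value is logic-1
            best = max(best, prob1)
    return best


# Hypothetical signal that is 1 for 30% of the time and feeds an AND and an OR gate.
print(cv_probability(0.3, ["AND", "OR"]))
```

With a single successor the loop reduces to the one-fanout case of Figure 4-9, and with several successors it returns the maximum over the fanouts, which is exactly the simplification discussed above.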
We have also implemented the N-based algorithm as a C program and ran
Table 4.3: Estimation results of the probability-first algorithm without steady
maximum depth constraint
Circuit  Gates  Original depth  No. of sleep gates  No. of sleep signals  Depth with PG  ENSG            Run time
C432     252    31              132                 83                    53             77.3 (30.6%)    0.391
C499     454    19              184                 126                   35             97.2 (21.4%)    0.792
C880     435    30              168                 129                   43             104.0 (23.9%)   0.570
C1355    590    26              184                 126                   42             97.2 (16.5%)    0.947
C2670    1400   40              687                 440                   66             470.4 (33.6%)   2.365
C3540    1983   56              1129                624                   73             837.7 (42.2%)   4.140
C5315    2973   52              1612                1045                  71             1058.2 (35.6%)  10.215
C6288    2416   124             465                 465                   154            417.5 (17.3%)   361.25
C7552    4042   45              2047                1459                  61             1270.5 (31.4%)  10.444
the algorithm on a personal computer (Intel Core 2 CPU 2.4GHz, 2038MB memory,
Ubuntu Linux 9.04 Operating System).
The probability-first algorithm is also applied to the ISCAS’85 benchmark circuits
and the estimation results are shown in Table 4.2. On average, a power saving of
13.6% is estimated for the probability-first algorithm. Although this average is
lower than that of the N-based algorithm, the probability-first algorithm gives
better results for circuits like C432 and C3540. Besides, if we remove the SMDC,
the results improve greatly. As shown in Table 4.3, an average estimated power
reduction of over 28% is observed, which is about twice the result under the SMDC.
The probability-first algorithm therefore suffers significantly from the addition
of the SMDC.
Usually, the signals with high CV-probability tend to be the outputs of high-depth
gates. Therefore, using these high-probability signals to power-gate other parts of
the circuit often violates the Steady Maximum Depth Constraint (SMDC). Hence, under
the SMDC, the loss of power gating control in the probability-first algorithm is
more serious than in the N-based algorithm, in which large power domains can be
power-gated by low-depth sleep signals.
The probability-first algorithm still needs improvement in two respects: its
handling of the SMDC and the determination of the CV-probability for fanout signals.
Note, however, that the probability-first algorithm can achieve considerable
expected power saving on certain circuit structures, so its potential should not be
neglected. If some performance loss is acceptable, the probability-first algorithm
can raise the power saving.
4.3.3 pN-Based Heuristic Algorithm
In Section 4.2, two important factors that influence the power saving in the
CV-based power gating are identified: one is the number of sleep gates (N) and the
other is the CV-probability (p). In the previous discussion, two heuristic algorithms
are described, each of which focuses on one of these two factors. The N-based
algorithm, which selects the sleep signals based on the number of logic gates that
can be controlled by these signals, offers an acceptable ENSG and is not strongly
affected by the SMDC. The N-based algorithm, however, can hardly achieve
significantly higher power saving even without the timing constraint. The
probability-first algorithm, on the other hand, shows a lower average power
reduction but achieves a remarkable increase in power saving once the SMDC is
removed or relaxed.
These heuristics, each focusing on only one aspect of power saving, are relatively
simple to implement and fast to run. Therefore, they both have their place when the
target circuit has certain logic structures or is too large for more complex
algorithms to finish in a reasonable period of time.
According to Equation (4.2), with the assumption that P0 is the same for all types
of sleep gates, the problem of maximizing the power saving is actually a problem of
maximizing the summation of p × N over all power domains. The optimum solution
could be found by enumerating all feasible power domain clusterings and choosing the
one that produces the largest $\sum pN$. However, in the CV-based power gating
method, a signal can usually control a large number of logic gates and, conversely,
a gate may also have a large number of sleep signal candidates. Therefore, even for
a not-so-large circuit, the number of possible clusterings is too large to check
exhaustively.
In this section, based on the two heuristic algorithms already introduced, a new
heuristic focusing on both power domain size and CV-probability is proposed, which
is referred to as the pN-based algorithm in this dissertation. In the pN-based
algorithm, the product of the number of controllable gates (N) and the
CV-probability of the candidate sleep signal (p) is adopted as the criterion for
selecting sleep signals and clustering power domains. Therefore, both N and p should
be computed before searching for the sleep signals. Since the CV-probability (p) and
the number of controllable gates (N) can both be considered as power saving
capabilities of a sleep signal, a pN value is computed for every signal in the
target circuit in preparation for searching and selecting sleep signals.
The pN-based algorithm then searches the whole circuit, comparing the pN values,
and the signal with the highest pN value is selected first. The logic gates that can
be power-gated by the selected signal are then clustered into one power domain (or
multiple domains in the case that the sleep signal has multiple fanouts). The
clustered sleep gates and the gate generating the sleep signal are marked in the
same way as in the previous algorithms. In the next round, the pN-based algorithm
continues to search among the unmarked signals and again the one with the highest
pN value is selected, and so on. The same procedure is repeated until there is no
signal available as a sleep signal, that is, the pN-based algorithm ends when all
the remaining signals have a pN value of 0.
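The search-cluster-update loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's implementation: the function name `pn_based` and the dictionary inputs are hypothetical, each signal carries a single CV-probability (the per-fanout accumulation for multi-fanout signals is ignored), a signal is identified with the gate that generates it, and the depth update and the SMDC check are omitted.

```python
def pn_based(cv_prob, controllable):
    """Greedy sleep-signal selection by the largest p x N value.

    cv_prob:      signal -> CV-probability p (hypothetical input format)
    controllable: signal -> set of gates the signal could power-gate
    Returns (list of (sleep signal, power domain), ENSG).
    """
    marked = set()   # sleep gates and gates generating sleep signals
    domains = []
    ensg = 0.0
    while True:
        # Re-evaluate pN for every unmarked signal: N shrinks as gates
        # are clustered into earlier power domains.
        best_sig, best_pn = None, 0.0
        for sig, gates in controllable.items():
            if sig in marked:
                continue
            pn = cv_prob[sig] * len(gates - marked)
            if pn > best_pn:
                best_sig, best_pn = sig, pn
        if best_sig is None:      # every remaining pN value is 0
            break
        domain = controllable[best_sig] - marked
        marked |= domain
        marked.add(best_sig)
        domains.append((best_sig, domain))
        ensg += best_pn
    return domains, ensg


# Hypothetical five-gate example (not taken from the benchmarks).
example_cv = {"a": 0.9, "b": 0.7, "c": 0.4}
example_ctl = {"a": {"c", "d"}, "b": {"a", "c", "d", "e"}, "c": {"f"}}
print(pn_based(example_cv, example_ctl))
```

Here signal `b` wins the first round with pN = 0.7 × 4 = 2.8, although `a` has the higher CV-probability, because the product of probability and domain size is what the heuristic ranks.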
The pseudo code of the pN-based algorithm is described in Figure 4-10. Most
of the major sub-functions used in the pN-based algorithm are the ones used in
the two previously introduced algorithms. Since the pN values must be updated in
every iteration, the complexity of the pN-based algorithm is the highest among
these algorithms.
As usual, the first step of the algorithm is the generation of the BDD for the
target circuit. In addition, dividing logic gates with more than two inputs into
two-input logic gates during the generation of the BDD is convenient for the further
procedure, since it changes the circuit into a binary-tree-like structure, which is
easier to traverse. The next step in the pN-based algorithm is to compute the
logic-0 and logic-1 probabilities for each signal.
pN-based algorithm:
Build BDD for the target circuit;
---- Compute pN value for each signal ----
For each signal i in the target circuit, do {
    ComputeP(i);
}
For each primary output i, do {
    Compute-pN(i);
}
---- Search for sleep signals based on their pN values ----
For all unmarked signals j, do {
    if pN(j) is the largest and pN(j) ≠ 0, do {
        Mark(j);
        ENSG ← ENSG + N(j) × cv-prob(j);
        Update Depth;
        Update-pN;
    }
    if pN(j) is the largest and pN(j) = 0, do {
        quit loop;
    }
}
End pN-based algorithm
Figure 4-10: Pseudo code of pN-based heuristic algorithm
Different from the other heuristic algorithms, in the pN-based algorithm the
number of controllable gates for each signal is computed together with the
probability. For signals with multiple fanouts to several successor gates, simply
adding together the numbers of controllable gates from each fanout usually leads to
incorrect results, because the controlling value depends on the type of each
successor gate. Therefore, considering the types of the fanout successor gates, both
the logic-0 and the logic-1 probability of the fanout signal should be used to
calculate the pN value, and the pN value is computed instead of the N value.
Figure 4-11 illustrates the pN value calculation for signals with multiple fanouts.
Hence, instead of using the “Traversal()” function to calculate N, another
sub-function called “Compute-pN” is introduced to compute the pN value of every
signal while
Figure 4-11: An example of controlling logic gates with fanout outputs.
traversing the circuit in the pN-based algorithm. The pseudo code of this
sub-function is shown in Figure 4-12. Similar to the “Traversal” function,
“Compute-pN” is also a recursive procedure which traverses the circuit from one
primary output to the related primary inputs. By applying this procedure to every
primary output of the target circuit, the pN values of all signals can be obtained.
Signals with multiple fanouts fed to several gates are visited several times
during the traversal procedure. As shown in Figure 4-11, assuming that Gate3 is
first visited from Gate1, the number of controllable gates N1 can be computed by
using the “Count” function to trace the other input of Gate1. Since Gate1 is an AND
gate, the pN value obtained by power-gating logic block 1 is the product of N1 and
the logic-0 probability of signal s3. Then, the Compute-pN procedure visits Gate3
for the second time through Gate2 and the number of controllable gates N2 on the
Gate2 side is computed. Since Gate2 is an OR gate, the pN value obtained by
power-gating logic block 2 is the product of N2 and the logic-1 probability of s3.
Finally, the overall pN value of signal s3 is accumulated from these two pN values.
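The accumulation for a multi-fanout signal such as s3 can be illustrated with a short calculation. The function name `pn_multi_fanout` and the numerical values below are hypothetical and chosen only to make the arithmetic easy to follow; the logic-0 probability is taken as one minus the logic-1 probability.

```python
def pn_multi_fanout(prob1, fanouts):
    """Accumulate the pN value of one signal over all of its fanouts.

    prob1:   probability that the signal takes logic-1
    fanouts: list of (successor gate type, number of controllable gates N)
    """
    prob0 = 1.0 - prob1
    total = 0.0
    for gate_type, n in fanouts:
        if gate_type in ("AND", "NAND"):     # sleeps when the signal is 0
            total += prob0 * n
        elif gate_type in ("OR", "NOR"):     # sleeps when the signal is 1
            total += prob1 * n
    return total


# Hypothetical numbers for the situation of Figure 4-11:
# s3 is logic-1 for 25% of the time, N1 = 4 gates behind the AND gate,
# N2 = 8 gates behind the OR gate.
print(pn_multi_fanout(0.25, [("AND", 4), ("OR", 8)]))   # 0.75*4 + 0.25*8
```

Adding N1 and N2 first and multiplying by a single probability would give a different value, which is why the pN value, rather than N, is accumulated per fanout.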
After the calculation of pN values for all signals, the pN -based algorithm starts
searching for proper sleep signals. As mentioned before, the signal with the highest
Sub-function: Compute-pN(id)
---- Used to traverse the circuit and compute pN value for each signal ----
---- Note that the signal id is also the output of gate id ----
if gate id is a NAND/AND gate, do {
    if input_1(id) is a primary input, do {
        temp_prob_1 ← 0.5;
    }
    else, do {
        temp_prob_1 ← Prob0(input_1(id));
    }
    if input_2(id) is a primary input, do {
        temp_prob_2 ← 0.5;
    }
    else, do {
        temp_prob_2 ← Prob0(input_2(id));
    }
}
else if gate id is a NOR/OR gate, do {
    if input_1(id) is a primary input, do {
        temp_prob_1 ← 0.5;
    }
    else, do {
        temp_prob_1 ← Prob1(input_1(id));
    }
    if input_2(id) is a primary input, do {
        temp_prob_2 ← 0.5;
    }
    else, do {
        temp_prob_2 ← Prob1(input_2(id));
    }
}
if input_1(id) is marked, do { temp_prob_1 ← 0.0; }
else, do {
    Count(input_2(id), input_1(id));
    pN(input_1(id)) ← pN(input_1(id)) + N(input_1(id)) × temp_prob_1;
}
if input_2(id) is marked, do { temp_prob_2 ← 0.0; }
else, do {
    Count(input_1(id), input_2(id));
    pN(input_2(id)) ← pN(input_2(id)) + N(input_2(id)) × temp_prob_2;
}
---- Continue to compute the pN value for the rest of the circuit ----
Compute-pN(input_1(id));
Compute-pN(input_2(id));
Figure 4-12: Pseudo code of the sub-function Compute-pN used in the pN-based
heuristic algorithm
Table 4.4: Estimation results of the pN-based algorithm (with steady maximum depth
constraint)
Circuit  Gates  Original depth  No. of sleep gates  No. of sleep signals  ENSG           Depth with PG  Run time
C432     252    31              88                  51                    53.8 (21.3%)   31             0.5s
C499     454    19              48                  40                    20.0 (4.4%)    19             0.8s
C880     435    30              191                 65                    76.9 (17.7%)   30             0.6s
C1355    590    26              48                  40                    20.0 (3.4%)    26             0.8s
C1908    1057   44              532                 83                    277.1 (26.2%)  44             0.9s
C2670    1400   39              801                 164                   457.5 (32.7%)  40             1.7s
C3540    1983   56              1225                289                   742.9 (37.5%)  56             8.3s
C5315    2973   52              1085                284                   475.8 (16.0%)  52             2.5s
C7552    4042   45              1807                491                   967.9 (23.9%)  46             5.3s
pN value among all unmarked signals will be chosen as a sleep signal and all the
logic gates with the new sleep signal in their sleep signal candidate lists will be
clustered into one power domain (or several domains if the sleep signal has multiple
fanout gates). These new sleep gates are then marked and excluded from future
search rounds.
Note that once a power domain is formed, the N and depth information of the
unmarked signals changes. The depths and pN values of the affected gates must
therefore be updated before the next search round.
The rest of the task is to repeat the searching-clustering-updating procedure until
there is no signal available as a sleep signal. The computational complexity of the
pN-based algorithm is higher than that of the probability-first algorithm due to the
additional calculation required for updating the pN values. However, because the
pN-based algorithm usually selects bigger power domains (N) with comparatively high
controlling value probability, the remaining size of the circuit decreases much
faster than in the probability-first algorithm. Therefore, the extra calculation in
the pN-based algorithm does not increase the runtime much compared with the
probability-first algorithm. In fact, the pN-based algorithm runs even faster than
the probability-first algorithm under the same experimental environment.
The pN-based algorithm is implemented in C and executed on a PC with a 2.40GHz
Intel Core 2 CPU and 2038MB of memory, the same machine used for the experiments
with the other algorithms. Since both p and N are considered, the pN-based
algorithm is expected to be more effective than the previously introduced heuristics.
Table 4.4 shows the power saving estimation obtained by applying the pN-based
algorithm to the ISCAS’85 benchmark circuits.
According to Table 4.4, an average estimated power reduction of 20.3% can be
achieved by using the pN-based algorithm, approximately 3 and 7 percentage points
higher than the N-based algorithm and the probability-first algorithm, respectively.
Furthermore, the pN-based algorithm is shown to be more effective in larger
circuits. For circuits like C7552, C2670 and C5315, the ENSG obtained by the
pN-based algorithm is roughly 1.7 to 3.8 times that of the probability-first
algorithm (Tables 4.2 and 4.4), and also about 10% larger than that of the N-based
algorithm. For other circuits like C432, C880 and C1908, an ENSG increase of about
15%-67% is observed compared with the probability-first algorithm, while compared
with the N-based algorithm, the ENSG increases by 12%-27%.
Three heuristic algorithms, N-based, probability-first and pN-based, have been
described according to the power saving estimation shown in Equation (4.2). Among
these algorithms, the pN-based algorithm is the best one since it considers both
important factors, the CV-probability and the size of power domains, at the same
time. For many circuits, the pN-based algorithm can control a total number of sleep
gates that is close to the upper limit shown in Table 3.2.
However, the pN -based algorithm is not the optimum algorithm for CV-based
power gating method. Generally speaking, there are two major limitations on the
power reduction of the pN -based algorithm.
One is that no further optimization is performed inside the clustered power
domains. For example, if we apply the pN-based algorithm to the original circuit
shown in Figure 4-2, the power gating structure shown in Figure 4-2(a) will be
implemented, since the signal s1 has the largest pN value (0.7 × 4 = 2.8). As
mentioned before, this structure is not the optimum case. However, if we apply the
pN-based algorithm one more time, only to the power domain containing Gate1, Gate2,
Gate3 and Gate4, signal s2 will be chosen and assigned to Gate2, Gate3 and Gate4 as
their sleep signal instead of s1. Hence, after the second application of the
pN-based algorithm, the power domain clustering becomes s1 → Gate1 and
s2 → Gate2, 3, 4, which is the optimum case for the circuit in Figure 4-2.
Furthermore, if we apply the pN-based algorithm again to the power domain
containing Gate2, 3, 4, signal s3 will be selected and the power domain clustering
becomes the case in Figure 4-2(b), which is not the optimum case since one sleep
gate is lost. This example illustrates that further optimization of
already-clustered power domains is necessary, but the gains and losses should be
carefully checked.
The other limitation is that the duration of sleep time is barely considered in
the pN-based algorithm. Although the CV-probability corresponds to the percentage
of the overall sleep period in the total runtime, it is rather hard to measure the
exact sleep interval for each sleep event individually. Sleep signals taking the
vectors “010101” and “000111” result in the same CV-probability but totally
different sleep durations for each sleep event. Thus the switching power of sleep
transistors is also difficult to measure unless the input vectors are provided
beforehand.
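The difference between the two vectors can be made concrete with a small calculation. The sketch below assumes that logic-0 is the controlling value, so each run of consecutive 0s is one sleep event; the function name `sleep_profile` is hypothetical.

```python
from itertools import groupby


def sleep_profile(vector, cv="0"):
    """Return (CV-probability, list of sleep-event lengths) for a bit string.

    vector: value of the sleep signal in consecutive cycles, e.g. "010101"
    cv:     controlling value, assumed here to be logic-0
    """
    # Each maximal run of the controlling value is one sleep event.
    events = [len(list(run)) for bit, run in groupby(vector) if bit == cv]
    return sum(events) / len(vector), events


for vec in ("010101", "000111"):
    prob, events = sleep_profile(vec)
    print(vec, prob, events)
```

Both vectors spend half of the cycles in the sleep state, but the first one does so in three one-cycle events and the second in a single three-cycle event, so the sleep transistor switches three times as often in the first case.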
4.4 Trade-offs between Delay and Power Saving
All three algorithms are implemented with the Steady Maximum Depth Constraint
(SMDC) attached. Although the SMDC is a direct way of maintaining the performance
of the original circuit, the loss of possible sleep gates might reduce the overall
power saving obtained from the CV-based power gating method. There is a trade-off
between the number of sleep gates and the maximum depth. In practical LSI designs,
there might be some gap between the maximum allowed delay and the real delay. If we
can use this gap, the CV-based power gating can achieve more overall power reduction
without hurting the performance too much.
In this section, the trade-off characteristic of the SMDC is discussed. Instead of
requiring that the maximum depth after applying the CV-based power gating must
not exceed the original maximum depth, tolerances of 10%-50% on the maximum depth
Figure 4-13: Trade-off between performance and power saving in the CV-based power
gating.
increase are set when checking the length of new paths formed during the application
of CV-based power gating. The result obtained by applying the pN-based algorithm of
Section 4.3.3 to the benchmark C7552 is shown in Figure 4-13. It can be observed
that by relaxing the SMDC by 20%, the power saving is already close to the saving
without the SMDC. Furthermore, with the tolerance set to 30%, the power saving is
almost the same as the saving obtained without applying the timing constraint.
Figure 4-13 shows a good trade-off characteristic between overall power reduction
and the delay overhead caused by the critical path increase. The sensitivity of
the CV-based power gating to the maximum depth tolerance makes it more suitable for
practical LSI designs. Note that the results are obtained by applying the pN-based
algorithm with the SMDC relaxation mechanism. For other heuristic algorithms like
the probability-first algorithm, more gain can be obtained from the timing
constraint relaxation.
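The relaxed depth check can be sketched as follows. The function names are hypothetical and the tolerance is passed as an integer percentage so that the limit is computed with exact integer arithmetic; the original depth of 45 is the value listed for C7552 in Table 4.4.

```python
def max_allowed_depth(original_depth, tolerance_percent=0):
    """Largest integer depth permitted under a relaxed SMDC."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return original_depth * (100 + tolerance_percent) // 100


def control_allowed(new_depth, original_depth, tolerance_percent=0):
    """True if a power gating control that yields new_depth may be applied."""
    return new_depth <= max_allowed_depth(original_depth, tolerance_percent)


# Depth limits for C7552 (original depth 45) at the tolerances discussed above.
for tol in (0, 10, 20, 30, 40, 50):
    print(tol, max_allowed_depth(45, tol))
```

With a tolerance of 0 the check reduces to the strict SMDC, and a control is rejected as soon as it would lengthen the longest path by a single gate level.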
4.5 HSPICE Simulation Results and Comparisons
The results shown in previous sections are the evaluation of Expected Number of
Sleep Gates (ENSG). It is quite effective for comparing the efficiency of different
Table 4.5: HSPICE simulation results (with steady maximum depth constraint)
         Original   Probability-first algorithm  pN-based algorithm
Circuit  Power (W)  Power (W)  Reduction R1      Power (W)  Reduction R2  Improvement R2/R1
C432     0.1202     0.1182     1.7%              0.1170     2.7%          1.6
C499     0.0015     0.0015     0.2%              0.0016     -4.1%         –
C880     0.2557     0.2260     11.6%             0.1984     22.4%         1.9
C1355    0.0184     0.0182     1.1%              0.0181     1.4%          1.3
C1908    0.3060     0.3012     1.6%              0.2685     12.3%         7.7
C2670    0.6121     0.5503     10.1%             0.4614     24.6%         2.4
C3540    0.7393     0.5900     20.2%             0.5423     26.6%         1.3
C5315    1.3389     1.3100     2.2%              1.1085     17.2%         7.8
C7552    1.6847     1.5547     7.7%              1.3715     18.6%         2.4
heuristic algorithms. But since the power savings from different types of sleep gates
are not identical, the value of ENSG does not show the real power saving, as is
mentioned in Section 4.2.
In this section, the CV-based power gating method is applied to the ISCAS’85
benchmark circuits using the probability-first algorithm and the pN-based algorithm,
respectively. The BLIF-formatted benchmark circuits are first processed by a C
program that parses the structures of the original circuits, selects the proper
sleep signals and sleep gates, and connects the sleep signals according to each of
the two algorithms. The obtained power gating circuits are then converted into
HSPICE netlists. In the conversion, each gate of the resulting circuits is replaced
by the corresponding model in the VDEC Rohm 0.18µm process. All HSPICE simulations
are performed on a computer with 2.8GHz × 4 AMD Opteron CPUs and 32GB of memory.
1000 random input patterns are applied during the simulation at an interval of
20ns. The average power consumptions of the original circuit, the circuit obtained
by using the probability-first algorithm and the one obtained by using the pN-based
algorithm are shown in Table 4.5. Note that the effects of interconnect loads are
not included in the experiments because of the direct conversion of each gate.
Table 4.5 shows a significant improvement of the pN-based algorithm over the
probability-first algorithm. For relatively small circuits like C432 and C880, the
pN-based algorithm achieves 60%-90% more power reduction than the probability-first
algorithm.
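The reduction and improvement columns of Table 4.5 follow directly from the measured powers. The sketch below recomputes them for C432 and C880 from the table entries; the function name `reduction_percent` is introduced here only for illustration.

```python
def reduction_percent(original_w, gated_w):
    """Power reduction relative to the original circuit, in percent."""
    return 100.0 * (original_w - gated_w) / original_w


# (original, probability-first, pN-based) average power in W, from Table 4.5.
measured = {
    "C432": (0.1202, 0.1182, 0.1170),
    "C880": (0.2557, 0.2260, 0.1984),
}

for name, (p_orig, p_pf, p_pn) in measured.items():
    r1 = reduction_percent(p_orig, p_pf)   # probability-first algorithm
    r2 = reduction_percent(p_orig, p_pn)   # pN-based algorithm
    print(f"{name}: R1 = {r1:.1f}%, R2 = {r2:.1f}%, R2/R1 = {r2 / r1:.1f}")
```

A ratio R2/R1 of 1.6 or 1.9 corresponds to the 60% or 90% additional power reduction quoted for these two circuits.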
Unfortunately, the efficiency of the CV-based power gating method is limited
on some circuit structures. As shown in Table 4.5, both algorithms gain very little
(approximately 1%) power saving in the benchmark C1355. For C499, the situation is
even worse, as the power dissipation actually increases after the pN-based algorithm
is applied. Checking the logic structures of these two benchmark circuits shows
that both are built from XOR trees. Since an XOR logic gate has no controlling
value (CV), the CV-based power gating method fails to find an acceptable number of
sleep gates. In fact, a large part of the sleep elements selected by the pN-based
algorithm are inverters, and the small power saving is gained from power-gating
these inverters. The probability-first algorithm, which is inherently less capable
of finding sleep gates than the pN-based algorithm, controls fewer inverters, and
thereby still gains a small power reduction for C499 (0.2% in Table 4.5, about
4 percentage points better than the pN-based result). For C1355, whose logic is
almost identical to that of C499, small power reductions are obtained by both
algorithms. In C1355, each XOR gate is expanded into its equivalent of four 2-input
NAND gates. Only one NAND gate can be controlled in every group of four NAND gates,
which leads to the differences between the results obtained for C499 and C1355.
On the other hand, for large circuits including C1908, C2670, C5315 and C7552,
the pN-based algorithm achieves much better results. The power savings obtained by
using the pN-based algorithm are 2.4 to 7.8 times those of the probability-first
algorithm (R2/R1 in Table 4.5). For C3540, both algorithms give considerable power
reduction, which means that the probability-first algorithm is also effective for
this circuit.
All results obtained by applying the CV-based power gating method are under
the Steady Maximum Depth Constraint (SMDC) without any tolerance specified.
Hence a total power reduction of up to 26.6% (C3540 in Table 4.5) is achieved by
the pN-based algorithm with very little performance cost. Also, since there is
comparatively little leakage current in 0.18µm technology, most of the power
reduction achieved by the CV-based power gating method is presumably dynamic power
reduction. Therefore, if the circuits are implemented in sub-100nm process
technologies, the power saving of the CV-based power gating method is expected to
be larger, since leakage power will become comparable to dynamic power.
4.6 Summary
In this chapter, the implementation of the CV-based power gating method is discussed
and several heuristic optimization algorithms are proposed to improve the power
saving.
In order to avoid the delay overhead caused by the critical path increase, a Steady
Maximum Depth Constraint (SMDC) is introduced. During the application of CV-based
power gating, the lengths of new paths formed by the power gating control are
checked. If a power gating control would increase the maximum depth, this control
is not applied. By using the SMDC, the maximum depth of the power-gated circuit
stays the same as that of the original circuit and the performance loss caused by
the critical path increase is prevented. There is a trade-off between performance
and power reduction.
Furthermore, the factors that influence the power saving most in CV-based power
gating are identified and three heuristic algorithms are proposed in order to
implement the CV-based power gating method effectively and raise the power saving.
The first one is the N-based algorithm, which selects the sleep signals based on
the size of their power domains. The second one is the probability-first algorithm,
which clusters the power domains according to the CV-probabilities of their sleep
signals. The third and most effective one is the pN-based algorithm, which focuses
on both CV-probability and power domain size. All three algorithms are better than
the basic algorithm shown in Chapter 3. According to the estimation results, an
Expected Number of Sleep Gates (ENSG) of approximately 17%, 13.6% and 20.3% of all
gates is obtained by using the N-based, probability-first and pN-based algorithms,
respectively.
In addition to the estimation, HSPICE simulations are performed to check the real
power saving capability of the CV-based power gating method. By applying the
pN-based algorithm, a total power reduction of over 26% (for C3540) is observed
with very little performance penalty under the SMDC.
Chapter 5
The Sizing of Sleep Transistors in
CV-based Power Gating
Sleep transistors, used to turn off the power supply to a certain part of the
circuit during sleep time, can cause serious area and delay overhead. In the
CV-based method, although sleep transistors are the only extra elements attached to
the original circuit, their influence should not be neglected.
A sleep transistor can be either an nMOS or a pMOS transistor, referred to as a
“footer” or a “header” sleep transistor, respectively. In traditional power gating
methods, the objective is simply to reduce leakage current during standby mode.
Therefore, the sleep transistors usually have a higher threshold voltage than the
transistors used in the original circuit. In the early days of power gating, both
header and footer sleep transistors were used for a single power domain in order to
achieve minimum leakage current during standby mode. However, at process
technologies of 90nm and below, the area penalty caused by using both types of
sleep transistors is no longer tolerable for LSI designers.
Usually, the IR drop across the sleep transistor during active mode causes
performance degradation. For a fixed placement, the performance loss decreases as
the size of the sleep transistors becomes larger. Hence the sleep transistors should not be
sized too small; even in state-of-the-art LSI designs, the sleep transistors
are at least twice the size of normal transistors. Even with coarse-grained
power gating, in which only one sleep transistor is inserted to power-gate a whole
module, the area overhead caused by the sleep transistor can still be more than 5%.
The situation is much worse for fine-grained power gating, in which each sleep gate
has an individually built-in sleep transistor.
On the other hand, the leakage current flowing through the sleep transistor during
the standby mode is proportional to the size of the sleep transistor, thus power con-
sumption will increase if the sleep transistor is sized too large. Therefore, the sizing
of sleep transistors becomes a challenge in any power gating design.
This chapter discusses the sizing of sleep transistors in the CV-based
power gating method. The relation between the size of the sleep transistor
and the power saving is investigated experimentally by using Nanosim with layout
information under VDEC Rohm 0.18µm process technology. Due to the capability of
reducing both dynamic power and leakage power during active time, the sleep transistor
sizing of the CV-based power gating method is quite different from that of traditional
power gating methods. According to the experimental results, the minimum overall
power dissipation is achieved when we adopt sleep transistors with almost the same
size as the transistors used in the original circuit.
To simplify the description, a transistor used in the original circuit is referred
to as a “core transistor”.
5.1 Switching Power of Sleep Transistors
As is mentioned above, the leakage current during the standby mode through the sleep
transistor is proportional to the size of the sleep transistor. The IR drop during the
active mode varies inversely with the size of sleep transistor. Therefore, in traditional
power gating methods, the sizing of sleep transistor is based on the trade-off between
the performance degradation and the power consumption of sleep transistors.
In the CV-based power gating method, however, the situation is quite different.
The shutting OFF and turning ON of the sleep transistor depends completely on the
logic structure and the input patterns. Since the switching activity of a sleep transistor
is the same as that of its sleep signal, sleep transistors switch much more
frequently than the ones in traditional power gating. Therefore, the switching power
of sleep transistors becomes a much more serious issue in the CV-based power gating
method.
Figure 5-1: A sample circuit used in the experiments for sleep transistor sizing.
In this section, the switching power of the sleep transistor in CV-based power gating
is investigated by using the sample circuit shown in Figure 5-1. In this circuit, signal
in3 is an input of the final output gate and also the sleep signal of logic block
1, which contains only one AND gate with in1 and in2 as its inputs. For a given
process technology (0.18µm in this experiment), the gate length of a transistor is
usually fixed, so the length of the sleep transistor is set to 0.82u (u = unit length), the
same as that of the core transistor. Hence in this experiment, we modify the width
of the sleep transistor and measure the corresponding power dissipation of the sleep
transistor.
The circuit is first described in Verilog without the sleep transistor and
then synthesized by using Design Compiler. In the next step, Astro is used to obtain
the layout of the original circuit. Note that the floor plan of this circuit is based
on the area information from the synthesis report, and the minimum area is calculated
and used during layout so that the influence of the interconnections is reduced.
After the DRC and LVS checks are performed, the stream file containing the layout
information of the original circuit is converted into a distributed netlist written in
SPICE netlist format. By modifying the netlist, the VSS nodes of the AND gate in
logic block 1 are then replaced with virtual ground nodes and the sleep transistor is
inserted between VSS and the virtual ground node.
Figure 5-2: The relation between the width and switching power of sleep transistors.
After the netlist of the power gating circuit is prepared, a post-layout simulator
called Nanosim is used to perform the simulation. The values of signals in1 and in2
are both set to logic-0, which means that no matter what value in3 takes, neither AND gate
will switch during the simulation. Therefore, by applying random input
vectors to signal in3, the power consumption of this sample circuit can be considered
as the power dissipation caused mainly by the sleep transistor.
It should be mentioned that the gate length of the nMOS core transistors is 0.82u
and the gate width is 3.3u. In addition, the minimum and maximum widths allowed by the
library are 1.0u and 227u, respectively. Figure 5-2 shows the power
consumption of the circuit as the gate width of the sleep transistor is modified.
As shown in this figure, the power consumption of the sleep transistor starts to rise
significantly at 13.6u, where the size of the sleep transistor becomes comparable to
the size of the core transistor. The power consumption increases dramatically as the
sleep transistor shrinks.
The results of the experiment illustrate that the power dissipation of the sleep
transistor increases greatly as the size of the sleep transistor decreases, which makes
it risky to size the sleep transistor too small in CV-based power gating. If minimum-sized
sleep transistors are adopted, it is possible that the power saving gained by applying
CV-based power gating will be canceled by the sharp increase in the power consumed
by the sleep transistors.
5.2 Voltage of Virtual Power Supply
The objective of the CV-based power gating method is not only to reduce the
leakage current, but also to reduce the dynamic power dissipation during active time.
The dynamic power reduction is achieved by utilizing the voltage change of the virtual
power supplies during sleep time, which is described in section 3.2.1.
Taking the circuit in Figure 5-1 as an example, once signal in3 takes logic-0,
the sleep transistor is turned OFF and the sleep time of logic block 1
starts. The voltage of the virtual ground node is then charged gradually from 0 to a
voltage near VDD. Hence for the logic elements inside logic block 1, the voltage
across their power supply nodes becomes so small that the switching capability of
these logic elements is seriously restricted. Therefore, even though the inputs of the
power-off logic block might still be changing, the outputs of the gates in the block are not
capable of switching as usual, which produces a significant reduction in the switching
power dissipation. A similar phenomenon can also be observed when a pMOS header
sleep transistor is adopted. The only difference is that the voltage change occurs on the
virtual VDD, which discharges gradually down to a voltage near VSS after the sleep
transistor is shut down. In traditional power gating methods, the voltage fluctuation of
virtual power supplies is usually considered a negative effect since it is the direct cause
of the floating outputs of power domains during standby mode. In addition,
traditional power gating techniques are only used during standby mode, where
the operation of the power domains has already been disabled. In the CV-based power
gating method, the power dissipation is reduced during active mode. In addition,
the floating outputs of the power-off gates are naturally isolated by their sleep signal
and successor gates, hence the voltage fluctuation can be utilized in CV-based power
gating without influencing the power-on part of the circuit.
In the ideal situation, the sleep transistor should be completely OFF as soon as the
sleep signal takes the controlling value. However, the sleep transistor usually
requires a certain time to shut down completely, and during this period the power-off
gates in the corresponding power domain can still switch if their input signals
change. Furthermore, although the decrease of the supply voltage makes it quite
difficult for a power-off gate to switch further, mild switching can still be caused by
input changes under the relatively small voltage.
Figure 5-3: The voltage fluctuation of internal signal during sleep time.
Figure 5-3 shows the voltage change of internal signal out2, which is the output of
logic block 1 in the sample circuit shown in Figure 5-1. After signal in3 takes the
controlling value (logic-0), the voltage of signal out2 still discharges several times
when the input signals of logic block 1 are changed. Each time, the discharge
amplitude of in2 becomes smaller, until it is hard to observe any voltage change of out2.
Although in2 does not discharge completely from logic-1 to logic-0 during sleep
time, switching power is still consumed by this action. In order to further reduce the
dynamic power during sleep time, it is preferred that the voltage between the power
supply and the virtual power lines is further reduced. In other words, the smaller the
supply voltage of the power domain becomes in the sleep period, the more dynamic power
can be saved.
In this dissertation, the sleep-time supply voltage reduction is achieved by mod-
ifying the size of the sleep transistors. According to the equation shown below, the
leakage current flowing through the sleep transistor during sleep time is proportional
to the gate width of the sleep transistor.
Figure 5-4: The comparison of internal signals and virtual ground by using different
sized sleep transistor.
\[
I_{sub} = \mu_0 C_{OX} \frac{W}{L} (m-1) \frac{KT}{q} \times e^{\frac{V_g - V_{TH}}{mKT/q}} \times \left(1 - e^{V_{DS}/V_T}\right) \tag{5.1}
\]
where W and L are the gate width and length of the transistor, respectively, m is
the body effect coefficient, Vg is the gate voltage and VTH is the threshold voltage.
Usually, a larger leakage current through the sleep transistor results in low
switch efficiency, which in turn leads to weaker charging or discharging of the virtual
power lines. In other words, by reducing the width of the sleep transistor, the voltage
of the virtual power lines becomes closer to that of the real power lines during sleep
time, which makes the supply voltage of the power-off gates smaller. Figure 5-4 shows
a voltage comparison of the internal signal and the virtual ground when sleep transistors
with different widths are used in the circuit shown in Figure 5-1. It is easy to observe
that the sleep transistor with the smaller width, 1.0u in this figure, makes the virtual
ground line charge significantly higher than the one with the larger width during the
period when in3 takes logic-0. Correspondingly, the discharge amplitude of the internal
signal out2 with the smaller sleep transistor is smaller than that with the larger
sleep transistor.
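Because W enters Equation (5.1) only through the factor W/L, the leakage ratio of two
sleep transistors that differ only in width reduces to the ratio of their widths. As a
worked illustration, assuming identical gate length, bias and temperature for both
devices (a simplification made here, since the virtual ground voltage in the real circuit
also depends on the width), a 1.0u sleep transistor can be compared with one sized like
the 3.3u core transistor:

\[
\frac{I_{sub}(W_1)}{I_{sub}(W_2)} = \frac{W_1}{W_2} = \frac{1.0u}{3.3u} \approx 0.30
\]

Under these assumptions the smallest sleep transistor would leak roughly 30% of the
current of a core-sized one.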
Hence in CV-based power gating, reducing the width of the sleep transistor contributes
not only to the leakage current reduction, but also to the dynamic power
reduction during the sleep period. Considering that the switching power consumed
by the sleep transistor increases dramatically as the size of the sleep transistor shrinks,
the sizing problem of sleep transistors in the CV-based power gating method is actually a
problem of finding the sleep transistor size that balances the dynamic power reduction
against the sleep transistor switching power and gives the minimum overall power
consumption.
As for the delay overhead caused by the sleep transistor, which is discussed
in section 3.2.3, it is shown that the proposed CV-based power gating method suffers
little from the wakeup delay. Note that the results shown in Table 3.1 are obtained
by using a sleep transistor whose width is 1.0u. Therefore, even for the smallest
sleep transistor, the delay overhead caused by the wakeup delay of the sleep transistor is
still considerably low. In the further experiments shown in this chapter, we focus on the
trade-off between the power reduction gained by using CV-based power gating
and the switching power of the sleep transistors.
5.3 Experimental Results on the Size of Sleep Transistor in CV-based Power Gating
Experimental investigation is performed in order to check the relation between the
overall power saving and the size of the sleep transistors in CV-based power gating
method.
Note that since CV-based power gating is an extremely fine-grained power
gating method, it is possible that a power domain contains only one sleep gate. In
addition, hundreds of sleep signals might be found for a single circuit, as shown in the
results of the three algorithms. Therefore, a built-in power gating style is adopted for
the CV-based power gating method, in which a sleep transistor is inserted into each
sleep gate individually and the sleep gate together with its built-in sleep transistor
can be considered as a standard library cell. One of the merits of the built-in power
gating style is that, since the virtual power supply is also individual for each sleep
gate, one sleep transistor is only responsible for one sleep gate, which makes the
experimental investigation quite simple. In addition, since no shared virtual power
supply exists in this power gating style, it is not necessary to build another power
network just for the virtual power supply, and almost no modification is required
during layout. However, because of the large number of sleep transistors inserted into
the original circuit, the area penalty might be large.
Table 5.1: RMS Power Consumption versus the Width of Sleep Transistor
RMS power (uW); sleep transistor widths are normalized with unit length u.
Input Vectors  No PG    227     45     23     14      5      4    3.6    3.3    2.7    2.3    1.8    1.4    1.0
Vector 01       18.4   17.2   15.9   15.6   15.1   14.6   14.6   14.5   14.6   14.6   14.6   14.6   14.9   14.9
Vector 02       16.4   15.5   15.6   14.9   14.7   14.2   14.3   14.2   14.3   14.3   14.3   14.2   14.3   14.4
Vector 03       17.1   14.8   13.6   12.9   12.6   12.2   12.3   12.2   12.3   12.3   12.5   12.6   12.9   13.4
Vector 04       17.2   17.1   15.8   15.3   14.9   14.4   14.5   14.4   14.5   14.6   14.6   14.7   14.9   14.8
Vector 05       16.1   15.1   14.1   13.5   13.1   12.8   12.9   12.8   12.8   12.9   12.7   12.7   13.0   12.9
Since one sleep transistor is inserted into one sleep gate individually, it is reason-
able to investigate the sizing of sleep transistor by using a power domain that only
contains one logic gate. The circuit shown in Figure 5-1 is used again in the following
experiments.
The circuit is first written in Verilog and then synthesized by using Design
Compiler. After the layout netlist and gate/transistor information are obtained, the
circuit is simulated by using Nanosim. Five groups of input vectors are applied during
the simulation, each of which contains 1000 random input patterns. Furthermore,
the circuit is implemented in 0.18µm process technology and all simulations are
executed on a computer with 2.8GHz × 4 AMD Opteron CPUs and 32GB memory.
In the netlist used in the simulation, the gate length of the sleep transistor is fixed
to 0.18µm and the width of the sleep transistor is modified in order to find the size
leading to the minimum power consumption. The range of the width modification
is relatively large, from 227u (u = unit length) down to 1.0u. The experimental
results are shown in Table 5.1 and further illustrated in Figure 5-5 in order to improve
the readability.
As is shown in Figure 5-5, the minimum point of power consumption appears
when the width of the sleep transistor is specified near 3.6u, which is almost the
same size as the core transistor (3.3u). From 3.6u and below, the overall power
consumption rebounds rapidly as the gate width of the sleep transistor shrinks, which
illustrates that the power consumed by the sleep transistor starts to overwhelm the
total power reduction obtained by CV-based power gating.
Figure 5-5: The relation between the size of sleep transistor and power consumption.
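The location of the minimum can be checked directly from the figures in Table 5.1. The
following is a minimal sketch rather than the procedure used in the experiments: it
assumes that the five input vector groups are weighted equally and simply sums the RMS
power per width. The names WIDTHS, NO_PG, POWER and total_power_by_width are
introduced here for illustration only.

```python
# Sleep transistor widths (in unit lengths u), in the column order of Table 5.1.
WIDTHS = [227, 45, 23, 14, 5, 4, 3.6, 3.3, 2.7, 2.3, 1.8, 1.4, 1.0]

# RMS power in uW without power gating, one entry per input vector group.
NO_PG = [18.4, 16.4, 17.1, 17.2, 16.1]

# RMS power in uW with power gating: rows are Vector 01-05, columns follow WIDTHS.
POWER = [
    [17.2, 15.9, 15.6, 15.1, 14.6, 14.6, 14.5, 14.6, 14.6, 14.6, 14.6, 14.9, 14.9],
    [15.5, 15.6, 14.9, 14.7, 14.2, 14.3, 14.2, 14.3, 14.3, 14.3, 14.2, 14.3, 14.4],
    [14.8, 13.6, 12.9, 12.6, 12.2, 12.3, 12.2, 12.3, 12.3, 12.5, 12.6, 12.9, 13.4],
    [17.1, 15.8, 15.3, 14.9, 14.4, 14.5, 14.4, 14.5, 14.6, 14.6, 14.7, 14.9, 14.8],
    [15.1, 14.1, 13.5, 13.1, 12.8, 12.9, 12.8, 12.8, 12.9, 12.7, 12.7, 13.0, 12.9],
]


def total_power_by_width(power_rows):
    """Sum the RMS power over all input vector groups for each width."""
    return [sum(column) for column in zip(*power_rows)]


totals = total_power_by_width(POWER)

# Index of the width with the smallest summed power (assumed equal weighting).
best_index = min(range(len(WIDTHS)), key=totals.__getitem__)
best_width = WIDTHS[best_index]

# Power saving at that width relative to the circuit without power gating.
saving = 1 - totals[best_index] / sum(NO_PG)

print(f"minimum at {best_width}u, saving {saving:.1%}")
```

With equal weighting the summed power is smallest at 3.6u, although for individual
vector groups several neighbouring widths tie for the minimum.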
As is mentioned, in the fine-grained power gating style, the area overhead caused
by the insertion of sleep transistor is significant and should be evaluated. In order to
form an AND gate, six transistors are required as is shown in Figure 5-6. According
to the device information obtained after layout, the lengths of all transistors are fixed
to 0.8u, the unit length. The widths of the pMOS transistors, p1, p2 and p3, are
3.4u, 3.4u and 5.7u respectively, while for nMOS transistors (n1, n2 and n3), the
widths are 3.3u, 3.3u and 5.6u respectively. Hence, by specifying the width of sleep
transistor st to be 3.6u, the area overhead can be roughly estimated to be 14.7%,
which is obtained simply by adding the area of all transistors together and comparing
with the area of an AND gate without the sleep transistor. Note that with proper
layout design and optimization, the area overhead can be further reduced for each cell
with sleep transistor inserted. In traditional fine-grained power gating methods, the
sleep transistor usually has to be sized at least twice the size of the core transistor,
which often results in around 50% of area overhead for each sleep gate.
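The per-cell estimate can be reproduced with a few lines of arithmetic. This is only a
sketch under the assumption stated in the text, namely that all transistors share the
same gate length so that area scales with gate width; the function name
cell_area_overhead is hypothetical.

```python
# Transistor widths of the AND gate in Figure 5-6, in unit lengths u.
PMOS_WIDTHS = [3.4, 3.4, 5.7]   # p1, p2, p3
NMOS_WIDTHS = [3.3, 3.3, 5.6]   # n1, n2, n3
SLEEP_WIDTH = 3.6               # st


def cell_area_overhead(core_widths, sleep_width):
    """Relative area added by the sleep transistor.

    Assumes every transistor has the same gate length, so the area of
    each device is proportional to its gate width.
    """
    return sleep_width / sum(core_widths)


overhead = cell_area_overhead(PMOS_WIDTHS + NMOS_WIDTHS, SLEEP_WIDTH)
print(f"{overhead:.1%}")
```

The width-only ratio comes out slightly below 15%, in line with the rough estimate
quoted above.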
Figure 5-6: An AND gate with a sleep transistor integrated.
For a large circuit applied with CV-based power gating method, assuming all sleep
gates are implemented by using built-in power gating style, then overall area overhead
(Oarea) can be estimated as follows.
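\[
O_{area} = \frac{Area_{power\text{-}gating} - Area_{no\text{-}power\text{-}gating}}{Area_{no\text{-}power\text{-}gating}}
= \frac{(N - n) + n \times (1 + \beta) - N}{N}
= \frac{n \times \beta}{N} \tag{5.2}
\]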
where N is the total number of logic gates in the target circuit, n is the number of
sleep gates after applying the CV-based power gating method, and β is the rate of
area overhead for a single sleep gate, which is approximately 14% for an AND type
sleep gate according to the discussion above. In Chapter 4, benchmark circuit C3540
is said to be able to achieve more than 26% of total power reduction. According
to Table 4.4, the total gate number in C3540 is 1983, among which 1225 gates are
power-gated and more than half of the sleep gates are AND type. Therefore, the
area overhead of C3540 after CV-based power gating can be roughly estimated as
1225× 0.14/1983 = 8.6%.
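The estimate for C3540 follows directly from Equation (5.2). The sketch below
evaluates it with the figures quoted in the text; like the equation, it assumes that every
gate has the same area and that every sleep gate carries the same overhead β, and the
function name overall_area_overhead is introduced here for illustration.

```python
def overall_area_overhead(total_gates, sleep_gates, beta):
    """Equation (5.2): O_area = n * beta / N.

    total_gates -- N, number of logic gates in the circuit
    sleep_gates -- n, number of power-gated gates
    beta        -- per-gate area overhead of a built-in sleep transistor
    """
    if total_gates <= 0:
        raise ValueError("total_gates must be positive")
    return sleep_gates * beta / total_gates


# Figures quoted in the text for C3540.
o_area = overall_area_overhead(total_gates=1983, sleep_gates=1225, beta=0.14)
print(f"{o_area:.1%}")
```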
According to the estimation shown above, the CV-based power gating method is
capable of achieving more than 26% of overall power saving with less than 10% of
area overhead, which is considerably more effective compared with other fine-grained
power gating methods.
5.4 Summary
In this chapter, the issue of sleep transistor sizing in the CV-based power gating method
is addressed and experimentally investigated. Due to the capability of reducing
dynamic power, the sleep transistors in the CV-based power gating method can
be sized much smaller than the ones used in traditional power gating while still
achieving considerable overall power reduction.
According to the experimental results, the sleep transistor can be sized almost
the same as the core transistor in the logic blocks, which leads to only approximately
14% area overhead if inserted into an AND-type standard cell. By estimation, for
C3540 in the ISCAS’85 benchmark circuits, CV-based power gating implemented in
fine-grained style can achieve over 26% of total energy saving with approximately
8% area overhead. Note that in this dissertation, the CV-based power gating circuit
is implemented by inserting a built-in sleep transistor for each sleep gate, which is
usually considered to be quite area-consuming.
Chapter 6
Conclusions and Future Work
6.1 Controlling Value based Power Gating Method
In this dissertation, an active mode power gating method based on the controlling
value of logic gates is proposed. Distinguished from other power gating methods, the
proposed power gating mechanism is capable of reducing both dynamic power and
leakage power effectively during active mode with considerably little delay and area
penalty.
In the proposed power gating method, the controlling property of CMOS logic
gates is utilized. For NAND/AND/OR/NOR logic gates, which are basic logic elements
used in almost all logic circuits, if one input takes a certain value, the output
of the gate is determined no matter what values the other inputs take. Such a value is
called the controlling value (CV). When an input takes the controlling value, the
other inputs are not necessary and the power spent on computing them can be saved.
The controlling value based (CV-based) power gating method adopts one of the input
signals of a logic gate as the sleep signal to power-gate the logic blocks that are only
involved in generating the other inputs of this gate. By using this method, the sleep
transistors are allowed to shut down and turn on dynamically during active mode, which
leads to significant dynamic power reduction. Furthermore, all sleep signals in the CV-based
power gating method are extracted from the original circuit, which means no extra
power management units are required. The structure of the CV-based power gating
method also uses no extra signal isolation cells since the floating outputs of the
power-off gates are naturally isolated by the sleep signal and its successor logic gate.
Therefore, the proposed method is area-efficient since no extra circuit, except sleep
transistors, is attached to the target circuit.
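The controlling property can be illustrated with a small truth-table check. The sketch
below is not part of the proposed implementation; the names CONTROLLING_VALUE,
CONTROLLED_OUTPUT, evaluate and can_sleep are hypothetical and serve only to show
that the output is fixed once one input takes the controlling value.

```python
# Controlling value of each basic gate type: an input at this value
# fixes the gate output regardless of the remaining inputs.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}

# Output that is forced when at least one input takes the controlling value.
CONTROLLED_OUTPUT = {"AND": 0, "NAND": 1, "OR": 1, "NOR": 0}


def evaluate(gate, inputs):
    """Evaluate a basic gate on a list of 0/1 inputs."""
    if gate in ("AND", "NAND"):
        value = int(all(inputs))
    elif gate in ("OR", "NOR"):
        value = int(any(inputs))
    else:
        raise ValueError(f"unsupported gate type: {gate}")
    # NAND and NOR invert the AND and OR results.
    return 1 - value if gate in ("NAND", "NOR") else value


def can_sleep(gate, sleep_signal):
    """True if the sleep signal takes the controlling value, so the logic
    that only generates the other inputs could be powered off."""
    return sleep_signal == CONTROLLING_VALUE[gate]


# With the sleep signal at the controlling value, the other input is irrelevant.
for gate, cv in CONTROLLING_VALUE.items():
    outputs = {evaluate(gate, [cv, other]) for other in (0, 1)}
    print(gate, "sleep signal =", cv, "-> output", outputs)
```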
On the other hand, the CV-based power gating method suffers from the delay
penalty caused by the critical path depth increase, which, however, can be easily
handled by adding some constraint on the depth during the application of power
gating. In addition, the CV-based power gating method is found to suffer little from
the wakeup delay penalty caused by the switch of the sleep transistor due to the
utilization of floating voltage of power-off gates.
In order to implement the CV-based power gating method, a basic algorithm is
built to select the sleep signals and to cluster the power domains based on the depths
of signals. A timing constraint called Steady Maximum Depth Constraint (SMDC)
is added during the sleep signal selection in order to prevent the critical path from
increasing. The power saving is estimated in terms of the Expected Number of Sleep
Gates (ENSG), which is the product of the controlling value probability and the
number of sleep gates. Even though the selecting mechanism of the basic algorithm
is not so effective, it still achieves an average of 19% of ENSG without SMDC. The
effectiveness of the CV-based power gating method is also tested on some AND(OR)-tree
circuits by using the basic algorithm, and more than 66% of ENSG can be observed
with very little delay overhead. AND(OR)-tree circuits are widely used in equivalence
checking and other applications like carry-look-ahead adders.
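The ENSG metric described above can be sketched as follows. The example assumes that
the contributions of the individual power domains are summed and that the percentages
are taken relative to the total gate count of the circuit; both points, as well as the
function names and the sample numbers, are assumptions made for illustration.

```python
def expected_sleep_gates(power_domains):
    """Sum of p * n over power domains.

    power_domains -- iterable of (p, n) pairs, where p is the probability
    that the sleep signal takes the controlling value and n is the number
    of gates in the power domain it controls.
    """
    return sum(p * n for p, n in power_domains)


def ensg_ratio(power_domains, total_gates):
    """ENSG as a fraction of all gates in the circuit (assumed normalization)."""
    return expected_sleep_gates(power_domains) / total_gates


# Hypothetical circuit with 200 gates and three power domains.
domains = [(0.5, 40), (0.25, 20), (0.75, 8)]
print(expected_sleep_gates(domains))       # 20 + 5 + 6 expected sleeping gates
print(f"{ensg_ratio(domains, 200):.1%}")
```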
6.2 Heuristic Optimization in CV-Based Fine-Grained
Power Gating
In order to achieve more power saving under the limitation of the steady maximum depth
constraint, we have improved the sleep signal selecting mechanism. In this part of the
dissertation, the two factors that influence power saving are considered: the overall
number of sleep gates (N) and the CV probability of sleep signals (p).
Based on the above discussion, three heuristic algorithms are proposed: the N-based
algorithm, the probability-first algorithm and the pN-based algorithm. For each
control candidate, we compute N (the number of controlled gates) and p (the probability of
taking the controlling value). Each of the first two algorithms focuses on one of the
factors mentioned above, and the pN-based algorithm, built on the previous two, combines
the two factors. All three algorithms apply the steady maximum depth
constraint during the sleep signal selection so that the results obtained
from these algorithms are pure power reduction without any attached performance
penalty.
For the N-based algorithm, the size of the power domain is the primary criterion
on which the sleep signal is selected. By using this algorithm, the maximum number
of sleep gates is expected to be achieved while fulfilling the requirement of SMDC.
As a result, an average of 17.4% of ENSG is obtained by applying the N-based algorithm
to some ISCAS’85 benchmark circuits. The power saving is only 2% less than the
results obtained by the basic algorithm without SMDC. For certain circuits like C3540,
over 33% of ENSG is observed.
The probability-first algorithm, on the other hand, selects the sleep signals based
only on their probability of taking the controlling value. Unlike the N-based
algorithm, the probability-first algorithm suffers greatly from the limitation of
SMDC; however, it will achieve better results than the N-based algorithm if the SMDC
is removed. Generally, the probability-first algorithm achieves approximately 13.6%
of ENSG on the same benchmark circuits.
Absorbing the advantages of both algorithms above, the pN-based algorithm
selects the sleep signals by considering both the CV-probability and the gate count. For a
candidate signal, the product of its CV-probability and the number of its controllable
gates is adopted as the selection criterion. As expected, the pN-based algorithm
achieves much more power saving than the previous algorithms. For the ISCAS’85
benchmark circuits, the pN-based algorithm achieves 20.3% of average ENSG, and for
large circuits, over 37% of ENSG can be observed.
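The difference between the three selection criteria can be seen on a toy example. The
sketch below only ranks candidate sleep signals by each criterion; it leaves out the
SMDC check and the clustering of power domains, and the candidate names and numbers
are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str   # hypothetical signal name
    p: float    # probability of taking the controlling value
    n: int      # number of gates the signal could power-gate


# Sort keys for the three selection criteria (larger is better).
CRITERIA = {
    "N-based": lambda c: c.n,
    "probability-first": lambda c: c.p,
    "pN-based": lambda c: c.p * c.n,
}


def rank(candidates, criterion):
    """Return candidate names ordered from most to least preferred."""
    key = CRITERIA[criterion]
    return [c.name for c in sorted(candidates, key=key, reverse=True)]


candidates = [
    Candidate("s1", p=0.25, n=40),   # large domain, rarely asleep: p*n = 10
    Candidate("s2", p=0.75, n=8),    # small domain, often asleep:  p*n = 6
    Candidate("s3", p=0.5, n=30),    # balanced:                    p*n = 15
]

for criterion in CRITERIA:
    print(criterion, rank(candidates, criterion))
```

In this example the N-based and probability-first criteria each prefer a candidate with a
smaller expected number of sleep gates than s3, which only the pN criterion ranks first.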
Post-layout simulation is also performed to check the real power reduction of
CV-based power gating. The best algorithm so far, the pN-based algorithm, is
evaluated. Even under the SMDC, the pN-based algorithm achieves more
than 20% of total power reduction and proves to be more effective for larger circuits.
In addition, the results obtained by the pN-based algorithm can be several times better
than those of the previous algorithms for certain circuits.
6.3 Sizing of Sleep Transistors in CV-based Power Gating
The sleep transistor sizing problem is also addressed and experimentally investigated
in this dissertation. According to the experimental results, the sleep transistors
in the CV-based power gating method can be sized almost the same as the
transistors used in the original circuit, thanks to the capability of the CV-based power
gating method to reduce both dynamic power and leakage power.
The optimum size of the sleep transistors is obtained by post-layout experiments,
and by estimation the area overhead caused by the insertion of the sleep
transistor for a single AND gate cell is approximately 14%. For a large benchmark
circuit, calculation shows that by using sleep transistors sized much smaller than
those of the traditional power gating methods, CV-based power gating is capable
of achieving more than 26% of total power reduction with less than 10% of area penalty.
6.4 Future Work
In this dissertation, all the benchmark circuits are implemented in 0.18µm process
technology, which is far behind the current state-of-the-art LSI design technology
generations. As future work, we would like to implement the CV-based power
gating method with sub-90nm or even sub-50nm process technology and check the
efficiency of CV-based power gating further. This will surely expose new issues of
the proposed method, and we are sure that fixing those problems will make the proposed
method more practical.
Acknowledgments
I would like to thank all people who have helped and inspired me during my doctoral
study.
This dissertation could not have been written without Professor KIMURA, who
not only served as my supervisor but also encouraged and challenged me throughout
my academic program in the power consumption reduction area. He offered me
unconditional help whenever I needed it and also helped me to build an active attitude
toward scientific research, which will be beneficial for the rest of my life.
I would also like to thank Professor Takeshi Yoshimura, Professor Takahiro Watanabe
and Professor Kimiyoshi Usami for their valuable suggestions on my research
and for reviewing this dissertation. Special thanks to Professor Satoshi Goto for
the financial support during my pursuit of the doctoral degree, and also for
bringing me to this graduate school in the first place. The precious experience in Japan
over all these years will be a fortune of my life.
I am also honored to have had the opportunity to work with all the brilliant members
of KIMURA Lab. Thank you all for the selfless help and inspiring advice on both
study and daily life. During the early days of my doctoral study, Dr. Yun Yang
and Dr. Chengjie Zang offered me exhaustive guidance on experimental methodology
and English paper writing. I would also like to thank Mr. Weijie Xing, who provided
me with sincere assistance. The other laboratory members have always been very
friendly to me and helped me selflessly, and I would like to give my best wishes to all of
them.
My deepest gratitude goes to my family for their unconditional love and support
throughout my life; this dissertation would simply have been impossible without them.
Thanks to my parents, Lianqing Chen and Jing Zhang, for their endless support and love.
You make me more assiduous and confident than I ever thought I could be. I am indebted
to my beautiful and loving wife, Mengru Wang, for her love and understanding during
the past years. I could not ask more from her as she is just perfect.
Last but not least, thanks to the Graduate School of IPS, Waseda University for
offering me the opportunity to study here and to pursue the doctoral degree. This
research was partly supported by “Ambient SoC Global COE Program of Waseda
University” of the Ministry of Education, Culture, Sports, Science and Technology,
Japan. It was also partly supported by the CREST Ultra Low Power Project of JSPS and
by a Grant-in-Aid for Scientific Research from JSPS.
Bibliography
[1] B.H. Calboun, F.A. Honore, and A.P. Chandrakasan, “A leakage reduction
methodology for distributed MTCMOS,” IEEE J. Solid-State Circuits, vol.39,
no.5, pp.818-826, May. 2004.
[2] C. Kim and K. Roy, “Dynamic vth scaling scheme for active leakage power
reduction,” Proc. Design, Automation and Test in Europe, pp.163-167, 2002.
[3] N. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. Hu, M. Irwin, M. Kan-
demir, and V. Narayanan, “Leakage current: Moore’s law meets static power,”
Computer, vol.36, no.12, pp. 68-75, Dec. 2003.
[4] L Wei, Z Chen, M Johnson, K Roy, and V. De, “Design and optimization of
low voltage high performance dual threshold CMOS circuits,” Proc. 35th Design
automation Conference, pp. 489-494, Jun. 1998.
[5] T.W. Chang, T.T. Hwang, and S.Y. Hsu, “Functionality directed clustering for
low power MTCMOS design,” Proc. 10th Asia and South Pacific Design Au-
tomation Conference, pp. 862-867, Jan. 2005.
[6] M. Anis, S. Areibi, M. Mahmoud, and M. Elmasry, “Dynamic and leakage power
reduction in MTCMOS circuits using an automated efficient gate clustering tech-
nique,” Proc. 39th Design Automation Conference, pp. 480-485, Jun. 2002.
[7] S. Shigematsu, S. Mutoh, Y. Matsuya, and J. Yamada, “A 1-v high-speed MTC-
MOS circuit scheme for power-down applications,” Symposium on VLSI Circuits
Digest of Technical, pp. 125-126, 1995.
109
[8] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, “1-v
power supply high-speed digital circuit technology with multithreshold voltage
CMOS,” IEEE J. Solid-State Circuits, vol. 30, no. 8, pp. 847-854, Aug. 1995.
[9] R. Vilangudipitchai and Poras T. Balsara, “Power switch network design for
MTCMOS,” Proc. 18th International Conference on VLSI Design, pp. 836-839,
Jan. 2005.
[10] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, “Low Power
Methodology Manual,” Springer, 2007.
[11] L. Chen, T. Horiyama, Y. Nakamura, and S. Kimura, “Fine-grained power gating
based on the controlling value of logic elements,” IEICE Trans. Fundamentals,
vol. E91-A, no. 12, pp. 3531-3538, Dec. 2008.
[12] K. Usami and N. Ohkubo, “A design approach for fine-grained runtime power
gating using locally extracted sleep signals,” IEEE International Conference on
Computer Design, pp. 155-161, Oct. 2006.
[13] K. Usami and H. Yoshioka, “Dynamic sleep control for finite-state-machines to
reduce active leakage power,” IEICE Trans. Fundamentals, vol.E87-A, no. 12,
pp. 3116-3123, Dec. 2004.
[14] R.E. Bryant, “Symbolic Boolean manipulation with ordered binary decision
diagrams,” ACM Comput. Surv., vol. 24, no. 3, pp. 293-318, Sep. 1992.
[15] P.S. Nair, S. Koppa, and E.B. John, “A comparative analysis of coarse-grain
and fine-grain power gating for FPGA lookup tables,” 52nd IEEE International
Midwest Symposium on Circuits and Systems, pp. 507-510, 2009.
[16] H. Singh, K. Agarwal, D. Sylvester and K.J. Nowka, “Enhanced Leakage
Reduction Techniques Using Intermediate Strength Power Gating,” IEEE Trans. on
Very Large Scale Integration Systems, vol. 15, no. 11, pp. 1215-1224, Nov. 2007.
[17] R. Bhanuprakash, M. Pattanaik, S.S. Rajput, and K. Mazumdar, “Analysis and
reduction of ground bounce noise and leakage current during mode transition of
stacking power gating logic circuits,” TENCON 2009, pp. 1-6, Jan. 2009.
[18] T. Lin, K.S. Chong, B.H. Gwee and J.S. Chang, “Fine-grained power gating for
leakage and short-circuit power reduction by using asynchronous-logic,” ISCAS
2009, pp. 3162-3165, May 2009.
[19] M.H. Chowdhury, J. Gjanci and P. Khaled, “Controlling Ground Bounce Noise
in Power Gating Scheme for System-on-a-Chip,” ISVLSI’08, pp. 437-440, Apr.
2008.
[20] H.L. Jiang and M. Marek-Sadowska, “Power/Ground Supply Network Optimiza-
tion for Power-Gating,” ICCD 2006, pp. 332-337, Oct. 2007.
[21] S. Kim, S.V. Kosonocky, D.R. Knebel and K. Stawiasz, “Experimental Mea-
surement of A Novel Power Gating Structure with Intermediate Power Saving
Mode,” ISLPED’04, pp. 20-25, 2004.
[22] J. Fu, J. Hu and X. Luo, “The Implementation of Single-Phase Power-Gating
Adiabatic Circuits Using Improved CAL Circuits,” PACCS’09, pp. 334-337, May
2009.
[23] H.L. Jiang and M. Marek-Sadowska, “Power-Gating Aware Floorplanning,”
ISQED’07, pp. 853-860, Mar. 2007.
[24] S.H. Chen and J.Y. Lin, “Implementation and verification practices of DVFS
and power gating,” VLSI-DAT’09, pp. 19-22, Apr. 2009.
[25] H.L. Jiang, M. Marek-Sadowska and S.R. Nassif, “Benefits and costs of
power-gating technique,” ICCD 2005, pp. 559-566, Oct. 2005.
[26] H.H. Huang and C.H. Cheng, “Using Clock-Vdd to Test and Diagnose the Power-
Switch in Power-Gating Circuit,” 25th IEEE VLSI Test Symposium, pp. 110-118,
May 2007.
[27] S. Roy, N. Ranganathan and S. Katkoori, “Exploring compiler optimizations for
enhancing power gating,” ISCAS 2009, pp. 1004-1007, May 2009.
[28] M.H. Chowdhury, J. Gjanci and P. Khaled, “Innovative power gating for leakage
reduction,” ISCAS 2008, pp. 1568-1571, May 2008.
[29] B. Bollig and I. Wegener, “Improving the variable ordering of OBDDs is
NP-complete,” IEEE Transactions on Computers, vol. 45, issue 9, pp. 993-1002,
Sept. 1996.
[30] A. Todri, M. Marek-Sadowska and S.C. Chang, “Analysis and optimization of
power-gated ICs with multiple power gating configurations,” ICCAD 2007, pp.
783-790, Nov. 2007.
[31] Z.G. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson and
P. Bose, “Microarchitectural techniques for power gating of execution units,”
ISLPED’04, pp. 32-37, Aug. 2004.
[32] H. Homayoun, K.F. Li and S. Rafatirad, “Functional units power gating in SMT
processors,” PACRIM 2005, pp. 125-128, Aug. 2005.
[33] P. Khaled, J.Y. Xu and M.H. Chowdhury, “Dual diode-Vth reduced power gating
structure for better leakage reduction,” MWSCAS 2007, pp. 1409-1412, Aug.
2007.
[34] L. Bolzani, A. Calimera, A. Macii, E. Macii and M. Poncino, “Placement-aware
clustering for integrated clock and power gating,” ISCAS 2009, pp. 1723-1726,
May 2009.
[35] C.Y. Yeh, H.M. Chen, L.D. Huang, W.T. Wei, C.H. Lu and C.N. Liu, “Using
power gating techniques in area-array SoC floorplan design,” IEEE International
SOC Conference 2007, pp. 233-236, Sept. 2007.
[36] H.O. Kim and Y.S. Shin, “Semicustom Design Methodology of Power Gated
Circuits for Low Leakage Applications,” IEEE Trans. on Circuits and Systems II:
Express Briefs, vol. 54, issue 6, pp. 512-516, June 2007.
[37] E. Pakbaznia, F. Fallah and M. Pedram, “Sizing and placement of charge recy-
cling transistors in MTCMOS circuits,” ICCAD 2007, pp. 791-796, Nov. 2007.
[38] T.M. Tseng, M.C.-T. Chao, C.-P. Lu and C.-H. Lo, “Power-switch routing for
coarse-grain MTCMOS technologies,” ICCAD 2009, pp. 39-46, Nov. 2009.
[39] Z.Y. Liu and V. Kursun, “Characterization of wake-up delay versus sleep mode
power consumption and sleep/active mode transition energy overhead tradeoffs
in MTCMOS circuits,” MWSCAS 2008, pp. 362-365, Aug. 2008.
[40] H.S. Won, K.S. Kim, K.O. Jeong, K.T. Park, K.M. Choi and J.T. Kong,
“An MTCMOS design methodology and its application to mobile computing,”
ISLPED’03, pp. 110-115, Aug. 2003.
[41] Z.Y. Liu and V. Kursun, “Charge Recycling Between Virtual Power and Ground
Lines for Low Energy MTCMOS,” ISQED’07, pp. 239-244, March 2007.
[42] H. Jiao and V. Kursun, “Ground bouncing noise suppression techniques for
MTCMOS circuits,” ASQED 2009, pp. 64-70, July 2009.
[43] H. Singh Deogun, D. Sylvester, R. Rao and K. Nowka, “Adaptive MTCMOS for
dynamic leakage and frequency control using variable footer strength,”
Proc. IEEE International SOC Conference 2005, pp. 147-150, Sept. 2005.
[44] C.J. Akl and M.A. Bayoumi, “Self-Sleep Buffer for Distributed MTCMOS De-
sign,” VLSID 2008, pp. 673-678, Jan. 2008.
[45] A. Davoodi and A. Srivastava, “Wake-up protocols for controlling current surges
in MTCMOS-based technology,” ASP-DAC 2005, vol. 2, pp. 868-871, Jan. 2005.
[46] C. Hwang, C. Kang and M. Pedram, “Gate sizing and replication to minimize the
effects of virtual ground parasitic resistances in MTCMOS designs,” ISQED’06,
pp. 741-746, March 2006.
[47] H. Hassan, M. Anis and M. Elmasry, “A Timing-Driven Algorithm for Leakage
Reduction in MTCMOS FPGAs,” ASP-DAC’07, pp. 678-683, Jan. 2007.
[48] M.H. Anis and M.I. Elmasry, “Power reduction via an MTCMOS implementation of
MOS current mode logic,” 15th Annual IEEE International ASIC/SOC
Conference, 2002, pp. 193-197, Sept. 2002.
[49] J. Hu and J. Fu, “Leakage dissipation reduction of single-phase power-gating
adiabatic sequential circuits using MTCMOS,” Asia Pacific Conference on Post-
graduate Research in Microelectronics & Electronics, 2009, pp. 455-459, Jan.
2009.
[50] M.W. Allam, M.H. Anis and M.I. Elmasry, “High-speed dynamic logic styles
for scaled-down CMOS and MTCMOS technologies,” ISLPED ’00, pp. 155-160,
2000.
[51] H.L.A. Chen, E.K.W. Loo, J.B. Kuo, and M.J. Syrzycki, “Triple-Threshold
Static Power Minimization Technique in High-Level Synthesis for Designing
High-Speed Low-Power SOC Applications Using 90nm MTCMOS Technology,”
CCECE 2007, pp. 1671-1674, April 2007.
[52] B. Amelifard, F. Fallah and M. Pedram, “Low-power Fanout Optimization Using
MTCMOS and Multi-Vt Techniques,” ISLPED’06, pp. 334-337, Oct. 2006.
[53] K.K. Das and C.T. Chuang, “Ultra-low leakage MTCMOS circuits with regular-
Vt long channel stacked footers for deep sub-100 nm technologies,” VLSI-DAT
2008, pp. 81-84, 2008.
[54] A. Calimera, L. Benini, and E. Macii, “Optimal MTCMOS Reactivation Un-
der Power Supply Noise and Performance Constraints,” DATE’08, pp. 973-978,
March 2008.
[55] M.H. Anis, M.W. Allam, and M.I. Elmasry, “Energy-efficient noise-tolerant
dynamic styles for scaled-down CMOS and MTCMOS technologies,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, issue 2, pp.
71-78, Apr. 2002.
[56] E. Pakbaznia and M. Pedram, “Coarse-Grain MTCMOS Sleep Transistor Sizing
Using Delay Budgeting,” DATE’08, pp. 385-390, March 2008.
[57] K.K. Das, S.-H. Lo and C.-T. Chuang, “High performance MTCMOS
technique for leakage reduction in hybrid SOI-epitaxial technologies with
enhanced-mobility PFET header,” 19th International Conference on VLSI
Design, 2006. Held jointly with 5th International Conference on Embedded Systems
and Design, 4 pp., Jan. 2006.
[58] J. Kao, S. Narendra and A. Chandrakasan, “MTCMOS hierarchical sizing based
on mutual exclusive discharge patterns,” Design Automation Conference, 1998,
pp. 495-500, Jun. 1998.
[59] M. Anis, S. Areibi and M. Elmasry, “Design and optimization of multithreshold
CMOS (MTCMOS) circuits,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 22, issue 10, pp. 1324-1342, Oct. 2003.
[60] J.B. Kim and D.W. Kim, “Low-Power Carry Look-Ahead Adder with
Multi-Threshold Voltage CMOS Technology,” CAS 2007, pp. 537-540, Oct. 2007.
[61] V. Khandelwal and A. Srivastava, “Leakage Control Through Fine-Grained
Placement and Sizing of Sleep Transistors,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 26, issue 7, pp. 1246-1255,
July 2007.
[62] L.T. Clark, R. Patel and T.S. Beatty, “Managing Standby and Active Mode
Leakage Power in Deep Sub-micron Design,” ISLPED’04, pp. 274-279, 2004.
[63] J.H. Choi, Y. Xu and T. Sakurai, “Statistical leakage current reduction in high-
leakage environments using locality of block activation in time domain,” IEEE
Journal of Solid-State Circuits, vol. 39, issue 9, pp. 1497-1503, Sept. 2004.
[64] S. Hemantha, A. Dhawan and H. Kar, “Multi-threshold CMOS design for
low power digital circuits,” TENCON 2008, pp. 1-5, Nov. 2008.
[65] S. Henzler, G. Georgakos, M. Eireiner, T. Nirschl, C. Pacha, J. Berthold and
D. Schmitt-Landsiedel, “Dynamic state-retention flip-flop for fine-grained power
gating with small design and power overhead,” IEEE Journal of Solid-State
Circuits, vol. 41, issue 7, pp. 1654-1661, July 2006.
[66] M. Imai, K. Takada and T. Nanya, “Fine-Grain Leakage Power Reduction Method
for m-out-of-n Encoded Circuits Using Multi-threshold-Voltage Transistors,”
ASYNC’09, pp. 209-216, May 2009.
[67] N. Jayakumar and S.P. Khatri, “An ASIC design methodology with predictably
low leakage, using leakage-immune standard cells,” ISLPED’03, pp. 128-133,
Aug. 2003.
[68] K. Choi, Y. Xu and T. Sakurai, “Optimal zigzag (OZ): an effective yet feasible
power-gating scheme achieving two orders of magnitude lower standby leakage,”
Symposium on VLSI Circuits, 2005, pp. 312-315, June 2005.
[69] R. Chen, R. Liu and J.B. Kuo, “Gate-level dual-threshold total power optimiza-
tion methodology (GDTPOM) principle for designing high-speed low-power SOC
applications,” ICSICT 2008, pp. 2164-2167, Oct. 2008.
[70] P. Babighian, L. Benini, A. Macii and E. Macii, “Enabling Fine-Grain Leakage
Management by Voltage Anchor Insertion,” DATE’06, pp. 1-6, March 2006.
[71] J. Kao, S. Narendra and A. Chandrakasan, “Subthreshold leakage modeling and
reduction techniques [IC CAD tools],” ICCAD 2002, pp. 141-148, Nov. 2002.
[72] T. Kuroda, “Low power CMOS digital design for multimedia processors,” 6th
International Conference on VLSI and CAD 1999, pp. 359-367, 1999.
[73] H. Huang and J. Fan, “Multicore processor cluster based sleep transistor sizing
considering delay profile,” ASICON’09, pp. 654-657, Oct. 2009.
[74] W. Shen, Y. Cai, X. Hong and J. Hu, “Activity-Aware Registers Placement for
Low Power Gated Clock Tree Construction,” ISVLSI’07, pp. 383-388, March
2007.
[75] J. Oh and M. Pedram, “Gated clock routing minimizing the switched capaci-
tance,” DATE 1998, pp. 692-697, Feb. 1998.
[76] K. Choi, R. Soma and M. Pedram, “Fine-Grained Dynamic Voltage and Fre-
quency Scaling for Precise Energy and Performance Trade-Off Based on the
Ratio of Off-Chip Access to On-Chip Computation Times,” DATE’04, vol.1, pp.
4-9, Feb. 2004.
[77] A. Matsuzawa, “Issues of Current LSI Technology and the Future Technology
Direction,” IEICE Transactions on Electronics, vol.J87-C, no.11, pp.802-809,
Nov. 2004.
[78] S. Kimura, D. Dill, and S.G. Govindaraju, “A New Symbolic Image
Computation Algorithm Based on BDD Constrain Operator,” in Proc. of the 10th
Workshop on Synthesis And System Integration of Mixed Technologies (SASIMI
2001), pp. 167-171, Oct. 2001.
[79] V. Tiwari, S. Malik, and P. Ashar, “Guarded evaluation: pushing power
management to logic synthesis/design,” Proceedings of the 1995 International
Symposium on Low Power Design, pp. 221-226, April 1995.
[80] A. Correale, “Overview of the power minimization techniques in the IBM
PowerPC 4xx embedded controllers,” Proceedings of the 1995 International
Symposium on Low Power Design, pp. 75-80, April 1995.
[81] A.P. Chandrakasan and R.W. Brodersen, “Minimizing power consumption in
digital CMOS circuits,” Proceedings of the IEEE, pp. 498-523, April 1995.
List of Publications
Journal Papers
[1] Lei Chen and Shinji Kimura, “Optimizing Controlling-Value-Based Power Gat-
ing with Gate Count and Switching Activity,” IEICE Transactions on Fundamen-
tals of Electronics, Communications and Computer Sciences, Vol.E92-A, No. 12,
pp. 3111-3118, December 2009.
[2] Lei Chen, Takashi Horiyama, Yuichi Nakamura and Shinji Kimura, “Fine-Grained
Power Gating Based on the Controlling Value of Logic Elements,” IEICE Trans-
actions on Fundamentals of Electronics, Communications and Computer Sciences,
Vol.E91-A, No. 12, pp. 3531-3538, December 2008.
International Conference Papers
[3] Lei Chen and Shinji Kimura, “Active Mode Leakage Reduction Based on the
Controlling Value of Logic Gates,” Proceedings of 14th Workshop on
Synthesis And System Integration of Mixed Information Technologies, pp. 266-271, Oct.
2007.
Domestic Conference Papers
[4] Lei Chen and Shinji Kimura, “A New Heuristic for Autonomic Controlling Value
Based Power Gating,” 2009-SLDM-140, No.5, pp.1-6, May 2009.
[5] Lei Chen and Shinji Kimura, “Fine-Grained Power Gating Based on the
Controlling Value of Logic Gates,” 135th IPSJ SIG Technical Report on SLDM,
2008-SLDM-135, pp. 31-36, May 2008.
