PAPER Special Section on VLSI Design and CAD Algorithms

# Power-Minimum Frequency/Voltage Cooperative Management Method for VLSI Processor in Leakage-Dominant Technology Era

Kentaro KAWAKAMI<sup>†a)</sup>, Miwako KANAMORI<sup>†</sup>, Yasuhiro MORITA<sup>†</sup>, Jun TAKEMURA<sup>†</sup>, Masayuki MIYAMA<sup>†</sup>, and Masahiko YOSHIMOTO<sup>††</sup>, Members

SUMMARY To achieve both high peak performance and low average power, frequency-voltage cooperative control processors have been proposed. Such a processor schedules its operating frequency according to the required computation power. Its supply voltage or body bias voltage is modulated at the same time to cut down either switching current or leakage current, which reduces the total power dissipation of the processor. Since a frequency-voltage cooperative control processor has two or more operating frequencies, countless scheduling methods exist for executing a given number of cycles by a deadline. This problem appears frequently in hard real-time systems. This paper proves two important theorems that give the power-minimum frequency scheduling method for any type of frequency-voltage cooperative control processor, such as $V_{dd}$-control, $V_{th}$-control and $V_{dd}$-$V_{th}$-control type processors.

**key words:** low power, dynamic voltage frequency scaling (DVFS), adaptive body biasing, $V_{dd}$-hopping, $V_{th}$-hopping

#### 1. Introduction

Continual progress in fabrication technology has expanded the processing performance and integration density of LSIs. In the sub-decimicron era, it also brings an unexpected increase in power dissipation, so reducing power dissipation is becoming ever more important. To achieve both high peak performance and low average power, Ref. [1] proposes a $V_{dd}$-control type DVFS (Dynamic Voltage and Frequency Scaling) processor. This processor supports operation at two or more different frequencies, and power dissipation is effectively cut down by dynamically controlling the operating voltage according to the frequency. Since a DVFS processor has two or more operating frequencies, countless scheduling methods exist for executing a given number of cycles by a deadline. This problem appears frequently in hard real-time systems. For example, if five frequencies $f_0$, $f_1$, $f_2$, $f_3$ and $f_{max}$ are available on a DVFS processor, the three scheduling methods shown in Fig. 1 realize the same number of operating cycles; however, they consume quite different amounts of energy.

When the power dissipation of a processor can

Manuscript received March 18, 2005.

Manuscript revised June 14, 2005.

Final manuscript received August 2, 2005.

<sup>†</sup>The authors are with the Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa-shi, 920-1192 Japan.

<sup>††</sup>The author is with the Department of Computer and Systems Engineering, Kobe University, Kobe-shi, 657-8501 Japan.

a) E-mail: kawakami@cs28.cs.kobe-u.ac.jp

DOI: 10.1093/ietfec/e88-a.12.3290



**Fig. 1** Various control methods that realize the same number of operation cycles.

be modeled with only a dynamic power component, Ref. [2] proves some important theorems about power-minimum frequency scheduling. However, subthreshold leakage current has become significant in recent years, and it now accounts for a large portion of the total power dissipation compared with the dynamic component. Because the subthreshold leakage current can be reduced by controlling the threshold voltage of transistors, Ref. [3] proposes a $V_{th}$-control type DVFS processor, which modulates the threshold voltage. Moreover, in order to cut down both dynamic power and leakage power, Ref. [4] proposes a $V_{dd}$-$V_{th}$-control type DVFS processor. As technology enters the full-scale sub-decimicron era, various leakage currents, such as gate leakage current, DIBL current and GIDL current, will become significant. Therefore, a variety of frequency-voltage cooperative control processors will be developed. This paper analytically derives the power-minimum frequency scheduling for the frequency-voltage cooperative control processors that will play an important role in the leakage-dominant technology era.

The rest of the paper is organized as follows. Section 2 explains the idea of frequency-voltage cooperative control processors and reveals the relation between power dissipation and operating frequency. In Sect. 3, two important theorems are proved. These theorems give the power-minimum frequency scheduling, which is uniquely determined. In Sect. 4, the power reduction ratio is estimated for an MPEG4 encoding process. Finally, Sect. 5 summarizes the findings of this work.

# 2. Idea of Frequency/Voltage Cooperative Control Processor

The power dissipation P of an LSI arises from three major components: the switching component $P_{dy}$, the short-circuit component $P_{sc}$ and the leakage component $P_{leak}$. They are expressed as Eqs. (1)–(4).

$$P = P_{dy} + P_{sc} + P_{leak} \tag{1}$$

$$P_{dy} = \alpha C_L f V_{dd}^2 \tag{2}$$

$$P_{sc} = \frac{\beta}{12} (V_{dd} - 2 V_{th})^3 \frac{\tau}{T} \tag{3}$$

$$P_{leak} = V_{dd}I_0 \exp\left(\frac{-V_{th}}{S}\ln(10)\right) + V_{dd}I_{gate} \tag{4}$$

where

 $\alpha$ : switching activity

f: operating frequency

 $C_L$ : load capacitance

 $V_{dd}$ : supply voltage

 $V_{th}$ : threshold voltage

 $\beta$ : gain factor ( $\mu$ A/V<sup>2</sup>) of MOS transistors

 $\tau$ : rise or fall time

T: period-time of a signal (= 1/f)

 $I_0$ : drain current when $V_{gs} = V_{th}$

S: subthreshold slope

 $I_{gate}$ : gate leakage current.

The first and second terms of Eq. (4) represent the power dissipation due to subthreshold leakage current and gate leakage current, respectively. References [7], [8] formulate the gate leakage current precisely by means of many parameters. According to Refs. [7], [8], the measured gate leakage is a monotonically increasing function of the gate voltage, in other words, of the supply voltage.

The maximum operating frequency of a processor  $f_{max}$  can be modeled as Eq. (5) [5].

$$f_{max} = \frac{k(V_{dd} - V_{th})^a}{nC_L V_{dd}} \tag{5}$$

where k is the delay coefficient, n is the logic depth of the critical path and a is the velocity saturation index $(1 < a \le 2)$. Equation (5) means that a higher operating frequency requires either a larger operating voltage or a smaller threshold voltage. From Eqs. (1)–(4), such a modification increases $P_{dy}$, $P_{sc}$ and $P_{leak}$. Therefore, P increases monotonically with f (Eq. (7)). Utilizing this characteristic, a frequency-voltage cooperative control processor reduces power dissipation by changing the operating voltage and/or threshold voltage according to the operating frequency. The threshold voltage is modulated by controlling the substrate bias voltage, using the substrate effect given by Eq. (6) [4].

$$V_{th} = V_{t0} + \gamma (\sqrt{2\phi_B - V_{bb}} - \sqrt{2\phi_B}) \tag{6}$$

where

 $V_{t0}$ : constant for a given technology process

 $\gamma$ : body factor

 $\phi_B$ : flatband voltage

 $V_{bb}$ : body bias voltage.
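As a rough illustration of how Eqs. (5) and (6) interact, the following Python sketch evaluates the threshold voltage for a given body bias and the resulting maximum operating frequency. All numeric constants (`VT0`, `GAMMA_BODY`, `PHI_B`, `K_OVER_NCL`, `A`) are hypothetical placeholders chosen only to make the trend visible; they are not taken from the process model used in this paper.

```python
import math

# Hypothetical, illustrative constants (not taken from the paper).
VT0 = 0.25          # V_t0 in Eq. (6) [V]
GAMMA_BODY = 0.3    # body factor gamma in Eq. (6) [V^0.5]
PHI_B = 0.4         # phi_B in Eq. (6) [V]
K_OVER_NCL = 1.0    # k / (n * C_L) of Eq. (5), lumped into one normalized constant
A = 1.3             # velocity saturation index, 1 < a <= 2


def threshold_voltage(v_bb):
    """Eq. (6): threshold voltage as a function of the body bias v_bb [V]."""
    return VT0 + GAMMA_BODY * (math.sqrt(2 * PHI_B - v_bb) - math.sqrt(2 * PHI_B))


def max_frequency(v_dd, v_th):
    """Eq. (5): maximum operating frequency in normalized units."""
    return K_OVER_NCL * (v_dd - v_th) ** A / v_dd


for v_dd in (0.5, 0.6, 0.7):
    for v_bb in (-0.5, -0.3, -0.1):
        v_th = threshold_voltage(v_bb)
        print(f"Vdd={v_dd:.2f} V, Vbb={v_bb:+.1f} V -> Vth={v_th:.3f} V, "
              f"fmax={max_frequency(v_dd, v_th):.3f} (normalized)")
```

Under these assumed values, a more negative body bias raises $V_{th}$ and lowers $f_{max}$, while a larger $V_{dd}$ raises $f_{max}$, which is the trade-off that the cooperative control exploits.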

In addition, since $P_{dy}$ is directly proportional to f and the required voltage itself rises with f, P increases faster than linearly in f. Therefore, $\frac{dP(f)}{df}$ is also monotonically increasing, as described by Eq. (8).

$$\frac{dP(f)}{df} > 0 \tag{7}$$

$$\frac{d^2P(f)}{df^2} > 0\tag{8}$$

A SPICE simulation was performed to confirm that f and P satisfy the relations of Eqs. (7) and (8). The "Common Design Rules for 0.1 micron" recommended by the Semiconductor Technology Academic Research Center (STARC) were used as the model file of the SPICE simulation. The simulation flow is as follows.

- 1. The maximum operating frequency $f_{max}$ of a conceptual 32-bit RISC processor, illustrated in Fig. 2, is derived for various combinations of $V_{dd}$ and the NMOS body bias voltage $V_{bn}$. The lattice points in Fig. 3 represent the simulated combinations of $V_{dd}$ and $V_{bn}$. The PMOS body bias voltage $V_{bp}$ is set symmetrically to $V_{dd} - V_{bn}$.
- 2. Power dissipation is simulated for every $V_{dd}$-$V_{bn}$ combination. The toggle rate of the logic portion and the access rates of the D-Cache, I-Cache and internal SRAM were set to 15, 50, 90 and 13%, respectively. The operating frequency was chosen from 50, 100, 150, 200 and 250 [MHz], restricted to values below $f_{max}$.
- 3. From the results of step 2, the power-minimum combination of $V_{dd}$ and $V_{bn}$ was searched for the $V_{dd}$-control type, $V_{th}$-control type and $V_{dd}$-$V_{th}$-control type processors.

**Fig. 2** Block diagram of a conceptual 32-bit RISC which includes a shared 256 kbyte SRAM, 16 kbyte Data Cache and 16 kbyte Instruction Cache.

**Fig. 3** Power-minimum $V_{dd}$-$V_{bn}$ combinations.

**Fig. 4** Maximum operating frequency of a 32-bit RISC (simulated).

**Fig. 5** Power dissipation of F-V cooperative control processors (simulated).

**Table 1** Values of fitting parameters.

| CPU Type                   | α     | β                     | γ    |
|----------------------------|-------|-----------------------|------|
| $V_{dd}$-control           | 0.172 | $5.53 \times 10^{-5}$ | 1.74 |
| $V_{th}$-control           | 0.120 | $1.60 \times 10^{-7}$ | 2.73 |
| $V_{dd}$-$V_{th}$-control  | 0.029 | $2.91 \times 10^{-5}$ | 1.76 |

Figure 4 shows the relation between the $V_{dd}$-$V_{bn}$ combination and $f_{max}$. It shows that a larger $V_{dd}$ or $V_{bn}$ achieves a larger $f_{max}$. Figure 3 illustrates the derived $V_{dd}$-$V_{bn}$ combinations that minimize power dissipation for operation at 50, 100, 150, 200 and 250 [MHz].

The relations between f and P are shown in Fig. 5. Normalized power dissipation is plotted on the vertical axis. Leakage current accounts for 65.1% of the total power dissipation of the $V_{dd}$-control processor at 250 [MHz] operation. The curves obtained by parameter fitting with Eq. (9) are also drawn in Fig. 5. The fitting parameters for each type of processor are shown in Table 1. The curves of all three types satisfy Eqs. (7) and (8).

$$P = \alpha + \beta f^{\gamma} \tag{9}$$
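The claim that the fitted curves satisfy Eqs. (7) and (8) can be checked directly, since Eq. (9) has closed-form derivatives. The following Python sketch uses the Table 1 parameters; the function names and the sampled frequencies are choices made here for illustration only.

```python
# Fitting parameters (alpha, beta, gamma) of Eq. (9), taken from Table 1.
FIT = {
    "Vdd-control": (0.172, 5.53e-5, 1.74),
    "Vth-control": (0.120, 1.60e-7, 2.73),
    "Vdd-Vth-control": (0.029, 2.91e-5, 1.76),
}


def power(f, alpha, beta, gamma):
    """Eq. (9): normalized power at frequency f [MHz]."""
    return alpha + beta * f ** gamma


def d_power(f, alpha, beta, gamma):
    """First derivative of Eq. (9) with respect to f (alpha drops out)."""
    return beta * gamma * f ** (gamma - 1)


def d2_power(f, alpha, beta, gamma):
    """Second derivative of Eq. (9) with respect to f (alpha drops out)."""
    return beta * gamma * (gamma - 1) * f ** (gamma - 2)


for name, params in FIT.items():
    ok = all(d_power(f, *params) > 0 and d2_power(f, *params) > 0
             for f in (50, 100, 150, 200, 250))
    print(f"{name}: P(250 MHz) = {power(250, *params):.3f}, "
          f"Eqs. (7) and (8) hold at the sampled points: {ok}")
```

Because β > 0 and γ > 1 for all three rows of Table 1, both derivatives are positive for every f > 0, not only at the sampled points.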

# 3. Energy-Minimum Frequency Scheduling

In this section, two important theorems are proved under the following assumptions. These two theorems give the energy-minimum frequency scheduling for executing a process with a given number of cycles by a deadline.

### **Assumptions**

- 1. The relation between P and f of a frequency-voltage cooperative control processor satisfies Eqs. (7) and (8).
- 2. Energy dissipation for generating various f,  $V_{dd}$ ,  $V_{bn}$  and  $V_{bp}$  can be disregarded.
- 3. Time and energy dissipation required to change operating frequency and corresponding operating voltage and body bias voltage can be disregarded.

The first assumption is also satisfied when power dissipation consists mainly of the dynamic component [2]. Therefore, the theorems proved in the rest of this section hold in that case as well.

## 3.1 Case of Continuous Operating Frequency

If a frequency-voltage cooperative processor can use continuously variable frequencies, Theorem 1 holds.

**Theorem 1:** Constant control at the ideal frequency $f_I = H/T_f$ minimizes energy dissipation for an operation with a given number of cycles H within a given time constraint $T_f$.

**Proof:** Let t(f) denote the processing time spent operating at frequency f. The function t(f) satisfies Eqs. (10) and (11).

$$\int_0^{f_{max}} f \cdot t(f)df = f_I \cdot T_f \tag{10}$$

$$\int_{0}^{f_{max}} t(f)df = T_f \tag{11}$$

Power dissipation P(f) is expressed as Eq. (12) by a Taylor expansion with a remainder term.

$$P(f) = P(f_I) + P'(f_I)(f - f_I) + \frac{P''(f_c)}{2}(f - f_I)^2 \tag{12}$$

where

 $f_c$ : a certain value satisfying either  $f > f_c > f_I$  or  $f_I > f_c > f$ .

Energy dissipation E is calculated as follows.

$$E = \int P(f) \cdot t(f)df \tag{13}$$

$$= \int \left\{P(f_I) + P'(f_I)(f - f_I) + \frac{P''(f_c)}{2}(f - f_I)^2\right\}t(f)df \tag{14}$$

$$= P(f_I) \int t(f)df + P'(f_I)\left\{\int ft(f)df - \int f_I t(f)df\right\} + \int \frac{P''(f_c)}{2}(f - f_I)^2 t(f)df \tag{15}$$

$$= P(f_I)T_f + \int \frac{P''(f_c)}{2}(f - f_I)^2 t(f)df \tag{16}$$

The integration interval is omitted in Eqs. (13)–(16). By Eqs. (10) and (11), the term multiplied by $P'(f_I)$ vanishes. Since $\frac{P''(f_c)}{2}$ is always positive from Eq. (8), Eq. (16) states that E is minimized if and only if $f \equiv f_I$.
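Theorem 1 can be sanity-checked numerically. The Python sketch below draws random piecewise-constant schedules, computes the mean frequency of each one (which plays the role of $f_I$), and compares the schedule energy with that of running constantly at $f_I$. The power model is Eq. (9) with the $V_{dd}$-$V_{th}$-control parameters of Table 1; the helper names, the number of segments and the random seed are arbitrary choices made for this illustration.

```python
import random

ALPHA, BETA, GAMMA = 0.029, 2.91e-5, 1.76  # Vdd-Vth-control row of Table 1


def power(f):
    """Eq. (9) with the Vdd-Vth-control fitting parameters."""
    return ALPHA + BETA * f ** GAMMA


def schedule_energy(schedule):
    """Energy of a piecewise-constant schedule [(frequency, duration), ...]."""
    return sum(power(f) * t for f, t in schedule)


def random_schedule(rng, segments=4, f_max=250.0, t_total=1.0):
    """Random schedule whose durations sum to t_total."""
    weights = [rng.random() + 1e-9 for _ in range(segments)]
    scale = t_total / sum(weights)
    return [(rng.uniform(0.0, f_max), w * scale) for w in weights]


rng = random.Random(0)
worst_margin = float("inf")
for _ in range(1000):
    sched = random_schedule(rng)
    t_total = sum(t for _, t in sched)
    f_ideal = sum(f * t for f, t in sched) / t_total   # f_I = H / T_f
    margin = schedule_energy(sched) - power(f_ideal) * t_total
    worst_margin = min(worst_margin, margin)

print(f"smallest E(schedule) - E(constant f_I) over 1000 trials: {worst_margin:.3e}")
```

Every random schedule should cost at least as much energy as the constant-frequency schedule with the same number of cycles, as expected for a convex P(f).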

**Consideration 1:** Under the following additional conditions, Ref. [2] states that the frequency scheduling controlled to the fixed  $f_I$  realizes the minimum energy dissipation.

- In addition to zero, only one frequency can be chosen.
- Only  $V_{dd}$ -control type processor is considered.
- Power dissipation other than dynamic power can be disregarded.

Theorem 1, however, states that the fixed $f_I$ realizes the minimum energy dissipation even under the following conditions.

- Two or more frequencies can be chosen.
- Any type of frequency-voltage cooperative processor is considered.
- Power dissipation other than dynamic power is taken into account.

## 3.2 Case of Discrete Operating Frequency

Because continuously variable voltage and frequency are infeasible [6], a frequency-voltage cooperative processor has only discrete frequency-voltage combinations. In such a case, Theorem 2 holds.

**Theorem 2:** Frequency scheduling satisfying the following two conditions minimizes energy dissipation for an operation with a given number of cycles H by a given time constraint $T_f$.

- A. Utilizing two frequencies $f_a$ and $f_b$, which are the immediate neighbors of $f_I$ and satisfy $f_a \ge f_I > f_b \ge 0$.
- B. Setting the frequency to $f_a$ during period $t_a = T_f(f_I - f_b)/(f_a - f_b)$ and to $f_b$ during period $t_b = T_f(f_a - f_I)/(f_a - f_b)$.
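Conditions A and B translate directly into a small routine. The following Python sketch is one possible rendering; the function name `two_level_schedule`, its argument names and the example frequency set are hypothetical, and the zero frequency is assumed to be always available, in line with $f_b \ge 0$.

```python
def two_level_schedule(frequencies, cycles, deadline):
    """Return ((f_a, t_a), (f_b, t_b)) following conditions A and B of Theorem 2.

    frequencies: available discrete frequencies (zero is added automatically)
    cycles:      required number of cycles H (must be positive)
    deadline:    time constraint T_f (must be positive)
    """
    if cycles <= 0 or deadline <= 0:
        raise ValueError("cycles and deadline must be positive")
    f_ideal = cycles / deadline                        # f_I = H / T_f
    levels = sorted(set(frequencies) | {0.0})          # assumption: f = 0 is available
    if f_ideal > levels[-1]:
        raise ValueError("deadline cannot be met even at the highest frequency")
    f_a = min(f for f in levels if f >= f_ideal)       # condition A: f_a >= f_I
    f_b = max(f for f in levels if f < f_ideal)        # condition A: f_I > f_b >= 0
    t_a = deadline * (f_ideal - f_b) / (f_a - f_b)     # condition B
    t_b = deadline * (f_a - f_ideal) / (f_a - f_b)     # condition B
    return (f_a, t_a), (f_b, t_b)


# Example: frequencies in MHz, cycles in millions of cycles, deadline in seconds.
levels = (250.0, 200.0, 150.0, 100.0, 50.0)
(f_a, t_a), (f_b, t_b) = two_level_schedule(levels, cycles=139.5, deadline=1.0)
print(f"run at {f_a} MHz for {t_a:.3f} s and at {f_b} MHz for {t_b:.3f} s")
```

The two periods always sum to the deadline and deliver exactly the requested number of cycles, so the schedule is fully determined by the available frequency set and $f_I$.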

**Proof:** For a processor with m operating frequencies, let us consider an arbitrary frequency control for an operation with a given number of cycles H within a given time constraint $T_f$. The jth operating frequency $f_j$ and the period $t_j$ during which the processor runs at $f_j$ satisfy Eqs. (17)–(20). Figure 6 illustrates an example of m-level frequency scheduling.

**Fig. 6** *m*-level frequency scheduling.

$$t_j \ge 0 \tag{17}$$

$$T_f = \sum_{j=0}^{m-1} t_j \tag{18}$$

$$H = \sum_{j=0}^{m-1} f_j t_j \tag{19}$$

$$f_{m-1} > f_{m-2} > \dots > f_i > f_I > f_{i-1} > \dots > f_0 = 0 \quad (m-1 \ge i > 0) \tag{20}$$

Now consider three periods: $t_i$, $t_{i-1}$ and an arbitrary one, $t_k$, out of the remaining periods. The total energy dissipated during these three periods is expressed as Eq. (21).

$$E = P(f_i) \cdot t_i + P(f_{i-1}) \cdot t_{i-1} + P(f_k) \cdot t_k \tag{21}$$

If the operation cycles of these three periods (Fig. 7(a)) are executed using only $f_i$ and $f_{i-1}$ (Fig. 7(b)), the energy dissipation changes to E', calculated with Eq. (22).

$$E' = P(f_i) \cdot t'_i + P(f_{i-1}) \cdot t'_{i-1} \tag{22}$$

Since the two controls achieve the same number of cycles in the same duration, Eqs. (23) and (24) hold.

$$t_i + t_{i-1} + t_k = t_i' + t_{i-1}' \tag{23}$$

$$f_i t_i + f_{i-1} t_{i-1} + f_k t_k = f_i t_i' + f_{i-1} t_{i-1}' \tag{24}$$

Time differences for the period at  $f_i$  and  $f_{i-1}$  are calculated as Eqs. (25) and (26).

$$t_i - t_i' = \frac{f_{i-1} - f_k}{f_i - f_{i-1}} t_k \tag{25}$$

$$t_{i-1} - t'_{i-1} = \frac{f_k - f_i}{f_i - f_{i-1}} t_k \tag{26}$$



**Fig. 7** Replacement of frequency control. (a) Control with three frequencies. (b) Control with two frequencies.

Therefore, the difference of energy dissipation between two controls is calculated as follows.

$$E - E' = P(f_i)(t_i - t'_i) + P(f_{i-1})(t_{i-1} - t'_{i-1}) + P(f_k)t_k \tag{27}$$

$$= \{P(f_i) - P(f_k)\}(t_i - t'_i) + \{P(f_{i-1}) - P(f_k)\}(t_{i-1} - t'_{i-1}) \tag{28}$$

$$= \frac{t_k}{f_i - f_{i-1}} [\{P(f_i) - P(f_k)\}(f_{i-1} - f_k) - \{P(f_{i-1}) - P(f_k)\}(f_i - f_k)] \tag{29}$$

Here, $\{P(f_i) - P(f_k)\}(f_{i-1} - f_k) - \{P(f_{i-1}) - P(f_k)\}(f_i - f_k)$ in Eq. (29) is always positive, which is proved as Lemma 2 in the Appendix. Hence Eq. (30) is obtained.

$$E - E' > 0 \tag{30}$$

It states that (m-1)-level control dissipates less energy than m-level control. If this proposition is applied recursively (m-2) times, the periods for all frequencies other than $f_i$ and $f_{i-1}$ become zero.

$$T_f = t_i + t_{i-1} \tag{31}$$

$$H = f_i t_i + f_{i-1} t_{i-1} = f_I T_f \tag{32}$$

Here, $t_i$ and $t_{i-1}$ are uniquely determined as $t_i = T_f(f_I - f_{i-1})/(f_i - f_{i-1})$ and $t_{i-1} = T_f(f_i - f_I)/(f_i - f_{i-1})$ from Eqs. (31) and (32); therefore Theorem 2 holds.
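The replacement step at the heart of this proof, Eqs. (21)–(30), can be traced with concrete numbers. The Python sketch below folds one extra period at $f_k$ into the two neighboring levels using Eqs. (25) and (26) and compares the energies before and after. The three-level schedule is a hypothetical example with $f_I$ = 125 MHz, and the power model is Eq. (9) with the $V_{dd}$-$V_{th}$-control parameters of Table 1.

```python
ALPHA, BETA, GAMMA = 0.029, 2.91e-5, 1.76  # Vdd-Vth-control row of Table 1


def power(f):
    """Eq. (9) with the Vdd-Vth-control fitting parameters."""
    return ALPHA + BETA * f ** GAMMA


def replace_third_level(f_i, t_i, f_im1, t_im1, f_k, t_k):
    """Fold the period at f_k into the levels f_i and f_(i-1).

    Returns the new periods (t_i', t_(i-1)') obtained from Eqs. (25) and (26).
    """
    t_i_new = t_i - (f_im1 - f_k) / (f_i - f_im1) * t_k
    t_im1_new = t_im1 - (f_k - f_i) / (f_i - f_im1) * t_k
    return t_i_new, t_im1_new


# Hypothetical three-level schedule (MHz, seconds); its mean frequency is 125 MHz.
f_i, f_im1, f_k = 150.0, 100.0, 250.0
t_i, t_im1, t_k = 0.2, 0.7, 0.1

t_i_new, t_im1_new = replace_third_level(f_i, t_i, f_im1, t_im1, f_k, t_k)
e_before = power(f_i) * t_i + power(f_im1) * t_im1 + power(f_k) * t_k   # Eq. (21)
e_after = power(f_i) * t_i_new + power(f_im1) * t_im1_new               # Eq. (22)

print(f"t_i' = {t_i_new:.3f} s, t_(i-1)' = {t_im1_new:.3f} s")
print(f"E = {e_before:.4f}, E' = {e_after:.4f}, E - E' = {e_before - e_after:.4f}")
```

The total time and the total number of cycles are unchanged by the replacement, while the energy decreases, which is the content of Eq. (30) for this example.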

**Lemma 1:** The shorter the period in which the processor operates at a frequency higher than $f_i$, the smaller the energy dissipation.

**Proof:** Consider the case in which a processor has only the three frequencies $f_{i+1}$, $f_i$ and $f_0$ in the proof of Theorem 2. It follows immediately that the smaller $t_{i+1}$ becomes, in other words, the more of the operation cycles processed at $f_{i+1}$ are executed at $f_i$ instead, the less energy is consumed.

**Consideration 2:** Theorem 2 demonstrates that the power-minimum frequency scheduling method is identical with that of Ref. [2] even if all types of DVFS processors and leakage power are considered.

In reality, it is difficult to know the ideal frequency before processes start or finish, so many practical frequency scheduling algorithms have been proposed to reduce power dissipation [9], [10] for $V_{dd}$-control type DVFS processors. These algorithms share the following ideas.

- Ensuring the real-time constraint, under which all processes finish before their deadlines.
- Shortening the period in which the processor operates at a higher frequency.

Theorem 2 and Lemma 1 guarantee that these practical scheduling algorithms are also effective with all types of DVFS processors in the leakage-dominant technology era.

#### 4. Application for MPEG4 Encoding Process

When software runs on a normal processor with only one operating frequency, the processor always runs at the fixed maximum frequency regardless of the workload of that software. In contrast, frequency-voltage cooperative control processors are set to suitable frequencies depending on the kind of software, and they can save power compared with the normal processor. The suitable frequencies are estimated beforehand in simulations.

If the required performance fluctuates according to the input data, finer frequency control, even for one kind of software, can achieve further power reduction. Motion video encoding is one such kind of software. For example, when MPEG4 visual encoding software is implemented on the M32700 $\mu$T-Engine [13], which has a 32-bit RISC, the frequencies required for real-time encoding of QCIF (176 × 144 pixels) at 15 [frame/s] are 99.6 (Fig. 8(a)), 139.5 (Fig. 8(b)), 164.3 (Fig. 8(c)) and 166.8 (Fig. 8(d)) [MHz].

Combining this characteristic with the frequency-voltage cooperative control processor, Ref. [12] effectively cuts down the power dissipation of software-based MPEG4 visual encoding. It proposes frequency control for every frame according to the predicted workload. Reference [12] changes the frequency and voltages only once in a frame, as illustrated in Fig. 9(b). If frequency control is performed according to Theorem 2, as indicated in Fig. 9(c), further power reduction can be achieved.

**Fig. 8** Examples of simulated sequences.

**Fig. 9** Conventional scheduling and energy-minimum scheduling.

The power dissipation ratio r is estimated by assuming the three kinds of $V_{dd}$-$V_{th}$-control type processors shown in Table 2, which have six, four and three discrete operating frequencies, respectively. The ratio r is calculated from Eqs. (33)–(38). The numerator of Eq. (33) is $E_{min}$, $E_{one}$ or $E_{two}$.

$$r = \frac{E_{min}|E_{one}|E_{two}}{E_{conv}} \tag{33}$$

$$P(f) = 0.029 + 2.91 \times 10^{-5} \times f^{1.76} \tag{34}$$

$$E_{min} = P(f) \cdot T_f \tag{35}$$

$$E_{one} = P(f_a) \cdot \frac{f}{f_a} \cdot T_f + P(0) \cdot \frac{f_a - f}{f_a} \cdot T_f \tag{36}$$

$$E_{two} = P(f_a) \cdot t_a + P(f_b) \cdot t_b \tag{37}$$

$$E_{conv} = P(f_{max}) \cdot \frac{f}{f_{max}} \cdot T_f + P(0) \cdot \frac{f_{max} - f}{f_{max}} \cdot T_f \tag{38}$$

where $E_{min}$ is the theoretical minimum energy dissipation drawn from Theorem 1, $E_{one}$ is the energy dissipation of the one-level scheduling, $E_{two}$ is the energy dissipation of the two-level scheduling drawn from Theorem 2, $E_{conv}$ is the energy dissipation of conventional scheduling, f is the operating frequency required for MPEG4 real-time encoding, and $f_a$, $f_b$, $t_a$, $t_b$ are the operating frequencies and periods derived from Theorem 2.

**Table 2** Three types of $V_{dd}$-$V_{th}$-control processor.

| Frequency [MHz] | Type I | Type II | Type III | $V_{dd}$ [V] | $V_{bn}$ [V] |
|-----------------|--------|---------|----------|--------------|--------------|
| 250             | √      | √       | √        | 0.70         | -0.1         |
| 200             | √      |         |          | 0.65         | -0.3         |
| 150             | √      | √       | √        | 0.60         | -0.4         |
| 100             | √      |         |          | 0.55         | -0.5         |
| 50              | √      | √       |          | 0.50         | -0.5         |
| 0               | √      | √       | √        | 0.40         | -0.5         |
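The ratio r can be evaluated with a few lines of code. The Python sketch below uses the power model of Eq. (34), the frequency sets of Table 2 and the four required frequencies quoted in this section. For the one-level case it assumes that the processor runs at $f_a$ for the time $T_f \cdot f/f_a$ needed to finish the required cycles and idles at zero frequency for the rest; this is an interpretation of the one-level scheduling of Ref. [12]. The function and variable names are hypothetical.

```python
ALPHA, BETA, GAMMA = 0.029, 2.91e-5, 1.76   # Eq. (34)
F_MAX = 250.0                                # maximum frequency [MHz]

# Available frequencies [MHz] of the three processor types in Table 2.
PROCESSOR_TYPES = {
    "I": (0, 50, 100, 150, 200, 250),
    "II": (0, 50, 150, 250),
    "III": (0, 150, 250),
}


def power(f):
    """Eq. (34): normalized power at frequency f [MHz]."""
    return ALPHA + BETA * f ** GAMMA


def neighbors(levels, f_req):
    """Frequencies f_a >= f_req > f_b adjacent to the required frequency."""
    f_a = min(f for f in levels if f >= f_req)
    f_b = max(f for f in levels if f < f_req)
    return f_a, f_b


def power_ratios(levels, f_req, t_f=1.0):
    """Return (r_min, r_one, r_two), each normalized by E_conv as in Eq. (33)."""
    f_a, f_b = neighbors(levels, f_req)
    e_conv = (power(F_MAX) * f_req / F_MAX
              + power(0) * (F_MAX - f_req) / F_MAX) * t_f            # Eq. (38)
    e_min = power(f_req) * t_f                                       # Eq. (35)
    # Assumed one-level scheduling: run at f_a until the cycles are done, then idle.
    e_one = (power(f_a) * f_req / f_a + power(0) * (f_a - f_req) / f_a) * t_f
    t_a = t_f * (f_req - f_b) / (f_a - f_b)
    t_b = t_f * (f_a - f_req) / (f_a - f_b)
    e_two = power(f_a) * t_a + power(f_b) * t_b                      # Eq. (37)
    return e_min / e_conv, e_one / e_conv, e_two / e_conv


for f_req in (99.6, 139.5, 164.3, 166.8):    # required frequencies [MHz]
    for name, levels in PROCESSOR_TYPES.items():
        r_min, r_one, r_two = power_ratios(levels, f_req)
        print(f"f={f_req:6.1f} MHz, Type {name:3s}: "
              f"r_min={r_min:.3f}  r_one={r_one:.3f}  r_two={r_two:.3f}")
```

For a convex P(f) the three ratios are always ordered as r_min ≤ r_two ≤ r_one ≤ 1, so the printed values show how closely each scheduling approaches the bound of Theorem 1 under this model. The numbers are not claimed to reproduce Fig. 10 exactly.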

Figure 10 shows the calculated power ratio r. It indicates the following results.

- 1. The frequency scheduling derived from Theorem 2 effectively reduces power dissipation. In Type I in particular, almost the minimum value given by Theorem 1 is realized. For "Akiyo" with the Type I processor, a power reduction of 44.6% is estimated.
- 2. As the number of operating frequencies decreases, the power reduction becomes small for the frequency scheduling proposed in Ref. [12]. In contrast, the frequency scheduling derived from Theorem 2 keeps its effectiveness even with a Type II or Type III processor. The difference between the one-level scheduling and the two-level scheduling reaches a maximum of 23.3% for "Mobile & Calendar" with a Type II or Type III processor.

These results demonstrate that the frequency scheduling derived from the two theorems is practically effective in cutting down power dissipation in the leakage-dominant technology era.

#### 5. Conclusion

To achieve both high peak performance and low average power, three types ($V_{dd}$-control, $V_{th}$-control and $V_{dd}$-$V_{th}$-control) of DVFS processors have been proposed. According to the workload running on the processor, DVFS can reduce power dissipation by means of frequency-voltage (supply voltage and/or threshold voltage of MOS transistors) cooperative control. This paper derived the two relations of Eqs. (7) and (8), which hold between power dissipation and operating frequency for the three types of DVFS processors. Equations (7) and (8) were confirmed by SPICE simulation. This paper also proved two important theorems about the energy-minimum frequency scheduling of DVFS processors. These two theorems guarantee that frequency scheduling algorithms proposed for $V_{dd}$-control type processors are applicable to $V_{th}$-control type and $V_{dd}$-$V_{th}$-control type processors. Frequency-voltage cooperative control processors will play a much more important role in the coming full-scale sub-decimicron era, together with the power-minimum scheduling derived from the two theorems.

**Fig. 10** Power dissipation ratio.

### Acknowledgments

The authors thank Dr. H. Ohira for encouragement of this study.

#### References

- [1] K.J. Nowka, G.D. Carpenter, E.W. MacDonald, H.C. Ngo, B.C. Brock, K.I. Ishii, T.Y. Nguyen, and J.L. Burns, "A 32-bit PowerPC system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling," IEEE J. Solid-State Circuits, vol.37, no.11, pp.1441–1447, Nov. 2002.
- [2] T. Ishihara and H. Yasuura, "Voltage scheduling problem for dynamically variable voltage processors," Proc. International Symposium on Low Power Electronics and Design, pp.197–202, Aug. 1998.
- [3] K. Nose, M. Hirabayashi, H. Kawaguchi, S. Lee, and T. Sakurai, "Vth-hopping scheme to reduce subthreshold leakage for low-power processors," IEEE J. Solid-State Circuits, vol.37, no.3, pp.413–419, March 2002.
- [4] J. Kao, M. Miyazaki, and A.P. Chandrakasan, "A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture," IEEE J. Solid-State Circuits, vol.37, no.11, pp.1545–1554, Nov. 2002.
- [5] T. Sakurai and A.R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," IEEE J. Solid-State Circuits, vol.25, no.2, pp.584–594, April 1990.
- [6] A.P. Chandrakasan and R.W. Brodersen, Low power digital CMOS design, Kluwer Academic Publishers, 1995.
- [7] W. Lee and C. Hu, "Modeling CMOS tunneling currents through ultrathin gate oxide due to conduction- and valence-band electron and hole tunneling," IEEE Trans. Electron Devices, vol.48, no.7, pp.1366–1373, July 2001.
- [8] Y. Yeo, T. King, and C. Hu, "MOSFET gate leakage modeling and selection guide for alternative gate dielectrics based on leakage considerations," IEEE Trans. Electron Devices, vol.50, no.4, pp.1027–1035, April 2003.
- [9] H. Kawaguchi, G. Zhang, S. Lee, and T. Sakurai, "An LSI for VDD-hopping and MPEG4 system based on the chip," Proc. International Symposium on Circuits and Systems, vol.4, pp.918–921, May 2001.
- [10] K. Flautner, S. Reinhardt, and T. Mudge, "Automatic performance setting for dynamic voltage scaling," Proc. 7th Annual Intl. Conf. on Mobile Computing and Networking, pp.260–271, July 2001.
- [11] Z. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson, and P. Bose, "Microarchitectural techniques for power gating of execution units," Proc. International Symposium on Low Power Electronics and Design, pp.32–37, Aug. 2004.
- [12] H. Ohira and K. Kawakami, "A feed-forward dynamic voltage control algorithm for low power MPEG4 on multi-regulated voltage CPU," IEICE Trans. Electron., vol.E87-C, no.4, pp.457–465, April 2004.
- [13] M32700  $\mu$ T-Engine, http://www.renesas.com/

#### **Appendix: Proof of Lemma 2**

**Lemma 2:** If a function P = P(f) satisfies Eqs. (7) and (8), three points $(f_A, P(f_A))$, $(f_B, P(f_B))$ and $(f_C, P(f_C))$ $(f_A > f_B > f_C)$ on P = P(f) always satisfy Eq. (A·1).

$$(P(f_A) - P(f_B)) \cdot (f_B - f_C) > (P(f_B) - P(f_C)) \cdot (f_A - f_B) \tag{A·1}$$

**Table A·1** Increase and decrease of D(f).

| f                  | $f_C$ | ⋯ | $f_D$ | ⋯ | $f_A$ |
|--------------------|-------|---|-------|---|-------|
| $\frac{dD(f)}{df}$ | +     | + | 0     | - | -     |
| D(f)               | 0     | ↗ | Max.  | ↘ | 0     |

**Fig. A·1** Auxiliary figure.

**Proof:** The straight line which passes through the two points $(f_A, P(f_A))$ and $(f_C, P(f_C))$ is expressed as Eq. (A·2).

$$P = \frac{P(f_A) - P(f_C)}{f_A - f_C} (f - f_A) + P(f_A) \tag{A·2}$$

From Eq. (A·2), the difference D between the straight line di and the curve di is represented as Eq. (A·3).

$$D(f) = \frac{P(f_A) - P(f_C)}{f_A - f_C} (f - f_A) + P(f_A) - P(f) \tag{A·3}$$

The first and second derivatives of D(f) are obtained as Eqs. (A·4) and (A·5).

$$\frac{dD(f)}{df} = \frac{P(f_A) - P(f_C)}{f_A - f_C} - \frac{dP(f)}{df} \tag{A·4}$$

$$\frac{d^2D(f)}{df^2} = -\frac{d^2P(f)}{df^2} \tag{A·5}$$

Equation (A·5) indicates that $\frac{dD(f)}{df}$ is a monotonically decreasing function, so the equation $\frac{dD(f)}{df}=0$ has at most one solution. Additionally, Cauchy's mean value theorem assures the existence of the solution $f_D$ on the open interval $(f_C, f_A)$. Therefore, the increase and decrease of D(f) is obtained as in Table A·1. Since Table A·1 indicates that D(f) is always positive on the open interval $(f_C, f_A)$, the value on the line di at $f_B$ is above the point $(f_B, P(f_B))$. Hence Eq. (A·6) about the four areas in Fig. A·1 holds.

$$\text{area } acge > \text{area } abfe, \quad \text{area } fhlj > \text{area } ghlk \tag{A·6}$$

Now, Eq. (A·7) holds for the area abfe and the area fhlj, and their common value is given by Eq. (A·8). Therefore, area acge > area ghlk holds.

$$\text{area } abfe = \text{area } fhlj \tag{A·7}$$

$$= \frac{(P(f_A) - P(f_C))(f_A - f_B)(f_B - f_C)}{f_A - f_C} \tag{A·8}$$
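Lemma 2 can also be checked numerically for the fitted power model. The Python sketch below evaluates the difference between the two sides of Eq. (A·1) for every ordered triple drawn from a set of frequencies. The power model is Eq. (9) with the $V_{dd}$-$V_{th}$-control parameters of Table 1; the helper name `lemma2_margin` and the chosen frequency set are illustrative assumptions.

```python
import itertools

ALPHA, BETA, GAMMA = 0.029, 2.91e-5, 1.76  # Vdd-Vth-control row of Table 1


def power(f):
    """Eq. (9) with the Vdd-Vth-control fitting parameters."""
    return ALPHA + BETA * f ** GAMMA


def lemma2_margin(f_a, f_b, f_c, p=power):
    """Left side minus right side of Eq. (A·1); positive when the lemma holds."""
    return (p(f_a) - p(f_b)) * (f_b - f_c) - (p(f_b) - p(f_c)) * (f_a - f_b)


levels = (0.0, 50.0, 100.0, 150.0, 200.0, 250.0)   # ascending order, in MHz
# combinations() keeps the input order, so each triple satisfies f_c < f_b < f_a.
margins = [lemma2_margin(f_a, f_b, f_c)
           for f_c, f_b, f_a in itertools.combinations(levels, 3)]
print(f"{len(margins)} triples checked, smallest margin = {min(margins):.4e}")
```

For a linear P(f) the margin is exactly zero, so the strict inequality of Eq. (A·1) relies on the strict convexity stated in Eq. (8).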



Kentaro Kawakami received the B.E. degree in electrical and information engineering in 2002 and the M.E. degree in electronic and information system in 2004 from Kanazawa University, Ishikawa, Japan. He transferred from Kanazawa University to Kobe University in 2005. He is currently a Ph.D. candidate at Kobe University, Kobe, Japan. His research interests include power efficient image codec VLSI.



Miwako Kanamori received the B.E. degree in electrical and information engineering from Kanazawa University, Ishikawa, Japan, in 2003. She researched RISC based image codec LSI, and received the M.E. degree in electronic and information system from Kanazawa University, Ishikawa, Japan, in 2005. Currently she is working for Hitachi-Omron Terminal Solutions Corporation in Japan.



Yasuhiro Morita received the M.E. degree in Electronics and Computer Science from Kanazawa University, Ishikawa, Japan, in 2005. He is currently working in the doctoral course at the same university. His current research interests include high-performance and low-power multimedia VLSI designs.



Jun Takemura entered Information and Systems Engineering at Kanazawa University, Ishikawa, Japan in 2000. He received the B.E. degree in Information and Systems Engineering from Kanazawa University, Ishikawa, Japan, in 2004. He is currently working in the M.E. course at the same university. His current research is low-power multimedia VLSI designs.



Masayuki Miyama was born on March 26, 1966. He received the B.S. degree in computer science from University of Tsukuba in 1988. He joined PFU Ltd. in 1988. He received the M.S. degree in computer science from Japan Advanced Institute of Science and Technology in 1995. He joined Innotech Co. in 1996. He received the Ph.D. degree in electrical engineering and computer science from Kanazawa University in 2004. He is a research assistant in the Department of Electrical and Electronic Engineering at Kanazawa University. His present research focus is low power design techniques for multimedia VLSI.



Masahiko Yoshimoto received the B.S. degree in electronic engineering from Nagoya Institute of Technology, Nagoya, Japan, in 1975, and the M.S. degree in electronic engineering from Nagoya University, Nagoya, Japan, in 1977. He received a Ph.D. degree in Electrical Engineering from Nagoya University, Nagoya, Japan in 1998. He joined the LSI Laboratory, Mitsubishi Electric Corp., Itami, Japan, in April 1977. From 1978 to 1983 he was engaged in the design of NMOS and CMOS static RAM including a 64 K full CMOS RAM with the world's first divided-word-line structure. From 1984, he was involved in research and development of multimedia ULSI systems for digital broadcasting and digital communication systems based on MPEG2 and MPEG4 Codec LSI core technology. Since 2000, he has been a Professor of the Dept. of Electrical and Electronic Systems Engineering at Kanazawa University, Japan. Since 2004, he has been a Professor of the Dept. of Computer and Systems Engineering at Kobe University, Japan. His current activity is focused on research and development of multimedia and ubiquitous media VLSI systems including an ultra-low-power image compression processor and a low power wireless interface circuit. He holds 70 registered patents. He served on the Program Committee of the IEEE International Solid State Circuit Conference from 1991 to 1993. In addition, he has served as a Guest Editor for special issues on Low-Power System LSI, IP, and Related Technologies of IEICE Transactions in 2004. He received the R&D100 awards from R&D Magazine for development of the DISP and development of a realtime MPEG2 video encoder chipset in 1990 and 1996, respectively.