Dynamic supply voltage scaling (DVS) is one of the best ways to reduce the energy consumption of a device when there is a super-linear relationship between energy and supply voltage, and a pseudo-linear relationship between delay and supply voltage. However, most DVS schemes scale the clock frequency of the supply-voltage-clock-scalable (SVCS) CPU only and do not address the energy consumption of the memory. The memory is generally non-supply-voltage-scalable (NSVS), but its energy consumption varies with its clock frequency and the total execution time. Thus, DVS for an SVCS CPU cannot achieve an optimal system-wide energy saving without consideration of the memory, as long as the memory is controlled by the SVCS CPU.
INTRODUCTION
Lossless energy reduction for digital systems is based on power management during slack times. Dynamic Power Management (DPM) shuts down unused devices, while taking into consideration the energy overhead for shut down and wake up. Voltage-mode CMOS logic devices generally allow supply voltage scaling, with a super-linear relationship between the energy consumption and the supply voltage, and a pseudo-linear relationship between the delay and the supply voltage. Dynamic supply-voltage scaling (DVS) is one of the best ways to utilize slack time for energy saving in the case of supply-voltage-clock-scalable (SVCS) devices [1] of this sort.
Most DVS schemes proposed so far only consider SVCS CPUs. Certainly, many devices include current-mode circuits that do not allow supply voltage variation. And, even if a device allows DVS, there is no energy gain unless there is a super-linear relationship between the supply voltage and the energy consumption. Furthermore, most I/O bus standards specify the logic swing, and any scaling of the logic swing would violate the standard. Consequently, DVS is only effective on the SVCS CPUs. So far, most DVS research elaborates on novel scheduling algorithms based on a frequency assignment which ignores the memory.
The effect of non-supply-voltage-scalable (NSVS) memory technology on DVS has lately been demonstrated [2]. As the complexity of systems increases, even in battery-operated applications, the energy consumption of the CPU is no longer dominant. Unfortunately, scaling down the supply voltage and thus the clock frequency of the CPU decreases the slack times of synchronous NSVS memory, which may even increase the energy consumption of those devices. Thus, existing DVS techniques can easily fail to achieve system-wide optimization of energy consumption. Applying DPM to NSVS components can mitigate the energy overhead caused by the extended execution time resulting from DVS. The merger of DPM and DVS within a predictive power control strategy has therefore been proposed as an approach to system-wide optimization [3]. More specifically, a recent approach to DVS takes the memory access time into account when the CPU frequency is determined [4], and a power-aware memory system using DPM can significantly improve the energy economy of NSVS memory when DVS is applied to an SVCS CPU [5].
Although the combination of DVS and DPM overcomes the practical limitations of DVS to an extent, it does not guarantee a system-wide optimal configuration in terms of energy consumption. In this paper, we aim at the more limited goal of a system-wide energy-optimal frequency assignment for an SVCS CPU and a synchronous NSVS memory, when both of them are awake. For this purpose, we derive the relationship between energy consumption and clock frequencies for the SVCS CPU and the synchronous NSVS memory, based on both the static and the dynamic energy consumption. We prove that there exists an energy-optimal pair of clock frequencies for the SVCS CPU and the synchronous NSVS memory, and that it is a function of the number of CPU clock 
PRINCIPLE OF MEMORY-AWARE FREQUENCY ASSIGNMENT

Problem statement
Up to now, no proper strategy has been introduced to determine the clock frequency of a synchronous NSVS memory that is controlled by an SVCS CPU, with the aim of low energy consumption. Most DVS schemes for an SVCS CPU simply ignore the memory and assume that the execution time and energy consumption of the system are solely determined by the CPU. A recent scheme does consider the proportion of total execution time occupied by memory access, and uses this to pursue a lower clock frequency and hence a lower supply voltage for the CPU [4]. However, existing schemes do not consider how the energy consumption of the memory varies as the clock frequency of the CPU changes.
In real systems, the memory contributes both to the execution time and to the energy consumption. Even if we do not scale the supply voltage of synchronous NSVS memory, its energy consumption does change with clock frequency due to leakage energy during the active state and dynamic energy during the idle state. (The detailed energy consumption of a synchronous NSVS memory will be discussed in Section 2.2.) Thus, when we assign a clock frequency to the CPU, we must also consider the clock frequency of the memory. Since both the CPU and the memory clock frequencies affect the execution time, the pair of energy-optimal frequencies must be derived together.
Our method of clock frequency assignment determines a pair of energy-optimal frequencies for the CPU and the memory that minimize the energy consumption of the entire system, for a given application task and hardware configuration. The supply voltage of the synchronous NSVS memory is fixed, and the supply voltage of the SVCS CPU can then be determined by its clock frequency. We assign the lowest possible supply voltage that ensures reliable operation. Even if the energy-optimal clock frequencies are not feasible because they would miss the deadline, we can find feasible suboptimal values. Our technique is applicable to any synchronous device controlled by the CPU, but we focus on synchronous memory for consistency, without loss of generality. Our frequency assignment technique is also applicable to an NSVS CPU. Since the mathematics underpinning our method is quite involved, we define all the symbols used in advance, in Table 1.
Energy consumption of a synchronous NSVS memory
To address the energy-aware memory clock frequency assignment, we first derive generalized energy consumption equations for a synchronous NSVS memory. The active-mode energy of the memory is primarily determined by the number of memory transactions. The total number of external memory transactions is given by
If the CPU is designed to stall when there is a cache miss, we assume that the total execution time of a task is largely given by
without loss of generality. Fig. 1 shows a generalized energy model of a synchronous NSVS memory. The total idle-to-active dynamic energy is determined by the number of external memory transactions, which is independent of the clock frequencies of the CPU and the memory, such that
We assume an auto-precharge mode [6] that immediately sends the SDRAM to idle mode after every burst-mode access. An auto-precharge scheme is common in battery-operated systems. The total precharge energy is given by
Current-mode circuits exhibit a distinct active-mode static current but negligible leakage energy during idle mode. The active-mode static energy is directly proportional to the length of time that the circuit remains in the active mode. It is determined by the number of external memory transactions, the clock period of the SDRAM and the number of memory clock ticks required for a burst-mode transfer. Thus the total active-mode static energy is given by
Simple energy models usually consider the sum of three energy components, E AD + E PCD + E AS , without distinguishing them from each other, and usually ignore the dynamic energy consumption during idle mode as well. However, a synchronous memory has distinct dynamic energy consumption during idle mode, unlike an asynchronous memory, because the clock must still be propagated during idle mode. SDRAM is a slave device, and thus idle mode implies a ready state in which the device watches for external requests. The total idle-to-idle dynamic energy consumption is determined by the number of idle clock cycles, which is in turn determined by
This explains why an unnecessarily fast memory clock frequency increases the number of idle clock cycles and thus the total idle-to-idle dynamic energy. As the CPU clock frequency increases, the total execution time, τ exe , decreases, and thus less idle-to-idle dynamic energy is consumed. On the other hand, the total idle-mode static energy is solely determined by the duration of the idle mode:
As the CPU clock frequency increases, the total idle-mode static energy decreases. If τ exe < τ d , then both the CPU and the memory should enter power-down mode to avoid unnecessary energy consumption. The energy overhead for the power down and wake up, E PDD + E W D , is proportional to the number of power-down operations. We assume that the slack time, τ d − τ exe , is located at the end of task execution, and that there is one power-down and one wake-up operation. We can then derive the dynamic energy required to enter and leave the power-down mode as follows:
During the slack time, τ d − τ exe , the SDRAM stays in power-down mode. During power-down mode, the SDRAM consumes a small amount of static energy:
Note that E PDD + E W D + E PDS is not significant. Finally, the total energy consumption of the SDRAM, E m , is the sum of the energy components derived above (Eq. 10).
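The component-by-component description above can be sketched in code. This is a minimal illustration, not the paper's Eq. 10: the parameter names (`e_ad`, `p_as`, and so on), the form assumed for the execution time (CPU cycles plus stalled burst transfers), and the per-cycle idle dynamic energy are assumptions introduced here, and the example values carry no physical meaning.

```python
from dataclasses import dataclass


@dataclass
class MemParams:
    # Hypothetical parameter names; none of these values come from the paper.
    e_ad: float       # idle-to-active dynamic energy per transaction (J)
    e_pcd: float      # precharge dynamic energy per transaction (J)
    p_as: float       # active-mode static power (W)
    e_id: float       # idle-to-idle dynamic energy per idle clock cycle (J)
    p_is: float       # idle-mode static power (W)
    e_pd_wu: float    # energy of one power-down plus one wake-up (J)
    p_pds: float      # power-down static power (W)
    burst_ticks: int  # memory clock ticks per burst-mode transfer


def memory_energy(n_cpu, n_mem, f_c, f_m, t_deadline, p):
    """Return the assumed energy components of the memory and their total."""
    t_active = n_mem * p.burst_ticks / f_m   # time spent in burst transfers
    t_exe = n_cpu / f_c + t_active           # assumed: CPU stalls during transfers
    t_idle = t_exe - t_active                # memory idles while the CPU computes
    slack = max(t_deadline - t_exe, 0.0)     # spent in power-down mode
    parts = {
        "active_dynamic": p.e_ad * n_mem,
        "precharge": p.e_pcd * n_mem,
        "active_static": p.p_as * t_active,
        "idle_dynamic": p.e_id * t_idle * f_m,   # one unit per idle clock cycle
        "idle_static": p.p_is * t_idle,
        "power_down_wake_up": p.e_pd_wu if slack > 0 else 0.0,
        "power_down_static": p.p_pds * slack,
    }
    parts["total"] = sum(parts.values())
    return parts
```

Under these assumptions, only the active-mode static, idle-to-idle dynamic and power-down static terms depend on the memory clock frequency; the transaction-count terms are constants for a given task.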
ENERGY-OPTIMAL FREQUENCY ASSIGNMENT
Energy-frequency relation of a synchronous NSVS memory
In this section, we see how the synchronous NSVS memory behaves as the clock frequency changes. As we vary f m for a given f c , E m responds as shown in Fig. 2 . This variation is primarily caused by a trade-off between E ID and E AS . The total energy consumption, E m , is convex, and thus there exists an energy-optimal clock frequency.
We will now derive the clock frequency of a synchronous NSVS memory that corresponds to the least energy consumption for a given number of CPU clock cycles, a given number of memory transactions, a given CPU clock frequency, and the parameterized energy model.
Theorem 1. The energy consumption of a synchronous NSVS memory has a minimum, corresponding to the energy-optimal value of f m such that
Proof: Since E m is a convex function of f m , we obtain the optimal memory clock frequency by setting the derivative of E m with respect to f m to zero.
It is notable that the optimal f m is proportional to the square root of f c . For example, if the CPU clock frequency increases fourfold, the energy-optimal memory clock frequency only doubles.
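The square-root relation can be checked numerically with a simplified model. The sketch below is an assumption-laden illustration rather than the paper's Theorem 1: it keeps only the two terms that the text identifies as the main trade-off (active-mode static energy, which falls as 1/f m , and idle-to-idle dynamic energy, which grows as f m / f c ), neglects the power-down terms, and uses hypothetical parameter names and values.

```python
import math


def f_m_dependent_energy(f_m, f_c, n_cpu, n_mem, burst_ticks, p_as, e_id):
    """Memory energy terms that vary with f_m under the assumed model (J)."""
    active_static = p_as * n_mem * burst_ticks / f_m  # shorter bursts at higher f_m
    idle_dynamic = e_id * n_cpu * f_m / f_c           # more idle cycles at higher f_m
    return active_static + idle_dynamic


def optimal_f_m(f_c, n_cpu, n_mem, burst_ticks, p_as, e_id):
    """Stationary point of the expression above, from d/df_m = 0."""
    return math.sqrt(p_as * n_mem * burst_ticks * f_c / (e_id * n_cpu))


# Hypothetical task and device parameters, chosen only for illustration.
task = dict(n_cpu=1e8, n_mem=1e6, burst_ticks=8, p_as=0.1, e_id=1e-9)
f_slow = optimal_f_m(100e6, **task)
f_fast = optimal_f_m(400e6, **task)
print(f_fast / f_slow)  # ratio of optimal memory clocks for a fourfold CPU clock
```

In this simplified form the stationary point is a minimum because both terms are convex in f m for positive frequencies.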
The SVCS CPU energy model
So far, we have examined the energy-frequency relation of a synchronous NSVS memory. This section describes the assignment of frequencies to the CPU and the memory in order to achieve a system-wide energy optimum. We assume that well-designed clock gating is implemented in the CPU, and thus that the CPU consumes only negligible energy during a stall state, without loss of generality. Thus,
To keep the discussion focused, we assume that there is no leakage energy in the CPU (Fig. 3 (a)). However, we can easily accommodate leakage as a function of the duration of the active state of the CPU, which makes the cost function a little more complex (Fig. 3 (b)). Allowing for leakage energy in the CPU makes the total energy, E t , more convex, and so the optimal frequency pair ( f c , f m ) will exhibit even better energy performance compared with previous frequency assignment techniques. We will discuss E t in Section 3.4.
[4]. This approach may not use the slack time properly, and thus a new frequency assignment has been suggested that takes into account the execution time of the memory [4], as shown in Fig. 4 (b). In this case, the frequency assignment yields τ # exe = τ d , and thus a lower value of f # c , such that f # c < f c , is feasible.
Frequency assignment ignoring the memory energy
Frequency assignment considering the memory energy
The system-wide energy-optimal frequency assignment is not shown in Fig. 4 (a) or in Fig. 4 (b), because both of these schemes try to minimize E cpu only. In addition, τ exe = τ d does not imply the energy-optimal frequency assignment, due to the convex energy behavior of a synchronous NSVS memory [5]. As E cpu and E m are functions of f c and f m , and τ exe is a function of f c and f m (Eqs. 2, 10 and 12), we can prove that there exists a system-wide energy-optimal frequency pair ( f c , f m ). Since f c and f m are cross-coupled, we need to derive both f c and f m together, as shown in Fig. 4 (c).
The total system energy required by the CPU and the memory, E t , is given by
Note that E t is also convex because it is a sum of convex functions. We have convex contours of E t over f c and f m , as shown in Fig. 5. There exists a unique pair ( f c , f m ) corresponding to the minimum value of E t . However, we also need to verify that this pair is feasible. There are constraints for a feasible ( f c , f m ) pair:
The f c constraint:
The constraints on f cmax and f mmax are determined by the device data sheet. Since we only scale the clock frequency of the synchronous NSVS memory while fixing its supply voltage, there is virtually no minimum f m . Strictly speaking, there is a minimum f m determined by the dynamic logic structure, but it is low enough to ensure the feasibility of the optimal solution. However, even an SVCS CPU has a fairly high f cmin . Since we scale the supply voltage for the SVCS CPU, it is not necessary to reduce the clock frequency beyond V dd = k f c as far as the total energy is concerned.
Although a constrained optimization problem can in general be solved with Lagrange multipliers, we do not use them here because of their complexity. Instead, we derive the solution by dividing the problem into several subproblems. The constraints must all be satisfied, and form four boundary conditions as shown in Fig. 5. Except for the deadline constraint, the boundary conditions are trivial. If the optimal ( f c , f m ) pair exists in the feasible solution space, Theorem 2 derives the optimal solution. We calculate the optimal ( f c , f m ) by setting the partial derivatives of E t to zero.
Theorem 2. A system equipped with an SVCS CPU and a synchronous NSVS memory has a system-wide energy-optimal
and
if they are included in the feasible solution space determined by Eq. 14 ( Fig. 5) .
Proof:
The optimal ( f c , f m ) pair is derived from the simultaneous equations $\partial E_t / \partial f_c = 0$ and $\partial E_t / \partial f_m = 0$.
We can derive the optimal f m by Theorem 1, and it is given by
Replacing f m by the solution of Eq. 17 in the first equation, we obtain
This simplifies to
and the positive real root of Eq. 19 is the optimal f c .
Since Eq. 15 and the following polynomial equations are too complex to have closed-form solutions, we use a numerical technique.
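As one possible numerical technique, a plain grid search over the feasible frequency box is sketched below. It is not the method used in the paper, which does not name its technique. The energy model is a simplified assumption (CPU dynamic energy proportional to f c squared per cycle, plus the frequency-dependent memory terms), the deadline test assumes that execution time is CPU time plus stalled burst transfers, and all names and example values are hypothetical except the CPU constant, which follows the 1 × 10^−17 nJ/Hz^2 figure quoted in the experimental section.

```python
def total_energy(f_c, f_m, n_cpu, n_mem, burst_ticks, k_cpu, p_as, e_id, p_is):
    """Frequency-dependent part of the system energy (J) under the assumed model."""
    e_cpu = k_cpu * n_cpu * f_c ** 2                    # alpha*C*k^2 * N_c * f_c^2
    e_active_static = p_as * n_mem * burst_ticks / f_m  # memory busy time
    e_idle_dynamic = e_id * n_cpu * f_m / f_c           # idle memory clock cycles
    e_idle_static = p_is * n_cpu / f_c                  # idle memory time
    return e_cpu + e_active_static + e_idle_dynamic + e_idle_static


def search_pair(task, f_c_range, f_m_range, t_deadline, steps=200):
    """Grid-search the frequency box; return (energy, f_c, f_m) or None."""
    best = None
    for i in range(steps + 1):
        f_c = f_c_range[0] + (f_c_range[1] - f_c_range[0]) * i / steps
        for j in range(steps + 1):
            f_m = f_m_range[0] + (f_m_range[1] - f_m_range[0]) * j / steps
            t_exe = task["n_cpu"] / f_c + task["n_mem"] * task["burst_ticks"] / f_m
            if t_exe > t_deadline:
                continue  # violates the deadline constraint
            energy = total_energy(f_c, f_m, **task)
            if best is None or energy < best[0]:
                best = (energy, f_c, f_m)
    return best


# Hypothetical task; k_cpu = 1e-26 J/Hz^2 equals 1e-17 nJ/Hz^2.
task = dict(n_cpu=1e8, n_mem=1e6, burst_ticks=8,
            k_cpu=1e-26, p_as=0.1, e_id=1e-9, p_is=0.01)
print(search_pair(task, (200e6, 400e6), (5e6, 100e6), t_deadline=0.5))
```

Because infeasible grid points are skipped, the search returns a boundary point when the unconstrained optimum lies outside the feasible region, and `None` when no grid point meets the deadline.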
Unfortunately, we often encounter the situation in which the ( f c , f m ) pair obtained from Theorem 2 is not included in the feasible solution space. In this case, a sub-optimal pair ( f c , f m ) can be located on the boundary of the feasible solution space. To obtain this, we need to derive the locally optimal ( f c , f m ) pairs on all the boundary conditions and choose the one with the lowest energy. Thus, E t is given by
The optimal f c corresponding to the solution of
EXPERIMENTAL RESULTS
We evaluate system-wide energy differences among the frequency assignment schemes in Figs. 4 (a), 4 (b), 4 (c) and a frequency assignment without DVS. The target system consists of a 32-bit RISC processor and Micron 128Kbit SDRAMs, with no other peripherals, to keep the benchmark simple. We assume that the CPU is SVCS obeying Eq. 12, and that it operates at a maximum of 400MHz and a minimum of 200MHz, consuming 640mW with V dd = 2V and 160mW with V dd = 1V, respectively (αC cpu k^2 = 1 × 10^−17 nJ/Hz^2). The CPU is assumed to have the ARM instruction set architecture and separate 8KB I-cache and D-cache. We compose a 32-bit data width with four Micron SDRAMs, i.e., M m = 4. The default f m is 66MHz, which is the most popular setting for battery-operated systems. The frequency assignment of previous DVS changes f c while keeping f m = 66MHz, and the frequency assignment without DVS implies that f c = f cmax and f m = 66MHz. Table 3 summarizes the energy consumption of the Micron SDRAM, acquired by accurate cycle-true measurement [9]. We use Table 3 to derive energy-optimal ( f c , f m ) pairs from the theorems and the corollaries. Once we have derived the energy-optimal ( f c , f m ) pairs, we perform trace-driven cycle-by-cycle energy simulation of real applications with those pairs. Thus the derived ( f c , f m ) pairs may not be truly optimal due to modeling error, but this procedure ensures that the benchmark is not biased in favor of our model.
We select three embedded applications: an MPEG4 decoder at 20 frames/sec, a JPEG de-compressor with 96% utilization and an MP3 decoder with 320Kbps streaming. Table 4 summarizes the performance of the frequency assignment techniques. Because E m increases by more than 20%, DVS with conventional frequency assignments reduces the system-wide energy, E t , by only around 10%, although it reduces E cpu by up to 70%. Note that the target system is equipped with only the minimum number of SDRAMs for a 32-bit data width. Our frequency assignment, however, shows an additional 50% system-wide energy saving over previous frequency assignment schemes, even though its CPU energy consumption is higher than that of the conventional frequency assignments.
The optimal pair ( f c , f m ) for the MPEG decoder lies around (250MHz, 40MHz). Since the JPEG de-compressor has only 4% of slack time, DVS may not appear to be useful. Indeed, DVS with conventional frequency assignments reduces f c by only 5%. However, our scheme achieves a 20% reduction of f c by increasing f m instead. Although the system-wide energy reduction is still minor for the JPEG de-compressor, our frequency assignment achieves 2X to 3X the energy reduction of previous frequency assignments. The MP3 decoder has very low memory utilization, and thus the feasible energy-optimal f m is 7MHz. The energy-optimal f c is f cmin . Considerable slack time remains even with DVS. For the MP3 decoder, our frequency assignment shows a 34% additional system-wide energy reduction over previous frequency assignment schemes.
CONCLUSION AND FUTURE WORK
This paper is the first attempt to derive an energy-optimal frequency assignment of a synchronous non-supply-voltage-scalable (NSVS) memory and a supply-voltage-clock-scalable (SVCS) CPU for dynamic voltage scaling (DVS). We derive the energy-optimal CPU and memory clock frequencies based on an accurate energy model that includes active-mode leakage and idle-mode dynamic energy. We formulate analytical models of the CPU and the memory energy consumption, and derive generalized solutions for various feasibility constraints such as the minimum and the maximum clock frequencies and the deadline of a task. Our frequency assignment enhances previous DVS and results in an additional 50% system-wide energy reduction over previous frequency assignment schemes.
We can improve the energy gain by adding a DPM scheme to NSVS memory devices where applicable. SDRAMs and RAMBUS DRAMs are good candidates. We will include this complementary DPM for NSVS devices, together with CPU leakage power, as future work.
