Existing techniques for system-level power management cannot simultaneously guarantee timing constraints and provide a comprehensive system model that supports multiple components. We propose a new method for modeling and selecting the power modes for the optimal system-power management of embedded systems under timing and power constraints. First, we not only model the modes and the transition overhead at the component level, but we also capture the application-imposed relationships among the components by introducing a mode dependency graph at the system level. Second, we propose a mode selection technique, which determines when and how to change modes in these components such that the whole system can meet all power and timing constraints. Our constraint-driven approach is a critical feature for exploring power/performance tradeoffs in power-aware embedded systems. We demonstrate the application of our techniques to a low-power sensor and an autonomous rover example.
Introduction
Recent trends in mobile and autonomous embedded systems are giving rise to a new class of power-aware systems. Unlike low-power systems, whose goal is to minimize power usage, power-aware systems are more general in that they must make the best use of the available power by adapting their behavior to the constraints imposed by the environment, user requests, or their power sources. Power-aware systems must use components that are capable of multiple modes of operation. Many of these components offer modes for power management, while other components allow the user to control the voltage or frequency as other forms of power modes. The selection of mode is thus the primary means of controlling power usage, and it is often done in conjunction with scheduling.
New off-the-shelf components are offering increasingly sophisticated modes for power management. However, the system-level power manager has only limited control over the modes. Some modes can be set by writing commands to a control register of a device. However, the power manager may not be able to arbitrarily select the modes it wishes at all times. It may be forced to wait or request a change through a sequence of intermediate modes. Even if a desired mode is available, changing mode can incur nontrivial overhead both in terms of time and power. The overhead translates into penalty in performance or power, and it can cause a system to miss an important deadline.
Another key issue for power management is that mode selection cannot be done in isolation. The choice of mode in one component must be coordinated with that in other components, or else the whole system may not function correctly. For example, if the mode selection involves a particular encoding scheme, then the rest of the system that depends on the data representation must also change mode in order to handle the encoding correctly.
It can be difficult for designers to keep track of the details of all these modes. The problem is further exacerbated by the fact that the number of components and the available modes are increasing rapidly. Today's methodologies either limit the complexity by using only a small subset of the available modes (e.g., on, sleep, off), or they are unable to guarantee timing or power constraints.
Power management of embedded systems must consider all components in the system. Significant power reduction in one component may not translate into desirable power reduction for the whole system. In mission-critical applications, peripheral devices including mechanical and thermal devices can actually dominate power consumption and must be an integral part of power management.
We believe that a new methodology for mode modeling and selection is sorely needed in order to effectively manage the power of next-generation embedded systems. This paper first introduces a new mode dependency graph for modeling the enabling relationships among modes within a component and between components in a system. Second, this paper presents a new mode selection algorithm that produces a mode schedule that satisfies timing and power constraints on multiple processors and devices. It takes advantage of the mode dependency graph in effectively pruning the search space, making it practical to incorporate into an on-line power manager. The advantage of our constraint-driven approach is that it is not hardwired to a specific objective such as power minimization. This is a crucial feature for power-aware embedded systems, for which the ability to make power/performance tradeoffs is more important than just power reduction.
[Fig. 1: Gantt charts and power profiles for resources R1, R2, R3 and tasks t1, t2, t3 against the max power constraint Pmax.]
This paper is organized as follows. Section 2 reviews related work. Section 3 presents the mode dependency graph, while Section 4 describes a mode selection algorithm that takes advantage of mode dependency modeling. We discuss the experimental results in Section 5.
Related Work
Many low-power techniques have been developed at all levels. For system-level designs, since the components are largely off-the-shelf or already designed, the applicable techniques include dynamic voltage scaling (DVS) and dynamic power management (DPM).
Dynamic Voltage Scaling (DVS)
Developed for variable-voltage processors, DVS can achieve significant energy savings while still enabling the processor to continue making progress [12, 2]. Although DVS means running slower, these techniques typically slow the processor down just enough to avoid violating timing constraints, and many are based on real-time task scheduling cores [2, 8, 9, 7].
It has been shown that maximal energy saving is achieved by running the processor at the slowest possible constant speed, rather than running tasks at full processor speed and changing the processor to a lower power mode when idle. Hong et al. [2] proposed a heuristic for scheduling real-time tasks on a variable-voltage processor. Shin [8] exploited both execution time variation and idle time intervals for fixed-priority tasks. Shin's algorithm in [9] determines the lowest maximum processor speed for each job to achieve power reduction. Quan and Hu [7] further greedily determine the lowest voltage for a set of tasks to achieve more energy savings.
What these DVS techniques have in common is that they are greedy and assume a single processor. A power-aware embedded system, however, consists of multiple resources, which may be one or more processors and peripheral devices. Unfortunately, greedy DVS techniques are not generalizable to multiple resources under power constraints, as shown in the following example.
Example: (DVS fails in multi-resource)
Fig. 1(a) shows a Gantt chart (top) and the power profile (bottom) for a system with three resources: R1 is capable of voltage scaling, while R2 and R3 are not. The task t1 on R1 has a deadline at 110. The system has a max power constraint of 3W. Furthermore, the behavior of the application dictates that R1 and R3 be co-active. Co-activation means the execution of one task requires the power consumption of other dependent services or tasks. A simple example is that when the CPU is running, it imposes a co-activation dependency on the memory, but co-activation can be much more general between sets of tasks. Fig. 1(b) shows the schedule and the power profile obtained by greedily slowing down R1. Even though all timing constraints are satisfied, it violates power constraints and it is not minimum energy. When it is stretched out, t1 overlaps t2 during time 70-110, and their total power exceeds the max power constraint. It is not minimum energy due to the co-activation dependency between R1 and R3: the energy saving by R1 due to voltage scaling is more than offset by R3, whose execution is prolonged by R1.
The optimal schedule and power profile are shown in Fig. 1(c). Resource R1 is slowed down without overlapping t2 on R2. No max power is violated. Although t3 is stretched with t1 and therefore consumes more energy than in Fig. 1(a), t1 saves even more energy due to voltage scaling of resource R1. As a result, the system achieves minimal energy while satisfying all constraints. Fig. 2 summarizes the energy costs. Another problem not highlighted with this example is that mode changes may incur nontrivial power or timing overhead. If so, overhead must be considered in determining the feasibility of the mode schedule.
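The tradeoff in the example above can be illustrated with a small numerical sketch. It assumes the common idealized model in which the power of a voltage-scaled processor falls with the cube of its speed, and a co-active device that draws constant power for as long as the processor task runs. The function name and all numbers are hypothetical and are not taken from Fig. 1 or Fig. 2.

```python
def total_energy(slowdown, p_cpu_full=2.0, p_dev=1.0, t_full=40.0):
    """Energy (J) of a scalable processor task plus a co-active device.

    Assumption: idealized DVS model with power proportional to speed**3,
    so the processor power drops by slowdown**3 while the run time grows
    by the same slowdown factor. The device power is fixed.
    """
    run_time = t_full * slowdown                      # stretched task length
    e_cpu = p_cpu_full / slowdown**3 * run_time       # processor energy
    e_dev = p_dev * run_time                          # co-active device energy
    return e_cpu + e_dev


if __name__ == "__main__":
    # 2.75 stretches a 40-unit task to a deadline of 110 (greedy choice).
    for s in (1.0, 1.5, 2.0, 2.75):
        print(f"slowdown {s:>4}: {total_energy(s):6.1f} J")
```

Under these assumed numbers, stretching the task all the way to its deadline costs more total energy than running at full speed, while a moderate slowdown costs less than either, which is the effect a greedy single-processor DVS policy cannot see.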
Luo and Jha [5] present scheduling for multiple processing elements by reordering tasks and applying voltage scaling in a post-processing step after scheduling. Our approach is similar in that it can also be a post-processing step and handles precedence and timing constraints, but we treat power as a hard constraint. Furthermore, we handle co-activation and other mode-dependency relationships.
Dynamic power management (DPM)
Previous work on DPM mainly aimed to achieve power reduction by predicting the system idle time or event distribution and shutting down resources when idle. The simplest power management policy is a time-out based on a fixed or predicted amount of time before the system's shutdown or power-up [3]. Stochastic models [1] are used to address the uncertainty in system behaviors. DPM techniques can be effective for minimizing energy and time penalties on average, but they have several limitations. First, most treat either power or timing as an objective or penalty, rather than a constraint. In real systems, the max power is a real, hard constraint, whose violation can lead to malfunction. Second, they have not considered inter-component dependency in a system, with the exception of Qiu, Qu and Pedram [6], who model multiple service providers with a Generalized Stochastic Petri Net (GSPN) model that can capture some dependencies among resources. However, their model is mainly for the request/dispatch behavior of servers rather than dependency among the servers themselves.
Our new approach, mode selection, combines the advantages of existing approaches. It is entirely constraint driven, enabling us to make power/performance tradeoffs without hardwiring any specific goal or policy in the algorithm.
Modeling Resource Dependency
Selecting (or not selecting) a mode of a resource may impact the modes that other resources are allowed to select. The impact may be co-activation, exclusion, enabling, and many other possible types of dependency. These dependencies may be extracted from application-level specifications or from policies for safety, security, fault tolerance, and power saving. In any case, a legal mode combination of the resources is one that respects all of these dependencies, and a feasible mode combination is one that is legal and satisfies all the constraints (namely timing and power). We use a data structure called the mode dependency graph (MDG) that enables efficient generation of legal mode combinations in an order that facilitates the search for feasible combinations that are also low cost.
Definitions
An edge (m, n) ∈ H γ represents a mode change from mode m to mode n. We define the timing and energy function for a mode change as:
where M γ is the set of modes of resource γ, and T , E are time and energy, respectively. The average power can be obtained from energy and time information.
Definition 2 (Power and delay functions) Power consumption of a resource γ is represented as a function, π, mapping from a power mode to a power number. Formally, $\pi : M_\gamma \to \mathbb{R}^+$. Delay of a mode transition is defined as a function, δ, mapping from the start mode and end mode of a transition to a delay number. Formally, $\delta : M_\gamma \times M_\gamma \to \mathbb{R}^+$.
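As an illustration of these definitions, the per-resource functions can be stored as plain lookup tables. The sketch below is one possible encoding, not the representation used in this paper: the radio resource, its mode names, and every number are hypothetical. It also shows how the average power of a mode change follows from its energy and time.

```python
# Hypothetical radio resource; all values are illustrative only.
POWER_W = {"tx_rx": 0.9, "rx": 0.4, "off": 0.0}   # pi: mode -> power (W)

# delta: (start mode, end mode) -> transition delay (s).
# A missing pair means the change is not directly available.
DELAY_S = {
    ("off", "rx"): 0.5,
    ("rx", "off"): 0.05,
    ("rx", "tx_rx"): 0.1,
    ("tx_rx", "rx"): 0.1,
}

# Energy (J) spent during each mode change.
ENERGY_J = {
    ("off", "rx"): 0.3,
    ("rx", "off"): 0.01,
    ("rx", "tx_rx"): 0.08,
    ("tx_rx", "rx"): 0.05,
}


def transition_avg_power(src, dst):
    """Average power (W) of a mode change, derived from energy and time."""
    return ENERGY_J[(src, dst)] / DELAY_S[(src, dst)]


def can_change_directly(src, dst):
    """True if the mode change src -> dst is a single transition edge."""
    return (src, dst) in DELAY_S
```

In this assumed table the radio cannot go from off to tx_rx in one step, which mirrors the case where a power manager must request a change through an intermediate mode.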
Mode Dependency Graph
A mode dependency graph (MDG) G(M, D) characterizes the inter-resource dependency relationships, where $M = \bigcup_{\gamma \in \Gamma} M_\gamma$ is a set of vertices representing power modes, and D is a set of edges standing for dependencies. A vertex is represented by a circle with a label in the format of "γ.m," where γ ∈ Γ is a resource and $m \in M_\gamma$ is a mode of the resource. If two vertices have the same label, we consider them identical.
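The value of a vertex v ∈ M labeled "γ.m" is defined as:
$$|v| = \begin{cases} \text{True} & \text{if } \gamma \text{ is in mode } m, \\ \text{False} & \text{if } \gamma \text{ is in another mode}, \\ \text{Undetermined} & \text{if no mode has been selected for } \gamma. \end{cases} \qquad (1)$$
An edge in the MDG represents dependency between two modes. Suppose an edge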
In other words, if |u| is True, |v| MAY be True; but if |v| is False, |u| MUST be False. For example, we represent the dependency between a CPU and a memory chip such that the memory is on only if the CPU is in active mode. If the CPU is not in active mode, then the memory must not be on. If both of the above conditions are met, we say that the CPU and the memory satisfy the mode dependency. Otherwise, they violate the mode dependency. Fig. 3 summarizes the conditions that (do not) violate the mode dependency.
To expand the capability of the mode dependency graph, we introduce logic operators as another kind of vertex. An operator vertex is represented by a square with an operator label in it. For an operator vertex with multiple outgoing edges, the ⇒ direction combines disjunctively, and the ⇐ direction combines conjunctively. For example, suppose a vertex u, whose value |u| is True, points to two vertices v1 and v2. If either v1 or v2, or both, are True, then they satisfy the mode dependency. When v1 and v2 are both False, they violate the mode dependency. The value of an operator vertex can be obtained by evaluating the logic function it represents. We define the operators AND, OR and XOR. The functions of the operators follow the normal boolean functions of the same names, except that when any input is "undetermined," the output is "undetermined." Given an MDG, a resource γ and one of its modes m, we can use the routine in Fig. 4 to check whether mode m satisfies the MDG.
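The three-valued behavior of the operator vertices can be sketched as follows. This is a minimal illustration, not the routine of Fig. 4: representing "undetermined" by Python's None and the helper name _lift are assumptions made here, and XOR over more than two inputs is taken to mean parity.

```python
from functools import reduce

UNDETERMINED = None  # assumed encoding of the "undetermined" vertex value


def _lift(binary_op):
    """Turn a two-input boolean function into an operator vertex function.

    Any undetermined input makes the whole output undetermined;
    otherwise the inputs are folded with the ordinary boolean function.
    """
    def apply(*inputs):
        if any(value is UNDETERMINED for value in inputs):
            return UNDETERMINED
        return reduce(binary_op, inputs)
    return apply


AND = _lift(lambda a, b: a and b)
OR = _lift(lambda a, b: a or b)
XOR = _lift(lambda a, b: a != b)
```

Note that OR(True, UNDETERMINED) is undetermined here, as the text specifies, even though a short-circuit evaluation could already decide it. The legality check is therefore only meaningful once all input modes have been selected.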
Generating Mode Combinations
This section shows how to efficiently generate legal mode combinations using the MDG.
We transform and reduce an MDG to a resource list. The purpose is to sequence the resources so that the modes of a resource do not depend on those of the succeeding resources. From the MDG, we shrink each operator vertex to a point, and remove mode name in each mode vertex. We then remove the redundant vertices and edges, break the cycle by removing one edge in the cycle, and apply topological sort to obtain a resource list.
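The reduction to a resource list can be sketched for the acyclic case with Python's standard graphlib module (Python 3.9 or later). This is a simplified illustration under stated assumptions: each dependency is given as a hypothetical pair (dependent "resource.mode", prerequisite "resource.mode"), operator vertices are assumed to have been flattened into such pairs already, and a cyclic MDG raises graphlib.CycleError instead of having one edge removed.

```python
from graphlib import TopologicalSorter


def resource_list(mode_edges):
    """Order resources so that each one follows the resources it depends on.

    mode_edges: iterable of (dependent, prerequisite) vertex labels in the
    "resource.mode" format. Mode names are dropped, duplicate edges merge
    automatically, and edges inside one resource are ignored.
    """
    predecessors = {}
    for dependent, prerequisite in mode_edges:
        r_dep = dependent.split(".")[0]
        r_pre = prerequisite.split(".")[0]
        predecessors.setdefault(r_dep, set())
        predecessors.setdefault(r_pre, set())
        if r_dep != r_pre:
            predecessors[r_dep].add(r_pre)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical dependencies: memory and radio modes depend on the CPU mode.
edges = [
    ("mem.on", "cpu.active"),
    ("radio.tx_rx", "cpu.active"),
    ("radio.rx", "cpu.idle"),
]
order = resource_list(edges)
```

With these example edges the CPU comes first in the list, so its mode is always fixed before the modes of the memory and the radio are checked.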
If the MDG is acyclic, then legal mode combinations can be generated by a special version of topological traversal. Starting from the first resource of the list, we check modes of each resource against the MDG and identify the legal modes. We keep them and select one for the current resource γ, and move to the next resource. We are able to determine a mode of γ because upon checking the resource, all the modes of its dependent resources have been already determined since they are all located before γ. We progressively generate a mode combination as we check legality of modes and select one at each resource. As we reach the end of the list, we obtain a legal mode combination. We enumerate the rest of legal modes at the end resource, backtrack to previous resources, and enumerate their legal modes to generate other legal mode combinations.
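The traversal described above amounts to a depth-first search with backtracking over the resource list. The sketch below illustrates it for the acyclic case only. The function names and the is_legal callback are hypothetical stand-ins for the MDG check, and the example encodes just the CPU/memory dependency used earlier (the memory is on only if the CPU is active).

```python
def legal_combinations(resources, modes, is_legal):
    """Yield every legal mode combination as a dict resource -> mode.

    resources: list ordered so that dependencies come first.
    modes:     dict resource -> list of its modes.
    is_legal:  function(partial, resource, mode) -> bool; it may inspect
               only the modes already chosen in `partial`.
    """
    def extend(index, partial):
        if index == len(resources):
            yield dict(partial)          # end of list: one legal combination
            return
        resource = resources[index]
        for mode in modes[resource]:
            if is_legal(partial, resource, mode):
                partial[resource] = mode
                yield from extend(index + 1, partial)
                del partial[resource]    # backtrack and try the next mode

    yield from extend(0, {})


modes = {"cpu": ["sleep", "idle", "active"], "mem": ["off", "on"]}


def mem_needs_active_cpu(partial, resource, mode):
    """The memory may be on only if the CPU is in active mode."""
    if resource == "mem" and mode == "on":
        return partial["cpu"] == "active"
    return True


combos = list(legal_combinations(["cpu", "mem"], modes, mem_needs_active_cpu))
```

Of the six possible combinations in this toy system, four are legal. Illegal branches are cut as soon as the offending mode is tried, so combinations that extend an illegal prefix are never generated.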
Note that there may be cycles in an MDG, which implies that in the resource list obtained above, the modes of a resource may depend not only on preceding resources, but also on succeeding resources. We call such resources dirty resources. In this scenario, we keep track of which resources the current resource γ is dependent on. When the modes of all dependent resources are determined, we evaluate a mode of γ to determine whether the mode satisfies the MDG. Fig. 6 shows the detailed algorithm, which is the general case for both acyclic and cyclic MDGs.
Example: Microsensor
A microsensor system is a node in a distributed microsensor network [10] . It consists of a sensor, a processor, memory chips, radio frequency module and other auxiliary parts. The microsensor obtains information from environment and sends processed data to a base station. The sensor and the memory each has two modes, on and off. The processor has three modes, active, idle and sleep. The radio has three modes, transmit-and-receive (tx rx), receive-only (rx), and off. There are a total of 36 mode combinations for these components.
The behavior and dependencies of the devices in this system can be derived from high-level power management policies: the sensor and the radio may be both off only if the processor is in sleep mode; either of them may be on only if the processor is in sleep mode or idle mode; both the sensor and the radio may be on only if the processor is in active mode; the memory is on if and only if the processor is active. Fig. 5(a) shows the MDG of the microsensor.
Using the MDG, our algorithm automatically generates eight mode combinations that satisfy the given MDG (see Fig. 7). Suppose we want the microsensor to work in a proactive way: when it is off, the system can only be woken up by the sensor when it senses information from the environment. The radio cannot wake up the system, for example, by receiving a remote command. We add another item, "the radio may be on only if the sensor is on" (in the dashed box in Fig. 5(a)), to the MDG in Fig. 5(a). Then we run our algorithm on the new MDG and obtain five mode combinations (those without * in Fig. 7). This result exactly matches the mode combinations in manually designed results [10].
Through this simple example, we show our algorithm is able to systematically generate legal mode combinations, and by editing the mode dependency graph, we can obtain mode combinations without manually going through all possible mode combinations.
Mode selection works as a post-processing stage after scheduling. It validates and improves the schedule with more architectural knowledge than the scheduler. Our approach is a constraint-driven search algorithm that considers resource/task dependency and mode change overhead, and tries to find a mode schedule that satisfies system timing and power constraints.
Problem Statement
The input to the problem consists of a set of tasks X, a schedule σ, a mode dependency graph G, power constraints P max and P min, and timing constraints represented by a constraint graph G c [4]. The output is a mode schedule σ' that meets system power and timing constraints by means of legal mode combinations.
Definition 4 (Task x ∈ X )
A task x is defined by a tuple (τ x , ω x ), where τ x is a task identifier, and ω x ∈ Ω is the workload of the task. In the context of this paper, we assume each task x has already been mapped to a resource γ. The operation delay d x and power profile P x (t) of a task x depend on the workload ω x and the selected modes m of resource γ.
Depending on the nature of the resource, workload ω x can be the number of cycles for a processor, the number of atomic actions for a device, e.g., the number of steps for a step motor, or simply the time to perform a task.
Definition 5 (Schedule σ)
A schedule σ maps each task to its start time. An idle interval with respect to a schedule σ and a resource γ is a time interval during which no task is scheduled to run on γ. Note that during an idle interval, the resource can still consume nonzero power, depending on the mode.
Definition 6 (Mode schedule σ')
A mode schedule σ' maps each task x ∈ X' (which is mapped to resource γ) to the task's start time and a mode m ∈ M γ, where X' = X ∪ X o. X o is a set of overhead tasks, which are inserted whenever there is a mode change on a given resource.
A mode schedule σ' is feasible if all mode combinations are legal (Section 3) and all timing and power constraints are satisfied at all times:
$$P_{min} \le P(t) \le P_{max}, \qquad 0 \le t \le t_{end}$$
where P(t) denotes the total system power at time t under σ', t end is the overall schedule length, and P min and P max are the minimum and maximum power constraints, respectively. The reason for a minimum power constraint has been discussed elsewhere [4]. It can be used not only for power/performance tradeoffs but also for jitter control.
Algorithm
Our search algorithm contains a loop with two steps. First, we find modes for tasks that satisfy task dependency and timing constraints. Second, we determine modes for the idle intervals on each resource. Note that after the first step, the operation delay of certain tasks may change because of the mode selected (e.g., modes with different clock rates due to voltage scaling) or because of task dependency. An advantage of selecting task modes and idle interval modes separately is that we can apply different kinds of system constraints, which helps prune out illegal mode combinations efficiently. We order the modes of each resource by increasing power consumption and search from the lowest-power mode. By doing so, we both speed up our search process and find solutions very close to the energy-optimal solution. The top-level algorithm is shown in Fig. 8.
Selecting modes for tasks
We select modes for tasks by generating legal mode combinations of tasks that satisfy the MDG. Note that the MDG used for a schedule may be a mix of resource dependency and task dependency, which represent time-invariant and time-variant dependency of resources, respectively. For example, in Fig. 9, the sub-graph in the dashed box represents resource dependency, whereas the rest of the graph shows the task dependency. We can still use the algorithm introduced in the previous section to generate legal mode combinations of tasks. Once a legal mode combination is determined, we can obtain a new schedule, since the operation delays of tasks become known under their selected modes and under their co-activation dependency. We check timing constraints for the new schedule. If it fails, we generate another legal mode combination and check again; if it passes, we use the mode combination for mode selection of idle intervals.
Selecting modes for idle intervals
On each resource, overhead may exist at the mode changes. We find a set of modes for each idle interval such that the time overhead of the mode changes is less than the length of the idle interval. We treat overhead as additional tasks to the schedule we obtained. We characterize those overhead tasks with time and average power, which can be derived from time and energy information. We decompose the new schedule into time intervals such that within each time interval there is no task event (start or end event). The decomposition is done in the following way: We find the start and end events of all tasks. All the events cut the time axis into non-overlapping segments. Each segment forms a time interval. We check system power constraints in each time interval. If the schedule fails power constraints, we attempt a mode change on resources that currently have an idle interval, and check power constraints again. If all the modes fail the power constraints, we backtrack to the previous time interval. If we backtrack to the beginning of the schedule and still cannot find feasible modes, we attempt the next legal mode combination and select modes for idle intervals again.
Figure 11. A mode schedule for microrover.
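The interval decomposition and the per-interval power check can be sketched as below. This covers only the checking step, not the backtracking search. The function name and the tuple layout (start, end, power) are assumptions, the numbers are hypothetical, and each entry is taken to draw constant power, so idle-mode power and overhead tasks would be passed in as additional entries.

```python
def power_violations(entries, p_min, p_max):
    """Return the intervals whose total power breaks the constraints.

    entries: list of (start, end, power_watts) with constant power each.
    Start and end events cut the time axis into segments; within one
    segment the set of running entries, and thus the power, is constant.
    """
    events = sorted({t for start, end, _ in entries for t in (start, end)})
    violations = []
    for t0, t1 in zip(events, events[1:]):
        total = sum(p for start, end, p in entries
                    if start <= t0 and t1 <= end)
        if not (p_min <= total <= p_max):
            violations.append((t0, t1, total))
    return violations


# Hypothetical schedule: a stretched task overlaps a second task in 70-110.
stretched = [(0, 110, 1.0), (70, 110, 2.5)]
bad = power_violations(stretched, p_min=0.0, p_max=3.0)
```

For this assumed schedule the check reports the single segment from 70 to 110, where the two tasks together draw 3.5 W against a 3 W limit. A search procedure would then try a different mode in that segment or backtrack.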
Experimental Results
We apply our algorithm to an example based on the Mars rover [11] . The rover travels on the surface of Mars to perform scientific experiments and shoot images. Its resources consist of a camera (CAM), scientific devices (SCI), a radio-frequency modem (RF), a microprocessor (PPC), a hazard detector (HAZ), driving motors (DRV) and steering motors (STR). CAM takes a picture, sends the picture data to PPC for processing, PPC outputs to RF, and then the rover moves to another location (HAZ, DRV, STR) to perform scientific experiments (SCI, PPC, RF).
PPC can work at a number of different clock rates (with a full speed of 500MHz) and can be set to doze, nap or sleep modes. RF can be in rx-only, tx-rx, or sleep mode. The other resources have only two modes each, on and off. Mode-change overhead is significant for some resources. Due to the low temperature on Mars, DRV must be pre-heated for some time before being turned on. Similar reasons apply to STR, RF, and SCI. The inter-resource relationships are shown in Fig. 9. For example, when HAZ is working, neither DRV nor STR should be working. RF may be in tx-rx mode if and only if the processor is operating. Fig. 11 shows a feasible mode schedule, in both time view and power view. Task ppc2 on PPC cannot be slowed down further because PPC and RF must be co-active. If PPC is greedily slowed down, it will violate the max power constraint during the interval 500-560. Tasks drv1, haz1 and str1 do not overlap, due to the system requirement specified in the mode dependency graph. STR and SCI need significant time to pre-heat, which is adequately considered (the light areas in their tracks). The idle interval between rf1 and rf2 on RF is set to rx-only rather than off because the timing overhead of the mode changes (including pre-heating) is larger than the length of the interval. The idle interval before rf1 is set to rx-only for the same reason.
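The choice made for the RF idle intervals can be illustrated with a small sketch: a mode is a candidate for an idle interval only if leaving the working mode and returning to it takes less time than the interval lasts, and candidates are tried from the lowest power upward. The function name, the mode names and all numbers are hypothetical and are not taken from the rover data.

```python
from math import inf


def idle_mode_candidates(working_mode, idle_len, power, delay):
    """Modes usable during an idle interval, lowest power first.

    power: dict mode -> watts.
    delay: dict (start mode, end mode) -> transition time; a missing
           pair is treated as an unavailable (infinitely slow) change.
    Staying in the working mode needs no transition and always fits.
    """
    def round_trip(mode):
        if mode == working_mode:
            return 0.0
        return (delay.get((working_mode, mode), inf)
                + delay.get((mode, working_mode), inf))

    fitting = [m for m in power if round_trip(m) < idle_len]
    return sorted(fitting, key=power.get)


# Hypothetical radio: waking from "off" includes a long pre-heating time.
power = {"tx_rx": 0.9, "rx": 0.4, "off": 0.0}
delay = {
    ("tx_rx", "rx"): 1.0, ("rx", "tx_rx"): 1.0,
    ("tx_rx", "off"): 5.0, ("off", "tx_rx"): 60.0,
}
short_gap = idle_mode_candidates("tx_rx", 30.0, power, delay)
long_gap = idle_mode_candidates("tx_rx", 100.0, power, delay)
```

With these assumed numbers a 30-unit gap admits only rx and tx_rx, so the radio stays in rx, while a 100-unit gap is long enough to absorb the pre-heating time and allows the radio to be turned off.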
We compared our algorithm with two other approaches: approach one assumes only two modes, on and off; approach two greedily applies the voltage scaling technique whenever possible (we allow power constraint violations in this approach). The results are shown in Fig. 10. Approach one gives the worst results because it never utilizes the available modes. Approach two is better than approach one since it saves energy by applying voltage scaling, but its greediness comes at a cost: the energy saved by slowing down the processor is more than offset by the extra energy consumed by the RF modem. Moreover, in all the scenarios, approach two violates the max power constraint. Our algorithm gives the best results because we utilize multiple modes of resources and apply voltage scaling on the processor. At the same time, we avoid extra energy cost on RF by identifying the co-activation dependency between the two resources and performing mode selection to find a feasible solution.
Conclusions
This paper presents a method for capturing mode dependency and an algorithm for mode selection in power-aware embedded systems. The mode dependency graph introduced in this paper enables legal combinations of modes to be systematically derived. Today's designers perform this task manually. However, as components offer increasingly sophisticated modes for power management, while at the same time imposing even more restrictions on mode changes, the complexity will grow quickly beyond what humans can handle. Our MDG represents a structured approach to controlling the complexity of power management. We also present a search algorithm that takes advantage of the MDG. By considering power/timing constraints and overhead on transitions, this technique gives designers more confidence in the feasibility of the synthesized results in real-life applications. Furthermore, our algorithm incorporates heuristic ordering to optimize for the energy cost of the solution, and it shows realistic, system-level improvements over previous techniques that either do not handle constraints or do not handle multiple components.
