Increasingly power-hungry processors have reinforced the need for aggressive power management. Dynamic voltage scaling has become a common design consideration, allowing for energy-efficient CPUs by matching CPU performance with the computational demand of running processes. In this paper, we propose Interaction-Aware Dynamic Voltage Scaling (IADVS), a novel fine-grained approach to managing CPU power during interactive workloads, which account for the bulk of the processing demand on modern mobile or desktop systems. IADVS is built upon a transparent, fine-grained interaction capture system. Able to track CPU usage for each user interface event, the proposed system sets the CPU performance level to the one that best matches the predicted CPU demand. Compared to the state-of-the-art approach to user-interaction-based CPU energy management, we show that IADVS improves prediction accuracy by 37%, reduces processing delays by 17%, and reduces the energy consumed by the CPU by as much as 4%. The proposed design is evaluated with both a detailed trace-based simulation and an implementation on a real system, which verifies the simulation findings.
Introduction
Today's CPUs are manufactured with support for energy management through Dynamic Voltage Scaling (DVS). DVS allows software to dynamically reduce CPU voltage, which in turn reduces the CPU's energy consumption, since it is proportional to the square of the voltage, i.e. E ∝ V^2. However, as the voltage is decreased, the maximum operating frequency is also reduced, resulting in a reduction in performance. Fortunately, maximal CPU performance is usually unnecessary for meeting performance expectations. For example, the performance of memory or I/O bound tasks may not be noticeably degraded when the CPU's operating frequency is reduced. Furthermore, the perceived performance of real-time applications such as video players, games, or teleconferencing applications is not affected by varying CPU performance as long as the CPU provides the minimum performance required to maintain perceptual continuity for the user. Therefore, the key to transparently providing energy-efficient CPU operation is in accurate prediction of upcoming CPU demand and in providing the minimum required performance level that will meet the upcoming demand without introducing observable delays.
One challenge of efficient energy management lies in increasing the energy efficiency of the entire system rather than just an individual component. Introducing any execution delays through component-wise energy management is usually detrimental to the entire system. If the CPU runs slowly, the entire system may stay on longer, nullifying the energy saved by the CPU, or even increasing the energy consumption of the entire system. However, executing all tasks at the highest performance level is not necessary and may not reduce system delays, as is the case with interactive and real-time tasks. Therefore, it is critical to distinguish between the tasks that do not prolong program execution and those tasks that impose delays on the whole system when executed at lower performance levels. The former tasks can be executed at lower CPU performance settings to save energy, while the latter must be executed at the highest performance setting to minimize the system-wide performance degradation and energy consumption.
Another challenge in designing efficient energy management is providing the energy optimization transparently. In mobile and desktop systems, a majority of the demand placed on the system is in direct response to user input. To accurately correlate user interactions with performance demand, it is necessary to obtain a fine-grained interaction and execution context in complex GUI environments without requiring any user involvement or application modifications [5, 6].
In this paper, we propose Interaction-Aware Dynamic Voltage Scaling (IADVS), a highly accurate mechanism for matching CPU frequency to task demands and users' performance expectations. Compared to the existing coarse-grained approaches to interaction capture [12], IADVS's fine-grained interaction capture yields highly accurate predictions of upcoming performance levels demanded by tasks invoked by interactions with specific elements of the GUI. Ultimately, the high prediction accuracy results in energy-efficient executions of the upcoming tasks. In this paper we: (1) Identify and quantify the need for fine-grained task classification; (2) Integrate mouse and keyboard interaction capture mechanisms; (3) Propose IADVS, a CPU energy management mechanism; (4) Perform a detailed simulation of IADVS and compare with existing state-of-the-art DVS mechanisms; (5) Implement IADVS and evaluate its impact on the energy consumed by the CPU, as well as the entire system, on real hardware.
Background

Interactive Task Classes
The majority of system actions in mobile and desktop systems are direct responses to user interactions. At the highest level, tasks can be classified into: 1) tasks that require the highest performance level; and 2) tasks that can be executed at lower performance levels to improve energy efficiency without impacting the system performance. Slowing down performance-oriented tasks prolongs execution, exposes delays to the user, and potentially increases energy consumption since the entire system spends additional time processing a given task [16]. However, users spend a majority of time performing low-level tasks, such as typing or reading, that do not require high CPU performance.
The user's response time while interacting with an application is dictated by the perception threshold. It has been shown that the average perception threshold ranges between 50 and 100ms [14], a significant length of time for modern CPUs. Interactive task execution times shorter than the perception threshold are likely to be imperceptible to the user. Further, when a task completes execution early, the system must idly wait for the user to initiate the next task. Tasks that complete before the perception threshold is reached do not impact the speed with which the user interacts with an application, and therefore do not prolong the application's execution time. Consequently, we treat the perception threshold as the deadline for processing interactive tasks. Tasks shorter than the perception threshold can be executed at a lower CPU performance level, extending their execution time up to, but not beyond, the perception threshold. The lower CPU performance level improves energy efficiency while meeting the user's expectations of interactive performance. By preserving the perception deadline, we prevent users from noticing any impact on the system from energy management mechanisms. Consequently, user behavior is unaltered and we can focus on performance and energy metrics without the need for controlled user studies.
Transparent Task Categorization
The accuracy of the IADVS predictor is dependent on the information gathered at application runtime and the granularity of the captured context. For example, Figure 1 shows the main toolbar of the GNU Image Manipulation Program (GIMP). When interaction context includes only the main window components [12], as shown in Figure 1A, individual interactive elements are aliased to the containing region and the individual tasks initiated by these elements are therefore indistinguishable to the predictor. On the other hand, the fine-grained interaction context shown in Figure 1B allows the predictor to correlate the CPU demand level with each interactive element. The fine-grained context resolution results in lower classification variability and highly accurate predictions.
The variability in the CPU demand following interactions on individual UI components is shown in Table 1. We have selected several popular features that a user may invoke in GIMP. All of the interactions belong to a single menu region and are classified into the same interaction group, as shown in Figure 1A. For example, classifying a margin adjustment (35 million cycles) and the inverting of image colors (934 million cycles) into a single category results in a high standard deviation (303 million cycles). However, when the menu region is broken down to its constituent elements, as shown in Figure 1B, we obtain a much lower standard deviation for the associated CPU demand.
Due to space limitations, we summarize the differences between categorizing tasks using specific interactive elements and categorizing tasks using coarse-grained window components. 
IADVS Design
The key observation motivating IADVS is that there exists a strong correlation between user interactions and the CPU workload required by the triggered tasks. We exploit this correlation to transparently transition the processor to a power/performance state that meets the task's performance demand while saving energy. The following are the components of the proposed IADVS mechanism: (1) High-detail and low-overhead monitoring mechanism for detecting the tasks triggered by user interactions; (2) Low-overhead correlation and classification of interactions and the triggered CPU tasks; (3) Fully online training and prediction for accurately determining the desired CPU frequency for the upcoming tasks; (4) Support for multi-core CPUs.
Distinguishing UI Interactions
The IADVS system relies on accurate capture and categorization of individual user interactions. Like the previous fine-grained capture mechanisms developed for energy management [5, 6], IADVS utilizes a monitoring layer (X Monitor, shown in Figure 2) between the X Window Server and applications in Linux, and utilizes the GUI window structure to identify each interactive component (such as each button or menu component) with a unique integer ID. As designed, the interactive applications connect to the X Monitor, which in turn passes unmodified interaction data and window change requests between the X Server and the client applications. The unique IDs used by IADVS are generated from the interactive element's enclosing window and the element's position in the application's window tree as well as its relative position within the containing window. Subsequent interactions with a particular interactive element generate the same interaction ID, and the same categorization for the task to follow.
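As an illustration of how such stable IDs might be derived, the following Python sketch hashes the three ingredients named above. The function name, the string encoding, and the use of SHA-1 truncated to 32 bits are assumptions made for illustration only; the actual ID scheme is not specified in this text.

```python
import hashlib
from typing import Tuple


def interaction_id(window_name: str,
                   tree_path: Tuple[int, ...],
                   rel_pos: Tuple[int, int]) -> int:
    """Derive a stable 32-bit integer ID for one interactive element.

    window_name: name of the enclosing window (hypothetical encoding)
    tree_path:   child indices from the root of the window tree
    rel_pos:     (x, y) position relative to the containing window
    """
    path = "/".join(str(i) for i in tree_path)
    key = f"{window_name}|{path}|{rel_pos[0]},{rel_pos[1]}"
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Truncate the digest to 4 bytes so the ID fits in a 32-bit integer.
    return int.from_bytes(digest[:4], "big")
```

Because the ID depends only on the element's place in the GUI structure, repeated clicks on the same button map to the same prediction entry, while a neighbouring button in the same window region maps to a different one.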
We further augment the X Monitor to collect detailed keystroke information from the keyboard. Keystrokes are captured and distinguished by a unique identifying number for each key and key combination on the keyboard. In addition, we also record the type of the UI event for both keyboard and mouse, such as a button-press or button-release. Mouse clicks are also distinguished as left, right, or middle button press, or a double click of any of the mouse buttons.
Mouse-driven interactions with the GUI uniquely identify the application with its context, due to the structure of each application's GUI layout. Keyboard IDs are generic and not unique to an application. However, combinations of mouse interaction IDs and keyboard interaction IDs result in a unique context description for a given application. The combination of the capture mechanisms forms a highly accurate system whose operation is entirely transparent to the user.
Detecting UI-triggered Tasks
The fundamental issue in energy management for CPUs is the frequency of power state switching. Frequent switching can often carry significant transition overheads, while infrequent switches may result in disparate tasks being executed at a CPU frequency that either does not meet performance demand or does not provide the desired energy savings. Consequently, we select tasks triggered by user interactions as the main switching granularity. We define a task to be the sequence of operations by one or more threads that accomplish the same goal [7, 12]. Specifically, when a thread processing an event causes a new event to occur, the new event is said to be dependent on the first event and is counted as a part of the current task. A trigger event is the first event in a series of events that is not dependent on any other events. Finally, a task is defined as the collection of all CPU processing, including its trigger event and all of the dependent events. The UI monitoring system identifies the application from user interactions, isolating the tasks initiated by the running application shortly following an interactive event.
We define UI-triggered tasks as those tasks that are preceded by UI events which occur when the UI controller (the X Window server in Linux) receives a mouse click or a keystroke. UI-triggered tasks include not only the handling of the device interrupt, but also the processing required to respond to the UI event. For example, when a user clicks the "blur" button in GIMP, the triggered task includes both the processing of the associated application functionality as well as the processing of the mouse interrupt event. The task ends when the operating system's idle process (swapper process in Linux) begins running while the UI-triggered task is not blocked by I/O, or when the application receives a new user interface event [12]. It should be noted that the time a task spends waiting for I/O is not included in the total task duration in the analyses that follow. Likewise, task duration excludes preemption periods due to background daemons, such as the Name Server or NFS, since these processes are not dependent on the task that is being monitored.
Correlation Mechanisms
When a UI event initiates a task, IADVS begins monitoring the system activities, recording the duration of CPU time for each relevant process. Once task completion is detected, the task's CPU demand is represented by the task length at the CPU's maximum frequency, which is computed as the sum of the processing time of each event during the task execution. To accurately measure the length of a task, we utilize the high-resolution Time Stamp Counter CPU register. At task completion, the corresponding entry in the prediction table is updated as shown in Figure 2. IADVS utilizes a hash table indexed by user interaction IDs as the prediction table. Prediction table entries consist of three fields: the interaction ID of a UI event, a weighted sum SUM of previous task lengths, and a weighted count COUNT of the observed task instances triggered by the same ID.
We use the Aged-α method to record the history of the past tasks' CPU demand triggered by the same interaction ID. Aged-α utilizes all past tasks, with the kth most recent having weight α^k. Hence, it emphasizes the most recent tasks, permitting a quicker response to changes in the task's CPU demand, while smoothing anomalous task behavior at the same time. For each update with the most recent task's length L, both SUM and COUNT are calculated using their previous values as follows:
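A recurrence consistent with this description, under the assumption that the most recent task is indexed k = 0 and therefore carries weight α^0 = 1, can be written as:

```latex
\begin{aligned}
\mathit{SUM}_{\text{new}}   &= \alpha \cdot \mathit{SUM}_{\text{old}} + L \\
\mathit{COUNT}_{\text{new}} &= \alpha \cdot \mathit{COUNT}_{\text{old}} + 1
\end{aligned}
```

Unrolling either line shows that a task observed k updates ago contributes with weight α^k, which matches the weighting stated above.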
Therefore, IADVS need only keep SUM and COUNT in the table entry to record the CPU demand of all past tasks.
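The bookkeeping can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation evaluated in this paper: the class and method names are hypothetical, a plain dictionary stands in for the hash table, and the update assumes the most recent task carries weight 1.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    weighted_sum: float = 0.0    # SUM: aged sum of past task lengths
    weighted_count: float = 0.0  # COUNT: aged count of past task instances


class PredictionTable:
    """Hypothetical prediction table keyed by interaction ID."""

    def __init__(self, alpha: float = 0.95):
        self.alpha = alpha   # aging weight, 0 < alpha <= 1
        self.entries = {}    # dict used as the hash table

    def update(self, interaction_id: int, task_length: float) -> None:
        """Fold a completed task's length (measured at max frequency) into the entry."""
        e = self.entries.setdefault(interaction_id, Entry())
        e.weighted_sum = self.alpha * e.weighted_sum + task_length
        e.weighted_count = self.alpha * e.weighted_count + 1.0

    def average(self, interaction_id: int):
        """Return SUM / COUNT, or None if the ID has never been seen."""
        e = self.entries.get(interaction_id)
        if e is None:
            return None
        return e.weighted_sum / e.weighted_count
```

Each update touches only two numbers per interaction ID, so the per-event cost stays constant no matter how long the history grows.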
Predicting CPU Power Modes
Each time a UI event occurs, IADVS performs a prediction table lookup using the captured interaction ID. The lookup results in two possible outcomes: (1) the entry is found with the weighted sum (SUM) of previous task lengths and weighted count (COUNT) of previous tasks; or (2) the entry is not found, indicating that the interaction ID has been seen for the first time. If the entry is found, IADVS first predicts the task length for the upcoming task as the average task length L_avg, simply calculated as SUM divided by COUNT. The resulting frequency F remains constant for the duration of the task, or until the perception threshold is reached, and is calculated as follows:
The CPU's frequency setting that best matches F (equal to or just higher) is predicted, and the frequency and voltage of the CPU are set accordingly. If F is greater than the maximum available frequency, it indicates that the earlier tasks triggered by the same interaction ID are usually longer than the perception threshold, and thus IADVS selects the maximum available frequency to run the task. To minimize performance degradation due to mispredictions, where a task continues past the perception threshold at a frequency lower than maximum, IADVS sets the CPU to the maximum frequency immediately upon reaching the perception threshold deadline.
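Since L_avg is a task length measured at the maximum frequency and the perception threshold acts as the deadline, one expression consistent with this description is the following, where F_max (the maximum CPU frequency) and T_pt (the perception threshold) are symbols introduced here for illustration:

```latex
F = F_{\max} \cdot \frac{L_{avg}}{T_{pt}},
\qquad
L_{avg} = \frac{\mathit{SUM}}{\mathit{COUNT}}
```

Under this reading, a task that takes exactly T_pt at full speed yields F = F_max, and any longer task yields F > F_max.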
When an interaction ID is first seen by IADVS, a corresponding entry in the prediction table does not yet exist, requiring a heuristic for setting the CPU frequency for the upcoming task. Selecting the maximum frequency eliminates performance degradation during initial training, at the cost of a potential, but relatively small, increase in energy consumption. On the other hand, selecting the minimum frequency prevents the increase in energy consumption, while introducing the potential for higher delays. Since one of the goals of IADVS is to minimize the impact of energy management on the user, we select the CPU's maximum frequency during initial training, eliminating delays that may otherwise be apparent to the user.
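The selection logic described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name and the example frequency list are hypothetical, the required frequency is taken to be F_max scaled by the ratio of the predicted task length to the perception threshold, and the 100ms default threshold is the upper end of the perception range cited earlier.

```python
import bisect
from typing import Optional, Sequence


def select_frequency(l_avg_ms: Optional[float],
                     freqs_mhz: Sequence[int],
                     threshold_ms: float = 100.0) -> int:
    """Pick the lowest available frequency expected to meet the deadline.

    l_avg_ms:  predicted task length at the maximum frequency,
               or None when the interaction ID has not been seen before
    freqs_mhz: available frequency settings, sorted in ascending order
    """
    f_max = freqs_mhz[-1]
    if l_avg_ms is None:
        # Cold entry: run at full speed to avoid user-visible delay.
        return f_max
    required = f_max * l_avg_ms / threshold_ms
    # First setting that is equal to or just higher than the requirement.
    idx = bisect.bisect_left(freqs_mhz, required)
    # Requirement above the highest setting: the task is expected to
    # exceed the deadline anyway, so run at the maximum frequency.
    return freqs_mhz[idx] if idx < len(freqs_mhz) else f_max
```

For example, with the hypothetical settings `[800, 1300, 1800, 2300, 3000]`, a task predicted to take 50ms at full speed needs 1500MHz to finish within 100ms, so the sketch selects 1800MHz.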
Managing Multi-core CPUs
Modern systems utilize multi-core CPUs to provide more processing power for the execution of concurrent applications or processes. Consequently, any CPU management mechanism has to work on multi-core CPUs. Current multi-core CPUs allow per-core frequency settings, but voltage settings affect the entire chip. The CPU voltage is set to match the highest frequency setting of any of the cores. For example, if one core is running at 3GHz, the chip's voltage is set to support that frequency. As a result, energy savings are limited, since the majority of the savings come from reduced voltage levels. As long as voltage regulators remain external to the chip, adjusting core voltages independently will not be possible. However, on-chip regulators considerably increase chip complexity, so it is more reasonable to expect voltage adjustments to be made to banks of cores. The IADVS design is independent of chip design since it sets the frequency only, and the voltage levels are determined by the maximum frequency at which any of the cores is operating. The collection of a task's execution context happens on a per-process basis and is not tied to any specific core. The task frequency settings are tied to the particular task, and if this task is rescheduled on another core the frequency level is set accordingly. Therefore, the simplicity of the IADVS design allows it to work on multi-core systems without special considerations or redesign.
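The chip-wide voltage constraint can be made concrete with a small sketch. The function name and the frequency-to-voltage table below are hypothetical placeholders chosen for illustration, not measured values for any real CPU.

```python
from typing import Dict, Sequence

# Hypothetical mapping from core frequency (MHz) to the minimum
# supply voltage (V) that supports it. Values are illustrative only.
VOLTS_FOR_MHZ: Dict[int, float] = {800: 1.00, 1800: 1.15, 2300: 1.25, 3000: 1.35}


def chip_voltage(core_freqs_mhz: Sequence[int],
                 volts_for_mhz: Dict[int, float] = VOLTS_FOR_MHZ) -> float:
    """Shared voltage: it must support the fastest core on the chip."""
    return volts_for_mhz[max(core_freqs_mhz)]
```

With three cores at 800MHz and one at 3000MHz, the shared voltage is the one required for 3000MHz, which is why a single busy core limits the savings available to the others.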
Methodology
To evaluate the proposed IADVS mechanism, we developed a trace-driven simulation of IADVS and the following DVS mechanisms:
• STEP. An interval-based mechanism that runs each task by starting at the minimum frequency and stepping up to the next available frequency after each fixed time interval. A 10ms interval length is chosen to meet the average task demand.
• APS. A task-based mechanism that predicts the upcoming CPU demand for all UI-triggered tasks, but does not distinguish any of them [7].
• SPACE. A task-based mechanism that separates UI-triggered tasks by their UI events, based on the key pressed and the type of window component in which a mouse event occurred. SPACE is essentially the PACE-CA mechanism [12], but simplified to apply a constant frequency for each task.
• ORACLE. A task-based mechanism that utilizes future knowledge to select the optimal operating frequency for the task, fitting it to the deadline or running at maximum frequency if the task is longer than the deadline.
We assume that each DVS mechanism transitions the CPU to the minimum frequency immediately following a task, when the CPU begins idling. The traces used in the simulation were collected using a modified Linux kernel (2.6.27.5-117) running on an AMD Phenom II X4 940 with 4GB RAM. All traces contain data from a large number of typical-usage sessions of a single user in the GNOME 2.20.1 environment. Application traces are composed of two components: the user interface event trace and the process activity trace. Process activity traces are collected using the Linux Trace Toolkit (LTTng), which logs program execution details from a patched Linux kernel. The traces include all significant OS events: system calls, thread swap events, disk/network I/Os, and process information. These events can provide in-depth knowledge about a running process, including when context switches occurred, or how much time a process's task spent executing or waiting for I/O.
Based on the process information and the ordered event timestamps, our simulator can rebuild the dynamic execution progress of the traced application and analyze the effects of applying DVS.
We use six applications commonly executed on desktop or mobile systems: AbiWord, a word processing application; Gnumeric, a spreadsheet application; Scigraph, scientific data plotting software; Eclipse, an Integrated Development Environment for C/C++ and Java development; GIMP, image processing software; and LiVES, an integrated video editing and playback toolkit.
Trace details, such as the total length of the trace duration, the number of UI-triggered tasks, and number of unique interaction IDs, are included in Table 3 . Also included are the counts of unique interaction IDs broken down into mouse clicks and keyboard strokes, which serve as an indicator of the GUI complexity for each application.
Through experimentation with the application traces, we found that when using the "Aged-α" method for computing a weighted mean, the weight α of 0.95 generates the highest prediction accuracy and thus the best overall results. We also use a perception threshold of 100ms as the deadline for UI-triggered tasks in our experiments.
Processor Model
We used the AMD Phenom II X4 desktop CPU [1] to study the implementation of the proposed mechanisms (the CPU specifications are summarized in Table 4 in Section 6). Similarly to the previous findings [7, 12, 13], experiments are constrained by the small number of available CPU operating states. The four available power states are insufficient to efficiently match the CPU performance to task demand. Therefore, we simulated a custom CPU model based on AMD Phenom II with the frequency range from 800MHz to 3.0GHz, supporting the intermediate frequencies in 250MHz increments. We will refer to this model as the Full Frequency Range CPU (FFRCPU). With FFRCPU, we assume that voltage, V, is linearly proportional to the corresponding frequency, f, i.e. V ∝ f.
We further leverage the property of power consumption P being proportional to V^2 f and assume that the CPU consumes 40.8W at the maximum frequency, which is the same as the measured power consumption for AMD Phenom II. We will use FFRCPU to evaluate the described DVS mechanisms in the following section. Figure 3 shows the distribution of all UI-triggered tasks based on the optimal CPU frequency required to meet the deadline for the FFRCPU model. It is clear that the processing demand of tasks varies across applications. In AbiWord, Gnumeric and Eclipse, almost 90% of the tasks are low-demand workloads that easily meet the deadline at frequencies lower than 1.55GHz. This is due to the user spending most of the time typing, editing, and formatting.
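To make the power model concrete, the sketch below works through the arithmetic under two assumptions that go beyond the text: that voltage is strictly proportional to frequency (so P ∝ V^2 f becomes P ∝ f^3), and that the ten FFRCPU settings are 800MHz to 2800MHz in 250MHz steps plus the 3000MHz maximum. The function names are hypothetical.

```python
F_MAX_MHZ = 3000  # maximum FFRCPU frequency
P_MAX_W = 40.8    # power at the maximum frequency, as stated above


def ffrcpu_frequencies():
    """Assumed FFRCPU settings: 250MHz steps from 800MHz, capped at 3000MHz."""
    return list(range(800, F_MAX_MHZ, 250)) + [F_MAX_MHZ]


def power_watts(f_mhz: float) -> float:
    """P is proportional to V^2 * f; with V proportional to f this is cubic in f."""
    return P_MAX_W * (f_mhz / F_MAX_MHZ) ** 3


def task_energy_joules(length_s_at_max: float, f_mhz: float) -> float:
    """Energy for a task whose length is given at the maximum frequency."""
    runtime_s = length_s_at_max * F_MAX_MHZ / f_mhz  # slower clock, longer run
    return power_watts(f_mhz) * runtime_s
```

Under these assumptions, halving the frequency cuts power to one eighth and doubles the runtime, so the energy for the same task drops to one quarter.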
On the other hand, due to graphics and video processing, GIMP and LiVES exhibit high-demand CPU workloads, where over 50% of the tasks require an operating frequency of 1.80GHz or above. Lastly, Scigraph's task distribution is balanced, exhibiting both high and low CPU demands in similar measure. The task's CPU demand distribution provides useful insights into the remaining analysis.
Evaluation
Accuracy
Upcoming task demand prediction accuracy is the key metric that largely determines the actual performance and energy efficiency of task-based DVS mechanisms. We compare the prediction accuracy of different task-based mechanisms to the ORACLE mechanism, which has future knowledge of upcoming task length. The prediction distance from the predicted frequency to the optimal frequency is the number of frequency settings between them. A distance of 0 means that the frequency is correctly predicted, i.e. the same as ORACLE's prediction. Distances other than 0 indicate that the frequency is mispredicted and not optimal. Since there are 10 different frequency settings in FFRCPU, the range of distance values is between -9 and 9, inclusive. For instance, if a frequency of 800MHz is predicted but the optimal is 1.3GHz, the prediction distance is -2. A negative distance indicates that the predicted frequency is lower than required, resulting in a missed deadline and the associated delay. A positive distance indicates that the frequency is higher than required, leading to excessive energy use. Figure 4 shows the aggregate prediction distance for each task-based mechanism over the six applications. Intuitively, the best predictor has the largest fraction of tasks at the optimal frequency. IADVS predicts the exact frequency for 38% of the tasks, resulting in accuracy that is 36% and 34% higher than APS and SPACE, respectively. In addition, Figure 4 shows that IADVS contains the fewest tasks in both the positive and negative distance ranges, meaning that it mispredicts the associated frequencies less often than the others. IADVS, therefore, has the potential to save the most energy while incurring the lowest delays among the three mechanisms. Figure 5 shows the absolute prediction error of each predictor for each workload. The absolute error sums up the absolute values of all the prediction distances in Figure 4 . 
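The distance metric can be expressed compactly in code. The sketch below assumes the ten FFRCPU settings are 800MHz to 2800MHz in 250MHz steps plus 3000MHz; the helper names are hypothetical.

```python
# Assumed FFRCPU frequency settings in MHz, lowest to highest.
FREQS_MHZ = [800, 1050, 1300, 1550, 1800, 2050, 2300, 2550, 2800, 3000]


def prediction_distance(predicted_mhz: int, optimal_mhz: int,
                        freqs=FREQS_MHZ) -> int:
    """Number of settings between prediction and optimum.

    Negative: predicted too low (missed deadline, added delay).
    Positive: predicted too high (wasted energy).
    """
    return freqs.index(predicted_mhz) - freqs.index(optimal_mhz)


def absolute_error(pairs, freqs=FREQS_MHZ) -> int:
    """Sum of absolute distances over (predicted, optimal) pairs."""
    return sum(abs(prediction_distance(p, o, freqs)) for p, o in pairs)
```

For the example in the text, `prediction_distance(800, 1300)` evaluates to -2.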
We use APS's prediction error as the basis for normalizing the errors of the other mechanisms. The prediction error for each mechanism is further broken down into four parts: the cold key and mouse error due to the default frequency selections for the first seen keystroke/mouse-click tasks, and the normal key/mouse errors due to the mispredictions for recurring keystroke/mouse-click tasks. APS generates the highest prediction error since the UI-triggered tasks are indistinguishable from one another. SPACE reduces APS's key error by 75% while retaining a comparable error rate for mouse clicks. IADVS improves SPACE by distinguishing the specific interactive components, reducing SPACE's mouse error by 54%. IADVS's higher granularity generates more task categories and thus results in a higher cold prediction error. Despite this cold training overhead, IADVS significantly reduces overall prediction errors by 47% from APS and 37% from SPACE.
Delay
Performance is an important factor for evaluating the effects of DVS mechanisms on interactive applications, since users do not tolerate excessive delays for the UI-triggered tasks. Figure 6 shows the total runtime for each application and for each mechanism normalized to ORACLE's. Any task that runs longer than its runtime in ORACLE exposes additional delays to the users.
Frequency stepping from the minimum in STEP results in long delays due to the time taken to ramp up to the maximum frequency for tasks longer than the perception threshold, which dominate in GIMP and LiVES. On average, STEP incurs 16%, 42% and 81% more delay than APS, SPACE, and IADVS, respectively. The best performance of IADVS mirrors its highest accuracy in predicting the CPU frequency, as shown in Figure 4. IADVS has the fewest tasks mispredicted with a lower frequency than is required (the negative range in Figure 4). In Scigraph, IADVS achieves the greatest improvement, with 54% and 52% less delay than APS and SPACE, respectively. Note that Scigraph's interaction patterns frequently interleave short-duration and long-duration tasks. Lack of categorization of user interactions, as in APS, or categorization with coarse resolution, as in SPACE, fails to correctly classify these tasks. In AbiWord, Gnumeric and Eclipse, a few long-duration mouse-click tasks are mixed with the majority of short-duration keystroke tasks. IADVS accurately recognizes these task patterns and predicts the appropriate frequency for long-duration tasks, reducing APS's delay by 37%. SPACE, by distinguishing mouse clicks from keystrokes, performs relatively well, incurring only 17% more delay than IADVS. In GIMP and LiVES, the majority long-duration tasks are accompanied by occasional short-duration tasks. Lacking sufficient user interaction details, APS and SPACE only recognize the long-duration pattern, and aggressively apply a higher CPU frequency even for short-duration tasks. Their delays are relatively smaller, but at the cost of increased energy consumption, as we will see in the next section. IADVS in these two applications performs comparably to APS and SPACE, but saves more energy. Figure 7 compares the energy consumption of the studied mechanisms, normalized to ORACLE's energy consumption.
Each DVS mechanism compared here is made to transition the CPU to the minimum frequency immediately when the CPU begins idling. Our focus is on energy consumed by the CPU when it is engaged in performing some task related to the application being interacted with. Consequently, the energy consumed during CPU idling is the same across all mechanisms, and is excluded from the total energy consumption in Figure 7.
Energy
IADVS achieves the best energy efficiency among task-based mechanisms, reducing the energy consumption of APS and SPACE by 2% and 3% on average. In the best cases of GIMP and LiVES, where APS and SPACE over-predict the CPU frequency for short-duration tasks, IADVS reduces energy consumption by 5% and 4%, respectively. STEP consumes energy comparable to IADVS, because it starts from the minimum frequency and steps up gradually. However, STEP trades energy efficiency for significantly degraded execution performance, as shown in Figure 6, which may result in increased energy consumption by other components since the entire system has to stay on longer.
Energy-Delay^2 Product
To simultaneously compare both energy and performance we require a metric that combines both. The energy-delay product (ED) is one such metric, but since E ∝ V^2 and T ∝ 1/V, the ED product can be minimized by simply reducing the voltage at the expense of performance. A better metric is the energy-delay^2 (ED^2) product, which to the first order is independent of voltage. A DVS mechanism with a lower ED^2 product provides higher performance than another at equivalent energy levels, or consumes less energy at the same performance level. Figure 8 shows the ED^2 product for the studied mechanisms normalized to ORACLE. Idling energy and time are excluded from the execution's total energy consumption and runtime.
We observe that IADVS achieves the best improvement of the ED^2 product, outperforming STEP, APS and SPACE by an average of 6%, 5%, and 4%, respectively. STEP has the longest execution time and thus cannot significantly offset its delays by lowering energy consumption. It is not surprising to see that STEP performs the worst, especially in GIMP and LiVES where frequency adjustments for computationally intensive tasks occur gradually, introducing delays. SPACE improves APS by 0.1% on average, implying that distinguishing tasks by coarse categories of window components is not adequate. In the best case, Scigraph, IADVS achieves as much as 9% and 10% improvement over APS and SPACE, respectively.
In summary, IADVS achieves the best execution performance at the cost of the lowest energy. Therefore, we conclude that increasing the resolution of GUI event capture and fine-grained classification of tasks is worthwhile, since higher prediction accuracy translates to a significantly lower energy-delay^2 product.
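The first-order voltage independence follows directly from the two proportionalities stated above, writing the delay D as the execution time T:

```latex
E \cdot T   \;\propto\; V^{2} \cdot \frac{1}{V}     \;=\; V,
\qquad
E \cdot T^{2} \;\propto\; V^{2} \cdot \frac{1}{V^{2}} \;=\; 1
```

The ED product shrinks as V is lowered regardless of how well the mechanism predicts demand, whereas ED^2 stays constant under pure voltage scaling, so differences in ED^2 reflect the quality of the frequency choices rather than a general bias toward low voltage.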
The Need for FFRCPU
As discussed in Section 4.1, our evaluation of DVS mechanisms included a realistic CPU model, whose available frequency settings proved to be too limited for an effective DVS mechanism implementation. To illustrate this, Figure 9 shows the ED^2 product for each DVS mechanism as applied to the real CPU model of AMD Phenom II X4, shown in Table 4. Here, the mechanism names include the prefix AMD to distinguish them from their counterparts in the FFRCPU model used thus far. The results are normalized to the FFRCPU's ORACLE mechanism. Comparing Figure 8 and Figure 9 clearly shows that the limited number of frequency settings in the AMD model prevents the efficient matching of CPU frequency to tasks and, as a result, the ED^2 products are degraded for all mechanisms. We note that even with fewer frequency settings, IADVS still outperforms the other mechanisms. Comparing the ORACLE mechanism using the FFRCPU and the AMD model, we find a 4% increase in ED^2 product using the AMD model. Therefore, it is clear that if we want to take full advantage of the proposed IADVS as well as the other existing DVS mechanisms, we need wider frequency-voltage scaling than is currently available in CPUs.
Implementation
In this section, we compare the efficiency of the standard on-demand mechanism in Linux (MAX) with that of our implementation of the IADVS mechanism. MAX essentially races to idle: it uses the maximum frequency for each task and switches to the minimum frequency when idle. We first describe implementation and measurement details, followed by the experimental results.
Implementation Details
The experiments were conducted on a desktop computer running 64-bit Fedora 10 (kernel 2.6.27.5-117) on an AMD Phenom II X4 940 quad-core CPU with 2x2GB DDR2-1066 RAM. The Phenom II CPU comes with the latest Cool'n'Quiet (C'n'Q) 3.0 power saving technology, which provides four performance states (p-states) for each CPU core. The frequency-voltage transition for one core is performed through the p-state transition of that core. Table 4 shows the combination of operating voltage and frequency for each of the evaluated p-states.
As shown in Figure 10, the power consumption of the CPU was measured using an NI PCI 6230 DAQ recording the voltage drop across a 0.01 Ω resistor inserted in the CPU power supply. System power consumption was measured with a WattsUp power meter. The collected measurements are shown in Table 4.
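The conversion from the recorded voltage drop to CPU power and energy can be sketched as below. Only the 0.01 Ω resistor value comes from the setup described above; the function names, the supply-line voltage, the sampling rate, and the sample values are hypothetical, and the sketch assumes the voltage on the CPU side of the resistor is available for each sample.

```python
SHUNT_OHMS = 0.01  # sense resistor value stated in the text


def cpu_power_watts(v_drop: float, v_load: float) -> float:
    """Instantaneous power for one sample.

    v_drop: voltage measured across the sense resistor (volts).
    v_load: voltage on the CPU side of the resistor (volts); assumed to
            be known, the text does not say how it is obtained.
    """
    current = v_drop / SHUNT_OHMS  # Ohm's law: I = V / R
    return v_load * current        # P = V * I


def energy_joules(samples, dt: float) -> float:
    """Integrate power over equally spaced samples (rectangle rule)."""
    return sum(cpu_power_watts(vd, vl) for vd, vl in samples) * dt


# Hypothetical readings: 50 mV drop on a 12 V line, 1 kHz sampling for 1 s.
samples = [(0.05, 12.0)] * 1000
total = energy_joules(samples, dt=0.001)
```

Under these assumed readings the current is 5 A, the power is 60 W, and the energy over one second is approximately 60 J.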
Results
IADVS performs fewer transitions than MAX, reducing switching overheads, and selects intermediate frequencies to better match each task's demand. We measured the overhead of performing a p-state transition on the AMD Phenom II CPU. The observed overhead is around 50 µs, from the time the DVS mechanism sends a transition request to the time the CPU completes the transition. For each task, MAX has to transition the p-state twice: once to the maximum to run the task and once to the minimum when idling. IADVS, due to its ability to accurately predict the task demand and the corresponding performance level, uses far fewer transitions, which in turn saves both energy and transition delay. Finally, higher utilization of lower frequency levels in IADVS provides higher energy savings than MAX without extending the application execution time.

Table 5 shows the energy consumed by the CPU. With MAX, the CPU consumed more energy simply because power consumption is highest at the maximum frequency and voltage. IADVS, by predicting the required CPU performance rather than using the maximum, improves significantly on the energy efficiency of MAX. On average, IADVS consumes 6% less energy than MAX. Note that the energy numbers presented in Table 5 also include the extra energy consumption due to frequency-voltage transitions that are not related to the application execution. The extension in execution time with IADVS was less than 1% for the studied applications; therefore, we can expect that energy savings from the CPU will extend to the rest of the system.
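The transition overhead argument can be illustrated with a short calculation. The 50 µs cost and the two transitions per task for MAX come from the text above; the task count, the sequence of selected levels, and the rule that a predictive policy transitions only when the selected level changes are assumptions made for illustration, not a description of the IADVS implementation.

```python
TRANSITION_S = 50e-6  # measured p-state transition overhead, about 50 us


def transition_delay(n_transitions: int) -> float:
    """Total time spent in p-state transitions, in seconds."""
    return n_transitions * TRANSITION_S


def max_policy_transitions(n_tasks: int) -> int:
    """MAX: one transition up to run each task, one down when idle."""
    return 2 * n_tasks


def count_transitions(levels, initial) -> int:
    """Count p-state changes for a sequence of per-task selected levels.

    Assumes (hypothetically) that the policy keeps the current level
    until a task requires a different one.
    """
    changes, current = 0, initial
    for level in levels:
        if level != current:
            changes += 1
            current = level
    return changes


# Hypothetical workload: 1000 tasks under MAX.
max_overhead = transition_delay(max_policy_transitions(1000))

# Hypothetical sequence of predicted p-state indices for six tasks.
predicted_changes = count_transitions([2, 2, 1, 1, 1, 3], initial=2)
```

For the assumed 1000 tasks, MAX performs 2000 transitions and spends about 0.1 s in transition overhead, whereas the hypothetical six-task sequence requires only two p-state changes instead of twelve.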
Related Work
DVS can be controlled by the OS alone, by compiler-inserted directives, or by a combination of both. OS-level techniques observe application/task behavior with respect to CPU utilization and memory references, construct statistical models, and apply voltage/frequency schedules based on these models [4, 18]. Compiler-based techniques detect regions of low CPU utilization and insert instructions to direct the switching of voltage/frequency levels [10, 11, 15, 19]. Dynamic approaches use application checkpoints and correlate them to memory and CPU behavior at runtime via the OS [2]. The costs of transitioning between frequency states have been quantified [3]. Early DVS work developed threshold-based CPU powerdown based on the immediate history of idle periods [17], which later work extended to include a lengthier history [8] and improve the accuracy of the approach.
Table 5. Energy consumed by the CPU.
Task-based DVS mechanisms, on the other hand, characterize the CPU workload rather than the idle times, resulting in improved accuracy. These can be classified into two categories, the intertask and intratask variants. The intertask variants assign a speed for a task duration. Utilization of the perception threshold allowed deadline specification in interactive applications [7]. Finally, the intratask variants of task-based DVS adjust the processor speed and voltage within tasks. Run-time voltage hopping [11] splits each task into fixed length timeslots and assigns the lowest speed that allows it to complete within its preferred execution time. PACE [13] and Stochastic DVS [9] choose a speed for every cycle of task execution based on the probability distribution of the task workload measured over previous cycles. Both approaches contain simplifying assumptions not guaranteed to hold in real systems.
Conclusions
We proposed and evaluated Interaction-Aware Dynamic Voltage Scaling (IADVS), a novel CPU voltage scaling mechanism that takes advantage of a human physical limitation, the perception threshold, to extend the runtime of interactive tasks by varying the CPU frequency without exposing new latencies to the user. We performed a detailed evaluation of a CPU managed by IADVS through simulation and verified our findings with an implementation and measurements on actual hardware. By relying on a transparent and robust fine-grained interaction capture system, IADVS improved CPU task demand prediction accuracy by 37% over the current state-of-the-art mechanism, reduced processing delays by 17%, and demonstrated a further 4% improvement in CPU energy consumption. The analysis suggests that CPU energy management would benefit from per-core voltage settings in multi-core systems, as well as from a wider range of hardware voltage settings in new CPUs.
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grant No. 0844569.
