A Power Capping Controller for Multicore Processors by Almoosa, Nawaf et al.
A Power Capping Controller for Multicore Processors
N. Almoosa, W. Song, Y. Wardi and S. Yalamanchili
School of Electrical and Computer Engineering
Georgia Institute of Technology
{nawaf, wjhsong}@gatech.edu, {ywardi, sudha}@ece.gatech.edu
Abstract— This paper presents an online controller for
tracking power-budgets in multicore processors using dynamic
voltage-frequency scaling. The proposed control law comprises
an integral controller whose gain is adjusted online based on
the derivative of the power-frequency relationship. The control
law is designed to achieve rapid settling time, and its tracking
property is formally proven. Importantly, the controller design
does not require off-line analysis of application workloads
making it feasible for emerging heterogeneous and asymmetric
multicore processors. Simulation results are presented for
controlling power dissipation in multiple cores of an asymmetric
multicore processor. Each core is i) equipped with the controller,
ii) assigned a power budget, and iii) operated independently
in tracking its power budget. We use a cycle-level multicore
simulator driven by traces from SPEC2006 benchmarks to
demonstrate that the proposed algorithm achieves a faster
settling time than controllers with a static gain setting.
I. INTRODUCTION
The effective and efficient control of power and energy
has become central to the design and management of modern
computer systems. It is no longer just the domain of embed-
ded and mobile devices but is as important for enterprise-
class data centers and internet-server farms that can consume
tens of megawatts of power [1]. The prevailing methodology
has been to design processors and systems for peak power
dissipation, which is in turn determined by worst-case
application workloads. This approach has a number of
undesirable consequences. For example, a data center’s cool-
ing capacity and peak processor power dissipation together
determine the density of servers that can be placed within
the facility. However, rarely are all servers operating at peak
power or utilization, and consequently the center is over-
provisioned with a lower average performance per square
foot. Similarly, the packaging cost of a multicore processor
is determined by the target peak power dissipation. Higher
package costs incurred by high peak power dissipation targets
increase the cost of the processor even though peak loads
may rarely occur in practice. Furthermore, as power densities
increase at future technology nodes we will see an increase
in the relative inefficiency of designing systems based on
worst-case workload and peak power dissipation. This trend
is unsustainable.
Researchers have observed that reducing the peak power
dissipation design target leads to relatively little drop in
execution performance reflecting the non-linear relationship
between power and execution performance. However, suit-
able controls must be in place to prevent a processor from
exceeding the power dissipation target in the unlikely event
of a workload spike that would increase the power dissipation
beyond this design target leading to disruptive failures. Con-
sequently, to improve the cost-effectiveness of the systems
several on-line control techniques have emerged that adjust
system parameters to limit power consumption [2], [3], [4]
(see also references therein). These techniques are based on
dynamic scaling of the voltage and/or clock frequency for
controlling the power dissipated by a processor in order to
limit it to a certain value called the power cap. For example,
the authors in [4] proposed a feedback controller for capping
the power of voltage islands in chip multiprocessors, whose
parameters are derived based on extensive off-line system
analysis under various workload conditions.
Similarly, for data-center applications [2] proposed a pro-
portional controller based on a linear system-model. To the
best of our knowledge, [2] contains the first analysis of stabil-
ity, convergence rate, and robustness of a control system for a
blade server level power capping controller. The objective in
that paper is to regulate the power dissipated by blade servers
in order to have them track given reference values. The plant,
comprised of the frequency-power relationship, is assumed to
be linear and memoryless, and this assumption is backed by
simulation tests for some specific workloads under a variety
of conditions. The controller, relating the error (difference
between the power-reference value and the actual power) to
the frequency, is an integrator scaled by the reciprocal of
the plant’s gain.¹ If the plant’s gain is known exactly, the
control algorithm converges in a single step, namely the
power equals its reference value after a single iteration.
However, the gain may not be known exactly, and therefore
the paper carries out convergence and robustness analysis
to establish asymptotic convergence of the control law and
stability margins in terms of bounds on the modeling error.
The work reported here has been motivated by [2], but it
considers a more general model where the frequency-power
relationship is nonlinear and the core architecture may no
longer be homogeneous. Thus, we extend the results in [2] in
the following ways: (i) the frequency-power characteristics
are convex and not linear, and (ii) the controller is an
integrator with a variable gain as opposed to a constant gain.
¹ Reference [2] defines the controller as the relationship between the
incremental error between consecutive samples and the corresponding
incremental power measures; as such the controller is proportional (linear).
However, viewed as the relationship between error and power, the same
controller is actually an integrator.
We then consider the particular case of cubic polynomials
for the plant, supported by physical modeling, and show
that the controller’s gains are computable in an adaptive
fashion based only on measurements but not on any off-
line analysis. This is a significant advantage since unlike
previous techniques, this controller can be packaged as part
of a multicore blade server without any a priori knowledge of
the applications that are to be executed. As in [2], we
analyze the asymptotic convergence of the control algorithm
as well as its stability and robustness, but as we shall see,
this analysis is quite different from the one in [2]. Finally, we
demonstrate the efficacy of our control law for benchmark
programs executing on multicore processors.
The rest of the paper is organized as follows. Section
II presents some of the main challenges arising in power
control in multicore systems. Section III describes the pro-
posed control law in an abstract setting and analyzes its
asymptotic convergence. Section IV describes the specific
control problem that is addressed in this paper, and Section
V presents simulation results. Finally, Section VI concludes
the paper.
II. SYSTEM MODEL AND CHALLENGE
The target application domain is that of multicore pro-
cessors. An example is a four core processor where each
core has L1 instruction and data caches and a private L2
cache. Cores communicate with each other and with memory
controllers and I/O devices through an on-chip network.
However, we are concerned with an emerging class of
multicore processors that are asymmetric in the designs of
the cores, i.e., not all cores on the chip are of the same
design [5], [6], [7].
In the example we evaluate in Section V, there are two
types of cores. The first is a complex out-of-order (OOO) core, which
employs aggressive pipelining and speculation to increase
the average number of instructions executed per clock cycle
(IPC). The second is a simple in-order (IO)
core where instructions are executed and retired in order.
While multiple instructions can be issued in parallel, they
are executed and retired in order, leading to a lower average
IPC. IO cores consume significantly less power than OOO
cores. The emergence of asymmetric multicore processors
reflects an architectural approach to constraining the rapidly
growing power densities of future processors. OOO cores
can provide high performance for critical single thread or
serial segments of code (at higher power) while the IO cores
can provide significantly better energy and power efficiency
by executing parallel segments of code.
There are two issues when applying contemporary control
techniques. First, a single controller for all types of cores
is ineffective in such an architecture since the consequences
of changing the voltage-frequency setting are very different
for different types of cores. The natural choice is to have
each core and its private caches be separately controlled.
This implies that each core is in a separate voltage island,
which is quite common. For example, Intel’s 48-core Single-chip
Cloud Computer has 8 voltage domains and 28 frequency
domains [8]. However, even then we have the second issue:
contemporary power capping controller designs that rely
on extensive off-line analysis of applications to determine
parameters of the model present practical impediments for
deployment. The off-line analysis must be completed for all
combinations of core types and applications. Different core-
application combinations will most likely lead to controller
designs with different gain parameters and convergence
properties. This leads to the need for either i) restricting
the cores on which specific applications can be executed or
ii) the ability to change the controller gain as application
threads are scheduled on different cores. The former defeats
the purpose of having asymmetric multicore processors. The
latter is an ad-hoc solution that is still limited since all gain
values must be statically known.
We believe that the approach and design proposed here are
superior in that our controller does not rely on extensive off-
line analysis. The controller design
is based on fundamental frequency-power relationships that
hold across all application and core combinations.
Thus, the controller can be an integral part of the multicore
design and be applicable across a wide range of applications
and core types. The following sections provide the details of
our approach.
III. CONTROL LAW
Fig. 1. Power Control System
Consider the discrete-time scalar system shown in Figure
1, where the plant is modeled as a memoryless, time-
varying nonlinear system of the form $P = g_n(\phi)$; $n$ denotes
(discrete) time and $g_n : \mathbb{R} \to \mathbb{R}$ is the function defining
the plant’s input-output relation at time $n$. In the application considered in Section IV, $\phi$ and $P$
denote frequency and power, and this is the reason we are
using the unusual notation for the control variable and the
output signal, respectively. Suppose that the functions $g_n$,
$n = 1, 2, \ldots$, are defined on a common interval $I \subset \mathbb{R}$,
where the following assumption is in force.
Assumption 1: Each one of the functions $g_n$ is continuously
differentiable, convex, and monotone-increasing
throughout $I$. Furthermore, there exist constants $\beta_1 > 0$ and
$\beta_2 > 0$ such that, for every $n = 1, 2, \ldots$ and every $\phi \in I$,
$\beta_1 \le g_n'(\phi) \le \beta_2$ (‘prime’ denotes derivative with respect
to $\phi$).
We implicitly assume that every point $\phi$ mentioned in the
sequel is contained in $I$.
Let $P_s$ be a given reference input, and suppose that the
purpose of the controller is to regulate the output in the
sense that $\limsup_{n \to \infty} P_n$ and $\liminf_{n \to \infty} P_n$ are close to
$P_s$ within a certain tolerance. To this end we use an integral
control law of the form
$$\phi_n = \phi_{n-1} + K_n e_{n-1} \tag{1}$$
CONFIDENTIAL. Limited circulation. For review only.
Preprint submitted to 2012 American Control Conference.
Received September 22, 2011.
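for a suitable gain $K_n$ specified below. The plant equation is
$$P_n = g_n(\phi_n), \tag{2}$$
and the error signal is
$$e_n = P_s - P_n \tag{3}$$
for all $n = 1, 2, \ldots$. Thus, once the gains $K_n$, $n = 1, 2, \ldots$, are
specified, Equations (1)-(3) define the closed-loop system in
a recursive manner.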
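Suppose that at time $n$ the function $g_n(\cdot)$ is known, we
have a measurement of the control signal $\phi_{n-1}$, and are able
to compute the derivative $g_n'(\phi_{n-1})$. We now assume
that the latter computation is exact, and later will consider
the case where it is only approximate. The controller’s gain is then defined as
$$K_n = \frac{1}{g_n'(\phi_{n-1})}. \tag{4}$$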
We point out that if the plant is time invariant, namely
$g_n(\cdot) = g(\cdot)$ for some function $g : \mathbb{R} \to \mathbb{R}$ satisfying
Assumption 1, then the recursive computation of $e_n$, defined
by Equations (1)-(4), effectively is Newton’s method for
finding a zero of the equation $e = P_s - g(\phi) = 0$. In this
case we have the following result.
Proposition 1: Suppose that the plant is time invariant.
Then there exists a positive constant $\gamma < 1$ such that, for
every $n = 1, 2, \ldots$,
1) If $e_{n-1} \ge 0$ then $e_n \le 0$.
2) If $e_{n-1} \le 0$ then
$$\gamma e_{n-1} \le e_n \le 0. \tag{5}$$
This result is a special case of Proposition 2, below,
concerning time-varying systems.
As a corollary, it follows that the output tracks the reference
input, since $\lim_{n \to \infty} e_n = 0$ and hence $\lim_{n \to \infty} P_n = P_s$.
Moreover, this convergence is exponential in the sense
that $|e_n| \le A \gamma^n$ for some $A > 0$ and $\gamma \in (0, 1)$.
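The Newton interpretation above can be checked numerically. The following is a minimal sketch rather than the authors' implementation: it assumes a hypothetical time-invariant plant $g(\phi) = 2\phi^3 + 1$ (convex and increasing for $\phi > 0$), a reference $P_s = 17$, and a gain equal to the reciprocal of the plant's derivative at the previous control value. All function and variable names are illustrative.

```python
def run_controller(g, dg, phi0, p_ref, steps):
    """Iterate phi_n = phi_{n-1} + K_n * e_{n-1} with K_n = 1 / g'(phi_{n-1})."""
    phi = phi0
    errors = [p_ref - g(phi)]          # e_0
    for _ in range(steps):
        gain = 1.0 / dg(phi)           # adaptive gain from the local slope
        phi = phi + gain * errors[-1]  # integral update of the control variable
        errors.append(p_ref - g(phi))  # new tracking error
    return phi, errors


# Hypothetical plant (an assumption for illustration only).
def g(phi):
    return 2.0 * phi**3 + 1.0


def dg(phi):
    return 6.0 * phi**2


phi_final, errors = run_controller(g, dg, phi0=1.0, p_ref=17.0, steps=6)
for n, e in enumerate(errors):
    print(f"n={n}  e_n={e:+.3e}")
```

With these values the initial error is positive, the first step overshoots so that the error becomes negative, and from then on the errors stay non-positive while shrinking in magnitude, which is the pattern described by Proposition 1.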
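Consider now the time-varying case, where the closed-
loop system is defined via Equations (1)-(4). The error
term $e_n$ satisfies the following inequalities.
Proposition 2: There exists a positive constant $\gamma < 1$ such
that, for every $n = 1, 2, \ldots$,
1) If $e_{n-1} \ge 0$ then
$$e_n \le g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1}). \tag{6}$$
2) If $e_{n-1} \le 0$ then
$$\gamma e_{n-1} + g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1}) \le e_n \le g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1}). \tag{7}$$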
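Proof: Consider a differentiable convex function $g : \mathbb{R} \to \mathbb{R}$.
By the definition of convexity, for every $x \in \mathbb{R}$ and
$\Delta x \ge 0$, the following inequalities are in force:
$$g'(x)\Delta x \le g(x + \Delta x) - g(x) \le g'(x + \Delta x)\Delta x. \tag{8}$$
By (4) and Assumption 1, $K_n > 0$ for every $n = 1, 2, \ldots$.
Consider first part (1) of the proposition. Suppose that
$e_{n-1} \ge 0$. By (1) and the left inequality of (8),
$g_n(\phi_n) - g_n(\phi_{n-1}) \ge g_n'(\phi_{n-1}) K_n e_{n-1}$, and hence, by (3) and (4),
$$e_n = P_s - g_n(\phi_n) \le P_s - g_n(\phi_{n-1}) - e_{n-1}. \tag{9}$$
Subtracting and adding $g_{n-1}(\phi_{n-1})$ to the Right-Hand Side
(RHS) of (9), and using (3) with $n-1$, Equation (6) follows.
Next, consider part (2) of the proposition. Suppose that
$e_{n-1} \le 0$.
We next apply Equation (8) with $x = \phi_{n-1} + K_n e_{n-1}$ and
$x + \Delta x = \phi_{n-1}$; note that $\Delta x := -K_n e_{n-1} \ge 0$. The left
inequality of (8), together with (1), gives
$$-g_n'(\phi_n) K_n e_{n-1} \le g_n(\phi_{n-1}) - g_n(\phi_n),$$
and hence
$$e_n = P_s - g_n(\phi_n) \ge P_s - g_n(\phi_{n-1}) - g_n'(\phi_n) K_n e_{n-1}.$$
Subtracting and adding $g_{n-1}(\phi_{n-1})$ to the RHS of this inequality we obtain
$$e_n \ge P_s - g_{n-1}(\phi_{n-1}) - g_n'(\phi_n) K_n e_{n-1} + g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1})
= \left(1 - \frac{g_n'(\phi_n)}{g_n'(\phi_{n-1})}\right) e_{n-1} + g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1}), \tag{13}$$
where the last equality follows from (3) and (4). By (1),
$\phi_n \le \phi_{n-1}$, and hence, by the convexity of $g_n$,
$g_n'(\phi_n) / g_n'(\phi_{n-1}) \le 1$. By Assumption 1 there exists $\alpha \in (0, 1)$,
independent of $n$, such that $g_n'(\phi_n) / g_n'(\phi_{n-1}) \ge \alpha$. Defining $\gamma = 1 - \alpha$,
and recalling that $e_{n-1} \le 0$, the left inequality of (7) follows from (13).
The right inequality of (7) is proved in a similar way to
(6). By the right inequality of (8), applied with the same $x$ and $\Delta x$, and by (4),
$g_n(\phi_{n-1}) - g_n(\phi_n) \le -g_n'(\phi_{n-1}) K_n e_{n-1} = -e_{n-1}$. Consequently,
$$e_n = P_s - g_n(\phi_n) \le P_s - g_{n-1}(\phi_{n-1}) + g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1}) - e_{n-1}
= g_{n-1}(\phi_{n-1}) - g_n(\phi_{n-1}), \tag{14}$$
thereby establishing the right inequality of (7) and completing
the proof.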
Proposition 2 implies that $P_n$ converges exponentially fast
toward a band (tolerance) around the target level $P_s$, and the
width of the band depends on how fast the plant-equation
(2) varies. To see this, suppose that there exists $\varepsilon > 0$ such
that for every $n = 1, 2, \ldots$, $|g_n(\phi_{n-1}) - g_n(\phi_n)| < \varepsilon$. Then
Proposition 2 implies that, for every $n \ge 2$,
[...] (15)
Certainly no perfect tracking can be obtained when the
system is time varying, but Equation (15) shows that when
the system varies slowly, namely $\varepsilon$ is small, a narrow band
[...]
Now suppose that the controller’s gain $K_n$ is not computed
exactly, but rather is estimated by a quantity $\bar{K}_n > 0$. In this
case the control law (1) is replaced by
$$\phi_n = \phi_{n-1} + \bar{K}_n e_{n-1}. \tag{16}$$
The following result is an extension of Proposition 2 and its
proof is similar and hence omitted.
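Proposition 3: Let $\alpha \in (0, 1]$ be as in the proof of
Proposition 2, namely, for every $\phi_1 \in I$ and $\phi_2 \in I$ such
that $\phi_1 \le \phi_2$, $g_n'(\phi_1) / g_n'(\phi_2) \ge \alpha$; by
Assumption 1 such $\alpha$ exists. For every $n = 1, 2, \ldots$,
1) If [...] (17)
2) [...] (18)
[...] then Equations (17) and (18)
reduce to (6) and (7) (with $\gamma = 1 - \alpha$), respectively.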
Suppose that there exist numbers $\mu$ and $\eta$ such that
$0 < \mu < \eta < 2$, and suppose that $\mu \le \bar{K}_n / K_n \le \eta$ for all
$n = 1, 2, \ldots$. Suppose also that there exists $\varepsilon > 0$ such that, for
every $n = 2, \ldots$, $|g_n(\phi_{n-1}) - g_n(\phi_n)| \le \varepsilon$. Then simple
algebra yields the following inequalities,
$-\frac{1}{\alpha\mu}$ [...] (19)
Note that this equation reduces to (15) when the computation
of $K_n$ is exact. Also, (19) is an extension
of one of the convergence results in [2] where the system is
linear and $\varepsilon = 0$.
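The role of the gain ratio can be illustrated on a toy example. The sketch below is an assumption-laden illustration, not the paper's experiment: it uses a hypothetical time-invariant cubic power model with made-up constants, replaces the exact gain by `ratio` times the exact gain, and runs the loop for three ratios inside the interval (0, 2). The names `track`, `slope`, and the constants are illustrative only.

```python
# Hypothetical plant constants (assumptions, not values from the paper).
ALPHA_C = 3.0   # activity factor times capacitance, W / (V^2 * GHz)
M = 0.1         # slope of the affine V(f) map, V / GHz
V0 = 0.6        # offset of the affine V(f) map, V
P_LEAK = 1.0    # leakage power, W


def power(freq):
    """Power model P = a*C*V(f)^2*f + P_L with V(f) = M*f + V0."""
    volt = M * freq + V0
    return ALPHA_C * volt**2 * freq + P_LEAK


def slope(freq):
    """Exact derivative dP/df of the model above."""
    volt = M * freq + V0
    return ALPHA_C * (volt**2 + 2.0 * volt * freq * M)


def track(budget, freq0, ratio, steps):
    """Run the loop with an estimated gain equal to ratio times the exact gain."""
    freq = freq0
    errors = [budget - power(freq)]
    for _ in range(steps):
        gain_bar = ratio / slope(freq)   # mis-estimated gain
        freq += gain_bar * errors[-1]
        errors.append(budget - power(freq))
    return errors


results = {r: track(budget=6.5, freq0=3.0, ratio=r, steps=60) for r in (0.5, 1.0, 1.5)}
for r, errs in results.items():
    print(f"ratio={r}: |e_5|={abs(errs[5]):.2e}, |e_60|={abs(errs[60]):.2e}")
```

In this example all three ratios still drive the error toward zero, with the exact gain (ratio 1.0) converging fastest; an underestimated gain approaches the budget from one side, while an overestimated gain alternates around it.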
IV. MODELING AND CONTROL OF A MULTICORE POWER
REGULATION SYSTEM
Consider a processor driven by a supply voltage $V$ and
operating at a frequency $\phi$. The power dissipated by the
processor is a function of both voltage and frequency as
well as the workload; denoted by $P(\phi, V, t)$, it has the
following form,
$$P(\phi, V, t) = \alpha(t) C V^2 \phi + P_L. \tag{20}$$
This equation, derived from basic physical principles, has
been established in the literature; see, e.g., [9]. The first
term in its RHS, $\alpha(t) C V^2 \phi$, is the dynamic power component
resulting from the switching activity, and the second
term, $P_L$, is the static leakage power. The term $\alpha(t)$ is a
time-varying workload parameter representing the switching
activity of the processor’s logic gates, and $C$ is the total
processor capacitive load. The leakage power $P_L$ depends
on temperature and voltage, but its time-variations for the
considered voltage range are much smaller than those of
$\alpha(t)$ and hence can be neglected, and $P_L$ is assumed to
have a constant value. Equation (20) presents an incentive
for selecting low supply voltages, since $P$ depends on $V$
in a quadratic fashion. However, there exists a frequency-
dependent bound on how low $V$ can be set. Reducing
the supply voltage of CMOS circuits generally increases
their propagation delay [14], and this may violate timing
constraints requiring all propagation delays to be less than the
clock period $1/\phi$. Therefore, manufacturers specify a mapping
$V(\phi)$, determined at design time, to guide the selection of
voltage levels as a function of frequency. This mapping
is nearly affine (linear plus a constant term) and can be
adequately approximated via the term
$$V(\phi) = m\phi + V_0; \tag{21}$$
please see [10], [11]. With this equation we can write $P$ as
a function of $\phi$ and $t$, and Equation (20) becomes
$$P(\phi, t) = \alpha(t) C V(\phi)^2 \phi + P_L. \tag{22}$$
The control law described in the previous section requires the
online calculation of the derivative term $\frac{dP}{d\phi}$, which by (22)
is
$$\frac{dP}{d\phi} = \alpha(t) C \left( V(\phi)^2 + 2 \phi V(\phi) \frac{dV}{d\phi} \right).$$
It is impractical to measure or compute $\alpha(t)$, but possible
to measure the total power, voltage, and frequency, while
the term $\frac{dV}{d\phi}$ can be obtained from the manufacturer or via
simulation [10], [11], and $P_L$ can be measured at design time.
[...]
where variations in $\alpha_n$ correspond to the time-varying pro[...]
and this equation was used in the simulations described in
the next section.
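One way to obtain the gain from measurable quantities only can be sketched as follows. This is a derivation under stated assumptions, not necessarily the exact expression used by the authors: writing the power model as P = a*C*V(f)^2*f + P_L, the unknown product a*C equals (P - P_L) / (V^2 * f), and substituting it into the derivative gives dP/df = (P - P_L) * (1/f + (2/V) * dV/df). The function `estimate_gain` and all numerical values below are hypothetical.

```python
def estimate_gain(power, leak_power, freq, volt, dv_dfreq):
    """Estimate K = 1 / (dP/df) without knowing the activity factor.

    Uses dP/df = (P - P_L) * (1/f + (2/V) * dV/df), obtained by eliminating
    the product a*C from P = a*C*V(f)^2*f + P_L.
    """
    dynamic_power = power - leak_power
    slope = dynamic_power * (1.0 / freq + 2.0 * dv_dfreq / volt)
    return 1.0 / slope


# Hypothetical operating point (assumed values for illustration).
m, v0 = 0.1, 0.6            # affine map V(f) = m*f + v0, in V/GHz and V
alpha_c, p_leak = 3.0, 1.0  # "true" a*C (hidden from the controller) and leakage in W
freq = 2.0                  # GHz
volt = m * freq + v0
measured_power = alpha_c * volt**2 * freq + p_leak

gain = estimate_gain(measured_power, p_leak, freq, volt, dv_dfreq=m)
exact_slope = alpha_c * (volt**2 + 2.0 * volt * freq * m)
print(gain, 1.0 / exact_slope)
```

The two printed values agree, which shows that in this model the measured power, voltage, and frequency carry enough information to recover the slope even though the activity factor itself is never observed.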
V. SIMULATION RESULTS
This section reports on the results of simulations of an
asymmetric multicore processor consisting of two architecturally
distinct types of cores: a complex out-of-order core
and a simpler two-way superscalar in-order core.
A. Evaluation Platform
The evaluation platform consists of a cycle-level x86
processor simulator [12] integrated with the McPAT [13]
microarchitecture power models. The architectural and physical
configurations of the simulated processor are provided in
Table I. We simulated the execution of benchmark programs
from the SPEC2006 suite by extracting program traces to
drive a 4-core processor interconnected in a 2x2
mesh configuration. The processor is asymmetric,
with 2 out-of-order cores and 2 in-order cores. Power
measurements and controller invocations occur every 5 ms.
We evaluated the proposed adaptive-gain integral controller
and a set of fixed-gain integral controllers with gain values
K = [25, 50, 75, 100, 150, 270, 385, 500]e6. The initial
frequency and supply voltage for each tracking experiment
are set to 3 GHz and 0.9 V, respectively.
TABLE I
SIMULATED PROCESSOR CONFIGURATION
Parameters        Out-of-order Core    In-order Core
Architectural Configuration
ISA               x86 IA32
Pipeline Depth    20 stages            16 stages
Fetch/Decode      4 instructions       2 instructions
Execution         6 ports              3 ports
L1 Cache          4-way 32KB           4-way 32KB

TABLE II
POWER TRACKING PHASE FOR ASYMMETRIC PROCESSOR
Core                    Phase 1    Phase 2    Phase 3
Core0 (in-order)        6.5 W      5.5 W      7.5 W
Core1 (in-order)        6.5 W      7.5 W      5.5 W
Core2 (out-of-order)    12 W       10 W       12 W
Core3 (out-of-order)    12 W       14 W       12 W
B. Tracking Analysis
Equations (1)-(4) were implemented within the simulation
model configured as noted in Table I. The activity factors
were estimated by counting the number of executed instructions
in every sampling period. Figure 2 shows representative
runtime power tracking results of the SPEC2006 milc
benchmark for i) adaptive gain, ii) high fixed gain (K =
500e6), and iii) low fixed gain (K = 25e6) controllers. Each
core executed the same benchmark and the execution was
partitioned into three phases. For each phase the power
budget was changed for each core as shown in Table II.
The power budgets are shown as dotted lines in the figure.
We can observe how well the adaptive gain and static gain
controllers track and maintain new power budgets.
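The qualitative difference between adaptive and fixed gains can be sketched with a toy closed loop. The example below is not the cycle-level simulation described here and does not reproduce Fig. 2: it assumes a hypothetical time-invariant cubic power model, made-up constants, fixed gains expressed in GHz/W, and a 1% tolerance band for deciding when a run has settled. Only the 5 ms period, the 3 GHz and 0.9 V starting point, and the 6.5 W budget are taken from the text.

```python
PERIOD_MS = 5.0   # controller invocation period stated in the text

# Hypothetical plant constants (assumptions, not values from the paper).
ALPHA_C = 3.0     # activity factor times capacitance, W / (V^2 * GHz)
M, V0 = 0.1, 0.6  # V(f) = M*f + V0, which gives 0.9 V at 3 GHz
P_LEAK = 1.0      # leakage power, W


def voltage(freq):
    return M * freq + V0


def power(freq):
    return ALPHA_C * voltage(freq) ** 2 * freq + P_LEAK


def adaptive_gain(freq, measured_power):
    """Reciprocal of dP/df, computed from measured power, frequency and voltage."""
    return 1.0 / ((measured_power - P_LEAK) * (1.0 / freq + 2.0 * M / voltage(freq)))


def track(budget, freq0, steps, fixed_gain=None):
    """Integral controller; uses the adaptive gain unless fixed_gain is given."""
    freq = freq0
    errors = [budget - power(freq)]
    for _ in range(steps):
        if fixed_gain is None:
            gain = adaptive_gain(freq, power(freq))
        else:
            gain = fixed_gain
        freq += gain * errors[-1]
        errors.append(budget - power(freq))
    return errors


def settling_step(errors, tol):
    """Index of the first step after which |error| stays within tol."""
    last_bad = -1
    for i, e in enumerate(errors):
        if abs(e) > tol:
            last_bad = i
    return last_bad + 1


budget, tol = 6.5, 0.065   # 6.5 W budget, 1% tolerance band (assumed)
runs = {
    "adaptive": track(budget, 3.0, 40),
    "high": track(budget, 3.0, 40, fixed_gain=0.4),   # GHz/W, hypothetical
    "low": track(budget, 3.0, 40, fixed_gain=0.05),   # GHz/W, hypothetical
}
settle = {name: settling_step(errs, tol) for name, errs in runs.items()}
for name, n in settle.items():
    print(f"{name}: settles after {n} steps ({n * PERIOD_MS:.0f} ms)")
```

In this toy setting the low fixed gain creeps toward the budget, the high fixed gain alternates around it before settling, and the adaptive gain rescales itself with the local slope and settles first.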
Fig. 2. Runtime power tracking results of asymmetric cores: (a) adaptive gain controller; (b) high fixed gain controller (K = 500e6); (c) low fixed gain controller (K = 25e6).
The adaptive gain controller tracked the varying reference
signals with a settling time of around 15 ms for both in-order and out-
of-order cores. The high fixed gain controller is as effective
as the adaptive gain controller for the in-order cores but inefficient
for the out-of-order cores. The performance difference
is due to the microarchitectural heterogeneity between the two
core types. The out-of-order core, which has a wider and deeper
pipeline, can execute more instructions. Thus when the power
budget is increased (and hence the voltage-frequency setting) the high
gain causes significant overshoot. In contrast, the in-order
core is limited in its ability to increase its execution capacity
and therefore the high gain is not disruptive as the power
budgets are increased. On the contrary, the inertia of the low
CONFIDENTIAL. Limited circulation. For review only.
Preprint submitted to 2012 American Control Conference.
Received September 22, 2011.
(a) Adaptive gain controller
A. Evaluation Platform
The evaluation platform consists of a cycle-level X86
processor simulator [12] integrated with the McPAT [13]
microarchitecture power models. The architectural and phys-
ical configurations of the simulated processor are provided in
Table I. We simulated the execution of benchmarks programs
from the SPEC2006 suite by extracting program traces to
drive a 4 core ulticore proce sor interconnected in a 2x2
mesh configuration. The processor is an asymmetric proces-
sor with 2 out-of-order cores and 2 in-order cores. Power
measurements and controller invocations occur every 5ms.
We evaluated the proposed adaptive-gain integral controller
and a set of fixed-gain integr l controllers with gain values
given in K = [25,50,75,100,150,27 385,5 0]e6. The initial
frequency and supply voltage for each tracking exp riment
is set to the 3GHz and 0.9V, respectively.
TABLE I
SIMULATED PROCESSOR CONFIGURATION
Parameters Out-of-ord r Core In-order Core
Architectural Configuration
ISA x86 IA32
Pipeline Depth 20 stages 16 stages
Fetch/Decode 4 instructions 2 instructions
Execution 6 ports 3 ports
L1 Cache 4-way 32KB 4-way 32KB






POWER TRACKING PHASE FOR ASYMMETRIC PROCESSOR
Core Phase 1 Phase 2 Phase 3
Core0 (in-order) 6.5 W 5.5W 7.5W
Core1 (in-order) 6.5 7. 5.
Core2 (out-of-order) 12 10 12
Core3 (out-of-order) 12 4
B. Tracking Analysis
Equations 1-4 were implemented within the simulation
model config red as noted in Table I. Th activity factors
were estimated by counting the number of executed i struc-
tions in every sampling period. Figure 2 shows representative
runtime power tracking results of the SPEC2006 milc
benchmark for i) adaptive gain, ii) high fixed gain (K =
500), and iii) low fixed gain (K = 25) controllers. Each
core executed the same benchmark and the execution was
partitioned into three phases. For each phase the power
budget was changed for each core as shown in Table II.
The power budgets are shown as dotted lines in the figure.
We can observe how well the adaptive gain and static gain
controllers track and maintain new power budgets.
(a) Adaptive gain controller
(b) High fixed gain controller (K = 500e6)
(c) Low fixed gain controller (K = 25e6)
Fig. 2. Runtime power tracking results of asymmetric cores.
The adaptive gain controller tracked the varying reference
signals with a settling time of around 15 ms for both in-order
and out-of-order cores. The high fixed gain controller is as
effective as the adaptive gain controller for the in-order
cores but inefficient for the out-of-order cores. The
performance difference is due to the microarchitecture
heterogeneity between the two cores. The out-of-order core,
which has a wider and deeper pipeline, can execute more
instructions. Thus, when the power budget is increased (and
hence the voltage-frequency setting), the high gain causes
significant overshoot. In contrast, the in-order core is
limited in its ability to increase its execution capacity, and
therefore the high gain is not disruptive as the power budgets
are increased. On the contrary, the inertia of the low
CONFIDENTIAL. Limited circulation. For review only.
Preprint submitted to 2012 American Control Conference.
Received September 22, 2011.
Under the same workload, $dP/df$ is greater in the
out-of-order case, since it has a higher capacitance $C$ and
can execute more instructions per unit time, resulting in a
larger $\alpha(t)$ compared to the in-order processor. When
the power budget is increased, the out-of-order processor
will always require a smaller frequency correction $K e_n$
compared to the in-order case by virtue of its steeper power
vs. frequency relationship. Thus, it is expected that a
high-enough gain value may cause significant overshoot in
the out-of-order case while being beneficial in the in-order
case, as shown in Fig. 2. The fact that power budgets can
change in an unexpected manner as a function of workload
demand or even electricity prices (for example, based on time
of day as is done in data centers) further limits the
applicability of fixed gain controllers.
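The argument above can be made explicit under the standard dynamic-power model. This is an assumption made here for illustration: Equations 1-4 of the paper may contain additional terms, and the symbol $f$ for frequency and the sign convention for the error $e_n$ (budget minus measured power) are likewise assumed.

```latex
P(f) = \alpha(t)\, C\, V(f)^2 f
\quad\Longrightarrow\quad
\frac{dP}{df} = \alpha(t)\, C \left( V(f)^2 + 2 f\, V(f)\, \frac{dV}{df} \right),
\qquad
\Delta f \approx \frac{e_n}{dP/df}
```

For a given power error $e_n$, a larger product $\alpha(t)\,C$ gives a larger $dP/df$ and therefore a smaller required correction $\Delta f$. A fixed gain applies the same correction $K e_n$ to both core types, so a value of $K$ that suits the in-order core exceeds what the out-of-order core needs.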
In principle, other static gain values could have produced
good tracking properties, possibly better than the examples
we have shown. However, the observation that a particular
controller with a specific gain value can provide good
tracking is of limited value for several reasons. First, from
a practical point of view one
cannot in general know the applications that will be executed
on a platform. Second, the operating system controls where
and when threads and processes are scheduled. Extending
thread and process schedulers with controller information
to constrain scheduling decisions is feasible, but must be
weighed against the loss in flexibility and performance that
is experienced by limiting the operating system’s choices.
Third, multicore processors will be executing parallel and not
serial applications. Characterization of the power behavior of
multithreaded applications on asymmetric processors is still
an area of research. Finally, we note that the power and exe-
cution time properties of an application can be significantly
affected by the input data sets that the applications process.
To be practical, extensive off-line analysis of applications
to determine the controller gain must cover all possible
combinations of core types, applications, and input data sets
(the sets that have significant impact on power behavior).
We argue that this requirement is limiting and impractical.
By focusing on i) how applications affect fundamental,
technology-dependent behaviors, namely the frequency-power
relationship, and ii) on-line measurement, the adaptive
gain controller presented here suffers from none of the
preceding drawbacks. Its operation is agnostic to specific
applications and core types and thus is a candidate for
integration into hardware platforms. However, it does rely on
the capability of on-line power measurements. Currently, such
measurements are generally not available to user programs at
a fine enough granularity in commodity processors, although
there is no significant technical impediment to providing
them.
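As a minimal sketch of how on-line measurement could drive the gain, the code below estimates the derivative of power from a single power sample. It assumes the power model P = a * V(f)^2 * f, where the lumped factor a (capacitance times activity) is unknown but cancels out, and it assumes the gain is the reciprocal of that derivative. The function names and the linear V(f) used in the example are hypothetical and are not taken from the paper.

```python
def estimate_dpdf(p_meas_w, f_hz, v_of_f, dv_df):
    """Estimate dP/df from one power sample, assuming P = a * V(f)^2 * f.

    Differentiating and substituting a = P / (V^2 * f) gives
    dP/df = P * (1/f + 2 * V'(f) / V(f)), so no workload model is needed.
    """
    return p_meas_w * (1.0 / f_hz + 2.0 * dv_df(f_hz) / v_of_f(f_hz))


def adaptive_gain(p_meas_w, f_hz, v_of_f, dv_df):
    """Assumed adaptive rule: gain is the reciprocal of the estimated slope."""
    return 1.0 / estimate_dpdf(p_meas_w, f_hz, v_of_f, dv_df)


def example_voltage(f_hz):
    """Hypothetical platform V-f map (offline knowledge): 0.9 V at 3 GHz."""
    return 0.6 + 0.1e-9 * f_hz


def example_dv_df(f_hz):
    """Slope of the hypothetical V-f map, in volts per hertz."""
    return 0.1e-9


if __name__ == "__main__":
    # A core measured at 12 W and one at 6.5 W, both running at 3 GHz.
    for p in (12.0, 6.5):
        print(p, adaptive_gain(p, 3e9, example_voltage, example_dv_df))
```

With this assumed V(f), a core drawing more power at the same frequency receives a smaller gain, which matches the qualitative argument that a steeper power-frequency relationship calls for a smaller frequency correction.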
VI. CONCLUSIONS
This paper introduced an online controller for processor-
power tracking using dynamic voltage-frequency scaling.
The proposed control law comprises an integral controller
that adjusts its gain in response to changes in the workload to
ensure effective regulation and fast settling time. Gain adjust-
ment relies on a novel application-agnostic characterization
of the derivative of power that can be cost-effectively done
online using power measurements and offline knowledge
of the platform’s voltage-frequency relationship. The tracking
property of the proposed algorithm is shown to hold provided
that the voltage versus frequency relationship is convex.
Simulation results using a cycle-accurate microprocessor
simulator demonstrate that the proposed algorithm achieves
faster settling times than integral controllers with static
gains. The approach holds promise for the new generation of
multicore processors that are asymmetric in the designs of
their cores.
REFERENCES
[1] R.H. Katz, “Tech titans building boom,” IEEE Spectrum, Vol. 46, no.
2, 2009.
[2] C. Lefurgy, X. Wang, and M. Ware, “Power capping: A prelude to
power shifting,” Cluster Computing, vol. 11, no. 2, June 2008.
[3] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu,
“No power struggles: coordinated multi-level power management for
the data center,” SIGARCH Comput. Architecture News, Vol. 36, pp.
48-59, 2008.
[4] A.K. Mishra, S. Srikantaiah, M. Kandemir, and C.R. Das, “CPM in
CMPs: Coordinated power management in chip-multiprocessors,” in
Proc. Intl. Conference on High Performance Computing, Networking,
Storage and Analysis, pp. 1-12, 2010.
[5] R. Kumar et al., “Heterogeneous chip multiprocessors,” IEEE
Computer, 38(11), 2005.
[6] T. Morad et al., “Performance, power efficiency and scalability
of asymmetric cluster chip multiprocessors,” Computer
Architecture Letters, 2006.
[7] M. Hill and M. Marty, “Amdahl’s law in the multicore era,”
IEEE Computer, 41(7), 2008.
[8] M. Baron, “The Single Chip Cloud Computer,” in Microprocessor
Report, April 2010.
[9] M. Floyd, S. Ghiasi, T. Keller, K. Rajamani, J. Rawson, F. Rubio,
and M. Ware, “System power management support in the IBM POWER6
microprocessor,” IBM Journal of Research and Development, Vol. 51,
no. 6, 2007.
[10] T. Burd, T. Pering, A. Stratakos, and R. Brodersen, “A dynamic
voltage scaled microprocessor system,” in Proc. Solid-State Circuits
Conference, 2000.
[11] R. McGowen, C.A. Poirier, C. Bostak, J. Ignowski, M. Millican, W.H.
Parks, and S. Naffziger, “Power and temperature control on a 90-nm
Itanium family processor,” IEEE JSSC, Vol. 41, pp. 229-237, 2006.
[12] G.H. Loh, S. Subramaniam, and Y. Xie, “Zesto: A cycle-level
simulator for highly detailed microarchitecture exploration,” in
Proceedings IEEE International Symposium on Performance Analysis of
Systems and Software, pp. 53-64, 2009.
[13] S. Li, J. Ho Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and
N. P. Jouppi, “McPAT: an integrated power, area, and timing modeling
framework for multicore and manycore architectures,” in Proc. IEEE
MICRO, pp. 469-480, 2009.
[14] J.M. Rabaey, Digital Integrated Circuits: A Design Perspective, Pren-
tice Hall, 1995.
