Modern multiprocessor systems-on-chip (SoCs) integrate multiple heterogeneous cores to achieve high energy efficiency. The power consumption of each core contributes to an increase in the temperature across the chip floorplan. In turn, higher temperature increases the leakage power exponentially, and leads to a positive feedback with nonlinear dynamics. This paper presents a power-temperature stability and safety analysis technique for multiprocessor systems. This analysis reveals the conditions under which the power-temperature trajectory converges to a stable fixed point. We also present a simple formula to compute the stable fixed point and maximum thermally-safe power consumption at runtime. Hardware measurements on a state-of-the-art mobile processor show that our analytical formulation can predict the stable fixed point with an average error of 2.6%. Hence, our approach can be used at runtime to ensure thermally safe operation and guard against thermal threats.
Fig. 1. Illustration of the power consumption-temperature trajectory for two power consumption levels, using experiments performed on the Odroid XU3 board [12].
the computational resources or shuts down the platform depending on the severity of the violation. However, these techniques are triggered only after the damage becomes observable.
There is a well-known positive feedback between the power consumption and the temperature [23, 32]. Power consumption drives the chip temperature up through thermal resistance and capacitance networks. Higher temperature, in turn, leads to an exponential increase in leakage power. These nonlinear dynamics lead to a positive feedback which increases both the temperature and power consumption. When a stable fixed point exists, it attracts all the temperature trajectories within the region of convergence to itself. Therefore, the steady increase in power consumption and temperature continues until the stable fixed point is reached. Otherwise, a thermal runaway occurs, as we prove in this paper.
Although the consequences of thermal runaway are detrimental, to the best of our knowledge, there are no models that can analyze the existence and stability of fixed points at runtime. More importantly, even if a stable fixed point exists, it may be well above the maximum safe temperature limit. A static bound on the maximum power consumption can neither avoid thermal violations, nor provide insight on the expected time before a thermal violation occurs. Therefore, it is critical to monitor the stability and safety of power-temperature dynamics at runtime, and detect potential violations before any damage is done.
To illustrate the problem addressed by this paper, we measured the power-temperature profile of a commercial SoC for a complete heat-up/cool-down cycle, as shown in Figure 1. In particular, the inner trajectory, denoted by black markers, starts with a power consumption slightly larger than 0.2 W. When we increase the dynamic activity, the power consumption quickly rises to 0.9 W. As a result, the temperature starts ramping up during this period, marked as "1". Then, we keep the dynamic activity constant, but the temperature continues to increase towards a fixed point (segment 2). The corresponding rise in the power consumption reveals the impact of the leakage power, since the dynamic activity is kept constant. Eventually, the power consumption and temperature converge to (1.2 W, 69 °C). The existence of this fixed point (i.e., the upper right corner) is necessary to avoid a thermal runaway, but it is not sufficient to ensure a thermally safe operation. For example, when we repeat the same experiment with a higher dynamic activity, we observe the other trajectory in Figure 1. The second trajectory also converges to a stable fixed point given by (2.6 W, 93 °C). However, this fixed point lies above the temperature that triggers throttling. If a demanding application or a power virus drives the system to this fixed point, throttling can deteriorate the performance. In contrast, thermally safe operation without performance penalties can be achieved if we can compute the fixed point and the expected time to reach it as a function of the dynamic activity. The major contributions of this paper are as follows:
(1) We first show that the power-temperature dynamics have either no fixed point or two fixed points, as a function of the system parameters and the dynamic power consumption (Section 4.1).
(2) We prove that the no-fixed-point case is unstable and causes thermal runaway (Section 4.2).
(3) When there are two fixed points, we prove that one of them is stable and the other is unstable. We also derive the region of convergence, i.e., the temperature interval from which every trajectory converges to the stable fixed point, and the intervals in which the temperature diverges (Section 4.2).
(4) We derive an analytical formula to compute the maximum dynamic power consumption allowed to guarantee that the temperature does not exceed a thermally safe value (Section 4.3).
(5) To validate the proposed approach, we present thorough experimental evaluations on an 8-core big.LITTLE platform [12] using single-threaded, multi-threaded and concurrent applications. We demonstrate that the average and maximum prediction errors are 2.6% and 6.2%, respectively. We also show that the total computational overhead of all proposed computations is 75.2 µs per 100 ms control interval, i.e., ≈ 0.075% (Section 5).
Potential Impact: This paper lays the theoretical foundation for power-temperature stability analysis and presents experimental validation of the contributions summarized in the enumerated list above. As we demonstrate in Section 5, our power-temperature stability analysis has a very efficient and practical implementation despite the complexity of the derivations. Therefore, it can be employed by other researchers in dynamic thermal and power management (DTPM) algorithms to determine whether the power-temperature dynamics are stable. If any instability is detected, immediate corrective actions can be taken by the system.
Otherwise, the proposed approach can be used to predict the stable fixed point and the expected time to reach it. DTPM algorithms can use this prediction to determine the urgency and degree of the response. Finally, the proposed approach can accurately compute the maximum power consumption that can be tolerated before violating the thermal constraints. This insight can be used by DTPM algorithms to make informed decisions. For example, if a power-hungry application is driving the system beyond a safe temperature, the DTPM algorithm can selectively isolate the application or terminate it. The proposed approach can also be applied as an effective built-in test to determine reliability and thermal violation risks.
The rest of this paper is organized as follows. We present the related work in Section 2. We give an overview of the proposed methodology and detail the theoretical derivations in Section 3 and Section 4, respectively. Finally, we present the experimental results in Section 5, and summarize our conclusions in Section 6.
RELATED WORK AND NOVELTY
Thermal modeling and analysis have recently attracted significant attention due to large power densities and the impact of temperature on reliability [22, 32]. These studies can be broadly classified as design time and runtime approaches. Design time approaches primarily focus on a full-chip thermal analysis such that parameters like thermal design power can be determined [16, 17, 34, 35]. For instance, HotSpot [17] models the thermal behavior of the entire chip as a function of the floorplan, technological parameters and packaging. Then, power consumption traces obtained using common benchmarks are used to simulate the thermal behavior. Similarly, the authors in [34] propose a tool which performs the full-chip thermal analysis during the synthesis of a chip. These models are very useful for early design stages; however, the high fidelity of these approaches comes at the expense of computational complexity. It is possible to extract high-level models from these tools, and evaluate them iteratively to analyze the thermal behavior. However, iterative approaches are time-consuming and not accurate, as we demonstrate in Section 5.7.
Virtually all commercial products have a mechanism to throttle performance, or shut down the whole system in case of thermal violations. However, reactive approaches penalize performance, and respond only after the fact [18, 28]. This led to predictive approaches for dynamic thermal and power management. Predictive approaches first develop computationally efficient thermal models which can be used at runtime [2, 11]. These models are used to predict the temperature as a function of the power consumption to guide the DTPM algorithms [6, 21, 27, 31]. For instance, the authors in [21] propose a hybrid thermal management algorithm which uses hardware and software techniques for temperature control. In particular, they employ clock gating and thermal-aware scheduling to improve the performance of the algorithm. Similarly, the work presented in [33] characterizes the thermal system parameters offline, considering the coupling between various components. Then, this model is used at runtime to predict the temperature, and control the frequency to minimize thermal violations. The authors in [4, 25] propose methods that consider the transient thermal effects and use them for thermal management. While these models work for short prediction intervals, the error increases considerably when larger prediction windows are used [31]. Furthermore, they do not analyze the existence and stability of thermal fixed points. In contrast, our approach can accurately estimate the fixed point and maximum allowed power consumption at runtime with a low computational overhead. Therefore, it can be utilized by DTPM algorithms.
Recent studies have also proposed techniques to calculate a thermally sustainable power budget [5, 26], and maintain it at runtime [9]. In particular, the authors in [26] propose a method to calculate a thermally safe power such that thermal constraints of the system are not violated. However, it does not consider the positive feedback between leakage power and temperature. The work in [5] proposes a framework called TSocket which evaluates the sustainable power budget for different threading strategies in a multiprocessor system. These studies do not address the problem of calculating the thermal fixed point and the conditions for the existence of a fixed point. They also rely mainly on simulation tools such as HotSpot. In contrast, our approach of finding the maximum safe power is implemented and validated on a real hardware platform.
A number of studies analyze the positive feedback effect between power consumption and temperature [15, 23, 32]. In particular, the authors of [23] show that a thermal runaway is implied when the second order derivative of temperature with respect to time is positive. As pointed out by the authors, this criterion can be successfully applied during design time analysis and simulation. However, it cannot be used as a preventive measure at runtime, since it is satisfied only after the thermal runaway kicks off. Similarly, the technique presented in [32] uses a simple junction-to-ambient heat removal model to predict a thermal runaway during burn-in reliability screening before shipping the chip. The authors in [15] use the temperature dependence of leakage to increase thermally sustainable power dissipation through activity migration. However, neither one of these techniques addresses the problem of calculating the fixed point when there is no thermal runaway. Our work addresses this problem by first deriving the conditions for the existence of a fixed point. When the fixed point exists, we provide the region of convergence for the power-temperature dynamics. Then, we predict the stable fixed point of the system. Hence, our analysis can be used to guard against power attacks that aim to induce damage by elevating the temperature [7, 13, 20].
PRELIMINARIES AND OVERVIEW
This section first presents the power consumption and temperature models required for the proposed analysis. Readers familiar with these models can jump to Section 3.2, where we summarize the challenges and give an overview of the proposed approach.
Power and Temperature Models
Suppose that there are $M$ processors in the target system, as summarized in Table 1. We can express the power consumption of processor $i$ as the sum of dynamic and leakage power consumption:
$$P_i = C_{sw,i} V_i^2 f_i + V_i I_{leak,i} \qquad (1)$$
where $C_{sw,i}$ is the switching capacitance, $V_i$ is the supply voltage and $f_i$ is the operating frequency. The leakage current $I_{leak,i}$, which depends on the temperature $T$, can be approximated as the sum of the gate leakage and subthreshold current as:
$$I_{leak,i} = A_s \frac{W_i}{L} \left(\frac{kT}{q}\right)^2 e^{\frac{q(V_{GS,i} - V_{th,i})}{nkT}} + I_{g,i} \qquad (2)$$
where $I_{g,i}$ is the gate leakage, $A_s$ is a technology constant, $W_i/L$ is the ratio of the effective channel width to channel length, $k$ is the Boltzmann constant, $q$ is the electron charge, $V_{GS,i}$ is the gate to source voltage, $V_{th,i}$ is the threshold voltage, and $n$ is the sub-threshold swing coefficient [19]. For notational convenience, we consolidate the technology and design parameters by introducing the following constants:
$$\kappa_{1,i} = A_s \frac{W_i}{L} \left(\frac{k}{q}\right)^2, \qquad \kappa_{2,i} = \frac{q(V_{GS,i} - V_{th,i})}{nk} \qquad (3)$$
Note that $\kappa_{1,i} > 0$, while $\kappa_{2,i} < 0$ since $V_{GS,i} < V_{th,i}$ for sub-threshold voltages. In summary, the power consumptions of the processors in the target system can be denoted by the $M \times 1$ vector $\mathbf{P} = [P_1, P_2, \ldots, P_M]^T$, where $P_i$ is obtained using Equations 1-3 as:
$$P_i = C_{sw,i} V_i^2 f_i + V_i \left(\kappa_{1,i} T^2 e^{\kappa_{2,i}/T} + I_{g,i}\right) \qquad (4)$$
Suppose that there are $N$ thermal hotspots of interest. The dynamics of the temperature can be expressed using the power consumption vector $\mathbf{P}$ and thermal capacitance and conductance matrices [17, 30]. We employ the following discrete-time state-space system to model the temperature, since the measurements and control decisions are made in periodic intervals:
$$\mathbf{T}[k+1] = A\,\mathbf{T}[k] + B\,\mathbf{P}[k] \qquad (5)$$
This equation expresses the temperature of N cores as a function of the power consumption of M sources, e.g., the big/little CPU clusters, GPU and memory. The N × N matrix A describes the e ect of temperature in time step k on the temperature in the next time step. The N × M matrix B, on the other hand, describes the e ect of each power source on the temperature in the next time step. When we plug the M × 1 power consumption vector P from Equation 4 into Equation 5 , we obtain the following system of nonlinear equations: 
Table 1
Symbol: Description
$N$, $M$: The number of thermal hotspots and processing elements (resources) in the SoC, respectively.
$\mathbf{T}[k]$: $N \times 1$ array where $T_i[k]$, $1 \le i \le N$, denotes the temperature of the $i$-th hotspot.
$T$: The maximum (scalar) steady state temperature over all thermal hotspots.
$T^*$, $P^*$: The maximum thermally safe temperature and power, respectively.
$\mathbf{P}[k]$: $M \times 1$ array where $P_i[k]$, $1 \le i \le M$, denotes the power consumption of the $i$-th resource.
$\kappa_{1,i}$, $\kappa_{2,i}$: Technology dependent parameters of the leakage power for the $i$-th resource.
$a$, $b$: Parameters of the single input single output model which describes the thermal dynamics.
$\tilde{T}$, $\alpha$, $\beta$: Auxiliary parameters introduced in Equation 10.
Challenges and Needs Assessment
The nonlinear system given in Equation 6 shows the positive feedback between the power consumption of $M$ processors and $N$ temperature hotspots. Solving this problem at runtime presents three major challenges. First, a stable solution may not even exist due to the nonlinear positive feedback. Second, the power consumption and temperature at time $k$ depend on their values at time $k-1$ due to the temperature dependence of the leakage power. This dependency requires an iterative solution, which is intractable for runtime analysis, as shown in Section 5.7. Finally, we need to find the maximum power consumption $P_i^*$ that guarantees a thermally safe operation. A simple iteration can find the temperature using the power inputs, but the opposite direction requires a rigorous approach.
The proposed approach addresses each of these challenges one by one, as outlined in Figure 2. We first determine whether a stable fixed point exists using the current power and temperature measurements. Then, we describe how the stable fixed points can be computed efficiently. Finally, we derive compact analytical formulae to compute the time to reach the fixed point and the constraints on the maximum power consumption $P_i^*$ to avoid thermal violations. The proposed approach is invoked together with the default frequency governors, which typically have a period of 100 ms.
THERMAL FIXED POINT ANALYSIS
Let the scalar T denote the maximum steady state temperature over all thermal hotspots, i.e.:
Since the thermal safety is determined by the maximum temperature, we focus on the hotspot with the highest temperature. At steady state ($k \to \infty$), we can model the temperature of each hotspot using the following single input single output (SISO) system:
$$T[k+1] = a\,T[k] + b\,P[k] \qquad (7)$$
where $0 < a < 1$ and $b > 0$ are the parameters of the reduced order system. We emphasize that the reduced order model is obtained through system identification, as described in Section 5.2. Using a SISO model does not mean that we consider only one core. Unlike a crude approximation that directly uses the corresponding entries in the $A$ and $B$ matrices, the coefficient $a$ in our model reflects the thermal coupling between different hotspots, and the coefficient $b$ reflects the impact of multiple power sources. We employ a SISO model, since it enables an in-depth theoretical analysis with powerful insights for practical scenarios. We discuss the solution to the multi-input multi-output (MIMO) case at the end of this section.
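To isolate the impact of the temperature, we rewrite the total power consumption given in Equation 4 as:
$$P = P_C + V\kappa_1 T^2 e^{\kappa_2/T} \qquad (8)$$
where $P_C = C_{sw}V^2 f + V I_g$ represents the temperature-independent component, and subscript $i$ is dropped to simplify the notation. Substituting Equation 8 into Equation 7 gives, at a fixed point where $T[k+1] = T[k] = T$:
$$T = aT + b\left(P_C + V\kappa_1 T^2 e^{\kappa_2/T}\right) \qquad (9)$$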
If this equation has feasible solution(s), we can say that fixed points exist. Since Equation 9 is the focal point of the subsequent analysis, we introduce the following change of variables to isolate the exponential term:
$$\tilde{T} = -\frac{\kappa_2}{T}, \qquad \alpha = \frac{bP_C}{-\kappa_2 (1-a)}, \qquad \beta = \frac{1-a}{-\kappa_2\, bV\kappa_1} \qquad (10)$$
With this change of variables, we rewrite Equation 9 as:
$$\beta \tilde{T}\,(1 - \alpha \tilde{T}) = e^{-\tilde{T}} \qquad (11)$$
where $\alpha > 0$, $\beta > 0$. We will first derive the conditions on $\alpha$ and $\beta$ such that Equation 11 has a solution. Then, we will go back from the transformed domain to the original parameters. Finally, we will show how to compute the constraint on the maximum power consumption $P_C^*$ required to avoid thermal violations, given a maximum temperature constraint $T^*$.
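The algebra behind this change of variables can be sketched as follows. The sketch assumes that the fixed-point condition has the form $(1-a)T - bP_C = bV\kappa_1 T^2 e^{\kappa_2/T}$, i.e., a SISO model $T[k+1] = aT[k] + bP[k]$ with the temperature-dependent leakage power written as $V\kappa_1 T^2 e^{\kappa_2/T}$. The groupings chosen for $\alpha$ and $\beta$ follow from that assumption and should be read as an illustration rather than a definitive statement of the parameters.

```latex
\begin{align*}
(1-a)\,T - b\,P_C &= b\,V\kappa_1\,T^{2}\,e^{\kappa_2/T}
  && \text{(assumed fixed-point condition)}\\
-\frac{(1-a)\,\kappa_2}{\tilde{T}} - b\,P_C
  &= \frac{b\,V\kappa_1\,\kappa_2^{2}}{\tilde{T}^{2}}\,e^{-\tilde{T}}
  && \text{(substitute } T = -\kappa_2/\tilde{T}\text{)}\\
\frac{1-a}{-\kappa_2\,b\,V\kappa_1}\;\tilde{T}
  \left(1 - \frac{b\,P_C}{-\kappa_2\,(1-a)}\,\tilde{T}\right)
  &= e^{-\tilde{T}}
  && \text{(multiply by } \tilde{T}^{2}/(b\,V\kappa_1\kappa_2^{2})\text{)}
\end{align*}
```

Under these assumptions the first factor plays the role of $\beta$ and the coefficient of $\tilde{T}$ inside the parentheses plays the role of $\alpha$. Both are positive because $\kappa_2 < 0$ and $0 < a < 1$, and only $\alpha$ depends on $P_C$.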
Necessary and Sufficient Conditions for the Existence of Fixed Point(s)
The domain of the auxiliary temperature is given by $\tilde{T} \in (0, \infty)$, since $\tilde{T} = -\kappa_2/T$ where $\kappa_2 < 0$. Hence, the right-hand side of Equation 11 lies in the interval $(0, 1)$. That is, $0 < \beta\tilde{T}(1 - \alpha\tilde{T}) = e^{-\tilde{T}} < 1$.
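Since this condition requires that $\tilde{T} < 1/\alpha$, we can take the logarithm of both sides while preserving the equality, i.e., $\ln\beta + \ln\tilde{T} + \ln(1 - \alpha\tilde{T}) = -\tilde{T}$. Hence, Equation 11 has the same fixed points as the following equation:
$$F(\tilde{T}) = \ln\beta + \ln\tilde{T} + \ln(1 - \alpha\tilde{T}) + \tilde{T} = 0 \qquad (12)$$
The important properties of $F(\tilde{T})$ employed in our analysis are summarized in the following lemma.
Lemma 4.1. $F(\tilde{T})$ given in Equation 12 satisfies the following properties:
(1) $F(\tilde{T})$ is a concave function in the interval $\tilde{T} \in (0, 1/\alpha)$.
(2) $F(\tilde{T})$ has a unique maximum at $\tilde{T}_m$, which is given by:
$$\tilde{T}_m = \frac{1 - 2\alpha + \sqrt{1 + 4\alpha^2}}{2\alpha} \qquad (13)$$
(3) $F(\tilde{T})$ is an increasing function in the interval $(0, \tilde{T}_m)$ and a decreasing function in $(\tilde{T}_m, 1/\alpha)$.
Proof. The proof is provided in Appendix A.1, while an informal explanation is provided below to maintain a smooth flow.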
Hence, Equation 11 has two fixed points if and only if $\beta \ge \left(\frac{2}{\tilde{T}_m} + 1\right) e^{-\tilde{T}_m}$, where $\tilde{T}_m$ depends only on the parameter $\alpha$ and is defined in Equation 13. Otherwise, it has no solution.
Proof. The proof is provided in Appendix A.2.
At runtime, we first compute $\tilde{T}_m$ using Equation 13. Then, we check the condition given in Theorem 4.1. If it is not satisfied, we conclude that there will be a thermal runaway. This knowledge can be used to throttle the cores aggressively or enter an emergency state. Otherwise, we proceed to compute the maximum allowed power consumption that will avoid thermal violations.
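The runtime check described above can be sketched in a few lines. The sketch works directly with the auxiliary parameters: it computes the maximizer of $F$ as the positive root of $\alpha x^2 + (2\alpha - 1)x - 1 = 0$ (obtained by setting the derivative of $F$ to zero) and then tests the existence condition. The function names and the numeric values of `alpha` and `beta` are hypothetical and chosen only for illustration.

```python
import math

def t_m(alpha: float) -> float:
    """Maximizer of F on (0, 1/alpha): positive root of
    alpha*x**2 + (2*alpha - 1)*x - 1 = 0, which is F'(x) = 0."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return (1.0 - 2.0 * alpha + math.sqrt(1.0 + 4.0 * alpha ** 2)) / (2.0 * alpha)

def fixed_points_exist(alpha: float, beta: float) -> bool:
    """Existence condition: beta >= (2/T_m + 1) * exp(-T_m)."""
    tm = t_m(alpha)
    return beta >= (2.0 / tm + 1.0) * math.exp(-tm)

# Hypothetical parameter values, not taken from any measurement.
stable_case = fixed_points_exist(alpha=0.128, beta=1.5e-3)
runaway_case = fixed_points_exist(alpha=0.128, beta=1.0e-3)
print(stable_case, runaway_case)
```

The check costs one square root and one exponential, which is in line with the goal of evaluating it in every control interval.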
Stability of the Fixed Points
The stability of the fixed points is determined by the behavior of $\tilde{T}$ as a function of $F(\tilde{T})$. To provide a smooth flow, we summarize this behavior using the following lemma. This lemma allows us to determine the stability characteristics of fixed points by inspecting the sign of the function $F(\tilde{T})$. The following theorem summarizes the stability results using this lemma, which are also illustrated with the arrows in Figure 3(a).
Proof. The proof is provided in Appendix A.4, while an informal explanation is provided below.
Theorem 4.2 states that the temperature proceeds along the arrows shown in Figure 3 during fixed point iterations. When $F(\tilde{T}_m) < 0$, i.e., no fixed point exists, $\tilde{T}$ decreases at each temperature iteration no matter where it starts. Therefore, there is a thermal runaway, as illustrated in Figure 3(a). When $F(\tilde{T}_m) > 0$, there are two fixed points denoted by $\tilde{T}_u$ and $\tilde{T}_s$. Any iteration starting in the interval $(0, \tilde{T}_u)$ diverges to $\tilde{T} \to 0$, since $F(\tilde{T}) < 0$ in that interval. Conversely, any iteration starting in the interval $(\tilde{T}_u, \tilde{T}_s)$ will converge to $\tilde{T}_s$, since $F(\tilde{T}) > 0$ there. Likewise, $F(\tilde{T}) < 0$ in the interval $(\tilde{T}_s, 1/\alpha)$, so iterations starting there decrease towards $\tilde{T}_s$. That is, any iteration starting in the interval $(\tilde{T}_u, 1/\alpha)$ will converge to $\tilde{T}_s$, as illustrated in Figure 3(b). Therefore, we conclude that $\tilde{T}_u \in (0, \tilde{T}_m)$ is unstable, while $\tilde{T}_s \in (\tilde{T}_m, 1/\alpha)$ is stable. In summary, we derived the conditions for the existence of fixed points and their stability regions in terms of the auxiliary temperature $\tilde{T}$, $\alpha$ and $\beta$. Next, we will derive the constraints on the dynamic power consumption required for thermally safe operation.
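The behavior described above can be reproduced numerically with the following sketch. It locates the two roots of $F$ by bisection on either side of the maximizer and then iterates a temperature recursion written in the auxiliary variable. The recursion is an assumption: it corresponds to a SISO model of the form $T[k+1] = aT[k] + b(P_C + V\kappa_1 T^2 e^{\kappa_2/T})$ divided by $-\kappa_2$. The helper names, the runaway threshold `floor`, and all numeric values are hypothetical.

```python
import math

def F(t, alpha, beta):
    """F(T~) = ln(beta) + ln(T~) + ln(1 - alpha*T~) + T~ on (0, 1/alpha)."""
    return math.log(beta) + math.log(t) + math.log(1.0 - alpha * t) + t

def t_m(alpha):
    """Maximizer of F, i.e. the positive root of F'(T~) = 0."""
    return (1.0 - 2.0 * alpha + math.sqrt(1.0 + 4.0 * alpha ** 2)) / (2.0 * alpha)

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f changes sign between lo and hi."""
    lo_nonneg = f(lo) >= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) >= 0) == lo_nonneg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fixed_points(alpha, beta):
    """Return (unstable, stable) roots of F, or None if F(T~_m) < 0."""
    tm = t_m(alpha)
    f = lambda t: F(t, alpha, beta)
    if f(tm) < 0:
        return None
    eps = 1e-12
    return bisect(f, eps, tm), bisect(f, tm, (1.0 - eps) / alpha)

def step(t, a, alpha, beta):
    """One step of the ASSUMED recursion, expressed in the auxiliary variable."""
    inv_next = a / t + (1.0 - a) * (alpha + math.exp(-t) / (beta * t * t))
    return 1.0 / inv_next

def simulate(t0, a, alpha, beta, steps=2000, floor=1.0):
    """Iterate from t0; return 0.0 once T~ drops below `floor` (treated as runaway)."""
    t = t0
    for _ in range(steps):
        t = step(t, a, alpha, beta)
        if t < floor:
            return 0.0
    return t

# Hypothetical parameters, for illustration only.
ALPHA, BETA, A = 0.128, 1.5e-3, 0.9
t_u, t_s = fixed_points(ALPHA, BETA)
print(t_u, t_s)
print(simulate(7.0, A, ALPHA, BETA))   # starts between the two fixed points
print(simulate(7.7, A, ALPHA, BETA))   # starts between the stable point and 1/alpha
print(simulate(5.5, A, ALPHA, BETA))   # starts below the unstable point
```

Because $\tilde{T} = -\kappa_2/T$, a shrinking $\tilde{T}$ corresponds to a growing physical temperature, so the run that is cut off at `floor` is the thermal runaway case.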
From Temperature to Power Constraint
Suppose that the thermally safe temperature is given by $T^*$. We need to work backwards starting with this constraint to find the maximum allowable $P_C^*$ (i.e., the sum of dynamic and gate leakage power consumption). To this end, we first show the existence of $P_C^*$, and then provide a compact formula to compute $P_C^*$ in terms of $T^*$ and the system parameters.
Existence of $P_C^*$: We know that the system converges to a stable fixed point when there is no dynamic activity ($P_C \to 0$). This means that the necessary and sufficient condition given in Theorem 4.1 is satisfied, i.e., $\beta \ge \left(\frac{2}{\tilde{T}_m} + 1\right) e^{-\tilde{T}_m}$. Now, consider the other extreme where $P_C$ grows indefinitely due to heavy dynamic activity. Equation 10 shows that $\alpha$ will also grow, since it increases linearly with $P_C$. Growing $\alpha$, in turn, implies that $\tilde{T}_m \to 0$ according to Equation 13. Hence, as the dynamic activity ($P_C$) increases, the term $\left(\frac{2}{\tilde{T}_m} + 1\right) e^{-\tilde{T}_m}$ grows and eventually exceeds $\beta$, which leads to the no-fixed-point case illustrated in Figure 3(a). Therefore, $F(\tilde{T})$ decreases monotonically as $P_C$ increases, and crosses zero at $P_C^*$. Due to the monotonic behavior of stable fixed points as a function of $P_C$, we conclude that there exists a maximum allowable $P_C^*$ whose thermal fixed point does not exceed $T^*$. $P_C^*$ for a given $T^*$ is computed as follows at runtime:
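(1) Compute the auxiliary temperature that corresponds to $T^*$ using Equation 10 as $\tilde{T}^* = -\kappa_2/T^*$.
(2) Given $\tilde{T}^*$, find the corresponding $\alpha^*$ using Equation 11:
$$\alpha^* = \frac{1}{\tilde{T}^*}\left(1 - \frac{e^{-\tilde{T}^*}}{\beta\,\tilde{T}^*}\right)$$
(3) Map $\alpha^*$ back to $P_C^*$ using the definition of $\alpha$ in Equation 10.
In summary, we conclude that any power value such that $P_C < P_C^*$ has a stable thermal fixed point less than $T^*$.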
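The steps above can be sketched as follows. Solving $\beta\tilde{T}(1-\alpha\tilde{T}) = e^{-\tilde{T}}$ for $\alpha$ at $\tilde{T} = \tilde{T}^*$ is plain algebra; the mapping from $\alpha^*$ back to power assumes $\alpha = bP_C / (-\kappa_2(1-a))$, which is an assumption about the form of the change of variables. All function names and numeric values are hypothetical, temperatures are in kelvin, and $P_C$ is assumed to absorb any constant offset of the thermal model.

```python
import math

def alpha_star(t_star: float, kappa2: float, beta: float) -> float:
    """Solve beta*T~*(1 - alpha*T~) = exp(-T~) for alpha at T~ = -kappa2/T*."""
    t_aux = -kappa2 / t_star
    return (1.0 - math.exp(-t_aux) / (beta * t_aux)) / t_aux

def on_stable_branch(t_star: float, kappa2: float, alpha: float) -> bool:
    """True if T~* lies above the maximizer of F, i.e. on the stable side."""
    t_aux = -kappa2 / t_star
    tm = (1.0 - 2.0 * alpha + math.sqrt(1.0 + 4.0 * alpha ** 2)) / (2.0 * alpha)
    return t_aux > tm

def max_safe_power(t_star, kappa2, beta, a, b):
    """Map alpha* to P_C*, ASSUMING alpha = b*P_C / (-kappa2*(1 - a))."""
    return alpha_star(t_star, kappa2, beta) * (-kappa2) * (1.0 - a) / b

# Hypothetical values for illustration only.
KAPPA2, BETA, T_STAR = -2500.0, 1.5e-3, 340.0
al = alpha_star(T_STAR, KAPPA2, BETA)
print(al, on_stable_branch(T_STAR, KAPPA2, al))
print(max_safe_power(T_STAR, KAPPA2, BETA, a=0.9, b=10.0))
```

The stable-branch check matters because Equation 11 is satisfied at both fixed points; the formula for $\alpha^*$ only yields a meaningful power limit when $\tilde{T}^*$ is the stable one.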
Time to Reach the Stable Fixed Point
The existence and specific value of the fixed point can enable DTPM algorithms to determine whether the power/temperature trajectory moves towards a dangerous operating point. In addition, the expected time to reach that point reveals how soon the dangerous zone will be reached. Therefore, DTPM algorithms can utilize the timing information to determine if there is any imminent possibility of a thermal violation. Moreover, this estimate can also be used to decide how long the current power consumption can be sustained without violating the thermal limit. To obtain a computationally efficient estimation, we employ the following exponential model:
$$T[kT_s] = T_{fix} + (T_{init} - T_{fix})\, e^{-kT_s/\tau}$$
where $T_{init}$ is the initial temperature, $T_{fix}$ is the stable fixed point and $\tau$ is the time constant. We use the discrete time stamp $kT_s$, since the temperature is sampled with a period of $T_s$. Two temperature readings separated by a delay $DT_s$, i.e., $T[kT_s]$ and $T[(k-D)T_s]$, can be used to estimate the time constant $\tau$ as:
$$\tau = \frac{D\,T_s}{\ln\left(\dfrac{T[(k-D)T_s] - T_{fix}}{T[kT_s] - T_{fix}}\right)}$$
In our experiments, $D = 10$, and the time-to-fixed-point computation, like the other computations in the proposed technique, is repeated with $T_s = 100$ ms.
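A minimal sketch of this estimate is given below. It assumes a first-order exponential approach to the fixed point, $T(t) = T_{fix} + (T_{init} - T_{fix})e^{-t/\tau}$, so the time constant follows from the ratio of two distances to the fixed point. The function names and the synthetic readings are hypothetical; only $D = 10$ and $T_s = 100$ ms come from the text.

```python
import math

def estimate_tau(t_prev, t_now, t_fix, d, ts):
    """Time constant from two readings d samples apart, assuming
    T(t) = T_fix + (T_init - T_fix) * exp(-t / tau)."""
    ratio = (t_prev - t_fix) / (t_now - t_fix)
    if ratio <= 1.0:
        raise ValueError("readings do not approach the fixed point")
    return d * ts / math.log(ratio)

def time_to_reach(t_now, t_target, t_fix, tau):
    """Time for the exponential model to move from t_now to t_target,
    where t_target lies between t_now and t_fix."""
    return tau * math.log((t_now - t_fix) / (t_target - t_fix))

# Synthetic readings generated from a known time constant (20 s), for illustration.
TS, D = 0.1, 10                      # sampling period (s) and delay in samples
T_INIT, T_FIX, TAU_TRUE = 50.0, 70.0, 20.0
model = lambda t: T_FIX + (T_INIT - T_FIX) * math.exp(-t / TAU_TRUE)
tau_hat = estimate_tau(model(5.0), model(6.0), T_FIX, D, TS)
eta = time_to_reach(model(6.0), 69.0, T_FIX, tau_hat)
print(tau_hat, eta)
```

With measured data the two readings carry sensor noise, so in practice the estimate would likely be smoothed over several windows; that refinement is not part of the sketch.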
Solution for the MIMO Case
The SISO model given in Equation 7 can be extended to explicitly show the impact of each power source on the thermal hotspot of interest $T_i$, as follows:
where the $P_i$ terms are defined in Equation 4. In order to find the fixed point temperatures, we need to solve the set of equations given by:
where
This is a nonlinear equation due to the exponential terms in $P_i$. Generalizing our solution to obtain similar closed-form formulae for the MIMO model requires solving the nonlinear equations in Equation 17. The standard approach to this problem is to find a good initial point, and use a root search algorithm for nonlinear equations. Following this approach, we use our SISO solution for each hotspot as the "initial point". Then, we employ an iterative search to find the roots of the nonlinear equations. More specifically, we solve Equation 17 via Newton's method [1], where our SISO solution is used as the initial point. Our experimental measurements show that the closed-form solutions presented in Sections 4.1-4.4 are very close to the roots of the MIMO system. Hence, the search converged to the roots of the MIMO system in fewer than five iterations in the worst case. Moreover, our simulations show that the region of convergence is quite large. This means that even if our initial points are not accurate, we will still converge to the solution.
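Newton's method for the MIMO fixed point can be sketched as follows for a system with two hotspots and two power sources. Several things here are assumptions made for illustration: the fixed-point equations are taken to be $\mathbf{T} = A\mathbf{T} + B\mathbf{P}(\mathbf{T})$, source $j$ is assumed to leak according to the temperature of hotspot $j$ with leakage power $g_j T_j^2 e^{\kappa_2/T_j}$, and every numeric value is hypothetical. The leakage-free solution stands in for the SISO initial point.

```python
import math

KAPPA2 = -2500.0  # hypothetical leakage exponent parameter (kelvin)

def leak(t, g):
    """Assumed temperature-dependent leakage power g * T^2 * exp(kappa2 / T)."""
    return g * t * t * math.exp(KAPPA2 / t)

def dleak(t, g):
    """Derivative of leak() with respect to T."""
    return g * (2.0 * t - KAPPA2) * math.exp(KAPPA2 / t)

def residual(T, A, B, Pc, g):
    """G(T) = T - A*T - B*P(T); a fixed point satisfies G(T) = 0."""
    P = [Pc[j] + leak(T[j], g[j]) for j in range(2)]
    return [T[i] - sum(A[i][j] * T[j] for j in range(2))
                 - sum(B[i][j] * P[j] for j in range(2)) for i in range(2)]

def jacobian(T, A, B, g):
    return [[(1.0 if i == j else 0.0) - A[i][j] - B[i][j] * dleak(T[j], g[j])
             for j in range(2)] for i in range(2)]

def solve2(J, r):
    """Solve the 2x2 system J*d = r with Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(r[0] * J[1][1] - J[0][1] * r[1]) / det,
            (J[0][0] * r[1] - J[1][0] * r[0]) / det]

def newton(T0, A, B, Pc, g, tol=1e-9, max_iter=20):
    """Return (fixed point, iterations used); raise if not converged."""
    T = list(T0)
    for it in range(max_iter):
        r = residual(T, A, B, Pc, g)
        if max(abs(v) for v in r) < tol:
            return T, it
        d = solve2(jacobian(T, A, B, g), r)
        T = [T[0] - d[0], T[1] - d[1]]
    raise RuntimeError("Newton iteration did not converge")

# Hypothetical system parameters.
A = [[0.90, 0.02], [0.02, 0.90]]
B = [[8.0, 1.0], [1.0, 8.0]]
PC = [2.9, 2.5]
G = [0.0012, 0.0012]

# Initial point: leakage-free solution of (I - A) T = B * Pc.
I_minus_A = [[1.0 - A[0][0], -A[0][1]], [-A[1][0], 1.0 - A[1][1]]]
BPc = [sum(B[i][j] * PC[j] for j in range(2)) for i in range(2)]
T_init = solve2(I_minus_A, BPc)
T_fix, iters = newton(T_init, A, B, PC, G)
print(T_init, T_fix, iters)
```

The closer the initial point is to the root, the fewer Newton steps are needed, which is why a good closed-form estimate is a natural starting point.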
EXPERIMENTAL EVALUATION

Experimental Setup
The proposed fixed point prediction scheme is evaluated on the Odroid XU3 board, which employs the Samsung Exynos 5422 SoC [12]. Exynos 5422 is a single-ISA big.LITTLE SoC consisting of 4 Cortex-A15 (big) cores, 4 Cortex-A7 (little) cores and a Mali GPU. The Odroid XU3 board also comes with thermal sensors to measure the temperature of each big core and the GPU. It also includes current sensors to measure the power consumption of the big cluster, the little cluster, the memory and the GPU. The proposed approach is invoked with the default frequency governors every 100 ms. That is, we periodically sample four power consumption and five temperature values. The proposed technique takes 75.2 µs of this 100 ms period, as detailed in Section 5.7. We evaluated the proposed approach on commonly used benchmarks, including four from the MiBench suite [10], three from the PARSEC suite [3], one from the SPEC suite [14], and a matrix multiplication kernel. These benchmarks are listed in the first column of Table 2.
Parameter Characterization
The stability and safety analysis presented in Section 4 does not make any assumptions about specific parameter values. However, we need to characterize the system parameters, such as the leakage current and the state-space model parameters, in order to evaluate the accuracy of our analysis. As the first step, we characterized the leakage current parameters $I_g$, $\kappa_1$, $\kappa_2$ for the Odroid XU3 board using a methodology similar to those presented in [24, 31]. More precisely, we placed the target device in a furnace and collected data at five different temperatures ranging from 40 °C to 80 °C, while running a light workload to make sure that the temperature is not increased due to the dynamic activity. Since the dynamic power consumption remained constant, the measured power consumption difference between two experiments gives the difference in leakage power at two known temperatures. We used measurements at five different temperatures to find the unknowns $I_{g,i}$, $\kappa_{1,i}$, $\kappa_{2,i}$ for the big core cluster and the GPU, as these are the dominant sources of leakage in our board.
The second step is estimating the $A$ and $B$ matrices used in Equation 5. In order to characterize the $A$ matrix, we ran the big cluster at a fixed frequency level until the temperature rose to a steady state value. Once the steady state was reached, we turned off the big cores and let the device run until the board cooled down. We repeated this for multiple frequency levels, as shown in Figure 4. We also repeated the complete experiment multiple times and then averaged the results to account for any variations in the system behavior over time. The data from this experiment helps us determine how the temperature at the current time step affects the temperature in the next time step. In a typical run, different workloads lead to varying utilization of system resources, which leads to different operating frequencies. To determine the effect of each power source on the temperature, we excited all the frequency levels of the A7 cores, A15 cores and the GPU in a pseudo-random bit sequence (PRBS), and recorded the temperature. This helps us to identify the effect of each power source on the temperature under varying conditions, i.e., to characterize the $B$ matrix. This data is then used in conjunction with the data from the previous experiment to jointly identify $A$ and $B$. The $A$ and $B$ matrices are used to predict the temperature given an initial temperature and the power consumption at each time step. As we can see in Figure 4, the computed temperature closely follows the measured temperature. In this work, we performed characterization on one Samsung Exynos 5422 chip. The parameters obtained for the leakage current and the thermal model are used to evaluate the accuracy of the fixed point analysis on three different instances. In general, the chip manufacturers divide their products into multiple bins, such as low power and high performance parts, and sell them using different stock keeping units (SKUs). Therefore, our analysis can be performed for each SKU of a given commercial chip.
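As a simplified illustration of this kind of identification, the sketch below fits a scalar model $T[k+1] = aT[k] + bP[k]$ to a recorded trace by ordinary least squares. This is not necessarily the identification procedure used for the $A$ and $B$ matrices; it only shows how a switching power input makes the two parameters separately identifiable. The function name, the synthetic power pattern, and the parameter values are hypothetical, and temperatures are taken relative to an arbitrary reference.

```python
def fit_siso(temps, powers):
    """Least-squares fit of T[k+1] = a*T[k] + b*P[k] via the 2x2 normal equations.
    `temps` must hold one more sample than the number of power samples used."""
    x1 = temps[:-1]
    x2 = powers[:len(temps) - 1]
    y = temps[1:]
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    r1 = sum(u * w for u, w in zip(x1, y))
    r2 = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b

# Synthetic trace generated from known parameters with a switching power input.
TRUE_A, TRUE_B = 0.95, 0.8
powers = [2.0 if (k // 7 + k // 3) % 2 == 0 else 0.5 for k in range(200)]
temps = [0.0]
for p in powers:
    temps.append(TRUE_A * temps[-1] + TRUE_B * p)

a_hat, b_hat = fit_siso(temps, powers)
print(a_hat, b_hat)
```

A constant power input would make the temperature and power regressors nearly collinear once the trace settles, which is one reason to excite the system with a varying input.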
Validation of Fixed Point Approximation
Evaluation at fixed frequency: To evaluate the accuracy of the proposed fixed point analysis, we first performed experiments at various power levels ranging from 0.38 W to 1.01 W, while running a light load on the CPU. After running the system until a steady-state fixed point is attained, we recorded the average power consumption and final temperature. We also estimated the average dynamic power consumption for each experiment by subtracting the estimated leakage power from the measured total power. The first three columns in Table 2 provide these values. Then, we used the estimated dynamic power consumption and the ambient temperature to analyze whether a stable fixed point exists or not. After confirming that the analysis result is correct, we computed the fixed point temperature using the proposed technique. The fourth column of Table 2, labeled as "Comput. Fixed Point Temp.", lists the results for each power consumption level. As summarized in Table 2, the fixed point prediction is within 1 °C of the empirical result for four power levels. The largest observed difference was 1.1 °C, which implies 2.0% error with respect to the measured fixed point. This error is quite acceptable for our system, as the temperature sensors operate at an integer precision, which can introduce an error in the measurement.
Evaluation on benchmarks: We also evaluated our analysis technique on commonly used benchmarks which represent real-world applications. We ran these applications for minutes to capture the temperature dynamics, as summarized in the last column of Table 2. During these experiments, we overrode the safe temperature limit on the board such that the temperature could rise beyond 110 °C. The results of the fixed point evaluation for the benchmarks are summarized in the lower part of Table 2. We observe that the fixed point prediction error increases slightly compared to the fixed frequency experiments. This is expected, since there are larger variations in the power consumption. However, the average prediction error is still only 3.0 °C, and in the worst case, the error is 5.8 °C. Finally, the average prediction error across all experiments is 2.3 °C.
Power-Temperature Trajectory
Figure 5: (a) Analytical, experimental and simulation results when the fixed point is lower than the thermally safe temperature. (b) Analytical, experimental and simulation results when the fixed point is higher than the thermally safe temperature.
We also performed experiments to compare the measured power-temperature trajectory against the fixed point predicted by our analysis. Trajectories for different initial power consumption and temperature values are shown in Figure 5a. The initial points are denoted by black • markers and simulated trajectories are plotted using dashed black lines. For each of the initial points, the dynamic power starts with a small value and then rises to a steady value depending on the workload. For example, consider the trajectory that starts from the initial point (0.5 W, 50 °C). The dynamic power increases until about 0.87 W, and then it remains steady at 0.87 W. The arrows show how each trajectory converges to the stable fixed point at (1.02 W, 66.6 °C), shown by a larger black marker. These trajectories are obtained by solving the set of nonlinear equations given in Equation 6 iteratively. Given the same assumptions, the proposed approach finds the stable fixed point as (1.02 W, 66.6 °C), as denoted by the red marker on the same figure. Hence, the prediction exactly matches the simulated trajectory. Furthermore, we performed one more set of experiments on the board by imposing a dynamic power consumption close to the value used in the simulation. The trajectory measured during this experiment is plotted using a solid blue line in Figure 5a. This trajectory almost overlaps with the simulation and converges to (1.01 W, 66.0 °C). As a result, the difference between the empirical and theoretical fixed points is only 0.6 °C, which agrees with the results in Table 2.
In order to illustrate the case where the temperature converges to a value beyond the thermal limit of our board, we performed one more set of experiments and simulations, as shown in Figure 5b.
We disabled the thermal throttling to let the temperature rise beyond 110 °C. Similar to Figure 5a, the dynamic power starts with a small value until it increases to 4.23 W. The dashed black lines show the simulated trajectory followed by the power and temperature for each initial point. We note that the simulation for each initial point converges to the fixed point at (5.14 W, 149.5 °C), marked with a larger black marker. With these conditions, the proposed approach finds the fixed point as (5.14 W, 149.6 °C), as denoted by the red marker. The difference between the analytical solution and the simulated trajectory is 0.1 °C; hence, they almost overlap in Figure 5b. Moreover, we performed one instance of the experiment on the board by choosing the initial point as (0.4 W, 40 °C). We also forced the dynamic power consumption close to 4.20 W, i.e., the value used in the simulation. The solid blue line in Figure 5b shows that the actual trajectory closely follows the simulated trajectory until the temperature reaches 93 °C. At that point, we stopped the experiment to avoid damage to our board.
Power-Temperature with Thermal Throttling: To further evaluate the validity of our fixed point prediction, we performed two sets of experiments: one with and the other without throttling. We used the FFT benchmark from the MiBench suite, as it exhibits a representative behavior and leads to a higher temperature fixed point than the other applications. The proposed fixed point calculation is performed every 100 ms in the Linux kernel. This allows us to analyze how the fixed point prediction evolves over time. Red markers in Figure 6 show that the fixed point predictions vary from about 95.0 °C to 105.0 °C. This variation is expected, since the dynamic power changes during the execution of the application. However, our analysis results still match closely with the measured trajectory, which approaches the predicted fixed point.
We repeated the previous experiment, this time by incorporating a thermal throttling policy. The power consumption starts from a small value of 0.48 W and then increases to about 2.40 W after the benchmark starts execution. When the power consumption is 2.40 W, the fixed point is predicted as 99.0 °C. As the power consumption steadily increases due to the increased leakage, the fixed point prediction increases to 102.0 °C, as shown by the region A in Figure 7. The solid blue line shows that the measured trajectory advances towards the predicted fixed point, as in the previous experiment. This time, the DTPM policy is triggered to throttle the frequency as soon as the temperature reaches 85.0 °C. Throttling starts reducing the power consumption, which in turn slows down the temperature increase. Figure 7 shows that the reduction in the power consumption is reflected in our fixed point prediction. More precisely, the proposed technique updates the fixed point prediction as (2.07 W, 87.7 °C). At the same time, the measured power-temperature trajectory changes its course. We observe that it starts converging to (2.06 W, 86.0 °C), which matches very well with our prediction. This experiment illustrates that our analysis can adapt to changes in the dynamic power consumption and predict the fixed point accurately.
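The simulated trajectories discussed in this section come from iterating the model equations until power and temperature settle. The following is a minimal sketch of that idea, assuming a one-node lumped thermal model with a temperature-dependent leakage term rather than the paper's Equation 6. All parameter values and the names `leakage` and `simulate` are hypothetical and chosen only so that a stable fixed point exists.

```python
import math

# All values below are hypothetical, chosen only for illustration.
T_AMB = 300.0   # ambient temperature, K
R_TH = 30.0     # thermal resistance, K/W
C_TH = 5.0      # thermal capacitance, J/K
K1 = 0.009      # leakage scale, W/K^2
K2 = 3000.0     # leakage exponent, K


def leakage(temp_k):
    """Leakage power that grows exponentially with temperature."""
    return K1 * temp_k**2 * math.exp(-K2 / temp_k)


def simulate(temp0_k, p_dyn, dt=0.1, steps=30000):
    """Forward-Euler iteration of the power-temperature feedback loop."""
    temp = temp0_k
    trajectory = [(p_dyn + leakage(temp), temp)]
    for _ in range(steps):
        p_total = p_dyn + leakage(temp)
        # Heat in (total power) minus heat out (conduction to ambient).
        temp += dt / C_TH * (p_total - (temp - T_AMB) / R_TH)
        trajectory.append((p_dyn + leakage(temp), temp))
    return trajectory


# Two different initial temperatures, same dynamic power.
runs = [simulate(t0, p_dyn=0.87) for t0 in (310.0, 350.0)]
for run in runs:
    p_end, t_end = run[-1]
    print(f"final point: {p_end:.3f} W, {t_end - 273.15:.1f} C")
```

With these assumed parameters, both initial temperatures lie inside the region of convergence, so the two runs end at the same power-temperature point, mirroring how the simulated trajectories in Figure 5a meet at a single fixed point.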
Power Constraint from Temperature
We also evaluated the change in the maximum power constraint $P_C^*$ when the temperature constraint $T^*$ is varied. In particular, we swept the value of $T^*$ from 50 °C to 105 °C and calculated the value of $P_C^*$. The black line in Figure 8 shows the maximum power constraint $P_C^*$ found using the analytical approach outlined in Section 4.3. To validate the analysis results, we set the temperature constraint $T^*$ as the theoretical fixed point. Then, we empirically found the power constraint $P_C^*$ on the target board. The red markers in Figure 8 show that the measured results are indeed on the trend found by the proposed technique. This result can be easily used as a guideline to decide the maximum power level at which a chip can be operated for a given temperature constraint.
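The relation between a temperature limit and a power budget can be illustrated with a steady-state power balance. This is a sketch under a hypothetical one-node thermal model, not the closed-form expression of Section 4.3: at a fixed point, the heat conducted to ambient equals the dynamic power plus the leakage power, so the dynamic power budget is what remains after leakage is subtracted. The parameter values and the function name `max_dynamic_power` are assumptions.

```python
import math

# Hypothetical one-node model parameters (not taken from the paper).
T_AMB = 300.0   # ambient temperature, K
R_TH = 30.0     # thermal resistance, K/W
K1 = 0.009      # leakage scale, W/K^2
K2 = 3000.0     # leakage exponent, K


def leakage(temp_k):
    """Leakage power that grows exponentially with temperature."""
    return K1 * temp_k**2 * math.exp(-K2 / temp_k)


def max_dynamic_power(t_limit_c):
    """Dynamic power whose steady state sits exactly at t_limit_c (deg C)."""
    t_limit_k = t_limit_c + 273.15
    # Steady state: (T - T_amb) / R = P_dyn + P_leak(T).
    return (t_limit_k - T_AMB) / R_TH - leakage(t_limit_k)


# Sweep the temperature constraint from 50 C to 105 C in 5 C steps.
budget = {t: max_dynamic_power(t) for t in range(50, 106, 5)}
for t_limit_c, p_max in budget.items():
    print(f"T* = {t_limit_c:3d} C -> P_C* = {p_max:.2f} W")
```

In this toy model the budget grows with the temperature limit over the swept range, but more slowly than linearly, because leakage consumes a growing share of the heat that the package can remove.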
Evaluation of the Time to Reach Fixed Point
We used Equation 16 to estimate the time at which the temperature will reach the fixed point for the FFT benchmark. Then, we compared this estimate to the actual time to reach the fixed point. Figure 9 shows that our estimate provides a lower bound for the time to reach the fixed point. A lower bound is useful, since it can be used safely to avoid thermal violations. We also see that the estimation improves in accuracy as the benchmark continues to run. In summary, this estimation can be used by DTPM policies to decide how long the current power consumption level can be sustained without violating the thermal limit.
Implementation Overhead
Our theoretical analysis and proofs enable us to derive analytical solutions that can be implemented with negligible overhead. To quantify this overhead, we implemented the proposed solutions in user space on Android 4.4.4 with the Linux 3.10.9 kernel, and measured the overhead on the Odroid XU3 mobile platform. Our implementations are invoked periodically, at the same interval as the default frequency governors, i.e., every 100 ms. We observed that reading the sensors takes 13.8 µs, while computing the fixed point estimate for the SISO case takes 6.8 µs. We can achieve this low overhead since both $1/\alpha$ and $\tilde{T}_m$ have closed form solutions. Once we have the fixed point estimate, Newton's method to solve the MIMO case takes about 50 µs. Similarly, it takes 1.1 µs to compute the maximum allowable power consumption $P_C^*$ given a temperature constraint. This small overhead is enabled by the three closed form equations framed at the end of Section 5.7. Finally, computing the time to reach the stable fixed point takes 3.5 µs. The combined overhead of the sensor reading and all three computations is about 75.2 µs out of 100 ms, i.e., 0.075%. When the implementation is moved to the kernel, the execution time of the functions reduces by about 30%. In contrast, an iterative approach cannot determine the existence and stability of fixed points. Furthermore, it can predict the temperature given the power consumption, but it cannot compute the maximum allowable power consumption $P_C^*$. Finally, temperature prediction over an interval of 1000 s alone takes about 550 µs. The iterative approach also has an average error of more than 10 °C, which is higher than that of our approach.
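The overhead figure follows from adding the per-invocation costs reported above and dividing by the 100 ms invocation period. The short check below reproduces that arithmetic; the dictionary keys are labels introduced here for readability, not names from the implementation.

```python
# Per-invocation costs reported in the text, in microseconds.
costs_us = {
    "sensor_read": 13.8,
    "siso_fixed_point": 6.8,
    "mimo_newton": 50.0,          # reported as "about 50 us"
    "max_power": 1.1,
    "time_to_fixed_point": 3.5,
}
PERIOD_US = 100_000.0  # 100 ms invocation period


def overhead(costs, period_us):
    """Return the total cost in microseconds and its share of the period in percent."""
    total = sum(costs.values())
    return total, 100.0 * total / period_us


total_us, percent = overhead(costs_us, PERIOD_US)
print(f"{total_us:.1f} us per period, {percent:.3f}% of the period")
```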
CONCLUSION
This paper presents a theoretical analysis of the stability of the power consumption and temperature dynamics. First, we show that the power-temperature dynamics have either no fixed point or two fixed points, as a function of the system parameters and the dynamic power consumption. Second, when there are two fixed points, we prove that one of the fixed points is stable, while the other one is unstable. We also determine the region of convergence, which is important for safe thermal operation. Third, we derive an analytical formula to compute the maximum dynamic power consumption that guarantees a thermally safe operation. Experiments and simulation results show that our analysis can be used to predict the fixed point within 0.1 °C to 5.8 °C accuracy with only 0.075 ms computational overhead. Hence, the proposed approach can be used to make proactive DTPM decisions and to detect security threats which force the system to operate beyond the thermal limit.
A APPENDIX
A.1 Proof for Lemma 4.1
Note that $F(\tilde{T})$ approaches $-\infty$ at both end points. Now, the first and second derivatives of $F(\tilde{T})$ with respect to $\tilde{T}$ are evaluated as:
Since $F''(\tilde{T}) < 0$ for $\tilde{T} > 0$, this function is concave. By setting $F'(\tilde{T}) = 0$, we can show that the maximum of $F(\tilde{T})$ occurs when $\tilde{T}_m =$
Moreover, due to its concavity, $F(\tilde{T})$ is an increasing function on $(0, \tilde{T}_m)$ and a decreasing function on $(\tilde{T}_m, 1/\alpha)$, as depicted in Figure 3.
A.2 Proof for Theorem 4.1
Proof. Function $F$ reaches its peak at $\tilde{T} = \tilde{T}_m$, and $F(\tilde{T}) = 0$ has two solutions if and only if $F(\tilde{T}_m) \geq 0$ (Figure 3(b)). Otherwise, if $F(\tilde{T}_m) < 0$, it does not intersect the x-axis and there is no solution (Figure 3(a)).
Hence, β ≥ 
We can rewrite this equation by a change of variable, i.e.,
, as:
After substituting the definitions of $\alpha$ and $\beta$ in Equation 10 and rearranging the terms, we obtain:
If Equation 12 has a solution, there are two fixed points of the function $F(\tilde{T})$, i.e., $\tilde{T}_u$ and $\tilde{T}_s$ such that $0 < \tilde{T}_u < \tilde{T}_s < 1/\alpha$. Since $F(\tilde{T})$ is a concave function, it is negative on $(0, \tilde{T}_u)$. By Lemma 4.2, the value of $\tilde{T}$ therefore decreases to 0 on this interval, just like the no-solution case, and results in thermal runaway. On the other hand, $F(\tilde{T})$ is positive on $(\tilde{T}_u, \tilde{T}_s)$ and negative on $(\tilde{T}_s, 1/\alpha)$. By Lemma 4.2, the value of $\tilde{T}$ increases on $(\tilde{T}_u, \tilde{T}_s)$ and decreases on $(\tilde{T}_s, 1/\alpha)$, moving towards $\tilde{T}_s$ on both intervals. Hence, we conclude that $\tilde{T}_s$ is a stable fixed point and $\tilde{T}_u$ is an unstable fixed point.
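The sign argument above can be checked numerically on a toy example. The sketch below does not use the paper's $F$; it assumes a hypothetical concave quadratic `f` with roots at 1 (playing the role of the unstable point) and 3 (the stable point), and lets a state variable move in the direction given by the sign of `f`. Clamping at zero stands in for the runaway case.

```python
def f(x):
    """Toy concave function with roots at x_u = 1 and x_s = 3 (hypothetical)."""
    return -(x - 1.0) * (x - 3.0)


def evolve(x0, dt=0.01, steps=5000):
    """Move x in the direction of the sign of f(x); stop if x reaches zero."""
    x = x0
    for _ in range(steps):
        x += dt * f(x)
        if x <= 0.0:
            return 0.0  # left the region of convergence
    return x


for x0 in (0.9, 1.5, 4.0):
    print(f"start {x0:.1f} -> end {evolve(x0):.3f}")
```

Starting points on either side of the larger root move towards it, while a starting point below the smaller root moves away and reaches zero, which matches the stable and unstable roles assigned to the two fixed points in the proof.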
