Abstract: The rapid growth in microprocessor performance significantly increases power consumption, which in turn raises microprocessor and system temperatures. High temperatures and pronounced thermal variations across the processor create severe challenges for system reliability and cooling cost. Power and thermal management have therefore become prominent issues in portable computer systems. Dynamic threshold hopping and supply voltage scaling (DTSVS) is an effective low-power design technique for reducing dissipated power. The technique adaptively adjusts the body bias (V BB ), and hence the threshold voltage (V th ), together with the supply voltage (V dd ) in order to minimize power consumption and extend the processor's lifetime. In this study, an optimal simultaneous combination of threshold and supply voltage (V th -V dd ) scaling is proposed to reduce power dissipation in the core of high-performance portable processors. Analysis and SPICE simulations are used to evaluate the theoretical basis. To ensure optimal V th -V dd sets, the Particle Swarm Optimization (PSO) algorithm and the Pareto Front (PF) solution are used. Both theoretical and simulation results show that power consumption is reduced across different temperatures and workload environments.
Introduction
In recent years, portable computer systems such as PDAs (Personal Digital Assistants), notebook computers, communications devices, and cellphones have undergone massive, sustained growth in performance, functionality, and complexity because more CMOS transistors are packed on a single chip. This advance translates into more power dissipation and hence more heat generation, creating a strong demand for low-power microprocessor design. Power and temperature management has therefore become a significant concern and a critical component of portable devices [1] .
Scaling the supply voltage (V dd ) reduces dynamic power consumption to very low limits but simultaneously reduces system performance. To preserve performance, V dd scaling requires a reduction in V th , which increases leakage power exponentially [2] . Conversely, when V th is increased to minimize leakage power, V dd must be raised to maintain performance, which increases dynamic power dissipation.
Portable devices operate under dynamic workloads; it is therefore crucial to minimize dynamic and leakage power concurrently in high-performance, low-power applications. To ensure low-power operation in a power- and temperature-aware design, it is essential to find the optimal V th -V dd combination that still delivers the required performance of the portable processor [3] .
The theoretical concepts and design issues of optimal low-power limits are well represented in recent publications. Examples include thermal-aware dynamic voltage and frequency scaling for reducing energy consumption [4] ; the optimal combination of supply and body bias voltages for the core and memory of a real micro-controller chip [5] ; the determination of a set of supply and body bias voltage combinations that achieves the minimum energy consumption for a target frequency [6] ; the analysis and determination of the best switching management strategy for a dynamic set of operating environments in terms of process choices, circuit activity, and temperature [7] ; an architecture based on the fully depleted silicon-on-insulator (FD-SOI) process technology for server operation in the near-threshold region [8] ; the reduction of standby leakage power using body bias and pin reordering for nanometer-scale CMOS circuits [9] ; and adaptive supply and body voltage control for ultra-low-power microprocessors using a 22 nm technology model, with considerable improvement in power and temperature [10] .
Given the priority of low-power operation and temperature-aware design under dynamic workloads for high-performance portable systems, this work presents the theoretical and practical limits of combined V th -V dd scaling for optimal power dissipation. It seeks an optimal solution to the power and thermal issues. Using the 22 nm Low Power Predictive Technology Model V2.1 (PTM-LP), SPICE simulation results show that a considerable reduction in power consumption and temperature is obtained.
Optimal threshold and supply voltage (V th-opt -V dd-opt ) sets for optimal power dissipation
Techniques for lowering the power dissipation of today's high-performance and portable processors are among the most important issues facing manufacturers. Power, temperature, and performance are closely correlated. Achieving ultra-low power while maintaining the target performance for different workload characteristics is essential. The two main sources of power dissipation in CMOS circuits are static power, which results from resistive paths between the power supply and ground, and dynamic power, which results from switching capacitive loads between different voltage levels. A third source, short-circuit power, arises when both transistors in a CMOS inverter are ON at the same time while the input switches. The short-circuit component is small and can therefore be ignored. Static power is due to current sources and to leakage current when a transistor is nominally off. The total energy per operation of a chip can thus be written as presented in Eq. (1).
where E dynamic is the dynamic energy, E static is the static energy, a is the activity factor of the output node, C is the load capacitance, i is an index that runs over all gates in the circuit, V dd is the supply voltage, W is the effective transistor width, I s is the zero-threshold leakage current, V th is the threshold voltage, V o is the subthreshold slope, and T C is the cycle time [11] .
Optimizing the energy of an average inverter yields the optimal operating point for the chip. Because normalizing by the total transistor width leaves the optimal point unchanged, the average energy consumed per micron of transistor width, shown in Eq. (2), can be optimized instead.
where C eff is the average capacitance switched every cycle per micron of transistor width (typically C eff = 2 fF/µm).
The variation of the inverter delay (T d ) with operating conditions is determined using the α-power model for MOS devices, as given in Eq. (3) [12] :
where k 1 is a proportionality constant specific to a given technology node. The energy-delay product (EDP) is a metric that multiplies the average energy per instruction by the average inter-instruction delay; it reflects how well a low-power MOSFET circuit design has been optimized. For optimal supply and threshold voltage scaling, the EDP expression can be differentiated with respect to V dd and V th and set to zero to obtain V dd-opt , as given in Eq. (4).
This equation provides the optimal supply voltage (V dd-opt ) as a function of V th , so V dd-opt can easily be regulated for each corresponding V th value. The optimal threshold voltage (V th-opt ), in turn, varies with the corresponding body bias voltage (V BB ) and can be approximated as presented in Eq. (5).
where V th0 is the threshold voltage at zero body bias and k 2 is the back-gate bias coefficient [11, 12] .
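As a worked illustration of how an optimum such as Eq. (4) arises, the sketch below assumes a commonly used form for the per-micron energy together with the α-power delay, and neglects the static term so that a closed form exists. The exact expressions used in this work may differ; α is the exponent of the α-power model.

```latex
\begin{align}
E &= C_{\mathrm{eff}} V_{dd}^{2} + I_s\,10^{-V_{th}/V_o}\,T_C\,V_{dd}
   \;\approx\; C_{\mathrm{eff}} V_{dd}^{2}
   \quad \text{(static term neglected)} \\
T_d &= k_1 \frac{V_{dd}}{(V_{dd}-V_{th})^{\alpha}} \\
\mathrm{EDP} &= E\,T_d \approx k_1 C_{\mathrm{eff}}
   \frac{V_{dd}^{3}}{(V_{dd}-V_{th})^{\alpha}} \\
\frac{\partial\,\mathrm{EDP}}{\partial V_{dd}} = 0
   &\;\Rightarrow\; 3\,(V_{dd}-V_{th}) = \alpha\,V_{dd}
   \;\Rightarrow\; V_{dd\text{-}opt} = \frac{3\,V_{th}}{3-\alpha}
\end{align}
```

Under these assumptions, α = 2 gives V dd-opt = 3 V th , and a smaller α moves the optimum closer to V th . Including the static term shifts the optimum and generally requires a numerical solution.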
The maximum operating frequency (f max ) at the optimal supply voltage is proportional to the reciprocal of the delay, as given in Eq. (6).
where k 3 is a fitting coefficient. These equations are crucial: they provide the combination of V th-opt , V dd-opt , and the maximum possible clock frequency f max that yields an efficient optimal power design.
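The relations above can be explored numerically. The following is a minimal sketch, not the authors' implementation: it assumes the per-micron energy form C eff V dd ² + I s ·10^(−V th /V o )·T C ·V dd , the α-power delay, a linear body-bias dependence of V th , and a cycle time equal to a fixed number of gate delays. All function names and constants (`k1`, `k2`, `k3`, `alpha`, `i_s`, `v_o`, `logic_depth`) are illustrative placeholders, not values from the 22 nm model.

```python
# Sketch only: every constant is an illustrative assumption, not a value
# taken from the paper or from the 22 nm PTM-LP model.

def vth_from_body_bias(vbb, vth0=0.3, k2=0.1):
    """Linear approximation of V_th versus body bias (sign convention assumed:
    a negative, i.e. reverse, bias raises V_th)."""
    return vth0 - k2 * vbb

def gate_delay(vdd, vth, k1=1e-11, alpha=1.3):
    """Alpha-power-law delay in seconds; only meaningful for vdd > vth."""
    return k1 * vdd / (vdd - vth) ** alpha

def edp(vdd, vth, alpha=1.3, k1=1e-11, logic_depth=20,
        c_eff=2e-15, i_s=1e-6, v_o=0.1):
    """Energy-delay product per micron of width for one clock cycle."""
    t_cycle = logic_depth * gate_delay(vdd, vth, k1, alpha)
    dynamic = c_eff * vdd ** 2
    static = i_s * 10 ** (-vth / v_o) * t_cycle * vdd
    return (dynamic + static) * t_cycle

def vdd_opt(vth, vdd_max=1.2, step=0.001, **kwargs):
    """Grid search for the supply voltage in (vth, vdd_max] that minimizes EDP."""
    n = int(round((vdd_max - vth) / step))
    candidates = [vth + i * step for i in range(1, n + 1)]
    return min(candidates, key=lambda v: edp(v, vth, **kwargs))

def f_max(vdd, vth, k3=1e11, alpha=1.3):
    """Maximum clock frequency, proportional to the reciprocal of the delay."""
    return k3 * (vdd - vth) ** alpha / vdd

if __name__ == "__main__":
    vth = vth_from_body_bias(vbb=-0.2)
    vdd = vdd_opt(vth)
    print(f"V_th = {vth:.3f} V, V_dd-opt = {vdd:.3f} V, "
          f"f_max = {f_max(vdd, vth):.3e} Hz")
```

A grid search is used instead of a closed form because the leakage term makes the stationarity condition transcendental. With leakage switched off (`i_s=0.0`) and `alpha=2.0`, the search should recover V dd-opt ≈ 3 V th , which is a useful sanity check.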
Implementation of the DTSVS loop
The block diagram of the proposed combined V th -V dd tuning loop for optimal power reduction in a microprocessor unit is shown in Fig. 1 . It describes a new power-minimizing loop that controls the threshold voltage (via body bias) and the supply voltage to minimize leakage and active power dissipation in both active and standby operation modes. The loop is applicable to nanoscale CMOS technology and uses the power-performance tradeoff to determine the optimal threshold voltage V th , or equivalently the optimal body bias voltages V BB (V BB-N /V BB-P ), together with the optimal supply voltage V dd (V th-opt -V dd-opt ).
The proposed loop is an adaptive V th -V dd tuning scheme that dynamically generates and adjusts the body bias voltages V BB-N and V BB-P in concert with adaptive tuning of the supply voltage V dd , based on the power and temperature variation of the operating microprocessor chip. It matches the body bias to the supply voltage so as to track changes in power dissipation and desired performance in real time.
The main feature of the loop is the generation of reference voltages (V ref ) and reference frequencies (f ref ) for comparison with the two frequencies f 1 and f 2 produced from the LTS through the CS-VCO circuits. The operation of the control loop depends on the difference between these frequencies and f ref . Based on this difference, the DTSVS tuning loop modulates the transistor V th via V BB , as well as the supply voltage V dd , to reduce the runtime power dissipation. Since nanoscale technology requires the simultaneous scaling of supply and threshold voltages to meet ever-higher performance goals under ever-tighter power constraints, the DTSVS loop continuously tunes the body bias to track the optimal V th -V dd for a given workload. Instead of relying on an external reference clock frequency, a clock-speed scheduler embedded in the operating system, or a workload monitor, the proposed loop determines the reference clock frequency at runtime.
The self-biased reference voltage generator detects the thermal distribution and generates a corresponding reference voltage; a CS-VCO then generates an output reference frequency correlated with that reference voltage. The configuration of the self-biased reference voltage generator is shown in Fig. 2 .
V ref is a function of the ratio of the two transistor widths, V th , and V in . The circuit requires a stable reference in order to maintain an output that is linear over temperature and highly sensitive to it. To capture the body bias dependency as well as temperature variations for low-power microprocessors under dynamic workloads, V BB is applied to transistors M1 and M2 so that different output voltages are produced accordingly. The simulation results confirm that the reference output voltage varies over the entire temperature range.
To ensure that the design specification is met, that operation is correct, and that both power dissipation and temperature are reduced optimally, all blocks of the loop are designed, connected together, and simulated for different body bias values and different temperatures. The design specifications of the blocks of the DTSVS tuning loop shown in Fig. 1 are given in the following figures.
The LTS circuit (Fig. 3) monitors the relative contribution of the leakage current and generates the optimal body bias voltage by adjusting it according to the saturated voltage. For the controller to respond accurately to variations in V A , V B , and V D , these voltages are converted to higher frequencies using the CS-VCO (Fig. 4) . The frequency detector (Fig. 5) detects any difference between the input frequencies and generates the control signals UP and DN, which drive the charge pump (CP) to modulate the amount of charge stored in the loop filter (LF) (Fig. 6) . The LF circuit converts the current from the CP circuit into a stable voltage that rises or falls with the microprocessor's temperature. Microprocessors have different temperature ranges: idle, normal, and maximum. An ADC (Fig. 7) is therefore required to generate eight output states that cover all temperature ranges (32°C to 72°C). The digital outputs of the ADC (B0, B1, and B2) are the inputs to the DAC circuit (Fig. 8) , which converts them to an analog voltage; consequently, eight voltage levels are generated depending on the thermal status of the chip. The level converter circuit (Fig. 9 ) generates V BBP from V BBN according to the relation V BB-P = V dd + |V BB-N |. V BBP and V BBN are thus the body bias voltages used to modulate the threshold voltage.
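The ADC, DAC, and level-converter stages described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit-level design: the eight states are assumed to be equal 5°C bins over 32°C to 72°C, the DAC is assumed to produce a linear body-bias staircase with a hypothetical step `v_step`, and the function names are invented for illustration.

```python
# Behavioral sketch of the ADC -> DAC -> level-converter chain.
# Bin edges and the DAC step size are assumptions made for illustration.

T_MIN, T_MAX, LEVELS = 32.0, 72.0, 8

def adc_code(temp_c):
    """Quantize a temperature into a 3-bit code 0..7, clamped to the range."""
    step = (T_MAX - T_MIN) / LEVELS           # 5 degC per level
    code = int((temp_c - T_MIN) // step)
    return max(0, min(LEVELS - 1, code))

def code_bits(code):
    """Return the ADC output bits as (B2, B1, B0)."""
    return (code >> 2) & 1, (code >> 1) & 1, code & 1

def dac_vbbn(code, v_step=-0.1):
    """Map the 3-bit code to an NMOS body bias (hypothetical linear staircase)."""
    return code * v_step

def level_convert(vdd, vbbn):
    """PMOS body bias from the relation V_BB-P = V_dd + |V_BB-N|."""
    return vdd + abs(vbbn)

if __name__ == "__main__":
    vdd = 0.8
    for temp in (32, 45, 58, 72):
        code = adc_code(temp)
        vbbn = dac_vbbn(code)
        print(temp, code_bits(code), round(vbbn, 2),
              round(level_convert(vdd, vbbn), 2))
```

Clamping the code keeps the model well defined at and beyond the upper end of the range, where plain floor division would otherwise produce a ninth level.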
PSO algorithm and PF solutions
To ensure that the DTSVS tuning loop finds the optimal V th -V dd set, the Particle Swarm Optimization (PSO) algorithm and the Pareto Front (PF) solution are used to optimize the reduction in power consumption for different temperatures and workload environments.
The main challenge for the DTSVS loop is the enormous number of V th -V dd combination sets that must be considered for optimal power dissipation in microprocessors under dynamic workloads. The V th and V dd settings conflict and compete with one another. PSO is a stochastic, population-based optimization method that originated in the work of Kennedy and Eberhart [13] . The algorithm is suited to multi-combination set problems, for which classical derivative-based optimization cannot provide a suitable solution over large search spaces. The flowchart of the PSO algorithm is presented in Fig. 10 .
The inputs of the PSO algorithm are the constraints and initial values of the framework parameters. The algorithm handles a number of agents in the form of particles, which compose a swarm. The swarm of particles moves around the search space to find the best solution [14] . Each particle keeps track of the best solution it has achieved, called the personal best (p best ), and of the best solution achieved by neighboring particles, called the global best (g best ). The fundamental concept is to move each particle from its p best toward the g best position by updating its velocity and position, as illustrated in Eq. (7) [15] .
where V k+1 is the velocity of the particle at the (k+1) th iteration, X k is the current position of the particle, and X k+1 is the position of the particle at the (k+1) th iteration after updating from the current position. r 1 and r 2 are random numbers. The initial position values are generated within the predetermined search space, and the initial position vector is taken as the initial p best value. The p best at iteration (k+1) is updated as shown in Eq. (8) [16] .
The value of g best is updated as the minimum of all p best values, as given in Eq. (9).
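The update rules described above can be sketched in code as follows. This is a generic textbook PSO, not necessarily the exact variant used in this work: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the clamping of positions to the bounds, and the quadratic `toy_cost` function are all assumptions introduced for illustration.

```python
import random

def pso(objective, bounds, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial positions are drawn inside the search space; they are also
    # the initial personal bests.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to p_best + pull to g_best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clamped to the allowed range.
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:              # personal-best update
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:             # global best = min of p_best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_cost(x):
    """Hypothetical stand-in for a power/temperature cost over (V_th, V_dd)."""
    vth, vdd = x
    return (vth - 0.3) ** 2 + (vdd - 0.8) ** 2

if __name__ == "__main__":
    best, value = pso(toy_cost, bounds=[(0.1, 0.6), (0.4, 1.2)])
    print(best, value)
```

In practice the toy cost would be replaced by an objective built from the transistor model or a workload lookup table; a fixed seed is used here only to make runs repeatable.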
The total power dissipation and the temperature are the objective functions optimized through V th -V dd scaling. They can be constructed from the transistor model and modes of operation or from a dynamic workload lookup table. Their values can be measured experimentally or estimated through simulation using different benchmark programs for all possible power-temperature combinations. The objective functions employ few terms, yet they are able to direct the optimization search toward the predefined design specifications.
However, dynamic workloads generate different temperature ranges for different power dissipation values. For Intel Core i7 processors, for example, temperatures in the range 32°C to 72°C are classified as idle, normal, and maximum. This indicates that, for a specific load, there is a huge number of V th -V dd combination sets that could maintain optimal power and temperature; the PSO search space becomes unacceptably broad and the computation time increases dramatically [17] .
To reduce both the computation time and the search space, Pareto dominance principles are used with the PSO algorithm. The Pareto optimal set is used to determine the feasible V th -V dd combinations, which can number in the thousands [18] . Within this set, no objective value can be improved without degrading at least one of the other objective values. The Pareto front of non-dominated solutions is evaluated over the design space of all combination sets; it is a reduced set that limits the search space with new, tighter constraints. The weighting factors are evaluated on the multi-objective Pareto front, with normalized power and temperature as objective functions, and good convergence is obtained.
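The non-dominated filtering step can be sketched as follows. This is a minimal illustration of Pareto dominance for two objectives that are both minimized; the sample (power, temperature) pairs are invented numbers, not simulation results.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # Hypothetical (normalized power, temperature in degC) pairs, one per
    # candidate V_th-V_dd combination.
    candidates = [(1.0, 60), (0.8, 65), (1.2, 55), (1.1, 62), (0.9, 70)]
    print(pareto_front(candidates))
```

This pairwise filter costs O(n²) comparisons, which is acceptable for a few thousand candidates; restricting the swarm to the resulting front is what shrinks the search space.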
Simulation results
All internal blocks of the tuning loop shown in Fig. 1 are designed using the 22 nm Low Power Predictive Technology Model (22 nm-LPTM) for different temperatures and are simulated to generate the optimal body bias/threshold and supply voltages (V th-opt /V BB-opt and V dd-opt ) using the specifications of the Intel Core i7 microprocessor chip. Table I shows the detailed optimal V BB -V dd sets for different temperatures. Table II shows the V BB-opt -V dd-opt sets obtained by simulation and by PSO-PF for all temperature ranges. Fig. 12 shows the optimal power dissipation for the V th-opt -V dd-opt sets over all temperature ranges. The proposed DTSVS loop has been designed and tested in different simulation environments and at different temperatures, categorized into eight possible levels, as shown in the figures and tables. The results show that both power and temperature are strictly limited. The loop thus provides a low-power, low-temperature solution using the combined threshold hopping (body bias) and supply voltage scaling (DTSVS) technique.
The proposed DTSVS loop for power and temperature management was verified with PSO-PF through both theoretical and software evaluation. The two sets of results are valid and show similar trends, but they differ in absolute magnitude because of the different parameters taken into consideration.
Conclusion
The proposed tuning loop addresses one of the major challenges for low-power microprocessors in 22 nm designs. The design introduces the idea of power- and temperature-aware design.
This study has provided an optimal power/temperature solution; the proposed loop has been designed and tested in different simulation environments and at different temperatures.
The Particle Swarm Optimization (PSO) algorithm and the Pareto Front (PF) solution are used to evaluate and confirm the simulated V th -V dd combination sets.
The results confirm a power dissipation saving of 10% to 25% for each body bias voltage step. The corresponding temperature reduction is in the range of 8°C to 12°C for each body bias voltage step.
Therefore, the DTSVS tuning loop is effective in reducing both energy consumption and temperature, and it shows strong promise for optimizing the processor's power and temperature.
