Abstract: Wide-voltage-range circuits have attracted widespread attention, and adaptive voltage scaling (AVS) based on in-situ timing monitoring has become necessary to reduce their design margin. However, the severe PVT variations from the near-threshold to the super-threshold region cause too many critical paths to be monitored. Here an activation-oriented monitoring path selection method is proposed to reduce the number of monitored paths in wide-voltage ICs. The minimum delay value of the longest activated path is found by dynamic timing analysis and set as the selection threshold. Paths whose delay from static timing analysis (STA) exceeds this threshold are selected for monitoring. Applied to a 40 nm AVS System-on-Chip, the method reduces the monitored paths to only 22% of all critical paths, with remarkable power gains over 0.6 V-1.1 V.
Introduction
Recently, on-chip timing monitors and adaptive voltage scaling (AVS) circuits [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] have been proposed to reduce the design-time margin caused by severe PVT (process, voltage, and temperature) variations. They include error detection and correction techniques such as Razor [1, 10], ARM error detection [2], Razor-lite [3], Bubble Razor [4] and an improved monitor [8], as well as error-prediction techniques such as Canary Flip-flop [5], HEPP [6] and replica circuits in reconfigurable devices [9]. These techniques use timing monitors to detect or predict timing errors and then tune the supply voltage accordingly to reduce power. However, wide-voltage-range circuits [7] (from super-threshold down to near-threshold voltage, NTV) bring new challenges. First, the timing variation caused by PVT variations becomes much more severe under NTV, which may create more potential critical paths. In addition, many non-critical paths become critical under NTV, resulting in a large number of paths to be monitored. For example, HEPP works under NTV but monitors 70% of the critical paths [6], which costs considerable power and area. Few studies have addressed this problem. Here a timing monitoring path selection method for wide-voltage ICs is proposed based on real path activation. It uses both static and dynamic timing analysis to find the minimum delay value of the longest activated paths, and monitors the paths whose delay exceeds this value. The method is applied to a 40 nm System-on-Chip (SoC) based on a Cortex-M3 CPU with a supply voltage of 0.6 V-1.1 V. After path selection, the number of inserted monitors over the wide operating range falls to 22% of the total critical paths, much smaller than the 70% of NTV HEPP [6]. Simulation results show power savings of 28.5%-54.3% in the Super-Vt region and 24.5%-73.2% at NTV, depending on the PVT corner, with only 4.7% area overhead.
Monitoring paths selection method
Traditional monitoring path selection depends entirely on static timing analysis (STA) results. In fact, few specific methods exist so far for deciding how many timing paths should be monitored. In processor applications, some or all of the registers in the pipeline stages are usually replaced by monitors, for example a 100% monitor insertion rate for Bubble Razor [4] and 17% for the ARM processor [2] in the Super-Vt region. For NTV, many more paths need to be monitored because the variation of the path delay distribution is much worse, for example 70% for NTV HEPP [6]. However, STA results are based only on circuit topology, so the longest STA paths might never be activated and thus might not be the real critical paths.
Therefore, the activation of critical paths is also considered here. The activated critical paths are found by dynamic simulation using typical working scenarios and a large number of random input vectors. The minimum delay of these activated paths is then selected as the threshold. By monitoring every path whose STA delay is longer than this threshold, all of the most likely effective critical paths are monitored. The number of monitored paths is thus reduced under practical guidance.
The whole procedure consists of the following five steps, the first four of which are shown in Fig. 1. First, select the paths that end at flip-flops and obtain their static path delay distribution by STA under one PVT condition, as shown in Fig. 1(a). Second, simulate the actual dynamic delay of all these paths in every clock cycle with random input vectors; the dynamic delay curve of one path is shown in Fig. 1(b) as an illustration.
Third, obtain the envelope of the delay curves of all N paths, as shown in Fig. 1(c). It represents the maximum dynamic delay of the N timing paths in each clock cycle, and therefore identifies the most critical timing path at each moment. Fourth, combine the dynamic simulation and STA as shown in Fig. 1(d), where the results of the dynamic envelope are mapped back onto the static path delay distribution. The vertical bar in Fig. 1(d) marks the minimum, over all clock cycles of the dynamic simulation, of the delay of the longest path in each cycle. Let i denote the i-th clock cycle, j the j-th path, and $d_{ij}$ the dynamic delay of the j-th path in the i-th clock cycle. We define:

$$\text{critical selection point} = \min_{i} \, \max_{1 \le j \le N} d_{ij}$$

Here the critical selection point represents the most critical path that is activated during circuit operation. Therefore, the paths longer than this one need to be monitored; they form Set B1, the shaded portion in Fig. 1(d). Finally, repeat the above analysis under the other PVT conditions to select the path sets {B1, B2, B3, ..., Bn}. These subsets may differ slightly because of the different PVT variations, so we take their union as the final set B of monitored paths. The longest delay paths are therefore in the selected set B at any time, which ensures the effectiveness of the in-situ monitoring AVS. Although the proposed method may fail in some unfortunate cases if the input vectors of the dynamic simulation are insufficient, the probability of failure is quite small because all paths longer than the threshold are monitored.
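The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' tool flow: the function names, the path names p0 to p4 and all delay values are hypothetical, STA delays are assumed to be available as a mapping from path name to delay, and dynamic delays as one list of per-path delays for each simulated clock cycle.

```python
from typing import Dict, List, Set, Tuple

def dynamic_envelope(delays: List[List[float]]) -> List[float]:
    """delays[i][j] is d_ij; return the longest dynamic delay in each cycle i."""
    return [max(cycle) for cycle in delays]

def selection_threshold(delays: List[List[float]]) -> float:
    """Critical selection point: minimum over cycles of the per-cycle maximum."""
    return min(dynamic_envelope(delays))

def select_paths(sta_delay: Dict[str, float], threshold: float) -> Set[str]:
    """Paths whose STA delay is longer than the threshold (strict, as in the text)."""
    return {name for name, d in sta_delay.items() if d > threshold}

def select_across_corners(
    corners: List[Tuple[Dict[str, float], List[List[float]]]]
) -> Set[str]:
    """Union of the per-corner sets B1..Bn gives the final monitored set B."""
    final: Set[str] = set()
    for sta_delay, delays in corners:
        final |= select_paths(sta_delay, selection_threshold(delays))
    return final

# Hypothetical toy data for two PVT corners (delays in ns, columns p0..p4).
sta_a = {"p0": 2.5, "p1": 2.4, "p2": 2.2, "p3": 1.9, "p4": 1.5}
delays_a = [
    [2.3, 1.0, 2.0, 1.5, 1.2],
    [1.2, 2.35, 1.1, 1.8, 1.4],
    [0.9, 1.4, 2.1, 1.7, 1.0],
]
sta_b = {"p0": 3.0, "p1": 3.2, "p2": 2.6, "p3": 2.9, "p4": 2.0}
delays_b = [
    [2.8, 2.0, 2.1, 2.7, 1.8],
    [2.2, 3.0, 1.9, 2.85, 1.9],
]

final_set = select_across_corners([(sta_a, delays_a), (sta_b, delays_b)])
print(sorted(final_set))
```

In this toy case the first corner selects p0, p1 and p2, the second selects p0, p1 and p3, and p4 is never selected. A more conservative variant could use a non-strict comparison so that a path whose STA delay equals the threshold is also kept.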
This method is applied to an AVS SoC consisting of on-chip timing prediction monitors inserted in a Cortex-M3 CPU and peripheral modules. It is designed in the SMIC 40 nm CMOS process with a sign-off frequency of 250 MHz in super-Vt and 20 MHz in NTV. To obtain precise path delays with RC parameters, the physical design of the circuit is finished first so that an SPEF file can be extracted to locate the paths accurately. A dedicated ECO (Engineering Change Order) step is then used to insert the timing monitors into the circuit. The inserted monitors and the AVFS module together result in only 4.7% area overhead.
Simulation results

Monitoring paths optimization results
First, path selection based on STA and dynamic timing analysis is performed according to Section 2. The delay envelope of all paths is obtained from many simulation runs with random inputs under the SS corner, 1.1 V and 125°C, as shown in Fig. 2(a). The minimum of the envelope over all clock cycles is 2.28 ns. The paths whose delay is longer than this threshold are then selected for monitoring, as shown in Fig. 2(b); these are the paths to the right of 2.28 ns (Set B in the figure), 462 paths in total. Repeating this selection procedure under the other PVT corners, including NTV, as shown in Table I, and taking the union of all these sets gives a final set of 708 paths out of 1917 critical paths in total.
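As a quick check of what this first selection pass achieves, the monitored fraction follows directly from the two counts quoted above (no other data is assumed):

```latex
\[
\frac{708}{1917} \approx 0.369 \quad\Longrightarrow\quad \text{about } 36.9\% \text{ of the critical paths are monitored.}
\]
```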
It should be noted that the threshold value is critical: a small change in it may cause a large difference in the number of paths selected from the STA distribution. This sensitivity can also be exploited to reduce the number of selected paths more aggressively. The dynamic delay envelope has only a few minimum-delay points. Therefore, instead of setting the threshold to the smallest envelope value, we can choose the second or third smallest value and, at the same time, add the specific paths corresponding to the first, second and/or third smallest values to the final monitored set. The number of monitored paths is thereby reduced further, while the activated longer paths are still monitored. For example, the second smallest value in Fig. 2(a) is 2.30 ns, and the third smallest is 2.34 ns, which occurs four times. Choosing 2.34 ns as the threshold and adding the paths corresponding to 2.28 ns, 2.30 ns and 2.34 ns to the monitoring set, we obtain a smaller set C in Fig. 2(c) with 345 paths for the SS corner, 1.1 V and 125°C. The numbers of selected paths under the other corners also shrink considerably. Finally, the monitored paths are reduced to 422, that is 22% of the total critical paths, over the wide voltage range and all PVT corners.
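The relaxed threshold can be illustrated with a short Python sketch. It is a hedged reading of the description above rather than the authors' implementation: the function name `relaxed_selection`, the parameter `k`, the path names and all delay values except the three envelope values 2.28, 2.30 and 2.34 ns are hypothetical. The sketch assumes that the path attaining the envelope in each cycle is known from the dynamic simulation.

```python
from typing import Dict, List, Set, Tuple

def relaxed_selection(
    sta_delay: Dict[str, float],
    path_names: List[str],
    delays: List[List[float]],
    k: int = 3,
) -> Tuple[float, Set[str]]:
    """Use the k-th smallest distinct envelope value as threshold and add back
    the paths that attain the envelope at or below that threshold."""
    # Envelope per cycle together with the name of the path that attains it.
    envelope = []
    for cycle in delays:
        j = max(range(len(cycle)), key=cycle.__getitem__)
        envelope.append((cycle[j], path_names[j]))
    distinct = sorted({d for d, _ in envelope})
    threshold = distinct[min(k, len(distinct)) - 1]
    # Activated critical paths that a higher threshold might otherwise drop.
    rescued = {name for d, name in envelope if d <= threshold}
    selected = {name for name, d in sta_delay.items() if d > threshold}
    return threshold, selected | rescued

# Hypothetical toy data (ns); columns of `delays` follow `names`.
names = ["p0", "p1", "p2", "p3", "p4", "p5", "p6"]
sta = {"p0": 2.60, "p1": 2.50, "p2": 2.36, "p3": 2.32,
       "p4": 2.29, "p5": 2.31, "p6": 2.33}
delays = [
    [2.00, 2.10, 2.05, 2.20, 2.28, 2.15, 2.10],
    [2.10, 2.00, 2.20, 2.30, 2.25, 2.28, 2.22],
    [2.20, 2.25, 2.34, 2.10, 2.00, 2.30, 2.31],
    [2.30, 2.40, 2.10, 2.00, 2.10, 2.20, 2.25],
    [2.10, 2.20, 2.34, 2.25, 2.20, 2.10, 2.30],
]

thr1, set1 = relaxed_selection(sta, names, delays, k=1)
thr3, set3 = relaxed_selection(sta, names, delays, k=3)
print(thr1, len(set1))
print(thr3, sorted(set3))
```

With `k=1` the sketch reduces to the basic method and keeps all seven toy paths. With `k=3` the paths p5 and p6, which are long in STA but never attain the envelope, are dropped, while p3 and p4 remain because they were the activated critical paths in some cycle.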
Adaptive voltage scaling verification
First, the chip functions are verified with VCS under all corners. Then HSIM-VCS co-simulation is used to verify the voltage tuning effect, where the critical-path part is simulated by HSIM and the remaining part by VCS. The supply is modeled in C according to the parameters of a real DC/DC chip with an output range of 0.6 V-1.2 V and a tuning step of 20 mV. The AVS tuning result in Super-Vt is shown in Fig. 3 for the TT corner at 25°C, where Freq_Slow is the half-frequency division signal, Clk_slow is the system clock controlled by Freq_Slow, Pre_Error is the timing warning signal, Vout is the operating voltage, Volt_ctrl[0] is the voltage decrease signal and Volt_ctrl[1] is the voltage increase signal used for voltage control. The working frequency is set to 250 MHz with an initial voltage of 1.1 V. At this point there is timing margin, so the voltage is decreased step by step (20 mV/step) until 0.84 V, where the first timing warning appears; the frequency is then halved in the next clock cycle to prevent real timing errors. When three consecutive timing warnings occur, the voltage is increased. Finally, the voltage stabilizes around 0.84 V. During voltage regulation, the working frequency (Clk_slow) stays halved to avoid real errors caused by fast variation. The NTV AVS tuning is similar and is not shown here because of limited space.
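The tuning behaviour described above can be approximated by a small behavioural model in Python. This is a simplified sketch under several assumptions, not the verified controller: the names `AvsState`, `avs_step` and `simulate` are hypothetical, the timing monitor is modelled as a fixed voltage threshold that is independent of the clock frequency, and Freq_Slow is simply cleared on any warning-free step because the text does not specify when it is released. Only the 20 mV step, the 0.6 V-1.2 V range, the 1.1 V start and the three-warning rule come from the text.

```python
from dataclasses import dataclass
from typing import List

V_STEP_MV = 20                   # tuning step of the DC/DC model
V_MIN_MV, V_MAX_MV = 600, 1200   # output range of the DC/DC model
WARNINGS_TO_RAISE = 3            # consecutive warnings before raising the voltage

@dataclass(frozen=True)
class AvsState:
    vout_mv: int = 1100          # initial supply voltage
    warn_count: int = 0          # consecutive Pre_Error warnings seen so far
    freq_slow: bool = False      # True while the clock is halved

def avs_step(state: AvsState, pre_error: bool) -> AvsState:
    """One control step: lower the voltage while there is margin, halve the
    clock on a warning, raise the voltage after three consecutive warnings."""
    if pre_error:
        warn_count = state.warn_count + 1
        vout = state.vout_mv
        if warn_count >= WARNINGS_TO_RAISE:
            vout = min(V_MAX_MV, vout + V_STEP_MV)   # acts like Volt_ctrl[1]
            warn_count = 0
        return AvsState(vout, warn_count, True)
    # No warning: timing margin remains, so step down (acts like Volt_ctrl[0]).
    return AvsState(max(V_MIN_MV, state.vout_mv - V_STEP_MV), 0, False)

def simulate(warn_at_or_below_mv: int = 840, steps: int = 40) -> List[int]:
    """Assumed monitor model: Pre_Error fires whenever Vout <= the given level."""
    state = AvsState()
    trace = []
    for _ in range(steps):
        pre_error = state.vout_mv <= warn_at_or_below_mv
        state = avs_step(state, pre_error)
        trace.append(state.vout_mv)
    return trace

trace = simulate()
print(trace)
```

Under this assumed monitor model the voltage steps down from 1.1 V to 0.84 V and then alternates between 0.84 V and 0.86 V, which is one way to read "stable around 0.84 V". Integer millivolts are used to avoid floating-point drift in the repeated 20 mV steps.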
The total circuit power and the AVS power gains under different corners are shown in Fig. 4(a) and (b) for Super-Vt and NTV, respectively. Each working frequency is the sign-off frequency, and the worst-case power consumptions at high and low voltage are taken as the baselines. For Super-Vt, the baseline is the SS corner, 0.99 V and 125°C, with a power consumption of 12.46 mW. Our AVS obtains considerable power gains, between 34.3% (SS, 125°C) and 55.2% (FF, −25°C). For the NTV region, it saves between 24.5% (SS, −25°C) and 72.8% (FF, 125°C) of the power consumption compared with the baseline (SS corner, 0.54 V, −25°C).
Table II compares our method with three representative adaptive techniques. ARM [2] and Razor-lite [3] work in the Super-Vt region; their monitor insertion rate is low there (17% or 20%), but it might increase dramatically under NTV. HEPP [6], in contrast, is designed only for NTV and needs to monitor up to 70% of the paths.
Conclusions
In this paper, an in-situ timing monitoring path selection method for wide-voltage-range circuits is proposed. It is oriented to path activation: it uses both static and dynamic timing analysis to find the minimum delay value of the longest activated paths, and monitors the paths with longer delay. Since not all input combinations can be applied, we cannot ensure that every possible activated critical path is included. However, this does not affect the monitoring effect, because hundreds of the longest STA paths are monitored, and the probability that none of them is activated during circuit run time is small enough to be neglected. In other words, the accuracy of the proposed selection is not a problem.
In conclusion, our method reduces the number of monitoring units through proper path selection, which reduces the additional power and area overhead caused by in-situ monitoring while still achieving remarkable power gains under both Super-Vt and NTV supply voltages.
Acknowledgments
Supported by National Natural Science Foundation of China (61574033), National 863 project (2015AA016601) and Open Research funding of State Key Laboratory of ASIC and System, Fudan University (2015KF010). 
