ABSTRACT In recent years, wide-voltage-operating-range circuits have drawn great attention because their performance and energy efficiency can be adapted to the demands of various applications. Such a circuit can either obtain the best possible energy efficiency at low voltage or achieve high performance at nominal voltage. A big challenge is the severe Process, Voltage, and Temperature (PVT) variation in nanometer processes. When working in the near-threshold region, this variation may cause timing failures and prevent the circuit from achieving the energy efficiency that is otherwise possible. In this paper, an Adaptive Voltage Scaling (AVS) method based on an in-situ timing monitor with a tunable detection window is proposed. It resolves the above problem by monitoring the timing of critical paths and adjusting the supply voltage adaptively. It is applied to a system-on-chip circuit consisting of a CPU, ESRAM, an AES cryptographic circuit, and peripherals. Fabricated using the SMIC 40nm CMOS process, it can work at 0.6 to 1.1 V with remarkable power savings. Simulation results show that in the superthreshold voltage region, the supply voltage can be reduced from 1.1 to 0.86 V, enabling a maximum of 50% power saving at the FF corner, −25 °C, as compared to a conventional non-AVS design. In the near-threshold region, the supply voltage is reduced to 0.48 V, with a power saving of up to 70% at the FF corner, 125 °C, as compared to a non-AVS design.
I. Introduction
With technology scaling and increasing chip integration, power consumption has become an important issue for IC design. In recent years, wide-operating-voltage-range circuits have drawn great attention from industry and academia [1]. They expand the chip operating voltage from the super-threshold region down to the near-threshold or even the subthreshold region. Enabling a circuit to operate over a wide voltage range helps it achieve the best possible energy efficiency while satisfying the varying performance demands of applications. That is, the chip can either work in the super-threshold region to achieve high performance, or work in the near-threshold region for low performance but high energy efficiency. However, a significant challenge remains unsolved. More severe PVT (Process, Voltage and Temperature) variations in the near-threshold region require larger timing margins in digital circuit design [2], [3], which leads to a large waste of power in the near-threshold voltage region.
Many adaptive circuit techniques have been proposed to reduce the excess timing margin, such as in-situ timing error monitoring, Adaptive Voltage Scaling (AVS), adaptive clock distribution and clock stretching [4]-[24]. Among these, the AVS technique based on in-situ timing monitors adjusts the supply voltage according to the real-time timing of the critical paths, which changes with PVT variations. Therefore, the chip's power consumption can be reduced substantially by trimming the excess timing margin reserved at design time.
Generally, there are two kinds of timing monitors. One is based on timing error detection and correction (EDAC). The other is based on timing error prediction. Razor [4] is a classic EDAC monitor that controls the supply voltage by detecting the chip's working status in real time. It is inserted at the endpoints of the chip's critical paths in place of the regular D flip-flops (DFFs). However, EDAC methods of this kind need either clock gating or an architecture-level error recovery mechanism to recover from the timing errors incurred, which is complex and loses throughput during the correction period. The other kind of in-situ timing monitor is based on timing violation prediction: it generates an alarm signal before a real timing violation happens. Its advantage is that it does not need the complex error correction logic.
Currently, most AVS circuits are designed for the super-threshold region [4]-[13]. Some are designed for ultra-low-voltage/near-threshold operation [14]-[16], but few address wide-voltage-range circuits (from near-threshold to super-threshold). If the ultra-low-voltage solutions are used in wide-voltage-range circuits, a large amount of timing margin or power is wasted when working in the super-threshold region. Therefore, existing in-situ monitors are rarely suitable for wide-voltage-range applications. To this end, an in-situ timing monitor suitable for wide-voltage-range operation and its AVS control system are proposed in this paper.
There are two main contributions in this paper:
1) An in-situ timing monitor with tunable delay chain length, suitable for wide-voltage-range operation, is proposed;
2) Its corresponding adaptive frequency and voltage control system for wide-voltage-range operation is studied, including an optimal AVS control module with a properly designed timing error history register and a state machine.
The proposed wide-voltage-range AVS technology is applied to a microprocessor-based System-on-Chip (SoC) in the SMIC 40nm CMOS process. Its operating voltage range is from 0.6V to 1.1V. HSIM-VCS co-simulation results show that, in the super-threshold region, the proposed AVS technology enables a maximum of 50% power saving compared with a conventional design working at a fixed 1.1V supply voltage. In the near-threshold region, the power saving of this design can be up to 70% compared with operation at a constant 0.6V supply voltage. In addition, the total area cost of the AVS technology is 6.2%, which is relatively small compared with other AVS systems based on in-situ monitoring.
This paper is organized as follows. Section II describes the problems of wide-voltage-range timing monitoring. Section III details the circuit design of the timing monitor. Section IV presents the design of the AVS control system. Section V shows the system design details and AVS verification results. Section VI concludes this paper.
II. Problems of Wide-Voltage-Range Timing Monitoring
Here a prediction-type timing monitor is adopted, because it generates an alarm signal when a timing path becomes critical, before a real timing violation occurs. Therefore, it needs no error correction mechanism. Generally, a well-designed timing monitor has three key features. First, the timing of the circuit should be effectively monitored, which is the basic function of a timing monitor. Second, the timing impact on the original design should be as small as possible. Timing monitors must be inserted into the design, which requires some modification of the original design and influences the timing paths into which they are inserted. Last, the area cost of the AVS should be as small as possible.
However, wide-voltage-range operation makes it difficult to meet all of these requirements. Severe PVT variations make the timing distribution more scattered in the near-threshold region than in the super-threshold region. Here we simulate the delay of a typical path using a 10,000-run Monte Carlo simulation. As shown in Fig. 1, the delay distribution in the super-threshold region (the left one) resembles a symmetric Gaussian distribution with µ = 0.43 ns and 3σ/µ = 20%. The delay distribution in the near-threshold region (the right one) is more scattered and has a long tail, with µ = 3.50 ns and 3σ/µ = 82% (more than 4 times larger). Therefore, much more timing margin needs to be reserved in near-threshold digital IC design. However, since the worst case rarely happens, the conventional IC design method always reserves too much margin, which wastes a great deal of power.
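The variation metric quoted above can be made concrete with a short calculation. The sketch below is illustrative only: the helper names `three_sigma_over_mu` and `mu_plus_3sigma` are hypothetical, and the µ + 3σ point is only a rough indicator for the long-tailed near-threshold distribution, which is not Gaussian.

```python
import statistics

def three_sigma_over_mu(delays_ns):
    """Return 3*sigma/mu for a list of Monte Carlo path-delay samples."""
    mu = statistics.mean(delays_ns)
    sigma = statistics.pstdev(delays_ns)  # population standard deviation
    return 3.0 * sigma / mu

def mu_plus_3sigma(mu_ns, ratio):
    """Delay at the mu + 3*sigma point, given mu and the 3*sigma/mu ratio."""
    return mu_ns * (1.0 + ratio)

# Values reported in the text for the two operating regions.
super_threshold_ns = mu_plus_3sigma(0.43, 0.20)  # about 0.516 ns
near_threshold_ns = mu_plus_3sigma(3.50, 0.82)   # about 6.37 ns

# Growth of the relative spread between the two regions.
spread_growth = 0.82 / 0.20                      # about 4.1
```

With these numbers, a design that budgets for the µ + 3σ delay reserves roughly 20% of the mean delay in the super-threshold region but about 82% in the near-threshold region.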
Many AVS techniques based on in-situ timing monitors have been proposed to solve the timing margin problem, but most of them focus on either super-threshold or ultra-low-voltage/near-threshold operation [4]-[16]; few address wide-voltage-range circuits from near-threshold to super-threshold. For wide-voltage-range applications, the timing detection window is critical. A timing window that is too small causes missed predictions: real timing errors sometimes occur without any pre-error signal being generated. On the other hand, a timing window that is too large directly reduces the design margin that can be eliminated. It also increases the area and power consumption of the timing monitor. There are two basic principles for designing T_W (the timing window): (1) T_W should be large enough to ensure that the monitor functions correctly; (2) at the same time, T_W should be as small as possible to reduce the timing margin. Therefore, designing a timing monitor with a suitable timing detection window for wide-voltage-range applications is the first key point of this paper. The other key point is the design of the AVS control module, which also directly influences the circuit area and the final power gains.
III. In-Situ Timing Monitor Circuit Design
Here a double-sampling timing prediction monitor based on a shadow latch is used. It consists of a regular flip-flop, a shadow latch, an XOR gate and some delay cells, as depicted in Fig. 2 (a). The shadow flip-flop of the traditional Canary Flip-Flop is replaced with a negative level-sensitive latch, which plays the same role of monitoring the critical path but at a smaller area cost than the Canary Flip-Flop. We improve its structure for wide-voltage-range applications by adding a tunable delay chain to form a tunable timing detection window, as shown in Fig. 2 (b).
The timing diagram of the latch-based timing monitor is illustrated in Fig. 3. During the 1st clock period, the data Din arrives just before the setup time of the flip-flop, which means the timing is critical but not violated. Since the delayed input of the shadow latch differs from the input of the main DFF, the XOR gate drives the Pre-error signal high as an alarm. At this time, the output data Q_F is still correct. An unwanted pre-error signal (called a pseudo pre-error here) may also appear, as seen in the second half of the 2nd clock period. Because the negative level-sensitive shadow latch is transparent during the low phase of the clock, while the main flip-flop is positive-edge triggered, the XOR gate with Q_F and Q_LATCH as its two inputs produces a pseudo pre-error signal. The pseudo pre-error signal can be filtered out later, at the stage where the Pre_error signals are summed. At this stage, the Pre_error_sum signal is sampled by a flip-flop at the negative clock edge (the dotted line at the bottom of Fig. 3). Thus this kind of pseudo pre-error can be ignored.
The AVS detection window is the timing range within which potential timing errors can be detected. If data arrives in this range, it is regarded as a potential timing error, so the timing monitor generates a Pre-error alarm signal. Therefore, the length of the detection window is critical for wide-voltage-range applications.
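The role of the detection window can be illustrated with a small behavioral model. This is a simplified sketch, not the circuit itself: the function name, the explicit setup-time parameter and the numeric values are all assumptions made for illustration, and the real monitor derives its window from the delay cells in front of the shadow latch.

```python
def classify_arrival(arrival_ns, clock_edge_ns, setup_ns, window_ns):
    """Classify a data arrival time relative to the capturing clock edge.

    The detection window is modeled as the interval of length window_ns
    that ends at the setup deadline of the main flip-flop.
    """
    deadline_ns = clock_edge_ns - setup_ns
    if arrival_ns > deadline_ns:
        return "timing_error"   # data is too late for the main flip-flop
    if arrival_ns > deadline_ns - window_ns:
        return "pre_error"      # data is on time but inside the window
    return "safe"               # data arrives before the window opens

# Hypothetical numbers: 4.0 ns clock edge, 0.1 ns setup, 0.5 ns window.
early = classify_arrival(3.0, 4.0, 0.1, 0.5)      # "safe"
critical = classify_arrival(3.6, 4.0, 0.1, 0.5)   # "pre_error"
late = classify_arrival(3.95, 4.0, 0.1, 0.5)      # "timing_error"
```

In this model, a window that is too short lets a path jump from "safe" to "timing_error" in a single cycle when its delay shifts by more than the window length, which corresponds to a missed prediction.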
A detection window that is too small will probably cause missed predictions, meaning that no pre-error signals are generated when some real timing errors occur. Conversely, a detection window that is too large directly reduces the power gains achievable by AVS control. For wide-voltage-range ICs, PVT variations are more severe in the low-voltage region, so a larger detection window is needed in the near-threshold region than in the super-threshold region. Therefore, conventional monitors with a fixed delay length are not suitable for wide-voltage-range applications. The variation of the detection window length with VDD under different process corners and temperatures was simulated and is shown in Fig. 4. It can be seen that the T_W length varies considerably, not only with VDD, but also with process corner and temperature.
In order to cope with different detection window requirements for wide-voltage-range, a tunable delay chain is proposed here for the delay cell design in the timing monitor, as shown in Fig. 2 (b) .
It consists of a number of buffers and a multiplexer (MUX). Its delay can thus be configured by choosing the number of buffers used in the delay chain. When working in the near-threshold region, the required detection window is the largest, so the longest delay chain is needed. When working in the 1.1V super-threshold region, the shortest delay chain is required. Furthermore, the configuration can also be calibrated post-silicon. It can be implemented using a look-up table, obtained by simulating or measuring the relationship between the required detection window and the delay of the delay chain under different voltages and process corners.
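The look-up-table configuration described above might be sketched as follows. The table entries are hypothetical placeholders, not values from this work; in practice they would come from simulation or post-silicon measurement, and the table could also be indexed by process corner and temperature.

```python
# Hypothetical look-up table: (upper VDD bound in mV, buffers selected by the MUX).
# Lower supply voltages select a longer delay chain (a larger detection window).
DELAY_CHAIN_LUT = [
    (700, 7),   # near-threshold operation: longest chain (illustrative)
    (900, 4),   # intermediate voltages (illustrative)
    (1100, 1),  # 1.1V super-threshold operation: shortest chain (illustrative)
]

def select_buffer_count(vdd_mv):
    """Return the number of delay buffers to select for a given supply voltage."""
    for upper_bound_mv, buffer_count in DELAY_CHAIN_LUT:
        if vdd_mv <= upper_bound_mv:
            return buffer_count
    raise ValueError(f"VDD of {vdd_mv} mV is above the supported range")
```

Calling `select_buffer_count(600)` returns the longest chain in this table, and `select_buffer_count(1100)` returns the shortest, matching the trend described in the text.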
IV. AVS Control System
The AVS control system is illustrated in Fig. 5. A series of timing monitors is inserted into selected critical paths to replace the original DFFs in the main SoC, as shown in the lower left part of Fig. 5. In addition, a well-designed AVS control system is needed on the chip to process the Pre-error signals generated by the monitors inserted in the critical paths. It generates a voltage-down signal to decrease VDD when no timing error has been predicted for N consecutive cycles. It halves the frequency immediately when a timing error is predicted (the Pre_error signal is pulled high), to avoid subsequent real timing errors. When there are three consecutive predicted timing errors (viewed as a quite critical situation), it increases the VDD supply voltage step by step. The voltage control signal is forwarded to a PMC (Power Mode Controller), which is connected through an I2C interface to an external DC/DC converter for voltage regulation.
A. THE AVS CONTROL SYSTEM FRAMEWORK
The AVS control system shown in Fig. 5 is mainly composed of three parts: the OR-Tree and PHR block, a state machine, and a control module comprising a frequency control module and a voltage control module. The OR-Tree and PHR block collects the warning signals from all monitors through an OR-tree and then samples the OR-ed signal, "Pre_error_sum". In addition, it contains a PHR (Pattern History Register) and a "Pre_error_sum" signal counter. The sampled "Pre_error_sum" signal is sent to the state machine, which is implemented as a Finite State Machine (FSM). The state machine changes its working state according to the "Pre_error_sum" signal. The control module controls the frequency tuning and voltage regulation.
B. PRE-ERROR SIGNALS SUM AND PATTERN HISTORY REGISTER
Because many critical paths need to be monitored, the generated Pre_error signals need to be summed and counted. As shown in Fig. 6, the Pre_error signal processing module is composed of three parts: an OR-tree with a falling-edge sampling flip-flop, a PHR (Pattern History Register) and a Pre-error signal counter.
The falling-edge sampled OR-tree gathers the Pre_error signals from all monitors through a series of OR gates to generate a total error signal, "Pre_error_sum". A flip-flop with falling-edge sampling is used here to filter out the pseudo Pre_error signals, as stated in Section III.
The delay of the OR-tree needs to be sufficiently small so that the clock frequency can be changed before the next cycle begins. Therefore, the depth of the OR-tree is reduced by inserting monitor cells only in the most critical paths, selected according to our previous work. A balanced OR-tree design is used to keep its delay acceptable.
Furthermore, we design a PHR (Pattern History Register) to record the history of the Pre_error_sum signal. 
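A behavioral sketch of the OR-tree reduction and the PHR follows. The class and function names are hypothetical, and the 4-bit depth is an assumption taken from the `&PHR[3:0]` notation used in the state machine description; the falling-edge sampling is abstracted into one call per clock cycle.

```python
def pre_error_sum(pre_errors):
    """OR-reduce the Pre_error outputs of all monitors into one signal."""
    return any(pre_errors)

class PatternHistoryRegister:
    """Shift register holding the most recent Pre_error_sum samples."""

    def __init__(self, depth=4):  # depth of 4 is an assumption
        self.depth = depth
        self.mask = (1 << depth) - 1
        self.bits = 0

    def shift_in(self, sample):
        """Shift one sampled Pre_error_sum value into the register."""
        self.bits = ((self.bits << 1) | int(bool(sample))) & self.mask

    def all_set(self):
        """AND-reduction of the register, i.e. every stored sample was high."""
        return self.bits == self.mask
```

For example, after four consecutive cycles in which at least one monitor raises Pre_error, `all_set()` returns True; a single quiet cycle clears that condition.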
C. AVS CONTROL STATE MACHINE
The AVS control state machine is presented in Fig. 7. It has five states, S0 to S4, and its output signals depend only on the current state.
1) S0: NORMAL STATE
The Normal state means that the chip is working with no timing errors, so its working frequency and supply voltage do not need to be adjusted. When a Pre_error signal appears, the working frequency needs to be decreased immediately to avoid subsequent real timing errors, and the state changes to "Temp_Slow". If there are no Pre_error signals for N consecutive cycles, the state machine changes to the "Voltage_Down" state, indicating that the voltage can be decreased for lower power consumption.
2) S1: TEMP_SLOW STATE
Temp_Slow is a temporary state, meaning that the chip needs to decrease the frequency at this time. It lasts for only one clock cycle, and then jumps to either the "Normal" state or the "Voltage_Up" state. If Pre_error signals persist for M consecutive cycles, the output of "&PHR[3:0]" goes high and the FSM jumps to the "Voltage_Up" state. Otherwise it jumps to the Normal state. The "Temp_Slow" state is designed to filter out very fast false warnings that last less than a cycle, e.g., clock jitter or other very fast variations.
3) S2: VOLTAGE_UP STATE
Voltage_Up is also a temporary state, which means that the chip needs to increase the supply voltage due to potential timing violations. In this state, the volt_up signal is pulled high to instruct the external voltage regulator to increase the supply voltage. Because the external DC/DC converter needs a certain time to finish one round of voltage regulation, once the FSM enters this state, it switches to the "Wait" state in the next cycle to wait for the voltage regulation to complete.
4) S3: WAIT STATE
Wait state indicates that it is required to wait for the voltage regulation to finish. When it is finished, the feedback signal volt_done of the voltage control interface module is pulled high, making the state switch back to the Normal state.
5) S4: VOLTAGE_DOWN STATE
The Voltage_Down state tells the voltage control module to reduce the supply voltage. When the AVS system has seen no Pre_error for a long time (N consecutive clock cycles), the circuit still has a certain timing margin. Therefore the state switches to "Voltage_Down" to reduce the supply voltage. This state lasts only one clock cycle, and then switches back to the Normal state.
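The five states and their transitions can be summarized in a small behavioral model. This is a sketch under stated assumptions rather than the implemented RTL: the class name `AvsFsm`, the default value of N, the `phr_all_set` input (standing for the `&PHR[3:0]` condition) and the exact mapping of states to the slow signal are illustrative choices.

```python
from enum import Enum

class State(Enum):
    NORMAL = 0        # S0
    TEMP_SLOW = 1     # S1
    VOLTAGE_UP = 2    # S2
    WAIT = 3          # S3
    VOLTAGE_DOWN = 4  # S4

class AvsFsm:
    def __init__(self, quiet_cycles_n=8):  # value of N is an assumption
        self.state = State.NORMAL
        self.n = quiet_cycles_n
        self.quiet = 0  # consecutive cycles without Pre_error in NORMAL

    def step(self, pre_error, phr_all_set=False, volt_done=False):
        """Advance one clock cycle and return the new state."""
        if self.state is State.NORMAL:
            if pre_error:
                self.quiet = 0
                self.state = State.TEMP_SLOW
            else:
                self.quiet += 1
                if self.quiet >= self.n:
                    self.quiet = 0
                    self.state = State.VOLTAGE_DOWN
        elif self.state is State.TEMP_SLOW:
            # One-cycle state: escalate only if the history is all ones.
            self.state = State.VOLTAGE_UP if phr_all_set else State.NORMAL
        elif self.state is State.VOLTAGE_UP:
            self.state = State.WAIT
        elif self.state is State.WAIT:
            if volt_done:
                self.state = State.NORMAL
        else:  # VOLTAGE_DOWN lasts one cycle
            self.state = State.NORMAL
        return self.state

    def outputs(self):
        """Moore outputs: they depend only on the current state."""
        slow_states = (State.TEMP_SLOW, State.VOLTAGE_UP, State.WAIT)
        return {
            "slow": self.state in slow_states,  # assumed mapping
            "volt_up": self.state is State.VOLTAGE_UP,
            "volt_down": self.state is State.VOLTAGE_DOWN,
        }
```

Driving `step` once per clock cycle with the sampled Pre_error_sum signal reproduces the sequence described above: quiet periods lead to a one-cycle Voltage_Down pulse, while persistent warnings lead through Temp_Slow and Voltage_Up into Wait until volt_done arrives.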
V. System Design And AVS Verification

A. SYSTEM PLATFORM ARCHITECTURE
Our proposed wide-voltage AVS method is applied to an SoC in the SMIC 40nm CMOS process. It is mainly composed of a 32-bit CPU, an AES (Advanced Encryption Standard) cryptographic circuit, APB and AHB buses, a UART interface and an AVS control module, as shown in the circuit architecture of Fig. 8.
A near-threshold library is first built for synthesizing the design at 0.6V, by re-characterizing the selected standard cell library. The circuit design details are shown in Table 1. The supply voltage of the chip ranges from 0.6V to 1.1V. Its signoff frequencies are 250 MHz at 1.1V and 50 MHz at 0.6V. In total, 560 critical paths have been selected for insertion of the error monitor circuits. The AVS system adds a total area overhead of 6.2%.
B. AVS FUNCTION AND EFFECTIVENESS VERIFICATION
First, the function of the chip is verified using conventional SoC verification methods, omitted here since it is not the key point of this paper. We then verify whether the AVS system works well with AVS function enabled. By doing so, the minimum working voltage and power savings due to AVS tuning are obtained to represent the effect of our AVS scheme.
After floor-planning and place & route are completed, the chip layout is obtained. Post-layout simulation, including the fully expanded clock tree and interconnect parasitics, then provides a practical design-time evaluation. However, traditional digital simulation tools (such as VCS from Synopsys) cannot simulate a changing supply voltage. On the other hand, SPICE simulation is too slow for large-scale digital circuits. Therefore, we use HSIM-VCS co-simulation to verify the voltage tuning effect, which offers both acceptable accuracy and acceptable simulation time. The critical paths are simulated by HSIM with high precision. The remaining part is simulated by VCS with back-annotated SDF parameters representing the real path delays after layout. The supply voltage is modeled in C according to the parameters of a real DC/DC chip, with an output range of 0.4V-1.1V and a tuning step of 20mV. Since all the critical paths are simulated by HSIM, the timing information of the critical paths at different VDDs can be obtained. As for the non-critical parts, they are simulated by VCS without changing VDD, under the reasonable assumption that they will not cause timing errors.
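The stepping behavior of such a supply model can be sketched in a few lines. The sketch below is written in Python for illustration (the model in this work is in C) and only captures the stated output range and step size; the function names are hypothetical and the regulator's settling time is not modeled.

```python
V_MIN_MV = 400    # lower bound of the DC/DC output range (0.4V)
V_MAX_MV = 1100   # upper bound of the DC/DC output range (1.1V)
STEP_MV = 20      # tuning step (20mV)

def next_vdd_mv(vdd_mv, volt_up=False, volt_down=False):
    """Apply one regulation request and clamp the result to the output range."""
    if volt_up and not volt_down:
        vdd_mv += STEP_MV
    elif volt_down and not volt_up:
        vdd_mv -= STEP_MV
    return max(V_MIN_MV, min(V_MAX_MV, vdd_mv))

def steps_down(start_mv, target_mv):
    """Count the Volt_down requests needed to go from start_mv to target_mv."""
    count = 0
    vdd = start_mv
    while vdd > target_mv:
        vdd = next_vdd_mv(vdd, volt_down=True)
        count += 1
    return count
```

Millivolt integers are used so that repeated 20mV steps do not accumulate floating-point error. With this step size, moving from 1.1V to 0.86V takes 12 steps and moving from 0.6V to 0.48V takes 6.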
In the super-threshold voltage region, the main purpose of our AVS system is to reduce the timing margin to obtain more power savings. Therefore, the function is verified under different PVT cases, where the voltage variation is represented by a 100 MHz random noise added to the supply voltage. At the same time, the AVS effect is obtained, expressed as the maximum power savings under different PVT conditions. In the near-threshold voltage region, the main function of the AVS system is to tolerate the severe PVT variations. Therefore, a fast IR drop is added to the supply voltage in addition to the voltage noise, to verify the AVS functionality and effect.
Fig. 9 shows the adaptive adjustment process of the AVS system and its effect at an initial voltage of 1.1V, TT process, 25 °C, with an initial frequency of 250MHz. Here the Clk_slow signal is the system clock supplied to the core circuits. It may be divided by two when a Pre_error occurs. Pre_error is the summed warning signal of all timing monitors, produced by the OR-tree module. The Slow signal triggers frequency division when Pre_error is high. At this time, the frequency needs to be decreased immediately to avoid subsequent real timing errors. Volt_Done is the completion signal of the DC/DC voltage adjustment. Volt_up and Volt_down are the voltage control signals that direct the DC/DC module to increase or decrease the output voltage, respectively.
1) AVS VERIFICATION EFFECT IN SUPER-THRESHOLD VOLTAGE REGION
As shown in Fig. 9, in the initial state the circuit timing is quite loose, so the AVS control module gradually decreases the supply voltage. Once the supply voltage has been reduced to 0.86V, the timing becomes tight and an early warning signal starts to appear occasionally. The Clk frequency is then divided accordingly. After a few clock cycles, the supply voltage has been reduced enough that Pre_error is high for three consecutive clock cycles, which is considered timing critical. At this time, the output of the history register PHR goes high, the state machine jumps to the "Voltage_Up" state, and the voltage control signal Volt_Up is pulled high to increase the supply voltage. The wait lasts until the "Volt_done" signal is pulled high, indicating the completion of the voltage regulation. During this period, the Slow signal remains high to keep the system clock divided by two and avoid real timing errors.
Power consumption savings in the super-threshold voltage region of our AVS system are obtained as shown in [...]. At 25 °C, it achieves power savings of 37%, 42% and 45% at the SS, TT and FF corners respectively. At 125 °C, it achieves power savings of 27%, 32% and 36% at the SS, TT and FF corners respectively. Therefore, our adaptive technology can obtain considerable power savings in the super-threshold voltage region, from 27% to 50% under different working conditions, with the 50% maximum occurring at the FF corner and −25 °C.
2) AVS VERIFICATION EFFECT IN NEAR-THRESHOLD VOLTAGE REGION
When working in the near-threshold region, the circuit is affected by fast variations. More timing margin needs to be reserved in conventional IC design due to the severe PVT variations in the near-threshold region. Here the detection window of the monitor is set to be large, by configuring the MUX to tune the length of the delay chain, as illustrated in Fig. 2 (b). Even so, our AVS control can still obtain more power savings.
As shown in Fig. 11, starting from the initial state of 0.6V, TT corner and 25 °C, the AVS system automatically decreases the supply voltage from 0.6V to 0.48V and then becomes stable.
In order to verify the ability of the AVS system to adapt to fast PVT variations, a 50mV voltage disturbance is added to the power supply in addition to the noise. Its response is shown in the right part of Fig. 11, which is zoomed in for a better view. Due to the external voltage disturbance, the supply voltage drops sharply, making the circuit generate three consecutive Pre_error warning signals. The AVS system therefore slows down the frequency and pulls the voltage-up control signal high, directing the external DC/DC converter to increase the supply voltage. Therefore, in spite of the fast variation, the chip can still work correctly thanks to our AVS control based on on-chip timing monitors.
The power savings in the near-threshold voltage region are more pronounced, as shown in Fig. 12, compared with the non-AVS condition at TT corner, 25 °C and 0.6V. It can be seen that the power savings at −25 °C are 6%, 22% and 44% at the SS, TT and FF corners respectively; at 25 °C they are 17%, 36% and 63% at the SS, TT and FF corners respectively; and at 125 °C they are 28%, 53% and 70% at the SS, TT and FF corners respectively. Since the circuit reserves more timing margin in the low-voltage region due to the more severe PVT variations, this AVS method obtains a larger power reduction there than in the super-threshold voltage region.
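As a rough sanity check on these figures, a first-order estimate can be made from the voltage reductions alone. The sketch below assumes that dynamic power scales with the square of VDD at a fixed clock frequency and ignores leakage, short-circuit power and the periods of divided clock; it is a back-of-envelope approximation, not the HSIM-VCS simulation used in this work, and the function name is hypothetical.

```python
def dynamic_power_saving(vdd_before, vdd_after):
    """First-order saving assuming dynamic power proportional to VDD squared."""
    return 1.0 - (vdd_after / vdd_before) ** 2

# Voltage reductions reported for the TT corner at 25 degrees C.
super_threshold_saving = dynamic_power_saving(1.1, 0.86)  # about 0.39
near_threshold_saving = dynamic_power_saving(0.6, 0.48)   # about 0.36
```

These first-order values of roughly 39% and 36% are close to the 42% and 36% reported at the TT corner and 25 °C, which suggests that most of the saving at that operating point comes from the quadratic dependence of dynamic power on supply voltage.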
A wide-operating-voltage circuit has the great advantage that it offers high performance in the super-threshold region and low performance but high energy efficiency in the near-threshold region. Table 2 shows the energy efficiency benefits of the SoC circuit when AVS is enabled at 0.6V under different process corners and temperatures, compared with the energy efficiency at the TT corner, 1.1V and 25 °C. It can be seen that the largest energy efficiency increase is 6.87 times at the FF corner, 0.6V and 125 °C, which is the best case, where a large timing margin would otherwise be wasted. The smallest energy efficiency increase is still 2.48 times at the SS corner, 0.6V and −25 °C, which is the worst case because of the temperature inversion effect.
Finally, we compare our results with other work in terms of the maximum power savings and area overhead of the adaptive voltage scaling techniques, as shown in Table 3. From this table, it can be seen that our method achieves relatively high power savings in both the super-threshold and near-threshold regions, with relatively low area overhead. In addition, our method is a predictive technique that needs no error correction logic.
VI. Summary And Conclusions
A wide-voltage-range AVS circuit based on an in-situ timing monitor with a tunable detection window is designed in the SMIC 40 nm CMOS process with an operating voltage range from 0.6V to 1.1V. The designed timing monitor has a tunable timing detection window, which makes it suitable for wide-voltage-range applications. HSIM-VCS co-simulation results show that this system can obtain good power savings in both the super-threshold and near-threshold regions. In the near-threshold region in particular, this design achieves a maximum power saving of 70% compared with the constant 0.6V supply voltage case. With AVS enabled, the energy efficiency is greatly improved. In addition, the total area cost of the AVS technology is relatively small compared with other AVS systems based on in-situ monitoring.
