Abstract-This paper discusses adaptive performance control with two types of on-chip variation sensors. The first type extracts several device-parameters for performance adaptation from a set of on-chip ring-oscillators with different sensitivities to device-parameters; the decomposition into device-parameters is also discussed. The second type, embedded into functional circuits, predicts timing errors due to PVT variations and aging. By controlling circuit performance according to the sensor outputs, PVT worst-case design can be avoided and power dissipation can be reduced while satisfying performance requirements. Measurement results of a subthreshold adder on 65-nm test chips show that the adaptive speed control can compensate for PVT variations and improve energy efficiency by up to 46% compared to worst-case design and operation with guardbanding.
I. INTRODUCTION
As manufacturing technology advances and supply voltage is lowered, circuit speed is becoming more sensitive to manufacturing variability, to the operating environment (supply voltage and temperature), and to aging due to NBTI (negative bias temperature instability) and HCI (hot carrier injection).
Thus, the timing margin varies chip by chip due to manufacturing variability, and it also depends on the operating environment and the age of the chip. When a chip has a large timing margin, it is desirable to slow it down with dynamic voltage scaling or body biasing to reduce power dissipation [1]-[4]. Conversely, when the timing margin is insufficient under some operating condition, the circuit should be sped up. Such adaptive speed control is believed to be promising. This paper reviews post-silicon performance tuning techniques at various phases and introduces on-chip variation sensors for the shipping test. We also discuss run-time adaptive speed control using on-chip sensors for timing error prediction.
Measurement results of a subthreshold circuit on 65-nm test chips demonstrate that the run-time adaptive speed control overcomes PVT variations and eliminates large design margin for guardbanding.
II. POST-SILICON TUNING
Post-silicon performance tuning is often carried out in the following four phases.
• Shipping test
• Power-on test
• Off-line test
• Run-time
For high-end microprocessors for supercomputers and servers, intensive delay tests are carried out on an LSI tester before shipment, and the necessary supply voltage is carefully evaluated and recorded for each chip using fuses or flash memory. This approach incurs a high test cost, and hence it has been applicable only to high-end products. On the other hand, using on-chip variation sensors would make it possible to simplify the testing and reduce the tuning cost. Section III discusses variation sensors for this purpose.
As aging effects become significant, field testing that aims to detect gradual performance degradation and wear-out failures is drawing attention. One approach to this problem is to carry out a test when a chip is powered on [5]. The advantages of this power-on test are that the test time is almost invisible to users and that relatively long test patterns can be applied compared to the off-line test. However, the power-on test is not applicable to chips that run continuously without being powered off, and it does not capture environmental fluctuation.
To overcome the drawbacks of the power-on test, the off-line test has been studied. This approach is well matched with multi-/many-core chips, since not all the cores are running all the time and some cores are temporarily idle. By exploiting this temporary idle time, or by tolerating a slight performance degradation due to a decrease in the number of running cores, functional and delay tests can be executed [6]. Thus, this approach is also called pseudo on-line test. It involves a tradeoff between the idle/down time and test coverage. On the other hand, it is difficult to apply the off-line test to uncore circuits and SoCs in general, because hardware redundancy is not usually available, although [7] tests uncore circuits with special hardware support.
The last one is run-time adaptation, which can cope with manufacturing variability, environmental fluctuation and aging. Run-time speed adaptation requires sensing the timing margin of the circuit. For this purpose, a critical path replica [8] has traditionally been used. However, its effectiveness is deteriorating because the performance mismatch between the replica and the actual critical path tends to be significant due to increasing within-die variation and aging. To sense the timing margin more efficiently, in-situ techniques have been studied [9]-[12].
Nevertheless, this scheme inherently involves a critical risk of timing errors. When the circuit is slowed down, it is not possible to predict perfectly whether enough timing margin will remain after the slowdown.
Run-time adaptation is classified into two groups: the error correction approach and the error prediction approach. "Razor I" [9] and "Razor II" [10] belong to the first approach: they detect timing errors with a delayed clock in a processor and correct the errors using extra recovery logic or re-execution of instructions. They control the supply voltage while monitoring the timing error rate and thereby reduce power dissipation. The error recovery exploits a function commonly implemented in high-performance processors, and hence it is not easy to apply to general sequential circuits. In addition, the Razor FF requires a timing window for error detection just after the clock edge in order to detect a late-arriving signal as a timing error, which imposes more severe minimum path delay constraints.
In contrast, the "Canary Flip-Flop" [11] and the "Defect Prediction Flip-Flop (DPFF)" [12] aim not to detect timing errors but to predict them. When the timing margin is not enough, they capture wrong values, whereas the main flip-flops still capture correct values. The difference between the captured values gives a timing warning. Timing error prediction is superior to timing error detection in terms of applicability, since no error recovery mechanism is necessary as long as a timing warning can be generated before a timing error occurs.
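As a rough illustration of timing error prediction, the sketch below models a main flip-flop and a shadow flip-flop whose data input passes through an extra delay buffer. It is a simplified behavioral model, not the circuit of [11] or [12]: the function names, the nanosecond values, and the assumption that every late arrival is captured as a wrong value are hypothetical, and setup time is ignored.

```python
def capture_ok(path_delay_ns, clock_period_ns, extra_delay_ns=0.0):
    """Return True if the data arrives before the capturing clock edge.

    Assumption: a late arrival always results in a wrong captured value.
    """
    return path_delay_ns + extra_delay_ns <= clock_period_ns


def timing_warning(path_delay_ns, clock_period_ns, buffer_delay_ns):
    """Compare the main and shadow captures, as an XOR comparator would."""
    main_ok = capture_ok(path_delay_ns, clock_period_ns)
    shadow_ok = capture_ok(path_delay_ns, clock_period_ns, buffer_delay_ns)
    # The captured values differ only when exactly one of the two is wrong.
    # If both are late, both are wrong and no warning is raised.
    return main_ok != shadow_ok


# Hypothetical numbers: 10 ns clock period, 1 ns delay buffer.
comfortable = timing_warning(8.0, 10.0, 1.0)   # both captures correct
shrinking = timing_warning(9.5, 10.0, 1.0)     # only the shadow capture fails
too_late = timing_warning(10.5, 10.0, 1.0)     # both captures fail
```

In this model a warning appears only while the path delay lies within the buffer delay of the clock period. If the delay jumps past the clock period in one step, both captures fail together and the comparison stays silent, which is why the warning has to be generated before a timing error occurs.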
The adaptive speed control with timing error prediction will be introduced in Section IV.
III. ON-CHIP VARIATION SENSORS FOR DEVICE-PARAMETER EXTRACTION
To adapt the performance efficiently at the shipping test phase, it is necessary to estimate for every chip how the device-parameters deviate from their typical values as a result of the manufacturing process. The estimates of the device-parameters are then used to obtain an appropriate tuning. For example, when the magnitude of the PMOS threshold voltage is high and the NMOS threshold voltage is typical, forward body bias should be given to PMOSs, not to NMOSs. Otherwise, a large increase in leakage current would be introduced. As an application of this type of sensor, clock skew reduction is investigated in [13].
For such a purpose, RO (Ring-Oscillator)-based sensors have been intensively studied [14]-[17]. They can be easily implemented in a chip and can be used to obtain variability information even after product shipment, because the oscillation frequencies of ROs can be measured with a simple circuit structure. ROs consisting of ordinary standard cells also give the speed variation. However, they cannot decompose the speed variation into device-parameters, such as the threshold voltages and channel lengths of PMOS and NMOS, because the speed sensitivities of these ROs to device-parameters are similar.
To extract not only the speed variation but also device-parameters, sophisticated ROs have been proposed [16], [17].
In these ROs, the sensitivity to a single or two device parameters is intensified, and the sensitivities to other device parameters are suppressed. Using a set of these ROs with different sensitivities, device-parameters are estimated.
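The decomposition idea can be sketched with a first-order model in which the relative frequency shift of each RO is a weighted sum of the device-parameter shifts. The example below assumes two ROs and two parameters (a threshold-voltage shift in mV and a channel-length shift in nm). The sensitivity values and function name are hypothetical and are not taken from [16] or [17].

```python
def decompose_two_parameters(sensitivities, freq_shifts, min_det=1e-12):
    """Solve S * [dVth, dL] = df for two ROs by Cramer's rule.

    sensitivities: ((s11, s12), (s21, s22)), relative frequency change
                   per mV of Vth shift and per nm of L shift for each RO.
    freq_shifts:   (df1, df2), measured relative frequency shifts.
    """
    (s11, s12), (s21, s22) = sensitivities
    df1, df2 = freq_shifts
    det = s11 * s22 - s12 * s21
    if abs(det) < min_det:
        # ROs with (nearly) identical sensitivities cannot be separated.
        raise ValueError("RO sensitivities are too similar to decompose")
    delta_vth_mv = (df1 * s22 - s12 * df2) / det
    delta_l_nm = (s11 * df2 - s21 * df1) / det
    return delta_vth_mv, delta_l_nm


# Hypothetical sensors: RO 1 is mostly Vth-sensitive, RO 2 mostly L-sensitive.
S = ((-0.004, -0.002), (-0.0005, -0.02))
estimate = decompose_two_parameters(S, (-0.042, -0.025))
```

When both rows of the sensitivity matrix are alike, as with ROs built from ordinary standard cells, the determinant approaches zero and the system cannot be solved reliably. This is the reason for intensifying the sensitivity of each RO to a different parameter.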
However, when such ROs are used, random variations might not cancel out. An example of this phenomenon arises with an RO that is highly sensitive to device-parameters in a 65-nm process. Both the random variations and the resulting shift of the average must be considered in the device-parameter estimation from the measured sensor outputs.
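One way to see why random variations need not cancel is to compare a linear and a strongly nonlinear sensor response under symmetric deviations. The sketch below assumes a hypothetical exponential dependence of frequency on the threshold-voltage deviation. The model, the coefficient, and the function name are illustrative assumptions rather than the measured behavior of the sensors discussed here.

```python
import math


def mean_relative_frequency(delta_vth_mv, k_per_mv, linear=False):
    """Average the relative frequency over a list of Vth deviations.

    linear=True  uses f = 1 - k * dVth      (weakly sensitive sensor)
    linear=False uses f = exp(-k * dVth)    (assumed highly sensitive sensor)
    """
    if not delta_vth_mv:
        raise ValueError("at least one deviation is required")
    if linear:
        values = [1.0 - k_per_mv * d for d in delta_vth_mv]
    else:
        values = [math.exp(-k_per_mv * d) for d in delta_vth_mv]
    return sum(values) / len(values)


# Symmetric random deviations of +/-20 mV with zero mean (hypothetical).
deviations = [-20.0, 20.0]
linear_mean = mean_relative_frequency(deviations, 0.025, linear=True)
nonlinear_mean = mean_relative_frequency(deviations, 0.025)
```

With the linear response the two deviations cancel and the average stays at the nominal value. With the exponential response the average moves above the nominal value even though the deviations themselves average to zero, so an estimator that ignores random variation would misread this shift as a global variation.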
To deal with this issue, we have proposed a device-parameter extraction method that considers random variations explicitly and demonstrated that it can accurately estimate both global and random variations [18]. This method is based on MLE (Maximum Likelihood Estimation) and extracts not only D2D (die-to-die) parameter variations but also the standard deviations of random variations.
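To make the idea of estimating a global shift and a random standard deviation together more concrete, the following sketch applies the textbook Gaussian maximum-likelihood estimate to a deliberately simplified case. It is not the method of [18]. It assumes a single parameter, a linear sensitivity, and several nominally identical RO instances on one chip whose deviations are independent and Gaussian. All names and numbers are hypothetical.

```python
import math


def mle_global_and_random(freq_deviations, sensitivity):
    """Estimate the global shift and random sigma of one parameter.

    Model (assumption): deviation_j = sensitivity * (dG + dR_j),
    with dR_j ~ N(0, sigma_R^2) independent across RO instances.
    """
    n = len(freq_deviations)
    if n == 0 or sensitivity == 0:
        raise ValueError("need measurements and a nonzero sensitivity")
    mean_dev = sum(freq_deviations) / n
    # The Gaussian MLE of the variance divides by n, not by n - 1.
    var_dev = sum((y - mean_dev) ** 2 for y in freq_deviations) / n
    global_shift = mean_dev / sensitivity
    random_sigma = math.sqrt(var_dev) / abs(sensitivity)
    return global_shift, random_sigma


# Hypothetical relative frequency deviations of four RO instances,
# with a sensitivity of -0.004 per mV of threshold-voltage shift.
dg_mv, sigma_r_mv = mle_global_and_random([-0.05, -0.03, -0.04, -0.04], -0.004)
```

In this linear, single-parameter case the estimates reduce to the sample mean and the population standard deviation. A nonlinear, multi-parameter sensor response generally has no such closed form, and the likelihood would then have to be maximized numerically.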
Let us show an example of device-parameter extraction results in a 65-nm process. This evaluation was performed using virtually fabricated chips (simulated data). The variation sources to be extracted are assumed to be $V_{thn/p}$ and $L_{n/p}$, with $\sigma_{\Delta G/R\,V_{thn/p}} = 20$ mV and $\sigma_{\Delta G/R\,L_{n/p}} = 2$ nm. Here, $\Delta G_x$ denotes the global variability and $\Delta R_x$ the random variation of parameter $x$. Please see [18] for details, including the sensor structures. Table I lists the averages of the absolute estimation errors of $\Delta G_x$ from the given variations.
The proposed method reduces the estimation error by 11.1%-73.4% and provides more accurate estimation thanks to the consideration of random variations.
Figure 2 shows a circuit that adaptively controls the speed and power dissipation using a warning signal generated by a timing-error predictive (TEP) FF [19]-[21]. The TEP-FF consists of a normal flip-flop, a delay buffer and a comparator (XOR gate). As the timing margin gradually decreases, the delay buffer causes a timing error to occur at the TEP-FF before the main FF captures a wrong value, which enables us to predict that the timing margin of the main FF is not large enough. A warning signal is generated to predict the timing errors, and it is monitored during a specified period. Once a warning signal is observed, the circuit is controlled to speed up. If no warning signals are observed during the monitoring period, the circuit is slowed down for power reduction. This speed control overcomes the variation of the timing margin, which differs chip by chip and varies depending on the operating condition and aging.
Even when the TEP-FF is well configured to generate the warning signal, the occurrence of timing errors cannot be reduced to zero. This is because the circuit might be slowed down excessively when critical paths are not activated for a long time during circuit operation. If a critical path is activated in this condition, a timing error inevitably occurs, which is believed to be a critical problem that prevents practical use. To reduce and manage the error occurrence, we have to examine and tune the following design parameters on the basis of a systematically estimated error rate.
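The control policy described above can be sketched as a small loop over per-cycle warning observations. This is a behavioral sketch only. The discrete speed levels, the function name, and the choice to restart the monitoring period after every speed change are assumptions made for illustration, not details of the circuit in [19]-[21].

```python
def adaptive_speed_control(warnings, monitor_period, level=0,
                           min_level=0, max_level=7):
    """Return the speed level chosen after each cycle.

    warnings:       iterable of booleans, one TEP-FF warning flag per cycle.
    monitor_period: number of consecutive warning-free cycles required
                    before the circuit is slowed down by one level.
    level:          initial speed level; a higher level means a faster
                    (and more power-hungry) setting.
    """
    quiet_cycles = 0
    history = []
    for warning in warnings:
        if warning:
            # A warning means the timing margin is too small: speed up.
            level = min(level + 1, max_level)
            quiet_cycles = 0
        else:
            quiet_cycles += 1
            if quiet_cycles >= monitor_period:
                # No warning during the whole period: slow down to save power.
                level = max(level - 1, min_level)
                quiet_cycles = 0
        history.append(level)
    return history


# Hypothetical trace: three quiet cycles, one warning, six quiet cycles.
trace = [False] * 3 + [True] + [False] * 6
levels = adaptive_speed_control(trace, monitor_period=3, level=4)
```

A longer monitoring period makes the controller more conservative about slowing down, at the cost of spending more time at a faster and less energy-efficient setting.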
• location where TEP-FF should be inserted
For this purpose, we developed a framework that uses path activation probabilities to estimate the timing error rate and power dissipation. Please see [19], [20], [22] for the details.
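The role of path activation probability can be illustrated with a simple calculation. The sketch below is not the framework of [19], [20], [22]. It assumes that a monitored critical path is activated independently in each cycle with a fixed probability, and the function names and numbers are hypothetical.

```python
import math


def prob_no_activation(p_activate, monitor_cycles):
    """Probability that the path is never activated during the period."""
    if not 0.0 <= p_activate <= 1.0 or monitor_cycles < 0:
        raise ValueError("invalid probability or cycle count")
    return (1.0 - p_activate) ** monitor_cycles


def min_monitor_cycles(p_activate, target_prob):
    """Smallest period whose no-activation probability is <= target_prob."""
    if not 0.0 < p_activate < 1.0 or not 0.0 < target_prob < 1.0:
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(target_prob) / math.log(1.0 - p_activate))


# Hypothetical case: the critical path is activated in 1% of the cycles and
# a silent monitoring period should occur with probability of at most 0.1%.
cycles_needed = min_monitor_cycles(0.01, 0.001)
```

Under this assumption, a path that is rarely activated requires a long monitoring period before the absence of warnings can be trusted, which is one reason the monitoring period and the TEP-FF locations have to be tuned together.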
B. Silicon Results
We designed and fabricated a test circuit in a 65-nm CMOS process to validate the adaptive speed control with TEP-FF [21]. Measurement results are shown in this section. Here, the run-time adaptive speed control is applied to subthreshold circuits, which are very susceptible to manufacturing and environmental variations.
1) Circuit:
A 32-bit Kogge-Stone adder (KSA) was adopted as the circuit whose performance is controlled adaptively. The micrograph is shown in Fig. 3. ... can be enabled or disabled individually.
2) Adaptive Compensation of Environmental Variability:
3) Comparison to Worst-case Design:
We next demonstrate how inefficient the worst-case design for process variation is for subthreshold circuits, and clarify how beneficial the adaptive performance control is.
We here discuss the worst-case design in terms of manufacturing variability. Assuming 2-MHz operation, the supply voltage must be 0.5 V or higher for a chip at the SS device corner, for example. In this case, all chips should operate at 
ACKNOWLEDGMENTS
This work is supported in part by the New Energy and Industrial Technology Development Organization (NEDO).
