Index Terms: IR-drop, on-chip adaptation, on-chip monitoring, power supply noise.
I. INTRODUCTION
TODAY'S advanced technology nodes of 40 nm and below enable billions of CMOS gates to be integrated into a single chip, achieving functions that would have required a large number of independent chips in the past. However, this improvement has also made IC functionality more easily degraded or broken by variabilities, which can be classified as IR-drop, process/temperature variations, and aging [1], [2]. Because of the power supply network (PSN) parasitics and the high current consumption, IR-drop nowadays can reach several hundred millivolts [3], making it one of the most significant challenges in IC design and test. Power supply noise with a peak of 10%-20% of the supply voltage is commonly observed [4], [5], and when the supply voltage drops by 10%-20%, the maximum operating frequency of voltage-sensitive digital blocks falls by nearly the same ratio [6]. Therefore, excessive timing margins are applied to prevent functional failure, which burdens timing closure and leaves much performance on the table.
To set a more reasonable timing margin and to debug abnormal functional failures, the IR-drop waveform should be obtained accurately. However, because the actual process variations in the chip are unknown, there is always a gap between the simulated and the actual waveform; in other words, it is difficult to accurately simulate the in-field IR-drop during the design stage [7]. Silicon monitoring is therefore greatly needed. IR-drop can, of course, be measured by high-end equipment such as an oscilloscope or automatic test equipment (ATE). However, off-chip equipment cannot perform in-field monitoring. Furthermore, in the gigahertz era, off-chip measurement accuracy is easily affected by the parasitics of the probe and transmission cable. In addition, because of the limited observation depth, it is impossible to extract the IR-drop at a location deep in the circuit, far from the VDD pins. Fortunately, with the lower per-area silicon cost, novel on-chip sensors offering in-field monitoring, low parasitics, and high observability have emerged over the past few years [8]. With the help of on-chip sensors, in-field adaptation can now also be realized.
A. Previous Work
The IR-drop waveform can potentially be monitored by an analog-to-digital converter (ADC) [9]-[12]. By connecting the noisy power supply grid to the ADC's front end, the voltage drop is sampled and converted into a series of digital outputs, from which the IR-drop peak and width can later be obtained. Although this solution is straightforward, its area overhead and analog-digital integration effort are usually high. Moreover, because the local IR-drop pattern is synchronized with the system clock edges, capturing the pattern detail (i.e., the peak of the IR-drop) requires an ADC sampling rate several times higher than the system frequency. Such a high sampling frequency may introduce significant design effort and power consumption.
Measuring IR-drop with all-digital components such as ring oscillators (ROs) is proposed in [13]-[16]. Because the delay of digital gates is sensitive to the power supply voltage level, the IR-drop information can be obtained from the variation of the RO frequency. Since ROs are all-digital, easy to implement, and area-efficient, they are the most attractive IR-drop monitors, and they are usually placed across the chip to generate an IR-drop map [6]. However, because the RO frequency only reflects the average IR-drop level over the whole measurement window, the peak and duration details are usually missed.
To better describe the IR-drop waveform, [17]-[21] employ delay-to-digital converters, most of which are composed of digital delay elements and signature flip-flops. During measurement, the signal traveling along the delay line is slowed by different amounts at different IR-drop levels, which eventually changes the values captured by the flip-flops. After measurement, the captured values (signatures) are read out for IR-drop analysis. The delay-to-digital scheme provides more detail about cycle-wise IR-drop. However, because a signature is generated only once per clock cycle, the peak and width of the transient IR-drop waveform within a clock cycle still cannot be fully rebuilt from the signatures.
Several techniques have been proposed to prevent IR-drop-induced functional failure. The conventional approach is to place adequate decoupling capacitance in the circuit layout [22], [23]. However, with the sharply increased IC integration density, there is insufficient space for enough decoupling capacitors, especially in highly congested areas. Other research has therefore alleviated the IR-drop problem by dynamically adapting the power supply, frequency, switching activity, or path delay. Tschanz et al. [24] and Gupta et al. [25] adapted VDD together with body bias in response to temperature, power supply noise, and transistor aging to maximize performance. McGowen et al. [26] proposed a methodology named Foxton, which increases the power margin through frequency adaptation. The hierarchical dynamic guardbanding proposed in [27] and [28] reduces the otherwise conservative guardbands of various functional units. Wang et al. [29], [30] lent the unused timing margin of an upstream path to a failing downstream path while reducing the number of simultaneously switching clock branches. Modifications of data buses at the circuit and coding levels for dynamic noise suppression are proposed in [31] and [32], respectively. However, all of the above systems need effective IR-drop sensors to trigger adaptation before or during the IR-drop peak to prevent functional failure.
According to the discussion above, the average IR-drop can easily be obtained by multiple types of economical sensors. However, the transient details of IR-drop, including peak and width, which are critical for in-field adaptation and IC failure analysis, still require a low-cost method to obtain. Thus, a new monitoring system with the following features is needed: 1) the ability to measure the IR-drop waveform peak/width; 2) low fabrication and test cost; 3) high accuracy and sensitivity; and 4) the ability to enable early adaptation.
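The limitation can be illustrated with a small numerical sketch. Assuming (our assumption, not a claim of this paper) that the RO frequency decreases linearly with the instantaneous IR-drop, f(v) = f_0 − g·v, the oscillation count over a window equals f_0·M − g·∫v(t)dt, so two waveforms with equal area (equal average) but different peaks produce the same count. The values f_0 = 16 GHz (about a 62.5-ps period) and g = 5 GHz/V are hypothetical.

```python
def ro_count(ir_drop, window_ns, f0_ghz, g_ghz_per_v, steps=10_000):
    """Approximate oscillation count of an RO whose frequency falls linearly
    with IR-drop (hypothetical model), integrated with the midpoint rule."""
    dt = window_ns / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += (f0_ghz - g_ghz_per_v * ir_drop(t)) * dt
    return total


def triangle(peak_v, start_ns, width_ns):
    """Triangular IR-drop pulse: 0 outside [start, start + width], peak at the center."""
    half = width_ns / 2

    def v(t):
        if t < start_ns or t > start_ns + width_ns:
            return 0.0
        return peak_v * (1 - abs(t - start_ns - half) / half)

    return v


# Two pulses with the same area (0.1 V*ns) but different peaks and widths.
tall_narrow = triangle(0.20, 1.0, 1.0)
short_wide = triangle(0.10, 1.0, 2.0)
n_tall = ro_count(tall_narrow, 7.0, 16.0, 5.0)
n_wide = ro_count(short_wide, 7.0, 16.0, 5.0)
print(round(n_tall, 3), round(n_wide, 3))
```

Both counts come out at about 16·7 − 5·0.1 = 111.5, so the count alone cannot tell the 200-mV pulse from the 100-mV one.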
B. Contributions and Paper Organization
In this paper, a novel on-chip transient IR-drop monitor for GHz ICs, called TRO, is proposed. It has the following advantages.
1) It can accurately measure the IR-drop waveform width and average. The IR-drop peak can then be calculated through a model correlating the IR-drop waveform width, average, and peak. No high-frequency sampling clock is needed to obtain the IR-drop waveform width or peak.
2) It is all-digital with low overhead, and is thus easy to integrate into the modern IC design flow.
3) It is robust against process variations.
4) It removes the impact of temperature from the IR-drop measurement results.
5) The IR-drop noise width detection resolution can reach 0.125 ns, with noise peak and width measurement errors of less than 6.8% and 9.0%, respectively, considering 1% oxide thickness (t_ox), 5% transistor width (W), 10% transistor length (L), and 25% threshold voltage (V_th) process variations.
The rest of this paper is organized as follows. Section II presents the detailed architecture and transient waveform reconstruction procedure of TRO. Section III provides the TRO-based measurement flow. Section IV presents the simulation results. Finally, concluding remarks are given in Section V.
II. TRO-BASED TRANSIENT IR-DROP MEASUREMENT
TRO-based transient IR-drop measurement consists of two parts. 1) During measurement, the TRO records the IR-drop average, width, and temperature data, and triggers existing dynamic voltage and frequency scaling (DVFS) systems for adaptation if necessary. 2) After measurement, the recorded data are loaded from flash, and the IR-drop peak is calculated off-chip. The symbols used in this paper are defined in Table I.
A. Components of TRO System
As shown in Fig. 1 
2) TRO:
The TRO is shown in Fig. 2. The AND gate inside the RO loop is controlled by the Meas_En signal from the timer. When the measurement window length M has elapsed, Meas_En switches low to terminate the oscillation. Suppose the RO oscillates N times in total during a measurement window M that contains IR-drop. The average IR-drop can then be defined as the dc voltage drop that would also make the Fast RO oscillate N times throughout M. Thus, for a specific RO, the average IR-drop has a one-to-one mapping with N. From (1), it can be seen that increasing the oscillation number N improves the observability of the Fast RO oscillation period T, which is a function of the IR-drop. Hence, the Fast RO should be designed as short as possible. Through length configuration, the minimum Fast RO length can be found.
b) Edge Detector: The Edge Detector measures the IR-drop waveform width. As shown in Fig. 3, the delay chain at the bottom of the structure is composed of buffers. During measurement, a clock rising edge propagates along the delay chain and arrives at successive branches at intervals equal to the delay of the buffers between them; this interval serves as the sampling window W. There are m (= M/W) branches connected to the buffer chain, where M and W are the measurement and sampling window lengths, respectively. On reaching the beginning of a branch, the rising edge is transmitted toward the signature flip-flop (i.e., FF_i in Fig. 3). Each branch is composed of two sub-branches, a weak sub-branch and a strong sub-branch. The weak sub-branch is composed of small buffers with large loads, while the strong sub-branch is composed of large buffers with small loads. The rising edge enters both sub-branches at the same time. As mentioned above, the lengths of the two sub-branches are calibrated to be close to each other under the normal power supply (VDD_0). However, as the power supply drops, the weak sub-branch is delayed more significantly than the strong one, so the delay difference increases (as shown in Fig. 4). Hence, when the IR-drop is large enough, a glitch is generated at the output of the AND gate, which is also the clock input of the flip-flop, and turns the flip-flop into "1". All branches of the Edge Detector are calibrated at the same time. As shown in Fig. 3, the length of the strong sub-branch varies from one to four buffers, which can cover the strong/weak sub-branch imbalance caused by process variations and temperature. A single 2-bit counter supplies multiplexer selection values to the 2-bit selection registers of all branches. Once the calibration of a specific branch ends, its selection register is locked.
To let the counter and selection registers fully settle before the signature-checking cycle, the calibration counter value increases every four clock cycles, so the whole calibration procedure finishes within 16 clock cycles.
c) Fast RO Counter: To count the Fast RO's oscillations, the n-bit Fast RO Counter is preferably implemented as a ripple counter consisting of n pairs of flip-flops and inverters, as shown in Fig. 2. Because it is asynchronous, the ripple counter can work with a higher-frequency RO. The counter value at the end of a measurement window is N.
d) Timer: The timer circuitry controls the measurement window length, starting and stopping the Fast RO at the beginning and end of a measurement window.
e) Differentiator: The Differentiator works with DVFS for adaptation. As shown in Fig. 2, it is basically composed of data registers and subtractors. The Differentiator evaluates whether adaptation is needed once per cycle of its own clock, whose period is defined as the Adaptation Grid, as shown in Fig. 5. To let the adaptation take place within one system clock cycle, the Adaptation Grid is 1/4 of the system clock period. At the end of each Adaptation Grid, the most recent Fast RO Counter value N_{i−1} is piped into data register Reg 0. Then P_i = N_i − N_{i−1} is calculated by Subtractor 0 and stored in Reg 1, and the comparator compares P_i with the predefined adaptation threshold P_thd. Hence, excessive IR-drop in a specific Adaptation Grid is detected when P_i < P_thd.
3) Decision Logic: The Decision Logic generates the calibration-termination and adaptation-enable signals for the Edge Detector and DVFS, respectively. During Edge Detector calibration, it decides whether the strong and weak sub-branches have been calibrated to the same length. During in-field operation, it also determines whether adaptation is needed according to the output of the Differentiator.
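The Differentiator's per-grid check can be modeled in a few lines. This is a behavioral sketch, not the authors' RTL: `samples[i]` stands for the Fast RO Counter value read at the end of Adaptation Grid i, and the modulo-2^n subtraction is our assumption for handling wrap-around of an n-bit ripple counter. The 1-GHz clock in the comments is a hypothetical example.

```python
from typing import List


def adapt_enable(samples: List[int], p_thd: int, bits: int = 8) -> List[bool]:
    """Behavioral model of the Differentiator and threshold comparator.

    samples[i] is the Fast RO Counter value at the end of Adaptation Grid i.
    For each grid i >= 1, P_i = N_i - N_{i-1}, taken modulo 2**bits so that a
    wrapped n-bit counter still yields the right increment. Adaptation is
    requested when P_i < p_thd, i.e., the RO slowed down because of IR-drop.
    """
    modulus = 1 << bits
    flags = []
    for prev, cur in zip(samples, samples[1:]):
        p_i = (cur - prev) % modulus
        flags.append(p_i < p_thd)
    return flags


# A 1-GHz system clock gives a 250-ps Adaptation Grid (1/4 of the period);
# a ~60-ps Fast RO then advances by about 4 counts per grid when noise-free.
print(adapt_enable([0, 4, 8, 11, 15], p_thd=4))   # -> [False, False, True, False]
print(adapt_enable([250, 254, 2, 6], p_thd=4))    # -> [False, False, False]
```

In the first call, grid 3 only advances by 3 counts, so it is the one flagged; the second call shows that the modular difference hides the counter wrap from 254 to 2.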
B. TRO-Based IR-Drop Waveform Reconstruction
Compared with directly measuring transient IR-drop with an ADC or other analog circuits, TRO uses all-digital components to measure the average and width of the IR-drop and calculates the peak, which reduces its design and power cost. To calculate the IR-drop peak from the measured width and average, the IR-drop waveform must first be modeled. Fig. 6 shows the actual power supply with IR-drop in the circuit; the IR-drop appears synchronously with the edges of the system clock. As shown in Fig. 7, the PSN can be expressed by the integrated circuit electromagnetic model (ICEM) [33], [34]. ICEM is composed of two parts: 1) the external part representing the package PSN and 2) the local part representing the transistor switching current as well as the local PSN. Thus, VDD(t) is the local supply voltage whose IR-drop is measured by TRO. By applying Kirchhoff's laws, the relationships between the ICEM parameters can be expressed by
Furthermore, during gate switching, a current pulse i_lc is generated by the CMOS devices. According to the linear-region model of short-channel devices, i_lc increases linearly with the gate input voltage, and can thus be modeled as a triangle with I_p and t_0 representing the current peak and width, as shown in Fig. 8. Then, by substituting the Laplace-transformed (2) and (3) into (4), the analytic solution of VDD(t) can be found as
where the symbols α, β, ω, γ, θ, and ϕ are
It can be seen that the four components of (5) are constant, step, triangular, and sinusoidal, respectively. If the duration of the transient peak current is short, that is, t_0 is small, the term sin(ωt_0/4) approximates 0, and the sinusoidal component can be neglected. Also, as α β for advanced technology nodes, the first three components form a triangle whose peak is βI_p/t_0 and whose width is t_0. To validate the triangular local IR-drop model, 100 paths of the ITC'99 b19 benchmark with a clock frequency of 200 MHz are selected. Both the actual IR-drop extracted from full-chip simulation and the fitted triangular IR-drop waveforms are applied to the selected paths. HSPICE simulation shows that for 93% of cases, the path delay difference between the two conditions is less than 0.4% (shown in Fig. 9). Hence, the triangular model can effectively mimic the shape of the local IR-drop.
For a triangular IR-drop waveform as shown in Fig. 5, once the IR-drop average and width are both determined, a unique IR-drop waveform can be reconstructed. As introduced in Section II-A, the IR-drop average is measured by the Fast RO, while the IR-drop width is measured by the Edge Detector. The counter value N for a specific measurement window M under the power supply waveform VDD(t) can be expressed by (7), and the normalized counter value N_norm can be calculated by
Based on (8), the following conclusions can be made.
1) (∂T/∂VDD) determines the Fast RO's oscillation number under a given VDD(t) (IR-drop waveform width and peak).
2) The N_norm-IR_width-IR_peak relationship of a specific Fast RO used in-field can be constructed from its own in-field (∂T/∂VDD).
3) The impacts of process variations and temperature on the N_norm-IR_width-IR_peak relationship are all contained in the parameter (∂T/∂VDD) of a specific Fast RO. The process variations are fixed after fabrication; however, the in-field temperature is not.
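The conclusions above can be checked numerically with a sketch. Because (7) and (8) are not reproduced in this section, we assume here that N = ∫_0^M dt / T(t) and N_norm = N / N_0, with the period rising linearly with the local IR-drop v(t) as T(t) = T_0[1 + s·v(t)], where s stands for the Fast RO's normalized sensitivity |∂T/∂VDD|/T_0 in 1/V. Under this assumption T_0 cancels, so N_norm depends only on V_p, t_w, and s, which is the property behind conclusions 1) to 3). Since N_norm decreases monotonically with V_p, V_p can then be recovered from (N_norm, t_w, s) by bisection. The pulse placement (starting at 1 ns in a 7-ns window) and s = 0.52/V are illustrative values only.

```python
def tri(t, v_peak, t_w, start):
    """Triangular IR-drop (V) of peak v_peak and width t_w (ns) starting at `start`."""
    half = t_w / 2
    if t < start or t > start + t_w:
        return 0.0
    return v_peak * (1 - abs(t - start - half) / half)


def n_norm(v_peak, t_w, s, window=7.0, start=1.0, steps=4000):
    """N / N_0 under the assumed model T(t) = T_0 * (1 + s * v(t)).

    N = integral of dt / T(t) over the window and N_0 = window / T_0,
    so T_0 cancels and the result is the mean of 1 / (1 + s * v(t)).
    """
    dt = window / steps
    acc = 0.0
    for k in range(steps):
        acc += 1.0 / (1.0 + s * tri((k + 0.5) * dt, v_peak, t_w, start))
    return acc / steps


def peak_from_n_norm(target, t_w, s, v_max=0.5, iters=60):
    """Invert n_norm in v_peak by bisection (n_norm falls as v_peak grows)."""
    lo, hi = 0.0, v_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n_norm(mid, t_w, s) > target:
            lo = mid  # RO is still too fast: the peak must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


nn = n_norm(0.10, 1.5, 0.52)
print(nn, peak_from_n_norm(nn, 1.5, 0.52))
```

Changing T_0 has no effect on `n_norm` in this model, whereas changing `s` moves the whole V_p-t_w-N_norm surface, which mirrors why one plane per (∂T/∂VDD) value suffices.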
C. Analyses of TRO's Performance Under Process and Temperature Variations
Process variations and temperature affect the measurement accuracy and should be minimized. TRO uses three elements to reconstruct the transient IR-drop waveform off-chip: 1) the IR-drop width; 2) the IR-drop average; and 3) the N_norm-IR_width-IR_peak lookup table correlating the IR-drop peak with the width and average. The impacts of process variations and temperature on these three elements are controlled and minimized as follows.
1) The IR-drop width is obtained by the Edge Detector and calculated as [(j − i) + 1]W, where i and j are the indices of the first and last flipping branches and W is the sampling window length. Process variations and temperature influence the values of i and j by affecting the IR-drop sensitivity of the Edge Detector branches. For example, if a strong sub-branch is much longer than its paired weak sub-branch because of large process or temperature variations, the branch signature cannot flip from 0 to 1 even with a large IR-drop. To stabilize the sensitivity, the branches are automatically calibrated: the length of the strong sub-branch is increased from one to four large-buffer delays (t_d) until the branch signature flips from 1 to 0. Thus, after calibration, the delay of the strong sub-branch is between 0 and t_d longer than that of the weak sub-branch. For advanced technologies, t_d can easily be in the range of 10-15 ps. As mentioned above, the calibration process finishes within 16 clock cycles, so the temperature can be considered constant during calibration and measurement, and the sub-branch lengths remain the same through calibration and measurement.
The branch sensitivity after calibration can be determined by repeatedly lowering the global VDD to (VDD − S) until the branch signature flips to 1; S is then the branch sensitivity (see Fig. 10).
The overall implementation, measurement, and adaptation flow of the TRO system is shown in Fig. 11. Since the TRO system is all-digital, its implementation and measurement flow can easily be integrated into current industrial design and test flows. TRO-based IR-drop measurement can be performed during functional or test phases. The details of the flow are given in the following.
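The width rule [(j − i) + 1]W translates directly into code. In this sketch (a hypothetical helper, not the authors' software), bit k of `edge_indicator` is the signature of branch k, with 1 meaning the branch flipped; the example reproduces the Fig. 15 case reported in the experiments, where bits 8 through 15 flip with a 0.125-ns sampling window.

```python
from typing import Optional, Sequence


def ir_drop_width(edge_indicator: Sequence[int], w_ns: float) -> Optional[float]:
    """Return [(j - i) + 1] * W, where i and j are the first and last flipped
    branch indices, or None if no branch flipped (no detectable IR-drop)."""
    flipped = [k for k, bit in enumerate(edge_indicator) if bit]
    if not flipped:
        return None
    i, j = flipped[0], flipped[-1]
    return ((j - i) + 1) * w_ns


bits = [0] * 8 + [1] * 8 + [0] * 8   # branches 8..15 flipped
print(ir_drop_width(bits, 0.125))    # -> 1.0
```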
A. Sensitive Region Selection for TRO Insertion
The critical paths in high IR-drop regions are more prone to failure, so TRO should be inserted into these regions. The high IR-drop regions can be identified by applying functional or structural patterns and performing post-layout IR-drop analysis.
B. TRO System Implementation
Because the TRO design requires no extra SoC-level constraints, TRO system insertion should not significantly impact global clock routing. For the Edge Detector, the strong sub-branch should be constructed by placing large buffers close to each other, while the weak sub-branch should be constructed from small buffers placed far from each other. All other parts of TRO can be automatically placed and routed by EDA tools [35].
C. Measurement and Sampling Window Decision
As shown in Fig. 5, each measurement window can be divided equally into multiple sampling windows, and the IR-drop waveform width measurement error rate is smaller than 2W/t_w, where t_w is the actual IR-drop width. For example, if post-layout simulation shows an IR-drop width of around 1.5 ns and the target measurement error is 10%, the maximum sampling window length is 75 ps; hence, two to three large delay buffers need to be placed between adjacent branches of the Edge Detector. As for the measurement window, in a single-clock system with only positive-edge devices, the measurement window can be as long as one system clock cycle, while the minimum measurement window length is bounded by the acceptable error of T calculated by (1). For a region with both positive- and negative-edge devices, two IR-drop peaks occur in each clock cycle, so the measurement window should equal half of the system clock cycle. For a multiple-clock region, the measurement window should be the same as the shortest clock cycle. The length of the Fast RO should be adjusted to match the measurement window.
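The sizing arithmetic of this subsection can be written out as follows. The bound 2W/t_w ≤ target error gives W ≤ (target error)·t_w/2, and the branch count follows from m = M/W. The helper names are ours, and rounding m up to a whole branch (with a small tolerance for floating-point division) is an assumption.

```python
import math


def max_sampling_window(t_w_ns: float, target_err: float) -> float:
    """Largest W satisfying the width-error bound 2W / t_w <= target_err."""
    return target_err * t_w_ns / 2


def branch_count(m_ns: float, w_ns: float) -> int:
    """Number of Edge Detector branches m = M / W, rounded up to cover M.
    The 1e-9 tolerance keeps float rounding from adding a spurious branch."""
    return math.ceil(m_ns / w_ns - 1e-9)


w = max_sampling_window(1.5, 0.10)
print(f"W_max = {w * 1000:.0f} ps")   # 1.5-ns width, 10% error -> 75 ps
print(branch_count(7.0, 0.125))       # 7-ns window, 0.125-ns W -> 56
```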
D. TRO Lookup Table Generation
Because process variations and in-field temperature differ across Fast ROs, lookup tables need to be generated for each Fast RO. To save memory and test time, the lookup tables can be stored off-chip and checked during data analysis or customer return. A four-step lookup table generation flow during design and manufacturing test is proposed as follows.
1) An N_0-K lookup table [shown in Fig. 12(a)], which correlates the noise-free counter value N_0 and temperature K for each manufactured Fast RO, is generated during production test under various K across [−40 °C, 100 °C].
2) A K-(∂T/∂VDD) lookup table [shown in Fig. 12(b)], which correlates temperature K and (∂T/∂VDD) of a specific Fast RO, is then generated during production test by stepping VDD and measuring the variation of the Fast RO oscillation period T.
3) The V_p-t_w-N_norm lookup table is obtained through SPICE simulation.
The lookup table checking can be performed automatically by software programs during data analysis or customer return.
E. Adaptation Threshold Decision
The adaptation decision is made according to P_i, the Fast RO Counter increment within one Adaptation Grid. If P_i is lower than the Adaptation Threshold P_thd, indicating excessive IR-drop, adaptation is triggered. During production test, the intensity of the structural or functional patterns is increased until the first failures emerge, and P_thd is set to the P_i observed at that point.
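The threshold-setting procedure can be sketched as a scan over pattern intensities. Here each entry of the hypothetical `runs` list pairs the smallest P_i observed while applying one pattern set with whether that set produced a failure, ordered by increasing intensity. Taking the smallest P_i of the first failing run is our reading of "P_thd = P_i", not a rule stated in the paper.

```python
from typing import Iterable, Optional, Tuple


def calibrate_p_thd(runs: Iterable[Tuple[int, bool]]) -> Optional[int]:
    """Return P_thd as the P_i observed at the first failing pattern intensity.

    runs: (min P_i seen during the run, failed?) in order of increasing
    pattern intensity. Returns None if no intensity caused a failure.
    """
    for p_i, failed in runs:
        if failed:
            return p_i
    return None


print(calibrate_p_thd([(5, False), (4, False), (3, True), (2, True)]))  # -> 3
```

Because the in-field comparison is strict (P_i < P_thd), a designer might add a guard band of one count on top of this value; that choice is outside what the paper specifies.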
F. Control Register Configuration
Before measurement, the Control Registers need to be configured. The configuration process includes writing the start clock cycle, measurement window length, Edge Detector branch configuration, adaptation threshold, and Fast RO length into the corresponding registers from memory or scan interfaces.
G. Edge Detector Calibration
Before measurement, the lengths of the strong and weak sub-branches inside each Edge Detector branch are automatically calibrated in-field to be close to each other. The Decision Logic checks the m-bit Edge_Indicator[m − 1 : 0]. If a specific bit flips while the strong sub-branch's length is being increased, the calibration ends for that branch.
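The shared-counter calibration can be modeled behaviorally. The sketch assumes a hypothetical `signature(branch, sel)` callback that returns the branch flip-flop value when the strong sub-branch uses sel + 1 buffers. The shared 2-bit counter advances every four cycles and signatures are checked only on the last (settled) cycle of each step, giving the 16-cycle procedure described in Section II. A branch that never flips to 0 is reported as None; how the hardware treats such a branch is not specified here.

```python
from typing import Callable, List, Optional


def calibrate(num_branches: int,
              signature: Callable[[int, int], int],
              n_sel: int = 4,
              cycles_per_step: int = 4) -> List[Optional[int]]:
    """Return the locked 2-bit selection value for each Edge Detector branch."""
    locked: List[Optional[int]] = [None] * num_branches
    for cycle in range(n_sel * cycles_per_step):        # 16 cycles in total
        sel = cycle // cycles_per_step                  # shared 2-bit counter
        if cycle % cycles_per_step != cycles_per_step - 1:
            continue                                    # let registers settle
        for b in range(num_branches):
            if locked[b] is None and signature(b, sel) == 0:
                locked[b] = sel                         # lock this branch
    return locked


# Hypothetical imbalance: branch b stops glitching once sel >= need[b].
need = [0, 2, 3, 1, 4]
result = calibrate(len(need), lambda b, sel: 1 if sel < need[b] else 0)
print(result)  # -> [0, 2, 3, 1, None]
```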
H. Temperature Measurement
The in-field temperature K needs to be measured and recorded to facilitate lookup table checking. To obtain the local temperature, the Fast RO is enabled to oscillate for a time length of M with little circuit activity, and the counter value N_0 is recorded.
I. In-Field Monitoring
During in-field monitoring, the sensor starts at the predefined clock cycle and works for a time length of M (the measurement window length). The N_0, N, P, and Edge_Indicator[m − 1 : 0] values need to be stored in on-chip flash. In this step, functional, structural, or BIST patterns are applied.
J. In-Field Adaptation
The Decision Logic decides the action needed (i.e., no adaptation or adaptation) according to the output of the Differentiator, which compares P_i with the predefined threshold P_thd. When the IR-drop exceeds the threshold, the Adapt_En output of the Decision Logic switches to logic 1 to adjust the DVFS system.
K. IR-Drop Waveform Reconstruction During Customer Return
After in-field measurement, such as during customer return, the recorded N_0, N, and Edge_Indicator[m − 1 : 0] values can be read out, and the normalized oscillation number N_norm and the IR-drop waveform width t_w are calculated from them. The in-field temperature K is first recovered by checking the prestored N_0-K lookup table. The in-field (∂T/∂VDD) is then found by checking the prestored K-(∂T/∂VDD) lookup table, which locates the proper V_p-t_w-N_norm plane. On that plane, V_p is obtained from the calculated t_w and N_norm. Hence, a triangular IR-drop waveform can be reconstructed.
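The read-out flow above can be sketched end to end. All table values below are hypothetical placeholders for the production-test tables of Fig. 12, and piecewise-linear interpolation is our choice for table checking. The SPICE-generated V_p-t_w-N_norm planes are stood in for by an assumed model T(t) = T_0[1 + s·v(t)], with s the normalized (∂T/∂VDD). None of these are the authors' data or software.

```python
from bisect import bisect_left
from typing import Sequence


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear lookup; xs must be ascending; clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical per-RO tables (N_0 sorted ascending for the lookup).
N0_TABLE = [110.0, 115.0, 120.0]   # noise-free counts
K_TABLE = [100.0, 25.0, -40.0]     # corresponding temperatures (deg C)
K_AXIS = [-40.0, 25.0, 100.0]
S_TABLE = [0.48, 0.52, 0.56]       # normalized dT/dVDD (1/V)


def n_norm_model(v_peak, t_w, s, window=7.0, start=1.0, steps=2000):
    """Stand-in for one V_p-t_w-N_norm plane: mean of 1 / (1 + s * v(t))."""
    half, acc = t_w / 2, 0.0
    for k in range(steps):
        t = (k + 0.5) * window / steps
        inside = start <= t <= start + t_w
        v = v_peak * (1 - abs(t - start - half) / half) if inside else 0.0
        acc += 1.0 / (1.0 + s * v)
    return acc / steps


def reconstruct(n: float, n0: float, t_w: float):
    """Return (K, s, V_p) from the read-out N, N_0 and the decoded width t_w."""
    k = interp(n0, N0_TABLE, K_TABLE)     # step 1: N_0 -> temperature K
    s = interp(k, K_AXIS, S_TABLE)        # step 2: K -> dT/dVDD
    target = n / n0                       # normalized count N_norm
    lo, hi = 0.0, 0.5                     # step 3: solve the plane for V_p
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if n_norm_model(mid, t_w, s) > target:
            lo = mid
        else:
            hi = mid
    return k, s, 0.5 * (lo + hi)


true_nn = n_norm_model(0.10, 1.5, 0.52)
k, s, vp = reconstruct(true_nn * 115.0, 115.0, 1.5)
print(k, s, round(vp, 4))
```

In a real flow, step 3 would query the stored plane (or a fitted formula) instead of re-evaluating a model; bisection is used here only because the stand-in plane is monotonic in V_p.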
IV. EXPERIMENTAL RESULTS
The TRO system has been implemented in the 45 nm Nangate technology [36]. The Fast RO has the minimum length of three inverters, and the ripple counter is 8 bits wide. Six versions of the Edge Detector are implemented with 10-25 branches. The average TRO system area is 684.8 μm². The simulation is performed over the temperature range from −40 °C to 100 °C. The standard power supply is 1.05 V, and the measurement window M is 7 ns. To verify TRO's performance across process variations, 1% t_ox, 5% W, 10% L, and 25% V_th variations have been applied across the chip. At a noise-free 1.05 V supply, the period of the front-end RO is around 56-66 ps depending on the variations.
The proposed system has been inserted into the ITC benchmarks s15850, s13207, b14, and b19, and into the 64-bit floating point and graphics unit (FGU) of the OpenSPARC T2 SPARC core. For sensor location selection, the IR-drop maps of the circuits under various functional patterns are obtained first, as shown in Fig. 13. Red regions indicate potentially large IR-drop during operation, where the TRO system should be inserted. Because the areas with higher IR-drop are highly congested, the Control Registers and Decision Logic of TRO can be placed in less congested areas. Table II presents the number of TRO front ends as well as the total area overhead for each benchmark; as the benchmark size increases, the area overhead per TRO decreases. The comparison between TRO and other typical IR-drop monitors is shown in Table III. As expected, compared with the ADC [9], TRO avoids the very high-frequency sampling clock and custom layout, which gives it an advantage in area and power. Compared with the time-delay converter [17] and the VCO [13], TRO additionally provides IR-drop width and peak information with acceptable area and power overheads.
TABLE II: Area, in-field running, and lookup table generation overheads of TRO.
A. TRO Sampling and Measurement Windows Decision
Since the Edge Detector needs M/W branches, a proper sampling window length W allows fewer branches, and thus lower overhead, without losing the details of the falling and rising voltage edges. Table IV shows the impact of the sampling window length on noise width measurement accuracy for six sampling window lengths from 0.7 to 0.2 ns. The noises under measurement have widths from 1 to 3.5 ns and a peak of 200 mV. Table IV also shows that, for noises wider than 2.5 ns, the 0.2-ns sampling window reduces the width error to around 3%, which meets our requirement. Fig. 14 shows the Edge_Indicator bit values for various sampling window scenarios. The stems are located at the boundary of each sampling window and represent the value of a specific Edge_Indicator bit. Because of the branch transmission delay, each bit value represents the IR-drop scenario after its stem. Thus, bit 2 of Fig. 14(a) flips, while bit 6 does not. In our implementation, the six Edge Detectors shown in Fig. 3 are located at various distances from the IR-drop source, and the actual IR-drop seen at each Edge Detector is also shown in the figure.
The Edge Detector waveform for a 1-ns-wide IR-drop with a 0.125-ns sampling window is shown in Fig. 15. Edge_Indicator[8] switches first when the IR-drop arrives, and Edge_Indicator[15] switches last before the IR-drop disappears, which gives an accurate IR-drop waveform width measurement. Note that, because the in-field temperature is recorded, the Edge Detector branch sensitivity can always be determined during customer return. Therefore, if an Edge_Indicator value such as 11111011111 occurs, the sensitivity test can reveal whether the 0 is caused by a failing (defective) branch. If it is, the 0 is not counted; if not, there may be double peaks in the measurement window. In a region with falling-edge cells, double peaks may occur after both the rising and falling edges of the system clock. In that case, the measurement window should be shrunk to half a clock cycle so that each measurement window includes only one peak. However, if more than two clocks are intensely used in the same region, three or more peaks may occur, and the measurement window must be adjusted carefully to cover only one IR-drop peak.
Fig. 15. Waveform of the Edge Detector when measuring a 1-ns-wide IR-drop noise using a 0.125-ns sampling window. Note that the small sampling window is generated without a high-frequency sampling clock.
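The handling of an Edge_Indicator pattern such as 11111011111 can be sketched as splitting the bits into runs of flipped branches while skipping branches that the sensitivity test has marked as defective. The left-to-right bit order of the string and the `defective` set are assumptions for illustration.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple


def flipped_runs(bits: Sequence[int],
                 defective: FrozenSet[int] = frozenset()) -> List[Tuple[int, int]]:
    """Return (first, last) indices of each contiguous run of flipped branches.

    Defective branches are skipped: they neither start nor end a run, so a
    stuck-at-0 branch inside one IR-drop event does not split it in two.
    More than one run suggests several IR-drop peaks in the window.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_on: Optional[int] = None
    for k, bit in enumerate(bits):
        if k in defective:
            continue
        if bit:
            if start is None:
                start = k
            last_on = k
        elif start is not None:
            runs.append((start, last_on))
            start = None
    if start is not None:
        runs.append((start, last_on))
    return runs


pattern = [int(c) for c in "11111011111"]
print(flipped_runs(pattern, frozenset({5})))  # -> [(0, 10)]: one peak
print(flipped_runs(pattern))                  # -> [(0, 4), (6, 10)]: possible double peak
```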
To accurately measure the average IR-drop, the Fast RO has to complete enough oscillations within the measurement window. According to (1), for a 1% allowable oscillation period error, the measurement window should be 100 times the Fast RO period. Considering that the maximum countable Fast RO period is 50-60 ps for 45-28 nm technologies, the measurement window is chosen to be 7 ns in this implementation. If a 5% average IR-drop error is allowed, the measurement window can be reduced to 1.4 ns. Because many customer returns are caused by excessive IR-drop during in-field BIST, monitoring IR-drop in BIST mode is necessary. The BIST clock consists of a 20-40 MHz shift clock and a single full at-speed clock cycle. The 7-ns measurement window clearly works for the shift case, and because the at-speed capture cycle is single, a measurement window that starts at the capture clock and ends 7 ns later still covers only one peak IR-drop event. To check functional test cases with consecutive GHz clocks, however, full-chip SPICE simulations are performed with the clock swept from 200 MHz to 2.5 GHz for ITC'99 s38417 with intense circuit activity, which mimics the regional (or IP-level) IR-drop of a large industrial chip. The IR-drop width at various frequencies is shown in Fig. 16. Because of the parasitics of the power/ground network, the IR-drop width does not shrink proportionally with the operating frequency. In fact, at 1.5 GHz and higher frequencies, the IR-drop lasts for the entire clock cycle. As a result, only part of the IR-drop recovers by the end of a clock cycle; for example, at 2.5 GHz, only 30% of the IR-drop peak recovers. In this case, transient IR-drop measurement has low significance and accuracy, and the average IR-drop obtained by ROs, together with the IR-drop peak obtained by architectures such as [37], can be used to describe the IR-drop waveform.
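The window arithmetic can be reproduced under one assumption: the ±1-count quantization of N limits the relative error of the measured period to about T/M, so M ≥ T/ε for an allowed error ε, which is consistent with the 100-period rule quoted from (1). With T ≈ 70 ps, a value we infer only because it reproduces both the 7-ns and 1.4-ns figures, this yields the two window lengths above. The second helper checks that the 8-bit ripple counter of this implementation cannot overflow: at the fastest noise-free period of 56 ps, a 7-ns window holds at most 125 oscillations, which needs 7 bits.

```python
import math


def min_window_ns(period_ps: float, rel_err: float) -> float:
    """Shortest window M with T / M <= rel_err (+/-1-count quantization)."""
    return period_ps / rel_err / 1000.0


def counter_bits(window_ps: int, min_period_ps: int) -> int:
    """Bits needed so the ripple counter does not wrap within one window."""
    max_count = -(-window_ps // min_period_ps)   # ceiling division
    return math.ceil(math.log2(max_count + 1))


print(round(min_window_ns(70, 0.01), 3))   # 1% error -> 7.0 ns
print(round(min_window_ns(70, 0.05), 3))   # 5% error -> 1.4 ns
print(counter_bits(7000, 56))              # -> 7, so 8 bits leave headroom
```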
B. TRO Lookup Table Generation
For each specific Fast RO, the N_0-K and K-(∂T/∂VDD) lookup tables are generated during production ATE test. To obtain the N_0-K lookup table, the selected Fast RO is enabled to oscillate with little circuit activity at temperatures from −40 °C to 120 °C in steps of 10 °C.
The measured N_0-K lookup table for a Fast RO is shown in Fig. 17(a). The K-(∂T/∂VDD) lookup table is obtained under the same temperature settings, but with VDD ramping from 0.9 to 1.1 V; the measured table for the same Fast RO is shown in Fig. 17(b). Under the triangular IR-drop model, the relationship between the IR-drop peak V_p, the width t_w, and the RO's oscillation speed N_norm is fixed by (∂T/∂VDD), which contains all impacts of process variations and temperature. Therefore, the V_p-t_w-N_norm lookup table set can be generated at various (∂T/∂VDD) values by SPICE simulation in the design stage without separately considering process variations and temperature. The V_p-t_w-N_norm plane set for the implemented Fast RO is shown in Fig. 18.
It should be noted that, to speed up the lookup table checking process, the $V_p$-$t_w$-$N_{norm}$ lookup table can be expressed as a fitting formula, such as
where $a$, $b$, and $c$ equal 1.03, −0.272, and −0.0364, respectively, for the ∂T/∂VDD = 0.52 case. The sum of squared errors and the R-squared value are 0.0016 and 0.9802, respectively, showing that the fitting formula accurately describes the $V_p$-$t_w$-$N_{norm}$ relationship. When on-chip or off-chip memory is limited, only $a$, $b$, and $c$ need to be stored for each ∂T/∂VDD value, which significantly reduces memory usage.
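Obtaining three coefficients and the quoted goodness-of-fit figures is an ordinary linear least-squares problem. Because the fitting formula itself is not reproduced in this section, the sketch below leaves the basis functions to the caller; the three-term plane basis and the synthetic coefficients used in the example are hypothetical and are not the paper's formula or data. SSE and R-squared are computed with their standard definitions.

```python
import numpy as np


def fit_coefficients(vp, tw, n_norm, basis):
    """Least-squares fit of N_norm ~ sum_j coef_j * basis[j](V_p, t_w).

    Returns (coefficients, SSE, R^2). The basis functions are chosen by the
    caller because the exact fitting formula is not reproduced here.
    """
    vp, tw, y = (np.asarray(a, dtype=float).ravel() for a in (vp, tw, n_norm))
    design = np.column_stack([f(vp, tw) for f in basis])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    sse = float(residual @ residual)               # sum of squared errors
    sst = float(np.sum((y - y.mean()) ** 2))       # total sum of squares
    return coef, sse, 1.0 - sse / sst


# Hypothetical three-term basis (a plane in V_p and t_w), used only as an example.
PLANE_BASIS = [
    lambda v, t: np.ones_like(v),
    lambda v, t: v,
    lambda v, t: t,
]

if __name__ == "__main__":
    # Synthetic samples from a known plane, to show that the fit recovers it.
    vp, tw = np.meshgrid(np.linspace(0.05, 0.30, 6), np.linspace(0.5, 3.0, 6))
    n_norm = 1.0 - 0.3 * vp - 0.04 * tw
    coef, sse, r2 = fit_coefficients(vp, tw, n_norm, PLANE_BASIS)
    print(np.round(coef, 4), sse, r2)
```

Whatever the true basis is, only the fitted coefficients need to be stored per ∂T/∂VDD plane, which is the memory saving described above.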
C. TRO-Based IR-Drop Reconstruction and Accuracy Analysis
$N_0$, $N$, and $t_w$, which reflect the temperature, the average IR-drop, and the IR-drop width, respectively, are measured in-field. Then, during customer-return or routine data analysis, the IR-drop peak $V_p$ is obtained through the lookup table checking flow. Fig. 19 shows an actual IR-drop with a 2.6-ns width and a 100-mV peak, together with the waveforms reconstructed using 0.7- and 0.2-ns sampling windows. The reconstruction errors are calculated by (10) and (11), where $V_p$ and $t_w$ are the actual IR-drop peak and width, $\hat{V}_p$ and $\hat{t}_w$ are the TRO-measured values, and $Err_{width}$ and $Err_{peak}$ are the reconstruction errors:

$$Err_{width} = \frac{|\hat{t}_w - t_w|}{t_w} \times 100\% \tag{10}$$

$$Err_{peak} = \frac{|\hat{V}_p - V_p|}{V_p} \times 100\% \tag{11}$$

Hence, the reconstructed waveform for the 0.2-ns sampling window ($W$ = 0.2 ns) in Fig. 19 has a 3.8% width error and a 9.7% peak error. Table V lists the reconstruction errors for IR-drop waveforms of various widths and peaks. With a 0.2-ns sampling window, the width and peak errors are below 3.20% and 4.75%, respectively, for all three IR-drop waveforms, but the reconstruction error increases as the IR-drop width/peak shrinks to 1.1 ns/0.1 V. To verify TRO's performance under process variations, 1% $t_{ox}$, 5% channel width ($W$), 10% channel length ($L$), and 25% $V_{th}$ variations have been added. Monte Carlo simulation is performed with 100 samples at 25 °C. The results, shown in Fig. 20, indicate that for 97% of the Monte Carlo samples the IR-drop peak reconstruction error stays below 6.8% and the width error below 9.0%.
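Once $\hat{V}_p$ and $\hat{t}_w$ are known, reconstructing the waveform and scoring it with (10) and (11) is straightforward. The sketch below assumes a symmetric triangle (the section says only "triangular IR-drop model") with a caller-supplied start time; the helper names and example numbers are hypothetical, not values from Fig. 19 or Table V.

```python
from typing import List, Sequence


def relative_error_pct(measured: float, actual: float) -> float:
    """Reconstruction error in percent, following (10) and (11)."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(measured - actual) / actual * 100.0


def triangle_waveform(vp: float, tw: float, t_start: float,
                      times: Sequence[float]) -> List[float]:
    """Sample a symmetric triangular IR-drop (assumed shape).

    The drop is 0 outside [t_start, t_start + tw] and reaches vp at the midpoint.
    """
    half = tw / 2.0
    center = t_start + half
    samples = []
    for t in times:
        d = abs(t - center)
        samples.append(vp * (1.0 - d / half) if d < half else 0.0)
    return samples


if __name__ == "__main__":
    # Illustrative values only: a 2.6-ns, 100-mV drop starting at t = 1 ns
    times = [0.1 * k for k in range(50)]  # 0 to 4.9 ns in 0.1-ns steps
    actual = triangle_waveform(0.100, 2.6, 1.0, times)
    print(round(max(actual), 3))
    print(relative_error_pct(measured=2.5, actual=2.6))  # width error for a hypothetical 2.5-ns estimate
```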
D. TRO Sensitivity Analysis
The resolution of the IR-drop width measurement depends on the buffer delay between the branches of the Edge Detector; hence, the minimum detectable width equals the delay of two to three of the fastest buffers. However, a very narrow IR-drop with a small peak has little impact on the measured oscillation period of a Fast RO, which limits the peak measurement resolution. In this implementation, the peak sensitivity is 350 mV for a 0.4-ns IR-drop width and improves to 100 mV for a 1-ns width. For 1-ns-wide noise with peaks above 100 mV, the width and peak measurement errors fall below 6.82% and 9%, respectively. It should be noted that the peak sensitivity can be improved by implementing a faster RO in a more advanced technology node.
Algorithm 1 (lines 9-13):
9: if ($P_i < P_{thd}$) then
10:   In-field adaptation (Adapt_En = 1)
11: else
12:   No adaptation (Adapt_En = 0)
13: end if
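The width side of this trade-off can be illustrated with an idealized Edge Detector model. The sketch assumes, beyond what the section states, that width is resolved in whole buffer delays and that a pulse is missed if it is shorter than two (or three) buffer delays; integer picoseconds avoid floating-point rounding at tap boundaries. The function name and the 15-ps buffer delay are hypothetical, and the peak sensitivity, which depends on the Fast RO response, is not modeled.

```python
def detected_width_ps(true_width_ps: int, buf_delay_ps: int, min_buffers: int = 2) -> int:
    """Width reported by an idealized Edge Detector that resolves whole buffer delays.

    Hypothetical model: a pulse shorter than min_buffers buffer delays is not
    detected (returns 0); otherwise the width is truncated to a multiple of the
    buffer delay, so the resolution equals one buffer delay.
    """
    if buf_delay_ps <= 0 or min_buffers < 1:
        raise ValueError("buffer delay and min_buffers must be positive")
    if true_width_ps < 0:
        raise ValueError("width must be non-negative")
    taps = true_width_ps // buf_delay_ps
    return taps * buf_delay_ps if taps >= min_buffers else 0


if __name__ == "__main__":
    buf = 15  # hypothetical fastest-buffer delay in ps, not a value from the paper
    for width in (20, 30, 44, 400, 1000):
        print(width, detected_width_ps(width, buf), detected_width_ps(width, buf, min_buffers=3))
```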
E. In-Field Adaptation Analysis
When the current $P_i$ exceeds the predefined threshold $P_{thd}$, adaptation is initiated: the Adapt_En signal immediately switches high, triggering DVFS to act (e.g., raising the supply voltage or lowering the frequency), as shown in Fig. 21. By combining the proposed Differentiator with an adaptation grid of 1/4 of the system clock cycle, the proposed system, unlike ordinary adaptation algorithms, can generate Adapt_En within 25% of a system clock cycle. Depending on where the IR-drop peak occurs within a system clock cycle, the adaptation reaction time ranges from 25% to 100% of a system clock cycle. Fig. 21 shows the generated Adapt_En signals for two noise cases, while the algorithm for the TRO-based implementation, measurement, and adaptation flow is shown in Algorithm 1.
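The quarter-cycle adaptation grid can be illustrated with a short sketch. It assumes that a decision quantity P is evaluated once per grid point and that Adapt_En rises at the first grid point where P exceeds $P_{thd}$, following the prose above; lines 9-13 of Algorithm 1 write the test as $P_i < P_{thd}$, so the comparison should be flipped if $P_i$ is defined that way. The Differentiator itself is not modeled, and the function name and trace values are hypothetical.

```python
from typing import Optional, Sequence


def first_adapt_time_ns(p_at_grid: Sequence[float], p_thd: float,
                        clock_period_ns: float, grid_divisions: int = 4) -> Optional[float]:
    """Time of the first adaptation-grid point at which P exceeds P_thd, or None.

    Grid points are clock_period_ns / grid_divisions apart, so a crossing that
    happens between two points is flagged at the next one, at most one grid
    step later (a quarter cycle for grid_divisions = 4).
    """
    if clock_period_ns <= 0 or grid_divisions < 1:
        raise ValueError("clock period and grid divisions must be positive")
    step = clock_period_ns / grid_divisions
    for i, p in enumerate(p_at_grid):
        if p > p_thd:  # prose convention: adapt when P_i exceeds P_thd
            return i * step  # Adapt_En would rise here
    return None  # no adaptation in this trace


if __name__ == "__main__":
    # Hypothetical P trace sampled every quarter cycle of a 1-GHz clock (1-ns period)
    trace = [0.10, 0.20, 0.35, 0.60, 0.80, 0.40]
    print(first_adapt_time_ns(trace, p_thd=0.5, clock_period_ns=1.0))  # prints 0.75
```

A finer grid would shorten the worst-case decision latency at the cost of evaluating P more often; the quarter-cycle choice matches the 25% bound stated above.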
V. CONCLUSION
In this paper, a novel transient IR-drop waveform monitoring system named TRO is proposed. Instead of directly measuring the transient IR-drop waveform with analog circuitry, TRO measures the width and average of the IR-drop waveform with all-digital circuitry, then obtains the IR-drop peak through a lookup table checking process. Finally, the IR-drop waveform is reconstructed. The all-digital design gives TRO low area and power overhead as well as low implementation effort. Through the proposed lookup table checking process, the impact of temperature and process variations is also removed from the IR-drop measurement result. Based on the reconstruction results, for 1-ns-wide noise with peaks above 100 mV, the width and peak measurement errors fall below 6.82% and 9%, respectively. As soon as excessive IR-drop is detected, the proposed system enables fast adaptation and mitigates the noise within one clock cycle to prevent functional failure.
