Abstract-The effect of DC BTI stress on the clock signal's duty-cycle has been experimentally verified for the first time, based on precise frequency shift measurements from Ring OSCillators (ROSCs). A simple and practical methodology based on the "silicon odometer" beat-frequency detection framework is proposed for accurately measuring duty-cycle shifts while preventing unwanted BTI recovery. The measurement results from a 65nm test chip are used to further analyze the impact of asymmetric BTI aging during clock gated mode on SRAM timing signals.
I. INTRODUCTION
Low power SRAMs, dynamic register files, and domino gates typically rely on both the rising and falling edges of the clock to generate internal timing signals. Unlike standard flip-flop or latch based pipelines, where only the primary clock edge (e.g. the rising edge) is utilized, the performance of these circuits is directly affected by any change in the clock duty-cycle. Bias Temperature Instability (BTI) stress in the clock signal path during idle or clock gated mode results in an aging-induced duty-cycle shift. Fig. 1 illustrates this situation for a typical clock buffer chain. In idle or clock gated mode, the input clock signal is not switching, which results in a DC stress condition with NBTI and PBTI occurring in alternate gates. When the circuit is switched back to active mode, the first clock edge (e.g. the first rising edge in Fig. 1) propagates through unstressed fresh devices, while the second edge (i.e. the first falling edge in Fig. 1) traverses the stressed devices. Consequently, the delay of the second edge becomes longer than that of the first edge due to BTI under DC stress, resulting in a duty-cycle shift. Simulation results in Fig. 2, based on a 540ps delay path driven by a 1GHz clock, show that the delay of the first edge is almost constant while the delay of the second edge is degraded by 110ps for a 20% V_t shift, causing the duty-cycle to change from 50% before stress to 61% after stress.
Figure 2. Simulated delay and duty-cycle shifts of a 540ps signal delay (=t_d) path driven by a 1GHz clock signal (=1/T_CLK). The duty-cycle shift is a function of the initial duty-cycle, t_d, T_CLK, and the degradation of the 2nd signal edge delay.
Even though this effect has drawn the attention of designers, none of the previous aging sensors [1]-[9] were able to verify it experimentally. In this work, we present a simple and practical duty-cycle characterization method based on the "silicon odometer" beat-frequency detection framework [4, 5].
II. UTILIZING THE SILICON ODOMETER FRAMEWORK FOR DUTY-CYCLE CALCULATION
Our silicon odometer beat-frequency detector, which is capable of measuring the stress-induced percentage change in the period of a Ring OSCillator (ROSC), can readily be used to estimate the percentage change in the duty-cycle of a clock driven by a chain of inverters under stress. In this section, we provide the mathematical derivation showing how the measurements from the odometer circuit can be used to calculate the duty-cycle degradation.
Consider a ROSC with m inverter stages. During stress mode, the ROSC loop is open (NAND gate not shown in Fig. 3) and all inverters are exposed to DC BTI stress. Since the NMOS and PMOS devices along the ROSC signal path are alternately stressed, the shift of the ROSC period can be expressed as:
where Δt_inv,PU and Δt_inv,PD are the degradations in the inverter pull-up and pull-down delays, respectively. Now consider an inverter chain with n stages that has undergone the same amount of stress. The shift in the duty-cycle (D.C.) of the output signal becomes:
Here, t_d and t_d' are the total propagation delays of the n-stage inverter chain before and after stress, respectively (see Fig. 1), and T_CLK is the period of the input clock signal to the inverter chain. Since t_d is equal to half the period of an unstressed ROSC with the same number of stages as the inverter chain, we can rewrite (t_d' - t_d)/t_d as ΔT_ROSC/(T_ROSC/2), assuming the fresh pull-up and pull-down delays are the same. Note that this quantity is independent of the number of stages m. This means that the duty-cycle degradation of the output clock can be expressed as:
If we assume that the initial duty-cycle is 50%, the duty-cycle after stress can be described as:
Therefore, by using the measured data ΔT_ROSC/T_ROSC from the odometer circuit and the design specific parameters t_d and T_CLK obtained from circuit simulations, we can accurately calculate the duty-cycle shift of an arbitrary signal path.
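Written out, the relations described in this section take the form below. This is a sketch reconstructed from the definitions in the text, so the notation of the numbered equations may differ. It assumes equal fresh pull-up and pull-down delays, T_ROSC = m(t_inv,PU + t_inv,PD), t_d = (n/2)(t_inv,PU + t_inv,PD), and that the stressed edge is the one that lengthens the high phase of the clock.

```latex
\begin{align}
\Delta T_{ROSC} &= \frac{m}{2}\left(\Delta t_{inv,PU} + \Delta t_{inv,PD}\right) \\
\Delta D.C. &= \frac{t_d' - t_d}{T_{CLK}}
             = \frac{n}{2}\cdot\frac{\Delta t_{inv,PU} + \Delta t_{inv,PD}}{T_{CLK}} \\
\Delta D.C. &= \frac{t_d}{T_{CLK}}\cdot\frac{\Delta T_{ROSC}}{T_{ROSC}/2} \\
D.C._{stressed} &= 50\% + 2\,\frac{t_d}{T_{CLK}}\cdot\frac{\Delta T_{ROSC}}{T_{ROSC}}
\end{align}
```

As a consistency check, dividing the first relation by T_ROSC/2 and the second by t_d/T_CLK gives the same ratio (Δt_inv,PU + Δt_inv,PD)/(t_inv,PU + t_inv,PD), in which m and n cancel.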
Similar to the inverter chain example described above, we can compute the duty-cycle shift of a random logic path consisting of arbitrary gates (e.g. NAND, NOR, INV). The propagation delay of a random logic path can be expressed as:
where i and j denote the stages with a pull-up and pull-down transition, respectively. Next, we assume that the amount of delay degradation is a linear function of the threshold voltage shift and that the ratio between PBTI and NBTI is α. That is,
Since the delay degradation depends on the type of gate [10] as well as the fanout (FO) [5], we introduce a sensitivity parameter γ to map the delay degradation of an arbitrary gate with an arbitrary FO to that of an inverter with a known fanout of one. The degradation of the path delay can now be written as:
Finally, the duty-cycle after stress can be expressed as:
which can be easily calculated using the measured ROSC data (ΔT_ROSC, T_ROSC) and the various design specific parameters.
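One way to write out the random logic path case is sketched below. The exact form of the numbered equations is not recoverable from the text, so several points are assumptions: α is taken as the ratio of pull-down (PBTI) to pull-up (NBTI) delay degradation of the reference inverter, γ_i and γ_j are the per-stage sensitivity parameters, the ROSC is assumed to be built from the same fanout-of-one reference inverters, and every stage on the path is assumed to be stressed.

```latex
\begin{align}
t_d &= \sum_i t_{PU,i} + \sum_j t_{PD,j} \\
\Delta t_{inv,PD} &= \alpha\,\Delta t_{inv,PU} \\
\Delta t_d &= \sum_i \gamma_i\,\Delta t_{inv,PU} + \sum_j \gamma_j\,\Delta t_{inv,PD}
            = \Big(\sum_i \gamma_i + \alpha\sum_j \gamma_j\Big)\,\Delta t_{inv,PU} \\
\Delta T_{ROSC} &= \frac{m}{2}\,(1+\alpha)\,\Delta t_{inv,PU} \\
D.C._{stressed} &= 50\% + \frac{\Delta t_d}{T_{CLK}}
                 = 50\% + \frac{2\left(\sum_i \gamma_i + \alpha\sum_j \gamma_j\right)}{m\,(1+\alpha)}\cdot\frac{\Delta T_{ROSC}}{T_{CLK}}
\end{align}
```

For an n-stage inverter chain (γ = 1, with n/2 pull-up and n/2 pull-down stages) the last line reduces to 50% + (n/m)ΔT_ROSC/T_CLK, which agrees with the inverter chain result obtained from (t_d' - t_d)/t_d = ΔT_ROSC/(T_ROSC/2).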
The block diagram of the silicon odometer beat-frequency detection system is shown in Fig. 3 [4, 5]. A D-flip-flop is used to sense the frequency difference (=beat frequency) between a stressed ROSC and an identical fresh reference ROSC. A counter records the beat frequency by counting the number of reference ROSC periods during one period of the beat output. The counts are then scanned out at different stress times for duty-cycle calculation.
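The conversion from scanned-out counts to a fractional period shift can be sketched as follows. This is a minimal illustration, not the authors' post-processing code. It assumes that the stressed ROSC runs slightly slower than the reference even before stress (so the fresh count is finite), that the reference ROSC does not age, and that the count N equals f_ref/(f_ref - f_stress). The function name `period_shift_from_counts` is hypothetical.

```python
def period_shift_from_counts(n_fresh: float, n_stressed: float) -> float:
    """Fractional period increase (dT/T) of the stressed ROSC between two readings.

    Assumption: each count N is the number of reference periods in one beat
    period, i.e. N = f_ref / (f_ref - f_stress), so f_stress / f_ref = 1 - 1/N.
    """
    if n_fresh <= 1 or n_stressed <= 1:
        raise ValueError("beat counts must be greater than 1")
    freq_ratio_fresh = 1.0 - 1.0 / n_fresh        # f_stress / f_ref before stress
    freq_ratio_stressed = 1.0 - 1.0 / n_stressed  # f_stress / f_ref after stress
    # Period is the inverse of frequency, so T_after / T_before is the
    # ratio of the fresh frequency to the stressed frequency.
    return freq_ratio_fresh / freq_ratio_stressed - 1.0


if __name__ == "__main__":
    # A count dropping from 100 to 50 corresponds to roughly a 1% period shift.
    print(f"{period_shift_from_counts(100, 50):.4%}")
```

Under these assumptions, a small period shift produces a large change in the count, which is what makes the beat-frequency readout sensitive.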
Simulation results in Fig. 4 show an excellent match between the duty-cycle calculated from silicon odometer data and the actual value. Duty-cycle shifts based on 65nm odometer test chips [9] under different stress conditions are plotted in Fig. 5. The amount of duty-cycle shift is inversely proportional to T_CLK and increases linearly with t_d, as shown in Fig. 6, which is also predicted by equation (4).
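The dependence on t_d and T_CLK can be checked numerically with a short sketch. It assumes that equation (4) has the form D.C. = 50% + 2(t_d/T_CLK)(ΔT_ROSC/T_ROSC), which follows from the substitution (t_d' - t_d)/t_d = ΔT_ROSC/(T_ROSC/2) given in Section II. The function name and argument names are hypothetical.

```python
def duty_cycle_after_stress(t_d: float, t_clk: float,
                            rosc_shift: float, initial: float = 0.5) -> float:
    """Duty-cycle after stress, as a fraction between 0 and 1.

    t_d        : fresh propagation delay of the path (seconds)
    t_clk      : clock period (seconds)
    rosc_shift : measured fractional ROSC period shift, dT_ROSC / T_ROSC
    initial    : duty-cycle before stress (assumed 50% by default)
    """
    if t_clk <= 0:
        raise ValueError("clock period must be positive")
    return initial + 2.0 * (t_d / t_clk) * rosc_shift


if __name__ == "__main__":
    # Fig. 2 numbers: a 540ps path, 1GHz clock, second edge degraded by 110ps.
    # (t_d' - t_d)/t_d = 110/540, so dT_ROSC/T_ROSC = 110/1080.
    dc = duty_cycle_after_stress(540e-12, 1e-9, 110.0 / 1080.0)
    print(f"duty-cycle after stress: {dc:.1%}")
```

With the Fig. 2 values this reproduces the 50% to 61% change quoted in the introduction. Doubling `t_d` doubles the shift and doubling `t_clk` halves it.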
III. IMPACT OF DUTY-CYCLE SHIFT ON SRAM TIMING
With the proposed characterization method, we can estimate the duty-cycle of critical SRAM timing signals, which can then be used to investigate the performance degradation through circuit level simulation. We focus on the read operation of a low-power 6T SRAM [11] with a 512x256 subarray configuration, as shown in Fig. 7. When the clock is gated during idle mode, the internal clock driven paths suffer DC BTI stress, as shown in Fig. 8. Because of the duty-cycle modulation due to stress, the SRAM internal control signals corresponding to Phase '1' (=address decoding, wordline driving, bitline discharging) and Phase '0' (=sense amp sensing, bitline precharging, data latching) of the clock are elongated and shortened, respectively. The simulation waveforms of these signals with (dashed line) and without (solid line) stress are shown in Fig. 9. In the presence of BTI stress, the sense amplifier enable (SAEN) signal is further delayed, adding to the clock (CLK) to data-out (DOUT) delay and thereby increasing the read access time. Read performance can also be affected by the shorter bitline bar (BLB) precharging time. The worst case occurs when a write operation is followed by a read operation and the write and read data are of opposite polarity. In this case, due to the shorter precharge time before the read operation, BL/BLB may not be fully charged, which can lead to increased sensing time or even a read failure. This insight can be used to resize the precharge circuits to prevent read failures under extreme duty-cycle shifts. Based on our simulation results, we find that after stress at 2.2V and 140°C for 2x10^6 seconds, the read access delay is increased by 3.5% (or 25ps), the wordline (WL) duty-cycle is increased from 49% to 54%, and the precharge duty-cycle is decreased from 41% to 36%, as shown in Fig. 10.
IV. CONCLUSIONS
DC BTI induced duty-cycle shift affects the performance of circuits that rely on both the rising and falling edges of the clock, particularly in low-power ICs that use clock gating. This duty-cycle shift is caused by BTI induced aging occurring alternately in the pull-up and pull-down networks of consecutive stages along a logic path during clock gated mode, so that the rising and falling edges undergo different delays. In this work, we proposed a simple and practical method to accurately measure the duty-cycle based on the silicon odometer beat-frequency detection framework. A duty-cycle shift of up to 6% is observed from a 65nm test chip stressed at 2.2V, 140°C for 3hrs. Based on the hardware aging results, we further analyzed the impact of aging on low power SRAM read performance and found that with 22 days of stress at 2.2V, 140°C, the read access time degrades by 3.5%.
