Abstract: Random telegraph noise (RTN) has recently become an important reliability issue in nanoscale circuits. This study proposes a simulation framework to evaluate the temporal performance of digital circuits under the impact of RTN at the 16 nm technology node. Two fast algorithms with linear time complexity are proposed: statistical critical path analysis and normal distribution-based analysis. The simulation results reveal that the average circuit delay degradation and variation induced by RTN are both >20% and the maximum degradation and variation can be >30%. The effect of power supply tuning and gate sizing techniques on mitigating RTN is also investigated.
Introduction
In recent years, as the channel length of MOSFETs continues to shrink into nanoscale, a variety of reliability mechanisms, such as negative bias temperature instability [1, 2] , time-dependent dielectric breakdown [3] and random telegraph noise (RTN) [4] , are becoming key challenges for circuit designers. During the working life of devices, these physical phenomena will degrade the electrical parameters such as the drain current (I d ) and the threshold voltage (V th ), leading to degradation of the circuit operation speed and logic failure. This paper addresses RTN since it is an emerging research topic.
RTN can cause electrical parameters (such as V th and I d ) to fluctuate randomly over time [5] . Recent studies have shown that the RTN-induced fluctuation becomes quite large and can be more significant than the random dopant fluctuation at the 22 nm technology node [6] . For example, the drain current fluctuation induced by RTN has already been identified as a large obstacle in both sub-V th and super-V th operation of digital circuits [7] . The variation of I d caused by RTN can be up to 40% for 30×30 nm devices [8] .
The physics of RTN has been widely investigated [7] [8] [9] [10] and the RTN effect on SRAM and flash memories has also been studied [11] [12] [13] [14] [15] [16] . Although some models that can be integrated into HSPICE analysis have been proposed [17] [18] [19] , the impact of RTN on the temporal performance of digital circuits has rarely been studied [20] . The contributions of this paper are as follows:
† This paper proposes a simulation framework to evaluate the impact of RTN on the temporal performance of digital circuits. Two fast simulation methods are proposed: statistical critical path analysis (SCPA) and normal distribution-based analysis (NDA). The computational complexity of both methods is O(N).
† The impact of RTN on circuit delay degradation and variation is investigated. The experimental results show that RTN degrades the circuit delay and increases the delay variation. The average delay degradation and variation are both > 20% at 16 nm technology node. The results also demonstrate that the performance degradation and variation will grow rapidly with supply voltage scaling down.
† The effect of power supply tuning and gate sizing techniques on mitigating RTN is investigated. The simulation results show that gate sizing is better than power supply tuning.
The rest of the paper is organised as follows. Section 2 reviews some previous work on RTN. Section 3 introduces the RTN model used in this paper. Section 4 proposes the RTN simulation framework and the evaluation methods. The simulation results are presented in Section 5. The impact of design techniques on RTN mitigation is investigated in Section 6. Finally, Section 7 concludes the paper.
from the capture and emission of the channel carriers by interface traps [9] . A systematic study of the channel length, width and gate overdrive dependencies of RTN effects was carried out in [7] . A new method to characterise the oxide traps considering the energy band structure of high-k/metal gate MOSFETs was proposed in [10] . In [21] , a method to determine whether an oxide trap leading to RTN was located in the high-k layer or the interface layer was proposed.
The RTN effect in SRAM and flash memories has been investigated recently. For example, the RTN effect in deca-nanometer flash memories was investigated in [11] and the statistical distribution of V th was also analysed. The read/write margins of scaled-down SRAM with/without RTN were simulated in [12] . In [14] , the impact of RTN on V min in scaled SRAM was analysed. It was reported that RTN-induced V min degradation could be up to 50 mV in 45 nm SRAM [13] . An accurate computational method for trap-level, non-stationary analysis of RTN in SRAMs was presented in [15] and a technique for predicting the impact of RTN on SRAMs/DRAMs in the presence of variability was further proposed in [16] . However, the continuous-time simulation approach used in [16] was too complex and not suitable for circuit-level performance evaluation.
It is believed that RTN can also be a serious issue in digital circuits. A Shockley-Read-Hall-based model to explain the RTN behaviour was proposed in [17] . A methodology to include RTN in circuit analysis was proposed in [18] and the transient analysis was applied to the four-quadrant Chible multiplier circuit. A two-stage L-shaped circuit that generates an RTN signal and is fully compatible with SPICE was proposed in [19] . In [20] , a time-domain delay model was used to simulate and measure the fluctuation of RTN. However, this approach could only be applied to simple circuits such as SRAM cells and ring oscillators because of its very high computational complexity. Hence in this paper, the delay characterisation of digital circuits is investigated and two fast algorithms are proposed for circuit-level RTN analysis. Design techniques for mitigating RTN are further studied, enabling time-domain analysis in nanoscale digital circuit design.
Modelling random telegraph noise
This section first presents the physics of RTN and then the RTN-induced ΔV th model for digital circuits is introduced.
Physics of RTN
The RTN effect originates from the capture/emission of charge carriers by the oxide traps, which induces correlated fluctuations of channel carrier number and mobility [9] . As shown in Fig. 1a , a carrier (the solid circle) is occasionally captured by a trap (the hollow circle) in the oxide and the carrier will be emitted back into the channel after a period of time. Multiple capture/emission events can occur at the same time, as shown in Fig. 1b [22] . The traps in the oxide have two states: the 'filled' state, which indicates that the carrier is captured by the trap, and the 'empty' state, which indicates that the carrier has been emitted back into the channel. For a given trap, the transition between the two states is inherently random and the activity of a single trap can be modelled as a two-state time-inhomogeneous Markov chain [15] .
In the time domain, because of the RTN effect, the drain current I d fluctuates between two levels, as shown in Fig. 2a . The high level of I d corresponds to the low level of RTN, at which the trap is empty and the carrier has been emitted back into the channel; the time spent in this state is the emission time τ e . Conversely, the low level of I d corresponds to the high level of RTN, at which the carrier is captured and the trap is filled; the time spent in this state is the capture time τ c [9] . Both the capture time τ c and emission time τ e are time-varying and they depend on the position of the traps, the trap energy level and the gate overdrive voltage V gs − V th [9, 15] . The typical values of τ c and τ e are about 1-1000 ms [9] .
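In the frequency domain, the power spectral density of the drain current I d shows a Lorentzian-shaped spectrum with a slope of 1/f^2 above the cut-off frequency, as shown in Fig. 2b [10] . The cut-off frequency is
$$f_{cut} = \frac{1}{2\pi\tau_{cut}} \quad (1)$$
The time constant τ cut is defined as [19]
$$\frac{1}{\tau_{cut}} = \frac{1}{\tau_c} + \frac{1}{\tau_e} \quad (2)$$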
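The relation between the two time constants and the Lorentzian corner can be illustrated numerically. The following is a minimal sketch, assuming the standard relations for a two-state telegraph process, 1/τ cut = 1/τ c + 1/τ e and f cut = 1/(2πτ cut ); the function names and the 10 ms example values are illustrative choices, not taken from the paper.

```python
import math


def rtn_cutoff(tau_c: float, tau_e: float) -> tuple[float, float]:
    """Return (tau_cut, f_cut) for a two-state RTN signal.

    ASSUMPTION: standard two-state telegraph process, where the
    effective time constant is the parallel combination of tau_c and tau_e.
    """
    tau_cut = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    f_cut = 1.0 / (2.0 * math.pi * tau_cut)
    return tau_cut, f_cut


def lorentzian_psd(f: float, f_cut: float, s0: float = 1.0) -> float:
    """Lorentzian spectrum: flat below f_cut, falling as 1/f^2 above it."""
    return s0 / (1.0 + (f / f_cut) ** 2)


# Illustrative values inside the 1-1000 ms range quoted in the text.
tau_cut, f_cut = rtn_cutoff(tau_c=10e-3, tau_e=10e-3)
```

With millisecond time constants the corner frequency lies in the tens of hertz, far below any clock frequency, which is consistent with treating the trap state as frozen during one circuit evaluation.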
RTN-induced V th fluctuation in digital circuits
To model the RTN effect in digital circuits, the equivalent circuit is used [14] , as shown in Fig. 3 . The high current state in Fig. 2a corresponds to the left device in Fig. 3 and there is no shift in the threshold voltage. The right device shows the low-current state induced by RTN, which is modelled by a shift in the threshold voltage ΔV th and the shift is given by Ye et al. [19]
$$\Delta V_{th} = \frac{nq}{C_{ox} W L} \quad (3)$$
where n is the number of oxide traps, q is the elementary charge, C ox is the oxide capacitance per unit area, and W and L are the channel width and channel length, respectively. Since the magnitude of single-trap-induced RTN rises sharply as the device shrinks [19] , this paper targets the single-trap-induced RTN fluctuation shown in Fig. 1a . Equation (3) indicates that RTN depends on the area of the device. Experiments show that the gate overdrive voltage can also affect the RTN amplitude, and the V gs dependence of ΔV th is approximately quadratic [20] 
where λ is a constant that can be fitted by experimental data. It has been shown that ΔV th can be > 70 mV for the smallest devices at the 22 nm technology node [6] , and [23] shows that the RTN amplitude increases superlinearly as the device size scales down. Hence, ΔV th is expected to be as much as 130 mV at the 16 nm technology node.
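The area dependence in (3) can be checked with a short calculation. This is a rough sketch covering only the area term nq/(C ox WL); it omits the V gs -dependent factor with λ, so it is not expected to reproduce the 130 mV estimate. The equivalent oxide thickness of 1 nm, the SiO2 permittivity and the 16 nm × 16 nm geometry are assumed values chosen for illustration.

```python
Q = 1.602176634e-19      # elementary charge (C)
EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)
EPS_SIO2 = 3.9           # relative permittivity of SiO2 (assumed dielectric)


def delta_vth(n_traps: int, width: float, length: float, eot: float) -> float:
    """Threshold shift from equation (3), area term only.

    ASSUMPTION: C_ox is derived from an equivalent oxide thickness `eot`
    (metres); the V_gs-dependent factor of the full model is left out.
    """
    c_ox = EPS0 * EPS_SIO2 / eot          # capacitance per unit area (F/m^2)
    return n_traps * Q / (c_ox * width * length)


# Single trap in a hypothetical 16 nm x 16 nm device with 1 nm EOT.
dv = delta_vth(n_traps=1, width=16e-9, length=16e-9, eot=1e-9)
```

Under these assumed values a single trap shifts V th by roughly 18 mV, and halving the gate area doubles the shift, which is the scaling trend the section describes.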
RTN evaluation in digital circuits
As described in Section 3, the capture time τ c and emission time τ e are both of millisecond order [9] , whereas the clock cycle of a digital circuit is of nanosecond order. The operation of a digital circuit is therefore much faster than the transition between the high- and low-current states, so during the operation time [t, t + Δt) of the digital circuit, all the traps are considered to keep their filled/empty states. Therefore the 'sampling' method can be used as shown in Fig. 4 : the trap states at time t are sampled to evaluate the RTN-induced temporal performance of the digital circuit at t.
The trap state of a MOSFET at time t can be described by a random variable S(t), which has two discrete values: 0 corresponding to empty state and 1 corresponding to filled state. The probability distribution of S(t) is determined by the capture time and emission time, which is given by
where r = τ c /τ e , which is a constant only depending on the trap energy level and Fermi level and its typical value is from 0.1 to 10 [19] . Thus, when the circuit is 'sampled' at time t, the threshold voltage of a given MOSFET is
$$V_{th}(t) = V_{th0} + S(t)\,\Delta V_{th} \quad (6)$$
where V th0 is the initial threshold voltage. Since all the traps in the device are independent, all S's are independent. Therefore by the 'sampling' method, Monte-Carlo (MC) simulations can be adopted to evaluate the circuit performance under RTN. One MC simulation can be considered as one 'sample' at some time node of the given circuit and the value of S can be randomly set to 0 or 1 according to the value of r. Then, traditional static timing analysis (STA) tools can be used for subsequent simulations. However, the MC simulations are time-consuming. Thus, new faster simulation algorithms will be proposed in the following sections.
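The sampling step of the MC approach can be sketched as follows. The filled-state probability used here, r/(1 + r), is an assumption: it is the stationary occupancy that follows if τ c is the mean time spent in the filled state and τ e the mean time in the empty state, as defined in the text. The function names and the gate tuples are hypothetical.

```python
import random


def filled_probability(r: float) -> float:
    """Probability that a trap is filled at the sampling instant.

    ASSUMPTION: stationary two-state occupancy with r = tau_c / tau_e,
    tau_c being the mean time in the filled state.
    """
    return r / (1.0 + r)


def sample_vth(vth0: float, dvth: float, r: float, rng: random.Random) -> float:
    """Draw S in {0, 1} and apply V_th = V_th0 + S * dV_th, as in (6)."""
    s = 1 if rng.random() < filled_probability(r) else 0
    return vth0 + s * dvth


def mc_samples(gates, r: float, n_runs: int, seed: int = 0):
    """Return n_runs independent V_th assignments, one per MC 'sample'.

    `gates` is a list of (vth0, dvth) pairs, one per device.
    """
    rng = random.Random(seed)
    return [[sample_vth(v0, dv, r, rng) for (v0, dv) in gates]
            for _ in range(n_runs)]
```

Each inner list would then be handed to an STA run to obtain one delay sample; repeating this many times is what makes plain MC slow.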
RTN evaluation framework
The proposed framework for RTN evaluation is shown in Fig. 5 . First, HSPICE is used to create a gate library based on the 16 nm predictive technology model (PTM) [24] . The gate library includes the delay, area and oxide capacitance of each gate type (e.g. NAND2X1, NAND2X4, OR2X1). Then, an in-house STA tool written in C++ is used to calculate the delay of all the paths in the circuit and find the critical paths. An RTN ΔV th calculator is used to calculate ΔV th of all the gates according to (4) . Finally, the delay distribution of the circuit is calculated by a delay distribution calculator. The next two sections introduce two algorithms for this distribution calculation step: the SCPA method and the NDA method.
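The STA step in the framework amounts to a longest-path computation over the gate netlist. The sketch below is not the paper's C++ tool; it is a minimal Python illustration that assumes the netlist is a directed acyclic graph, that primary inputs arrive at time 0, and that each gate has a single fixed delay. The gate names and delays are hypothetical.

```python
from graphlib import TopologicalSorter


def arrival_times(fanin: dict, delay: dict) -> dict:
    """Latest arrival time at each gate output.

    fanin maps a gate to the gates driving its inputs; delay maps a gate
    to its propagation delay. Primary inputs are assumed to arrive at t = 0.
    """
    arrival = {}
    # static_order() yields every gate after all of its drivers.
    for g in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[p] for p in fanin.get(g, ())), default=0.0)
        arrival[g] = latest_input + delay[g]
    return arrival


def critical_path(fanin: dict, delay: dict):
    """Return (delay, gates) of the slowest path, traced back from its end."""
    arrival = arrival_times(fanin, delay)
    node = max(arrival, key=arrival.get)
    path = [node]
    while fanin.get(node):
        node = max(fanin[node], key=arrival.get)
        path.append(node)
    return arrival[path[0]], path[::-1]


# Hypothetical four-gate netlist (delays in ns).
fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g3"]}
gate_delay = {"g1": 1.0, "g2": 2.0, "g3": 1.5, "g4": 0.5}
worst_delay, worst_path = critical_path(fanin, gate_delay)
```

A production flow would keep a set of near-critical paths rather than only the single worst one, since RTN can change which path is slowest.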
Statistical critical path analysis
The maximum circuit delay is determined by a set of critical paths in the circuit, which is described by
$$d_c = \max_i d_{cp,i} \quad (7)$$
where d c is the maximum circuit delay and d cp,i is the delay of the ith critical path. The delay of a critical path is
$$d_{cp} = \sum_j d_j \quad (8)$$
where d j is the delay of the jth gate in the path. The propagation delay of a logic gate j is
where K j is a coefficient related with device physical parameters, A j is the equivalent area of the gate, C L,j is the load capacitance and α is the velocity saturation index. Combined with (6), the RTN-induced delay shift of gate j is
Hence Δd j is also a random variable and has a similar probability distribution as S, which is given by
For simplicity, let
, and
The delay shift of a critical path is also a random variable
$$\Delta d_{cp} = \sum_j \Delta d_j \quad (12)$$
where Δd cp varies from 0 to Σ j t j . Since the Δd j 's are independent, the probability distribution of Δd cp can be calculated by convolving the probability distributions of all Δd j 's in the path (i.e. first the convolution of Δd 1 and Δd 2 is calculated, then Δd 3 is added, and so on until all Δd j 's are summed up).
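The gate-by-gate convolution can be written down directly, since each Δd j takes only the values 0 and t j . The sketch below assumes every gate is shifted with the same probability p (a hypothetical parameter standing in for the trap-filled probability) and stores the distribution as a dictionary from delay shift to probability.

```python
def convolve_two_point(pmf: dict, t: float, p: float) -> dict:
    """Convolve a PMF with a two-point variable: t with prob. p, else 0."""
    out = {}
    for shift, prob in pmf.items():
        out[shift] = out.get(shift, 0.0) + prob * (1.0 - p)
        out[shift + t] = out.get(shift + t, 0.0) + prob * p
    return out


def path_shift_pmf(ts, p: float) -> dict:
    """Exact PMF of the path delay shift, built one gate at a time."""
    pmf = {0.0: 1.0}                  # no gates processed: shift is 0
    for t in ts:
        pmf = convolve_two_point(pmf, t, p)
    return pmf


# Three hypothetical gates whose shifts never coincide: 2^3 outcomes.
exact = path_shift_pmf([1.0, 2.0, 4.0], p=0.5)
```

The support can double with every gate, which is the exponential growth that motivates the grouping method.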
Finally, the delay shift of the circuit caused by RTN is the maximum of the delay shifts of all the critical paths
$$\Delta d_c = \max_i \Delta d_{cp,i} \quad (13)$$
The cumulative distribution function (CDF) of Δd c is the product of the CDFs of all Δd cp,i . For a given critical path, since each Δd j has two discrete values, 0 and t j , Δd cp will have $2^N$ discrete values, where N is the number of gates in the path. This indicates that it is impractical to directly calculate the distribution of (12), since the time and space complexity are both $O(2^N)$. To reduce the complexity, we use a grouping method to construct the approximate distribution of the partial sum $\varphi_L = \sum_{j=1}^{L} \Delta d_j$ ($L \le N$). First, a new random variable Φ is constructed, whose distribution is defined by
where
is the probability mass function (PMF) of φ L . Here, M is a user-defined parameter and larger M leads to better approximation. Second, the probability distribution of Φ is denoted by the probability of M discrete values, which is given by
This method redistributes $2^L$ discrete values into M discrete values. In this paper, M = 64 is adopted.
With the grouping method, the computational complexity reduces to O(2MN). Since M is a constant, the computational complexity is O(N). This algorithm is described in Algorithm 1 (see Fig. 6 ).
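The idea of bounding the support at M values after every convolution step can be sketched as below. The regrouping rule used here (M equal-width bins over the current range, each represented by its probability-weighted mean) is an assumption made for illustration and may differ from the rule in Algorithm 1; p is again a hypothetical common shift probability.

```python
def regroup(pmf: dict, m: int) -> dict:
    """Merge a PMF into at most m points.

    ASSUMPTION: equal-width bins over [0, max]; each bin is replaced by
    its probability-weighted mean, which preserves the expectation.
    """
    if len(pmf) <= m:
        return pmf
    width = max(pmf) / m
    mass = [0.0] * m
    moment = [0.0] * m
    for x, pr in pmf.items():
        k = min(int(x / width), m - 1)    # top edge goes into the last bin
        mass[k] += pr
        moment[k] += pr * x
    return {moment[k] / mass[k]: mass[k] for k in range(m) if mass[k] > 0.0}


def grouped_path_pmf(ts, p: float, m: int = 64) -> dict:
    """Approximate PMF of the path delay shift with at most m points."""
    pmf = {0.0: 1.0}
    for t in ts:
        nxt = {}
        for x, pr in pmf.items():         # at most m points: O(m) per gate
            nxt[x] = nxt.get(x, 0.0) + pr * (1.0 - p)
            nxt[x + t] = nxt.get(x + t, 0.0) + pr * p
        pmf = regroup(nxt, m)
    return pmf
```

Each gate touches at most M points and produces at most 2M before regrouping, so a path of N gates costs on the order of 2MN operations, in line with the complexity stated above.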
Normal distribution-based analysis
This section presents an alternative method to calculate the delay distribution of the circuit, called NDA, which is based on the following theorem.
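Theorem: For a given critical path that has N gates, if the delay shift of each gate caused by RTN is described by (11) , then
$$\Delta d_{cp} = \sum_{j=1}^{N} \Delta d_j \sim N\left(\sum_{j=1}^{N} E(\Delta d_j),\ \sum_{j=1}^{N} D(\Delta d_j)\right) \quad \text{as } N \to \infty$$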
where N(·,·) denotes the normal distribution, E(·) and D(·) are the expectation and variance, respectively.
Proof: Following (11), the expectation and variance of Δd j are
www.ietdl.org 
In practice, all t j 's are limited in a range [t min , t max ] (t max and t min are constants, t max > t min > 0), hence we have
This reveals that $\Delta d_{cp} = \sum_{j=1}^{N} \Delta d_j$ satisfies the condition of Lyapunov's central limit theorem (CLT) [25] , hence Δd cp tends to a normal distribution as N tends to infinity. The expectation and variance are $\sum_{j=1}^{N} E(\Delta d_j)$ and $\sum_{j=1}^{N} D(\Delta d_j)$, respectively. □
Based on the above theorem, we suppose that the delay of each critical path follows a normal distribution, since N is usually large enough for the CLT to apply. The distribution of the circuit delay is then the maximum of several independent normal distributions, which can be calculated by Clark's formula [26] , and the resulting maximum is again approximated by a normal distribution.
NDA is expected to be faster than SCPA, since the computation is much simpler. However, if N is small, the normal approximation is poor and NDA incurs a large error.
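The NDA computation can be sketched in two steps: the mean and variance of each path follow from summing the moments of the two-point gate shifts, and Clark's moment-matching formulas combine paths pairwise. The sketch assumes a common shift probability p for all gates (a hypothetical parameter), independent paths, and, as a simplification, takes the maximum over delay shifts only rather than over nominal delay plus shift.

```python
import math


def path_moments(ts, p: float):
    """Mean and variance of a sum of independent two-point shifts.

    Each gate contributes t with probability p and 0 otherwise, so
    E = p * t and D = p * (1 - p) * t^2.
    """
    mean = sum(p * t for t in ts)
    var = sum(p * (1.0 - p) * t * t for t in ts)
    return mean, var


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(m1: float, v1: float, m2: float, v2: float):
    """Clark's mean and variance of max(X1, X2) for independent normals."""
    a = math.sqrt(v1 + v2)
    if a == 0.0:
        return max(m1, m2), 0.0
    alpha = (m1 - m2) / a
    e1 = m1 * _cdf(alpha) + m2 * _cdf(-alpha) + a * _pdf(alpha)
    e2 = ((m1 * m1 + v1) * _cdf(alpha) + (m2 * m2 + v2) * _cdf(-alpha)
          + (m1 + m2) * a * _pdf(alpha))
    return e1, e2 - e1 * e1


def circuit_moments(paths, p: float):
    """Fold Clark's formula over all critical paths (lists of t_j)."""
    mean, var = path_moments(paths[0], p)
    for ts in paths[1:]:
        m, v = path_moments(ts, p)
        mean, var = clark_max(mean, var, m, v)
    return mean, var
```

The cost is one pass over the gates of each path plus one constant-time Clark step per path, which is why this route is cheaper than building a discrete distribution.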
Experimental results

Experiment setup
The experiments are implemented on a PC with an Intel Q9550 CPU and 4 GB DRAM. 24 ISCAS85 and ALU circuits are used to evaluate the proposed algorithms. The device model is the 16 nm high-performance PTM model 
Comparison with MC
This section compares the results obtained from SCPA and NDA with MC simulation. Two examples (c3540 and log64) are shown in Figs. 7 and 8 . The X-axis is the delay values and the Y-axis is the probability.
For c3540, the expectation of the circuit delay obtained by MC is 2.89 ns, whereas SCPA and NDA both give 2.85 ns; the relative error is only 1.4%. In addition, SCPA, NDA and MC all yield similar distributions for c3540.
For log64, SCPA and MC obtain similar distributions. However, the distribution shape obtained by NDA is significantly different from that obtained by MC or SCPA. The reason is that NDA assumes the circuit delay follows a normal distribution, but the maximum length of the critical paths of log64 is only 11, which is too short for the CLT to apply. Fortunately, for most circuits, the maximum length of the critical paths is large enough to fit the CLT, and hence NDA is ineffective for only a few circuits.
Table 1 shows the simulation time of MC, SCPA and NDA, together with the setup time, number of gates and the maximum length of the critical paths. SCPA and NDA are both much faster than MC: on average, SCPA is about 1000× faster than MC and NDA is about 50× faster than SCPA. Hence SCPA and NDA can both be used for larger-scale circuits.
Circuit delay distribution analysis
As shown in Table 2 , the average delay degradation and variation are both >20%. Meanwhile, the maximum delay degradation and variation can be >30%. The results demonstrate that RTN will be a very serious obstacle to circuit reliability in the deca-nanometer regime, which manifests in the following two aspects:
† RTN can cause significant circuit performance degradation, leading to serious timing violation. The possible minimum delay as shown in Figs. 7 and 8 is still greater than d 0 . Hence, the RTN effect must be considered in circuit design.
† The RTN-induced delay variation can lead to greater non-determinacy in circuit delay. Thus, statistical analysis should be considered in RTN evaluation.
Power supply scaling analysis
Equation (4) shows that the circuit delay degradation can be affected by the power supply voltage (V dd ) and scaling down of V dd decreases the RTN effect. The performance degradation and variation under different V dd for c1355 and c3540 are shown in Fig. 9 , which are obtained by NDA.
The results show that with V dd scaling down, both the temporal performance degradation and variation decrease. However, when V dd decreases, the intrinsic delay increases.
RTN mitigation in digital circuits
In this section, we apply power supply tuning and gate sizing techniques on digital circuits and simply demonstrate the efficiency of such techniques on mitigating the RTN-induced delay degradation and variation.
Power supply tuning
This section investigates the impact of V dd tuning on the maximum circuit delay under RTN. Although increasing V dd increases the delay degradation and variation (Fig. 9) , the circuit intrinsic delay is reduced and the maximum circuit delay under RTN still decreases, as shown in Fig. 10 . However, if the intrinsic delay at V dd = 0.9 V is chosen as the design specification (i.e. d 0 (V dd = 0.9 V)), the maximum circuit delay at V dd = 1.1 V cannot satisfy the design specification. In addition, the dynamic power increases by 49.4% when V dd = 1.1 V.
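The quoted power increase is consistent with the usual quadratic dependence of dynamic power on supply voltage. The check below is a minimal sketch assuming dynamic power proportional to V dd squared, with switched capacitance, activity and clock frequency held fixed; that model is an assumption, since the text does not state how the figure was derived.

```python
def dynamic_power_overhead(vdd_new: float, vdd_ref: float) -> float:
    """Relative dynamic power increase when V_dd changes.

    ASSUMPTION: P_dyn is proportional to V_dd^2 with capacitance,
    switching activity and frequency unchanged.
    """
    return (vdd_new / vdd_ref) ** 2 - 1.0


# Raising the supply from the nominal 0.9 V to 1.1 V.
overhead = dynamic_power_overhead(1.1, 0.9)
```

Under this assumption the overhead evaluates to about 0.494, close to the 49.4% reported for V dd = 1.1 V.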
To simultaneously reduce the RTN-induced maximum delay and the dynamic power overhead by V dd tuning, the dual V dd assignment technique can be adopted. In this method, only the gates along the critical paths are tuned to high V dd . The simulation results are shown in Table 3 , obtained by MC. In this table, 'full' means that all the gates are tuned to high V dd and 'critical' means the dual V dd method.
, and ΔP is the dynamic power overhead. In this experiment, high V dd is 1.1 V. The results reveal that the average Δd max = 33.6% when V dd = 0.9 V (nominal design). With the 'full' tuning method, the maximum delay is 12.8% larger than the design specification and the delay variation increases to 27.2%. With the dual V dd method, the maximum delay is 13.9% larger than the design specification and the power overhead is 20.3%, less than half of that in the 'full' tuning method. This reveals that the V dd tuning method can reduce the RTN-induced maximum delay compared with the nominal design. However, the efficiency is very limited and the power overhead is large. The benefit of V dd tuning comes entirely from the reduction of the intrinsic delay.
Gate sizing and replacement
Equation (4) indicates that RTN strongly depends on the area of the device. Thus, this section investigates the effect of gate sizing. Assuming that the area of a gate j in (9) becomes ρA j (ρ > 1 is the sizing coefficient), according to (4), the RTN-induced delay of this gate becomes
Thus, the delay will degrade by
Compared with (10), sizing can mitigate the RTN-induced delay degradation. Meanwhile, the term $1/\rho^2$ indicates that the delay variation can also be reduced.
The gate sizing technique applied to an 'AND2X1' gate is shown in Fig. 11 . The intrinsic delay is 0.63 ns when driving a 1 fF load capacitance. The delay varies from 0.63 to 0.763 ns without sizing. When ρ = 1.1, the delay varies from 0.573 to 0.691 ns; whereas for ρ = 1.2, the delay varies from 0.525 to 0.567 ns.
The above results show that a larger gate has smaller RTN-induced delay degradation and variation, thus in the standard cell design flow, the original gates can be replaced by the corresponding larger gates in the library. Two replacement strategies are applied: 'full' replacement (replace all the gates) or 'critical' replacement (only replace the gates along the critical paths). Fig. 12 shows the sizing results for c1355 and c3540, using the 'critical' replacement method. The intrinsic delay is still chosen as the design specification. It indicates that when ρ = 1.15, the maximum delay under RTN is almost below the specification line. Hence, ρ = 1.15 is chosen for the subsequent experiments.
The results of gate sizing for all the benchmarks are shown in Table 4 , where ΔA is the area overhead. The results reveal that by using the 'full' replacement method, the maximum delay is on average 6% smaller than the design specification and the delay variation is 6.2%, which is much smaller than the results without sizing. By using the 'critical' replacement method, the maximum delay still satisfies the design specification and the area overhead is only on average 5.7%. Compared with V dd tuning, gate sizing is much better: the efficiency is higher and the overhead is smaller.
Conclusions
This paper proposes a simulation framework to evaluate the RTN-induced temporal performance degradation and variation of digital circuits. Two fast evaluation methods with linear time complexity are proposed. The experimental results show that the average degradation and variation at 16 nm can be both >20%. Two design techniques, power supply tuning and gate sizing, are applied to mitigate the RTN effect and the simulation results show that gate sizing is better than power supply tuning.
The RTN-induced fluctuations are independent across all the devices, which produces a widely spread performance distribution. Sufficient performance margin should be reserved to compensate for the impact of RTN, and design techniques such as power supply tuning and gate sizing should be investigated to mitigate the RTN effect. In addition, more efficient circuit-level and architectural-level techniques with lower overhead should be investigated in future work. 
Acknowledgments
