Spin transfer torque (STT) switching realized using a magnetic tunnel junction (MTJ) device has shown great potential for low power and non-volatile storage. A prime application of MTJs is in building non-volatile look-up tables (LUTs) used in reconfigurable logic. Such LUTs use a hybrid integration of CMOS transistors and MTJ devices. This paper discusses the reliability of STT-based LUTs under nanoscale transistor and MTJ variations. The sources of process variations include both the CMOS device related variations and the MTJ variations. A key part of the STT-based LUTs is the sense amplifier needed for reading out the MTJ state. We compare the voltage and current based sensing schemes in terms of power, performance, and reliability metrics. Based on our simulation results in a 16 nm bulk CMOS technology, for the same total device area, the voltage sensing scheme offers 17% to 28% lower failure rates under combined intra-die transistor and MTJ variations, comparable delay, and 56% lower active power compared to the current sensing scheme. Moreover, we compare the reliability of the two sensing schemes under negative bias temperature instability (NBTI) of PMOS transistors. Our results indicate that the failure rates of both designs increase over time due to transistor aging, and that the voltage sensing scheme maintains its improved failure rate over the current sensing scheme.
Introduction
Spin transfer torque (STT) refers to a switching mechanism resulting in change of magnetic state in a magnetic tunnel junction (MTJ) device [1] . The MTJ is composed of a fixed and free magnetic layer isolated by a thin insulator (Fig. 1) [2] . The parallel and anti-parallel magnetic state of the two layers, representing binary states, is sensed by the resulting low and high resistance across the two terminals of the MTJ [1, 2] . The current passed through the MTJ for sensing its resistance (i.e. read current) has to be less than the current needed for changing its state (i.e. write or critical current) [1, 2] .
Due to its non-volatile nature and CMOS compatibility, STT-based memory (STT-RAM) has shown great promise in addressing the leakage barrier for SRAM. While the high write power remains a major obstacle for STT-RAM [3] , the application of STT-based memory in reconfigurable logic, as in field programmable gate arrays (FPGA) or reconfigurable functional units, seems more promising because write power is consumed only during reconfiguration, which occurs infrequently [4, 25, 26] . Reconfigurable logic relies on implementing logic in small look-up tables (LUTs). STT-based LUTs are realized by using MTJs as storage elements and using CMOS for the interface circuitry needed for read and write operations [4] [5] [6] [7] . The CMOS interface includes a decoder/multiplexer for selecting a unique MTJ for read/write and a sense amplifier for sensing the resistance of the selected MTJ in the read mode [4] [5] [6] [7] .
Scaling of the CMOS technology to ever smaller dimensions has posed serious reliability challenges to designs. The main cause is increasing process variations (both spatial and temporal) affecting transistor characteristics, especially the threshold voltage (V th ). Such variations include both inter- and intra-die variations. Some causes of variations such as random dopant fluctuations (RDF) exhibit uncorrelated variations from one device to another and hence fall into the intra-die category, whereas other sources such as oxide-thickness variations tend to exhibit correlations among adjacent devices and hence fall more into the inter-die category. In addition to transistor variations, MTJs exhibit variations in their geometrical parameters such as insulator thickness and 2D area [8] . Such variations result in variations in the resistance of an MTJ [9] .
Negative bias temperature instability (NBTI) is another important reliability concern resulting in V th increase for PMOS transistors over life-time and hence impacting circuit power, performance, and reliability [12] [13] [14] [15] [16] [17] .
Microelectronics Reliability 62 (2016) 156–166
In this paper, we analyze the impact of CMOS/MTJ process variations and NBTI on the reliability of the STT-based LUTs. We present a comparative analysis of voltage vs. current mode sensing schemes in such LUTs. The contributions of this paper are as follows:
• Comparative reliability analysis of voltage vs. current mode sensing schemes in STT-based LUTs considering both CMOS and MTJ variations;
• Statistical transistor sizing of the designs for fair comparison under same area;
• Comparative reliability analysis of voltage vs. current mode sensing schemes in STT-based LUTs against NBTI aging.
The remainder of the paper is organized as follows. Section 2 introduces the voltage and current mode sensing schemes for STT-LUTs. Section 3 presents the modeling of process variations and the statistical sizing of the designs. The results of the process variation analysis and comparisons are discussed in Section 4. Section 5 presents the NBTI modeling, analysis, and comparison. Section 6 concludes the paper.
Sense amplifier schemes for STT-LUT
An n-input LUT contains 2^n storage elements that are accessed via a decoder/multiplexer. However, in STT-LUTs, since the storage elements are MTJs that exhibit high and low resistance states, there is also a need for a sense amplifier stage that compares the resistance of the selected MTJ with a reference resistance and produces a full voltage swing logic one or zero signal depending on whether the MTJ is in the high or low resistance state (Fig. 2 ) [4] [5] [6] [7] . For high read performance and enhanced noise margin, a greater difference between the low and high resistances of the MTJ is desired. This resistance differential is quantified by the tunnel magneto resistance (TMR), defined as:
$$\mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}} \qquad (1)$$
where R P and R AP are the resistances of the MTJ in the parallel and antiparallel states, respectively. TMR is a technology parameter dependent on the MTJ geometries and materials.
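As a minimal sketch of the read path described above, the following Python model treats an n-input LUT as 2^n stored MTJ states, selects one by a binary address, and compares its resistance against a reference using the usual definition TMR = (R AP − R P)/R P. The resistance values, the function names, the MSB-first addressing, and the mapping of the high-resistance state to logic one are assumptions made for illustration only; the model ignores all transistor and timing effects.

```python
from typing import Sequence


def tmr(r_p: float, r_ap: float) -> float:
    """Tunnel magnetoresistance ratio, (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


def read_lut(states: Sequence[bool], inputs: Sequence[int],
             r_p: float, r_ap: float, r_ref: float) -> int:
    """Idealized read of an n-input LUT holding 2**n MTJs.

    states[i] is True when MTJ i is anti-parallel (high resistance).
    Assumption: the high-resistance state reads as logic 1.
    """
    n = len(inputs)
    if len(states) != 2 ** n:
        raise ValueError("an n-input LUT needs 2**n storage elements")
    # Decoder/multiplexer: treat the inputs as a binary address, MSB first.
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    r_selected = r_ap if states[index] else r_p
    # Ideal sense amplifier: compare against the reference resistance.
    return 1 if r_selected > r_ref else 0


# Hypothetical resistances in ohms, chosen only for illustration.
R_P, R_AP = 3000.0, 6000.0
xor_lut = [False, True, True, False]  # 2-input XOR truth table
```

With these illustrative values, any reference resistance strictly between R P and R AP reproduces the stored truth table; the choice of the reference only matters once sensing margin and variations are considered.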
To translate the R P and R AP into a binary full swing voltage signal in the read mode, a sense amplifier is used to compare the resistance of the selected MTJ against a reference resistor (Fig. 2) . The value of the reference resistor should be set to maximize the sensing margin of the sense amplifier for both MTJ states. In the read mode, the selected MTJ and the reference resistors are biased and their currents are passed to the sense amplifier stage. The sense amplifier can be designed to either directly amplify the current differential (i.e. current mode sensing) or a current-to-voltage conversion stage may precede a voltage mode sense amplifier. These two styles of sensing the MTJ resistance are discussed in further detail in the remainder of this section. Since the write paths remain identical, we will only discuss the read paths and compare the read performance of the two styles.
Voltage sensing mode STT-LUT
Fig. 3 shows the schematic of a voltage sensing mode (VSM) 2-input (4-bit) STT-LUT [6] . This is a dynamic circuit that operates in a precharge (CLK = 0) and evaluate (CLK = 1) fashion. The MTJ selection is performed via a pass-transistor decoder/multiplexer (selection tree). To balance the transistor paths of the MTJs and the reference resistor (R REF ), similar transistors are inserted above the reference resistor. When CLK switches high, the current provided by the dynamic current source is divided between the selected MTJ and the reference resistor, resulting in a current differential that will be drained from the nodes DEC and REF. This current differential is converted to a low swing voltage differential on the nodes DEC and REF by the current-to-voltage converter circuit which is composed of the two cross-coupled PMOSes. This voltage differential is then amplified by a voltage-mode sense amplifier to produce full swing differential outputs (Z and Z′).
Sensing margin is one of the metrics used to measure the reliability of reading the state of an MTJ cell [20, 21] . The sensing margin for this scheme is defined as the minimum voltage differential between the inputs of the sense amplifier (nodes DEC and REF) in the evaluation phase (when CLK switches high), when sensing R AP and R P :
This quantity is zero at the very beginning of the evaluation cycle and increases as time passes. Since the sense amplifier does not start sensing until one of the voltages, V DEC or V REF , falls below the PMOS threshold voltage, we measure the sensing margin at that time, which is found by simulating the design under nominal process conditions.
Current sensing mode STT-LUT
Fig. 4 shows the schematic of a current sensing mode (CSM) 2-input (4-bit) STT-LUT [7] . The design is similar to the VSM version except that the current differential is directly applied to a current mode sense amplifier, and hence the current-to-voltage converter circuit is eliminated. This is also a dynamic design. When the clock is high, the sense amplifier is biased in a metastable state by shorting its outputs. The outputs approach a voltage of about V dd /2 in this case and this voltage is also applied as bias to the MTJ and reference resistors. When the clock switches low, the cross-coupled inverters in the sense amplifier will switch to one of the stable states and the direction of this switching will be determined by the current differential between the MTJ and the reference resistor. Since the outputs are shorted while the sense amplifier is biased in the metastable condition (i.e. when CLK is high), there is considerable static short circuit power dissipated in the sense amplifier. In order to reduce this short circuit power, the CLK duty cycle (high duration) should be reduced. In this research, CLK has a duty cycle of 50% to maintain uniformity in the analysis and comparisons. The sensing margin for this design is defined as the minimum current differential between the two legs of the sense amplifier (I MTJ and I REF ), when sensing R AP and R P :
Since the sensing starts as soon as the evaluation phase starts (CLK switches low), this quantity is measured right before, or at the very start of, the evaluation phase.
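The two sensing margin definitions share the same structure: the smaller of the two differentials seen when sensing R AP and R P. The following sketch expresses that structure and applies it to the current mode case under a strongly simplified model in which the MTJ and the reference are ideal resistors biased at a common voltage, with no selection tree resistance. The function names, the bias of 0.35 V (V dd /2 for V dd = 0.7 V), and the resistance values are assumptions for illustration and are not taken from the simulated circuits.

```python
def sensing_margin(diff_ap: float, diff_p: float) -> float:
    """Minimum differential magnitude over the two MTJ states."""
    return min(abs(diff_ap), abs(diff_p))


def csm_margin(v_bias: float, r_p: float, r_ap: float, r_ref: float) -> float:
    """Current-mode margin for ideal resistors at a common bias voltage.

    Assumption: I = V / R for each leg; transistor resistance is ignored.
    """
    i_ref = v_bias / r_ref
    return sensing_margin(v_bias / r_ap - i_ref, v_bias / r_p - i_ref)


def balanced_reference(r_p: float, r_ap: float) -> float:
    """Reference that places I_REF midway between I_P and I_AP.

    Under the ideal-resistor assumption this is the harmonic mean of
    R_P and R_AP, which equalizes the two differentials.
    """
    return 2.0 * r_p * r_ap / (r_p + r_ap)


# Hypothetical values: ohms and volts, for illustration only.
R_P, R_AP, V_BIAS = 3000.0, 6000.0, 0.35
r_ref_balanced = balanced_reference(R_P, R_AP)
margin_balanced = csm_margin(V_BIAS, R_P, R_AP, r_ref_balanced)
margin_midpoint = csm_margin(V_BIAS, R_P, R_AP, 0.5 * (R_P + R_AP))
```

In this idealized setting the arithmetic midpoint of the two resistances gives a smaller margin than the balanced reference, which illustrates why the reference value should be chosen with the sensed quantity (current or voltage) in mind.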
Process variation modeling and analysis

CMOS variations
CMOS process variations have various causes that affect transistor performance. The effect of most causes of variations can be captured as V th variation. Some sources of variations such as RDF are random (uncorrelated) in nature, whereas others such as oxide thickness variations are correlated. The variations can be divided into two groups of inter- and intra-die variations [27] . The uncorrelated and random causes belong to the intra-die category and the correlated ones to the inter-die category. We model the V th variation of a transistor by adding a DC voltage source in series with the gate terminal with a parameterized voltage level that represents the total V th shift for a transistor. This modeling allows us to do both inter- and intra-die V th variation analysis. The intra-die variation considered in this study is RDF due to its prominence in scaled bulk CMOS transistors. The V th shift by RDF is inversely related to the square root of the device area (W × L) as follows [28] :
$$\sigma_{Vt} = \sigma_{Vt0} \sqrt{\frac{W_{min} L_{min}}{W L}} \qquad (4)$$
where all the technology parameters are lumped into σ vt0 which represents the standard deviation of V th variation of a minimum sized transistor with dimensions L min and W min . L and W are the channel length and width of the given transistor. Sense amplifier circuits utilize differential pair transistors to do analog voltage or current comparison and hence are more sensitive to intra-die V th variations that cause mismatch among neighboring transistors such as those in a differential pair. Given that bigger transistors exhibit less intra-die V th variation (Eq. (4)), it is expected that increasing transistor sizes (W) in the LUT designs reduces the delay variation and failure probability. Hence, a fair comparison between the two LUT designs should be made under the same total transistor (active) area. Moreover, for a given total area constraint, it is not optimal to uniformly allocate area to all transistors, given that the V th variation of some might have more influence than others on the overall failure probability. For example, the variations of the transistors in the differential paths of the sense amplifier are expected to be more influential than those of the precharge transistors. To address this problem more formally, we define the delay-to-V th sensitivity metric for a given transistor, M i , in a circuit as:
where dVti is the V th variation applied to the transistor M i , T p0 is the nominal delay of the design, and T pi and T p′i represent the delay to the OUT and OUT′ of the LUT design after applying the V th variation to the given transistor, M i . Tables 1 and 2 summarize the sensitivity measurements in descending order for transistors of both LUT designs obtained by SPICE simulations in a predictive 16 nm bulk CMOS technology [10] . A transistor with higher sensitivity is given more area (W) than one with lower sensitivity. The channel length of all transistors is kept at minimum. The width of the transistor M i (W i ) is found as follows:
where W total represents the total width allocated for the circuit and n is the total number of transistors in the circuit. We compare the designs under the same total area, and transistors are sized according to their sensitivity. This ensures that for a given total area, the transistors are sized for best reliability. From the V th sensitivity results, it is observed that the sense amplifier transistors are the most sensitive ones and need to be given the highest portion of the area.
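The area scaling of RDF-induced V th variation and the idea of sensitivity-driven sizing can be sketched as follows. The first function applies the inverse square root area dependence with σ Vt0 = 23 mV, the value used later in the simulations. The second function distributes a total width in proportion to each transistor's sensitivity; this proportional rule, the transistor names, and the sensitivity numbers are assumptions for illustration and may differ from the exact sizing expression used in this work. Minimum width constraints are not enforced.

```python
import math
from typing import Dict

SIGMA_VT0 = 0.023  # V, sigma of V_th for a minimum-sized transistor


def sigma_vt(w: float, l: float, w_min: float, l_min: float,
             sigma_vt0: float = SIGMA_VT0) -> float:
    """RDF-induced V_th sigma, scaling as 1 / sqrt(W * L)."""
    return sigma_vt0 * math.sqrt((w_min * l_min) / (w * l))


def allocate_widths(sensitivities: Dict[str, float],
                    w_total: float) -> Dict[str, float]:
    """Split w_total among transistors in proportion to sensitivity.

    Assumption: a simple proportional rule, used here only as a sketch.
    """
    total = sum(sensitivities.values())
    return {name: w_total * s / total for name, s in sensitivities.items()}


# Hypothetical sensitivities (arbitrary units) for three transistors.
example_sens = {"sense_amp_a": 3.0, "sense_amp_b": 3.0, "precharge": 2.0}
example_widths = allocate_widths(example_sens, w_total=8.0)

# Quadrupling the width at minimum length halves sigma(V_th).
sigma_4x = sigma_vt(w=4.0, l=1.0, w_min=1.0, l_min=1.0)
```

Under this rule a more sensitive transistor receives a larger width and therefore a smaller σ(V th), which is the intent of the statistical sizing step.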
MTJ variations
Besides transistors, MTJs also exhibit variations in their geometries, namely, the insulator thickness (t ox ) and the cross-section area (A) [8] . Such variations result in variations in the critical write current as well as in the high and low state resistances during the read mode [9] . Since we are concerned with read failures of the LUTs, we model the MTJ variations as variations in the high and low state resistances (R AP and R P ). Considering a 2D circular shape of radius r, the MTJ area A is expressed as πr^2. The relation between the resistance of the MTJ and t ox and r can be expressed as [18, 29] :
where K 1 and K 2 represent all the remaining process parameters [29] . Notice that t ox and r are the major but not the only sources of variability in MTJs [20] . Here, to simplify the analysis, we focus on these two primary sources. Assume t ox and r exhibit variations represented by dt and dr from their nominal values, respectively. Then, the MTJ resistance value R with respect to its nominal value R 0 can be expressed as:
where t ox0 and r 0 are the nominal values of t ox and r, respectively. K 2 is 0.6483 based on [29] . Treating dt/t ox0 and dr/r 0 as uncorrelated normal random variables with mean value of zero and standard deviation between 0 and 1, the statistical distributions of MTJ resistances can be obtained. Fig. 5 shows the distributions of R P and R AP and the reference resistor (R REF ) with 10% standard deviation applied to dt and dr (i.e. σ/μ = 0.1). As shown in Fig. 5 , the σ/μ ratio for resistance variation is 0.245. Since the variations in t ox and r affect both R P and R AP in the same manner as described in Eq. (8), the TMR is not affected by such variations and remains constant.
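The constancy of the TMR follows from simple arithmetic: if a geometric variation multiplies R P and R AP by the same factor, that factor cancels in the ratio (R AP − R P)/R P. The sketch below checks this numerically. It does not implement the resistance model of Eq. (8); it only assumes that the variation acts as a common multiplicative factor, and the nominal resistances are hypothetical.

```python
from typing import Tuple


def tmr(r_p: float, r_ap: float) -> float:
    """Tunnel magnetoresistance ratio, (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


def apply_common_variation(r_p: float, r_ap: float,
                           scale: float) -> Tuple[float, float]:
    """Scale both resistance states by the same factor.

    Assumption: t_ox and r variations act identically on R_P and R_AP.
    """
    return r_p * scale, r_ap * scale


# Hypothetical nominal resistances in ohms.
R_P0, R_AP0 = 3000.0, 6000.0
tmr_nominal = tmr(R_P0, R_AP0)
tmr_varied = [tmr(*apply_common_variation(R_P0, R_AP0, s))
              for s in (0.7, 1.0, 1.4)]
```

Although the TMR is unchanged, the absolute resistances still shift relative to the reference resistor, which is why MTJ variations can reduce the sensing margin.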
Results and discussions
The MTJs in both LUTs are programmed to have 50% of the MTJs in the R P state and the rest in the R AP state. Simulations are performed to apply all input combinations and measure read delay, power, and failure rates. Fig. 6 shows typical simulation waveforms of the LUTs at the nominal process corner. Fig. 7 shows the output waveform plots obtained by Monte Carlo simulations of intra-die V th variation. These waveforms clearly show delay variations and failures caused by V th variations. Fig. 8 shows the sensing margin distributions under intra-die MTJ variations for both STT-LUT schemes. A sensing margin falling below some minimum value (i.e. the sense amplifier offset) results in failure in the sense amplifier. The sense amplifier offset itself is subject to variations. That is why the region with negative sensing margin in Fig. 8 is labeled as a possible rather than definite failure region. Nonetheless, the CSM style exhibits a wider distribution than the VSM style, indicating that the CSM style is more sensitive to process variation and should exhibit more failures. Fig. 9 shows the delay distributions obtained by Monte Carlo simulations of intra-die V th variations, where σ Vt0 is set at 23 mV, which is 10% of the nominal NMOS threshold voltage of a minimum sized transistor, including the influence of short channel effects, in the 16 nm process used. The plots are obtained for both STT-LUTs optimally designed according to the V th sensitivity method for the same total active (transistor) areas of 0.02856 μm^2 (1×) and 0.04284 μm^2 (50% larger or 1.5×).
Any delay greater than 500 ps (half the total evaluation period) is considered a failure, and all the failure cases are lumped into a single bin at 1000 ps. It is observed that the VSM STT-LUT exhibits lower failure rates despite having a higher delay spread (σ/μ) for the success cases. The VSM style exhibits 16% to 32% lower failure rates compared to the CSM style. Moreover, comparing Fig. 9 (a) and (b) shows that transistor upsizing is much more effective in reducing delay and failure rate in the VSM style than in the CSM style. By 50% upsizing of the total active area, the failure rate of the VSM style goes down by 25% and that of the CSM style goes down by only 8%.
Fig. 10 shows the LUT delay distributions and failure rates under intra-die MTJ variations for the two active areas. The VSM style again exhibits lower failure rates. The failure rate of the VSM style is lower by 29% at the 1× area and by 51% at the 1.5× area. Area upsizing is effective in reducing the failure rate in the VSM style. By upsizing the area by 50%, the VSM failure rate decreases by 19% and that of the CSM style increases by 17%. The enhanced reliability of the VSM style is attributed to its two-stage signal amplification, first by the cross-coupled PMOSes in the current-to-voltage converter and then by the sense amplifier (Fig. 3) .
Fig. 11 shows the delay distribution and failure rates of the LUTs under inter-die V th variations. Due to the differential nature of the LUT circuits, both designs exhibit good tolerance to inter-die V th variations, as such variations do not cause mismatch among the transistors on the same circuit. It is observed that the VSM style shows lower failure rates for inter-die variations as well. The VSM style shows 99% failure reduction compared to the CSM style. Again, we also observe the effectiveness of transistor upsizing on failure rate reduction for the VSM style as its failure rate goes down to 50% by upsizing whereas that of the CSM style reduces by only 18%.
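The failure criterion and the relative comparisons quoted above can be reproduced from raw Monte Carlo delay samples with a few lines of post-processing. The sketch below applies the 500 ps threshold, lumps failures into the 1000 ps bin, and computes a relative failure rate reduction. The sample delays, the use of infinity for reads that never resolve, and the function names are assumptions for illustration; they are not data from the simulations.

```python
from typing import List, Sequence

FAIL_THRESHOLD_PS = 500.0  # delays above this count as failures
FAIL_BIN_PS = 1000.0       # all failures are lumped into this bin


def bin_delays(delays_ps: Sequence[float]) -> List[float]:
    """Replace every failing delay by the single failure bin value."""
    return [FAIL_BIN_PS if d > FAIL_THRESHOLD_PS else d for d in delays_ps]


def failure_rate(delays_ps: Sequence[float]) -> float:
    """Fraction of samples whose delay exceeds the failure threshold."""
    failures = sum(1 for d in delays_ps if d > FAIL_THRESHOLD_PS)
    return failures / len(delays_ps)


def relative_reduction(rate_new: float, rate_ref: float) -> float:
    """Fractional reduction of rate_new with respect to rate_ref."""
    return (rate_ref - rate_new) / rate_ref


# Hypothetical delay samples in ps; inf marks a read that never resolved.
samples = [120.0, 480.0, 510.0, float("inf")]
binned = bin_delays(samples)
rate = failure_rate(samples)
```

For example, a failure rate of 0.15 compared against a reference of 0.20 corresponds to a relative reduction of about 25%, which is the sense in which percentage improvements between the two styles are reported.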
Fig. 12 shows the delay distributions under the inter-die MTJ variations. Neither of the LUTs shows any failures under the inter-die MTJ variations. Table 3 summarizes the numerical results of the LUTs for the two active areas. The two styles show comparable delays, but the VSM style exhibits a 56% reduction in active power. The CSM style, however, exhibits 37% less standby power. The leakage difference is due to the fact that in the CSM style the sense amplifier is stacked on top of the selection tree, offering an additional stacking effect that reduces leakage in the sense amplifier circuit. In the VSM style, however, the sense amplifier has its own separate connections to the supply lines, offering less stacking effect.
It is observed that transistor variations are much more influential in causing failures in the STT-LUTs than the MTJ variations. This is true for both intra- and inter-die variations. That is because transistor variations impact the sense amplifier whereas the MTJ variations do not affect the sense amplifier.
Under combined transistor and MTJ intra-die variations, the VSM style exhibits 17% to 28% lower failure rates. Under combined transistor and MTJ inter-die variations, the VSM style exhibits 96% to 98% lower failure rates. These results show that the VSM style is a much more robust STT-LUT design under process variations.
NBTI sensitivity analysis
Bias temperature instability (BTI) is considered a major reliability concern in nano-scale CMOS technologies [12] . It is classified into negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI) [13] . NBTI impacts PMOS and PBTI impacts NMOS transistors. NBTI has been a major concern over the years, especially with the emergence of high-K metal gates and the FinFET technology [14] . NBTI increases the PMOS threshold voltage, resulting in current reduction and, due to its bias and temperature dependence, V th mismatch among PMOSes [15] . Therefore, NBTI results in reduction of the lifetime of a chip. This phenomenon is attributed to the Si/SiO 2 interface traps and the positive charges resulting from the breaking of Si-H bonds at the Si/SiO 2 interface at high temperatures under a negative bias. It should be noted that while this paper focuses on the NBTI effect, the presented approach can be extended to analyze the impact of PBTI, or both BTI effects.
NBTI modeling
There have been several NBTI models proposed in the literature [16, 17, [22] [23] [24] . NBTI is stress bias dependent and is partially recoverable by reduction or removal of the stress bias. Therefore, there is a considerable difference between the NBTI caused by a constant DC stress and an AC stress-recovery pattern [22, 23] . Given the dynamic operation of the STT-LUTs, the transistors experience an AC stress pattern. We use a logarithmic NBTI model for cycle-to-cycle prediction of NBTI proposed in [23] . According to this model, the threshold voltage (V th ) shift due to NBTI is bias (V sg ) and temperature (T) dependent. V th drift due to NBTI is modeled as follows [23] :
where T ox is the oxide thickness, K is the Boltzmann constant, T is the temperature in Kelvin, t 0 is the initial time of a given cycle when the voltage V sg is applied, and t is the time duration the voltage V sg is kept, ΔV th (t 0 ) is initial threshold voltage shift which is the final threshold voltage shift from the previous cycle. A, B, C, β, φ 0 , E 0 , and k are constants [23] . Under the constant DC stress, the following model predicts the NBTI over time [23] :
Figure caption: Delay distributions under inter-die MTJ t ox and r variations (σ/μ = 0.1) for LUTs designed for same total active area of (a) 1× (b) 1.5×.
Table 3
Simulation results in 16 nm bulk CMOS at clock frequency = 0.5 GHz, V dd = 0.7 V, T = 110°C, V th variation: σ Vt = 23 mV, σ/μ = 0.1, MTJ t ox and r variation: σ/μ = 0.
In this work, we have studied the NBTI induced V th increase at the worst case temperature of 110°C. For the stress time, we consider 0 (initial time), 1E+3 s (1 ks), and 1E+6 s (1 Ms). Given the bias dependence of NBTI, the actual V th increase depends on the PMOS biasing and activity pattern in a given circuit over long periods of time. For example, for a PMOS whose gate is connected to the clock signal (e.g. the precharge transistors MP0, MP7, and MP8 in Fig. 3 ) the activity pattern is deterministic in the sense that the PMOS is under stress when the clock signal is low. Assuming a clock signal with 50% duty cycle, the precharge PMOSes are under full stress for half the clock period and under full recovery for the other half period. Eq. (9) is iteratively solved from one half clock period to another to estimate NBTI after the desired life-time. For other PMOS transistors, the stress time and voltage are data and bias dependent. For example, the output connected PMOSes in the VSM STT-LUT (MP3 or MP4 in Fig. 3 ) or CSM STT-LUT (MP0 or MP1 in Fig. 4) , are under stress anytime the corresponding output (OUT (Z) or OUT′ (Z′)) is low, and this makes their aging dependent on the output data activity pattern, which is related to the data stored in the LUT and the input activity pattern. Similarly, the aging of the PMOSes connected to the selection and reference trees (MP5 and MP6 in Fig. 3 ) is dependent on the data activity pattern. Given the differential nature of the sensing circuits, the PMOSes that are paired (i.e. (MP3, MP4) and (MP5, MP6) in Fig. 3 and (MP0, MP1) in Fig. 4 ) are expected to have the same driving strength (i.e. sizing and V th ) for maximum variation tolerance and yield. However, the data activity dependence of NBTI aging may result in asymmetric aging and hence increased mismatch among the paired transistors. Hence, we consider two scenarios of NBTI aging: symmetric vs. asymmetric NBTI aging of the paired transistors.
In the symmetric case, the output has the same probability of switching to one or zero in a given clock evaluation cycle and hence the paired transistors age at the same rate. For the asymmetric case, we consider a worst case scenario in which the output has 100% probability of obtaining the same logical value at every clock evaluation cycle. Another factor that influences the rate of NBTI aging is the V sg bias when the transistor is under stress. For the precharge transistors whose gates are connected to the clock signal, the V sg bias is at its maximum of V dd when the clock signal is low. Similarly, for the PMOSes whose gates are connected to the outputs in the VSM-LUT circuit (MP3 and MP4 in Fig. 3) , the V sg bias is maximum when under stress because the outputs are full swing. The V sg stress for the remaining PMOSes may not be full V dd because their gates may not experience full voltage swing. For example, nodes DEC and REF in Fig. 3 swing between V dd and 0.5V dd according to the simulation waveforms (Fig. 13) . Hence, when under stress, MP5 and MP6 in Fig. 3 are stressed at V sg = 0.5V dd . MP0 and MP1 in the CSM STT-LUT (Fig. 4) experience stress at two possible V sg biases (V dd and 0.5V dd ). In the precharge mode, when the clock is high, the outputs (OUT and OUT′) are tied together and reach 0.5V dd (see waveform in Fig. 6(b) ). In this case, both MP0 and MP1 are under stress at V sg = 0.5V dd . When the clock switches low, one of the outputs switches to V dd and the other one to zero. Therefore, when the clock is low, one of the PMOSes will experience no stress and the other one stress at full V sg = V dd . With these definitions, we can now discuss the symmetric and asymmetric NBTI aging for the VSM and CSM STT-LUT circuits. Due to the differential nature of the read circuit in STT-LUTs, if the outputs switch uniformly in time between logic '0' and '1' states, the PMOSes in a pair will age symmetrically.
If the output predominantly switches to only one logical state in each read cycle, then the aging between the PMOSes in a pair will be asymmetric (one will age more than the other).
VSM STT-LUT symmetric NBTI aging
The output waveforms corresponding to this stress pattern are shown in Fig. 13(a) . In one cycle, the output (Z) switches to high and in the next cycle the inverted output (Z′) does. MP3 (MP4) is under full stress (V sg = V dd ) anytime Z (Z′) is low. Hence, for 1.5 clock periods, MP3 (MP4) is under full V dd stress and for 0.5 clock period, it is under full recovery (V sg = 0). MP5 and MP2 (MP6 and MP1) are under stress when the node REF (DEC) drops below V dd − V th . These nodes drop as low as 0.5V dd during the evaluation phase of the clock (Fig. 13(a) ). Hence, the NBTI aging of MP1, MP2, MP5 and MP6 should be estimated for a cyclic pattern composed of V sg = 0.5V dd stress for 0.5 clock period, and full recovery for 1.5 clock periods.
VSM STT-LUT asymmetric NBTI aging
The output waveforms corresponding to this stress pattern are shown in Fig. 13(b) . Only one of the outputs (Z′) switches to high every cycle; while the other output (Z) stays low all the time. In this case, MP3 is constantly under full stress (V sg = V dd ), while MP4 is under full stress for half the clock period and under no stress (i.e. recovery) for the other half period. In this case, MP3 ages faster than MP4 and gains higher V th over time, hence asymmetric aging. Similarly, since the voltage of the node REF remains at V dd , MP2 and MP5 are not stressed at all and MP1 and MP6 are under V sg = 0.5V dd stress for half the clock period and no stress for the other half period. This also results in asymmetric aging and hence increased mismatch of MP2 (MP5) and MP1 (MP6).
CSM STT-LUT symmetric NBTI aging
The output waveforms corresponding to this stress pattern are shown in Fig. 14(a) . In one cycle, the output (OUT) switches to high and the inverted output (OUT′) switches to low, and in the next cycle, the opposite transitions occur. Both MP0 and MP1 are under stress of V sg = 0.5V dd in the precharge phase (CLK = 1), which lasts half the clock period. MP0 (MP1) is under full stress (V sg = V dd ) when OUT (OUT′) switches to zero, which lasts half the clock period. MP0 (MP1) is under no stress (i.e. recovery) when OUT (OUT′) switches to one, which lasts half the clock period. The period of the output waveform is twice the clock period. The net effect of NBTI aging is computed using the cycle-to-cycle NBTI model, where each cycle is composed of four voltage levels each lasting for half the clock period.
CSM STT-LUT asymmetric NBTI aging
The output waveforms corresponding to this stress pattern are shown in Fig. 14(b) . Only one of the outputs (OUT) switches to high every cycle; while the other output (OUT′) switches low. Both outputs remain at 0.5V dd in the precharge phase (CLK = 1). Hence, MP0 and MP1 are under V sg = 0.5V dd stress for half the clock period. MP1 is additionally under V sg = V dd stress for the other half period. MP0 however is under full recovery (V sg = 0) for the other half period. This asymmetric NBTI aging between MP0 and MP1 results in increased V th mismatch between these two transistors.
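The four stress scenarios described above can be summarized as repeating sequences of V sg levels, one level per half clock period. The sketch below encodes them for a few representative transistors and derives two rough indicators: the fraction of time under any stress and the mean V sg. The ordering of levels within each cycle, the dictionary layout, and the function names are assumptions for illustration; neither indicator is an aging estimate, since NBTI depends non-linearly on bias and time and is evaluated in this work with the cycle-to-cycle model of [23].

```python
from typing import Dict, List, Tuple

VDD = 0.7  # V, supply voltage used in the simulations

# V_sg per half clock period over one repeating cycle of the pattern.
PATTERNS: Dict[Tuple[str, str, str], List[float]] = {
    ("VSM", "symmetric", "MP3"): [VDD, VDD, VDD, 0.0],
    ("VSM", "symmetric", "MP5"): [0.5 * VDD, 0.0, 0.0, 0.0],
    ("VSM", "asymmetric", "MP3"): [VDD, VDD],
    ("VSM", "asymmetric", "MP4"): [VDD, 0.0],
    ("CSM", "symmetric", "MP0"): [0.5 * VDD, VDD, 0.5 * VDD, 0.0],
    ("CSM", "asymmetric", "MP0"): [0.5 * VDD, 0.0],
    ("CSM", "asymmetric", "MP1"): [0.5 * VDD, VDD],
}


def stress_fraction(levels: List[float]) -> float:
    """Fraction of the cycle during which V_sg is non-zero."""
    return sum(1 for v in levels if v > 0.0) / len(levels)


def mean_vsg(levels: List[float]) -> float:
    """Time-averaged V_sg over the cycle (a rough indicator only)."""
    return sum(levels) / len(levels)


summary = {key: (stress_fraction(lv), mean_vsg(lv))
           for key, lv in PATTERNS.items()}
```

In the asymmetric cases the two transistors of a pair see clearly different patterns (for example MP3 versus MP4 in the VSM style), which is the origin of the additional V th mismatch.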
Failure rate analysis under NBTI
We now repeat the Monte Carlo simulations explained in Section 4 for failure rate measurements under intra-die V th variations, this time after applying the symmetric and asymmetric scenarios of NBTI aging to the PMOSes at life-times of 0, 1 ks, and 1 Ms. For the PMOS transistors, the NBTI induced V th shift is added as a fixed change to the V th in addition to the random change caused by the intra-die V th variations. We also perform this simulation for the two circuit areas that were considered in Section 4. The results are shown in Fig. 15 . It is observed that the failure rate increases with NBTI aging in all cases. Moreover, as expected, the failure rate increases at a higher rate in the asymmetric scenario of aging than in the symmetric scenario for all the cases. We can argue that the symmetric scenario of aging represents the best case impact of NBTI aging on the failure rate and the asymmetric scenario represents the worst case impact. At the 1 Ms lifetime, the failure rate increase is 0.9% to 8% for the VSM STT-LUT and 1.6% to 5.3% for the CSM STT-LUT. The increasing failure rates over time do not change the relative failure rate comparisons of the VSM and CSM STT-LUTs. The results in Fig. 15 reaffirm that the VSM STT-LUT is significantly more reliable against V th variations and NBTI aging than the CSM STT-LUT for a given circuit area, and that the reliability improves by increasing the circuit area.
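The way the two V th contributions are combined in each Monte Carlo sample can be sketched as follows: a fixed NBTI shift is added to a zero-mean Gaussian intra-die term. The second function shows how unequal shifts in a transistor pair move the mean of the pair mismatch away from zero. The nominal threshold magnitude, the shift values, the seeds, and the function names are hypothetical; only σ Vt = 23 mV is taken from the simulation setup, and the circuit simulation that turns each sample into a delay is not modeled here.

```python
import random
from typing import List


def sample_pmos_vth(vth_nominal: float, nbti_shift: float,
                    sigma_vt: float, n: int, seed: int = 0) -> List[float]:
    """Draw |V_th| samples: nominal + fixed NBTI shift + random intra-die term."""
    rng = random.Random(seed)
    return [vth_nominal + nbti_shift + rng.gauss(0.0, sigma_vt)
            for _ in range(n)]


def pair_mismatch(shift_a: float, shift_b: float,
                  sigma_vt: float, n: int, seed: int = 0) -> List[float]:
    """V_th difference of a transistor pair with individual NBTI shifts."""
    rng = random.Random(seed)
    return [(shift_a + rng.gauss(0.0, sigma_vt))
            - (shift_b + rng.gauss(0.0, sigma_vt))
            for _ in range(n)]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


SIGMA_VT = 0.023  # V
# Hypothetical NBTI shifts in volts, for illustration only.
symmetric = pair_mismatch(0.02, 0.02, SIGMA_VT, n=20000, seed=1)
asymmetric = pair_mismatch(0.03, 0.01, SIGMA_VT, n=20000, seed=1)
aged = sample_pmos_vth(0.23, 0.02, SIGMA_VT, n=20000, seed=2)
```

With equal shifts the mismatch distribution stays centered at zero, whereas unequal shifts add a systematic offset on top of the random mismatch, consistent with the higher failure rates observed for asymmetric aging.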
Conclusion
Reliability assessment of various circuit design styles is an important consideration in nano-scale CMOS/MTJ hybrid technologies. This paper performed a comparative reliability analysis of the STT-based LUTs and determined that the VSM style shows superior reliability compared to the CSM style under the same design area. This reliability enhancement is present against not only transistor variations but also MTJ variations. The improved reliability of the VSM style also comes with lower propagation delay and active power consumption. Moreover, the improved reliability of the VSM style is maintained under NBTI aging.
