Abstract: With the continuing scaling of MTJ, the high-speed reading of STT-RAM becomes increasingly difficult. Recently, a body-voltage sensing circuit (BVSC) has been proposed for boosting the sensing speed. This paper analyzes the effectiveness of using the reference calibration technique to compensate for device mismatches and improve the read margin of BVSC. HSPICE simulation results show that a 2-bit reference calibration can improve the worst-case read margin in a 1-Mb memory by over 3 times. This leads to up to 30% higher yield across all process corners. To maintain the yield improvement even in the worst-case corner, independent calibration circuitry has to be deployed for each memory array.
I. INTRODUCTION
Spin-torque transfer RAMs (STT-RAMs) have been the subject of extensive research in the past several years. STT-RAM is often perceived as the "universal memory" due to its potential for high density, low energy, and high speed. Prototypes incorporating smaller cell size than SRAM, better performance than DRAM, non-volatility of Flash, and the endurance on the order of read/write cycles have been reported [1]-[8]. Moreover, the switching current reduction, driven by the dimension and critical current density scaling of the magnetic tunnel junction (MTJ), has been pushing down the power consumption of STT-RAM toward embedded and mobile applications [9]-[15].
With the continuing scaling of MTJ, the high-speed reading of STT-RAM becomes increasingly difficult, not only because both the CMOS and MTJ variability keep increasing, but also because the switching current of MTJ will reach the order of 10 , which can be very challenging for reliable high-speed sensing (Fig. 1) [16]. To improve the sensing, our previous work [17] implements the concept of short pulse reading (SPR) [16] to allow a higher read current for better sensing speed. This paper enhances the reliability of the proposed body-voltage sensing circuit (BVSC) [17] by adding the capability of calibrating the reference voltage level. The enhanced BVSC features shorter sensing time for higher sensing margin and less read disturbance as compared to prior sensing circuits, which makes the scheme suitable for future technology scaling.
The key component in the BVSC approach is the body-connected load [17], [18]. The body-connected load utilizes body-voltage modulation (BVM) to adjust the sensing voltage according to the sensing current. While other types of sensing circuits that adopt the diode-connected or the current source load are deficient in either sensing margin or speed, the BVSC is optimized to support both [17]. However, BVM is more sensitive to threshold voltage variation than gate-voltage modulation (GVM) [17]. As a result, a shift of the sensing voltage in the worst-case corner would deteriorate the effective sensing margin of BVSC. If the reference voltage level is fixed, the corresponding read margin would also be degraded. This paper explores the feasibility of using a reference calibration technique to recover the read margin lost to process variations in BVSC. The main motivation is to improve the stability of BVSC for better yield of STT-RAMs.
The next section briefly reviews the SPR concept and the BVSC, with the definitions of sensing margin and read margin. Section III describes the main idea of the reference calibration in detail. Section IV discusses simulation results showing that a simple 2-bit reference calibration improves the worst-case read margin in a 1-Mb memory by over 3 times, leading to a significant yield increase (up to 30%) across all corners. Conclusions are presented in Section V.
1549-8328 © 2013 IEEE
Fig. 2. Basic MTJ structure. Switching current from the fixed (free) to the free (fixed) layer switches the MTJ into a parallel (anti-parallel) state.
II. HIGH-SPEED READING OF STT-RAM THROUGH BVSC

A. STT-RAM and SPR
MTJ is the storage element of STT-RAM. It consists of two ferromagnetic layers separated by a thin nonconductive tunneling barrier (e.g., MgO) as shown in Fig. 2. The thicker ferromagnet with fixed magnetic orientation is called the fixed layer or the pinned layer. The thinner layer with flexible magnetic orientation is called the free layer. The MTJ exhibits two resistive states determined by the relative magnetization directions of the fixed and free layers: a parallel (P) orientation produces a low resistance $R_P$ and an anti-parallel (AP) orientation results in a high resistance $R_{AP}$. The resistance difference between the two states is measured by the tunnel magneto-resistance ratio (TMR), defined as $\mathrm{TMR} = (R_{AP} - R_P)/R_P$. A higher TMR indicates better readability and is therefore preferred for the reading operation.
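As a small worked example of the TMR ratio, the sketch below uses the common definition TMR = (R_AP - R_P)/R_P, which is assumed here because the formula did not survive in the text. The function names and the resistance values are hypothetical and chosen only for illustration.

```python
def tmr(r_p: float, r_ap: float) -> float:
    """Tunnel magneto-resistance ratio, assuming TMR = (R_AP - R_P) / R_P."""
    if r_p <= 0 or r_ap < r_p:
        raise ValueError("expected 0 < r_p <= r_ap")
    return (r_ap - r_p) / r_p


def r_ap_from_tmr(r_p: float, tmr_ratio: float) -> float:
    """Anti-parallel resistance implied by a parallel resistance and a TMR ratio."""
    return r_p * (1.0 + tmr_ratio)


# Hypothetical device: 2 kOhm in the P state, 4 kOhm in the AP state.
example_tmr = tmr(2000.0, 4000.0)  # a ratio of 1.0 corresponds to 100% TMR
print(f"TMR = {example_tmr:.0%}")
```

A larger ratio means the two states are further apart in resistance, which is why a higher TMR eases sensing.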
In STT-RAMs, data is stored in MTJs in a magnetic form: "0" and "1" are represented by the magnetization direction of the free layer. The switching of the MTJ can be controlled by a bi-directional writing current as shown in Fig. 2: the current in the direction from the fixed (free) to the free (fixed) layer writes the MTJ into the AP (P) state. As shown in Fig. 3, the switching probability of MTJ can be characterized as a function of both the switching current and the switching time (duration of the switching current) [19]. In STT-RAM design, the sensing current distribution has to be kept within the 0% switching probability region in order to avoid destructive read [17]. Note that 0% switching probability corresponds to low read currents for long read durations and high read currents for short read durations. Unlike the low current reading (LCR) scheme, the SPR scheme senses the cell orientation with a current that is close in amplitude to the writing current but with a much shorter pulse, which improves the sensing speed without risking read disturbance [17].
B. BVSC
A schematic of the sensing stage of BVSC is shown in Fig. 4. In the sensing circuit, the resistance difference between the P and AP states is captured by a sensing current difference $\Delta I$, which is converted into a voltage difference $\Delta V$ through a load transistor. The conversion ratio from $\Delta I$ to $\Delta V$ is given by the small-signal resistance of the load transistor as
$$r_{\mathrm{load}} = \frac{\Delta V}{\Delta I}. \qquad (1)$$
Fig. 3. The switching characteristic of the MTJ [19] with illustrations of the sensing current distribution in the short pulse reading (SPR) and the low current reading schemes.
Note that the two sensing voltages, for the P and AP states, are strongly affected by the process and parametric variations in CMOS and MTJ devices. Using the sensing voltage statistics, we define the worst-case margin between the two sensing voltages as the sensing margin (SM), given by (2). According to our previous study [17], a small load resistance (as in a diode-connected load) leads to a high sensing speed but low sensing margin. On the other hand, a large load resistance (e.g., a current source load) results in slow speed and large variation of the sensing voltage, leading to limited sensing margin. In order to provide large sensing margin while maintaining sensing speed, BVSC uses a body-connected load, which has an effective small-signal resistance 5-6 times larger than that of the diode-connected load, but 2-3 orders of magnitude smaller than that of the current source load. As a result, BVSC offers a balanced tradeoff between sensing speed and sensing margin.
The sensing margin defined in (2) statistically characterizes the quality of resistance sensing in the presence of device variability, but it does not show the exact voltage difference seen at the sense amplifier input when reading a memory cell. In order to capture that, we define the read margin (RM) as the readability of the sensing voltage given a reference voltage $V_{REF}$ and the input-referred offset $V_{OS}$ of a sense amplifier. According to the resistance state of the memory cell being read, the read margins for the P and AP states, $RM_P$ and $RM_{AP}$, are defined as (3) and (4), respectively. The overall RM is defined as the worst case of the two,
$$RM = \min(RM_P, RM_{AP}). \qquad (5)$$
Ideally, the reference voltage $V_{REF}$ should be equal to the common-mode level of the sensing voltages $V_P$ and $V_{AP}$, as $V_{REF} = (V_P + V_{AP})/2$, and it can be generated by a voltage divider network that connects the sensing voltages of two sensing stages that sense the P and AP states, respectively. However, for the memory cells in an array sharing the same sense amplifier, $V_P$ and $V_{AP}$ are subject to process variations and could be independent variables, whereas $V_{REF}$ and the sense amplifier offset have to be common factors. As a result, the optimal $V_{REF}$ for each array is subject to the actual distribution of the sensing voltages and should be determined on a case-by-case basis. In order to find the optimal $V_{REF}$ for maximizing the read margin, a reference calibration method is proposed and discussed in the following sections.
III. DEVICE VARIATION AND REFERENCE CALIBRATION
A. Impact of CMOS and MTJ Variations
The device variation of MTJ can be lumped into independent Gaussian variations of the MTJ resistance and TMR [20], [21]. The effect of such variations, together with the variation of the access transistors, on the sensing behavior can be visualized, and the resulting distribution of the sensing voltage can be obtained. The variation of each device contributes to the variation of the sensing voltage, yet at different levels. In memory design, different memory arrays are usually driven by independent sensing stages. For each array, the variations of the CMOS devices in the shared sensing stage tend to affect its statistics globally: they shift the mean of the sensing voltage distribution of the whole array. On the other hand, the MTJ device variation tends to spread the sensing voltage of each single cell around the mean locally. These combined global and local deviations may cause substantial yield loss if a fixed global reference voltage is used.
Another potential problem that may reduce the effectiveness of the body-connected load is that the P-N junction formed between the source and the body regions of the PMOS transistor can be turned on if the body voltage is well below the source voltage. This junction leakage would cause BVM to have weaker control and the effective small-signal resistance of the load to increase (Fig. 5). As a result, the effective sensing margin would be reduced if the operating point is shifted beyond the inflection point shown in Fig. 5 due to process variations. This effect is most prominent in the fast-NMOS slow-PMOS (FS) corner. Fig. 6 shows the histograms of the sensing voltages in a 1-Mb memory at the typical and FS corners. During the sensing of one resistance state, the transistor in the sensing path is in deep saturation such that its voltage drop will vary substantially if the sensing current changes due to process variation. On the other hand, during the sensing of the other state, the sensing current at the FS corner is sufficiently high that the transistor may enter the linear region. As a result, one sensing voltage distribution shows a relatively larger shift between the two corners (typical and FS) than the other, as shown in Fig. 6, and the effective sensing margin is degraded in the FS corner. Such shifting would cause read errors if a fixed reference voltage is used.
Therefore, a reference calibration scheme that can adaptively adjust the reference voltage according to the sensing voltage statistics is desired in order to best utilize the sensing margin, especially at the FS corner.
B. Reference Calibration
As multiple MTJ cells in the same array share a sensing circuit, the variations of the shared sensing-stage devices can be compensated by calibrating the reference voltage level at the sense amplifier input. The generation of multiple reference levels can be implemented by using resistor taps as shown in Fig. 7. By choosing the optimal reference level through the configuration bits (SEL), the read margin does not degrade as much from the device mismatch, and hence the chance of reading errors is reduced. This concept is illustrated in Fig. 8. The distributions of the sensing voltages for reading the whole memory array in the P and AP states are modeled as Gaussian distributions with different means and standard deviations. The band bounded by dashed lines around the reference voltage represents the zone where reading errors would occur, if a sensing voltage falls within the band, due to device mismatch of the sense amplifier and random noise. The width of the error zone can be characterized by the input-referred offset of the sense amplifier and the target noise margin (NM). Note that since the reference voltage is usually generated from sensing a separate reference array other than the regular memory arrays, it may vary independently from the sensing voltages. Fig. 8(a) illustrates an example where the sensing voltage distribution of the whole array is shifted toward the right-hand side of the reference voltage. In this case, the worst-case read margin of the whole array might be tiny or even negative, and reading errors are very likely to occur. However, this can be fixed by calibrating the reference level as shown in Fig. 8(b). As the reference voltage increases, the read margins for reading the P and AP states become more balanced, such that the worst-case read margin is significantly improved.
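The tap selection described above can be sketched as follows. This is an illustration under stated assumptions, not the circuit of Fig. 7: the ladder is modeled as ideal and evenly spaced, the read margin is taken as the distance from the reference to the nearest sensing voltage minus the sense amplifier offset, and the two cell populations are simply called "lower" and "upper" because the text does not fix which state produces the higher voltage. All names and voltages are hypothetical.

```python
def reference_levels(v_bottom: float, v_top: float, n_bits: int) -> list[float]:
    """Tap voltages of an ideal, evenly spaced resistor ladder (assumed model)."""
    n_taps = 2 ** n_bits
    step = (v_top - v_bottom) / (n_taps + 1)
    return [v_bottom + (k + 1) * step for k in range(n_taps)]


def worst_case_margin(v_ref, lower_pop, upper_pop, v_os):
    """Smaller of the two read margins for a given reference (assumed definition)."""
    margin_lower = v_ref - max(lower_pop) - v_os  # cells sensed below the reference
    margin_upper = min(upper_pop) - v_ref - v_os  # cells sensed above the reference
    return min(margin_lower, margin_upper)


def best_sel(levels, lower_pop, upper_pop, v_os):
    """Index of the tap that maximizes the worst-case read margin."""
    return max(
        range(len(levels)),
        key=lambda k: worst_case_margin(levels[k], lower_pop, upper_pop, v_os),
    )


# Hypothetical sensing voltages (volts) for the two stored states of one array.
levels = reference_levels(0.40, 0.65, n_bits=2)
lower_pop = [0.40, 0.43, 0.45]
upper_pop = [0.58, 0.60, 0.62]
sel = best_sel(levels, lower_pop, upper_pop, v_os=0.01)
print(sel, round(levels[sel], 3))
```

With only four taps the chosen level is the one closest to the midpoint between the two populations, which is the balancing effect illustrated in Fig. 8(b).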
For analytical purposes, the reading error is modeled mathematically as follows. Suppose that the sensing voltage distributions for the P and AP states are each described by a mean and a standard deviation, and consider the distance from the boundary of the error zone to each of the two means. Then, the probability for a read error to occur when reading a memory array with N MTJ cells is the complement of the probability that all the sensing voltages fall outside the error zone, as given by (6). Fig. 9 plots the read error probability color map as a function of the two distances. As both distances increase, which implies less device variability, the error probability goes down exponentially. For practical design, the sum of the two distances is usually limited by the sensing margin and is assumed to be fixed. In this case, the minimum error probability is achieved when (7) holds, as indicated in Fig. 9. Therefore, the primary goal of reference calibration is to provide the best available read margin for the worst-case reading by choosing the optimal reference level that satisfies (7). Ideally, this can be done with continuous tuning of the reference levels. In practice, there is a tradeoff between increasing the number of configuration bits (granularity of the reference levels) and chip area.
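To make the error model concrete, the sketch below assumes one plausible form of it: each distance is expressed in units of the corresponding standard deviation, each cell must be read correctly in both states, and cells are independent, so the array error probability is 1 - [Φ(n_a)·Φ(n_b)]^N with Φ the standard normal CDF. This form is an assumption for illustration and may differ from the original equation (6); the function names are hypothetical.

```python
import math


def tail(n_sigma: float) -> float:
    """One-sided Gaussian tail probability beyond n_sigma standard deviations."""
    if n_sigma < 0:
        raise ValueError("distance must be non-negative in this sketch")
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def read_error_probability(n_a: float, n_b: float, n_cells: int) -> float:
    """Assumed model: 1 - (Phi(n_a) * Phi(n_b)) ** n_cells, evaluated stably."""
    # log1p/expm1 avoid losing precision when the per-cell tails are tiny.
    log_all_ok = n_cells * (math.log1p(-tail(n_a)) + math.log1p(-tail(n_b)))
    return -math.expm1(log_all_ok)


# Fixed total distance of 10 sigma, split three different ways, 512-cell array.
for n_a, n_b in [(5.0, 5.0), (4.0, 6.0), (3.0, 7.0)]:
    print(n_a, n_b, read_error_probability(n_a, n_b, 512))
```

Under this assumed model, for a fixed total distance the balanced split gives the lowest error probability, which matches the qualitative behavior described for Fig. 9.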
The detailed calibration algorithm is shown in Fig. 10. The algorithm begins by setting the reference level to the middle node of the resistor taps. The lower and upper bounds of the preferred reference level are determined as the boundary between a successful and a failed read-after-write operation on data patterns "0" and "1," respectively. Therefore, the read and write of both data patterns are required to exercise all the cells in an array. In the algorithm, the first bound is searched with data pattern "0." Since the first SEL value may lead to either a successful or a failed read, the search for this bound may follow different directions according to the first result. In contrast, once the first bound is determined, we only need to search for the second bound in the other direction. Theoretically, all the levels within the bounds are error free. To maximize the read margin, the optimum reference level is selected between the two bounds.
A block diagram that illustrates the STT-RAM architecture with the reference calibration technique applied is shown in Fig. 11. The calibrated control bits can be stored in an off-chip EEPROM, which automatically loads the control data into the local registers for selecting the optimal reference level during reset or initialization. Therefore, the calibration process only needs to be performed once after fabrication. If the device has time-varying characteristics, the calibration control bits may also be updated by software periodically.
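The bound search described above can be sketched as follows. This is a simplified reading of the procedure, not a transcription of Fig. 10. The callback `read_ok(sel, pattern)` is hypothetical and stands for a read-after-write of the given pattern over all cells at tap `sel`. The sketch assumes that pattern "0" passes for all taps at or above some lower bound, that pattern "1" passes for all taps at or below some upper bound, and that the final choice is the midpoint of the two bounds.

```python
from typing import Callable, Optional


def calibrate_sel(n_taps: int, read_ok: Callable[[int, int], bool]) -> Optional[int]:
    """Return a tap index inside the error-free window, or None if there is none."""
    sel = n_taps // 2  # start from the middle tap
    if read_ok(sel, 0):
        # Middle tap passes pattern 0: walk down to the lowest passing tap.
        while sel > 0 and read_ok(sel - 1, 0):
            sel -= 1
    else:
        # Middle tap fails pattern 0: walk up until a tap passes.
        sel += 1
        while sel < n_taps and not read_ok(sel, 0):
            sel += 1
        if sel == n_taps:
            return None
    lower = sel

    # The upper bound is searched in one direction only, upward from the lower bound.
    if not read_ok(lower, 1):
        return None
    upper = lower
    while upper + 1 < n_taps and read_ok(upper + 1, 1):
        upper += 1

    # Assumed selection rule: midpoint of the error-free window.
    return (lower + upper) // 2


# Simulated array: pattern 0 passes for taps >= 1, pattern 1 passes for taps <= 3.
simulated = lambda sel, pattern: sel >= 1 if pattern == 0 else sel <= 3
print(calibrate_sel(4, simulated))
```

With a 2-bit selector the search touches at most a handful of taps, so the cost is dominated by the read-after-write passes over the array rather than by the search itself.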
IV. SIMULATION RESULTS
A. Simulation Setup
In order to analyze the effectiveness of the reference calibration technique and determine the number of configuration bits for effective calibration, we simulate the reading of a 512-cell memory array using HSPICE Monte Carlo (MC) simulations with a 65-nm CMOS model. Both the across-chip variations and the chip-to-chip variations are enabled in the simulations. At each process corner, MC runs are conducted for statistics parameter extraction purposes [22]. The MTJ model and its variation parameters used in the simulations are summarized in Table I. The MTJ variation is modeled by the standard deviation of resistance-area (RA) ratio and TMR extracted from measurements [20]. A total of the MTJ variation is considered. Fig. 12 shows the read margin statistics of a 512-cell memory array extracted from MC runs in the nominal case. For Fig. 12(b)-(d), the optimal reference level in the simulation is determined using the algorithm shown in Fig. 10. The simulation results show that a simple 2-bit reference calibration can effectively improve the worst-case read margin by 3 times. With one extra calibration bit, another 30% improvement can be achieved. The improvement nearly saturates at 4 calibration bits. Clearly, the amount of improvement by reference calibration becomes much less significant when the calibration resolution exceeds 2 bits. This is because the reference calibration re-distributes the sensing margin around the reference voltage rather than enlarging it. Fig. 12 indicates that a 2-bit reference calibration is sufficient for the nominal case. Fig. 13 presents the yield of a 512-cell memory array calculated from (6) using parameters extracted from the MC simulations in different process corners. One sigma of the input-referred offset of the sense amplifier is assumed to be 11 mV. Note that the reference level that maximizes the read margin in the nominal case is chosen as the nominal operating point.
However, in the worst-case corner (FS corner), the array yield is around 70% without any reference calibration. This indicates that only tuning the nominal operating point of the reference voltage is not sufficient for compensating the variations in the worst-case corner. If a 1-Mb memory is built using multiple such arrays, the overall yield would be lower than 1% in the FS corner. However, with a 2-bit reference calibration, the yield of a 1-Mb memory can be improved to over 99.7% across all corners. This result indicates that reference calibration can be very effective in compensating within-die variations.
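The step from array yield to memory yield can be illustrated with a short calculation. The sketch assumes that 1 Mb means 2^20 cells, that the memory is tiled from 512-cell arrays, and that array failures are independent, so the memory yield is the array yield raised to the number of arrays. These assumptions and the function names are illustrative and are not stated in this form in the text.

```python
def n_arrays(memory_cells: int = 2 ** 20, cells_per_array: int = 512) -> int:
    """Number of arrays needed to tile the memory (assumed exact division)."""
    return memory_cells // cells_per_array


def memory_yield(array_yield: float, arrays: int) -> float:
    """Yield of the whole memory if every array must be error free."""
    return array_yield ** arrays


def required_array_yield(target_memory_yield: float, arrays: int) -> float:
    """Per-array yield needed to reach a target memory yield."""
    return target_memory_yield ** (1.0 / arrays)


arrays = n_arrays()                            # 2048 arrays of 512 cells
fs_uncalibrated = memory_yield(0.70, arrays)   # vanishingly small
needed = required_array_yield(0.997, arrays)   # very close to 1
print(arrays, fs_uncalibrated, needed)
```

Under these assumptions, a 70% array yield compounds to a memory yield far below 1%, and reaching 99.7% at the memory level requires each array to fail only on the order of once in a million.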
B. Read Margin and Yield Improvements
In addition, reference calibration can also be used to compensate for the device mismatch of sense amplifiers by shifting the reference voltage against the input-referred offset. Fig. 14 illustrates the yield improvement by reference calibration of a 512-cell memory array as a function of the variability of the sense amplifier. In the nominal case, without reference calibration, the yield drops rapidly as the standard deviation of the offset increases beyond 15 mV. With reference calibration, the yield stays nearly constant. This trend becomes more prominent in the FS corner. These results show that the reference calibration technique is able to relax the device matching requirements of the sense amplifier design without sacrificing the yield. Similar results have also been reported by a study on a self-reference scheme [23]. However, the self-reference scheme improves the reading robustness at the cost of lowering the sensing speed [23], while our technique does not affect the sensing speed at all.
C. Reference Sharing
For all the above-mentioned results, we assume each array of the memory has its own control bits for independent reference calibration. In order to minimize the area overhead of the calibration circuitry, we also study the feasibility of sharing a single calibrated reference voltage across multiple arrays. Fig. 15 illustrates how the yield of a 1-Mb memory is affected by sharing the calibrated reference. As one would expect, sharing across more than 32 arrays results in a yield drop in the typical corner. In the FS corner, the yield drops rapidly as soon as more than one array shares the same calibrated reference. Although increasing the calibration resolution helps to slow the drop, the yield loss is still significant. As the number of cells covered by the same calibrated reference increases, the chance of the tail of the sensing voltage distribution crossing the calibrated reference also increases (Fig. 6). As a result, the effectiveness of the reference calibration technique diminishes. This indicates that dedicated calibration circuitry has to be deployed for each memory array if the yield loss due to the worst-case corner is critical to the designers.
The area overhead of the calibration circuitry depends on the column mux ratio and the area utilization rate of the memory. In the case of a 2-bit calibration, the calibration circuitry added to each array only includes a short resistor ladder, a few transmission gates, and a few control bit registers. In addition, applying the reference calibration relaxes the device matching requirements of the sense amplifier (Fig. 14), which allows for potential area saving from using smaller devices and may mitigate the area overhead. According to our estimation, the overall area overhead of the reference calibration circuitry is limited to 10% of the peripheral circuitry. This impact shrinks as the column mux ratio and the memory area utilization rate increase.
V. CONCLUSION
This paper presents a technique of using reference calibration as an enhancement to BVSC to enable fast and reliable reading of STT-RAM. The simulation results show that by applying a simple 2-bit reference calibration, the worst-case read margin under process variations can be improved by over 3 times, leading to a significant yield increase (up to 30%) across all corners. Moreover, the reference calibration technique improves the yield in the presence of device mismatch in the sense amplifier. In practical designs where the yield loss due to the worst-case corner is critical, dedicated reference calibration circuitry should be deployed for each memory array.
