A high-reliability, offset-tolerant sensing circuit is presented for deep submicron spin transfer torque magnetic tunnel junction (STT-MTJ) memory. Using a triple-stage sensing operation, the circuit tolerates the increased process variations that arise as technology scales down to the deep submicron nodes, thus significantly improving the sensing margin. Meanwhile, it clamps the bit-line voltage to a predefined small bias voltage to avoid any read disturbance during the sensing operations. Monte Carlo simulations using the STMicroelectronics CMOS 40 nm design kit and a precise STT-MTJ compact model have been carried out to evaluate its sensing performance.
Introduction: Spin transfer torque magnetic tunnel junction (STT-MTJ) memory has emerged as a promising candidate for the next generation of high-density, low-power and scalable non-volatile random access memory [1]. The MTJ nanopillar is mainly composed of three layers: an oxide barrier sandwiched between two ferromagnetic (FM) layers, as shown in Fig. 1a. It presents one of two resistance values (R_P or R_AP) depending on the relative magnetisation orientations of the two FM layers. The resistance difference is characterised by the tunnel magneto-resistance ratio, TMR = (R_AP − R_P)/R_P. A typical STT-MTJ bit cell consists of an MTJ connected in series with an access transistor between a bit-line (BL) and a source-line [2], as shown in Fig. 1b. Only a bidirectional, spin-polarised, low current I_write larger than a threshold value I_C0 can switch the MTJ state, whereas a read current I_read is used to sense the MTJ state [1]. It is worth noting that I_read should be sufficiently smaller than I_C0 to avoid any read disturbance (RD) during the sensing operation. However, a low I_read results in a correspondingly small sensing margin (SM), given the small TMR ratio and process variations. SM and RD are therefore in conflict, and it is of great importance that the sensing circuit strikes the best trade-off between them. Conventional sensing circuits, such as the dynamic current-mode (DCM) sensing amplifier [3], the self-reference sensing scheme [4] and the pre-charge sensing amplifier (PCSA) [5], cannot overcome the conflict between SM and RD. Moreover, they suffer from low reliability owing to the reduced supply voltage and increased process variations as technology scales down to the deep submicron nodes (e.g. 40 nm). In this Letter, we propose an offset-tolerant triple-stage sensing circuit that tolerates the process variations and overcomes the conflict between SM and RD.
Proposed sensing circuit: Fig. 2 shows the schematic of the proposed sensing circuit, which is mainly composed of two parts: a current conveyor and a comparator. One bit of information is stored in each data MTJ as R_P or R_AP. The reference cell is formed by connecting in parallel two branches of serially connected MTJs, as shown in Fig. 2, and its resistance R_ref equals (R_P + R_AP)/2. The access transistors of both the data and ref cells are of minimum feature size. The triple-stage sensing operation is detailed as follows. In the first stage, after a data cell is selected by enabling a word-line and a BL, a biasing voltage V_bias is applied to the operational amplifier A_0 of the current conveyor. The current flowing through the transistor N_0 is I_cmtj = V_bias/R_cmtj; here, R_cmtj denotes the resistance of the selected data cell, which includes the resistance of both the MTJ, R_mtj, and the access transistor, R_cmos. Then, the voltage V_o induced by the load PMOS P_0 (P_0 can also be replaced by a resistor 
Fig. 2 Proposed offset-tolerant triple-stage sensing circuit
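The resistance and first-stage current relations stated above can be sketched numerically. The following is a minimal illustration only, not the simulated circuit: the function names are hypothetical, the MTJ and the access transistor are treated as ideal series resistors, the reference cell is assumed to sit behind an access transistor of the same resistance, and the numerical values (R_P = 3.5 kΩ, R_cmos = 4.5 kΩ, TMR = 150%, V_bias = 0.1 V) are illustrative assumptions.

```python
def cell_resistances(r_p, tmr):
    """Return (R_P, R_AP, R_ref) in ohms for a parallel resistance and a TMR ratio."""
    r_ap = r_p * (1.0 + tmr)    # follows from TMR = (R_AP - R_P) / R_P
    r_ref = (r_p + r_ap) / 2.0  # two series (R_P + R_AP) branches in parallel
    return r_p, r_ap, r_ref


def conveyor_current(v_bias, r_mtj, r_cmos):
    """Current when V_bias is clamped across the MTJ plus access transistor."""
    # Assumption: both devices behave as ideal resistors in series.
    return v_bias / (r_mtj + r_cmos)


if __name__ == "__main__":
    r_p, r_ap, r_ref = cell_resistances(r_p=3.5e3, tmr=1.5)
    for name, r in (("P", r_p), ("AP", r_ap), ("ref", r_ref)):
        i = conveyor_current(v_bias=0.1, r_mtj=r, r_cmos=4.5e3)
        print(f"{name}: R = {r:.0f} ohm, I = {i * 1e6:.2f} uA")
```

Under these assumptions the reference current lies between the currents of the two data states, which is the property the comparison relies on.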
Since we use the same sensing path to generate both I_cmtj and I_cref, we can eliminate the mismatch in the sensing circuit induced by the process variations, leading to offset-tolerant currents I_cmtj and I_cref. The difference ratio between I_cmtj and I_cref is expressed as follows:
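One way to write such a ratio, derived only from I_cmtj = V_bias/R_cmtj, is sketched below. It is a hedged derivation rather than the Letter's own equation: it assumes that the reference current is generated in the same way, I_cref = V_bias/R_cref with R_cref = R_ref + R_cmos, and that the difference is normalised by I_cref.

```latex
\[
\frac{I_{cmtj} - I_{cref}}{I_{cref}}
  = \frac{V_{bias}/R_{cmtj} - V_{bias}/R_{cref}}{V_{bias}/R_{cref}}
  = \frac{R_{cref} - R_{cmtj}}{R_{cmtj}}
\]
% With R_{cmtj} = R_{mtj} + R_{cmos}, R_{ref} = (R_P + R_{AP})/2
% and R_{AP} - R_P = \mathrm{TMR} \cdot R_P:
\[
\left.\frac{I_{cmtj} - I_{cref}}{I_{cref}}\right|_{P}
  = \frac{\mathrm{TMR} \cdot R_P / 2}{R_P + R_{cmos}},
\qquad
\left.\frac{I_{cmtj} - I_{cref}}{I_{cref}}\right|_{AP}
  = -\frac{\mathrm{TMR} \cdot R_P / 2}{R_{AP} + R_{cmos}}
\]
```

In this form V_bias cancels, so under these assumptions the relative difference depends only on the TMR ratio and the resistances, while V_bias sets the absolute current level.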
The induced voltages V_data and V_ref can be further amplified by a suitable choice of P_0 or R_load, thus increasing the SM. The process variation or mismatch in the comparator is very small compared with the amplified SM and can be further reduced by optimising the switched capacitors and the amplifier A_1. Meanwhile, since we apply a small V_bias ≤ 0.1 V at the current conveyor, the currents flowing through the data cell and the ref cell are sufficiently small to avoid any RD. For example, assume I_C0 ≃ 60 μA, R_cmos ≃ 4.5 kΩ and R_P ≃ 3.5 kΩ at the 40 nm technology node, with a resistance × area (R × A) product of 5 Ω·μm² and a TMR ratio of 150%. Then the maximum current is I_max ≃ 12.5 μA, which is far less than I_C0.
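The arithmetic behind this example can be written out as follows. This is a worked check under the assumption that the largest current flows when the data MTJ is in the low-resistance state R_P and V_bias takes its upper bound of 0.1 V, with resistances in kΩ and currents in μA.

```latex
\[
I_{max} = \frac{V_{bias}}{R_P + R_{cmos}}
        = \frac{0.1\,\mathrm{V}}{3.5\,\mathrm{k\Omega} + 4.5\,\mathrm{k\Omega}}
        = 12.5\,\mu\mathrm{A},
\qquad
\frac{I_{C0}}{I_{max}} \simeq \frac{60\,\mu\mathrm{A}}{12.5\,\mu\mathrm{A}} = 4.8
\]
```

With these figures the read current stays a factor of about 4.8 below the switching threshold.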
Monte Carlo simulation: As discussed above, two parameters, i.e. the TMR ratio and V_bias, have the greatest impact on the sensing performance of the circuit. Monte Carlo simulations have been performed using the STMicroelectronics CMOS 40 nm design kit [6] and a compact STT-MTJ model [7]. Here, we consider 3σ variations for the CMOS transistors and 1% variations for the STT-MTJs. Figs. 3a and b show the average SM and read current of the circuit with respect to the TMR ratio and V_bias, respectively. The maximum read current is I_read ≈ 10.81 μA, and it leads to zero RD during the sensing operation. The total sensing time per bit is ∼4.3 ns, including two 2.0 ns pulses for the data and ref cell sensing at the first two stages and ∼0.3 ns for the third-stage amplifying. The total energy consumption per bit sensing is ∼40.0 fJ, composed of ∼35.0 fJ for the first two stages and ∼5.0 fJ for the third stage. Fig. 4 shows the sensing error rate of the proposed circuit compared with the DCM sensing amplifier [3] and the PCSA [5].
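The idea of a Monte Carlo estimate of the sensing error rate can be illustrated with a toy behavioural model. The sketch below is not the design-kit simulation reported here and does not reproduce Fig. 4. It assumes that the 1% MTJ variation is a Gaussian relative standard deviation of the resistance, that the shared sensing path contributes no offset (so CMOS variation is omitted and R_cmos is fixed), and that the comparator is ideal. The function name and all default values are hypothetical.

```python
import random


def sensing_error_rate(n_trials, r_p=3.5e3, tmr=1.5, r_cmos=4.5e3,
                       v_bias=0.1, sigma_mtj=0.01, seed=0):
    """Toy estimate: fraction of reads where the data/ref comparison is wrong."""
    rng = random.Random(seed)
    r_ap = r_p * (1.0 + tmr)

    def vary(r):
        # Assumed Gaussian spread with relative standard deviation sigma_mtj.
        return rng.gauss(r, sigma_mtj * r)

    errors = 0
    for _ in range(n_trials):
        stored_ap = rng.random() < 0.5
        r_data = vary(r_ap if stored_ap else r_p)
        # Reference cell: two series (P + AP) branches connected in parallel.
        b1 = vary(r_p) + vary(r_ap)
        b2 = vary(r_p) + vary(r_ap)
        r_ref = b1 * b2 / (b1 + b2)
        # Same clamped bias and same path for both currents (idealised).
        i_data = v_bias / (r_data + r_cmos)
        i_ref = v_bias / (r_ref + r_cmos)
        read_ap = i_data < i_ref  # smaller current means higher resistance
        if read_ap != stored_ap:
            errors += 1
    return errors / n_trials


if __name__ == "__main__":
    print(sensing_error_rate(10000))
```

Raising `sigma_mtj` or lowering `tmr` in this toy model shows how the error rate grows once the resistance spread becomes comparable to the gap between the data and reference resistances.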
