A reference column is employed to improve the read performance of phase change memory (PCM). With it, a variable reference current replaces the constant one, and the reference cell and the selected cell see the same bit line (BL) parasitic parameters and read transmission gate parasitic parameters during the read operation. Simulated in a 40 nm CMOS process, the read access time of a 4-Mb PCM is 30.65 ns, an improvement of 190.9 ns. Monte Carlo simulations show an 80.5 ns worst-case read access time, compared with 1.58 µs for the conventional method.
Introduction
Phase change memory (PCM) has attracted great interest as one of the most promising candidates for next-generation nonvolatile memory devices [1]. The storage mechanism of PCM is based on reversible phase transitions of a chalcogenide alloy (Ge2Sb2Te5, or GST) between a high resistance amorphous state (reset state, data "1") and a low resistance crystalline state (set state, data "0") [2, 3]. As arrays continue to scale, parasitic parameters in the array have become a major obstacle to reducing the read access time [4]. This limitation has been widely discussed for various kinds of memories [5] and is not specific to PCM. In this paper, we discuss a method for improving the read performance, which has been applied to the development of a 4-Mb standalone PCM in 40 nm CMOS technology.
Conventional read circuit analysis
Fig. 1(a) shows the schematic of the PCM read circuit, which is a fully differential sense amplifier (SA). The red zone is the selected 1T1R cell, which comprises a GST element R GST and a selection transistor NM0. The 1T1R cell is selected from the many cells in the array by a word line (WL) and a bit line (BL). The read current I read and the reference current I ref are generated by V clamp and a reference circuit, respectively. A V RBL of about 100 mV is low enough to avoid an unintended set operation. In the SA, PM2 and NM4 copy I read, while PM3 and NM3 copy I ref.
When reading a set cell, I read > I ref. PM2 therefore copies a larger current than NM3 does, so V 1 rises to near VDD; likewise, NM4 copies a larger current than PM3 does, so V 2 drops to near 0 V. DO of the SR latch then outputs VDD, i.e. "1".
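The comparison described above can be sketched numerically. This is only an idealized illustration: it assumes the DC read current is simply V RBL divided by R GST (ignoring the selection transistor and all parasitics), it uses the 30 kΩ and 2 MΩ resistances quoted later for the simulations, and the 1 µA reference value and the function names are hypothetical, not taken from the design.

```python
V_RBL = 0.1  # volts; the roughly 100 mV bit-line read voltage quoted in the text


def cell_current(r_gst_ohm, v_rbl=V_RBL):
    """Ideal DC read current in amperes (assumption: Ohm's law only)."""
    return v_rbl / r_gst_ohm


def sense(i_read, i_ref):
    """DO level as described in the text: 1 when I_read > I_ref, otherwise 0."""
    return 1 if i_read > i_ref else 0


i_set = cell_current(30e3)    # set cell, 30 kOhm
i_reset = cell_current(2e6)   # reset cell, 2 MOhm
i_ref = 1e-6                  # hypothetical constant reference of 1 uA

print("set cell   ->", sense(i_set, i_ref))
print("reset cell ->", sense(i_reset, i_ref))
```

Under these assumptions the set-cell current sits above the reference and the reset-cell current below it, which is the steady-state condition the SA relies on.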
In the read operation, the effect of parasitic parameters in the array cannot be ignored. Parasitic parameters can be divided into two categories: bit line parasitic parameters and read transmission gate parasitic parameters. As shown in Fig. 1(b), bit line parasitic parameters include the parasitic capacitors of the selection transistors. Read transmission gate parasitic parameters include the parasitic capacitors and resistors of the read transmission gates. There are n 1T1R cells on one Local Bit Line (LBL), and m LBLs share one SA. Bit line parasitic parameters scale with n; read transmission gate parasitic parameters scale with m. In the read operation, the SA must first charge the parasitic capacitors in the array.
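To give a feel for how n and m enter the charging delay, the following rough sketch lumps all parasitic capacitance into one value. The per-element capacitances (0.2 fF per selection transistor, 1 fF per read transmission gate) and the constant 1 µA charging current are hypothetical placeholders chosen for illustration, not extracted values; only the dependence on n and m follows the text.

```python
def bitline_capacitance(n, m, c_sel=0.2e-15, c_rtg=1.0e-15):
    """Lumped parasitic capacitance in farads.

    n: cells per local bit line, m: local bit lines sharing one SA.
    c_sel and c_rtg are hypothetical per-element capacitances.
    """
    return n * c_sel + m * c_rtg


def charge_time(c_total, v_rbl=0.1, i_charge=1e-6):
    """Time to charge c_total to v_rbl with a constant current (t = C * V / I)."""
    return c_total * v_rbl / i_charge


c_total = bitline_capacitance(n=1024, m=64)
print(f"lumped capacitance: {c_total * 1e15:.1f} fF")
print(f"charge time at 1 uA: {charge_time(c_total) * 1e9:.2f} ns")
```

The point of the sketch is the scaling: doubling n or m increases the capacitance the SA has to charge before the cell current can be distinguished.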
In this work, the GST resistance is modeled based on the conventional GST element used in a 40 nm PCM device [6]. The set resistance ranges from a few kΩ to 100 kΩ, and the reset resistance ranges from 1 MΩ to dozens of MΩ.
The 64-Kb unit block of the 4-Mb PCM contains 1024 X-selectors, 64 Y-selectors and one SA. The large numbers of X-selectors and Y-selectors result in considerable parasitic capacitance and resistance on the BL. More parasitic elements result in a higher read current. If a constant reference current is still used, the read access time becomes longer. Fig. 2(a) shows the simulation results of the conventional read access time. R GST is 30 kΩ (set) and 2 MΩ (reset), respectively. The read access time is 221.6 ns, which is determined by the reset speed.

Fig. 2 parameters. LBL 0 is connected to RBL1 through RTG 0. RBL0 and RBL1 are connected to the SAs. During the read operation, with the help of the selected EN, WL and RTGs, two cells are connected to the SA: a reference cell through RBL1 and the cell to be measured through RBL0. On the reference side, there are one selected cell, n − 1 unselected cells, one selected RTG and m − 1 unselected RTGs; in the array, there are likewise one selected cell, n − 1 unselected cells, one selected RTG and m − 1 unselected RTGs. Unlike the conventional methods, which use a constant reference current [6, 7], the proposed method gives the reference cell and the selected cell the same parasitic elements.

Fig. 4(a) shows the schematic of the proposed SA. Compared to the aforementioned method, NM6 and a new reference current I refnew are introduced. NM6 helps V clamp generate I refnew from the reference cell. I refnew is then compared with I read. Fig. 4(b) shows the schematic of the V clamp generator. V bg is the bandgap voltage. The proposed method uses one voltage (V clamp) to generate both I refnew and I read, unlike previous methods, which use two different voltages or circuits to generate I read and I ref [6, 7, 8]. In the proposed method, I refnew and I read are therefore more likely to vary in the same way. The design is symmetric in terms of parasitic parameters and SA topology.
A short current-signal route from the SA to the reference cells is used instead of a long one [8].
In many nonvolatile memories, the reference current is generated by averaging the read-out currents of paired reference cells written to the high resistance state (HRS) and the low resistance state (LRS) [8, 9]. However, large deviations of the HRS and LRS cause the reference current to deviate. The single-reference scheme therefore has a wider sense margin if properly designed. Compared to the previous methods [8, 9, 10], no reference cells need to be programmed in the initialization phase.
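The sensitivity of an averaged reference to the state of its reference cells can be illustrated with a small calculation. It is a simplified sketch that assumes ideal DC currents (V RBL divided by resistance, with V RBL = 100 mV) and reuses the typical (30 kΩ, 2 MΩ) and worst-case (100 kΩ, 1 MΩ) resistances quoted elsewhere in this paper; the function names are hypothetical.

```python
V_RBL = 0.1  # volts


def dc_current(r_ohm, v=V_RBL):
    """Ideal DC cell current in amperes (assumption: Ohm's law only)."""
    return v / r_ohm


def averaged_reference(r_lrs_ref, r_hrs_ref):
    """Reference current formed by averaging one LRS and one HRS reference cell."""
    return 0.5 * (dc_current(r_lrs_ref) + dc_current(r_hrs_ref))


nominal = averaged_reference(30e3, 2e6)    # reference cells at typical resistances
shifted = averaged_reference(100e3, 1e6)   # reference cells at worst-case resistances

print(f"nominal reference: {nominal * 1e6:.3f} uA")
print(f"shifted reference: {shifted * 1e6:.3f} uA")
print(f"ratio: {nominal / shifted:.2f}")
```

Because the LRS current dominates the average, a deviation in the LRS reference cell moves the averaged reference by roughly the same factor, which is the reference-current deviation the paragraph above refers to.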
Performance evaluation
The proposed method is applied to a 4-Mb PCM and simulated in the SMIC 40 nm CMOS process to verify its performance. The 4-Mb PCM is intended to replace SRAM and EEPROM in applications such as printers and Bluetooth speakers. In the array and the reference column, n = 1024 and m = 64. Fig. 5(a) Fig. 5(b).

Fig. 6 shows I read and I refnew versus time. In the new scheme, I refnew first rises to its maximum, then decays exponentially, and finally reaches a stable value. The I refnew curve lies between the set-state I read and the reset-state I read from early in the read operation, since the reference cell and the selected cell have the same parasitic parameters. When reading a set cell, I refnew is lower than I read, so DO outputs the correct result. When reading a reset cell, I refnew is higher than I read, so DO again outputs the correct result. When I refnew and I read are equal, V 1 and V 2 are equal and the SR latch outputs "0". This is why DO outputs the correct reset result in most cases. The read current is as small as 1.8 µA, which is, to the best of our knowledge, the smallest PCM read current reported. According to the simulation results below, the EN disable time is set above 5.11 µs to ensure that all bits are read correctly. By contrast, the conventional EN disable time is set at 6.6 µs.
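Why a reference that shares the parasitics can be compared early is easiest to see in a toy transient model. The sketch below assumes each branch current is a DC term (V RBL / R) plus an identical, exponentially decaying charging term; the charging amplitude (20 µA), time constant (50 ns), constant reference (1 µA) and reference-cell resistance (300 kΩ) are all hypothetical values chosen only for illustration, and the model ignores the finite resolution of a real SA.

```python
import math

V_RBL = 0.1     # volts, read voltage from the text
I_CHG = 20e-6   # hypothetical initial parasitic charging current
TAU = 50e-9     # hypothetical charging time constant


def branch_current(t, r_cell):
    """Toy branch current: DC cell current plus a decaying charging current."""
    return V_RBL / r_cell + I_CHG * math.exp(-t / TAU)


def settle_time_constant_ref(r_reset, i_ref):
    """Time at which a reset cell's current falls below a constant reference."""
    i_dc = V_RBL / r_reset
    return TAU * math.log(I_CHG / (i_ref - i_dc))


def tracking_ref_is_between(t, r_set, r_ref, r_reset):
    """True if a reference with the same transient lies between both cell states."""
    i_refnew = branch_current(t, r_ref)
    return branch_current(t, r_reset) < i_refnew < branch_current(t, r_set)


t_wait = settle_time_constant_ref(r_reset=2e6, i_ref=1e-6)
ok_from_start = all(
    tracking_ref_is_between(k * 1e-9, r_set=30e3, r_ref=300e3, r_reset=2e6)
    for k in range(500)
)

print(f"constant reference: reset read valid only after {t_wait * 1e9:.1f} ns")
print("tracking reference separates set/reset from t = 0:", ok_from_start)
```

In this model the charging term is common to the cell and the reference, so it cancels in the comparison, whereas a constant reference has to wait for it to decay. The waiting time printed here depends entirely on the assumed parameters and is not a prediction of the simulated access times.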
Monte Carlo simulation results of the read access time are shown in Fig. 7. They are used to compare the worst cases of the conventional and proposed methods. The Monte Carlo simulations are performed using the industry-compatible SMIC 40 nm model parameters. In the process and mismatch analysis, three standard deviations (3σ) are used for the parameter variation and mismatch of the MOS transistors and resistors. The GST device is modeled as a resistance distribution in order to reflect the effect of process variation [6]. Each simulation runs 4000 trials. Variations of V clamp, the parasitic capacitance of the GST element and the parasitic elements of the bit line metal are also considered. The circuits in Fig. 3, Fig. 4(a) and Fig. 4(b) are all included in the Monte Carlo simulation. The worst read access time of the proposed method (determined by the set speed) is 80.5 ns. By contrast, the worst read access time of the conventional method (determined by the reset speed) is 1.58 µs. The worst read access time is clearly much shorter. The timing margin between the regular case and the worst case is reduced from 1.51 µs to 80 ns.

Fig. 8 compares the read access time of the two methods under different process corners and the worst R GST. The worst R GST is 100 kΩ (set) and 1 MΩ (reset), respectively. Fig. 9 shows the simulation results of the read operation under both the SS process corner and the worst R GST. When the SA begins to work, V 1 and V 2 are charged and begin to rise; the read current and the reference current begin to charge the parasitic elements. As I read and I refnew are almost the same at this moment, V 1 and V 2 are the same, and the SR latch outputs "0". When reading a set cell, after the distinguishing point of the proposed method, V 1 is nearly 2.5 V and V 2 is nearly 0 V, so the SR latch outputs "1". When reading a reset cell, after the charging process, V 1 is nearly 0 V and V 2 is nearly 2.5 V, so the SR latch outputs "0".
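The structure of such a worst-case study can be sketched with a toy Monte Carlo loop. This is not the SPICE process and mismatch analysis used above: it assumes a simple exponential charging model for a constant-reference reset read, draws the charging amplitude and time constant from Gaussians clipped at 3σ, and samples the reset resistance log-uniformly between 1 MΩ and about 20 MΩ. All nominal values, spreads and function names are hypothetical; only the 4000-trial count and the 3σ clipping follow the text.

```python
import math
import random


def clipped_gauss(rng, mean, sigma, n_sigma=3.0):
    """Gaussian sample clipped to mean +/- n_sigma * sigma."""
    z = max(-n_sigma, min(n_sigma, rng.gauss(0.0, 1.0)))
    return mean + sigma * z


def reset_settle_time(i_chg, tau, r_reset, i_ref=1e-6, v=0.1):
    """Toy model: time for a reset cell's current to fall below a constant I_ref."""
    i_dc = v / r_reset
    return tau * math.log(i_chg / (i_ref - i_dc))


def run_trials(n_trials=4000, seed=0):
    """Return settle times for n_trials random parameter draws."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_trials):
        tau = clipped_gauss(rng, 50e-9, 5e-9)     # hypothetical time constant
        i_chg = clipped_gauss(rng, 20e-6, 2e-6)   # hypothetical charging current
        r_reset = 10 ** rng.uniform(6.0, 7.3)     # 1 MOhm to about 20 MOhm
        times.append(reset_settle_time(i_chg, tau, r_reset))
    return times


times = run_trials()
median = sorted(times)[len(times) // 2]
print(f"trials: {len(times)}")
print(f"median settle time: {median * 1e9:.1f} ns")
print(f"worst settle time:  {max(times) * 1e9:.1f} ns")
```

The quantity of interest is the gap between the typical and the worst trial, which is what the timing margin in the text measures; the absolute numbers from this toy model carry no meaning for the actual circuit.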
The worst variation of the process corner and R GST does not affect a correct readout. Fig. 10 and Fig. 11 show the simulation results of the read operation under the worst variation of both V TH and R GST. The increase or decrease of V TH of the transistors in the SA is marked by arrows in Fig. 10(a) and Fig. 11(a). The variation of V TH is about 70 mV, which is the worst V TH variation in the SMIC 40 nm CMOS process. In Fig. 10, the selected cell is a set cell with R GST of 100 kΩ. The variations of V TH make I read decrease and I refnew increase, which results in a long pseudo reading zone in Fig. 10(b). In Fig. 11, the selected cell is a reset cell with R GST of 1 MΩ. The variations of V TH make I read increase and I refnew decrease. The worst variation of V TH and R GST does not affect a correct readout.
The comparison between the proposed read method and prior read methods is shown in Table I. In a memory read circuit, the read access time decreases as the bits per SA (the number of cells sharing an SA) decrease or as the read current increases. In Table I, we try to compare their performance on a common basis.
Conclusions
Parasitic elements on the BL have always been a major limit on read performance in various kinds of memories. This paper presents a novel read method for a 4-Mb PCM in a 40 nm CMOS process. The proposed method uses a reference column to generate a reference current. The reference column is composed of one reference cell and a certain number of 1T1R cells. The reference cell and the selected array cell have the same parasitic parameters in the read operation. Simulation results show a 30.65 ns read access time compared to the previous 221.6 ns. Monte Carlo simulation shows an 80.5 ns worst read access time compared to the conventional 1.58 µs. This method can be easily applied in practical products, and it is also useful in MRAM, RRAM and Flash.
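For quick reference, the headline figures above can be turned into an absolute improvement and speedup ratios. The snippet uses only the four access times reported in this paper; the variable names are illustrative.

```python
conventional_ns = 221.6          # conventional read access time
proposed_ns = 30.65              # proposed read access time
conventional_worst_ns = 1580.0   # conventional worst case (1.58 us)
proposed_worst_ns = 80.5         # proposed worst case

improvement_ns = conventional_ns - proposed_ns
speedup = conventional_ns / proposed_ns
worst_speedup = conventional_worst_ns / proposed_worst_ns

print(f"improvement: {improvement_ns:.2f} ns")
print(f"typical speedup: {speedup:.2f}x")
print(f"worst-case speedup: {worst_speedup:.2f}x")
```

The difference of 190.95 ns corresponds to the 190.9 ns improvement quoted in the abstract.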
